lab/cupido

Author	SHA1	Message	Date
Giorgio Gilestro	2b75daa783	Replace fine thumbnail grid with mpv/vlc/ffplay handoff Watching the video play turns out to be much faster than scanning a thumbnail grid. The coarse 10-min thumbnail grid still does rough localisation; after picking, a video player launches at coarse_t-30s paused with frame-accurate scrubbing controls. The analyst reads the exact opening time off the player's OSD and types it into the terminal prompt (default = the coarse pick, so a single Enter keeps the coarse pick if the player is hard to use). Backend auto-detects mpv > vlc > ffplay; gracefully degrades to "use the coarse pick" if no player is installed. New `bad_rois` column captures non-opening sub-arenas (partial-opening videos like the 2024-10-21 set where only the lower half opens). The prompt validates entries are integers in 1..6. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 12:20:09 +01:00
Giorgio Gilestro	125f187187	Extend pick_barrier coarse window to 10 min by default Some videos have late barrier opening (e.g. 5:46) that fell outside the original 5-min search window. Default coarse-grid span is now 600 s (10 s spacing in the 60-thumb grid). Add --coarse-span CLI flag to widen further if needed; auto-suggest scans the same 10-min window. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 12:08:23 +01:00
Giorgio Gilestro	e8c7f23d4d	Replace pick_barrier.py with thumbnail-grid UX Old version showed inter-fly distance plots and asked the analyst to click a timeline. The new version reads frames directly from the .mp4 and shows a 10×6 grid of timestamped thumbnails — the analyst just clicks the frame where the barrier opens. Two-stage refinement: - Coarse grid: 60 thumbs spanning the 5-min search window at ~5 s spacing. Pick the rough moment. - Fine grid: 60 thumbs at 0.2 s spacing centred on the coarse pick. Pick the exact frame. Auto-detector still feeds the starting position. Sequential video decode (one cv2 pass through the relevant range) instead of seek-per- frame, so each grid loads in a few seconds. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 12:01:34 +01:00
Giorgio Gilestro	b46c4ac1ba	Add pick_barrier.py interactive annotator + seed CSV with 2025-07-15 pick_barrier.py loops over every tracked DB referenced by the merged TSV, plots windowed mean inter-fly distance for all 6 ROIs in a single figure, and lets the analyst click the moment the barrier opens. Saves to data/metadata/barrier_opening.csv after each pick (crash-safe). Auto-detector best-effort guess shown as orange dotted line — the analyst always has the final say. Output schema: machine_name, session_date, session_time, opening_s, trim_first_s, notes `trim_first_s` lets us record misframed starts so downstream code can ignore the affected window. The 5 2025-07-15 entries are seeded from the original legacy CSV so they're not re-picked. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 11:58:54 +01:00
Giorgio Gilestro	e20530219b	Correct barrier-opening times for 16-31-34 and 16-31-41 Both videos have ~60-69 s of misframed data at the start (arena partially out of frame). The original times (25 s, 20 s) were measured on ffmpeg-trimmed copies that no longer exist; on the full untrimmed videos the actual barriers open at 94 s and 89 s respectively. Confirmed by eye. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 11:53:58 +01:00
Giorgio Gilestro	2e80b834ca	Add video duration_s to inventory and propagate to merged TSV build_video_inventory.py now opens each mp4 with cv2 to record duration_s. Cached: a video already in the previous inventory keeps its computed duration, so re-runs only pay the cv2 cost for new recordings. export_video_db_index.py looks up the matched video's duration and writes it as training_video_duration_s / testing_video_duration_s alongside the existing path columns. Useful for spotting unusually short or long sessions and for sanity checks on the tracker output. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 11:13:05 +01:00
Giorgio Gilestro	847d2cbd1b	Merge 2025-07-15 batch into the xlsx; tools to detect & re-track - merge_2025_07_15_into_xlsx.py: pivot the legacy 2025_07_15_metadata_fixed.csv into the unified xlsx schema (one row per fly, training_date_time + testing_date_time). Backs up the xlsx before writing. 24 new rows across machines 076 / 139 / 145 / 268. - pick_targets.py: --video flag to bypass the inventory's in_xlsx filter, so a specific mp4 can be picked outside the normal flow. - explore_barrier_signal.py: visualises raw y(t), per-frame inter-fly distance, and sliding min/mean distance against a known barrier-opening time. Used for prototyping the detector. - detect_barrier_opening.py: per-ROI sliding-window mean-distance change-point estimator (median across ROIs). Currently noisy on a one-video calibration set; will be re-tuned once the 4 missing 2025-07-15 videos are re-tracked. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 10:28:25 +01:00
Giorgio Gilestro	8f3c4ca89c	Make flies_analysis_simple robust to bad caches and empty alignment - Cell 6: raise a clear ValueError if no loaded machine has a barrier- opening entry, listing what's loaded vs what's annotated. Previously alignment quietly produced empty DataFrames and we crashed five cells later with a cryptic KeyError on 'distance'. - Cell 10: validate the cached CSVs (presence + expected columns + non-empty) before using them; fall through to recomputation if not. Skip writing the cache when results are empty so future runs don't pick up a 1-byte placeholder. - Cell 3: derive a 'group' column from 'male' so downstream helpers that reference fly['group'] still work. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:59:34 +01:00
Giorgio Gilestro	b273255dea	Make load_roi_data progress bar refresh reliably in JupyterLab Prefer tqdm.notebook (HTML widget) over tqdm.auto so JupyterLab gets a proper updating bar even when its text-mode \r refresh doesn't render in-place. Tick per session (2× per fly) instead of per fly so the bar advances roughly every second, and add a postfix showing the current machine + ROI + session — gives visible motion even on slow rows. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:43:12 +01:00
Giorgio Gilestro	8abb3d5955	Add tqdm progress bar to load_roi_data Loading the full batch issues 968 SQL queries and takes minutes — show a tqdm progress bar (one tick per fly/ROI row) and print an upfront "this takes 1-3 minutes" notice so the user knows to wait. Uses tqdm.auto so it picks the Jupyter widget when run from a notebook and plain text on the CLI. New `progress=True` parameter on load_roi_data, flip to False for silent batch use. tqdm + ipywidgets added to requirements. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:34:42 +01:00
Giorgio Gilestro	ac3b8c13f0	Move personal TSV into repo's data/metadata/ folder Personal copy of all_video_info_merged.tsv now lives at ~/cupido/data/metadata/all_video_info_merged.tsv (gitignored) instead of ~/cupido_metadata.tsv. That sits next to the other small metadata CSVs (barrier_opening, etc.) — the natural home for it. Updated all five notebooks and processed/README accordingly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:30:22 +01:00
Giorgio Gilestro	f08e4b843d	Per-user metadata TSV — auto-prefer ~/cupido_metadata.tsv if present The shared TSV at /mnt/data/projects/cupido/ is read-only inside the container, so users who want to customize the `include` column (or any metadata) need a personal copy. Notebooks now check for ~/cupido_metadata.tsv first and fall back to the shared master if it doesn't exist. Each user keeps their own edits without stepping on anyone else's analysis. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:25:24 +01:00
Giorgio Gilestro	23050360ea	Remove data/raw/ entirely — all bulky data now under /mnt/data/projects/cupido/ Deleted the 5 stale pre-pipeline tracking DBs and the data/raw/ directory. Dropped DATA_RAW from config.py; build_video_inventory now scans TRACKING_OUTPUT_DIR for already-tracked sessions. Notebooks no longer import DATA_RAW. README, PLANNING and todo updated to reflect that the repo holds only code + small curated metadata, never bulky DBs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:20:25 +01:00
Giorgio Gilestro	9f3ee24a23	Add per-row include flag to TSV; expand flies_analysis_simple narrative - export_video_db_index.py now writes a boolean `include` column (default True). Flip it to False to drop a noisy/unusable row from analysis without deleting it. - load_roi_data filters on `include` automatically (back-compat: missing column = load everything). - flies_analysis_simple.ipynb section headers now explain why each step exists (barrier alignment, body-area baseline, merged-blob heuristic, Hungarian identity tracking) rather than just naming the step. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:09:59 +01:00
Giorgio Gilestro	723d1f3682	Make data paths visible and explicit in flies_analysis notebooks Define METADATA_TSV and TRACKED_DBS up front in cell 1, assert they exist before doing anything else, and pass the loaded metadata to load_roi_data() explicitly. Surfaces path problems immediately with a readable message instead of failing deep inside the loader. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:01:55 +01:00
Giorgio Gilestro	231c7a437f	Remove hardcoded /home/gg paths so the project is portable Notebooks now use Path.home() / "cupido" for the repo root (works for any user inside the JupyterLab container), and the offline-tracking scripts read the ethoscope source-tree location from the new ETHOSCOPE_SRC config constant — defaulting to ~/Code/ethoscope_project/... and overridable via the ETHOSCOPE_SRC environment variable. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 08:55:44 +01:00
Giorgio Gilestro	5934dce21e	Simplify path setup in flies_analysis notebooks Replace the cryptic Path("..").resolve() walk-up with explicit DATA_DIR and REPO_ROOT constants, then import the rest of the path constants (DATA_RAW, DATA_METADATA, DATA_PROCESSED, FIGURES) directly from scripts/config.py — single source of truth, easier to read for students. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 08:50:11 +01:00
Giorgio Gilestro	f176224150	Move metadata xlsx/TSV to /mnt/data/projects/cupido/ Consolidates everything bulky (tracking DBs, targets, metadata spreadsheet) under a single DATA_VOLUME root outside the ownCloud-synced repo. Notebooks now use a visible DATA_DIR = Path(...) idiom rather than walking up the filesystem with PROJECT_ROOT.parent — easier for students with no Python background to follow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 08:47:15 +01:00
Giorgio Gilestro	ec56e51bf9	Add beginner tutorial notebooks for incoming students Four guided notebooks under notebooks/getting_started/ aimed at someone new to Python and data science. The series progresses: project orientation → Python/pandas crash course → exploring one tracking DB → first trained-vs-naive comparison using load_roi_data + Mann-Whitney U. Each notebook leans heavily on markdown explanations, includes exercises with empty cells, and links out to canonical references (JupyterLab, official Python tutorial, pandas 10-min guide, Wikipedia for stats concepts). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 18:14:17 +01:00
Giorgio Gilestro	7d09523840	Move TARGETS_DIR to /mnt/data/projects/cupido/targets Targets relocated alongside the tracking DBs (out of ownCloud sync) so the docker mount already covers them and ownCloud no longer churns on JSON sidecars. Updated config, fixed a stale docstring in pick_targets, and dropped the now-moot data/targets/*.json gitignore rule. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 17:13:55 +01:00
Giorgio Gilestro	f60a9d0530	Unify analysis pipeline around the TSV; move tracked DBs out of cloud sync - Tracked DBs now live at /mnt/data/projects/cupido/tracked/ (out of ownCloud to avoid sync conflicts and bandwidth churn). config.py TRACKING_OUTPUT_DIR points there; the docker-compose for ethoscope-lab mounts it world-readable for JupyterHub users. - New scripts/export_video_db_index.py joins all_video_info_merged.xlsx with the video inventory and the on-disk DBs, producing a TSV that has one row per fly/ROI plus training/testing video and DB paths. Handles approximate xlsx times, cross-day training/testing, the 12 AM/PM ambiguity, and date typos. - scripts/load_roi_data.py rewritten as a TSV-driven loader returning a single DataFrame with session and metadata columns. calculate_distances and the two flies_analysis notebooks migrated to use it; downstream trained/naive splits remain available via simple equality filters. - Metadata vocabulary canonicalized: {naïve, niave, untrained, test} all resolve to {trained, naive}. Normalization happens at the TSV-export boundary (idempotent); the xlsx and the 2025-07-15 legacy CSV were edited in place to remove the worst variants. - scripts/monitor_tracking.py rate calculation fixed: with N parallel workers, completions arrive in bursts; the old formula divided by burst width and reported nonsense rates. Now uses a 6 h window denominator. - scripts/track_videos.py: BGRMovieCamera retries cv2.read on transient NFS hiccups and a post-tracking completeness gate (≥ 90 % of expected duration via MAX(t) across all 6 ROIs) deletes silent partial DBs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 15:20:14 +01:00
Giorgio	e4da7691d5	Add offline tracking pipeline for video backlog The 2024 video set in all_video_info_merged.xlsx covers 63 (date, machine) sessions — 129 video instances — that have no auto-detectable targets, so ROI placement requires manual reference-point selection. This commit adds the three-stage pipeline that lets a user click for an hour, then walk away while the tracker grinds overnight: 1. build_video_inventory.py — scan /mnt/ethoscope_data/videos/ and join against the xlsx, producing data/metadata/video_inventory.csv 2. pick_targets.py — interactive matplotlib/Tk picker. User clicks TOP/CORNER/LEFT (the L-shape ethoscope expects); after the third click the 6 ROI rectangles are drawn on top of the frame so geometry can be verified before saving. Also supports marking a video 'unusable' (FOV wrong) so it's permanently skipped, frame stepping by ±1s/±5%/midpoint, point editing in --redo mode, and a crosshair cursor that survives matplotlib's per-motion cursor reset. 3. track_videos.py — headless batch tracker. Reads the JSON sidecars, builds 6 ROIs from the HD-mating-arena geometry, runs MultiFlyTracker against the merged.mp4 via MovieVirtualCamera, writes SQLite DBs to data/tracked/. Idempotent (skips done DBs), parallel via --jobs, subclasses MovieVirtualCamera so frames stay BGR (MultiFlyTracker calls cvtColor(BGR2GRAY) without checking channel count). Plus auto_detect_targets.py (fallback that runs ethoscope's auto-detector in case any videos do have visible target dots), monitor_tracking.py (progress + ETA from data/tracked/ ground truth, --watch for live view), and tracking_geometry.py (single source of truth for the affine math shared by picker and tracker). requirements-tracking.txt pins the extra deps (opencv-python, openpyxl, gitpython, netifaces, mysql-connector-python) — these are only needed for the tracking pipeline, not the existing analysis notebooks. Verified end-to-end on one of the user-picked videos: ~4000 rows/ROI in a 120s slice, fly bounding boxes in the expected 800-2000 px² band. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 17:25:26 +01:00
Giorgio	e7e4db264d	Initial commit: organized project structure for student handoff Reorganized flat 41-file directory into structured layout with: - scripts/ for Python analysis code with shared config.py - notebooks/ for Jupyter analysis notebooks - data/ split into raw/, metadata/, processed/ - docs/ with analysis summary, experimental design, and bimodal hypothesis tutorial - tasks/ with todo checklist and lessons learned - Comprehensive README, PLANNING.md, and .gitignore Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 16:08:36 +00:00

23 commits
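
The player handoff described in commit 2b75daa783 (backend preference mpv > vlc > ffplay, Enter keeps the coarse pick, `bad_rois` entries validated as integers in 1..6) could look roughly like the sketch below. This is not the code in pick_barrier.py: the function names `detect_player`, `resolve_opening_time` and `parse_bad_rois` are hypothetical, and the input formats (seconds typed as a number, ROIs separated by commas or spaces) are assumptions made for illustration.

```python
import shutil

# Preference order taken from the commit message: mpv > vlc > ffplay.
PLAYER_PREFERENCE = ("mpv", "vlc", "ffplay")


def detect_player(which=shutil.which):
    """Return the first player found on PATH, or None if none is installed.

    `which` is injectable so the lookup can be exercised without real binaries.
    """
    for name in PLAYER_PREFERENCE:
        if which(name):
            return name
    return None


def resolve_opening_time(typed, coarse_pick_s):
    """Empty input (a bare Enter) keeps the coarse pick; otherwise parse seconds.

    Assumption: the analyst types the time as plain seconds, e.g. "94.5".
    """
    typed = typed.strip()
    return coarse_pick_s if not typed else float(typed)


def parse_bad_rois(typed, n_rois=6):
    """Parse ROI numbers and reject anything that is not an integer in 1..n_rois.

    Assumption: entries are separated by commas and/or spaces.
    """
    rois = []
    for tok in typed.replace(",", " ").split():
        if not tok.isdecimal() or not 1 <= int(tok) <= n_rois:
            raise ValueError(f"bad ROI {tok!r}: expected an integer in 1..{n_rois}")
        rois.append(int(tok))
    return sorted(set(rois))
```

When `detect_player()` returns None, a caller would skip the player launch and fall back to the coarse pick, which matches the "gracefully degrades" behaviour the commit message describes.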