lab/cupido

Author	SHA1	Message	Date
Giorgio Gilestro	2623df4172	Picker: identify the analyst (initials) per pick Each annotation row now carries an `analyst` column. On first visit the web picker shows a small login modal asking for initials, persists them in localStorage, and shows the badge in the top-right. Click the badge to change identities. Submissions without initials are rejected by the backend (HTTP 400). Skip remains analyst-free. Backfill: every existing barrier_opening.csv row marked as `GG` since all current picks were done by Giorgio. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 14:23:57 +01:00
Giorgio Gilestro	12568b82cc	Welcome modal + port 8085 Add a dismissable welcome modal that walks first-time users through the proper annotation sequence (slider to end → check open ROIs → slider to start → arrow-key fine-tune → click). Stays hidden after the first "Got it" via localStorage; the ? button in the header reopens it any time. Picker keyboard shortcuts are inert while the modal is showing. Container exposes 8085 instead of 8000 (8000 was free, but Giorgio's preferred 8082 is already in use on this host; 8085 is the closest free port). Internal port stays 8000 so the FastAPI app is unchanged. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 14:15:42 +01:00
Giorgio Gilestro	3f0760c98e	Picker: simpler keyboard shortcuts (±5 s / ±30 s) Dropped Ctrl+arrow (±0.1 s) and ,/. frame stepping — too fine for spotting the barrier opening visually. Shift+arrow now jumps ±30 s instead of ±1 s, which matches how analysts actually navigate (5 s for fine, 30 s for skipping ahead). Drag the seekbar if you need sub-second precision. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 14:11:36 +01:00
Giorgio Gilestro	53b45e373b	Dedupe + canonicalise the merged xlsx, then guard the export 108 of 508 rows in all_video_info_merged.xlsx were duplicates left over from merging multiple source spreadsheets — same (date, machine, ROI) appearing under two source_date values, identical data otherwise. The `male` column was also using a mix of variants ('naïve', 'niave', 'naive', 'trained') with the canonical 'naive' a minority of 12/200. scripts/cleanup_xlsx.py Idempotent one-off: backs up the xlsx, dedupes preferring the row whose source_date matches the experiment date, normalises `male` spellings, strips whitespace from string columns. Re-running on a clean file is a no-op. scripts/export_video_db_index.py New _validate_xlsx() runs first thing in main() and aborts the export with an actionable error if duplicates or non-canonical male values are present. Prevents silent regressions when the xlsx is edited or re-merged in the future. Result: TSV is now 400 rows (was 508), exactly 200 trained / 200 naive, no duplicates. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 13:39:57 +01:00
Giorgio Gilestro	4ed988a617	Show experimental metadata above the video in the picker Each video row now carries a `metadata` dict aggregated from the merged TSV: species, memory (STM/LTM), training_length_hr, consolidation_length_hr, age, training/testing date-time, and trained/naive fly counts. The UI renders these as a row of key:value pills above the video, with the session role (training/testing) colour-coded so the analyst can see at a glance what they're picking. The merged TSV currently has duplicate rows per (date, machine, ROI); the aggregator de-dups on those keys so counts aren't doubled. (The duplication itself should be cleaned up upstream.) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 12:54:40 +01:00
Giorgio Gilestro	1a7542def2	Add barrier_picker_app — Dockerised web picker for barrier opening A FastAPI app + plain HTML5 video page that replaces the matplotlib picker. Browse to http://host:8000/, scrub through each video with arrow keys (±5 s, ±1 s with Shift, ±0.1 s with Ctrl, ±1 frame with ,/.), and click one of three buttons: - All barriers open — every ROI usable - Upper barrier opens — ROIs 1,3,5 usable; lower row marked bad - Lower barrier opens — ROIs 2,4,6 usable; upper row marked bad The current playhead time is recorded as opening_s; bad_rois is set accordingly. Also keyboard shortcuts (1/2/3 for the three modes, s/u for skip/unusable). Refresh-safe: every submission persists to data/metadata/barrier_opening.csv before advancing. Server uses byte-range streaming so seeking inside long videos is fast. Dockerfile + docker-compose.yml mount the data volume RO and the metadata folder RW. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 12:33:28 +01:00
Giorgio Gilestro	24403e0474	Force interactive matplotlib backend in pick_barrier Some environments default matplotlib to Agg (non-interactive), which silently no-ops plt.show() — the picker would print "FigureCanvasAgg is non-interactive" and never display the thumbnail grid. Probe TkAgg > QtAgg > Qt5Agg > GTK3Agg before pyplot import. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 12:23:15 +01:00
Giorgio Gilestro	2b75daa783	Replace fine thumbnail grid with mpv/vlc/ffplay handoff Watching the video play turns out to be much faster than scanning a thumbnail grid. The coarse 10-min thumbnail grid still does rough localisation; after picking, a video player launches at coarse_t-30s paused with frame-accurate scrubbing controls. The analyst reads the exact opening time off the player's OSD and types it into the terminal prompt (default = the coarse pick, so a single Enter keeps the coarse pick if the player is hard to use). Backend auto-detects mpv > vlc > ffplay; gracefully degrades to "use the coarse pick" if no player is installed. New `bad_rois` column captures non-opening sub-arenas (partial-opening videos like the 2024-10-21 set where only the lower half opens). The prompt validates entries are integers in 1..6. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 12:20:09 +01:00
Giorgio Gilestro	125f187187	Extend pick_barrier coarse window to 10 min by default Some videos have late barrier opening (e.g. 5:46) that fell outside the original 5-min search window. Default coarse-grid span is now 600 s (10 s spacing in the 60-thumb grid). Add --coarse-span CLI flag to widen further if needed; auto-suggest scans the same 10-min window. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 12:08:23 +01:00
Giorgio Gilestro	e8c7f23d4d	Replace pick_barrier.py with thumbnail-grid UX Old version showed inter-fly distance plots and asked the analyst to click a timeline. The new version reads frames directly from the .mp4 and shows a 10×6 grid of timestamped thumbnails — the analyst just clicks the frame where the barrier opens. Two-stage refinement: - Coarse grid: 60 thumbs spanning the 5-min search window at ~5 s spacing. Pick the rough moment. - Fine grid: 60 thumbs at 0.2 s spacing centred on the coarse pick. Pick the exact frame. Auto-detector still feeds the starting position. Sequential video decode (one cv2 pass through the relevant range) instead of seek-per- frame, so each grid loads in a few seconds. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 12:01:34 +01:00
Giorgio Gilestro	b46c4ac1ba	Add pick_barrier.py interactive annotator + seed CSV with 2025-07-15 pick_barrier.py loops over every tracked DB referenced by the merged TSV, plots windowed mean inter-fly distance for all 6 ROIs in a single figure, and lets the analyst click the moment the barrier opens. Saves to data/metadata/barrier_opening.csv after each pick (crash-safe). Auto-detector best-effort guess shown as orange dotted line — the analyst always has the final say. Output schema: machine_name, session_date, session_time, opening_s, trim_first_s, notes `trim_first_s` lets us record misframed starts so downstream code can ignore the affected window. The 5 2025-07-15 entries are seeded from the original legacy CSV so they're not re-picked. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 11:58:54 +01:00
Giorgio Gilestro	e20530219b	Correct barrier-opening times for 16-31-34 and 16-31-41 Both videos have ~60-69 s of misframed data at the start (arena partially out of frame). The original times (25 s, 20 s) were measured on ffmpeg-trimmed copies that no longer exist; on the full untrimmed videos the actual barriers open at 94 s and 89 s respectively. Confirmed by eye. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 11:53:58 +01:00
Giorgio Gilestro	2e80b834ca	Add video duration_s to inventory and propagate to merged TSV build_video_inventory.py now opens each mp4 with cv2 to record duration_s. Cached: a video already in the previous inventory keeps its computed duration, so re-runs only pay the cv2 cost for new recordings. export_video_db_index.py looks up the matched video's duration and writes it as training_video_duration_s / testing_video_duration_s alongside the existing path columns. Useful for spotting unusually short or long sessions and for sanity checks on the tracker output. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 11:13:05 +01:00
Giorgio Gilestro	847d2cbd1b	Merge 2025-07-15 batch into the xlsx; tools to detect & re-track - merge_2025_07_15_into_xlsx.py: pivot the legacy 2025_07_15_metadata_fixed.csv into the unified xlsx schema (one row per fly, training_date_time + testing_date_time). Backs up the xlsx before writing. 24 new rows across machines 076 / 139 / 145 / 268. - pick_targets.py: --video flag to bypass the inventory's in_xlsx filter, so a specific mp4 can be picked outside the normal flow. - explore_barrier_signal.py: visualises raw y(t), per-frame inter-fly distance, and sliding min/mean distance against a known barrier-opening time. Used for prototyping the detector. - detect_barrier_opening.py: per-ROI sliding-window mean-distance change-point estimator (median across ROIs). Currently noisy on a one-video calibration set; will be re-tuned once the 4 missing 2025-07-15 videos are re-tracked. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 10:28:25 +01:00
Giorgio Gilestro	8f3c4ca89c	Make flies_analysis_simple robust to bad caches and empty alignment - Cell 6: raise a clear ValueError if no loaded machine has a barrier- opening entry, listing what's loaded vs what's annotated. Previously alignment quietly produced empty DataFrames and we crashed five cells later with a cryptic KeyError on 'distance'. - Cell 10: validate the cached CSVs (presence + expected columns + non-empty) before using them; fall through to recomputation if not. Skip writing the cache when results are empty so future runs don't pick up a 1-byte placeholder. - Cell 3: derive a 'group' column from 'male' so downstream helpers that reference fly['group'] still work. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:59:34 +01:00
Giorgio Gilestro	b273255dea	Make load_roi_data progress bar refresh reliably in JupyterLab Prefer tqdm.notebook (HTML widget) over tqdm.auto so JupyterLab gets a proper updating bar even when its text-mode \r refresh doesn't render in-place. Tick per session (2× per fly) instead of per fly so the bar advances roughly every second, and add a postfix showing the current machine + ROI + session — gives visible motion even on slow rows. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:43:12 +01:00
Giorgio Gilestro	8abb3d5955	Add tqdm progress bar to load_roi_data Loading the full batch issues 968 SQL queries and takes minutes — show a tqdm progress bar (one tick per fly/ROI row) and print an upfront "this takes 1-3 minutes" notice so the user knows to wait. Uses tqdm.auto so it picks the Jupyter widget when run from a notebook and plain text on the CLI. New `progress=True` parameter on load_roi_data, flip to False for silent batch use. tqdm + ipywidgets added to requirements. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:34:42 +01:00
Giorgio Gilestro	ac3b8c13f0	Move personal TSV into repo's data/metadata/ folder Personal copy of all_video_info_merged.tsv now lives at ~/cupido/data/metadata/all_video_info_merged.tsv (gitignored) instead of ~/cupido_metadata.tsv. That sits next to the other small metadata CSVs (barrier_opening, etc.) — the natural home for it. Updated all five notebooks and processed/README accordingly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:30:22 +01:00
Giorgio Gilestro	f08e4b843d	Per-user metadata TSV — auto-prefer ~/cupido_metadata.tsv if present The shared TSV at /mnt/data/projects/cupido/ is read-only inside the container, so users who want to customize the `include` column (or any metadata) need a personal copy. Notebooks now check for ~/cupido_metadata.tsv first and fall back to the shared master if it doesn't exist. Each user keeps their own edits without stepping on anyone else's analysis. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:25:24 +01:00
Giorgio Gilestro	23050360ea	Remove data/raw/ entirely — all bulky data now under /mnt/data/projects/cupido/ Deleted the 5 stale pre-pipeline tracking DBs and the data/raw/ directory. Dropped DATA_RAW from config.py; build_video_inventory now scans TRACKING_OUTPUT_DIR for already-tracked sessions. Notebooks no longer import DATA_RAW. README, PLANNING and todo updated to reflect that the repo holds only code + small curated metadata, never bulky DBs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:20:25 +01:00
Giorgio Gilestro	9f3ee24a23	Add per-row include flag to TSV; expand flies_analysis_simple narrative - export_video_db_index.py now writes a boolean `include` column (default True). Flip it to False to drop a noisy/unusable row from analysis without deleting it. - load_roi_data filters on `include` automatically (back-compat: missing column = load everything). - flies_analysis_simple.ipynb section headers now explain why each step exists (barrier alignment, body-area baseline, merged-blob heuristic, Hungarian identity tracking) rather than just naming the step. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:09:59 +01:00
Giorgio Gilestro	723d1f3682	Make data paths visible and explicit in flies_analysis notebooks Define METADATA_TSV and TRACKED_DBS up front in cell 1, assert they exist before doing anything else, and pass the loaded metadata to load_roi_data() explicitly. Surfaces path problems immediately with a readable message instead of failing deep inside the loader. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 09:01:55 +01:00
Giorgio Gilestro	231c7a437f	Remove hardcoded /home/gg paths so the project is portable Notebooks now use Path.home() / "cupido" for the repo root (works for any user inside the JupyterLab container), and the offline-tracking scripts read the ethoscope source-tree location from the new ETHOSCOPE_SRC config constant — defaulting to ~/Code/ethoscope_project/... and overridable via the ETHOSCOPE_SRC environment variable. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 08:55:44 +01:00
Giorgio Gilestro	5934dce21e	Simplify path setup in flies_analysis notebooks Replace the cryptic Path("..").resolve() walk-up with explicit DATA_DIR and REPO_ROOT constants, then import the rest of the path constants (DATA_RAW, DATA_METADATA, DATA_PROCESSED, FIGURES) directly from scripts/config.py — single source of truth, easier to read for students. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 08:50:11 +01:00
Giorgio Gilestro	f176224150	Move metadata xlsx/TSV to /mnt/data/projects/cupido/ Consolidates everything bulky (tracking DBs, targets, metadata spreadsheet) under a single DATA_VOLUME root outside the ownCloud-synced repo. Notebooks now use a visible DATA_DIR = Path(...) idiom rather than walking up the filesystem with PROJECT_ROOT.parent — easier for students with no Python background to follow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-05-01 08:47:15 +01:00
Giorgio Gilestro	ec56e51bf9	Add beginner tutorial notebooks for incoming students Four guided notebooks under notebooks/getting_started/ aimed at someone new to Python and data science. The series progresses: project orientation → Python/pandas crash course → exploring one tracking DB → first trained-vs-naive comparison using load_roi_data + Mann-Whitney U. Each notebook leans heavily on markdown explanations, includes exercises with empty cells, and links out to canonical references (JupyterLab, official Python tutorial, pandas 10-min guide, Wikipedia for stats concepts). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 18:14:17 +01:00
Giorgio Gilestro	7d09523840	Move TARGETS_DIR to /mnt/data/projects/cupido/targets Targets relocated alongside the tracking DBs (out of ownCloud sync) so the docker mount already covers them and ownCloud no longer churns on JSON sidecars. Updated config, fixed a stale docstring in pick_targets, and dropped the now-moot data/targets/*.json gitignore rule. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 17:13:55 +01:00
Giorgio Gilestro	f60a9d0530	Unify analysis pipeline around the TSV; move tracked DBs out of cloud sync - Tracked DBs now live at /mnt/data/projects/cupido/tracked/ (out of ownCloud to avoid sync conflicts and bandwidth churn). config.py TRACKING_OUTPUT_DIR points there; the docker-compose for ethoscope-lab mounts it world-readable for JupyterHub users. - New scripts/export_video_db_index.py joins all_video_info_merged.xlsx with the video inventory and the on-disk DBs, producing a TSV that has one row per fly/ROI plus training/testing video and DB paths. Handles approximate xlsx times, cross-day training/testing, the 12 AM/PM ambiguity, and date typos. - scripts/load_roi_data.py rewritten as a TSV-driven loader returning a single DataFrame with session and metadata columns. calculate_distances and the two flies_analysis notebooks migrated to use it; downstream trained/naive splits remain available via simple equality filters. - Metadata vocabulary canonicalized: {naïve, niave, untrained, test} all resolve to {trained, naive}. Normalization happens at the TSV-export boundary (idempotent); the xlsx and the 2025-07-15 legacy CSV were edited in place to remove the worst variants. - scripts/monitor_tracking.py rate calculation fixed: with N parallel workers, completions arrive in bursts; the old formula divided by burst width and reported nonsense rates. Now uses a 6 h window denominator. - scripts/track_videos.py: BGRMovieCamera retries cv2.read on transient NFS hiccups and a post-tracking completeness gate (≥ 90 % of expected duration via MAX(t) across all 6 ROIs) deletes silent partial DBs. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-30 15:20:14 +01:00
Giorgio	e4da7691d5	Add offline tracking pipeline for video backlog The 2024 video set in all_video_info_merged.xlsx covers 63 (date, machine) sessions — 129 video instances — that have no auto-detectable targets, so ROI placement requires manual reference-point selection. This commit adds the three-stage pipeline that lets a user click for an hour, then walk away while the tracker grinds overnight: 1. build_video_inventory.py — scan /mnt/ethoscope_data/videos/ and join against the xlsx, producing data/metadata/video_inventory.csv 2. pick_targets.py — interactive matplotlib/Tk picker. User clicks TOP/CORNER/LEFT (the L-shape ethoscope expects); after the third click the 6 ROI rectangles are drawn on top of the frame so geometry can be verified before saving. Also supports marking a video 'unusable' (FOV wrong) so it's permanently skipped, frame stepping by ±1s/±5%/midpoint, point editing in --redo mode, and a crosshair cursor that survives matplotlib's per-motion cursor reset. 3. track_videos.py — headless batch tracker. Reads the JSON sidecars, builds 6 ROIs from the HD-mating-arena geometry, runs MultiFlyTracker against the merged.mp4 via MovieVirtualCamera, writes SQLite DBs to data/tracked/. Idempotent (skips done DBs), parallel via --jobs, subclasses MovieVirtualCamera so frames stay BGR (MultiFlyTracker calls cvtColor(BGR2GRAY) without checking channel count). Plus auto_detect_targets.py (fallback that runs ethoscope's auto-detector in case any videos do have visible target dots), monitor_tracking.py (progress + ETA from data/tracked/ ground truth, --watch for live view), and tracking_geometry.py (single source of truth for the affine math shared by picker and tracker). requirements-tracking.txt pins the extra deps (opencv-python, openpyxl, gitpython, netifaces, mysql-connector-python) — these are only needed for the tracking pipeline, not the existing analysis notebooks. Verified end-to-end on one of the user-picked videos: ~4000 rows/ROI in a 120s slice, fly bounding boxes in the expected 800-2000 px² band. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>	2026-04-27 17:25:26 +01:00
Giorgio	e7e4db264d	Initial commit: organized project structure for student handoff Reorganized flat 41-file directory into structured layout with: - scripts/ for Python analysis code with shared config.py - notebooks/ for Jupyter analysis notebooks - data/ split into raw/, metadata/, processed/ - docs/ with analysis summary, experimental design, and bimodal hypothesis tutorial - tasks/ with todo checklist and lessons learned - Comprehensive README, PLANNING.md, and .gitignore Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 16:08:36 +00:00
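
Note on commit 53b45e373b: the message describes deduplicating rows that share the same (date, machine, ROI) key, preferring the row whose source_date matches the experiment date, and normalising the `male` spellings. The sketch below illustrates that rule using only the standard library. It is not the code in scripts/cleanup_xlsx.py: the key names `date`, `machine` and `roi`, the helper names, and the sample rows are all assumptions made for illustration; only `source_date`, `male` and the listed spelling variants come from the commit message.

```python
# Hypothetical sketch of the dedupe rule from commit 53b45e373b.
# Assumed column names: "date", "machine", "roi". From the log: "source_date", "male".

# Spelling variants named in the commit message, mapped to the canonical values.
CANONICAL_MALE = {
    "naïve": "naive",
    "niave": "naive",
    "naive": "naive",
    "trained": "trained",
}


def normalise_male(value):
    """Strip whitespace and map a known variant to 'naive' or 'trained'."""
    key = value.strip().lower()
    if key not in CANONICAL_MALE:
        raise ValueError(f"non-canonical male value: {value!r}")
    return CANONICAL_MALE[key]


def dedupe(rows):
    """Keep one row per (date, machine, roi), preferring source_date == date."""
    best = {}
    for row in rows:
        key = (row["date"], row["machine"], row["roi"])
        current = best.get(key)
        if current is None:
            best[key] = row
        elif (row["source_date"] == row["date"]
              and current["source_date"] != current["date"]):
            # The new row's source matches the experiment date; the kept one does not.
            best[key] = row
    # Return copies with the male column normalised; input rows are left untouched.
    return [dict(r, male=normalise_male(r["male"])) for r in best.values()]


# Made-up sample rows: the first two are duplicates under different source_date values.
rows = [
    {"date": "2024-10-21", "machine": "076", "roi": 1,
     "source_date": "2024-10-22", "male": "niave "},
    {"date": "2024-10-21", "machine": "076", "roi": 1,
     "source_date": "2024-10-21", "male": "niave "},
    {"date": "2024-10-21", "machine": "076", "roi": 2,
     "source_date": "2024-10-21", "male": "trained"},
]
clean = dedupe(rows)
```

Running `dedupe` on its own output returns the same rows, which mirrors the "re-running on a clean file is a no-op" property stated in the commit message.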

30 commits
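
Note on commit f60a9d0530: the message mentions a post-tracking completeness gate that requires at least 90 % of the expected duration, measured via MAX(t) across all 6 ROIs. The sketch below shows one way such a check could look with the standard `sqlite3` module. It is not the code in scripts/track_videos.py: the table names `ROI_1` to `ROI_6`, timestamps stored in milliseconds, the reading of "across all 6 ROIs" as the largest per-ROI maximum, and the function names are all assumptions made for illustration.

```python
import sqlite3

N_ROIS = 6          # from the commit message
MIN_FRACTION = 0.9  # ">= 90 % of expected duration", from the commit message


def tracked_duration_s(conn):
    """Largest timestamp found in any ROI table, in seconds (0.0 if all are empty).

    Assumes tables named ROI_1..ROI_6 with a column t in milliseconds.
    """
    latest_ms = 0
    for roi in range(1, N_ROIS + 1):
        # Table names come from a fixed integer range, so the f-string is safe.
        (max_t,) = conn.execute(f"SELECT MAX(t) FROM ROI_{roi}").fetchone()
        if max_t is not None:
            latest_ms = max(latest_ms, max_t)
    return latest_ms / 1000.0


def is_complete(conn, expected_duration_s):
    """True when the tracked span covers at least MIN_FRACTION of the video."""
    return tracked_duration_s(conn) >= MIN_FRACTION * expected_duration_s


# Demo on an in-memory database with made-up timestamps of about 100 s per ROI.
conn = sqlite3.connect(":memory:")
for roi in range(1, N_ROIS + 1):
    conn.execute(f"CREATE TABLE ROI_{roi} (t INTEGER)")
    conn.execute(f"INSERT INTO ROI_{roi} (t) VALUES (?)", (100_000 + roi,))

full = is_complete(conn, expected_duration_s=110.0)     # threshold is 99 s
partial = is_complete(conn, expected_duration_s=120.0)  # threshold is 108 s
```

In a batch tracker, a database failing this check would be deleted so that a later run re-tracks the video instead of treating the partial result as done; the deletion step is omitted here.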