The 2024 video set in all_video_info_merged.xlsx covers 63 (date, machine)
sessions — 129 video instances — that have no auto-detectable targets, so
ROI placement requires manual reference-point selection. This commit adds
the three-stage pipeline that lets a user click for an hour, then walk
away while the tracker grinds overnight:
1. build_video_inventory.py — scan /mnt/ethoscope_data/videos/ and join
against the xlsx, producing data/metadata/video_inventory.csv
2. pick_targets.py — interactive matplotlib/Tk picker. User clicks
TOP/CORNER/LEFT (the L-shape ethoscope expects); after the third
click the 6 ROI rectangles are drawn on top of the frame so geometry
can be verified before saving. Also supports marking a video
'unusable' (FOV wrong) so it's permanently skipped, frame stepping
by ±1s/±5%/midpoint, point editing in --redo mode, and a crosshair
cursor that survives matplotlib's per-motion cursor reset.
3. track_videos.py — headless batch tracker. Reads the JSON sidecars,
builds 6 ROIs from the HD-mating-arena geometry, runs MultiFlyTracker
against the merged.mp4 via MovieVirtualCamera, writes SQLite DBs to
data/tracked/. Idempotent (skips done DBs), parallel via --jobs,
subclasses MovieVirtualCamera so frames stay BGR (MultiFlyTracker
calls cvtColor(BGR2GRAY) without checking channel count).
Plus auto_detect_targets.py (fallback that runs ethoscope's auto-detector
in case any videos do have visible target dots), monitor_tracking.py
(progress + ETA from data/tracked/ ground truth, --watch for live view),
and tracking_geometry.py (single source of truth for the affine math
shared by picker and tracker).
requirements-tracking.txt pins the extra deps (opencv-python, openpyxl,
gitpython, netifaces, mysql-connector-python) — these are only needed
for the tracking pipeline, not the existing analysis notebooks.
Verified end-to-end on one of the user-picked videos: ~4000 rows/ROI in
a 120s slice, fly bounding boxes in the expected 800-2000 px² band.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Task List
Completed Work
- Extract ROI data from SQLite databases grouped by trained/untrained
- Calculate inter-fly distances at each time point
- Align data to barrier opening time (t=0)
- Plot average distance over time (entire experiment + 300s window)
- Track fly identities across frames (Hungarian algorithm)
- Calculate max velocity over 10-second moving windows
- Statistical tests (t-tests, Cohen's d) comparing groups
- ML classification attempt (Logistic Regression, Random Forest)
- Clustering analysis (K-means)
- Organize project structure for student handoff
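For handoff, the identity-tracking and inter-fly distance items above can be illustrated with a minimal sketch. This is not the project's code: the function names are hypothetical, and it assumes each frame provides an (n_flies, 2) array of x, y positions in pixels. It uses SciPy's linear_sum_assignment, which solves the assignment problem the Hungarian algorithm addresses.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_identities(prev_xy, curr_xy):
    """Reorder curr_xy so that row i continues the track in prev_xy row i.

    Both inputs have shape (n_flies, 2). The assignment minimises the
    total Euclidean distance between previous and current positions.
    """
    prev_xy = np.asarray(prev_xy, dtype=float)
    curr_xy = np.asarray(curr_xy, dtype=float)
    # cost[i, j] = distance from previous fly i to current detection j
    cost = np.linalg.norm(prev_xy[:, None, :] - curr_xy[None, :, :], axis=2)
    row_idx, col_idx = linear_sum_assignment(cost)
    ordered = np.empty_like(curr_xy)
    ordered[row_idx] = curr_xy[col_idx]
    return ordered


def inter_fly_distance(xy):
    """Distance in pixels between the two flies of one ROI."""
    xy = np.asarray(xy, dtype=float)
    return float(np.linalg.norm(xy[0] - xy[1]))
```

Matching frame to frame in this way keeps per-fly velocity estimates from jumping when the tracker swaps the order of its detections.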
Priority: Bimodal Hypothesis Analysis
See docs/bimodal_hypothesis.md for detailed methodology.
Phase 1: Per-ROI Feature Extraction
- Compute per-ROI summary statistics from aligned distance data
- Mean distance post-opening (0-300s)
- Median distance post-opening
- Fraction of time at distance < 50px ("close proximity")
- Mean max velocity post-opening
- Create a summary DataFrame with N=18 trained + N=18 untrained rows
- Note: only 30 of the 36 ROIs have data (Machine 139 is missing, so 6 ROIs are lost)
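A possible shape for the Phase 1 summary is sketched below. The column names (roi_id, group, t, distance, max_velocity) are assumptions about the aligned long-format data, not the project's actual schema; the 50 px threshold and the 0-300s window come from the list above.

```python
import pandas as pd

CLOSE_PX = 50        # "close proximity" threshold from the task list
WINDOW_S = (0, 300)  # post-opening window, seconds relative to t=0


def summarize_rois(aligned):
    """Return one row per ROI with the Phase 1 summary statistics.

    Assumed (hypothetical) columns: roi_id, group, t (seconds relative to
    barrier opening), distance (px), max_velocity (10-second window max).
    """
    # Keep only samples inside the post-opening window (inclusive bounds)
    post = aligned[aligned["t"].between(*WINDOW_S)]
    summary = post.groupby(["group", "roi_id"]).agg(
        mean_distance=("distance", "mean"),
        median_distance=("distance", "median"),
        frac_close=("distance", lambda d: (d < CLOSE_PX).mean()),
        mean_max_velocity=("max_velocity", "mean"),
    )
    return summary.reset_index()
```

ROIs with no rows in the window (for example those from Machine 139) simply do not appear in the output, so the row count is a quick check on how many ROIs contributed data.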
Phase 2: Distribution Visualization
- Plot histograms/KDE of per-ROI metrics for each group
- Look for bimodality in trained group vs unimodality in untrained
Phase 3: Formal Bimodality Testing
- Hartigan's dip test on trained per-ROI distributions
- Fit Gaussian Mixture Models (1 vs 2 components) to trained data
- Compare BIC scores to determine optimal number of components
Phase 4: Subgroup Identification
- If bimodal: classify trained ROIs as "learner" vs "non-learner" using GMM posteriors
- Compare learner subgroup vs untrained group (expect larger effect size)
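Phases 3 and 4 could be sketched as follows, assuming the per-ROI metric is a 1-D array for the trained group. The function names are hypothetical; GaussianMixture, bic and predict_proba are scikit-learn calls. Hartigan's dip test is not shown. Which mixture component corresponds to "learner" is left to the analyst: the sketch only returns the posterior for the lower-mean component.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_one_vs_two(values, seed=0):
    """Fit 1- and 2-component GMMs to a 1-D metric; return models and BICs."""
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    models, bic = {}, {}
    for k in (1, 2):
        # Several restarts reduce the chance of a poor local optimum
        models[k] = GaussianMixture(
            n_components=k, n_init=5, random_state=seed
        ).fit(x)
        bic[k] = models[k].bic(x)  # lower BIC is preferred
    return models, bic


def low_component_posterior(model, values):
    """Posterior probability of belonging to the lower-mean component."""
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    low = int(np.argmin(model.means_.ravel()))
    return model.predict_proba(x)[:, low]
```

With at most 18 trained ROIs, a BIC difference between the two fits is a weak signal on its own, so it is worth reading alongside the dip test and the Phase 2 plots rather than as a standalone decision rule.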
Phase 5: Effect Size Re-estimation
- Mann-Whitney U test (appropriate for small N)
- Bootstrap confidence intervals for effect sizes
- Account for session as random effect
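The first two Phase 5 items might look like the sketch below. It assumes two 1-D arrays of per-ROI values, uses Cohen's d as the effect size (matching the earlier completed analysis), and a plain percentile bootstrap; the helper names are hypothetical. The session random effect is not handled here and would need a mixed model.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (a minus b)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = (
        (n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)
    ) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for Cohen's d."""
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    stats = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group independently, with replacement
        stats[i] = cohens_d(
            rng.choice(a, size=len(a), replace=True),
            rng.choice(b, size=len(b), replace=True),
        )
    low, high = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(low), float(high)


def compare_groups(a, b):
    """Two-sided Mann-Whitney U test plus effect size and its bootstrap CI."""
    result = mannwhitneyu(a, b, alternative="two-sided")
    return {
        "U": result.statistic,
        "p": result.pvalue,
        "d": cohens_d(a, b),
        "d_ci": bootstrap_ci(a, b),
    }
```

Resampling ROIs independently treats them as exchangeable; if ROIs from the same session are correlated, resampling whole sessions would be the more conservative choice.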
Maintenance Items
- Investigate missing Machine 139 data (has metadata but no tracking DB)
- Add diptest to requirements.txt when starting bimodal analysis
- Consider converting pixel distances to physical units (need calibration)
- The second notebook (flies_analysis.ipynb) re-runs from DB extraction - consider deprecating
Phase: Offline Tracking of 2024 Video Backlog (added 2026-04-27)
Recap
Tracked so far: 5 sessions, all from 2025-07-15 (machines 076/145/268). The DBs in
data/raw/ use tracker ConstrainedMultiFlyTracker and template
HD_Mating_Arena_6_ROIS.json (2 flies × 6 ROIs per video).
The metadata file ../all_video_info_merged.xlsx indexes a different set of
experiments: 7 dates from 2024-09-17 → 2024-10-21, 16 ethoscope machines,
63 unique (date, machine) sessions = 484 ROI-rows. None of the already-tracked
sessions are in this xlsx — these are fresh recordings to track.
Inventory: see data/metadata/video_inventory.csv (built by
scripts/build_video_inventory.py).
- 1163 video sessions on disk under /mnt/ethoscope_data/videos/
- 63/63 xlsx (date, machine) sessions have video on disk
- 129 video instances need tracking (some (date, machine) have 2-4 recordings/day)
Plan
The HD-mating-arena videos have no auto-detectable targets — the user must manually click 3 reference points (L-shape: top, corner, left) per video. Once all targets are picked, tracking can run in the background.
- Step 1 (Inventory): scripts/build_video_inventory.py → data/metadata/video_inventory.csv. 63 (date, machine) sessions match the xlsx, all videos found, 129 video instances need tracking.
- Step 2 (Manual target picker): scripts/pick_targets.py. Loops over videos with in_xlsx & ~already_tracked & no JSON yet; per video, shows a representative frame, captures 3 clicks (top, corner, left), and saves data/targets/<video_basename>.json. Skips videos already done.
- Step 3 (Background tracker): scripts/track_videos.py. Reads target JSONs, builds 6 ROIs from the HD-mating-arena geometry, runs MovieVirtualCamera + MultiFlyTracker + SQLiteResultWriter, and writes data/tracked/<basename>_tracking.db. Idempotent. Smoke-tested end-to-end: 90s of video → ~3000 rows/ROI, areas in the 800-2000 px² band.
- Step 4 (Tracking deps): requirements-tracking.txt.
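The three clicks in Step 2 are enough to fix an affine map from template coordinates to image pixels, which is how a fixed ROI template can be placed on each video. A minimal sketch of that math follows. The function names and the example coordinates are illustrative assumptions and are not taken from tracking_geometry.py, whose actual formulation may differ.

```python
import numpy as np


def affine_from_three_points(template_pts, image_pts):
    """Return the 2x3 matrix M such that image = M @ [x, y, 1].

    template_pts and image_pts are three corresponding (x, y) pairs,
    e.g. the top, corner and left reference points.
    """
    src = np.asarray(template_pts, dtype=float)  # shape (3, 2)
    dst = np.asarray(image_pts, dtype=float)     # shape (3, 2)
    src_h = np.hstack([src, np.ones((3, 1))])    # homogeneous, shape (3, 3)
    # Solve src_h @ M.T = dst; raises LinAlgError if the points are collinear
    return np.linalg.solve(src_h, dst).T


def apply_affine(matrix, pts):
    """Map template (x, y) points into image pixel coordinates."""
    pts = np.asarray(pts, dtype=float)
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    return pts_h @ matrix.T


# Hypothetical template: corner at the origin, unit-length arms
template = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
clicked = [(100.0, 200.0), (300.0, 200.0), (100.0, 500.0)]
M = affine_from_three_points(template, clicked)
roi_corner_px = apply_affine(M, [(0.5, 0.5)])
```

Because three non-collinear point pairs determine the six affine parameters exactly, a misplaced click distorts every ROI, which is why drawing the rectangles back onto the frame before saving is a useful check.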
Still TODO
- User to run pick_targets.py (interactive, needs DISPLAY) on the 129 pending videos.
- Run track_videos.py --jobs 4 against the resulting JSONs.
- (Optional) auto_detect_targets.py exists as a fallback for videos that DO have visible targets (saves clicks). Confirmed not useful on the 2025-07-15 batch (these arenas don't have black target dots), but worth trying on 2024 batches before falling back to manual.
- Decide what to do with the 4 (date, machine) sessions that have 3-4 recordings/day instead of 2 (e.g. ETHOSCOPE_086 on 2024-09-17 has 4). One of them is at lower resolution (1280x960), likely an aborted take.
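Since the batch run is meant to be idempotent, the set of videos still to track can be derived from the two directories alone. The sketch below assumes only the naming given in the plan (data/targets/<video_basename>.json and data/tracked/<basename>_tracking.db); the function name is hypothetical and this is not the logic in track_videos.py. It does not handle videos marked unusable, and it treats any existing DB as done even if a run was interrupted.

```python
from pathlib import Path


def pending_targets(targets_dir, tracked_dir):
    """List (sidecar, db_path) pairs whose tracking DB does not exist yet."""
    targets_dir = Path(targets_dir)
    tracked_dir = Path(tracked_dir)
    pending = []
    for sidecar in sorted(targets_dir.glob("*.json")):
        # DB name follows the <basename>_tracking.db convention from the plan
        db_path = tracked_dir / f"{sidecar.stem}_tracking.db"
        if not db_path.exists():
            pending.append((sidecar, db_path))
    return pending
```

Calling pending_targets("data/targets", "data/tracked") before and after a run gives a quick count of what a re-run would still pick up.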
Open questions / risks
- Some (date, machine) combos have 3-4 recordings (e.g. ETHOSCOPE_086 on 2024-09-17). Need to figure out which is the real "test" video vs aborted takes — possibly use video duration or filename pattern.
- One mismatched-resolution file: 1280x960@25fps-20q instead of 1920x1088@25fps-28q. Flag for inspection.
- The original ConstrainedMultiFlyTracker is no longer in the ethoscope repo; MultiFlyTracker is its likely successor. Validate that the output schema matches what the existing analysis pipeline expects (load_roi_data.py, etc.).
Discovered During Work
(Add new items here as they come up during analysis)