Unify analysis pipeline around the TSV; move tracked DBs out of cloud sync

- Tracked DBs now live at /mnt/data/projects/cupido/tracked/ (out of
  ownCloud to avoid sync conflicts and bandwidth churn). config.py
  TRACKING_OUTPUT_DIR points there; the docker-compose for ethoscope-lab
  mounts it world-readable for JupyterHub users.
- New scripts/export_video_db_index.py joins all_video_info_merged.xlsx
  with the video inventory and the on-disk DBs, producing a TSV that has
  one row per fly/ROI plus training/testing video and DB paths. Handles
  approximate xlsx times, cross-day training/testing, the 12 AM/PM
  ambiguity, and date typos.
- scripts/load_roi_data.py rewritten as a TSV-driven loader returning a
  single DataFrame with session and metadata columns.
  calculate_distances and the two flies_analysis notebooks migrated to
  use it; downstream trained/naive splits remain available via simple
  equality filters.
- Metadata vocabulary canonicalized: {naïve, niave, untrained, test} all
  resolve to {trained, naive}. Normalization happens at the TSV-export
  boundary (idempotent); the xlsx and the 2025-07-15 legacy CSV were
  edited in place to remove the worst variants.
- scripts/monitor_tracking.py rate calculation fixed: with N parallel
  workers, completions arrive in bursts; the old formula divided by
  burst width and reported nonsense rates. Now uses a 6 h window
  denominator.
- scripts/track_videos.py: BGRMovieCamera retries cv2.read on transient
  NFS hiccups and a post-tracking completeness gate (≥ 90 % of expected
  duration via MAX(t) across all 6 ROIs) deletes silent partial DBs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
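
To illustrate the monitor_tracking.py rate fix described above, here is a minimal sketch, not the script's actual code. The function name `completions_per_hour`, the timestamps, and the trailing-window interpretation of the "6 h window denominator" are assumptions made for the example.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=6)  # the 6 h window named in the commit message


def completions_per_hour(completed_at, now, window=WINDOW):
    """Hypothetical helper: completions inside the trailing window / window length."""
    recent = [t for t in completed_at if now - window <= t <= now]
    return len(recent) / (window.total_seconds() / 3600)


now = datetime(2026, 4, 30, 12, 0, 0)
# Six parallel workers finishing within 25 seconds of each other (one burst).
burst = [now - timedelta(minutes=10, seconds=5 * i) for i in range(6)]

# Fixed-window rate: 6 completions over 6 hours.
rate = completions_per_hour(burst, now)

# For contrast: dividing by the burst width, as the old formula reportedly did,
# turns the same six completions into hundreds per hour.
burst_width_h = (max(burst) - min(burst)).total_seconds() / 3600
burst_rate = len(burst) / burst_width_h
```

With a fixed denominator the estimate no longer depends on how tightly the workers' completions cluster; how the real script handles runs shorter than the window is not stated in the message.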
2026-04-30 15:20:14 +01:00 · 2026-04-30 15:20:14 +01:00 · f60a9d0530
commit f60a9d0530
parent e4da7691d5
13 changed files with 569 additions and 237 deletions
--- a/tasks/todo.md
+++ b/tasks/todo.md
@@ -115,4 +115,26 @@ all targets are picked, tracking can run in the background.

 ## Discovered During Work

-(Add new items here as they come up during analysis)
+### Barrier-opening annotation for the 2024 batch (added 2026-04-30)
+The current `flies_analysis*.ipynb` aligns trajectories to a barrier-opening
+event sourced from `data/metadata/2025_07_15_barrier_opening.csv`. That file
+covers only the 5 machines in the 2025-07-15 experiment. The 2024 batch
+(`/mnt/data/projects/cupido/tracked/`, 113 DBs) has no equivalent annotation
+yet, so all post-alignment cells silently exclude that data.
+
+- [ ] Build a small picker that lets the user scrub through each tracking
+      DB / video and mark the barrier-opening frame, writing a row to a new
+      `data/metadata/barrier_opening_2024.csv` (or extend the existing
+      file with a date column).
+- [ ] Once the 2024 entries exist, update `align_to_opening_time` so it
+      pulls from a unified `barrier_opening` table keyed by
+      `(date, machine_name)` rather than `machine_name` alone.
+
+### Metadata vocabulary normalization (done 2026-04-30)
+The xlsx had inconsistent labels for control flies (`'naïve'`, `'niave'`,
+`'untrained'` plus trailing whitespace). All sources now use a single
+canonical `'naive'`. Normalization happens in
+`scripts/export_video_db_index.py` so re-running it from the xlsx always
+produces a clean TSV. The 2025-07-15 legacy CSV
+(`data/metadata/2025_07_15_metadata_fixed.csv`) was edited in place from
+`'untrained'` → `'naive'`.
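
A minimal sketch of the idempotent normalization described in the todo entry above. It is an illustration under stated assumptions, not the code in `scripts/export_video_db_index.py`: the helper name `normalize_label`, the lowercasing, and the Unicode NFC step are hypothetical, and only the variants listed in the entry are mapped.

```python
import unicodedata

# Variants of the control-fly label named in the todo entry, plus the canonical form.
_NAIVE_VARIANTS = {"naïve", "niave", "untrained", "naive"}


def normalize_label(raw: str) -> str:
    """Hypothetical helper: map a raw metadata label to the canonical vocabulary."""
    # Assumption: surrounding whitespace, case, and Unicode composition carry no meaning.
    label = unicodedata.normalize("NFC", raw).strip().lower()
    if label in _NAIVE_VARIANTS:
        return "naive"
    return label  # other labels (e.g. 'trained') pass through unchanged


for raw in ["naïve", "niave ", "untrained", "naive", "trained"]:
    once = normalize_label(raw)
    # Idempotence: normalizing an already-normalized label changes nothing,
    # so re-running the export on clean data is a no-op.
    assert normalize_label(once) == once
```

Because the function is idempotent, applying it at the TSV-export boundary is safe whether the upstream xlsx has already been cleaned or not.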