Unify analysis pipeline around the TSV; move tracked DBs out of cloud sync

- Tracked DBs now live at /mnt/data/projects/cupido/tracked/ (out of
  ownCloud to avoid sync conflicts and bandwidth churn). config.py
  TRACKING_OUTPUT_DIR points there; the docker-compose for ethoscope-lab
  mounts it world-readable for JupyterHub users.
- New scripts/export_video_db_index.py joins all_video_info_merged.xlsx
  with the video inventory and the on-disk DBs, producing a TSV that has
  one row per fly/ROI plus training/testing video and DB paths. Handles
  approximate xlsx times, cross-day training/testing, the 12 AM/PM
  ambiguity, and date typos.
- scripts/load_roi_data.py rewritten as a TSV-driven loader returning a
  single DataFrame with session and metadata columns. calculate_distances
  and the two flies_analysis notebooks migrated to use it; downstream
  trained/naive splits remain available via simple equality filters.
- Metadata vocabulary canonicalized: {naïve, niave, untrained, test} all
  resolve to {trained, naive}. Normalization happens at the TSV-export
  boundary (idempotent); the xlsx and the 2025-07-15 legacy CSV were
  edited in place to remove the worst variants.
- scripts/monitor_tracking.py rate calculation fixed: with N parallel
  workers, completions arrive in bursts, and the old formula divided by the
  burst width, which produced meaningless rates. It now divides by a fixed
  6 h window.
- scripts/track_videos.py: BGRMovieCamera now retries cv2.VideoCapture.read
  on transient NFS hiccups, and a post-tracking completeness gate (≥ 90 % of
  expected duration, via MAX(t) across all 6 ROIs) deletes silently
  truncated DBs.
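
Sketch of the rate fix (illustration only, not the code in
scripts/monitor_tracking.py; the function names and the exact shape of the
old formula are assumptions based on the description above):

```python
from datetime import datetime, timedelta


def burst_rate(completions):
    """Old-style estimate as described above: count / burst width, per hour.

    Hypothetical reconstruction; it also divides by zero when every
    completion carries the same timestamp.
    """
    width_h = (max(completions) - min(completions)).total_seconds() / 3600.0
    return len(completions) / width_h


def windowed_rate(completions, now, window=timedelta(hours=6)):
    """Completions per hour over a fixed trailing window ending at `now`."""
    cutoff = now - window
    recent = [t for t in completions if cutoff < t <= now]
    return len(recent) / (window.total_seconds() / 3600.0)


now = datetime(2026, 4, 30, 15, 0, 0)
# Four workers finishing within two seconds of each other, one hour ago.
burst = [now - timedelta(hours=1, seconds=s) for s in (0, 1, 1, 2)]

print(f"burst-width rate: {burst_rate(burst):.0f}/h")
print(f"6 h window rate:  {windowed_rate(burst, now):.2f}/h")
```

A fixed denominator makes the estimate insensitive to how tightly the
completions cluster. The trade-off in this sketch is that it under-reports
until the run has lasted a full window.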

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Giorgio Gilestro 2026-04-30 15:20:14 +01:00
parent e4da7691d5
commit f60a9d0530
13 changed files with 569 additions and 237 deletions

.gitignore

@ -2,11 +2,8 @@
data/raw/*.db data/raw/*.db
data/processed/*.csv data/processed/*.csv
# Offline-tracking outputs (reproducible from videos + target JSONs) # Offline-tracking outputs (regenerable from videos + target JSONs)
data/tracked/*.db # DBs live outside the repo at /mnt/data/projects/cupido/tracked/
data/tracked/*.db-wal
data/tracked/*.db-shm
data/tracked/*.db-journal
data/targets/*.json data/targets/*.json
data/metadata/video_inventory.csv data/metadata/video_inventory.csv
data/logs/*.log data/logs/*.log


@ -66,7 +66,7 @@ python scripts/pick_targets.py --redo # re-pick already-picked videos
# 3) batch tracking (idempotent, can run in background) # 3) batch tracking (idempotent, can run in background)
python scripts/track_videos.py --jobs 4 # parallel python scripts/track_videos.py --jobs 4 # parallel
# output → data/tracked/*_tracking.db (SQLite, same schema as data/raw/) # output → /mnt/data/projects/cupido/tracked/*_tracking.db (SQLite, same schema as data/raw/)
``` ```
See `tasks/todo.md` "Offline Tracking" section for the full plan, and See `tasks/todo.md` "Offline Tracking" section for the full plan, and


@ -1,37 +1,37 @@
date,HHMMSS,machine_name,ROI,genotype,group,path,filesize_mb date,HHMMSS,machine_name,ROI,genotype,group,path,filesize_mb
15/07/2025,16-03-10,76,6,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4 15/07/2025,16-03-10,76,6,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
15/07/2025,16-03-10,76,4,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4 15/07/2025,16-03-10,76,4,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
15/07/2025,16-03-10,76,2,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4 15/07/2025,16-03-10,76,2,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
15/07/2025,16-03-10,76,5,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4 15/07/2025,16-03-10,76,5,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
15/07/2025,16-03-10,76,3,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4 15/07/2025,16-03-10,76,3,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
15/07/2025,16-03-10,76,1,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4 15/07/2025,16-03-10,76,1,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
15/07/2025,16-31-34,76,6,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98 15/07/2025,16-31-34,76,6,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
15/07/2025,16-31-34,76,4,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98 15/07/2025,16-31-34,76,4,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
15/07/2025,16-31-34,76,2,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98 15/07/2025,16-31-34,76,2,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
15/07/2025,16-31-34,76,5,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98 15/07/2025,16-31-34,76,5,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
15/07/2025,16-31-34,76,3,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98 15/07/2025,16-31-34,76,3,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
15/07/2025,16-31-34,76,1,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98 15/07/2025,16-31-34,76,1,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
15/07/2025,16-03-27,145,6,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72 15/07/2025,16-03-27,145,6,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
15/07/2025,16-03-27,145,4,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72 15/07/2025,16-03-27,145,4,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
15/07/2025,16-03-27,145,2,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72 15/07/2025,16-03-27,145,2,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
15/07/2025,16-03-27,145,5,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72 15/07/2025,16-03-27,145,5,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
15/07/2025,16-03-27,145,3,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72 15/07/2025,16-03-27,145,3,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
15/07/2025,16-03-27,145,1,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72 15/07/2025,16-03-27,145,1,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
15/07/2025,16-31-41,145,6,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9 15/07/2025,16-31-41,145,6,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
15/07/2025,16-31-41,145,4,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9 15/07/2025,16-31-41,145,4,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
15/07/2025,16-31-41,145,2,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9 15/07/2025,16-31-41,145,2,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
15/07/2025,16-31-41,145,5,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9 15/07/2025,16-31-41,145,5,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
15/07/2025,16-31-41,145,3,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9 15/07/2025,16-31-41,145,3,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
15/07/2025,16-31-41,145,1,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9 15/07/2025,16-31-41,145,1,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
15/07/2025,16-31-52,139,6,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4 15/07/2025,16-31-52,139,6,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
15/07/2025,16-31-52,139,4,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4 15/07/2025,16-31-52,139,4,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
15/07/2025,16-31-52,139,2,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4 15/07/2025,16-31-52,139,2,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
15/07/2025,16-31-52,139,5,CS,untrained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4 15/07/2025,16-31-52,139,5,CS,naive,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
15/07/2025,16-31-52,139,3,CS,untrained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4 15/07/2025,16-31-52,139,3,CS,naive,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
15/07/2025,16-31-52,139,1,CS,untrained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4 15/07/2025,16-31-52,139,1,CS,naive,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
15/07/2025,16-32-05,268,6,CS,untrained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72 15/07/2025,16-32-05,268,6,CS,naive,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
15/07/2025,16-32-05,268,4,CS,untrained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72 15/07/2025,16-32-05,268,4,CS,naive,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
15/07/2025,16-32-05,268,2,CS,untrained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72 15/07/2025,16-32-05,268,2,CS,naive,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
15/07/2025,16-32-05,268,5,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72 15/07/2025,16-32-05,268,5,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
15/07/2025,16-32-05,268,3,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72 15/07/2025,16-32-05,268,3,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
15/07/2025,16-32-05,268,1,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72 15/07/2025,16-32-05,268,1,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72



@ -1,39 +1,47 @@
# Processed Data # Processed Data
Large CSV files generated from the analysis pipeline. All files are gitignored (~370MB total) and can be regenerated. CSVs derived from the tracking DBs (`/mnt/data/projects/cupido/tracked/`)
and the merged TSV (`../../all_video_info_merged.tsv`). All files are
gitignored and regenerable.
## Files and Regeneration ## Files and Regeneration
| File | Description | Generated By | | File | Description | Generated By |
|------|-------------|--------------| |------|-------------|--------------|
| `trained_roi_data.csv` | Raw tracking data for trained ROIs | `scripts/load_roi_data.py` or notebook step 1 | | `distances.csv` | Per-frame inter-fly distances for every (date, machine, ROI, session). Includes metadata columns to filter trained vs naïve, training phase, species, etc. | `scripts/calculate_distances.py` |
| `untrained_roi_data.csv` | Raw tracking data for untrained ROIs | `scripts/load_roi_data.py` or notebook step 1 | | `*_distances_aligned.csv` | (legacy, 2025-07-15 only) distances aligned to barrier opening | `notebooks/flies_analysis*.ipynb` |
| `trained_distances.csv` | Pairwise distances (unaligned) | `scripts/calculate_distances.py` | | `*_tracked.csv` | (legacy) identity-tracked fly positions | `notebooks/flies_analysis_simple.ipynb` |
| `untrained_distances.csv` | Pairwise distances (unaligned) | `scripts/calculate_distances.py` | | `*_max_velocity.csv` | (legacy) max velocity over 10 s windows | `notebooks/flies_analysis_simple.ipynb` |
| `trained_distances_aligned.csv` | Distances aligned to barrier opening | Notebook step 4 |
| `untrained_distances_aligned.csv` | Distances aligned to barrier opening | Notebook step 4 |
| `trained_tracked.csv` | Identity-tracked fly positions | Notebook step 7 |
| `untrained_tracked.csv` | Identity-tracked fly positions | Notebook step 7 |
| `trained_max_velocity.csv` | Max velocity over 10s windows | Notebook step 7 |
| `untrained_max_velocity.csv` | Max velocity over 10s windows | Notebook step 7 |
## To Regenerate All Data ## Loading the data
Run the full notebook `notebooks/flies_analysis_simple.ipynb` with:
```python ```python
recalculate_distances = True import sys
recalculate_tracking = True sys.path.insert(0, "../scripts")
from load_roi_data import load_roi_data
data = load_roi_data() # full batch as one DataFrame
# Or filter the metadata first:
import pandas as pd
tsv = pd.read_csv("../../all_video_info_merged.tsv", sep="\t")
data = load_roi_data(tsv[tsv.species.str.contains("Melanogaster")])
``` ```
**Warning**: Identity tracking and velocity calculations take significant time (~30+ minutes). The returned DataFrame has columns:
`id, t, x, y, w, h, phi, is_inferred, has_interacted, session, ROI, date,
machine_name, species, male, training_date_time, testing_date_time,
training_length_hr, consolidation_length_hr, memory, age`.
## Column Reference `session` is `"training"` or `"testing"`; `male` is `"trained"` or
`"naive"` (canonical — variants like `"naïve"` and `"niave"` are normalized
at the TSV-export step).
### Distance CSVs (`*_distances_aligned.csv`) ## Column Reference (`distances.csv`)
- `machine_name`: Ethoscope machine ID (string)
- `ROI`: ROI number (1-6) - `date`, `machine_name`, `ROI`, `session`: identifies one fly trajectory
- `aligned_time`: Time in ms relative to barrier opening (0 = opening) - `t`: time in ms within that session
- `distance`: Euclidean distance between flies in pixels - `distance`: Euclidean distance between the two flies in pixels
- `n_flies`: Number of flies detected at this time point - `n_flies`: number of fly detections at this frame (1 or 2)
- `area_fly1`, `area_fly2`: Bounding box areas (w*h) in pixels^2 - `area_fly1`, `area_fly2`: bounding-box areas (`w * h`) in pixels²
- `group`: "trained" or "untrained" - `male`: `trained` or `naive` (carried from the xlsx; normalized)
- `species`, `memory`, `age`: experimental metadata
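
A minimal sketch of the label normalization described above, assuming a plain lookup applied to the `male` column. The function name and the variant set are illustrative and not taken from `scripts/export_video_db_index.py`. The commit message also lists `test` as a variant, but its target label is not stated, so it is left out here.

```python
# Variants named in this commit that map to "naive" (hypothetical lookup table).
NAIVE_VARIANTS = {"naive", "naïve", "niave", "untrained"}


def normalize_male(label):
    """Return the canonical label, "trained" or "naive".

    Canonical inputs map to themselves, so applying it twice is a no-op.
    """
    key = str(label).strip().lower()
    if key == "trained":
        return "trained"
    if key in NAIVE_VARIANTS:
        return "naive"
    raise ValueError(f"unrecognized label: {label!r}")


rows = [
    {"ROI": 6, "male": "trained"},
    {"ROI": 4, "male": "untrained"},
    {"ROI": 5, "male": "Naïve "},
    {"ROI": 3, "male": "niave"},
]
for row in rows:
    row["male"] = normalize_male(row["male"])

# Downstream splits are then simple equality filters.
trained = [r for r in rows if r["male"] == "trained"]
naive = [r for r in rows if r["male"] == "naive"]
print(len(trained), len(naive))
```

With pandas, the same function can be applied column-wise as `df["male"] = df["male"].map(normalize_male)`. Raising on unknown labels, rather than passing them through, is a design choice of this sketch so that new typos surface at export time.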


@ -28,7 +28,22 @@
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": "def load_roi_data():\n \"\"\"Load ROI data from SQLite databases and group by trained/untrained\"\"\"\n metadata = pd.read_csv(DATA_METADATA / '2025_07_15_metadata_fixed.csv')\n metadata['machine_name'] = metadata['machine_name'].astype(str)\n \n trained_rois = metadata[metadata['group'] == 'trained']\n untrained_rois = metadata[metadata['group'] == 'untrained']\n \n db_files = list(DATA_RAW.glob('*_tracking.db'))\n \n trained_df = pd.DataFrame()\n untrained_df = pd.DataFrame()\n \n for db_file in db_files:\n print(f\"Processing {db_file.name}\")\n \n pattern = r'_([0-9a-f]{32})__'\n match = re.search(pattern, db_file.name)\n \n if not match:\n print(f\"Could not extract UUID from {db_file.name}\")\n continue\n \n uuid = match.group(1)\n metadata_matches = metadata[metadata['path'].str.contains(uuid, na=False)]\n \n if metadata_matches.empty:\n print(f\"No metadata matches found for UUID {uuid}\")\n continue\n \n machine_id = metadata_matches.iloc[0]['machine_name']\n print(f\"Matched to machine ID: {machine_id}\")\n \n conn = sqlite3.connect(str(db_file))\n \n machine_trained = trained_rois[trained_rois['machine_name'] == machine_id]\n machine_untrained = untrained_rois[untrained_rois['machine_name'] == machine_id]\n \n for _, row in machine_trained.iterrows():\n roi = row['ROI']\n try:\n roi_data = pd.read_sql_query(f\"SELECT * FROM ROI_{roi}\", conn)\n roi_data['machine_name'] = machine_id\n roi_data['ROI'] = roi\n roi_data['group'] = 'trained'\n trained_df = pd.concat([trained_df, roi_data], ignore_index=True)\n except Exception as e:\n print(f\"Error loading ROI_{roi}: {e}\")\n \n for _, row in machine_untrained.iterrows():\n roi = row['ROI']\n try:\n roi_data = pd.read_sql_query(f\"SELECT * FROM ROI_{roi}\", conn)\n roi_data['machine_name'] = machine_id\n roi_data['ROI'] = roi\n roi_data['group'] = 'untrained'\n untrained_df = pd.concat([untrained_df, roi_data], ignore_index=True)\n except Exception as e:\n print(f\"Error loading ROI_{roi}: 
{e}\")\n \n conn.close()\n \n return trained_df, untrained_df\n\ntrained_data, untrained_data = load_roi_data()\nprint(f\"Trained data shape: {trained_data.shape}\")\nprint(f\"Untrained data shape: {untrained_data.shape}\")\n\ntrained_data.to_csv(DATA_PROCESSED / 'trained_roi_data.csv', index=False)\nuntrained_data.to_csv(DATA_PROCESSED / 'untrained_roi_data.csv', index=False)\nprint(\"Data saved to CSV files\")" "source": [
"# Load tracking data via the unified loader (driven by all_video_info_merged.tsv).\n",
"# Reason: replaces the old data/raw + 2025_07_15_metadata_fixed.csv path with\n",
"# the TSV-based loader that covers the entire batch (2025-07-15 + 2024).\n",
"sys.path.insert(0, str(PROJECT_ROOT / 'scripts'))\n",
"from load_roi_data import load_roi_data\n",
"\n",
"data = load_roi_data()\n",
"# Backwards-compat slices for the rest of the notebook.\n",
"trained_data = data[data['male'] == 'trained'].copy()\n",
"untrained_data = data[data['male'] == 'naive'].copy()\n",
"\n",
"print(f\"all data: {data.shape}\")\n",
"print(f\"trained: {trained_data.shape}\")\n",
"print(f\"naive: {untrained_data.shape}\")\n"
]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",


@ -28,7 +28,22 @@
"execution_count": null, "execution_count": null,
"metadata": {}, "metadata": {},
"outputs": [], "outputs": [],
"source": "# Load the pre-processed data\ntrained_data = pd.read_csv(DATA_PROCESSED / 'trained_roi_data.csv')\nuntrained_data = pd.read_csv(DATA_PROCESSED / 'untrained_roi_data.csv')\n\nprint(f\"Trained data shape: {trained_data.shape}\")\nprint(f\"Untrained data shape: {untrained_data.shape}\")\nprint(f\"Trained data columns: {list(trained_data.columns)}\")\nprint(f\"Untrained data columns: {list(untrained_data.columns)}\")" "source": [
"# Load tracking data via the unified loader (driven by all_video_info_merged.tsv).\n",
"# Reason: replaces reads of trained_roi_data.csv / untrained_roi_data.csv with\n",
"# the live loader so the notebook always sees the current batch.\n",
"sys.path.insert(0, str(PROJECT_ROOT / 'scripts'))\n",
"from load_roi_data import load_roi_data\n",
"\n",
"data = load_roi_data()\n",
"trained_data = data[data['male'] == 'trained'].copy()\n",
"untrained_data = data[data['male'] == 'naive'].copy()\n",
"\n",
"print(f\"all data shape: {data.shape}\")\n",
"print(f\"Trained data: {trained_data.shape}\")\n",
"print(f\"Naive data: {untrained_data.shape}\")\n",
"print(f\"Columns: {list(trained_data.columns)}\")\n"
]
}, },
{ {
"cell_type": "markdown", "cell_type": "markdown",


@ -1,117 +1,99 @@
import pandas as pd """Compute per-frame inter-fly distances for every (date, machine, ROI, session).
Reads tracking data via :func:`load_roi_data.load_roi_data` (which is driven
by ``all_video_info_merged.tsv``) and produces one distances DataFrame
spanning every fly/session in the batch. Group membership (``trained`` /
``naive``) is preserved from the ``male`` column.
"""
import numpy as np import numpy as np
import pandas as pd
from scipy.spatial.distance import euclidean from scipy.spatial.distance import euclidean
from config import DATA_PROCESSED from config import DATA_PROCESSED
from load_roi_data import load_roi_data
def calculate_fly_distances(trained_file=None, untrained_file=None): def calculate_fly_distances(data: pd.DataFrame | None = None) -> pd.DataFrame:
"""Calculate distances between flies at each time point. """Compute inter-fly distances over time for every fly/session.
For each time point: For each time point inside one (date, machine, ROI, session) trajectory:
- If two flies are detected: calculate Cartesian distance between them - 2+ flies detected: Euclidean distance between the first two by id
- If one fly is detected: set distance to 0 if area > average area, otherwise NaN - 1 fly detected: distance = 0 if its bbox area exceeds the global
mean (likely a single blob containing both flies), else NaN
Args: Args:
trained_file (Path): Path to trained ROI data CSV. data: optional pre-loaded DataFrame from :func:`load_roi_data`. If
untrained_file (Path): Path to untrained ROI data CSV. None, the full batch is loaded.
Returns: Returns:
tuple: (trained_distances, untrained_distances) DataFrames. DataFrame with one row per (track, time) pair, including ``distance``,
``n_flies``, ``area_fly1``, ``area_fly2``, plus the metadata columns
propagated from the source row (``date``, ``machine_name``, ``ROI``,
``session``, ``male``, ``species``, ``memory``, ``age``).
""" """
if trained_file is None: if data is None:
trained_file = DATA_PROCESSED / 'trained_roi_data.csv' data = load_roi_data()
if untrained_file is None: if data.empty:
untrained_file = DATA_PROCESSED / 'untrained_roi_data.csv' return pd.DataFrame()
trained_df = pd.read_csv(trained_file) data = data.copy()
untrained_df = pd.read_csv(untrained_file) data["area"] = data["w"] * data["h"]
avg_area = data["area"].mean()
trained_df['area'] = trained_df['w'] * trained_df['h']
untrained_df['area'] = untrained_df['w'] * untrained_df['h']
avg_area = np.mean([trained_df['area'].mean(), untrained_df['area'].mean()])
print(f"Average area across all data: {avg_area:.2f}") print(f"Average area across all data: {avg_area:.2f}")
trained_distances = process_distance_data(trained_df, avg_area) # Carry these onto every output row (constant within a track).
untrained_distances = process_distance_data(untrained_df, avg_area) keep_meta = ["date", "machine_name", "ROI", "session", "male",
"species", "memory", "age"]
return trained_distances, untrained_distances rows: list[dict] = []
track_keys = ["date", "machine_name", "ROI", "session"]
for track, track_df in data.groupby(track_keys, sort=False):
def process_distance_data(df, avg_area): meta_row = {k: v for k, v in zip(track_keys, track)}
"""Process a DataFrame to calculate distances between flies at each time point. # Carry the rest of the metadata from any sample (constant per track).
sample = track_df.iloc[0]
Args: for col in keep_meta:
df (pd.DataFrame): Input tracking data. if col not in meta_row:
avg_area (float): Average area threshold for single-fly detection. meta_row[col] = sample[col]
Returns:
pd.DataFrame: Distance data with columns for machine, ROI, time, distance.
"""
results = []
for (machine_name, roi), group in df.groupby(['machine_name', 'ROI']):
for t, time_group in group.groupby('t'):
time_group = time_group.sort_values('id').reset_index(drop=True)
for t, time_group in track_df.groupby("t", sort=False):
time_group = time_group.sort_values("id").reset_index(drop=True)
row = dict(meta_row)
row["t"] = t
if len(time_group) >= 2: if len(time_group) >= 2:
fly1 = time_group.iloc[0] f1, f2 = time_group.iloc[0], time_group.iloc[1]
fly2 = time_group.iloc[1] row["distance"] = euclidean([f1["x"], f1["y"]], [f2["x"], f2["y"]])
distance = euclidean([fly1['x'], fly1['y']], [fly2['x'], fly2['y']]) row["n_flies"] = len(time_group)
row["area_fly1"] = f1["area"]
results.append({ row["area_fly2"] = f2["area"]
'machine_name': machine_name,
'ROI': roi,
't': t,
'distance': distance,
'n_flies': len(time_group),
'area_fly1': fly1['area'],
'area_fly2': fly2['area']
})
elif len(time_group) == 1:
fly = time_group.iloc[0]
area = fly['area']
if area > avg_area:
distance = 0.0
else: else:
distance = np.nan f = time_group.iloc[0]
row["distance"] = 0.0 if f["area"] > avg_area else np.nan
row["n_flies"] = 1
row["area_fly1"] = f["area"]
row["area_fly2"] = np.nan
rows.append(row)
results.append({ return pd.DataFrame(rows)
'machine_name': machine_name,
'ROI': roi,
't': t,
'distance': distance,
'n_flies': 1,
'area_fly1': area,
'area_fly2': np.nan
})
return pd.DataFrame(results)
def main(): def main() -> None:
"""Run distance calculations and save results.""" distances = calculate_fly_distances()
trained_distances, untrained_distances = calculate_fly_distances()
print(f"Trained data distance summary:") print("\nDistance summary:")
print(f" Shape: {trained_distances.shape}") print(f" Shape: {distances.shape}")
print(f" Distance stats:") if not distances.empty:
print(f" Count: {trained_distances['distance'].count()}") print(f" Distance count: {distances['distance'].count()}")
print(f" Mean: {trained_distances['distance'].mean():.2f}") print(f" Distance mean: {distances['distance'].mean():.2f}")
print(f" Std: {trained_distances['distance'].std():.2f}") print(f" Distance std: {distances['distance'].std():.2f}")
male = distances["male"]
print(f" Trained rows: {(male == 'trained').sum()}")
print(f" Naive rows: {(male == 'naive').sum()}")
print(f"\nUntrained data distance summary:") DATA_PROCESSED.mkdir(parents=True, exist_ok=True)
print(f" Shape: {untrained_distances.shape}") out = DATA_PROCESSED / "distances.csv"
print(f" Distance stats:") distances.to_csv(out, index=False)
print(f" Count: {untrained_distances['distance'].count()}") print(f"\nSaved {out}")
print(f" Mean: {untrained_distances['distance'].mean():.2f}")
print(f" Std: {untrained_distances['distance'].std():.2f}")
trained_distances.to_csv(DATA_PROCESSED / 'trained_distances.csv', index=False)
untrained_distances.to_csv(DATA_PROCESSED / 'untrained_distances.csv', index=False)
print("\nDistance data saved")
if __name__ == "__main__": if __name__ == "__main__":


@ -13,5 +13,8 @@ VIDEOS_ROOT = Path("/mnt/ethoscope_data/videos")
VIDEO_INFO_XLSX = PROJECT_ROOT.parent / "all_video_info_merged.xlsx" VIDEO_INFO_XLSX = PROJECT_ROOT.parent / "all_video_info_merged.xlsx"
INVENTORY_CSV = DATA_METADATA / "video_inventory.csv" INVENTORY_CSV = DATA_METADATA / "video_inventory.csv"
TARGETS_DIR = PROJECT_ROOT / "data" / "targets" TARGETS_DIR = PROJECT_ROOT / "data" / "targets"
TRACKING_OUTPUT_DIR = PROJECT_ROOT / "data" / "tracked" # Reason: tracking DBs are large binary files that don't belong in
# ownCloud-synced storage (sync conflicts + bandwidth). They live on the
# local data volume instead. Regenerable from videos + target JSONs.
TRACKING_OUTPUT_DIR = Path("/mnt/data/projects/cupido/tracked")
LOGS_DIR = PROJECT_ROOT / "data" / "logs" LOGS_DIR = PROJECT_ROOT / "data" / "logs"


@ -0,0 +1,181 @@
"""Augment all_video_info_merged.xlsx with the input video + tracking DB paths.
Each xlsx row represents one fly (date, machine_name, ROI), observed across a
training session and a testing session. We resolve those two sessions to the
on-disk video files (via the inventory CSV) and to their tracking DBs (under
TRACKING_OUTPUT_DIR), then write the result as TSV.
Output columns added:
training_video_path, training_db_path,
testing_video_path, testing_db_path
Empty values mean either no video matched (rare; implies a missing inventory
entry) or no DB exists yet (e.g. the one video the completeness gate
rejected).
Usage:
python export_video_db_index.py
python export_video_db_index.py --out path/to/output.tsv
"""
from __future__ import annotations
import argparse
import re
from pathlib import Path
import pandas as pd
from config import INVENTORY_CSV, TRACKING_OUTPUT_DIR, VIDEO_INFO_XLSX
_TIME_RE = re.compile(r"^(\d{8})_(\d{1,2})(\d{2})?(AM|PM)$", re.IGNORECASE)
def parse_xlsx_time(value: str) -> tuple[str, int] | None:
"""Convert '20241021_11AM' / '20240918_1030AM' to (YYYY-MM-DD, minutes24).
Resolution is hour-only when no minutes are given (e.g. '11AM' → 11:00).
Returns minutes-from-midnight so we can do nearest-neighbor matching.
"""
if not isinstance(value, str):
return None
m = _TIME_RE.match(value.strip())
if not m:
return None
ymd, hh, mm, ampm = m.groups()
date = f"{ymd[:4]}-{ymd[4:6]}-{ymd[6:8]}"
hour = int(hh)
minute = int(mm) if mm else 0
if ampm.upper() == "PM" and hour != 12:
hour += 12
if ampm.upper() == "AM" and hour == 12:
hour = 0
return date, hour * 60 + minute
def build_session_index(inventory: pd.DataFrame) -> dict[tuple[str, str], list[dict]]:
"""Index inventory rows by (date, machine_name) → list of session dicts."""
idx: dict[tuple[str, str], list[dict]] = {}
for row in inventory.itertuples(index=False):
h, m, _s = (int(p) for p in str(row.session_time).split("-"))
key = (row.session_date, row.machine_name)
idx.setdefault(key, []).append({
"mp4_path": row.mp4_path,
"session_datetime": row.session_datetime,
"minutes": h * 60 + m,
})
return idx
def db_path_for_video(mp4_path: str) -> Path | None:
"""Tracker writes <video_stem>_tracking.db under TRACKING_OUTPUT_DIR."""
stem = Path(mp4_path).stem
db = TRACKING_OUTPUT_DIR / f"{stem}_tracking.db"
return db if db.exists() else None
_TIME_TOLERANCE_MIN = 90 # xlsx labels are approximate ("11AM" → 10:51 is fine)
def resolve_session(
machine_name: str,
when: str,
fallback_date: str | None,
index: dict[tuple[str, str], list[dict]],
) -> tuple[str, str]:
"""Look up the video + db whose start time is closest to `when`.
Match strategy:
1. Use the date embedded in `when` (training/testing can fall on a
different calendar day from the row's ``date`` column).
2. If no candidates exist for that date, fall back to ``fallback_date``
(the xlsx row's ``date`` column). Reason: the xlsx contains
date typos like '20240110_11AM' for an Oct 1 experiment.
Among candidates, pick the video whose start minute is closest to the
xlsx-claimed time, within ±_TIME_TOLERANCE_MIN.
"""
parsed = parse_xlsx_time(when)
if parsed is None:
return "", ""
date, target_min = parsed
candidates = index.get((date, machine_name), [])
if not candidates and fallback_date:
candidates = index.get((fallback_date, machine_name), [])
if not candidates:
return "", ""
def _gap(target: int, c: dict) -> int:
# Reason: xlsx times like '1230AM' are ambiguous (12 AM vs 12 PM).
# We try both the literal time AND a +12-hour shift, picking the
# interpretation that brings us closest to a real session.
return min(abs(c["minutes"] - target), abs(c["minutes"] - (target + 720) % 1440))
best = min(candidates, key=lambda c: _gap(target_min, c))
if _gap(target_min, best) > _TIME_TOLERANCE_MIN:
return "", ""
db = db_path_for_video(best["mp4_path"])
return best["mp4_path"], (str(db) if db else "")
# Variants of "naive" the xlsx has accumulated: 'naïve', 'niave', plus
# trailing whitespace. All collapse to a single canonical 'naive'.
_MALE_NAIVE_VARIANTS = {"naïve", "niave", "naive"}
def _normalize_metadata(df: pd.DataFrame) -> None:
"""Strip whitespace and canonicalize the ``male`` column in place."""
for col in df.select_dtypes(include=("object", "string")).columns:
df[col] = df[col].astype(str).str.strip()
df["male"] = df["male"].apply(
lambda v: "naive" if v.lower() in _MALE_NAIVE_VARIANTS else v
)
def main() -> None:
parser = argparse.ArgumentParser(description=__doc__)
parser.add_argument(
"--out",
type=Path,
default=VIDEO_INFO_XLSX.with_suffix(".tsv"),
help="output TSV path (default: alongside the xlsx)",
)
args = parser.parse_args()
inv = pd.read_csv(INVENTORY_CSV)
inv = inv[inv["in_xlsx"]].copy()
index = build_session_index(inv)
df = pd.read_excel(VIDEO_INFO_XLSX)
_normalize_metadata(df)
date_iso = pd.to_datetime(df["date"]).dt.strftime("%Y-%m-%d")
train_videos, train_dbs, test_videos, test_dbs = [], [], [], []
for fallback, row in zip(date_iso, df.itertuples(index=False)):
tv, td = resolve_session(row.machine_name, row.training_date_time, fallback, index)
sv, sd = resolve_session(row.machine_name, row.testing_date_time, fallback, index)
train_videos.append(tv)
train_dbs.append(td)
test_videos.append(sv)
test_dbs.append(sd)
df["training_video_path"] = train_videos
df["training_db_path"] = train_dbs
df["testing_video_path"] = test_videos
df["testing_db_path"] = test_dbs
df.to_csv(args.out, sep="\t", index=False)
n_rows = len(df)
n_train_video = sum(bool(v) for v in train_videos)
n_train_db = sum(bool(v) for v in train_dbs)
n_test_video = sum(bool(v) for v in test_videos)
n_test_db = sum(bool(v) for v in test_dbs)
print(f"wrote {args.out} ({n_rows} rows)")
print(f" training: {n_train_video} with video, {n_train_db} with DB")
print(f" testing: {n_test_video} with video, {n_test_db} with DB")
if __name__ == "__main__":
main()


@ -1,90 +1,113 @@
import pandas as pd """Load ROI tracking data from all sessions into one DataFrame.
Drives off the merged TSV (one row per ROI/fly across training + testing
phases). For each TSV row, opens the corresponding tracking DB and pulls
the matching ROI table, then attaches the experimental metadata.
The TSV is the single source of truth for what data exists and how it
maps to flies and conditions.
"""
import sqlite3 import sqlite3
import re from pathlib import Path
from config import DATA_RAW, DATA_METADATA, DATA_PROCESSED import pandas as pd
from config import VIDEO_INFO_XLSX
def load_roi_data(): # Metadata columns to copy onto every tracking sample. These are the xlsx
"""Load ROI data from SQLite databases and group by trained/untrained. # fields that describe the experimental condition behind each fly/ROI.
# Reason: the ROI column is uppercase ("ROI") for backwards compatibility
# with the existing analysis pipeline (calculate_distances.py, notebooks).
_META_COLS = (
"date",
"machine_name",
"species",
"male",
"training_date_time",
"testing_date_time",
"training_length_hr",
"consolidation_length_hr",
"memory",
"age",
)
def _open_ro(db_path: str, cache: dict) -> sqlite3.Connection | None:
"""Cached read-only sqlite connection. Returns None on failure."""
if not isinstance(db_path, str) or not db_path:
return None
if db_path not in cache:
try:
cache[db_path] = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
except sqlite3.Error as e:
print(f"failed to open {Path(db_path).name}: {e}")
cache[db_path] = None
return cache[db_path]
def load_roi_data(meta: pd.DataFrame | None = None) -> pd.DataFrame:
"""Load ROI tracking data joined with experimental metadata.
For each row in ``meta``, reads the matching ROI table from both the
training DB and the testing DB (whichever exist), and stamps every
sample with the row's metadata plus a ``session`` column
(``"training"`` or ``"testing"``). Rows with empty DB paths (unusable
videos, or videos that didn't pass the completeness gate) are skipped.
Args:
meta: optional DataFrame with the same schema as
``all_video_info_merged.tsv``. Pass a filtered slice to load a
subset (e.g. ``meta[meta.species == 'Melanogaster/CS']``).
Defaults to the full TSV.
Returns: Returns:
tuple: (trained_df, untrained_df) DataFrames with tracking data. DataFrame with columns ``id, t, x, y, w, h, phi, is_inferred,
has_interacted, session, <metadata>``, one row per tracking
sample. Empty if nothing could be loaded.
""" """
metadata = pd.read_csv(DATA_METADATA / '2025_07_15_metadata_fixed.csv') if meta is None:
metadata['machine_name'] = metadata['machine_name'].astype(str) meta = pd.read_csv(VIDEO_INFO_XLSX.with_suffix(".tsv"), sep="\t")
trained_rois = metadata[metadata['group'] == 'trained'] db_cache: dict = {}
untrained_rois = metadata[metadata['group'] == 'untrained'] chunks: list[pd.DataFrame] = []
db_files = list(DATA_RAW.glob('*_tracking.db')) for row in meta.itertuples(index=False):
for session in ("training", "testing"):
trained_df = pd.DataFrame() conn = _open_ro(getattr(row, f"{session}_db_path"), db_cache)
untrained_df = pd.DataFrame() if conn is None:
for db_file in db_files:
print(f"Processing {db_file.name}")
pattern = r'_([0-9a-f]{32})__'
match = re.search(pattern, db_file.name)
if not match:
print(f"Could not extract UUID from {db_file.name}")
continue continue
try:
uuid = match.group(1) df = pd.read_sql_query(
metadata_matches = metadata[metadata['path'].str.contains(uuid, na=False)] f"SELECT * FROM ROI_{int(row.roi)}", conn
)
if metadata_matches.empty: except Exception as e:
print(f"No metadata matches found for UUID {uuid} from {db_file.name}") # Reason: a DB may be missing a ROI table if tracking was
# partial — skip rather than abort the whole batch.
print(f" ROI_{row.roi} from {session} DB: {e}")
continue continue
df["session"] = session
df["ROI"] = int(row.roi)
for col in _META_COLS:
df[col] = getattr(row, col)
chunks.append(df)
machine_id = metadata_matches.iloc[0]['machine_name'] for conn in db_cache.values():
print(f"Matched to machine ID: {machine_id}") if conn is not None:
conn = sqlite3.connect(str(db_file))
machine_trained = trained_rois[trained_rois['machine_name'] == machine_id]
machine_untrained = untrained_rois[untrained_rois['machine_name'] == machine_id]
for _, row in machine_trained.iterrows():
roi = row['ROI']
try:
query = f"SELECT * FROM ROI_{roi}"
roi_data = pd.read_sql_query(query, conn)
roi_data['machine_name'] = machine_id
roi_data['ROI'] = roi
roi_data['group'] = 'trained'
trained_df = pd.concat([trained_df, roi_data], ignore_index=True)
except Exception as e:
print(f"Error loading ROI_{roi} from {db_file.name}: {e}")
for _, row in machine_untrained.iterrows():
roi = row['ROI']
try:
query = f"SELECT * FROM ROI_{roi}"
roi_data = pd.read_sql_query(query, conn)
roi_data['machine_name'] = machine_id
roi_data['ROI'] = roi
roi_data['group'] = 'untrained'
untrained_df = pd.concat([untrained_df, roi_data], ignore_index=True)
except Exception as e:
print(f"Error loading ROI_{roi} from {db_file.name}: {e}")
conn.close() conn.close()
return trained_df, untrained_df return pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()
if __name__ == "__main__": if __name__ == "__main__":
trained_data, untrained_data = load_roi_data() data = load_roi_data()
print(f"Trained data shape: {trained_data.shape}") print(f"shape: {data.shape}")
print(f"Untrained data shape: {untrained_data.shape}") if not data.empty:
if not trained_data.empty: print(f"columns: {list(data.columns)}")
print("Trained data columns:", trained_data.columns.tolist()) print(f"sessions: {data['session'].value_counts().to_dict()}")
if not untrained_data.empty: print(f"unique machines: {data['machine_name'].nunique()}")
print("Untrained data columns:", untrained_data.columns.tolist()) print(
f"unique flies (date,machine,roi): "
    trained_data.to_csv(DATA_PROCESSED / 'trained_roi_data.csv', index=False) f"{data.groupby(['date','machine_name','ROI']).ngroups}"
untrained_data.to_csv(DATA_PROCESSED / 'untrained_roi_data.csv', index=False) )
print("Data saved to trained_roi_data.csv and untrained_roi_data.csv")


@ -97,13 +97,32 @@ def snapshot() -> str:
) )
lines.append(f" errors in log: {len(errors)}") lines.append(f" errors in log: {len(errors)}")
# Rate from the last 10 completions, when available. # Rate from completions in the last 6 h — robust to gaps from killed /
if len(history) >= 2: # restarted runs, while wide enough to span multiple parallel-worker
window = history[-min(10, len(history)) :] # completion bursts. Reason: with 8 workers all started together on
span = window[-1] - window[0] # multi-hour videos, completions arrive in tight bursts every ~video-
if span > 0: # length apart; a 30-min window catches one burst and overestimates by
rate_per_hour = (len(window) - 1) / span * 3600 # ~10×. 6 h spans at least one full burst cycle for typical videos.
lines.append(f" rate (last {len(window) - 1}): {rate_per_hour:.1f} videos/hour") now_ts = time.time()
window_secs = 6 * 3600
recent = [t for t in history if t >= now_ts - window_secs]
if len(recent) >= 2:
# Reason: with N parallel workers, completions arrive in clumps
# (all workers finish near-simultaneously). Dividing N by the *burst*
# span gives nonsense rates. Use the full window as the denominator
# once the batch has been running long enough to fill it; otherwise
# use elapsed-since-first-DB. Detection: if every DB on disk also
# falls inside the window, the batch is younger than the window.
if len(recent) == len(history):
elapsed = max(1.0, now_ts - history[0])
else:
elapsed = float(window_secs)
if elapsed > 0:
rate_per_hour = len(recent) / elapsed * 3600
lines.append(
f" rate (last {len(recent)} in {int(window_secs/3600)} h):"
f" {rate_per_hour:.1f} videos/hour"
)
remaining = max(0, pickable - tracked) remaining = max(0, pickable - tracked)
if rate_per_hour > 0 and remaining > 0: if rate_per_hour > 0 and remaining > 0:
eta_sec = remaining * 3600 / rate_per_hour eta_sec = remaining * 3600 / rate_per_hour
@ -112,6 +131,8 @@ def snapshot() -> str:
f" ETA remaining: {fmt_duration(eta_sec)} " f" ETA remaining: {fmt_duration(eta_sec)} "
f"(done by {eta_at:%H:%M %a})" f"(done by {eta_at:%H:%M %a})"
) )
else:
lines.append(" rate: (warming up — check again in a few min)")
if last_mtime is not None and last_name is not None: if last_mtime is not None and last_name is not None:
ago = (datetime.now() - last_mtime).total_seconds() ago = (datetime.now() - last_mtime).total_seconds()


@ -3,7 +3,7 @@
Reads target JSONs produced by `pick_targets.py`, builds the 6 ROIs of the Reads target JSONs produced by `pick_targets.py`, builds the 6 ROIs of the
HD mating arena from the L-shape reference points, runs ethoscope's HD mating arena from the L-shape reference points, runs ethoscope's
`MultiFlyTracker` against the merged.mp4 file via `MovieVirtualCamera`, and `MultiFlyTracker` against the merged.mp4 file via `MovieVirtualCamera`, and
writes a SQLite DB to `data/tracked/<video_basename>_tracking.db`. writes a SQLite DB to `TRACKING_OUTPUT_DIR/<video_basename>_tracking.db`.
Idempotent: skips videos whose tracking DB already exists (unless --redo). Idempotent: skips videos whose tracking DB already exists (unless --redo).
@ -58,17 +58,46 @@ def track_one(json_path: Path, output_dir: Path, max_duration: float | None,
from ethoscope.io.sqlite import SQLiteResultWriter from ethoscope.io.sqlite import SQLiteResultWriter
from ethoscope.trackers.multi_fly_tracker import MultiFlyTracker from ethoscope.trackers.multi_fly_tracker import MultiFlyTracker
class BGRMovieCamera(MovieVirtualCamera): import time as _time
"""MovieVirtualCamera variant that keeps BGR frames.
MultiFlyTracker calls cv2.cvtColor(img, COLOR_BGR2GRAY) without checking class BGRMovieCamera(MovieVirtualCamera):
whether img is already grayscale, so we must feed it 3-channel input. """MovieVirtualCamera that keeps BGR frames AND retries on transient
read failures.
Two reasons for the override:
1. MultiFlyTracker calls cv2.cvtColor(img, COLOR_BGR2GRAY) without
checking whether img is already grayscale, so we must feed it
3-channel input.
2. cv2.VideoCapture.read() can return False on transient I/O hiccups
(NFS contention when 8 workers pull big mp4s in parallel) without
the file actually being at EOF. A naive "False -> StopIteration"
handling makes the tracker silently exit mid-video and write a
short, lying DB. We retry a few times and only treat persistent
failures within the *interior* of the video as real EOF.
""" """
_retry_count = 5
_retry_backoff_s = 0.25
_eof_safety_frames = 50 # near end-of-file, treat False as legitimate
def _next_image(self): def _next_image(self):
for attempt in range(self._retry_count):
ret, frame = self.capture.read() ret, frame = self.capture.read()
if not ret or frame is None: if ret and frame is not None:
return None
return frame # BGR, untouched return frame # BGR, untouched
# If we're near the genuine end of the file, accept it.
if (
self._has_end_of_file
and self._frame_idx >= self._total_n_frames - self._eof_safety_frames
):
return None
# Otherwise, this is a suspected transient hiccup — back off
# and try again. The capture is still open; cv2 will pick up
# the next decoded frame.
_time.sleep(self._retry_backoff_s)
return None # truly persistent failure
payload = json.loads(json_path.read_text()) payload = json.loads(json_path.read_text())
if payload.get("unusable"): if payload.get("unusable"):
@ -146,6 +175,42 @@ def track_one(json_path: Path, output_dir: Path, max_duration: float | None,
if not out_db.exists(): if not out_db.exists():
return "error", "tracking finished but DB was not created" return "error", "tracking finished but DB was not created"
# Post-tracking sanity check: did we cover most of the source video?
# If not (cv2 retry exhausted, codec corruption, etc.), reject the DB so
# it doesn't get cached as "done" — better an explicit failure than a
# silent partial write.
expected_ms = (cam._total_n_frames / 25.0) * 1000.0
if max_duration is not None:
expected_ms = min(expected_ms, max_duration * 1000.0)
completeness_threshold = 0.90 # require ≥ 90 % of expected duration
# Use MAX(t) across all ROIs — a single ROI can run dry early if its fly
# stops moving, so the latest detection anywhere in the arena is the
# better signal of how far the iterator actually got.
import sqlite3 as _sqlite3
try:
_con = _sqlite3.connect(f"file:{out_db}?mode=ro", uri=True)
t_max = 0
for _i in range(1, 7):
_v = _con.execute(f"SELECT MAX(t) FROM ROI_{_i}").fetchone()[0]
if _v and _v > t_max:
t_max = _v
_con.close()
except Exception:
t_max = 0
if expected_ms > 0 and t_max < expected_ms * completeness_threshold:
out_db.unlink()
for sidecar in (str(out_db) + "-wal", str(out_db) + "-shm"):
Path(sidecar).unlink(missing_ok=True)
ratio = t_max / expected_ms if expected_ms else 0
return (
"error",
f"short output: t_max={t_max} ms vs expected {int(expected_ms)} ms "
f"({ratio*100:.0f}%); DB removed",
)
return "ok", str(out_db) return "ok", str(out_db)


@ -115,4 +115,26 @@ all targets are picked, tracking can run in the background.
## Discovered During Work ## Discovered During Work
(Add new items here as they come up during analysis) ### Barrier-opening annotation for the 2024 batch (added 2026-04-30)
The current `flies_analysis*.ipynb` aligns trajectories to a barrier-opening
event sourced from `data/metadata/2025_07_15_barrier_opening.csv`. That file
covers only the 5 machines in the 2025-07-15 experiment. The 2024 batch
(`/mnt/data/projects/cupido/tracked/`, 113 DBs) has no equivalent annotation
yet, so all post-alignment cells silently exclude that data.
- [ ] Build a small picker that lets the user scrub through each tracking
DB / video and mark the barrier-opening frame, writing a row to a new
`data/metadata/barrier_opening_2024.csv` (or extend the existing
file with a date column).
- [ ] Once the 2024 entries exist, update `align_to_opening_time` so it
pulls from a unified `barrier_opening` table keyed by
`(date, machine_name)` rather than `machine_name` alone.
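
The keyed lookup described in the second item could look roughly like the sketch below. It is an illustration only: the column names `date`, `machine_name` and `opening_time_s`, the helper names, and the sample machine name are all hypothetical and are not taken from the existing CSV.

```python
import csv
import io
from typing import Dict, Optional, Tuple

# Hypothetical unified table; column names and values are assumptions.
SAMPLE_CSV = """date,machine_name,opening_time_s
2024-10-01,ETHOSCOPE_A,312.0
2025-07-15,ETHOSCOPE_A,287.5
"""


def load_barrier_openings(csv_text: str) -> Dict[Tuple[str, str], float]:
    """Index opening times by (date, machine_name), not machine_name alone."""
    table: Dict[Tuple[str, str], float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["date"], row["machine_name"])
        table[key] = float(row["opening_time_s"])
    return table


def opening_time_for(
    table: Dict[Tuple[str, str], float], date: str, machine_name: str
) -> Optional[float]:
    """Return the opening time in seconds, or None when no annotation exists."""
    return table.get((date, machine_name))


openings = load_barrier_openings(SAMPLE_CSV)
```

Keying on the pair keeps a machine that was reused across experiments from colliding with itself, and a missing annotation comes back as `None`, so the caller can report it instead of silently dropping the data.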
### Metadata vocabulary normalization (done 2026-04-30)
The xlsx had inconsistent labels for control flies (`'naïve'`, `'niave'`,
`'untrained'` plus trailing whitespace). All sources now use a single
canonical `'naive'`. Normalization happens in
`scripts/export_video_db_index.py` so re-running it from the xlsx always
produces a clean TSV. The 2025-07-15 legacy CSV
(`data/metadata/2025_07_15_metadata_fixed.csv`) was edited in place from
`'untrained'` to `'naive'`.
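
As an illustration of why the normalization is safe to re-run, here is a minimal sketch of an idempotent canonicalizer. The function name `canonical_male` and the exact variant set are assumptions based on the labels listed in this note, not a copy of the code in `scripts/export_video_db_index.py`.

```python
# Variant set assumed from the labels mentioned in this note.
NAIVE_VARIANTS = {"naïve", "niave", "untrained", "naive"}


def canonical_male(label: str) -> str:
    """Strip whitespace and collapse every naive variant to 'naive'."""
    cleaned = label.strip()
    return "naive" if cleaned.lower() in NAIVE_VARIANTS else cleaned


labels = ["naïve", "niave ", " untrained", "trained"]
once = [canonical_male(v) for v in labels]
# Applying the function to already-clean labels must change nothing.
twice = [canonical_male(v) for v in once]
```

Because `'naive'` is itself in the variant set, a second pass maps every label to itself, which is the property that lets the export be re-run from the xlsx at any time.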