Unify analysis pipeline around the TSV; move tracked DBs out of cloud sync
- Tracked DBs now live at /mnt/data/projects/cupido/tracked/ (out of
ownCloud to avoid sync conflicts and bandwidth churn). config.py
TRACKING_OUTPUT_DIR points there; the docker-compose for ethoscope-lab
mounts it world-readable for JupyterHub users.
- New scripts/export_video_db_index.py joins all_video_info_merged.xlsx
with the video inventory and the on-disk DBs, producing a TSV that has
one row per fly/ROI plus training/testing video and DB paths. Handles
approximate xlsx times, cross-day training/testing, the 12 AM/PM
ambiguity, and date typos.
- scripts/load_roi_data.py rewritten as a TSV-driven loader returning a
single DataFrame with session and metadata columns. calculate_distances
and the two flies_analysis notebooks migrated to use it; downstream
trained/naive splits remain available via simple equality filters.
- Metadata vocabulary canonicalized: {naïve, niave, untrained, test} all
resolve to {trained, naive}. Normalization happens at the TSV-export
boundary (idempotent); the xlsx and the 2025-07-15 legacy CSV were
edited in place to remove the worst variants.
- scripts/monitor_tracking.py rate calculation fixed: with N parallel
workers, completions arrive in bursts; the old formula divided by burst
width and reported nonsense rates. Now uses a 6 h window denominator.
- scripts/track_videos.py: BGRMovieCamera retries cv2.read on transient
NFS hiccups and a post-tracking completeness gate (≥ 90 % of expected
duration via MAX(t) across all 6 ROIs) deletes silent partial DBs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
e4da7691d5
commit
f60a9d0530
13 changed files with 569 additions and 237 deletions
7
.gitignore
vendored
7
.gitignore
vendored
|
|
@ -2,11 +2,8 @@
|
||||||
data/raw/*.db
|
data/raw/*.db
|
||||||
data/processed/*.csv
|
data/processed/*.csv
|
||||||
|
|
||||||
# Offline-tracking outputs (reproducible from videos + target JSONs)
|
# Offline-tracking outputs (regenerable from videos + target JSONs)
|
||||||
data/tracked/*.db
|
# DBs live outside the repo at /mnt/data/projects/cupido/tracked/
|
||||||
data/tracked/*.db-wal
|
|
||||||
data/tracked/*.db-shm
|
|
||||||
data/tracked/*.db-journal
|
|
||||||
data/targets/*.json
|
data/targets/*.json
|
||||||
data/metadata/video_inventory.csv
|
data/metadata/video_inventory.csv
|
||||||
data/logs/*.log
|
data/logs/*.log
|
||||||
|
|
|
||||||
|
|
@ -66,7 +66,7 @@ python scripts/pick_targets.py --redo # re-pick already-picked videos
|
||||||
|
|
||||||
# 3) batch tracking (idempotent, can run in background)
|
# 3) batch tracking (idempotent, can run in background)
|
||||||
python scripts/track_videos.py --jobs 4 # parallel
|
python scripts/track_videos.py --jobs 4 # parallel
|
||||||
# output → data/tracked/*_tracking.db (SQLite, same schema as data/raw/)
|
# output → /mnt/data/projects/cupido/tracked/*_tracking.db (SQLite, same schema as data/raw/)
|
||||||
```
|
```
|
||||||
|
|
||||||
See `tasks/todo.md` "Offline Tracking" section for the full plan, and
|
See `tasks/todo.md` "Offline Tracking" section for the full plan, and
|
||||||
|
|
|
||||||
|
|
@ -1,37 +1,37 @@
|
||||||
date,HHMMSS,machine_name,ROI,genotype,group,path,filesize_mb
|
date,HHMMSS,machine_name,ROI,genotype,group,path,filesize_mb
|
||||||
15/07/2025,16-03-10,76,6,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
15/07/2025,16-03-10,76,6,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
||||||
15/07/2025,16-03-10,76,4,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
15/07/2025,16-03-10,76,4,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
||||||
15/07/2025,16-03-10,76,2,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
15/07/2025,16-03-10,76,2,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
||||||
15/07/2025,16-03-10,76,5,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
15/07/2025,16-03-10,76,5,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
||||||
15/07/2025,16-03-10,76,3,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
15/07/2025,16-03-10,76,3,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
||||||
15/07/2025,16-03-10,76,1,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
15/07/2025,16-03-10,76,1,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
||||||
15/07/2025,16-31-34,76,6,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
15/07/2025,16-31-34,76,6,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
||||||
15/07/2025,16-31-34,76,4,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
15/07/2025,16-31-34,76,4,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
||||||
15/07/2025,16-31-34,76,2,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
15/07/2025,16-31-34,76,2,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
||||||
15/07/2025,16-31-34,76,5,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
15/07/2025,16-31-34,76,5,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
||||||
15/07/2025,16-31-34,76,3,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
15/07/2025,16-31-34,76,3,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
||||||
15/07/2025,16-31-34,76,1,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
15/07/2025,16-31-34,76,1,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
||||||
15/07/2025,16-03-27,145,6,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
15/07/2025,16-03-27,145,6,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
||||||
15/07/2025,16-03-27,145,4,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
15/07/2025,16-03-27,145,4,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
||||||
15/07/2025,16-03-27,145,2,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
15/07/2025,16-03-27,145,2,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
||||||
15/07/2025,16-03-27,145,5,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
15/07/2025,16-03-27,145,5,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
||||||
15/07/2025,16-03-27,145,3,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
15/07/2025,16-03-27,145,3,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
||||||
15/07/2025,16-03-27,145,1,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
15/07/2025,16-03-27,145,1,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
||||||
15/07/2025,16-31-41,145,6,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
15/07/2025,16-31-41,145,6,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
||||||
15/07/2025,16-31-41,145,4,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
15/07/2025,16-31-41,145,4,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
||||||
15/07/2025,16-31-41,145,2,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
15/07/2025,16-31-41,145,2,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
||||||
15/07/2025,16-31-41,145,5,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
15/07/2025,16-31-41,145,5,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
||||||
15/07/2025,16-31-41,145,3,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
15/07/2025,16-31-41,145,3,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
||||||
15/07/2025,16-31-41,145,1,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
15/07/2025,16-31-41,145,1,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
||||||
15/07/2025,16-31-52,139,6,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
15/07/2025,16-31-52,139,6,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
||||||
15/07/2025,16-31-52,139,4,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
15/07/2025,16-31-52,139,4,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
||||||
15/07/2025,16-31-52,139,2,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
15/07/2025,16-31-52,139,2,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
||||||
15/07/2025,16-31-52,139,5,CS,untrained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
15/07/2025,16-31-52,139,5,CS,naive,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
||||||
15/07/2025,16-31-52,139,3,CS,untrained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
15/07/2025,16-31-52,139,3,CS,naive,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
||||||
15/07/2025,16-31-52,139,1,CS,untrained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
15/07/2025,16-31-52,139,1,CS,naive,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
||||||
15/07/2025,16-32-05,268,6,CS,untrained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
15/07/2025,16-32-05,268,6,CS,naive,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
||||||
15/07/2025,16-32-05,268,4,CS,untrained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
15/07/2025,16-32-05,268,4,CS,naive,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
||||||
15/07/2025,16-32-05,268,2,CS,untrained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
15/07/2025,16-32-05,268,2,CS,naive,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
||||||
15/07/2025,16-32-05,268,5,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
15/07/2025,16-32-05,268,5,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
||||||
15/07/2025,16-32-05,268,3,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
15/07/2025,16-32-05,268,3,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
||||||
15/07/2025,16-32-05,268,1,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
15/07/2025,16-32-05,268,1,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
||||||
|
|
|
||||||
|
|
|
@ -1,39 +1,47 @@
|
||||||
# Processed Data
|
# Processed Data
|
||||||
|
|
||||||
Large CSV files generated from the analysis pipeline. All files are gitignored (~370MB total) and can be regenerated.
|
CSVs derived from the tracking DBs (`/mnt/data/projects/cupido/tracked/`)
|
||||||
|
and the merged TSV (`../../all_video_info_merged.tsv`). All files are
|
||||||
|
gitignored and regenerable.
|
||||||
|
|
||||||
## Files and Regeneration
|
## Files and Regeneration
|
||||||
|
|
||||||
| File | Description | Generated By |
|
| File | Description | Generated By |
|
||||||
|------|-------------|--------------|
|
|------|-------------|--------------|
|
||||||
| `trained_roi_data.csv` | Raw tracking data for trained ROIs | `scripts/load_roi_data.py` or notebook step 1 |
|
| `distances.csv` | Per-frame inter-fly distances for every (date, machine, ROI, session). Includes metadata columns to filter trained vs naïve, training phase, species, etc. | `scripts/calculate_distances.py` |
|
||||||
| `untrained_roi_data.csv` | Raw tracking data for untrained ROIs | `scripts/load_roi_data.py` or notebook step 1 |
|
| `*_distances_aligned.csv` | (legacy, 2025-07-15 only) distances aligned to barrier opening | `notebooks/flies_analysis*.ipynb` |
|
||||||
| `trained_distances.csv` | Pairwise distances (unaligned) | `scripts/calculate_distances.py` |
|
| `*_tracked.csv` | (legacy) identity-tracked fly positions | `notebooks/flies_analysis_simple.ipynb` |
|
||||||
| `untrained_distances.csv` | Pairwise distances (unaligned) | `scripts/calculate_distances.py` |
|
| `*_max_velocity.csv` | (legacy) max velocity over 10 s windows | `notebooks/flies_analysis_simple.ipynb` |
|
||||||
| `trained_distances_aligned.csv` | Distances aligned to barrier opening | Notebook step 4 |
|
|
||||||
| `untrained_distances_aligned.csv` | Distances aligned to barrier opening | Notebook step 4 |
|
|
||||||
| `trained_tracked.csv` | Identity-tracked fly positions | Notebook step 7 |
|
|
||||||
| `untrained_tracked.csv` | Identity-tracked fly positions | Notebook step 7 |
|
|
||||||
| `trained_max_velocity.csv` | Max velocity over 10s windows | Notebook step 7 |
|
|
||||||
| `untrained_max_velocity.csv` | Max velocity over 10s windows | Notebook step 7 |
|
|
||||||
|
|
||||||
## To Regenerate All Data
|
## Loading the data
|
||||||
|
|
||||||
Run the full notebook `notebooks/flies_analysis_simple.ipynb` with:
|
|
||||||
```python
|
```python
|
||||||
recalculate_distances = True
|
import sys
|
||||||
recalculate_tracking = True
|
sys.path.insert(0, "../scripts")
|
||||||
|
from load_roi_data import load_roi_data
|
||||||
|
|
||||||
|
data = load_roi_data() # full batch as one DataFrame
|
||||||
|
# Or filter the metadata first:
|
||||||
|
import pandas as pd
|
||||||
|
tsv = pd.read_csv("../../all_video_info_merged.tsv", sep="\t")
|
||||||
|
data = load_roi_data(tsv[tsv.species.str.contains("Melanogaster")])
|
||||||
```
|
```
|
||||||
|
|
||||||
**Warning**: Identity tracking and velocity calculations take significant time (~30+ minutes).
|
The returned DataFrame has columns:
|
||||||
|
`id, t, x, y, w, h, phi, is_inferred, has_interacted, session, ROI, date,
|
||||||
|
machine_name, species, male, training_date_time, testing_date_time,
|
||||||
|
training_length_hr, consolidation_length_hr, memory, age`.
|
||||||
|
|
||||||
## Column Reference
|
`session` is `"training"` or `"testing"`; `male` is `"trained"` or
|
||||||
|
`"naive"` (canonical — variants like `"naïve"` and `"niave"` are normalized
|
||||||
|
at the TSV-export step).
|
||||||
|
|
||||||
### Distance CSVs (`*_distances_aligned.csv`)
|
## Column Reference (`distances.csv`)
|
||||||
- `machine_name`: Ethoscope machine ID (string)
|
|
||||||
- `ROI`: ROI number (1-6)
|
- `date`, `machine_name`, `ROI`, `session`: identifies one fly trajectory
|
||||||
- `aligned_time`: Time in ms relative to barrier opening (0 = opening)
|
- `t`: time in ms within that session
|
||||||
- `distance`: Euclidean distance between flies in pixels
|
- `distance`: Euclidean distance between the two flies in pixels
|
||||||
- `n_flies`: Number of flies detected at this time point
|
- `n_flies`: number of fly detections at this frame (1 or 2)
|
||||||
- `area_fly1`, `area_fly2`: Bounding box areas (w*h) in pixels^2
|
- `area_fly1`, `area_fly2`: bounding-box areas (`w * h`) in pixels²
|
||||||
- `group`: "trained" or "untrained"
|
- `male`: `trained` or `naive` (carried from the xlsx; normalized)
|
||||||
|
- `species`, `memory`, `age`: experimental metadata
|
||||||
|
|
|
||||||
|
|
@ -28,7 +28,22 @@
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": "def load_roi_data():\n \"\"\"Load ROI data from SQLite databases and group by trained/untrained\"\"\"\n metadata = pd.read_csv(DATA_METADATA / '2025_07_15_metadata_fixed.csv')\n metadata['machine_name'] = metadata['machine_name'].astype(str)\n \n trained_rois = metadata[metadata['group'] == 'trained']\n untrained_rois = metadata[metadata['group'] == 'untrained']\n \n db_files = list(DATA_RAW.glob('*_tracking.db'))\n \n trained_df = pd.DataFrame()\n untrained_df = pd.DataFrame()\n \n for db_file in db_files:\n print(f\"Processing {db_file.name}\")\n \n pattern = r'_([0-9a-f]{32})__'\n match = re.search(pattern, db_file.name)\n \n if not match:\n print(f\"Could not extract UUID from {db_file.name}\")\n continue\n \n uuid = match.group(1)\n metadata_matches = metadata[metadata['path'].str.contains(uuid, na=False)]\n \n if metadata_matches.empty:\n print(f\"No metadata matches found for UUID {uuid}\")\n continue\n \n machine_id = metadata_matches.iloc[0]['machine_name']\n print(f\"Matched to machine ID: {machine_id}\")\n \n conn = sqlite3.connect(str(db_file))\n \n machine_trained = trained_rois[trained_rois['machine_name'] == machine_id]\n machine_untrained = untrained_rois[untrained_rois['machine_name'] == machine_id]\n \n for _, row in machine_trained.iterrows():\n roi = row['ROI']\n try:\n roi_data = pd.read_sql_query(f\"SELECT * FROM ROI_{roi}\", conn)\n roi_data['machine_name'] = machine_id\n roi_data['ROI'] = roi\n roi_data['group'] = 'trained'\n trained_df = pd.concat([trained_df, roi_data], ignore_index=True)\n except Exception as e:\n print(f\"Error loading ROI_{roi}: {e}\")\n \n for _, row in machine_untrained.iterrows():\n roi = row['ROI']\n try:\n roi_data = pd.read_sql_query(f\"SELECT * FROM ROI_{roi}\", conn)\n roi_data['machine_name'] = machine_id\n roi_data['ROI'] = roi\n roi_data['group'] = 'untrained'\n untrained_df = pd.concat([untrained_df, roi_data], ignore_index=True)\n except Exception as e:\n print(f\"Error loading ROI_{roi}: {e}\")\n \n conn.close()\n \n return trained_df, untrained_df\n\ntrained_data, untrained_data = load_roi_data()\nprint(f\"Trained data shape: {trained_data.shape}\")\nprint(f\"Untrained data shape: {untrained_data.shape}\")\n\ntrained_data.to_csv(DATA_PROCESSED / 'trained_roi_data.csv', index=False)\nuntrained_data.to_csv(DATA_PROCESSED / 'untrained_roi_data.csv', index=False)\nprint(\"Data saved to CSV files\")"
|
"source": [
|
||||||
|
"# Load tracking data via the unified loader (driven by all_video_info_merged.tsv).\n",
|
||||||
|
"# Reason: replaces the old data/raw + 2025_07_15_metadata_fixed.csv path with\n",
|
||||||
|
"# the TSV-based loader that covers the entire batch (2025-07-15 + 2024).\n",
|
||||||
|
"sys.path.insert(0, str(PROJECT_ROOT / 'scripts'))\n",
|
||||||
|
"from load_roi_data import load_roi_data\n",
|
||||||
|
"\n",
|
||||||
|
"data = load_roi_data()\n",
|
||||||
|
"# Backwards-compat slices for the rest of the notebook.\n",
|
||||||
|
"trained_data = data[data['male'] == 'trained'].copy()\n",
|
||||||
|
"untrained_data = data[data['male'] == 'naive'].copy()\n",
|
||||||
|
"\n",
|
||||||
|
"print(f\"all data: {data.shape}\")\n",
|
||||||
|
"print(f\"trained: {trained_data.shape}\")\n",
|
||||||
|
"print(f\"naive: {untrained_data.shape}\")\n"
|
||||||
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
|
|
@ -219,4 +234,4 @@
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
"nbformat_minor": 4
|
"nbformat_minor": 4
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -28,7 +28,22 @@
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": "# Load the pre-processed data\ntrained_data = pd.read_csv(DATA_PROCESSED / 'trained_roi_data.csv')\nuntrained_data = pd.read_csv(DATA_PROCESSED / 'untrained_roi_data.csv')\n\nprint(f\"Trained data shape: {trained_data.shape}\")\nprint(f\"Untrained data shape: {untrained_data.shape}\")\nprint(f\"Trained data columns: {list(trained_data.columns)}\")\nprint(f\"Untrained data columns: {list(untrained_data.columns)}\")"
|
"source": [
|
||||||
|
"# Load tracking data via the unified loader (driven by all_video_info_merged.tsv).\n",
|
||||||
|
"# Reason: replaces reads of trained_roi_data.csv / untrained_roi_data.csv with\n",
|
||||||
|
"# the live loader so the notebook always sees the current batch.\n",
|
||||||
|
"sys.path.insert(0, str(PROJECT_ROOT / 'scripts'))\n",
|
||||||
|
"from load_roi_data import load_roi_data\n",
|
||||||
|
"\n",
|
||||||
|
"data = load_roi_data()\n",
|
||||||
|
"trained_data = data[data['male'] == 'trained'].copy()\n",
|
||||||
|
"untrained_data = data[data['male'] == 'naive'].copy()\n",
|
||||||
|
"\n",
|
||||||
|
"print(f\"all data shape: {data.shape}\")\n",
|
||||||
|
"print(f\"Trained data: {trained_data.shape}\")\n",
|
||||||
|
"print(f\"Naive data: {untrained_data.shape}\")\n",
|
||||||
|
"print(f\"Columns: {list(trained_data.columns)}\")\n"
|
||||||
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
|
|
@ -418,4 +433,4 @@
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
"nbformat_minor": 4
|
"nbformat_minor": 4
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -1,117 +1,99 @@
|
||||||
import pandas as pd
|
"""Compute per-frame inter-fly distances for every (date, machine, ROI, session).
|
||||||
|
|
||||||
|
Reads tracking data via :func:`load_roi_data.load_roi_data` (which is driven
|
||||||
|
by ``all_video_info_merged.tsv``) and produces one distances DataFrame
|
||||||
|
spanning every fly/session in the batch. Group membership (``trained`` /
|
||||||
|
``untrained``) is preserved from the ``male`` column.
|
||||||
|
"""
|
||||||
|
|
||||||
import numpy as np
|
import numpy as np
|
||||||
|
import pandas as pd
|
||||||
from scipy.spatial.distance import euclidean
|
from scipy.spatial.distance import euclidean
|
||||||
|
|
||||||
from config import DATA_PROCESSED
|
from config import DATA_PROCESSED
|
||||||
|
from load_roi_data import load_roi_data
|
||||||
|
|
||||||
|
|
||||||
def calculate_fly_distances(trained_file=None, untrained_file=None):
|
def calculate_fly_distances(data: pd.DataFrame | None = None) -> pd.DataFrame:
|
||||||
"""Calculate distances between flies at each time point.
|
"""Compute inter-fly distances over time for every fly/session.
|
||||||
|
|
||||||
For each time point:
|
For each time point inside one (date, machine, ROI, session) trajectory:
|
||||||
- If two flies are detected: calculate Cartesian distance between them
|
- 2+ flies detected: Euclidean distance between the first two by id
|
||||||
- If one fly is detected: set distance to 0 if area > average area, otherwise NaN
|
- 1 fly detected: distance = 0 if its bbox area exceeds the global
|
||||||
|
mean (likely a single blob containing both flies), else NaN
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
trained_file (Path): Path to trained ROI data CSV.
|
data: optional pre-loaded DataFrame from :func:`load_roi_data`. If
|
||||||
untrained_file (Path): Path to untrained ROI data CSV.
|
None, the full batch is loaded.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
tuple: (trained_distances, untrained_distances) DataFrames.
|
DataFrame with one row per (track, time) pair, including ``distance``,
|
||||||
|
``n_flies``, ``area_fly1``, ``area_fly2``, plus the metadata columns
|
||||||
|
propagated from the source row (``date``, ``machine_name``, ``ROI``,
|
||||||
|
``session``, ``male``, ``species``, ``memory``, ``age``).
|
||||||
"""
|
"""
|
||||||
if trained_file is None:
|
if data is None:
|
||||||
trained_file = DATA_PROCESSED / 'trained_roi_data.csv'
|
data = load_roi_data()
|
||||||
if untrained_file is None:
|
if data.empty:
|
||||||
untrained_file = DATA_PROCESSED / 'untrained_roi_data.csv'
|
return pd.DataFrame()
|
||||||
|
|
||||||
trained_df = pd.read_csv(trained_file)
|
data = data.copy()
|
||||||
untrained_df = pd.read_csv(untrained_file)
|
data["area"] = data["w"] * data["h"]
|
||||||
|
avg_area = data["area"].mean()
|
||||||
trained_df['area'] = trained_df['w'] * trained_df['h']
|
|
||||||
untrained_df['area'] = untrained_df['w'] * untrained_df['h']
|
|
||||||
|
|
||||||
avg_area = np.mean([trained_df['area'].mean(), untrained_df['area'].mean()])
|
|
||||||
print(f"Average area across all data: {avg_area:.2f}")
|
print(f"Average area across all data: {avg_area:.2f}")
|
||||||
|
|
||||||
trained_distances = process_distance_data(trained_df, avg_area)
|
# Carry these onto every output row (constant within a track).
|
||||||
untrained_distances = process_distance_data(untrained_df, avg_area)
|
keep_meta = ["date", "machine_name", "ROI", "session", "male",
|
||||||
|
"species", "memory", "age"]
|
||||||
|
|
||||||
return trained_distances, untrained_distances
|
rows: list[dict] = []
|
||||||
|
track_keys = ["date", "machine_name", "ROI", "session"]
|
||||||
|
for track, track_df in data.groupby(track_keys, sort=False):
|
||||||
def process_distance_data(df, avg_area):
|
meta_row = {k: v for k, v in zip(track_keys, track)}
|
||||||
"""Process a DataFrame to calculate distances between flies at each time point.
|
# Carry the rest of the metadata from any sample (constant per track).
|
||||||
|
sample = track_df.iloc[0]
|
||||||
Args:
|
for col in keep_meta:
|
||||||
df (pd.DataFrame): Input tracking data.
|
if col not in meta_row:
|
||||||
avg_area (float): Average area threshold for single-fly detection.
|
meta_row[col] = sample[col]
|
||||||
|
|
||||||
Returns:
|
|
||||||
pd.DataFrame: Distance data with columns for machine, ROI, time, distance.
|
|
||||||
"""
|
|
||||||
results = []
|
|
||||||
|
|
||||||
for (machine_name, roi), group in df.groupby(['machine_name', 'ROI']):
|
|
||||||
for t, time_group in group.groupby('t'):
|
|
||||||
time_group = time_group.sort_values('id').reset_index(drop=True)
|
|
||||||
|
|
||||||
|
for t, time_group in track_df.groupby("t", sort=False):
|
||||||
|
time_group = time_group.sort_values("id").reset_index(drop=True)
|
||||||
|
row = dict(meta_row)
|
||||||
|
row["t"] = t
|
||||||
if len(time_group) >= 2:
|
if len(time_group) >= 2:
|
||||||
fly1 = time_group.iloc[0]
|
f1, f2 = time_group.iloc[0], time_group.iloc[1]
|
||||||
fly2 = time_group.iloc[1]
|
row["distance"] = euclidean([f1["x"], f1["y"]], [f2["x"], f2["y"]])
|
||||||
distance = euclidean([fly1['x'], fly1['y']], [fly2['x'], fly2['y']])
|
row["n_flies"] = len(time_group)
|
||||||
|
row["area_fly1"] = f1["area"]
|
||||||
|
row["area_fly2"] = f2["area"]
|
||||||
|
else:
|
||||||
|
f = time_group.iloc[0]
|
||||||
|
row["distance"] = 0.0 if f["area"] > avg_area else np.nan
|
||||||
|
row["n_flies"] = 1
|
||||||
|
row["area_fly1"] = f["area"]
|
||||||
|
row["area_fly2"] = np.nan
|
||||||
|
rows.append(row)
|
||||||
|
|
||||||
results.append({
|
return pd.DataFrame(rows)
|
||||||
'machine_name': machine_name,
|
|
||||||
'ROI': roi,
|
|
||||||
't': t,
|
|
||||||
'distance': distance,
|
|
||||||
'n_flies': len(time_group),
|
|
||||||
'area_fly1': fly1['area'],
|
|
||||||
'area_fly2': fly2['area']
|
|
||||||
})
|
|
||||||
elif len(time_group) == 1:
|
|
||||||
fly = time_group.iloc[0]
|
|
||||||
area = fly['area']
|
|
||||||
|
|
||||||
if area > avg_area:
|
|
||||||
distance = 0.0
|
|
||||||
else:
|
|
||||||
distance = np.nan
|
|
||||||
|
|
||||||
results.append({
|
|
||||||
'machine_name': machine_name,
|
|
||||||
'ROI': roi,
|
|
||||||
't': t,
|
|
||||||
'distance': distance,
|
|
||||||
'n_flies': 1,
|
|
||||||
'area_fly1': area,
|
|
||||||
'area_fly2': np.nan
|
|
||||||
})
|
|
||||||
|
|
||||||
return pd.DataFrame(results)
|
|
||||||
|
|
||||||
|
|
||||||
def main():
|
def main() -> None:
|
||||||
"""Run distance calculations and save results."""
|
distances = calculate_fly_distances()
|
||||||
trained_distances, untrained_distances = calculate_fly_distances()
|
|
||||||
|
|
||||||
print(f"Trained data distance summary:")
|
print("\nDistance summary:")
|
||||||
print(f" Shape: {trained_distances.shape}")
|
print(f" Shape: {distances.shape}")
|
||||||
print(f" Distance stats:")
|
if not distances.empty:
|
||||||
print(f" Count: {trained_distances['distance'].count()}")
|
print(f" Distance count: {distances['distance'].count()}")
|
||||||
print(f" Mean: {trained_distances['distance'].mean():.2f}")
|
print(f" Distance mean: {distances['distance'].mean():.2f}")
|
||||||
print(f" Std: {trained_distances['distance'].std():.2f}")
|
print(f" Distance std: {distances['distance'].std():.2f}")
|
||||||
|
male = distances["male"]
|
||||||
|
print(f" Trained tracks: {(male == 'trained').sum()}")
|
||||||
|
print(f" Naive tracks: {(male == 'naive').sum()}")
|
||||||
|
|
||||||
print(f"\nUntrained data distance summary:")
|
DATA_PROCESSED.mkdir(parents=True, exist_ok=True)
|
||||||
print(f" Shape: {untrained_distances.shape}")
|
out = DATA_PROCESSED / "distances.csv"
|
||||||
print(f" Distance stats:")
|
distances.to_csv(out, index=False)
|
||||||
print(f" Count: {untrained_distances['distance'].count()}")
|
print(f"\nSaved {out}")
|
||||||
print(f" Mean: {untrained_distances['distance'].mean():.2f}")
|
|
||||||
print(f" Std: {untrained_distances['distance'].std():.2f}")
|
|
||||||
|
|
||||||
trained_distances.to_csv(DATA_PROCESSED / 'trained_distances.csv', index=False)
|
|
||||||
untrained_distances.to_csv(DATA_PROCESSED / 'untrained_distances.csv', index=False)
|
|
||||||
print("\nDistance data saved")
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|
|
||||||
|
|
@ -13,5 +13,8 @@ VIDEOS_ROOT = Path("/mnt/ethoscope_data/videos")
|
||||||
VIDEO_INFO_XLSX = PROJECT_ROOT.parent / "all_video_info_merged.xlsx"
|
VIDEO_INFO_XLSX = PROJECT_ROOT.parent / "all_video_info_merged.xlsx"
|
||||||
INVENTORY_CSV = DATA_METADATA / "video_inventory.csv"
|
INVENTORY_CSV = DATA_METADATA / "video_inventory.csv"
|
||||||
TARGETS_DIR = PROJECT_ROOT / "data" / "targets"
|
TARGETS_DIR = PROJECT_ROOT / "data" / "targets"
|
||||||
TRACKING_OUTPUT_DIR = PROJECT_ROOT / "data" / "tracked"
|
# Reason: tracking DBs are large binary files that don't belong in
|
||||||
|
# ownCloud-synced storage (sync conflicts + bandwidth). They live on the
|
||||||
|
# local data volume instead. Regenerable from videos + target JSONs.
|
||||||
|
TRACKING_OUTPUT_DIR = Path("/mnt/data/projects/cupido/tracked")
|
||||||
LOGS_DIR = PROJECT_ROOT / "data" / "logs"
|
LOGS_DIR = PROJECT_ROOT / "data" / "logs"
|
||||||
|
|
|
||||||
181
scripts/export_video_db_index.py
Normal file
181
scripts/export_video_db_index.py
Normal file
|
|
@ -0,0 +1,181 @@
|
||||||
|
"""Augment all_video_info_merged.xlsx with the input video + tracking DB paths.
|
||||||
|
|
||||||
|
Each xlsx row represents one fly (date, machine_name, ROI), observed across a
|
||||||
|
training session and a testing session. We resolve those two sessions to the
|
||||||
|
on-disk video files (via the inventory CSV) and to their tracking DBs (under
|
||||||
|
TRACKING_OUTPUT_DIR), then write the result as TSV.
|
||||||
|
|
||||||
|
Output columns added:
|
||||||
|
training_video_path, training_db_path,
|
||||||
|
testing_video_path, testing_db_path
|
||||||
|
|
||||||
|
Empty values mean either no video matched (rare — implies missing inventory
|
||||||
|
entry) or no DB exists yet (e.g. the one video the completeness gate
|
||||||
|
rejected).
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python export_video_db_index.py
|
||||||
|
python export_video_db_index.py --out path/to/output.tsv
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import re
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import pandas as pd
|
||||||
|
|
||||||
|
from config import INVENTORY_CSV, TRACKING_OUTPUT_DIR, VIDEO_INFO_XLSX
|
||||||
|
|
||||||
|
|
||||||
|
_TIME_RE = re.compile(r"^(\d{8})_(\d{1,2})(\d{2})?(AM|PM)$", re.IGNORECASE)
|
||||||
|
|
||||||
|
|
||||||
|
def parse_xlsx_time(value: str) -> tuple[str, int] | None:
|
||||||
|
"""Convert '20241021_11AM' / '20240918_1030AM' to (YYYY-MM-DD, minutes24).
|
||||||
|
|
||||||
|
Resolution is hour-only when no minutes are given (e.g. '11AM' → 11:00).
|
||||||
|
Returns minutes-from-midnight so we can do nearest-neighbor matching.
|
||||||
|
"""
|
||||||
|
if not isinstance(value, str):
|
||||||
|
return None
|
||||||
|
m = _TIME_RE.match(value.strip())
|
||||||
|
if not m:
|
||||||
|
return None
|
||||||
|
ymd, hh, mm, ampm = m.groups()
|
||||||
|
date = f"{ymd[:4]}-{ymd[4:6]}-{ymd[6:8]}"
|
||||||
|
hour = int(hh)
|
||||||
|
minute = int(mm) if mm else 0
|
||||||
|
if ampm.upper() == "PM" and hour != 12:
|
||||||
|
hour += 12
|
||||||
|
if ampm.upper() == "AM" and hour == 12:
|
||||||
|
hour = 0
|
||||||
|
return date, hour * 60 + minute
|
||||||
|
|
||||||
|
|
||||||
|
def build_session_index(inventory: pd.DataFrame) -> dict[tuple[str, str], list[dict]]:
|
||||||
|
"""Index inventory rows by (date, machine_name) → list of session dicts."""
|
||||||
|
idx: dict[tuple[str, str], list[dict]] = {}
|
||||||
|
for row in inventory.itertuples(index=False):
|
||||||
|
h, m, _s = (int(p) for p in str(row.session_time).split("-"))
|
||||||
|
key = (row.session_date, row.machine_name)
|
||||||
|
idx.setdefault(key, []).append({
|
||||||
|
"mp4_path": row.mp4_path,
|
||||||
|
"session_datetime": row.session_datetime,
|
||||||
|
"minutes": h * 60 + m,
|
||||||
|
})
|
||||||
|
return idx
|
||||||
|
|
||||||
|
|
||||||
|
def db_path_for_video(mp4_path: str) -> Path | None:
|
||||||
|
"""Tracker writes <video_stem>_tracking.db under TRACKING_OUTPUT_DIR."""
|
||||||
|
stem = Path(mp4_path).stem
|
||||||
|
db = TRACKING_OUTPUT_DIR / f"{stem}_tracking.db"
|
||||||
|
return db if db.exists() else None
|
||||||
|
|
||||||
|
|
||||||
|
_TIME_TOLERANCE_MIN = 90 # xlsx labels are approximate ("11AM" → 10:51 is fine)
|
||||||
|
|
||||||
|
|
||||||
|
def resolve_session(
|
||||||
|
machine_name: str,
|
||||||
|
when: str,
|
||||||
|
fallback_date: str | None,
|
||||||
|
index: dict[tuple[str, str], list[dict]],
|
||||||
|
) -> tuple[str, str]:
|
||||||
|
"""Look up the video + db whose start time is closest to `when`.
|
||||||
|
|
||||||
|
Match strategy:
|
||||||
|
1. Use the date embedded in `when` (training/testing can fall on a
|
||||||
|
different calendar day from the row's ``date`` column).
|
||||||
|
2. If no candidates exist for that date, fall back to ``fallback_date``
|
||||||
|
(the xlsx row's ``date`` column). Reason: the xlsx contains
|
||||||
|
date typos like '20240110_11AM' for an Oct 1 experiment.
|
||||||
|
|
||||||
|
Among candidates, pick the video whose start minute is closest to the
|
||||||
|
xlsx-claimed time, within ±_TIME_TOLERANCE_MIN.
|
||||||
|
"""
|
||||||
|
parsed = parse_xlsx_time(when)
|
||||||
|
if parsed is None:
|
||||||
|
return "", ""
|
||||||
|
date, target_min = parsed
|
||||||
|
candidates = index.get((date, machine_name), [])
|
||||||
|
if not candidates and fallback_date:
|
||||||
|
candidates = index.get((fallback_date, machine_name), [])
|
||||||
|
if not candidates:
|
||||||
|
return "", ""
|
||||||
|
|
||||||
|
def _gap(target: int, c: dict) -> int:
|
||||||
|
# Reason: xlsx times like '1230AM' are ambiguous (12 AM vs 12 PM).
|
||||||
|
# We try both the literal time AND a +12-hour shift, picking the
|
||||||
|
# interpretation that brings us closest to a real session.
|
||||||
|
return min(abs(c["minutes"] - target), abs(c["minutes"] - (target + 720) % 1440))
|
||||||
|
|
||||||
|
best = min(candidates, key=lambda c: _gap(target_min, c))
|
||||||
|
if _gap(target_min, best) > _TIME_TOLERANCE_MIN:
|
||||||
|
return "", ""
|
||||||
|
db = db_path_for_video(best["mp4_path"])
|
||||||
|
return best["mp4_path"], (str(db) if db else "")
|
||||||
|
|
||||||
|
|
||||||
|
# Variants of "naive" the xlsx has accumulated: 'naïve', 'niave', plus
|
||||||
|
# trailing whitespace. All collapse to a single canonical 'naive'.
|
||||||
|
_MALE_NAIVE_VARIANTS = {"naïve", "niave", "naive"}
|
||||||
|
|
||||||
|
|
||||||
|
def _normalize_metadata(df: pd.DataFrame) -> None:
|
||||||
|
"""Strip whitespace and canonicalize the ``male`` column in place."""
|
||||||
|
for col in df.select_dtypes(include=("object", "string")).columns:
|
||||||
|
df[col] = df[col].astype(str).str.strip()
|
||||||
|
df["male"] = df["male"].apply(
|
||||||
|
lambda v: "naive" if v.lower() in _MALE_NAIVE_VARIANTS else v
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(description=__doc__)
|
||||||
|
parser.add_argument(
|
||||||
|
"--out",
|
||||||
|
type=Path,
|
||||||
|
default=VIDEO_INFO_XLSX.with_suffix(".tsv"),
|
||||||
|
help="output TSV path (default: alongside the xlsx)",
|
||||||
|
)
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
inv = pd.read_csv(INVENTORY_CSV)
|
||||||
|
inv = inv[inv["in_xlsx"]].copy()
|
||||||
|
index = build_session_index(inv)
|
||||||
|
|
||||||
|
df = pd.read_excel(VIDEO_INFO_XLSX)
|
||||||
|
_normalize_metadata(df)
|
||||||
|
date_iso = pd.to_datetime(df["date"]).dt.strftime("%Y-%m-%d")
|
||||||
|
|
||||||
|
train_videos, train_dbs, test_videos, test_dbs = [], [], [], []
|
||||||
|
for fallback, row in zip(date_iso, df.itertuples(index=False)):
|
||||||
|
tv, td = resolve_session(row.machine_name, row.training_date_time, fallback, index)
|
||||||
|
sv, sd = resolve_session(row.machine_name, row.testing_date_time, fallback, index)
|
||||||
|
train_videos.append(tv)
|
||||||
|
train_dbs.append(td)
|
||||||
|
test_videos.append(sv)
|
||||||
|
test_dbs.append(sd)
|
||||||
|
|
||||||
|
df["training_video_path"] = train_videos
|
||||||
|
df["training_db_path"] = train_dbs
|
||||||
|
df["testing_video_path"] = test_videos
|
||||||
|
df["testing_db_path"] = test_dbs
|
||||||
|
|
||||||
|
df.to_csv(args.out, sep="\t", index=False)
|
||||||
|
|
||||||
|
n_rows = len(df)
|
||||||
|
n_train_video = sum(bool(v) for v in train_videos)
|
||||||
|
n_train_db = sum(bool(v) for v in train_dbs)
|
||||||
|
n_test_video = sum(bool(v) for v in test_videos)
|
||||||
|
n_test_db = sum(bool(v) for v in test_dbs)
|
||||||
|
print(f"wrote {args.out} ({n_rows} rows)")
|
||||||
|
print(f" training: {n_train_video} with video, {n_train_db} with DB")
|
||||||
|
print(f" testing: {n_test_video} with video, {n_test_db} with DB")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
|
|
@ -1,90 +1,113 @@
|
||||||
import pandas as pd
|
"""Load ROI tracking data from all sessions into one DataFrame.
|
||||||
|
|
||||||
|
Drives off the merged TSV (one row per ROI/fly across training + testing
|
||||||
|
phases). For each TSV row, opens the corresponding tracking DB and pulls
|
||||||
|
the matching ROI table, then attaches the experimental metadata.
|
||||||
|
|
||||||
|
The TSV is the single source of truth for what data exists and how it
|
||||||
|
maps to flies and conditions.
|
||||||
|
"""
|
||||||
|
|
||||||
import sqlite3
|
import sqlite3
|
||||||
import re
|
from pathlib import Path
|
||||||
|
|
||||||
from config import DATA_RAW, DATA_METADATA, DATA_PROCESSED
|
import pandas as pd
|
||||||
|
|
||||||
|
from config import VIDEO_INFO_XLSX
|
||||||
|
|
||||||
|
|
||||||
def load_roi_data():
|
# Metadata columns to copy onto every tracking sample. These are the xlsx
|
||||||
"""Load ROI data from SQLite databases and group by trained/untrained.
|
# fields that describe the experimental condition behind each fly/ROI.
|
||||||
|
# Reason: the ROI column is uppercase ("ROI") for backwards compatibility
|
||||||
|
# with the existing analysis pipeline (calculate_distances.py, notebooks).
|
||||||
|
_META_COLS = (
|
||||||
|
"date",
|
||||||
|
"machine_name",
|
||||||
|
"species",
|
||||||
|
"male",
|
||||||
|
"training_date_time",
|
||||||
|
"testing_date_time",
|
||||||
|
"training_length_hr",
|
||||||
|
"consolidation_length_hr",
|
||||||
|
"memory",
|
||||||
|
"age",
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _open_ro(db_path: str, cache: dict) -> sqlite3.Connection | None:
|
||||||
|
"""Cached read-only sqlite connection. Returns None on failure."""
|
||||||
|
if not isinstance(db_path, str) or not db_path:
|
||||||
|
return None
|
||||||
|
if db_path not in cache:
|
||||||
|
try:
|
||||||
|
cache[db_path] = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
|
||||||
|
except sqlite3.Error as e:
|
||||||
|
print(f"failed to open {Path(db_path).name}: {e}")
|
||||||
|
cache[db_path] = None
|
||||||
|
return cache[db_path]
|
||||||
|
|
||||||
|
|
||||||
|
def load_roi_data(meta: pd.DataFrame | None = None) -> pd.DataFrame:
|
||||||
|
"""Load ROI tracking data joined with experimental metadata.
|
||||||
|
|
||||||
|
For each row in ``meta``, reads the matching ROI table from both the
|
||||||
|
training DB and the testing DB (whichever exist), and stamps every
|
||||||
|
sample with the row's metadata plus a ``session`` column
|
||||||
|
(``"training"`` or ``"testing"``). Rows with empty DB paths (unusable
|
||||||
|
videos, or videos that didn't pass the completeness gate) are skipped.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
meta: optional DataFrame with the same schema as
|
||||||
|
``all_video_info_merged.tsv``. Pass a filtered slice to load a
|
||||||
|
subset (e.g. ``meta[meta.species == 'Melanogaster/CS']``).
|
||||||
|
Defaults to the full TSV.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
tuple: (trained_df, untrained_df) DataFrames with tracking data.
|
DataFrame with columns ``id, t, x, y, w, h, phi, is_inferred,
|
||||||
|
has_interacted, session, <metadata>`` — one row per tracking
|
||||||
|
sample. Empty if nothing could be loaded.
|
||||||
"""
|
"""
|
||||||
metadata = pd.read_csv(DATA_METADATA / '2025_07_15_metadata_fixed.csv')
|
if meta is None:
|
||||||
metadata['machine_name'] = metadata['machine_name'].astype(str)
|
meta = pd.read_csv(VIDEO_INFO_XLSX.with_suffix(".tsv"), sep="\t")
|
||||||
|
|
||||||
trained_rois = metadata[metadata['group'] == 'trained']
|
db_cache: dict = {}
|
||||||
untrained_rois = metadata[metadata['group'] == 'untrained']
|
chunks: list[pd.DataFrame] = []
|
||||||
|
|
||||||
db_files = list(DATA_RAW.glob('*_tracking.db'))
|
for row in meta.itertuples(index=False):
|
||||||
|
for session in ("training", "testing"):
|
||||||
trained_df = pd.DataFrame()
|
conn = _open_ro(getattr(row, f"{session}_db_path"), db_cache)
|
||||||
untrained_df = pd.DataFrame()
|
if conn is None:
|
||||||
|
continue
|
||||||
for db_file in db_files:
|
|
||||||
print(f"Processing {db_file.name}")
|
|
||||||
|
|
||||||
pattern = r'_([0-9a-f]{32})__'
|
|
||||||
match = re.search(pattern, db_file.name)
|
|
||||||
|
|
||||||
if not match:
|
|
||||||
print(f"Could not extract UUID from {db_file.name}")
|
|
||||||
continue
|
|
||||||
|
|
||||||
uuid = match.group(1)
|
|
||||||
metadata_matches = metadata[metadata['path'].str.contains(uuid, na=False)]
|
|
||||||
|
|
||||||
if metadata_matches.empty:
|
|
||||||
print(f"No metadata matches found for UUID {uuid} from {db_file.name}")
|
|
||||||
continue
|
|
||||||
|
|
||||||
machine_id = metadata_matches.iloc[0]['machine_name']
|
|
||||||
print(f"Matched to machine ID: {machine_id}")
|
|
||||||
|
|
||||||
conn = sqlite3.connect(str(db_file))
|
|
||||||
|
|
||||||
machine_trained = trained_rois[trained_rois['machine_name'] == machine_id]
|
|
||||||
machine_untrained = untrained_rois[untrained_rois['machine_name'] == machine_id]
|
|
||||||
|
|
||||||
for _, row in machine_trained.iterrows():
|
|
||||||
roi = row['ROI']
|
|
||||||
try:
|
try:
|
||||||
query = f"SELECT * FROM ROI_{roi}"
|
df = pd.read_sql_query(
|
||||||
roi_data = pd.read_sql_query(query, conn)
|
f"SELECT * FROM ROI_{int(row.roi)}", conn
|
||||||
roi_data['machine_name'] = machine_id
|
)
|
||||||
roi_data['ROI'] = roi
|
|
||||||
roi_data['group'] = 'trained'
|
|
||||||
trained_df = pd.concat([trained_df, roi_data], ignore_index=True)
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
print(f"Error loading ROI_{roi} from {db_file.name}: {e}")
|
# Reason: a DB may be missing a ROI table if tracking was
|
||||||
|
# partial — skip rather than abort the whole batch.
|
||||||
|
print(f" ROI_{row.roi} from {session} DB: {e}")
|
||||||
|
continue
|
||||||
|
df["session"] = session
|
||||||
|
df["ROI"] = int(row.roi)
|
||||||
|
for col in _META_COLS:
|
||||||
|
df[col] = getattr(row, col)
|
||||||
|
chunks.append(df)
|
||||||
|
|
||||||
for _, row in machine_untrained.iterrows():
|
for conn in db_cache.values():
|
||||||
roi = row['ROI']
|
if conn is not None:
|
||||||
try:
|
conn.close()
|
||||||
query = f"SELECT * FROM ROI_{roi}"
|
|
||||||
roi_data = pd.read_sql_query(query, conn)
|
|
||||||
roi_data['machine_name'] = machine_id
|
|
||||||
roi_data['ROI'] = roi
|
|
||||||
roi_data['group'] = 'untrained'
|
|
||||||
untrained_df = pd.concat([untrained_df, roi_data], ignore_index=True)
|
|
||||||
except Exception as e:
|
|
||||||
print(f"Error loading ROI_{roi} from {db_file.name}: {e}")
|
|
||||||
|
|
||||||
conn.close()
|
return pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()
|
||||||
|
|
||||||
return trained_df, untrained_df
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
trained_data, untrained_data = load_roi_data()
|
data = load_roi_data()
|
||||||
print(f"Trained data shape: {trained_data.shape}")
|
print(f"shape: {data.shape}")
|
||||||
print(f"Untrained data shape: {untrained_data.shape}")
|
if not data.empty:
|
||||||
if not trained_data.empty:
|
print(f"columns: {list(data.columns)}")
|
||||||
print("Trained data columns:", trained_data.columns.tolist())
|
print(f"sessions: {data['session'].value_counts().to_dict()}")
|
||||||
if not untrained_data.empty:
|
print(f"unique machines: {data['machine_name'].nunique()}")
|
||||||
print("Untrained data columns:", untrained_data.columns.tolist())
|
print(
|
||||||
|
f"unique flies (date,machine,roi): "
|
||||||
trained_data.to_csv(DATA_PROCESSED / 'trained_roi_data.csv', index=False)
|
f"{data.groupby(['date','machine_name','roi']).ngroups}"
|
||||||
untrained_data.to_csv(DATA_PROCESSED / 'untrained_roi_data.csv', index=False)
|
)
|
||||||
print("Data saved to trained_roi_data.csv and untrained_roi_data.csv")
|
|
||||||
|
|
|
||||||
|
|
@ -97,13 +97,32 @@ def snapshot() -> str:
|
||||||
)
|
)
|
||||||
lines.append(f" errors in log: {len(errors)}")
|
lines.append(f" errors in log: {len(errors)}")
|
||||||
|
|
||||||
# Rate from the last 10 completions, when available.
|
# Rate from completions in the last 6 h — robust to gaps from killed /
|
||||||
if len(history) >= 2:
|
# restarted runs, while wide enough to span multiple parallel-worker
|
||||||
window = history[-min(10, len(history)) :]
|
# completion bursts. Reason: with 8 workers all started together on
|
||||||
span = window[-1] - window[0]
|
# multi-hour videos, completions arrive in tight bursts every ~video-
|
||||||
if span > 0:
|
# length apart; a 30-min window catches one burst and overestimates by
|
||||||
rate_per_hour = (len(window) - 1) / span * 3600
|
# ~10×. 6 h spans at least one full burst cycle for typical videos.
|
||||||
lines.append(f" rate (last {len(window) - 1}): {rate_per_hour:.1f} videos/hour")
|
now_ts = time.time()
|
||||||
|
window_secs = 6 * 3600
|
||||||
|
recent = [t for t in history if t >= now_ts - window_secs]
|
||||||
|
if len(recent) >= 2:
|
||||||
|
# Reason: with N parallel workers, completions arrive in clumps
|
||||||
|
# (all workers finish near-simultaneously). Dividing N by the *burst*
|
||||||
|
# span gives nonsense rates. Use the full window as the denominator
|
||||||
|
# once the batch has been running long enough to fill it; otherwise
|
||||||
|
# use elapsed-since-first-DB. Detection: if every DB on disk also
|
||||||
|
# falls inside the window, the batch is younger than the window.
|
||||||
|
if len(recent) == len(history):
|
||||||
|
elapsed = max(1.0, now_ts - history[0])
|
||||||
|
else:
|
||||||
|
elapsed = float(window_secs)
|
||||||
|
if elapsed > 0:
|
||||||
|
rate_per_hour = len(recent) / elapsed * 3600
|
||||||
|
lines.append(
|
||||||
|
f" rate (last {len(recent)} in {int(window_secs/3600)} h):"
|
||||||
|
f" {rate_per_hour:.1f} videos/hour"
|
||||||
|
)
|
||||||
remaining = max(0, pickable - tracked)
|
remaining = max(0, pickable - tracked)
|
||||||
if rate_per_hour > 0 and remaining > 0:
|
if rate_per_hour > 0 and remaining > 0:
|
||||||
eta_sec = remaining * 3600 / rate_per_hour
|
eta_sec = remaining * 3600 / rate_per_hour
|
||||||
|
|
@ -112,6 +131,8 @@ def snapshot() -> str:
|
||||||
f" ETA remaining: {fmt_duration(eta_sec)} "
|
f" ETA remaining: {fmt_duration(eta_sec)} "
|
||||||
f"(done by {eta_at:%H:%M %a})"
|
f"(done by {eta_at:%H:%M %a})"
|
||||||
)
|
)
|
||||||
|
else:
|
||||||
|
lines.append(" rate: (warming up — check again in a few min)")
|
||||||
|
|
||||||
if last_mtime is not None and last_name is not None:
|
if last_mtime is not None and last_name is not None:
|
||||||
ago = (datetime.now() - last_mtime).total_seconds()
|
ago = (datetime.now() - last_mtime).total_seconds()
|
||||||
|
|
|
||||||
|
|
@ -3,7 +3,7 @@
|
||||||
Reads target JSONs produced by `pick_targets.py`, builds the 6 ROIs of the
|
Reads target JSONs produced by `pick_targets.py`, builds the 6 ROIs of the
|
||||||
HD mating arena from the L-shape reference points, runs ethoscope's
|
HD mating arena from the L-shape reference points, runs ethoscope's
|
||||||
`MultiFlyTracker` against the merged.mp4 file via `MovieVirtualCamera`, and
|
`MultiFlyTracker` against the merged.mp4 file via `MovieVirtualCamera`, and
|
||||||
writes a SQLite DB to `data/tracked/<video_basename>_tracking.db`.
|
writes a SQLite DB to `TRACKING_OUTPUT_DIR/<video_basename>_tracking.db`.
|
||||||
|
|
||||||
Idempotent: skips videos whose tracking DB already exists (unless --redo).
|
Idempotent: skips videos whose tracking DB already exists (unless --redo).
|
||||||
|
|
||||||
|
|
@ -58,17 +58,46 @@ def track_one(json_path: Path, output_dir: Path, max_duration: float | None,
|
||||||
from ethoscope.io.sqlite import SQLiteResultWriter
|
from ethoscope.io.sqlite import SQLiteResultWriter
|
||||||
from ethoscope.trackers.multi_fly_tracker import MultiFlyTracker
|
from ethoscope.trackers.multi_fly_tracker import MultiFlyTracker
|
||||||
|
|
||||||
class BGRMovieCamera(MovieVirtualCamera):
|
import time as _time
|
||||||
"""MovieVirtualCamera variant that keeps BGR frames.
|
|
||||||
|
|
||||||
MultiFlyTracker calls cv2.cvtColor(img, COLOR_BGR2GRAY) without checking
|
class BGRMovieCamera(MovieVirtualCamera):
|
||||||
whether img is already grayscale, so we must feed it 3-channel input.
|
"""MovieVirtualCamera that keeps BGR frames AND retries on transient
|
||||||
|
read failures.
|
||||||
|
|
||||||
|
Two reasons for the override:
|
||||||
|
|
||||||
|
1. MultiFlyTracker calls cv2.cvtColor(img, COLOR_BGR2GRAY) without
|
||||||
|
checking whether img is already grayscale, so we must feed it
|
||||||
|
3-channel input.
|
||||||
|
|
||||||
|
2. cv2.VideoCapture.read() can return False on transient I/O hiccups
|
||||||
|
(NFS contention when 8 workers pull big mp4s in parallel) without
|
||||||
|
the file actually being at EOF. A naive "False -> StopIteration"
|
||||||
|
handling makes the tracker silently exit mid-video and write a
|
||||||
|
short, lying DB. We retry a few times and only treat persistent
|
||||||
|
failures within the *interior* of the video as real EOF.
|
||||||
"""
|
"""
|
||||||
|
|
||||||
|
_retry_count = 5
|
||||||
|
_retry_backoff_s = 0.25
|
||||||
|
_eof_safety_frames = 50 # near end-of-file, treat False as legitimate
|
||||||
|
|
||||||
def _next_image(self):
|
def _next_image(self):
|
||||||
ret, frame = self.capture.read()
|
for attempt in range(self._retry_count):
|
||||||
if not ret or frame is None:
|
ret, frame = self.capture.read()
|
||||||
return None
|
if ret and frame is not None:
|
||||||
return frame # BGR, untouched
|
return frame # BGR, untouched
|
||||||
|
# If we're near the genuine end of the file, accept it.
|
||||||
|
if (
|
||||||
|
self._has_end_of_file
|
||||||
|
and self._frame_idx >= self._total_n_frames - self._eof_safety_frames
|
||||||
|
):
|
||||||
|
return None
|
||||||
|
# Otherwise, this is a suspected transient hiccup — back off
|
||||||
|
# and try again. The capture is still open; cv2 will pick up
|
||||||
|
# the next decoded frame.
|
||||||
|
_time.sleep(self._retry_backoff_s)
|
||||||
|
return None # truly persistent failure
|
||||||
|
|
||||||
payload = json.loads(json_path.read_text())
|
payload = json.loads(json_path.read_text())
|
||||||
if payload.get("unusable"):
|
if payload.get("unusable"):
|
||||||
|
|
@ -146,6 +175,42 @@ def track_one(json_path: Path, output_dir: Path, max_duration: float | None,
|
||||||
|
|
||||||
if not out_db.exists():
|
if not out_db.exists():
|
||||||
return "error", "tracking finished but DB was not created"
|
return "error", "tracking finished but DB was not created"
|
||||||
|
|
||||||
|
# Post-tracking sanity check: did we cover most of the source video?
|
||||||
|
# If not (cv2 retry exhausted, codec corruption, etc.), reject the DB so
|
||||||
|
# it doesn't get cached as "done" — better an explicit failure than a
|
||||||
|
# silent partial write.
|
||||||
|
expected_ms = (cam._total_n_frames / 25.0) * 1000.0
|
||||||
|
if max_duration is not None:
|
||||||
|
expected_ms = min(expected_ms, max_duration * 1000.0)
|
||||||
|
completeness_threshold = 0.90 # require ≥ 90 % of expected duration
|
||||||
|
|
||||||
|
# Use MAX(t) across all ROIs — a single ROI can run dry early if its fly
|
||||||
|
# stops moving, so the latest detection anywhere in the arena is the
|
||||||
|
# better signal of how far the iterator actually got.
|
||||||
|
import sqlite3 as _sqlite3
|
||||||
|
try:
|
||||||
|
_con = _sqlite3.connect(f"file:{out_db}?mode=ro", uri=True)
|
||||||
|
t_max = 0
|
||||||
|
for _i in range(1, 7):
|
||||||
|
_v = _con.execute(f"SELECT MAX(t) FROM ROI_{_i}").fetchone()[0]
|
||||||
|
if _v and _v > t_max:
|
||||||
|
t_max = _v
|
||||||
|
_con.close()
|
||||||
|
except Exception:
|
||||||
|
t_max = 0
|
||||||
|
|
||||||
|
if expected_ms > 0 and t_max < expected_ms * completeness_threshold:
|
||||||
|
out_db.unlink()
|
||||||
|
for sidecar in (str(out_db) + "-wal", str(out_db) + "-shm"):
|
||||||
|
Path(sidecar).unlink(missing_ok=True)
|
||||||
|
ratio = t_max / expected_ms if expected_ms else 0
|
||||||
|
return (
|
||||||
|
"error",
|
||||||
|
f"short output: t_max={t_max} ms vs expected {int(expected_ms)} ms "
|
||||||
|
f"({ratio*100:.0f}%); DB removed",
|
||||||
|
)
|
||||||
|
|
||||||
return "ok", str(out_db)
|
return "ok", str(out_db)
|
||||||
|
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -115,4 +115,26 @@ all targets are picked, tracking can run in the background.
|
||||||
|
|
||||||
## Discovered During Work
|
## Discovered During Work
|
||||||
|
|
||||||
(Add new items here as they come up during analysis)
|
### Barrier-opening annotation for the 2024 batch (added 2026-04-30)
|
||||||
|
The current `flies_analysis*.ipynb` aligns trajectories to a barrier-opening
|
||||||
|
event sourced from `data/metadata/2025_07_15_barrier_opening.csv`. That file
|
||||||
|
covers only the 5 machines in the 2025-07-15 experiment. The 2024 batch
|
||||||
|
(`/mnt/data/projects/cupido/tracked/`, 113 DBs) has no equivalent annotation
|
||||||
|
yet, so all post-alignment cells silently exclude that data.
|
||||||
|
|
||||||
|
- [ ] Build a small picker that lets the user scrub through each tracking
|
||||||
|
DB / video and mark the barrier-opening frame, writing a row to a new
|
||||||
|
`data/metadata/barrier_opening_2024.csv` (or extend the existing
|
||||||
|
file with a date column).
|
||||||
|
- [ ] Once the 2024 entries exist, update `align_to_opening_time` so it
|
||||||
|
pulls from a unified `barrier_opening` table keyed by
|
||||||
|
`(date, machine_name)` rather than `machine_name` alone.
|
||||||
|
|
||||||
|
### Metadata vocabulary normalization (done 2026-04-30)
|
||||||
|
The xlsx had inconsistent labels for control flies (`'naïve'`, `'niave'`,
|
||||||
|
`'untrained'` plus trailing whitespace). All sources now use a single
|
||||||
|
canonical `'naive'`. Normalization happens in
|
||||||
|
`scripts/export_video_db_index.py` so re-running it from the xlsx always
|
||||||
|
produces a clean TSV. The 2025-07-15 legacy CSV
|
||||||
|
(`data/metadata/2025_07_15_metadata_fixed.csv`) was edited in place from
|
||||||
|
`'untrained'` → `'naive'`.
|
||||||
|
|
|
||||||
Loading…
Add table
Add a link
Reference in a new issue