Compare commits
4 commits
e7e4db264d
...
ec56e51bf9
| Author | SHA1 | Date | |
|---|---|---|---|
| | ec56e51bf9 | | |
| | 7d09523840 | | |
| | f60a9d0530 | | |
| | e4da7691d5 | | |
23 changed files with 3450 additions and 214 deletions
5
.gitignore
vendored
|
|
@ -2,6 +2,11 @@
|
||||||
data/raw/*.db
|
data/raw/*.db
|
||||||
data/processed/*.csv
|
data/processed/*.csv
|
||||||
|
|
||||||
|
# Offline-tracking outputs (regenerable from videos + target JSONs)
|
||||||
|
# DBs and target JSONs live outside the repo at /mnt/data/projects/cupido/
|
||||||
|
data/metadata/video_inventory.csv
|
||||||
|
data/logs/*.log
|
||||||
|
|
||||||
# Generated figures (reproducible from scripts)
|
# Generated figures (reproducible from scripts)
|
||||||
figures/*.png
|
figures/*.png
|
||||||
|
|
||||||
|
|
|
||||||
26
README.md
|
|
@ -46,6 +46,32 @@ The key insight: not all "trained" flies may have actually learned. The trained
|
||||||
|
|
||||||
**Read `docs/bimodal_hypothesis.md` for the detailed analysis plan and code sketches.**
|
**Read `docs/bimodal_hypothesis.md` for the detailed analysis plan and code sketches.**
|
||||||
|
|
||||||
|
## Offline Tracking Pipeline (added Apr 2026)
|
||||||
|
|
||||||
|
For tracking new videos that have **no auto-detectable targets**, the pipeline
|
||||||
|
is split in two stages so you can sit at the screen and click for an hour, then
|
||||||
|
let the tracker grind through overnight.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# extra deps (ethoscope src must be at /home/gg/Code/ethoscope_project/...)
|
||||||
|
pip install -r requirements-tracking.txt
|
||||||
|
|
||||||
|
# 1) build the inventory (xlsx ↔ /mnt/ethoscope_data/videos/)
|
||||||
|
python scripts/build_video_inventory.py
|
||||||
|
|
||||||
|
# 2) interactive: click TOP, CORNER, LEFT on each video (one frame per video)
|
||||||
|
python scripts/pick_targets.py # process all not-yet-picked
|
||||||
|
python scripts/pick_targets.py --redo # re-pick already-picked videos
|
||||||
|
# keys: r=reset n=skip f=jump frame q/ESC=quit ENTER=save
|
||||||
|
|
||||||
|
# 3) batch tracking (idempotent, can run in background)
|
||||||
|
python scripts/track_videos.py --jobs 4 # parallel
|
||||||
|
# output → /mnt/data/projects/cupido/tracked/*_tracking.db (SQLite, same schema as data/raw/)
|
||||||
|
```
|
||||||
|
|
||||||
|
See `tasks/todo.md` "Offline Tracking" section for the full plan, and
|
||||||
|
`data/metadata/video_inventory.csv` for the list of videos to process.
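
One way the "idempotent" behaviour of step 3 could work is to skip any video whose output DB already exists. The sketch below only illustrates that idea; it is not the code in `scripts/track_videos.py`. The helper names `output_db_for` and `pending_videos`, and the `<video stem>_tracking.db` naming rule, are assumptions made for this example.

```python
from pathlib import Path


def output_db_for(video: Path, out_dir: Path) -> Path:
    """Hypothetical naming rule: <video stem>_tracking.db inside out_dir."""
    return out_dir / f"{video.stem}_tracking.db"


def pending_videos(videos, out_dir: Path):
    """Return the videos whose tracking DB does not exist yet.

    Running the batch twice then only processes what is still missing.
    """
    return [v for v in videos if not output_db_for(Path(v), out_dir).exists()]
```

With a rule like this, interrupting the overnight run and restarting it would pick up where it left off, provided a partially written DB is never left under its final name (for example by writing to a temporary file and renaming it at the end).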
|
||||||
|
|
||||||
## Folder Structure
|
## Folder Structure
|
||||||
|
|
||||||
```
|
```
|
||||||
|
|
|
||||||
|
|
@ -1,37 +1,37 @@
|
||||||
date,HHMMSS,machine_name,ROI,genotype,group,path,filesize_mb
|
date,HHMMSS,machine_name,ROI,genotype,group,path,filesize_mb
|
||||||
15/07/2025,16-03-10,76,6,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
15/07/2025,16-03-10,76,6,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
||||||
15/07/2025,16-03-10,76,4,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
15/07/2025,16-03-10,76,4,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
||||||
15/07/2025,16-03-10,76,2,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
15/07/2025,16-03-10,76,2,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
||||||
15/07/2025,16-03-10,76,5,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
15/07/2025,16-03-10,76,5,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
||||||
15/07/2025,16-03-10,76,3,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
15/07/2025,16-03-10,76,3,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
||||||
15/07/2025,16-03-10,76,1,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
15/07/2025,16-03-10,76,1,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-03-10/2025-07-15_16-03-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,59.4
|
||||||
15/07/2025,16-31-34,76,6,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
15/07/2025,16-31-34,76,6,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
||||||
15/07/2025,16-31-34,76,4,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
15/07/2025,16-31-34,76,4,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
||||||
15/07/2025,16-31-34,76,2,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
15/07/2025,16-31-34,76,2,CS,trained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
||||||
15/07/2025,16-31-34,76,5,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
15/07/2025,16-31-34,76,5,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
||||||
15/07/2025,16-31-34,76,3,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
15/07/2025,16-31-34,76,3,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
||||||
15/07/2025,16-31-34,76,1,CS,untrained,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
15/07/2025,16-31-34,76,1,CS,naive,/mnt/ethoscope_data/videos/076e2825a7274661bd0697c42d6fa4c0/ETHOSCOPE_076/2025-07-15_16-31-34/2025-07-15_16-31-34_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged.mp4,78.98
|
||||||
15/07/2025,16-03-27,145,6,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
15/07/2025,16-03-27,145,6,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
||||||
15/07/2025,16-03-27,145,4,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
15/07/2025,16-03-27,145,4,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
||||||
15/07/2025,16-03-27,145,2,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
15/07/2025,16-03-27,145,2,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
||||||
15/07/2025,16-03-27,145,5,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
15/07/2025,16-03-27,145,5,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
||||||
15/07/2025,16-03-27,145,3,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
15/07/2025,16-03-27,145,3,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
||||||
15/07/2025,16-03-27,145,1,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
15/07/2025,16-03-27,145,1,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-03-27/2025-07-15_16-03-27_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,78.72
|
||||||
15/07/2025,16-31-41,145,6,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
15/07/2025,16-31-41,145,6,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
||||||
15/07/2025,16-31-41,145,4,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
15/07/2025,16-31-41,145,4,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
||||||
15/07/2025,16-31-41,145,2,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
15/07/2025,16-31-41,145,2,CS,trained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
||||||
15/07/2025,16-31-41,145,5,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
15/07/2025,16-31-41,145,5,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
||||||
15/07/2025,16-31-41,145,3,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
15/07/2025,16-31-41,145,3,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
||||||
15/07/2025,16-31-41,145,1,CS,untrained,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
15/07/2025,16-31-41,145,1,CS,naive,/mnt/ethoscope_data/videos/145bb573497a4e15b0690206748a3af6/ETHOSCOPE_145/2025-07-15_16-31-41/2025-07-15_16-31-41_145bb573497a4e15b0690206748a3af6__1920x1088@25fps-28q_merged.mp4,90.9
|
||||||
15/07/2025,16-31-52,139,6,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
15/07/2025,16-31-52,139,6,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
||||||
15/07/2025,16-31-52,139,4,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
15/07/2025,16-31-52,139,4,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
||||||
15/07/2025,16-31-52,139,2,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
15/07/2025,16-31-52,139,2,CS,trained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
||||||
15/07/2025,16-31-52,139,5,CS,untrained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
15/07/2025,16-31-52,139,5,CS,naive,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
||||||
15/07/2025,16-31-52,139,3,CS,untrained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
15/07/2025,16-31-52,139,3,CS,naive,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
||||||
15/07/2025,16-31-52,139,1,CS,untrained,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
15/07/2025,16-31-52,139,1,CS,naive,/mnt/ethoscope_data/videos/13924be2046d49f4a641cef2a5559852/ETHOSCOPE_139/2025-07-15_16-31-52/2025-07-15_16-31-52_13924be2046d49f4a641cef2a5559852__1920x1088@25fps-28q_merged.mp4,73.4
|
||||||
15/07/2025,16-32-05,268,6,CS,untrained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
15/07/2025,16-32-05,268,6,CS,naive,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
||||||
15/07/2025,16-32-05,268,4,CS,untrained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
15/07/2025,16-32-05,268,4,CS,naive,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
||||||
15/07/2025,16-32-05,268,2,CS,untrained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
15/07/2025,16-32-05,268,2,CS,naive,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
||||||
15/07/2025,16-32-05,268,5,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
15/07/2025,16-32-05,268,5,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
||||||
15/07/2025,16-32-05,268,3,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
15/07/2025,16-32-05,268,3,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
||||||
15/07/2025,16-32-05,268,1,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
15/07/2025,16-32-05,268,1,CS,trained,/mnt/ethoscope_data/videos/268102f92f51486f995200c29d980477/ETHOSCOPE_268/2025-07-15_16-32-05/2025-07-15_16-32-05_268102f92f51486f995200c29d980477__1920x1088@25fps-28q_merged.mp4,43.72
|
||||||
|
|
|
||||||
|
|
|
@ -1,39 +1,47 @@
|
||||||
# Processed Data
|
# Processed Data
|
||||||
|
|
||||||
Large CSV files generated from the analysis pipeline. All files are gitignored (~370MB total) and can be regenerated.
|
CSVs derived from the tracking DBs (`/mnt/data/projects/cupido/tracked/`)
|
||||||
|
and the merged TSV (`../../all_video_info_merged.tsv`). All files are
|
||||||
|
gitignored and regenerable.
|
||||||
|
|
||||||
## Files and Regeneration
|
## Files and Regeneration
|
||||||
|
|
||||||
| File | Description | Generated By |
|
| File | Description | Generated By |
|
||||||
|------|-------------|--------------|
|
|------|-------------|--------------|
|
||||||
| `trained_roi_data.csv` | Raw tracking data for trained ROIs | `scripts/load_roi_data.py` or notebook step 1 |
|
| `distances.csv` | Per-frame inter-fly distances for every (date, machine, ROI, session). Includes metadata columns to filter trained vs naïve, training phase, species, etc. | `scripts/calculate_distances.py` |
|
||||||
| `untrained_roi_data.csv` | Raw tracking data for untrained ROIs | `scripts/load_roi_data.py` or notebook step 1 |
|
| `*_distances_aligned.csv` | (legacy, 2025-07-15 only) distances aligned to barrier opening | `notebooks/flies_analysis*.ipynb` |
|
||||||
| `trained_distances.csv` | Pairwise distances (unaligned) | `scripts/calculate_distances.py` |
|
| `*_tracked.csv` | (legacy) identity-tracked fly positions | `notebooks/flies_analysis_simple.ipynb` |
|
||||||
| `untrained_distances.csv` | Pairwise distances (unaligned) | `scripts/calculate_distances.py` |
|
| `*_max_velocity.csv` | (legacy) max velocity over 10 s windows | `notebooks/flies_analysis_simple.ipynb` |
|
||||||
| `trained_distances_aligned.csv` | Distances aligned to barrier opening | Notebook step 4 |
|
|
||||||
| `untrained_distances_aligned.csv` | Distances aligned to barrier opening | Notebook step 4 |
|
|
||||||
| `trained_tracked.csv` | Identity-tracked fly positions | Notebook step 7 |
|
|
||||||
| `untrained_tracked.csv` | Identity-tracked fly positions | Notebook step 7 |
|
|
||||||
| `trained_max_velocity.csv` | Max velocity over 10s windows | Notebook step 7 |
|
|
||||||
| `untrained_max_velocity.csv` | Max velocity over 10s windows | Notebook step 7 |
|
|
||||||
|
|
||||||
## To Regenerate All Data
|
## Loading the data
|
||||||
|
|
||||||
Run the full notebook `notebooks/flies_analysis_simple.ipynb` with:
|
|
||||||
```python
|
```python
|
||||||
recalculate_distances = True
|
import sys
|
||||||
recalculate_tracking = True
|
sys.path.insert(0, "../scripts")
|
||||||
|
from load_roi_data import load_roi_data
|
||||||
|
|
||||||
|
data = load_roi_data() # full batch as one DataFrame
|
||||||
|
# Or filter the metadata first:
|
||||||
|
import pandas as pd
|
||||||
|
tsv = pd.read_csv("../../all_video_info_merged.tsv", sep="\t")
|
||||||
|
data = load_roi_data(tsv[tsv.species.str.contains("Melanogaster")])
|
||||||
```
|
```
|
||||||
|
|
||||||
**Warning**: Identity tracking and velocity calculations take significant time (~30+ minutes).
|
The returned DataFrame has columns:
|
||||||
|
`id, t, x, y, w, h, phi, is_inferred, has_interacted, session, ROI, date,
|
||||||
|
machine_name, species, male, training_date_time, testing_date_time,
|
||||||
|
training_length_hr, consolidation_length_hr, memory, age`.
|
||||||
|
|
||||||
## Column Reference
|
`session` is `"training"` or `"testing"`; `male` is `"trained"` or
|
||||||
|
`"naive"` (canonical — variants like `"naïve"` and `"niave"` are normalized
|
||||||
|
at the TSV-export step).
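
The normalization of the `male` column could look roughly like the sketch below. It is an illustration under assumptions, not the actual TSV-export code: the function name `normalize_male_label` and the typo table are hypothetical, and only the variants named above (`"naïve"`, `"niave"`) are handled.

```python
import unicodedata

# Known misspellings mapped to the canonical label (illustrative list only).
_TYPOS = {"niave": "naive"}


def normalize_male_label(raw: str) -> str:
    """Map variants such as 'Naïve' or 'niave' to 'trained' or 'naive'."""
    text = unicodedata.normalize("NFKD", raw.strip().lower())
    # Drop combining marks so that 'naïve' becomes 'naive'.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = _TYPOS.get(text, text)
    if text not in {"trained", "naive"}:
        raise ValueError(f"unexpected label: {raw!r}")
    return text
```

Raising on unknown labels, rather than passing them through, makes a new misspelling in the xlsx visible at export time instead of silently creating a third group.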
|
||||||
|
|
||||||
### Distance CSVs (`*_distances_aligned.csv`)
|
## Column Reference (`distances.csv`)
|
||||||
- `machine_name`: Ethoscope machine ID (string)
|
|
||||||
- `ROI`: ROI number (1-6)
|
- `date`, `machine_name`, `ROI`, `session`: together identify one recording (one ROI in one session)
|
||||||
- `aligned_time`: Time in ms relative to barrier opening (0 = opening)
|
- `t`: time in ms within that session
|
||||||
- `distance`: Euclidean distance between flies in pixels
|
- `distance`: Euclidean distance between the two flies in pixels
|
||||||
- `n_flies`: Number of flies detected at this time point
|
- `n_flies`: number of fly detections at this frame (1 or 2)
|
||||||
- `area_fly1`, `area_fly2`: Bounding box areas (w*h) in pixels^2
|
- `area_fly1`, `area_fly2`: bounding-box areas (`w * h`) in pixels²
|
||||||
- `group`: "trained" or "untrained"
|
- `male`: `trained` or `naive` (carried from the xlsx; normalized)
|
||||||
|
- `species`, `memory`, `age`: experimental metadata
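
As a rough illustration of how the per-frame columns relate to the raw detections, the sketch below derives `distance`, `n_flies`, `area_fly1` and `area_fly2` for a single frame. It is not the code in `scripts/calculate_distances.py`; the function name `frame_row` is hypothetical, and leaving `distance` empty when only one fly is detected is an assumption.

```python
import math


def frame_row(detections):
    """Summarize one frame's detections (dicts with x, y, w, h in pixels)."""
    n = len(detections)
    areas = [d["w"] * d["h"] for d in detections]  # bounding-box areas
    if n == 2:
        a, b = detections
        # Euclidean distance between the two fly positions.
        distance = math.hypot(a["x"] - b["x"], a["y"] - b["y"])
    else:
        distance = None  # assumption: undefined unless both flies are detected
    return {
        "distance": distance,
        "n_flies": n,
        "area_fly1": areas[0] if n >= 1 else None,
        "area_fly2": areas[1] if n >= 2 else None,
    }
```

For example, two detections at `(0, 0)` and `(3, 4)` give a distance of 5 pixels.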
|
||||||
|
|
|
||||||
|
|
@ -28,7 +28,22 @@
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": "def load_roi_data():\n \"\"\"Load ROI data from SQLite databases and group by trained/untrained\"\"\"\n metadata = pd.read_csv(DATA_METADATA / '2025_07_15_metadata_fixed.csv')\n metadata['machine_name'] = metadata['machine_name'].astype(str)\n \n trained_rois = metadata[metadata['group'] == 'trained']\n untrained_rois = metadata[metadata['group'] == 'untrained']\n \n db_files = list(DATA_RAW.glob('*_tracking.db'))\n \n trained_df = pd.DataFrame()\n untrained_df = pd.DataFrame()\n \n for db_file in db_files:\n print(f\"Processing {db_file.name}\")\n \n pattern = r'_([0-9a-f]{32})__'\n match = re.search(pattern, db_file.name)\n \n if not match:\n print(f\"Could not extract UUID from {db_file.name}\")\n continue\n \n uuid = match.group(1)\n metadata_matches = metadata[metadata['path'].str.contains(uuid, na=False)]\n \n if metadata_matches.empty:\n print(f\"No metadata matches found for UUID {uuid}\")\n continue\n \n machine_id = metadata_matches.iloc[0]['machine_name']\n print(f\"Matched to machine ID: {machine_id}\")\n \n conn = sqlite3.connect(str(db_file))\n \n machine_trained = trained_rois[trained_rois['machine_name'] == machine_id]\n machine_untrained = untrained_rois[untrained_rois['machine_name'] == machine_id]\n \n for _, row in machine_trained.iterrows():\n roi = row['ROI']\n try:\n roi_data = pd.read_sql_query(f\"SELECT * FROM ROI_{roi}\", conn)\n roi_data['machine_name'] = machine_id\n roi_data['ROI'] = roi\n roi_data['group'] = 'trained'\n trained_df = pd.concat([trained_df, roi_data], ignore_index=True)\n except Exception as e:\n print(f\"Error loading ROI_{roi}: {e}\")\n \n for _, row in machine_untrained.iterrows():\n roi = row['ROI']\n try:\n roi_data = pd.read_sql_query(f\"SELECT * FROM ROI_{roi}\", conn)\n roi_data['machine_name'] = machine_id\n roi_data['ROI'] = roi\n roi_data['group'] = 'untrained'\n untrained_df = pd.concat([untrained_df, roi_data], ignore_index=True)\n except Exception as e:\n print(f\"Error loading ROI_{roi}: {e}\")\n \n conn.close()\n \n return trained_df, untrained_df\n\ntrained_data, untrained_data = load_roi_data()\nprint(f\"Trained data shape: {trained_data.shape}\")\nprint(f\"Untrained data shape: {untrained_data.shape}\")\n\ntrained_data.to_csv(DATA_PROCESSED / 'trained_roi_data.csv', index=False)\nuntrained_data.to_csv(DATA_PROCESSED / 'untrained_roi_data.csv', index=False)\nprint(\"Data saved to CSV files\")"
|
"source": [
|
||||||
|
"# Load tracking data via the unified loader (driven by all_video_info_merged.tsv).\n",
|
||||||
|
"# Reason: replaces the old data/raw + 2025_07_15_metadata_fixed.csv path with\n",
|
||||||
|
"# the TSV-based loader that covers the entire batch (2025-07-15 + 2024).\n",
|
||||||
|
"sys.path.insert(0, str(PROJECT_ROOT / 'scripts'))\n",
|
||||||
|
"from load_roi_data import load_roi_data\n",
|
||||||
|
"\n",
|
||||||
|
"data = load_roi_data()\n",
|
||||||
|
"# Backwards-compat slices for the rest of the notebook.\n",
|
||||||
|
"trained_data = data[data['male'] == 'trained'].copy()\n",
|
||||||
|
"untrained_data = data[data['male'] == 'naive'].copy()\n",
|
||||||
|
"\n",
|
||||||
|
"print(f\"all data: {data.shape}\")\n",
|
||||||
|
"print(f\"trained: {trained_data.shape}\")\n",
|
||||||
|
"print(f\"naive: {untrained_data.shape}\")\n"
|
||||||
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
|
|
@ -219,4 +234,4 @@
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
"nbformat_minor": 4
|
"nbformat_minor": 4
|
||||||
}
|
}
|
||||||
|
|
|
||||||
|
|
@ -28,7 +28,22 @@
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": "# Load the pre-processed data\ntrained_data = pd.read_csv(DATA_PROCESSED / 'trained_roi_data.csv')\nuntrained_data = pd.read_csv(DATA_PROCESSED / 'untrained_roi_data.csv')\n\nprint(f\"Trained data shape: {trained_data.shape}\")\nprint(f\"Untrained data shape: {untrained_data.shape}\")\nprint(f\"Trained data columns: {list(trained_data.columns)}\")\nprint(f\"Untrained data columns: {list(untrained_data.columns)}\")"
|
"source": [
|
||||||
|
"# Load tracking data via the unified loader (driven by all_video_info_merged.tsv).\n",
|
||||||
|
"# Reason: replaces reads of trained_roi_data.csv / untrained_roi_data.csv with\n",
|
||||||
|
"# the live loader so the notebook always sees the current batch.\n",
|
||||||
|
"sys.path.insert(0, str(PROJECT_ROOT / 'scripts'))\n",
|
||||||
|
"from load_roi_data import load_roi_data\n",
|
||||||
|
"\n",
|
||||||
|
"data = load_roi_data()\n",
|
||||||
|
"trained_data = data[data['male'] == 'trained'].copy()\n",
|
||||||
|
"untrained_data = data[data['male'] == 'naive'].copy()\n",
|
||||||
|
"\n",
|
||||||
|
"print(f\"all data shape: {data.shape}\")\n",
|
||||||
|
"print(f\"Trained data: {trained_data.shape}\")\n",
|
||||||
|
"print(f\"Naive data: {untrained_data.shape}\")\n",
|
||||||
|
"print(f\"Columns: {list(trained_data.columns)}\")\n"
|
||||||
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
|
|
@ -418,4 +433,4 @@
|
||||||
},
|
},
|
||||||
"nbformat": 4,
|
"nbformat": 4,
|
||||||
"nbformat_minor": 4
|
"nbformat_minor": 4
|
||||||
}
|
}
|
||||||
|
|
|
||||||
255
notebooks/getting_started/00_welcome.ipynb
Normal file
|
|
@ -0,0 +1,255 @@
|
||||||
|
{
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 5,
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"name": "python"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# 00 \u00b7 Welcome to the Cupido fly-tracking project\n",
|
||||||
|
"\n",
|
||||||
|
"Hi! You're about to start working on a project that studies how *Drosophila*\n",
|
||||||
|
"(fruit flies) form **memories of mating experiences** \u2014 and whether trained\n",
|
||||||
|
"flies behave differently from na\u00efve ones in their later courtship.\n",
|
||||||
|
"\n",
|
||||||
|
"**You don't need any prior experience with Python or data science to follow\n",
|
||||||
|
"along.** This series of notebooks will walk you through everything, one\n",
|
||||||
|
"small step at a time.\n",
|
||||||
|
"\n",
|
||||||
|
"> **How to read these notebooks**: each notebook is split into \"cells\".\n",
|
||||||
|
"> Some cells are explanations (like this one), others are code that you\n",
|
||||||
|
"> can **run** by clicking on the cell and pressing `Shift + Enter`. Try it\n",
|
||||||
|
"> on the next cell.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# This is a code cell. Click on it and press Shift+Enter to run it.\n",
|
||||||
|
"print(\"Hello, fly world!\")\n",
|
||||||
|
"1 + 1\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"You should have seen `Hello, fly world!` printed and the number `2`\n",
|
||||||
|
"appear underneath. If something else happened, ask Giorgio \u2014 that's a\n",
|
||||||
|
"sign the environment isn't set up right.\n",
|
||||||
|
"\n",
|
||||||
|
"If this is the very first time you're using JupyterLab, take 10 minutes\n",
|
||||||
|
"to read the [official \"Getting started with JupyterLab\"\n",
|
||||||
|
"guide](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html).\n",
|
||||||
|
"The most important things to know are:\n",
|
||||||
|
"\n",
|
||||||
|
"- A notebook (`.ipynb` file) is a sequence of **cells**.\n",
|
||||||
|
"- Each cell is either **Markdown** (formatted text, like this) or **Code**\n",
|
||||||
|
" (Python that the computer runs).\n",
|
||||||
|
"- The **kernel** is the running Python process behind the notebook. It\n",
|
||||||
|
" remembers everything you've defined. If something gets weird, restart\n",
|
||||||
|
" the kernel: top menu \u2192 *Kernel* \u2192 *Restart Kernel\u2026*.\n",
|
||||||
|
"- `Shift + Enter` runs a cell and moves to the next one.\n",
|
||||||
|
"- `Ctrl + Enter` runs a cell and stays put.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## What is the project about?\n",
|
||||||
|
"\n",
|
||||||
|
"Drosophila males court females with a stereotyped sequence (chasing,\n",
|
||||||
|
"wing-extension, tapping). When a male is rejected by a female (e.g.\n",
|
||||||
|
"because she's already mated), he **learns** to suppress his courtship \u2014\n",
|
||||||
|
"even toward new, receptive females, for a while. This is a textbook\n",
|
||||||
|
"example of *associative learning* in invertebrates ([review on\n",
|
||||||
|
"PubMed](https://pubmed.ncbi.nlm.nih.gov/?term=courtship+conditioning+drosophila)).\n",
|
||||||
|
"\n",
|
||||||
|
"The lab is interested in:\n",
|
||||||
|
"\n",
|
||||||
|
"- Does this learning **transfer across species**? (We have ~7 *Drosophila*\n",
|
||||||
|
" species recorded.)\n",
|
||||||
|
"- How long does the memory last? (See the `training_length_hr` and\n",
|
||||||
|
" `consolidation_length_hr` columns in the metadata.)\n",
|
||||||
|
"- Are there **individual differences** \u2014 do some males learn while others\n",
|
||||||
|
" don't? (The \"bimodal hypothesis\" in `docs/bimodal_hypothesis.md`.)\n",
|
||||||
|
"\n",
|
||||||
|
"Your job, broadly, will be to **turn videos of flies into numbers and\n",
|
||||||
|
"plots that answer these questions.**\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## How an experiment works (the bird's-eye view)\n",
|
||||||
|
"\n",
|
||||||
|
"1. **Training**: a male fly is placed with a non-receptive (mated) female.\n",
|
||||||
|
" He courts, gets rejected, eventually gives up.\n",
|
||||||
|
"2. **Waiting**: some hours pass (the \"consolidation\" period \u2014 gives memory time\n",
|
||||||
|
" to form).\n",
|
||||||
|
"3. **Testing**: same male is placed with a fresh receptive female.\n",
|
||||||
|
" Does he court her vigorously, or has he learned to give up easily?\n",
|
||||||
|
"\n",
|
||||||
|
"Each experiment runs in an **HD mating arena** \u2014 a small chamber with\n",
|
||||||
|
"6 sub-arenas (we call them **ROIs**, for \"regions of interest\"). Each ROI\n",
|
||||||
|
"contains one couple (a male and a female). A camera films the whole arena\n",
|
||||||
|
"from above. So one **video** gives us 6 simultaneous experiments.\n",
|
||||||
|
"\n",
|
||||||
|
"The setup uses [Ethoscopes](https://www.ethoscope.com/) \u2014 open-source\n",
|
||||||
|
"behavioural recording boxes built in this lab. Each ethoscope is a\n",
|
||||||
|
"machine; we have 16 in total, named `ETHOSCOPE_067`, `ETHOSCOPE_076`, etc.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## What does the data look like?\n",
|
||||||
|
"\n",
|
||||||
|
"For each video, the **tracker** (a piece of software that runs after the\n",
|
||||||
|
"recording) finds the flies frame-by-frame and writes their positions to a\n",
|
||||||
|
"**SQLite database** (a single file, ending in `.db`). One DB per video.\n",
|
||||||
|
"Inside each DB there are 6 tables called `ROI_1`, `ROI_2`, \u2026, `ROI_6` \u2014\n",
|
||||||
|
"one per sub-arena. Each row of an ROI table is **one fly detection at one\n",
|
||||||
|
"moment in time** with these columns:\n",
|
||||||
|
"\n",
|
||||||
|
"| column | meaning |\n",
|
||||||
|
"|---|---|\n",
|
||||||
|
"| `id` | row number (auto-incremented) |\n",
|
||||||
|
"| `t` | time in **milliseconds** since the video started |\n",
|
||||||
|
"| `x`, `y` | fly position in **pixels** (top-left corner of the image is 0,0) |\n",
|
||||||
|
"| `w`, `h` | width and height of the bounding box around the fly, in pixels |\n",
|
||||||
|
"| `phi` | orientation angle of the fly |\n",
|
||||||
|
"| `is_inferred` | 1 if the position was guessed (not directly seen), 0 otherwise |\n",
|
||||||
|
"| `has_interacted` | (legacy column, mostly unused) |\n",
|
||||||
|
"\n",
|
||||||
|
"If a single ROI has two flies that the tracker can see, you'll get **two\n",
|
||||||
|
"rows with the same `t`** \u2014 one for each fly. If only one fly is detected\n",
|
||||||
|
"(maybe they're on top of each other), you'll get one row.\n",
|
||||||
|
"\n",
|
||||||
|
"That's the heart of the data. Everything else (distances, velocities,\n",
|
||||||
|
"group comparisons) is computed from these (t, x, y) traces.\n"
|
||||||
|
]
|
||||||
|
},
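To make the "two rows with the same `t`" point concrete, here is a minimal sketch that builds a throwaway in-memory SQLite table with the columns listed above and counts detections per timestamp. The sample values are invented for illustration, and the column types are an assumption; the real tracking DBs may declare them differently.

```python
import sqlite3

# Throwaway in-memory database; nothing here touches the real tracking files.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE ROI_1 (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        t INTEGER, x INTEGER, y INTEGER, w INTEGER, h INTEGER,
        phi INTEGER, is_inferred INTEGER, has_interacted INTEGER
    )
    """
)

# Invented detections: two flies seen at t = 0 ms, only one at t = 40 ms.
rows = [
    (0, 100, 200, 12, 6, 30, 0, 0),
    (0, 400, 220, 11, 5, 90, 0, 0),
    (40, 102, 201, 12, 6, 32, 0, 0),
]
conn.executemany(
    "INSERT INTO ROI_1 (t, x, y, w, h, phi, is_inferred, has_interacted) "
    "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    rows,
)

# Count how many detections share each timestamp.
flies_per_t = conn.execute(
    "SELECT t, COUNT(*) FROM ROI_1 GROUP BY t ORDER BY t"
).fetchall()
print(flies_per_t)
conn.close()
```

A timestamp with a count of 1 is a moment where the tracker saw a single blob, which is the "flies on top of each other" case described above.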
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Where everything lives\n",
|
||||||
|
"\n",
|
||||||
|
"Take a moment to memorize these locations \u2014 you'll come back to them often.\n",
|
||||||
|
"\n",
|
||||||
|
"| what | where |\n",
|
||||||
|
"|---|---|\n",
|
||||||
|
"| Tracking DBs (SQLite, one per video) | `/mnt/data/projects/cupido/tracked/` |\n",
|
||||||
|
"| Target JSONs (the user-clicked reference points) | `/mnt/data/projects/cupido/targets/` |\n",
|
||||||
|
"| Source video files | `/mnt/ethoscope_data/videos/` |\n",
|
||||||
|
"| Project code (this repo) | `/home/gg/ownCloud/Work/Projects/coding/cupido/tracking/` |\n",
|
||||||
|
"| The metadata table (xlsx + TSV) | `/home/gg/ownCloud/Work/Projects/coding/cupido/all_video_info_merged.tsv` |\n",
|
||||||
|
"| Your notebooks | `notebooks/getting_started/` (this folder) |\n",
|
||||||
|
"\n",
|
||||||
|
"Let's verify a couple of these from inside Python:\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"from pathlib import Path\n",
|
||||||
|
"\n",
|
||||||
|
"tracked = Path(\"/mnt/data/projects/cupido/tracked\")\n",
|
||||||
|
"targets = Path(\"/mnt/data/projects/cupido/targets\")\n",
|
||||||
|
"\n",
|
||||||
|
"n_dbs = len(list(tracked.glob(\"*_tracking.db\")))\n",
|
||||||
|
"n_jsons = len(list(targets.glob(\"*.json\")))\n",
|
||||||
|
"\n",
|
||||||
|
"print(f\"Tracking DBs available: {n_dbs}\")\n",
|
||||||
|
"print(f\"Target JSONs available: {n_jsons}\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"You should see roughly 113 tracking DBs and 130 target JSONs. If those\n",
|
||||||
|
"numbers are zero, the storage volume isn't mounted \u2014 ask Giorgio.\n",
|
||||||
|
"\n",
|
||||||
|
"> **Note**: the tracking DBs are read-only inside the JupyterLab\n",
|
||||||
|
"> container. You can read them but not modify or delete them. That's a\n",
|
||||||
|
"> deliberate safety measure \u2014 we don't want analysis code accidentally\n",
|
||||||
|
"> corrupting the source data.\n"
|
||||||
|
]
|
||||||
|
},
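A count of zero can mean either "the folder is empty" or "the folder does not exist", and `glob` alone does not distinguish the two. The sketch below tells them apart; `count_matching` is a hypothetical helper written for this illustration, not part of the project code, and it runs against a temporary folder rather than the real storage volume.

```python
import tempfile
from pathlib import Path


def count_matching(directory, pattern):
    """Return how many entries match `pattern`, or None if the directory is missing."""
    directory = Path(directory)
    if not directory.is_dir():
        return None
    return sum(1 for _ in directory.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    # Two dummy files; only one matches the tracking-DB pattern.
    (Path(tmp) / "a_tracking.db").touch()
    (Path(tmp) / "notes.txt").touch()

    n_present = count_matching(tmp, "*_tracking.db")
    n_missing = count_matching(Path(tmp) / "does_not_exist", "*_tracking.db")

print(n_present, n_missing)
```

Getting `None` instead of `0` would point at a mount problem rather than at missing data.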
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Glossary (refer back as needed)\n",
|
||||||
|
"\n",
|
||||||
|
"- **ROI** \u2014 *region of interest*. One sub-arena inside the HD mating\n",
|
||||||
|
" arena. There are 6 ROIs per video, numbered 1\u20136.\n",
|
||||||
|
"- **fly** \u2014 one detection in a single (t, ROI) cell. Two flies in the\n",
|
||||||
|
" same ROI at the same time = two rows with the same `t`.\n",
|
||||||
|
"- **trained** \u2014 the male had a training session before testing.\n",
|
||||||
|
"- **naive** \u2014 the male is a control (no training).\n",
|
||||||
|
"- **training session** \u2014 the recording where the male meets the\n",
|
||||||
|
" non-receptive female (he gets rejected).\n",
|
||||||
|
"- **testing session** \u2014 the recording where the male meets a fresh\n",
|
||||||
|
" receptive female (we measure his courtship).\n",
|
||||||
|
"- **t (milliseconds)** \u2014 time within one session, starting at 0.\n",
|
||||||
|
"- **(x, y) pixels** \u2014 fly position in the image. Top-left is (0, 0); x\n",
|
||||||
|
" grows to the right, y grows **downward** (this is the image-coordinate\n",
|
||||||
|
" convention, opposite of math class).\n",
|
||||||
|
"- **machine_name** \u2014 which ethoscope recorded the video, e.g.\n",
|
||||||
|
" `ETHOSCOPE_076`.\n",
|
||||||
|
"- **species** \u2014 `Melanogaster/CS`, `Sechellia`, `Simulans`, `Yakuba`,\n",
|
||||||
|
" `Erecta`, `Willistoni`, or `CS`.\n",
|
||||||
|
"\n",
|
||||||
|
"If you bump into other terms in the code, ask. Don't guess \u2014 biology\n",
|
||||||
|
"codebases pick up jargon over the years.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## What's next\n",
|
||||||
|
"\n",
|
||||||
|
"When you're ready, open these notebooks **in order**:\n",
|
||||||
|
"\n",
|
||||||
|
"1. `01_python_pandas_basics.ipynb` \u2014 just enough Python and pandas to\n",
|
||||||
|
" read and manipulate tabular data.\n",
|
||||||
|
"2. `02_explore_one_database.ipynb` \u2014 open one tracking DB, plot a fly's\n",
|
||||||
|
" trajectory, see what the numbers actually look like.\n",
|
||||||
|
"3. `03_compare_trained_vs_naive.ipynb` \u2014 your first real analysis,\n",
|
||||||
|
" comparing groups of flies.\n",
|
||||||
|
"\n",
|
||||||
|
"After those, the notebooks one level up (`flies_analysis.ipynb`,\n",
|
||||||
|
"`flies_analysis_simple.ipynb`) contain the analysis pipeline that the\n",
|
||||||
|
"previous student built \u2014 those will make sense once you've worked\n",
|
||||||
|
"through the tutorials.\n",
|
||||||
|
"\n",
|
||||||
|
"Don't try to power through all of them in one sitting. Run a few cells,\n",
|
||||||
|
"read the explanation, **change a number** to see what happens, **break\n",
|
||||||
|
"something on purpose** to see the error message. That's how you learn.\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
500
notebooks/getting_started/01_python_pandas_basics.ipynb
Normal file
|
|
@ -0,0 +1,500 @@
|
||||||
|
{
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 5,
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"name": "python"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# 01 \u00b7 Python and pandas \u2014 just enough to be dangerous\n",
|
||||||
|
"\n",
|
||||||
|
"This notebook teaches the **minimum** Python and `pandas` you need to read\n",
|
||||||
|
"the rest of the project's code and write your own analyses.\n",
|
||||||
|
"\n",
|
||||||
|
"If you've never programmed before, don't try to memorize the syntax.\n",
|
||||||
|
"Just run each cell, read what it does, and come back when you're stuck on\n",
|
||||||
|
"something specific. The cheat sheet at the end is the only thing worth\n",
|
||||||
|
"keeping handy.\n",
|
||||||
|
"\n",
|
||||||
|
"External resources, in order of how much time they take:\n",
|
||||||
|
"\n",
|
||||||
|
"- \ud83e\udd98 [Python in 10 minutes (very condensed)](https://www.stavros.io/tutorials/python/)\n",
|
||||||
|
"- \ud83d\udc0d [Official Python tutorial \u2014 chapters 3\u20135](https://docs.python.org/3/tutorial/introduction.html)\n",
|
||||||
|
"- \ud83d\udc3c [pandas in 10 minutes (official)](https://pandas.pydata.org/docs/user_guide/10min.html)\n",
|
||||||
|
"- \ud83d\udcda [Python for Data Analysis (the book)](https://wesmckinney.com/book/) \u2014 free online\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 1. Variables\n",
|
||||||
|
"\n",
|
||||||
|
"A variable is a named box you put a value into. The `=` is **assignment**,\n",
|
||||||
|
"not equality. Read it as \"make `name` refer to `value`\".\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"x = 5\n",
|
||||||
|
"y = 3\n",
|
||||||
|
"total = x + y\n",
|
||||||
|
"print(total)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Re-running the cell after changing `x = 5` to `x = 50` gives a different\n",
|
||||||
|
"answer. Try it.\n",
|
||||||
|
"\n",
|
||||||
|
"Variable names may contain letters, digits, and underscores. They can't\n",
|
||||||
|
"start with a digit. Convention is `snake_case`: `mean_distance`, not\n",
|
||||||
|
"`meanDistance` or `MeanDistance`.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 2. Strings and numbers\n",
|
||||||
|
"\n",
|
||||||
|
"A **string** is text in quotes. You can join strings with `+`. You can\n",
|
||||||
|
"turn a number into a string with `str()`, and vice-versa with `int()` /\n",
|
||||||
|
"`float()`.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"species = \"Drosophila melanogaster\"\n",
|
||||||
|
"n_flies = 12\n",
|
||||||
|
"message = \"We tracked \" + str(n_flies) + \" \" + species + \" males.\"\n",
|
||||||
|
"print(message)\n",
|
||||||
|
"\n",
|
||||||
|
"# A nicer way to build strings \u2014 f-strings (note the leading 'f'):\n",
|
||||||
|
"print(f\"We tracked {n_flies} {species} males.\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 3. Lists\n",
|
||||||
|
"\n",
|
||||||
|
"A list is an ordered collection of things. Square brackets, items\n",
|
||||||
|
"separated by commas. You can mix types (but usually shouldn't).\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"machines = [\"ETHOSCOPE_076\", \"ETHOSCOPE_082\", \"ETHOSCOPE_086\"]\n",
|
||||||
|
"print(machines[0]) # first item \u2014 Python counts from 0!\n",
|
||||||
|
"print(machines[-1]) # last item\n",
|
||||||
|
"print(len(machines)) # how many items\n",
|
||||||
|
"print(machines + [\"ETHOSCOPE_140\"]) # concatenate (returns a new list)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 4. Dictionaries\n",
|
||||||
|
"\n",
|
||||||
|
"A dictionary maps **keys** to **values**. Curly braces, `key: value`\n",
|
||||||
|
"pairs.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"fly = {\"species\": \"Sechellia\", \"trained\": True, \"age_days\": 5}\n",
|
||||||
|
"print(fly[\"species\"])\n",
|
||||||
|
"print(fly[\"age_days\"])\n",
|
||||||
|
"fly[\"alive\"] = False # add a new key\n",
|
||||||
|
"print(fly)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 5. Conditions: if / elif / else\n",
|
||||||
|
"\n",
|
||||||
|
"Compare with `==` (equal), `!=` (not equal), `<`, `>`, `<=`, `>=`.\n",
|
||||||
|
"Combine with `and`, `or`, `not`.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"distance_px = 42\n",
|
||||||
|
"\n",
|
||||||
|
"if distance_px < 50:\n",
|
||||||
|
" label = \"close\"\n",
|
||||||
|
"elif distance_px < 200:\n",
|
||||||
|
" label = \"medium\"\n",
|
||||||
|
"else:\n",
|
||||||
|
" label = \"far\"\n",
|
||||||
|
"\n",
|
||||||
|
"print(label)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 6. Loops\n",
|
||||||
|
"\n",
|
||||||
|
"`for x in collection:` runs the indented block once per item.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"for m in machines:\n",
|
||||||
|
" print(f\"Looking at machine {m}\")\n",
|
||||||
|
"\n",
|
||||||
|
"# Looping with an index, when you need it:\n",
|
||||||
|
"for i, m in enumerate(machines):\n",
|
||||||
|
" print(f\"{i}: {m}\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 7. Functions\n",
|
||||||
|
"\n",
|
||||||
|
"A function is a named, reusable chunk of code. `def` declares it. `return`\n",
|
||||||
|
"sends a value back to whoever called it.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"def fly_age_in_weeks(days):\n",
|
||||||
|
" \"\"\"Return age in weeks given age in days.\"\"\"\n",
|
||||||
|
" return days / 7\n",
|
||||||
|
"\n",
|
||||||
|
"print(fly_age_in_weeks(14)) # 2.0\n",
|
||||||
|
"print(fly_age_in_weeks(5)) # 0.714\u2026\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 8. Importing libraries\n",
|
||||||
|
"\n",
|
||||||
|
"A library is somebody else's code. We use `import` to pull it into our\n",
|
||||||
|
"notebook.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import math\n",
|
||||||
|
"print(math.sqrt(16)) # 4.0\n",
|
||||||
|
"print(math.pi)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 9. Meet pandas\n",
|
||||||
|
"\n",
|
||||||
|
"Real data is rarely a single number \u2014 it's a **table** with rows and\n",
|
||||||
|
"columns (think Excel). `pandas` is the library that handles tables in\n",
|
||||||
|
"Python. The two main objects are:\n",
|
||||||
|
"\n",
|
||||||
|
"- **`Series`** \u2014 a single column with a name.\n",
|
||||||
|
"- **`DataFrame`** \u2014 a whole table.\n",
|
||||||
|
"\n",
|
||||||
|
"By convention we import pandas as `pd`. Always.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"\n",
|
||||||
|
"# Read the project's metadata TSV (Tab-Separated Values).\n",
|
||||||
|
"tsv_path = \"/home/gg/ownCloud/Work/Projects/coding/cupido/all_video_info_merged.tsv\"\n",
|
||||||
|
"df = pd.read_csv(tsv_path, sep=\"\\t\")\n",
|
||||||
|
"\n",
|
||||||
|
"# How big is it?\n",
|
||||||
|
"print(f\"Rows: {len(df)}\")\n",
|
||||||
|
"print(f\"Columns: {df.shape[1]}\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 10. Looking at the table\n",
|
||||||
|
"\n",
|
||||||
|
"`.head()` shows the first 5 rows. `.tail()` the last 5. `.columns` lists\n",
|
||||||
|
"column names. `.dtypes` shows the type of each column.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"df.head(3)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"print(\"Column names:\")\n",
|
||||||
|
"for c in df.columns:\n",
|
||||||
|
" print(f\" {c}\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 11. Selecting columns\n",
|
||||||
|
"\n",
|
||||||
|
"Two main ways to get one column: bracket-indexing (`df[\"name\"]`) or\n",
|
||||||
|
"attribute access (`df.name`). The first works for any column name; the\n",
|
||||||
|
"second only works if the name is a valid Python identifier and doesn't\n",
|
||||||
|
"clash with an existing DataFrame attribute such as `size` or `count`.\n"
|
||||||
|
]
|
||||||
|
},
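A short sketch of why bracket indexing is the safer habit: when a column name matches something a DataFrame already defines, attribute access returns the built-in attribute, not the column. The table below is made up for illustration and is not the project metadata.

```python
import pandas as pd

# Made-up table with a column that happens to be called "size".
toy = pd.DataFrame({
    "species": ["Sechellia", "Simulans", "Yakuba"],
    "size": [1.2, 1.1, 1.3],
})

by_bracket = toy["size"]   # the column, as a Series
by_attribute = toy.size    # NOT the column: the built-in element count (rows x columns)

print(type(by_bracket).__name__)
print(by_attribute)
```

Bracket indexing behaves the same for every column name, so it is a reasonable default in analysis code.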
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"df[\"species\"].head()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"df.species.value_counts() # how many rows per species\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 12. Selecting multiple columns\n",
|
||||||
|
"\n",
|
||||||
|
"Pass a **list** of names inside the brackets:\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"df[[\"machine_name\", \"roi\", \"species\", \"male\"]].head()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 13. Filtering rows\n",
|
||||||
|
"\n",
|
||||||
|
"The pattern is `df[condition]`. The condition is a Series of `True`/`False`.\n",
|
||||||
|
"Pandas keeps the rows where it's `True`.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"trained = df[df[\"male\"] == \"trained\"]\n",
|
||||||
|
"print(f\"trained rows: {len(trained)}\")\n",
|
||||||
|
"\n",
|
||||||
|
"mel_only = df[df[\"species\"] == \"Melanogaster/CS\"]\n",
|
||||||
|
"print(f\"Melanogaster/CS rows: {len(mel_only)}\")\n",
|
||||||
|
"\n",
|
||||||
|
"# Combine conditions with & (and) | (or) \u2014 and wrap each part in parentheses.\n",
|
||||||
|
"trained_mel = df[(df[\"male\"] == \"trained\") & (df[\"species\"] == \"Melanogaster/CS\")]\n",
|
||||||
|
"print(f\"trained Mel rows: {len(trained_mel)}\")\n"
|
||||||
|
]
|
||||||
|
},
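In Python, `&` and `|` bind more tightly than `==`, which is why each comparison needs its own parentheses when conditions are combined on one line. An alternative, sketched here on a made-up table, is to give each condition a name first. The column names follow the metadata table; the values are invented.

```python
import pandas as pd

toy = pd.DataFrame({
    "male": ["trained", "naive", "trained", "naive"],
    "species": ["Sechellia", "Sechellia", "Simulans", "Simulans"],
})

# Each mask is a Series of True/False, one value per row.
is_trained = toy["male"] == "trained"
is_sechellia = toy["species"] == "Sechellia"

both = toy[is_trained & is_sechellia]        # and
either = toy[is_trained | is_sechellia]      # or
neither = toy[~(is_trained | is_sechellia)]  # not

print(len(both), len(either), len(neither))
```

Named masks also make long filters easier to read and to reuse in later cells.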
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 14. Grouping and counting\n",
|
||||||
|
"\n",
|
||||||
|
"`.groupby(\"col\")` followed by an aggregator like `.size()` or `.mean()`\n",
|
||||||
|
"splits the table by the values in that column and computes something per\n",
|
||||||
|
"group.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# How many ROIs per (species, training condition)?\n",
|
||||||
|
"df.groupby([\"species\", \"male\"]).size()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 15. Quick plots\n",
|
||||||
|
"\n",
|
||||||
|
"DataFrames know how to draw themselves. Under the hood it's `matplotlib`.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import matplotlib.pyplot as plt\n",
|
||||||
|
"\n",
|
||||||
|
"# How many rows per machine?\n",
|
||||||
|
"df[\"machine_name\"].value_counts().plot(kind=\"bar\", figsize=(10, 4))\n",
|
||||||
|
"plt.title(\"Number of fly-rows per ethoscope machine\")\n",
|
||||||
|
"plt.ylabel(\"rows\")\n",
|
||||||
|
"plt.show()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## 16. Exercises\n",
|
||||||
|
"\n",
|
||||||
|
"Don't skip these. They're how you find out what you actually understood.\n",
|
||||||
|
"\n",
|
||||||
|
"1. How many rows does `df` have where `age` equals `'5-7'`?\n",
|
||||||
|
"2. Print the **unique values** of the `memory` column. (Hint: `df[\"memory\"].unique()`)\n",
|
||||||
|
"3. How many distinct `(date, machine_name)` pairs are in the dataset?\n",
|
||||||
|
" (Hint: `df.groupby([\"date\", \"machine_name\"]).size().shape`.)\n",
|
||||||
|
"4. Make a bar plot of `species` counts. Which species has the most rows?\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Try exercise 1 here\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Try exercise 2 here\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Try exercise 3 here\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Try exercise 4 here\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Cheat sheet\n",
|
||||||
|
"\n",
|
||||||
|
"```python\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"df = pd.read_csv(\"file.tsv\", sep=\"\\t\") # read\n",
|
||||||
|
"df.head(); df.tail(); df.shape; df.columns # peek\n",
|
||||||
|
"df[\"col\"]; df[[\"a\", \"b\"]] # select\n",
|
||||||
|
"df[df[\"col\"] == \"value\"] # filter\n",
|
||||||
|
"df.groupby(\"col\").size() # count per group\n",
|
||||||
|
"df.groupby(\"col\")[\"x\"].mean() # mean of x per group\n",
|
||||||
|
"df[\"col\"].value_counts() # quick counts\n",
|
||||||
|
"df[\"col\"].unique() # unique values\n",
|
||||||
|
"df[\"new_col\"] = df[\"w\"] * df[\"h\"] # derived column\n",
|
||||||
|
"df.sort_values(\"col\", ascending=False) # sort\n",
|
||||||
|
"df.plot(...) # quick plot\n",
|
||||||
|
"```\n",
|
||||||
|
"\n",
|
||||||
|
"Keep this list open when reading other people's code. Most of pandas is\n",
|
||||||
|
"just combinations of these primitives. When you need more, the official\n",
|
||||||
|
"[pandas user guide](https://pandas.pydata.org/docs/user_guide/index.html)\n",
|
||||||
|
"is excellent.\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
439
notebooks/getting_started/02_explore_one_database.ipynb
Normal file
|
|
@ -0,0 +1,439 @@
|
||||||
|
{
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 5,
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"name": "python"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# 02 \u00b7 A first look at one tracking database\n",
|
||||||
|
"\n",
|
||||||
|
"In this notebook we open **one** of the SQLite databases that the tracker\n",
|
||||||
|
"produced and look at what's actually inside. By the end you'll be able to:\n",
|
||||||
|
"\n",
|
||||||
|
"- list the tables in a `.db` file\n",
|
||||||
|
"- read one ROI's tracking trace into a DataFrame\n",
|
||||||
|
"- plot a fly's path through the arena\n",
|
||||||
|
"- count how many flies are visible at each moment\n",
|
||||||
|
"- compute a simple distance between the two flies in a ROI\n",
|
||||||
|
"\n",
|
||||||
|
"If you're curious how SQLite works, the\n",
|
||||||
|
"[SQLite Quickstart](https://www.sqlite.org/quickstart.html) is short and\n",
|
||||||
|
"worth reading. For our purposes, **SQLite is just a file that contains\n",
|
||||||
|
"several tables you can query like a DataFrame**.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Setup\n",
|
||||||
|
"\n",
|
||||||
|
"We import the libraries we need. `sqlite3` is part of Python's standard\n",
|
||||||
|
"library \u2014 no install needed.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import sqlite3\n",
|
||||||
|
"from pathlib import Path\n",
|
||||||
|
"\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"import matplotlib.pyplot as plt\n",
|
||||||
|
"import numpy as np\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Find the databases\n",
|
||||||
|
"\n",
|
||||||
|
"The DBs live at `/mnt/data/projects/cupido/tracked/`. Let's list a few.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"tracked_dir = Path(\"/mnt/data/projects/cupido/tracked\")\n",
|
||||||
|
"db_files = sorted(tracked_dir.glob(\"*_tracking.db\"))\n",
|
||||||
|
"\n",
|
||||||
|
"print(f\"Found {len(db_files)} tracking DBs.\")\n",
|
||||||
|
"print(\"\\nFirst 5 by name:\")\n",
|
||||||
|
"for db in db_files[:5]:\n",
|
||||||
|
" print(f\" {db.name}\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"The filename encodes the date, time, machine UUID, video resolution, and\n",
|
||||||
|
"the suffix `_tracking.db`. For example:\n",
|
||||||
|
"\n",
|
||||||
|
"```\n",
|
||||||
|
"2024-09-17_10-32-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged_tracking.db\n",
|
||||||
|
"\u2514\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2518\u2514\u2500\u2500\u252c\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n",
|
||||||
|
" date time machine UUID video format\n",
|
||||||
|
"```\n",
|
||||||
|
"\n",
|
||||||
|
"Pick one to explore. Feel free to change the index.\n"
|
||||||
|
]
|
||||||
|
},
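The filename layout described above can be pulled apart in code. This is a minimal sketch that assumes every file follows the pattern of the single example shown (date, time and machine UUID separated by `_`, then `__`, then the video format); `parse_db_name` is a hypothetical helper, not something the project provides.

```python
import re
from datetime import datetime


def parse_db_name(name):
    """Split a tracking-DB filename into its parts (illustrative helper)."""
    # Everything before the double underscore: date, time, machine UUID.
    prefix, _, video_part = name.partition("__")
    date_str, time_str, machine_uuid = prefix.split("_")
    recorded_at = datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H-%M-%S")

    # The video format starts with WIDTHxHEIGHT@FPSfps.
    match = re.match(r"(\d+)x(\d+)@(\d+)fps", video_part)
    if match is None:
        raise ValueError(f"unexpected video format in {name!r}")
    width, height, fps = (int(g) for g in match.groups())

    return {
        "recorded_at": recorded_at,
        "machine_uuid": machine_uuid,
        "width": width,
        "height": height,
        "fps": fps,
    }


example = (
    "2024-09-17_10-32-10_076e2825a7274661bd0697c42d6fa4c0"
    "__1920x1088@25fps-28q_merged_tracking.db"
)
info = parse_db_name(example)
print(info)
```

If a filename deviates from this layout, the helper raises an error instead of returning wrong values.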
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"db_path = db_files[0]\n",
|
||||||
|
"print(\"Working with:\", db_path.name)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Open the database\n",
|
||||||
|
"\n",
|
||||||
|
"We open it **read-only** as a safety measure. The `?mode=ro` flag is\n",
|
||||||
|
"SQLite's read-only switch.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"conn = sqlite3.connect(f\"file:{db_path}?mode=ro\", uri=True)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## What tables are inside?\n",
|
||||||
|
"\n",
|
||||||
|
"Every SQLite database has a system table called `sqlite_master` that\n",
|
||||||
|
"lists every table, index, view and trigger it contains. We can query it like any other table.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"tables = pd.read_sql_query(\n",
|
||||||
|
" \"SELECT name FROM sqlite_master WHERE type='table' ORDER BY name\", conn\n",
|
||||||
|
")\n",
|
||||||
|
"tables\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"You should see tables like `ROI_1`, `ROI_2`, \u2026, `ROI_6` (one per\n",
|
||||||
|
"sub-arena), plus housekeeping tables like `METADATA`, `ROI_MAP`,\n",
|
||||||
|
"`VAR_MAP`, `START_EVENTS`. We mostly care about the `ROI_*` ones.\n",
|
||||||
|
"\n",
|
||||||
|
"## Read one ROI\n",
|
||||||
|
"\n",
|
||||||
|
"`pd.read_sql_query()` runs an SQL query against the connection and\n",
|
||||||
|
"returns a DataFrame. The query `SELECT * FROM ROI_1` means *\"give me all\n",
|
||||||
|
"columns and all rows from the table called ROI_1\"*.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"roi1 = pd.read_sql_query(\"SELECT * FROM ROI_1\", conn)\n",
|
||||||
|
"print(f\"shape: {roi1.shape}\") # (rows, columns)\n",
|
||||||
|
"roi1.head()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Understanding the columns\n",
|
||||||
|
"\n",
|
||||||
|
"Refer back to notebook `00_welcome` for the full column reference. Quick\n",
|
||||||
|
"recap of the important ones:\n",
|
||||||
|
"\n",
|
||||||
|
"- `t`: time in **milliseconds** since the video started.\n",
|
||||||
|
"- `x`, `y`: fly position in **pixels**. The image origin (0, 0) is the\n",
|
||||||
|
" **top-left** corner. y grows downward.\n",
|
||||||
|
"- `w`, `h`: bounding-box width/height. Their product (`area = w*h`) is a\n",
|
||||||
|
" rough proxy for \"how big does this blob look\" \u2014 useful for spotting\n",
|
||||||
|
" frames where the tracker merged two flies into one big detection.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Quick descriptive stats\n",
|
||||||
|
"roi1[[\"t\", \"x\", \"y\", \"w\", \"h\"]].describe()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"The minimum `t` should be 0 (start of the video). The maximum tells you\n",
|
||||||
|
"how long the recording was. Convert ms to minutes by dividing by 60000:\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"duration_min = roi1[\"t\"].max() / 60_000\n",
|
||||||
|
"print(f\"Session length: {duration_min:.1f} minutes\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## How many flies per frame?\n",
|
||||||
|
"\n",
|
||||||
|
"If two flies are visible in this ROI, we get **two rows per `t`**. Let's\n",
|
||||||
|
"check.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"flies_per_frame = roi1.groupby(\"t\").size()\n",
|
||||||
|
"print(flies_per_frame.value_counts().sort_index())\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"The output tells you, e.g., \"100,000 frames had 2 flies visible, 30,000\n",
|
||||||
|
"had 1 fly visible\". Frames with 1 fly usually mean the two flies are\n",
|
||||||
|
"overlapping or one is occluded \u2014 that's something we'll handle properly\n",
|
||||||
|
"in the next notebook.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Plot one fly's trajectory\n",
|
||||||
|
"\n",
|
||||||
|
"We'll plot the position over the first 5 minutes (300 000 ms). For\n",
|
||||||
|
"clarity we'll only look at frames where there were 2 flies and pick the\n",
|
||||||
|
"**first** of the two (sorted by `id`) as \"fly 1\" \u2014 this is a rough\n",
|
||||||
|
"heuristic; identity tracking is harder than it sounds.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Filter to the first 5 minutes\n",
|
||||||
|
"sub = roi1[roi1[\"t\"] <= 5 * 60_000]\n",
|
||||||
|
"\n",
|
||||||
|
"# Pick \"fly 1\" by taking the first row at each time point\n",
|
||||||
|
"fly1 = sub.sort_values([\"t\", \"id\"]).drop_duplicates(\"t\", keep=\"first\")\n",
|
||||||
|
"\n",
|
||||||
|
"plt.figure(figsize=(6, 5))\n",
|
||||||
|
"plt.plot(fly1[\"x\"], fly1[\"y\"], color=\"steelblue\", linewidth=0.5, alpha=0.7)\n",
|
||||||
|
"plt.scatter(fly1[\"x\"].iloc[0], fly1[\"y\"].iloc[0], color=\"green\", label=\"start\", zorder=5)\n",
|
||||||
|
"plt.scatter(fly1[\"x\"].iloc[-1], fly1[\"y\"].iloc[-1], color=\"red\", label=\"end\", zorder=5)\n",
|
||||||
|
"plt.gca().invert_yaxis() # because pixel y grows downward\n",
|
||||||
|
"plt.xlabel(\"x (pixels)\")\n",
|
||||||
|
"plt.ylabel(\"y (pixels)\")\n",
|
||||||
|
"plt.title(f\"Fly 1 trajectory \u2014 first 5 min \u2014 {db_path.name[:30]}\u2026\")\n",
|
||||||
|
"plt.legend()\n",
|
||||||
|
"plt.axis(\"equal\")\n",
|
||||||
|
"plt.show()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"You should see a tangle of lines confined to a roughly rectangular ROI.\n",
|
||||||
|
"That tangle is the fly walking around its sub-arena.\n",
|
||||||
|
"\n",
|
||||||
|
"Notice we did `plt.gca().invert_yaxis()` \u2014 that's because in image\n",
|
||||||
|
"coordinates y grows downward, but humans expect plots where y grows\n",
|
||||||
|
"upward. Without it the plot would be vertically flipped.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Plot position over time\n",
|
||||||
|
"\n",
|
||||||
|
"A trajectory plot collapses time into \"shape on a page\". To see *when*\n",
|
||||||
|
"things happen we need time on the x-axis.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"fig, axes = plt.subplots(2, 1, figsize=(12, 5), sharex=True)\n",
|
||||||
|
"\n",
|
||||||
|
"axes[0].plot(fly1[\"t\"] / 1000, fly1[\"x\"], linewidth=0.5)\n",
|
||||||
|
"axes[0].set_ylabel(\"x (px)\")\n",
|
||||||
|
"axes[0].set_title(f\"Fly 1, ROI 1, {db_path.name[:30]}\u2026\")\n",
|
||||||
|
"\n",
|
||||||
|
"axes[1].plot(fly1[\"t\"] / 1000, fly1[\"y\"], linewidth=0.5, color=\"darkorange\")\n",
|
||||||
|
"axes[1].set_ylabel(\"y (px)\")\n",
|
||||||
|
"axes[1].set_xlabel(\"time (s)\")\n",
|
||||||
|
"axes[1].invert_yaxis()\n",
|
||||||
|
"\n",
|
||||||
|
"plt.tight_layout()\n",
|
||||||
|
"plt.show()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Bursts of variation = active fly. Long flat stretches = the fly is sitting\n",
|
||||||
|
"still. You'll come to recognize courtship vs idling by eye after a while.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Distance between the two flies\n",
|
||||||
|
"\n",
|
||||||
|
"Whenever the ROI has 2 detections at the same `t`, we can compute the\n",
|
||||||
|
"Euclidean distance between them: `sqrt((x1-x2)\u00b2 + (y1-y2)\u00b2)`.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"two_fly_frames = roi1.groupby(\"t\").filter(lambda g: len(g) == 2)\n",
|
||||||
|
"two_fly_frames = two_fly_frames.sort_values([\"t\", \"id\"])\n",
|
||||||
|
"\n",
|
||||||
|
"# Pivot so each row is one timepoint with x1, y1, x2, y2\n",
|
||||||
|
"def pair_up(g):\n",
|
||||||
|
" g = g.reset_index(drop=True)\n",
|
||||||
|
" return pd.Series({\n",
|
||||||
|
" \"x1\": g.loc[0, \"x\"], \"y1\": g.loc[0, \"y\"],\n",
|
||||||
|
" \"x2\": g.loc[1, \"x\"], \"y2\": g.loc[1, \"y\"],\n",
|
||||||
|
" })\n",
|
||||||
|
"\n",
|
||||||
|
"paired = two_fly_frames.groupby(\"t\").apply(pair_up).reset_index()\n",
|
||||||
|
"paired[\"distance_px\"] = np.hypot(paired[\"x1\"] - paired[\"x2\"], paired[\"y1\"] - paired[\"y2\"])\n",
|
||||||
|
"paired.head()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"plt.figure(figsize=(12, 4))\n",
|
||||||
|
"plt.plot(paired[\"t\"] / 1000, paired[\"distance_px\"], linewidth=0.4)\n",
|
||||||
|
"plt.xlabel(\"time (s)\")\n",
|
||||||
|
"plt.ylabel(\"inter-fly distance (px)\")\n",
|
||||||
|
"plt.title(\"Distance between the two flies in ROI 1\")\n",
|
||||||
|
"plt.show()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"This is the kind of trace that drives the rest of the analysis: a male\n",
|
||||||
|
"courting a female stays close (small distance); a male giving up wanders\n",
|
||||||
|
"off (large distance). The shape of this curve is the behavioural readout.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Don't forget to close the connection\n",
|
||||||
|
"\n",
|
||||||
|
"If you opened a connection, close it when you're done. (Not strictly\n",
|
||||||
|
"necessary in a notebook \u2014 Python tidies up \u2014 but a good habit.)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"conn.close()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Exercises\n",
|
||||||
|
"\n",
|
||||||
|
"1. Pick a different DB (change `db_files[0]` to `db_files[10]` for example)\n",
|
||||||
|
" and re-run the trajectory plot. Is the arena bigger / smaller? Why\n",
|
||||||
|
" might that be? (Hint: look at the resolution part of the filename.)\n",
|
||||||
|
"2. Plot the distance trace for **ROI 4** instead of ROI 1.\n",
|
||||||
|
"3. Compute the **percentage of frames** in ROI 1 that had only 1 fly visible.\n",
|
||||||
|
"4. The `area = w * h` column is a useful diagnostic. Plot `area` vs `t`\n",
|
||||||
|
" for fly 1 \u2014 when does the bounding box get unusually large?\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Exercise space\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
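A side note on the connection handling in `02_explore_one_database.ipynb`: building the URI with an f-string works for the filenames shown, but a path containing `?`, `#` or `%` would be misparsed, and a cell that raises before `conn.close()` leaves the connection open. The sketch below is one possible alternative, not the notebook's own code. The helper name `open_readonly` and the throwaway demo database are hypothetical, introduced only for illustration. It percent-encodes the path with `Path.as_uri()` and relies on `contextlib.closing` to close the connection.

```python
import sqlite3
import tempfile
from contextlib import closing
from pathlib import Path


def open_readonly(db_path):
    """Open an SQLite file read-only (hypothetical helper, not project code)."""
    # as_uri() needs an absolute path and percent-encodes special characters.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


# Build a tiny demo database so the sketch runs without the real tracking DBs.
tmp_dir = tempfile.mkdtemp()
demo_db = Path(tmp_dir) / "demo_tracking.db"
with closing(sqlite3.connect(demo_db)) as rw:
    rw.execute("CREATE TABLE ROI_1 (id INTEGER, t INTEGER, x INTEGER, y INTEGER)")
    rw.execute("INSERT INTO ROI_1 VALUES (1, 0, 10, 20)")
    rw.commit()

# closing() guarantees conn.close() runs even if a query raises.
with closing(open_readonly(demo_db)) as conn:
    rows = conn.execute("SELECT * FROM ROI_1").fetchall()
    try:
        conn.execute("INSERT INTO ROI_1 VALUES (2, 0, 30, 40)")
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True  # a read-only connection refuses writes
```

In the notebook this would replace both the `sqlite3.connect(...)` cell and the final `conn.close()` cell, at the cost of keeping all queries inside one `with` block.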
398
notebooks/getting_started/03_compare_trained_vs_naive.ipynb
Normal file
|
|
@ -0,0 +1,398 @@
|
||||||
|
{
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 5,
|
||||||
|
"metadata": {
|
||||||
|
"kernelspec": {
|
||||||
|
"display_name": "Python 3",
|
||||||
|
"language": "python",
|
||||||
|
"name": "python3"
|
||||||
|
},
|
||||||
|
"language_info": {
|
||||||
|
"name": "python"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"# 03 \u00b7 Your first real analysis: trained vs naive\n",
|
||||||
|
"\n",
|
||||||
|
"In notebook 02 we explored a single database. Now we'll work with **all\n",
|
||||||
|
"of them at once**, compute a simple per-fly metric, and ask the central\n",
|
||||||
|
"question of the project:\n",
|
||||||
|
"\n",
|
||||||
|
"> **Do trained males behave differently from na\u00efve males in the testing\n",
|
||||||
|
"> session?**\n",
|
||||||
|
"\n",
|
||||||
|
"By the end you'll have:\n",
|
||||||
|
"\n",
|
||||||
|
"- loaded every (fly, session) trace into one big DataFrame using the\n",
|
||||||
|
" project's helper function;\n",
|
||||||
|
"- reduced each trace to one number per fly (the *median inter-fly\n",
|
||||||
|
" distance*);\n",
|
||||||
|
"- compared the trained group against the na\u00efve group with a histogram\n",
|
||||||
|
" and a non-parametric statistical test;\n",
|
||||||
|
"- learnt enough to start asking your own questions.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Setup\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"import sys\n",
|
||||||
|
"from pathlib import Path\n",
|
||||||
|
"\n",
|
||||||
|
"import numpy as np\n",
|
||||||
|
"import pandas as pd\n",
|
||||||
|
"import matplotlib.pyplot as plt\n",
|
||||||
|
"from scipy import stats\n",
|
||||||
|
"\n",
|
||||||
|
"# Tell Python where to find the project's helper modules.\n",
|
||||||
|
"PROJECT_ROOT = Path(\"..\").resolve().parent # this notebook is in notebooks/getting_started/\n",
|
||||||
|
"sys.path.insert(0, str(PROJECT_ROOT / \"scripts\"))\n",
|
||||||
|
"\n",
|
||||||
|
"from load_roi_data import load_roi_data\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Loading everything at once \u2014 but carefully\n",
|
||||||
|
"\n",
|
||||||
|
"`load_roi_data()` opens every tracking DB referenced by the metadata TSV\n",
|
||||||
|
"and returns one big DataFrame. **It can be slow and memory-hungry**\n",
|
||||||
|
"(the full batch is ~200 million rows). Always start small.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Load the metadata TSV first \u2014 it's small and fast.\n",
|
||||||
|
"tsv_path = \"/home/gg/ownCloud/Work/Projects/coding/cupido/all_video_info_merged.tsv\"\n",
|
||||||
|
"meta = pd.read_csv(tsv_path, sep=\"\\t\")\n",
|
||||||
|
"print(f\"metadata rows: {len(meta)}\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Pre-filter the metadata before passing it to `load_roi_data`. We'll keep\n",
|
||||||
|
"**just one species** for now and narrow down to **the testing sessions** after loading, because:\n",
|
||||||
|
"\n",
|
||||||
|
"1. mixing species is a confound (different species behave differently);\n",
|
||||||
|
"2. the question is about behaviour after training, so the testing session\n",
|
||||||
|
" is the relevant one;\n",
|
||||||
|
"3. starting small means we can iterate quickly.\n",
|
||||||
|
"\n",
|
||||||
|
"You can come back later and broaden this filter.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Pick one species. 'Melanogaster/CS' has the most rows (127), so a good default.\n",
|
||||||
|
"sub = meta[meta[\"species\"] == \"Melanogaster/CS\"].copy()\n",
|
||||||
|
"\n",
|
||||||
|
"# We're loading every session for these flies, but the loader stamps each\n",
|
||||||
|
"# row with a 'session' column so we can filter to testing afterwards.\n",
|
||||||
|
"print(f\"selected metadata rows: {len(sub)}\")\n",
|
||||||
|
"print(sub[\"male\"].value_counts())\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# This will take a minute or two and use a chunk of RAM. Be patient.\n",
|
||||||
|
"data = load_roi_data(sub)\n",
|
||||||
|
"print(f\"loaded shape: {data.shape}\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## What did we get?\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"data.head(3)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# How big is each session, in tracking samples?\n",
|
||||||
|
"data.groupby([\"session\", \"male\"]).size().unstack(fill_value=0)\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Restrict to the testing session\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"testing = data[data[\"session\"] == \"testing\"].copy()\n",
|
||||||
|
"print(f\"testing samples: {len(testing):,}\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Reduce each trace to one number\n",
|
||||||
|
"\n",
|
||||||
|
"Right now each fly contributes **tens of thousands** of (t, x, y) rows.\n",
|
||||||
|
"Frames from the same fly are strongly correlated, so pooling millions of\n",
|
||||||
|
"them per group would not give independent samples. Instead we **collapse each (date, machine_name, ROI)\n",
|
||||||
|
"trace into a single summary number** \u2014 here, the median distance between\n",
|
||||||
|
"the two flies during testing.\n",
|
||||||
|
"\n",
|
||||||
|
"Why median rather than mean? Because tracker glitches (one fly\n",
|
||||||
|
"temporarily lost) can produce huge spikes that the median ignores.\n",
|
||||||
|
"[Why medians beat means in noisy data\n",
|
||||||
|
"(2-min read)](https://en.wikipedia.org/wiki/Median#Robustness).\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Step 1 \u2014 per-frame distance.\n",
|
||||||
|
"# Take only frames with exactly 2 flies (so we have a real distance).\n",
|
||||||
|
"two_fly = testing.groupby([\"date\", \"machine_name\", \"ROI\", \"t\"]).filter(lambda g: len(g) == 2)\n",
|
||||||
|
"\n",
|
||||||
|
"# For each (track, t), compute the distance between the two rows.\n",
|
||||||
|
"def distance_for_frame(g):\n",
|
||||||
|
" g = g.sort_values(\"id\").reset_index(drop=True)\n",
|
||||||
|
" return np.hypot(g.loc[0, \"x\"] - g.loc[1, \"x\"], g.loc[0, \"y\"] - g.loc[1, \"y\"])\n",
|
||||||
|
"\n",
|
||||||
|
"# This is the slow step. With ~3 M frames it takes a while.\n",
|
||||||
|
"per_frame = (\n",
|
||||||
|
" two_fly\n",
|
||||||
|
" .groupby([\"date\", \"machine_name\", \"ROI\", \"t\", \"male\"])\n",
|
||||||
|
" .apply(distance_for_frame)\n",
|
||||||
|
" .reset_index(name=\"distance_px\")\n",
|
||||||
|
")\n",
|
||||||
|
"print(f\"per-frame distance rows: {len(per_frame):,}\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"# Step 2 \u2014 one number per (date, machine_name, ROI).\n",
|
||||||
|
"per_fly = (\n",
|
||||||
|
" per_frame\n",
|
||||||
|
" .groupby([\"date\", \"machine_name\", \"ROI\", \"male\"])[\"distance_px\"]\n",
|
||||||
|
" .median()\n",
|
||||||
|
" .reset_index(name=\"median_distance_px\")\n",
|
||||||
|
")\n",
|
||||||
|
"\n",
|
||||||
|
"# Each row is now one (date, machine_name, ROI) track during testing (one male per ROI), with its median distance.\n",
|
||||||
|
"print(per_fly.shape)\n",
|
||||||
|
"per_fly.head()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Sanity check: how many flies per group?\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"per_fly[\"male\"].value_counts()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"If the two groups are very unbalanced, the power of the comparison is\n",
|
||||||
|
"limited by the smaller group. Note the counts down.\n",
|
||||||
|
"\n",
|
||||||
|
"## Plot the distributions\n",
|
||||||
|
"\n",
|
||||||
|
"The first thing to do with two groups is to **look at them**. Don't trust\n",
|
||||||
|
"a p-value before you've seen the histogram.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"fig, ax = plt.subplots(figsize=(10, 5))\n",
|
||||||
|
"\n",
|
||||||
|
"bins = np.linspace(0, per_fly[\"median_distance_px\"].max(), 40)\n",
|
||||||
|
"\n",
|
||||||
|
"for label, color in [(\"trained\", \"steelblue\"), (\"naive\", \"darkorange\")]:\n",
|
||||||
|
"    vals = per_fly[per_fly[\"male\"] == label][\"median_distance_px\"]\n",
|
||||||
|
"    ax.hist(vals, bins=bins, alpha=0.6, label=f\"{label} (n={len(vals)})\", color=color)\n",
|
||||||
|
"\n",
|
||||||
|
"ax.set_xlabel(\"median inter-fly distance during testing (px)\")\n",
|
||||||
|
"ax.set_ylabel(\"number of flies\")\n",
|
||||||
|
"ax.set_title(\"Trained vs na\u00efve \u2014 Melanogaster/CS \u2014 testing session\")\n",
|
||||||
|
"ax.legend()\n",
|
||||||
|
"plt.show()\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"**What you might see:**\n",
|
||||||
|
"\n",
|
||||||
|
"- If the trained group's distribution is shifted to **higher** distances,\n",
|
||||||
|
" trained males are spending less time near the female (i.e. they\n",
|
||||||
|
" learned to give up).\n",
|
||||||
|
"- If the two distributions look identical, no learning effect was\n",
|
||||||
|
" measurable with this metric \u2014 but that doesn't mean there's no effect,\n",
|
||||||
|
" just that this particular summary didn't capture it.\n",
|
||||||
|
"- A **bimodal** trained distribution (two humps) would mean some males\n",
|
||||||
|
" learned and others didn't \u2014 the \"individual differences\" story in\n",
|
||||||
|
" `docs/bimodal_hypothesis.md`.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## Add a stat test\n",
|
||||||
|
"\n",
|
||||||
|
"A formal comparison. Because group sizes are small and we don't know if\n",
|
||||||
|
"the data are normally distributed, the\n",
|
||||||
|
"[Mann-Whitney U test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test)\n",
|
||||||
|
"is a safer default than the classic t-test.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"metadata": {},
|
||||||
|
"execution_count": null,
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"trained_vals = per_fly[per_fly[\"male\"] == \"trained\"][\"median_distance_px\"]\n",
|
||||||
|
"naive_vals = per_fly[per_fly[\"male\"] == \"naive\"][\"median_distance_px\"]\n",
|
||||||
|
"\n",
|
||||||
|
"stat, pvalue = stats.mannwhitneyu(trained_vals, naive_vals, alternative=\"two-sided\")\n",
|
||||||
|
"\n",
|
||||||
|
"print(f\"trained median: {trained_vals.median():.1f} px (n={len(trained_vals)})\")\n",
|
||||||
|
"print(f\"naive median: {naive_vals.median():.1f} px (n={len(naive_vals)})\")\n",
|
||||||
|
"print(f\"Mann-Whitney U: {stat:.0f} p-value: {pvalue:.4f}\")\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"**How to read this**: the p-value is the probability of seeing a\n",
|
||||||
|
"difference at least this big *if there were really no difference*. By\n",
|
||||||
|
"convention p < 0.05 is \"interesting\", p < 0.01 is \"fairly convincing\".\n",
|
||||||
|
"But never trust a p-value without:\n",
|
||||||
|
"\n",
|
||||||
|
"1. eyeballing the histogram first (you did);\n",
|
||||||
|
"2. reporting the **effect size**, not just the p-value (e.g. the\n",
|
||||||
|
" difference of medians);\n",
|
||||||
|
"3. understanding that p-values\n",
|
||||||
|
" [say nothing about practical importance](https://www.nature.com/articles/d41586-019-00857-9).\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## What's next?\n",
|
||||||
|
"\n",
|
||||||
|
"- **Pick a different metric**: instead of median distance, try fraction\n",
|
||||||
|
" of time the flies were within 50 px (a \"close-proximity\" metric), or\n",
|
||||||
|
" the maximum velocity per fly. (Velocity needs identity tracking, which\n",
|
||||||
|
" is harder \u2014 see `flies_analysis_simple.ipynb` cell 16 for an example.)\n",
|
||||||
|
"- **Look at it per species**: re-run with `species == \"Sechellia\"` and\n",
|
||||||
|
" compare. Does the effect generalize? Where is it strongest?\n",
|
||||||
|
"- **Look at the bimodality**: a kernel density plot\n",
|
||||||
|
" ([seaborn.kdeplot](https://seaborn.pydata.org/generated/seaborn.kdeplot.html))\n",
|
||||||
|
" will show humps better than a histogram.\n",
|
||||||
|
"- **Time inside the session**: maybe the difference only shows up in the\n",
|
||||||
|
" first few minutes (right after the female is introduced). Slice\n",
|
||||||
|
" `per_frame` by `t` before aggregating.\n",
|
||||||
|
"- **Consult `docs/bimodal_hypothesis.md`**: it lays out a formal plan for\n",
|
||||||
|
" testing the \"some flies learn, others don't\" hypothesis.\n",
|
||||||
|
"\n",
|
||||||
|
"When you write your own analysis, **save it as a new notebook** (don't\n",
|
||||||
|
"edit this one). Copy the setup cells, change the question, change the\n",
|
||||||
|
"plot. That's how analysis projects grow.\n"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"## A note on iteration speed\n",
|
||||||
|
"\n",
|
||||||
|
"The pipeline above is correct but **slow** because we apply a Python\n",
|
||||||
|
"function to every (track, t) group. If you find yourself re-running the\n",
|
||||||
|
"same expensive computation a lot, save the intermediate result to disk:\n",
|
||||||
|
"\n",
|
||||||
|
"```python\n",
|
||||||
|
"per_frame.to_parquet(\"per_frame_distance.parquet\")\n",
|
||||||
|
"# next time:\n",
|
||||||
|
"per_frame = pd.read_parquet(\"per_frame_distance.parquet\")\n",
|
||||||
|
"```\n",
|
||||||
|
"\n",
|
||||||
|
"`parquet` is a fast columnar format. `pip install pyarrow` if your\n",
|
||||||
|
"environment doesn't have it.\n",
|
||||||
|
"\n",
|
||||||
|
"There are also vectorized ways to compute these distances ~100\u00d7 faster\n",
|
||||||
|
"that avoid `groupby().apply()`. Don't worry about that yet \u2014 get a\n",
|
||||||
|
"correct answer first, optimize only if you find yourself waiting.\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
]
|
||||||
|
}
|
||||||
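On the closing remark of `03_compare_trained_vs_naive.ipynb` about vectorized alternatives to `groupby().apply()`: the sketch below shows one way this could look. It assumes the same columns the notebook uses (`date`, `machine_name`, `ROI`, `t`, `male`, `id`, `x`, `y`). The function name `pairwise_distance` and the toy data are hypothetical. The idea is to count detections per frame with `transform("size")`, keep frames with exactly two rows, sort so each frame's two rows are adjacent, and then subtract the even-positioned rows from the odd-positioned ones in a single array operation.

```python
import numpy as np
import pandas as pd

FRAME_KEYS = ["date", "machine_name", "ROI", "t", "male"]


def pairwise_distance(df, keys=FRAME_KEYS):
    """Per-frame distance between two detections, without groupby().apply()."""
    # Number of detections in each frame, broadcast back onto every row.
    n_rows = df.groupby(keys)["x"].transform("size")
    # Keep two-detection frames; sorting makes each frame's rows adjacent,
    # ordered by id within the frame.
    two = df[n_rows == 2].sort_values(keys + ["id"])
    a = two.iloc[0::2].reset_index(drop=True)  # lower id of each frame
    b = two.iloc[1::2].reset_index(drop=True)  # higher id of each frame
    out = a[keys].copy()
    out["distance_px"] = np.hypot(a["x"] - b["x"], a["y"] - b["y"])
    return out


# Toy data: t=0 and t=80 have two detections, t=40 has only one.
toy = pd.DataFrame({
    "date": ["2024-09-17"] * 5,
    "machine_name": ["E1"] * 5,
    "ROI": [1] * 5,
    "male": ["trained"] * 5,
    "t": [0, 0, 40, 80, 80],
    "id": [2, 1, 1, 1, 2],
    "x": [3.0, 0.0, 5.0, 10.0, 16.0],
    "y": [4.0, 0.0, 5.0, 10.0, 18.0],
})
per_frame = pairwise_distance(toy)
```

The result has the same columns as the notebook's `per_frame`, so the per-track median step could be applied to it unchanged. It is worth checking the two versions against each other on a small subset before relying on the faster one.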
15
notebooks/getting_started/README.md
Normal file
|
|
@ -0,0 +1,15 @@
|
||||||
|
# Tutorial notebooks

Read these in order:

1. **`00_welcome.ipynb`** — what's the project, where the data lives,
   how to use a Jupyter notebook.
2. **`01_python_pandas_basics.ipynb`** — minimum Python and pandas you
   need to read project code.
3. **`02_explore_one_database.ipynb`** — open one tracking DB, plot a
   trajectory, compute a single distance.
4. **`03_compare_trained_vs_naive.ipynb`** — first real analysis,
   comparing groups.

After these, the notebooks one level up (`flies_analysis*.ipynb`) walk
through the full analysis pipeline that the previous student built.
|
||||||
11
requirements-tracking.txt
Normal file
|
|
@ -0,0 +1,11 @@
|
||||||
|
# Extra dependencies needed only for the offline-tracking pipeline
|
||||||
|
# (build_video_inventory.py, pick_targets.py, auto_detect_targets.py,
|
||||||
|
# track_videos.py). Not needed for the existing analysis notebooks.
|
||||||
|
#
|
||||||
|
# install with: pip install -r requirements-tracking.txt
|
||||||
|
opencv-python>=4.8
|
||||||
|
openpyxl>=3.1
|
||||||
|
gitpython>=3.1
|
||||||
|
netifaces>=0.11
|
||||||
|
mysql-connector-python>=8.0
|
||||||
|
pyserial>=3.5
|
||||||
119
scripts/auto_detect_targets.py
Normal file
|
|
@ -0,0 +1,119 @@
|
||||||
|
"""Try auto-detection of L-shape targets on each video and save JSON sidecars.
|
||||||
|
|
||||||
|
Useful for:
|
||||||
|
- videos that DO have visible black-circle targets (saves manual clicks);
|
||||||
|
- as a smoke test of the whole pipeline before running the picker.
|
||||||
|
|
||||||
|
Failure is silent: when auto-detection fails on a video, no JSON sidecar is
|
||||||
|
written, which leaves that video for the manual `pick_targets.py` tool.
|
||||||
|
|
||||||
|
Output JSON has the same shape as the manual picker's so `track_videos.py`
|
||||||
|
can consume either.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import datetime as dt
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import cv2
|
||||||
|
import numpy as np
|
||||||
|
import pandas as pd
|
||||||
|
|
||||||
|
# ethoscope source tree
|
||||||
|
sys.path.insert(0, "/home/gg/Code/ethoscope_project/ethoscope/src/ethoscope")
|
||||||
|
|
||||||
|
from config import INVENTORY_CSV, TARGETS_DIR # noqa: E402
|
||||||
|
|
||||||
|
from ethoscope.roi_builders.target_roi_builder import TargetGridROIBuilder # noqa: E402
|
||||||
|
|
||||||
|
|
||||||
|
def detect_one(video_path: Path, frame_idx: int) -> tuple[list[list[int]], int] | None:
    """Run ethoscope target detection on one frame; return (points, frame_idx) or None."""
    cap = cv2.VideoCapture(str(video_path))
    if not cap.isOpened():
        return None
    n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if n > 0 and frame_idx >= n:
        frame_idx = max(0, n - 1)
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
    ok, frame = cap.read()
    cap.release()
    if not ok or frame is None:
        return None

    # The detector expects a single-channel image (grey) like ethoscope cameras produce.
    if frame.ndim == 3:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    else:
        gray = frame

    # We don't actually need a fully-configured grid here — _find_target_coordinates
    # alone gives us the 3 reference points.
    builder = TargetGridROIBuilder(n_rows=2, n_cols=3)
    try:
        ref = builder._find_target_coordinates(gray)
    except Exception as e:
        logging.debug(f"detection failed for {video_path.name}: {e}")
        return None
    if ref is None:
        return None
    return [[int(p[0]), int(p[1])] for p in ref], frame_idx
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(description=__doc__)
|
||||||
|
parser.add_argument("--frame", type=int, default=125)
|
||||||
|
parser.add_argument("--limit", type=int, default=None)
|
||||||
|
parser.add_argument("--video", type=str, default=None,
|
||||||
|
help="run on a single video path (skips inventory)")
|
||||||
|
parser.add_argument("--overwrite", action="store_true",
|
||||||
|
help="overwrite existing JSON sidecars")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
TARGETS_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
if args.video:
|
||||||
|
videos = [Path(args.video)]
|
||||||
|
else:
|
||||||
|
if not INVENTORY_CSV.exists():
|
||||||
|
sys.exit("Inventory missing — run build_video_inventory.py first.")
|
||||||
|
inv = pd.read_csv(INVENTORY_CSV)
|
||||||
|
todo = inv[inv["in_xlsx"] & ~inv["already_tracked"]]
|
||||||
|
videos = [Path(p) for p in todo["mp4_path"].tolist()]
|
||||||
|
if args.limit:
|
||||||
|
videos = videos[: args.limit]
|
||||||
|
|
||||||
|
n_ok = n_fail = n_skip = 0
|
||||||
|
for v in videos:
|
||||||
|
out = TARGETS_DIR / f"{v.stem}.json"
|
||||||
|
if out.exists() and not args.overwrite:
|
||||||
|
n_skip += 1
|
||||||
|
continue
|
||||||
|
result = detect_one(v, args.frame)
|
||||||
|
if result is None:
|
||||||
|
n_fail += 1
|
||||||
|
print(f" fail: {v.name}")
|
||||||
|
continue
|
||||||
|
points, used_frame = result
|
||||||
|
out.write_text(json.dumps({
|
||||||
|
"video_path": str(v),
|
||||||
|
"frame_index": int(used_frame),
|
||||||
|
"reference_points": points,
|
||||||
|
"order": ["top", "corner", "left"],
|
||||||
|
"picked_at": dt.datetime.now().isoformat(timespec="seconds"),
|
||||||
|
"method": "auto",
|
||||||
|
}, indent=2))
|
||||||
|
n_ok += 1
|
||||||
|
print(f" ok: {v.name} → {points}")
|
||||||
|
|
||||||
|
print(f"\nDone. ok={n_ok} fail={n_fail} skipped(existing)={n_skip}")
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
logging.basicConfig(level=logging.WARNING, format="%(levelname)s %(message)s")
|
||||||
|
main()
|
||||||
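The sidecar written by `auto_detect_targets.py` is meant to be interchangeable with the manual picker's output. As a minimal sketch of how a consumer might read one back, assuming only the keys visible in the `json.dumps` call of that script: the helper name `load_targets` and the demo coordinates are hypothetical and not part of this change set.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path

# Keys written by auto_detect_targets.py; a consumer can fail fast if any is absent.
REQUIRED_KEYS = {"video_path", "frame_index", "reference_points", "order"}


def load_targets(sidecar: Path) -> dict[str, list[int]]:
    """Hypothetical helper: map each named target to its [x, y] point."""
    payload = json.loads(sidecar.read_text())
    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        raise ValueError(f"{sidecar.name}: missing keys {sorted(missing)}")
    points, order = payload["reference_points"], payload["order"]
    if len(points) != 3 or len(order) != 3:
        raise ValueError(f"{sidecar.name}: expected exactly 3 reference points")
    return {name: [int(p[0]), int(p[1])] for name, p in zip(order, points)}


# Round-trip demo with made-up coordinates.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.json"
    demo.write_text(json.dumps({
        "video_path": "/tmp/demo.mp4",
        "frame_index": 125,
        "reference_points": [[900, 50], [40, 600], [40, 50]],
        "order": ["top", "corner", "left"],
        "method": "auto",
    }))
    targets = load_targets(demo)

print(targets)
```

Keying the points by the `order` field rather than by position means a consumer does not depend on the list order staying fixed.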
150
scripts/build_video_inventory.py
Normal file
@@ -0,0 +1,150 @@
"""Build an inventory of videos available on disk and join with the metadata xlsx.

Scans /mnt/ethoscope_data/videos/<uuid>/<machine_name>/<date_time>/*.mp4
and produces a CSV mapping each (date, machine_name) row in
all_video_info_merged.xlsx to the corresponding merged.mp4 path on disk.

Output: data/metadata/video_inventory.csv with columns:
    machine_uuid, machine_name, session_date, session_time, session_datetime,
    mp4_path, in_xlsx (bool), already_tracked (bool)
"""

from __future__ import annotations

import re
from pathlib import Path

import pandas as pd

from config import DATA_RAW, INVENTORY_CSV, VIDEO_INFO_XLSX, VIDEOS_ROOT

SESSION_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})_(\d{2}-\d{2}-\d{2})$")


def scan_videos(videos_root: Path) -> pd.DataFrame:
    """Walk videos_root and return one row per merged.mp4 found.

    Args:
        videos_root: Root directory containing <uuid>/<machine_name>/<date_time>/.

    Returns:
        DataFrame with columns: machine_uuid, machine_name, session_date,
        session_time, session_datetime, mp4_path.
    """
    rows = []
    for uuid_dir in sorted(videos_root.iterdir()):
        if not uuid_dir.is_dir():
            continue
        for machine_dir in uuid_dir.iterdir():
            if not machine_dir.is_dir() or not machine_dir.name.startswith("ETHOSCOPE_"):
                continue
            for session_dir in machine_dir.iterdir():
                if not session_dir.is_dir():
                    continue
                m = SESSION_RE.match(session_dir.name)
                if not m:
                    continue
                date_str, time_str = m.group(1), m.group(2)
                # Prefer *_merged.mp4 if present
                merged = sorted(session_dir.glob("*_merged.mp4"))
                if not merged:
                    merged = sorted(session_dir.glob("*.mp4"))
                if not merged:
                    continue
                rows.append(
                    {
                        "machine_uuid": uuid_dir.name,
                        "machine_name": machine_dir.name,
                        "session_date": date_str,
                        "session_time": time_str,
                        "session_datetime": f"{date_str}_{time_str}",
                        "mp4_path": str(merged[0]),
                    }
                )
    return pd.DataFrame(rows)


def already_tracked_set(data_raw: Path) -> set[tuple[str, str]]:
    """Return the set of (date, time) sessions for which a tracking DB exists.

    DBs are named like:
        2025-07-15_16-03-10_<uuid>__1920x1088@25fps-28q_merged_tracking.db
    """
    out = set()
    for db in data_raw.glob("*_tracking.db"):
        m = re.match(r"^(\d{4}-\d{2}-\d{2})_(\d{2}-\d{2}-\d{2})_", db.name)
        if m:
            out.add((m.group(1), m.group(2)))
    return out


def main() -> None:
    print(f"Scanning {VIDEOS_ROOT} ...")
    videos_df = scan_videos(VIDEOS_ROOT)
    print(f" found {len(videos_df)} video sessions on disk")

    print(f"Loading metadata xlsx: {VIDEO_INFO_XLSX}")
    meta = pd.read_excel(VIDEO_INFO_XLSX)
    # pd.to_datetime keeps this working even if the date column is read as text
    # (export_video_db_index.py parses the same column the same way).
    meta["session_date"] = pd.to_datetime(meta["date"]).dt.strftime("%Y-%m-%d")

    # The xlsx has one row per (date, machine, ROI) — collapse to unique sessions
    meta_sessions = (
        meta[["session_date", "machine_name"]].drop_duplicates().reset_index(drop=True)
    )
    print(f" xlsx contains {len(meta_sessions)} unique (date, machine) sessions")

    # Mark which video sessions are referenced by the xlsx
    xlsx_keys = set(zip(meta_sessions["session_date"], meta_sessions["machine_name"]))
    videos_df["in_xlsx"] = videos_df.apply(
        lambda r: (r["session_date"], r["machine_name"]) in xlsx_keys, axis=1
    )

    # Mark which already have tracking DBs in data/raw/
    tracked = already_tracked_set(DATA_RAW)
    videos_df["already_tracked"] = videos_df.apply(
        lambda r: (r["session_date"], r["session_time"]) in tracked, axis=1
    )

    INVENTORY_CSV.parent.mkdir(parents=True, exist_ok=True)
    videos_df.sort_values(["session_date", "machine_name", "session_time"]).to_csv(
        INVENTORY_CSV, index=False
    )

    # Coverage report
    in_xlsx = videos_df["in_xlsx"]
    needed = videos_df[in_xlsx & ~videos_df["already_tracked"]]
    n_xlsx_sessions = len(meta_sessions)
    n_with_video = videos_df[in_xlsx].drop_duplicates(
        ["session_date", "machine_name"]
    ).shape[0]

    # xlsx sessions that have no video on disk
    found_keys = set(
        zip(
            videos_df.loc[in_xlsx, "session_date"],
            videos_df.loc[in_xlsx, "machine_name"],
        )
    )
    missing = sorted(xlsx_keys - found_keys)

    print()
    print("=" * 70)
    print(f"Wrote inventory: {INVENTORY_CSV}")
    print(f" total video sessions on disk: {len(videos_df)}")
    print(f" xlsx unique sessions: {n_xlsx_sessions}")
    print(f" xlsx sessions with video: {n_with_video}")
    print(f" xlsx sessions missing video: {len(missing)}")
    print(f" already tracked (DB exists): {videos_df['already_tracked'].sum()}")
    print(f" TO TRACK (in_xlsx & ~tracked, video instances): {len(needed)}")

    if missing:
        print()
        print("xlsx sessions with NO matching video on disk:")
        for d, m in missing[:20]:
            print(f" {d} {m}")
        if len(missing) > 20:
            print(f" ... and {len(missing) - 20} more")


if __name__ == "__main__":
    main()
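The inventory relies on two naming conventions: session directories named `<date>_<time>`, and tracking DBs whose file name starts with the same stamp. The sketch below reuses the two regular expressions from `build_video_inventory.py` on literal strings to show which keys they yield. The helper names `session_key` and `db_key` are hypothetical, and the DB name is the example from the docstring with its `<uuid>` placeholder left unfilled.

```python
from __future__ import annotations

import re

# Same patterns as build_video_inventory.py.
SESSION_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})_(\d{2}-\d{2}-\d{2})$")
DB_PREFIX_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})_(\d{2}-\d{2}-\d{2})_")


def session_key(dirname: str) -> tuple[str, str] | None:
    """(date, time) for a session directory name, or None if it does not match."""
    m = SESSION_RE.match(dirname)
    return (m.group(1), m.group(2)) if m else None


def db_key(db_name: str) -> tuple[str, str] | None:
    """(date, time) taken from the leading stamp of a tracking DB file name."""
    m = DB_PREFIX_RE.match(db_name)
    return (m.group(1), m.group(2)) if m else None


db_name = "2025-07-15_16-03-10_<uuid>__1920x1088@25fps-28q_merged_tracking.db"
print(session_key("2025-07-15_16-03-10"))  # matches the directory convention
print(db_key(db_name))                     # same key, so the session counts as tracked
print(session_key("notes"))                # non-session directories are skipped
```

Note that the `already_tracked` key is (date, time) without the machine name, so two machines starting a recording in the same second would share a key. Whether that can occur in this dataset is not stated in the source.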
|
|
@ -1,117 +1,99 @@
|
||||||
import pandas as pd
|
"""Compute per-frame inter-fly distances for every (date, machine, ROI, session).
|
||||||
|
|
||||||
|
Reads tracking data via :func:`load_roi_data.load_roi_data` (which is driven
|
||||||
|
by ``all_video_info_merged.tsv``) and produces one distances DataFrame
|
||||||
|
spanning every fly/session in the batch. Group membership (``trained`` /
|
||||||
|
``naive``) is preserved from the ``male`` column.
|
||||||
|
"""
|
||||||
|
|
||||||
import numpy as np
|
import numpy as np
|
||||||
|
import pandas as pd
|
||||||
from scipy.spatial.distance import euclidean
|
from scipy.spatial.distance import euclidean
|
||||||
|
|
||||||
from config import DATA_PROCESSED
|
from config import DATA_PROCESSED
|
||||||
|
from load_roi_data import load_roi_data
|
||||||
|
|
||||||
|
|
||||||
def calculate_fly_distances(trained_file=None, untrained_file=None):
|
def calculate_fly_distances(data: pd.DataFrame | None = None) -> pd.DataFrame:
|
||||||
"""Calculate distances between flies at each time point.
|
"""Compute inter-fly distances over time for every fly/session.
|
||||||
|
|
||||||
For each time point:
|
For each time point inside one (date, machine, ROI, session) trajectory:
|
||||||
- If two flies are detected: calculate Cartesian distance between them
|
- 2+ flies detected: Euclidean distance between the first two by id
|
||||||
- If one fly is detected: set distance to 0 if area > average area, otherwise NaN
|
- 1 fly detected: distance = 0 if its bbox area exceeds the global
|
||||||
|
mean (likely a single blob containing both flies), else NaN
|
||||||
|
|
||||||
Args:
|
Args:
|
||||||
trained_file (Path): Path to trained ROI data CSV.
|
data: optional pre-loaded DataFrame from :func:`load_roi_data`. If
|
||||||
untrained_file (Path): Path to untrained ROI data CSV.
|
None, the full batch is loaded.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
tuple: (trained_distances, untrained_distances) DataFrames.
|
DataFrame with one row per (track, time) pair, including ``distance``,
|
||||||
|
``n_flies``, ``area_fly1``, ``area_fly2``, plus the metadata columns
|
||||||
|
propagated from the source row (``date``, ``machine_name``, ``ROI``,
|
||||||
|
``session``, ``male``, ``species``, ``memory``, ``age``).
|
||||||
"""
|
"""
|
||||||
if trained_file is None:
|
if data is None:
|
||||||
trained_file = DATA_PROCESSED / 'trained_roi_data.csv'
|
data = load_roi_data()
|
||||||
if untrained_file is None:
|
if data.empty:
|
||||||
untrained_file = DATA_PROCESSED / 'untrained_roi_data.csv'
|
return pd.DataFrame()
|
||||||
|
|
||||||
trained_df = pd.read_csv(trained_file)
|
data = data.copy()
|
||||||
untrained_df = pd.read_csv(untrained_file)
|
data["area"] = data["w"] * data["h"]
|
||||||
|
avg_area = data["area"].mean()
|
||||||
trained_df['area'] = trained_df['w'] * trained_df['h']
|
|
||||||
untrained_df['area'] = untrained_df['w'] * untrained_df['h']
|
|
||||||
|
|
||||||
avg_area = np.mean([trained_df['area'].mean(), untrained_df['area'].mean()])
|
|
||||||
print(f"Average area across all data: {avg_area:.2f}")
|
print(f"Average area across all data: {avg_area:.2f}")
|
||||||
|
|
||||||
trained_distances = process_distance_data(trained_df, avg_area)
|
# Carry these onto every output row (constant within a track).
|
||||||
untrained_distances = process_distance_data(untrained_df, avg_area)
|
keep_meta = ["date", "machine_name", "ROI", "session", "male",
|
||||||
|
"species", "memory", "age"]
|
||||||
|
|
||||||
return trained_distances, untrained_distances
|
rows: list[dict] = []
|
||||||
|
track_keys = ["date", "machine_name", "ROI", "session"]
|
||||||
|
for track, track_df in data.groupby(track_keys, sort=False):
|
||||||
def process_distance_data(df, avg_area):
|
meta_row = {k: v for k, v in zip(track_keys, track)}
|
||||||
"""Process a DataFrame to calculate distances between flies at each time point.
|
# Carry the rest of the metadata from any sample (constant per track).
|
||||||
|
sample = track_df.iloc[0]
|
||||||
Args:
|
for col in keep_meta:
|
||||||
df (pd.DataFrame): Input tracking data.
|
if col not in meta_row:
|
||||||
avg_area (float): Average area threshold for single-fly detection.
|
meta_row[col] = sample[col]
|
||||||
|
|
||||||
Returns:
|
|
||||||
pd.DataFrame: Distance data with columns for machine, ROI, time, distance.
|
|
||||||
"""
|
|
||||||
results = []
|
|
||||||
|
|
||||||
for (machine_name, roi), group in df.groupby(['machine_name', 'ROI']):
|
|
||||||
for t, time_group in group.groupby('t'):
|
|
||||||
time_group = time_group.sort_values('id').reset_index(drop=True)
|
|
||||||
|
|
||||||
|
for t, time_group in track_df.groupby("t", sort=False):
|
||||||
|
time_group = time_group.sort_values("id").reset_index(drop=True)
|
||||||
|
row = dict(meta_row)
|
||||||
|
row["t"] = t
|
||||||
if len(time_group) >= 2:
|
if len(time_group) >= 2:
|
||||||
fly1 = time_group.iloc[0]
|
f1, f2 = time_group.iloc[0], time_group.iloc[1]
|
||||||
fly2 = time_group.iloc[1]
|
row["distance"] = euclidean([f1["x"], f1["y"]], [f2["x"], f2["y"]])
|
||||||
distance = euclidean([fly1['x'], fly1['y']], [fly2['x'], fly2['y']])
|
row["n_flies"] = len(time_group)
|
||||||
|
row["area_fly1"] = f1["area"]
|
||||||
|
row["area_fly2"] = f2["area"]
|
||||||
|
else:
|
||||||
|
f = time_group.iloc[0]
|
||||||
|
row["distance"] = 0.0 if f["area"] > avg_area else np.nan
|
||||||
|
row["n_flies"] = 1
|
||||||
|
row["area_fly1"] = f["area"]
|
||||||
|
row["area_fly2"] = np.nan
|
||||||
|
rows.append(row)
|
||||||
|
|
||||||
results.append({
|
return pd.DataFrame(rows)
|
||||||
'machine_name': machine_name,
|
|
||||||
'ROI': roi,
|
|
||||||
't': t,
|
|
||||||
'distance': distance,
|
|
||||||
'n_flies': len(time_group),
|
|
||||||
'area_fly1': fly1['area'],
|
|
||||||
'area_fly2': fly2['area']
|
|
||||||
})
|
|
||||||
elif len(time_group) == 1:
|
|
||||||
fly = time_group.iloc[0]
|
|
||||||
area = fly['area']
|
|
||||||
|
|
||||||
if area > avg_area:
|
|
||||||
distance = 0.0
|
|
||||||
else:
|
|
||||||
distance = np.nan
|
|
||||||
|
|
||||||
results.append({
|
|
||||||
'machine_name': machine_name,
|
|
||||||
'ROI': roi,
|
|
||||||
't': t,
|
|
||||||
'distance': distance,
|
|
||||||
'n_flies': 1,
|
|
||||||
'area_fly1': area,
|
|
||||||
'area_fly2': np.nan
|
|
||||||
})
|
|
||||||
|
|
||||||
return pd.DataFrame(results)
|
|
||||||
|
|
||||||
|
|
||||||
def main():
|
def main() -> None:
|
||||||
"""Run distance calculations and save results."""
|
distances = calculate_fly_distances()
|
||||||
trained_distances, untrained_distances = calculate_fly_distances()
|
|
||||||
|
|
||||||
print(f"Trained data distance summary:")
|
print("\nDistance summary:")
|
||||||
print(f" Shape: {trained_distances.shape}")
|
print(f" Shape: {distances.shape}")
|
||||||
print(f" Distance stats:")
|
if not distances.empty:
|
||||||
print(f" Count: {trained_distances['distance'].count()}")
|
print(f" Distance count: {distances['distance'].count()}")
|
||||||
print(f" Mean: {trained_distances['distance'].mean():.2f}")
|
print(f" Distance mean: {distances['distance'].mean():.2f}")
|
||||||
print(f" Std: {trained_distances['distance'].std():.2f}")
|
print(f" Distance std: {distances['distance'].std():.2f}")
|
||||||
|
male = distances["male"]
|
||||||
|
print(f" Trained rows: {(male == 'trained').sum()}")
|
||||||
|
print(f" Naive rows: {(male == 'naive').sum()}")
|
||||||
|
|
||||||
print(f"\nUntrained data distance summary:")
|
DATA_PROCESSED.mkdir(parents=True, exist_ok=True)
|
||||||
print(f" Shape: {untrained_distances.shape}")
|
out = DATA_PROCESSED / "distances.csv"
|
||||||
print(f" Distance stats:")
|
distances.to_csv(out, index=False)
|
||||||
print(f" Count: {untrained_distances['distance'].count()}")
|
print(f"\nSaved {out}")
|
||||||
print(f" Mean: {untrained_distances['distance'].mean():.2f}")
|
|
||||||
print(f" Std: {untrained_distances['distance'].std():.2f}")
|
|
||||||
|
|
||||||
trained_distances.to_csv(DATA_PROCESSED / 'trained_distances.csv', index=False)
|
|
||||||
untrained_distances.to_csv(DATA_PROCESSED / 'untrained_distances.csv', index=False)
|
|
||||||
print("\nDistance data saved")
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
|
|
|
||||||
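The per-time-point rule in the new `calculate_fly_distances` can be isolated as a small pure function. This is a minimal sketch under stated assumptions: the name `frame_distance` and the list-of-dicts input are hypothetical, and `math.hypot` stands in for `scipy.spatial.distance.euclidean`, which gives the same value for 2-D points.

```python
from __future__ import annotations

import math


def frame_distance(detections: list[dict], avg_area: float) -> float:
    """Distance for one time point, following the rule in the docstring.

    detections: at least one dict with keys id, x, y, w, h (one per blob).
    avg_area:   global mean bounding-box area used as the merge threshold.
    """
    flies = sorted(detections, key=lambda d: d["id"])
    if len(flies) >= 2:
        # Two or more blobs: distance between the first two by id.
        f1, f2 = flies[0], flies[1]
        return math.hypot(f1["x"] - f2["x"], f1["y"] - f2["y"])
    # One blob: a large box is read as both flies overlapping (distance 0),
    # a small one as a missed detection (NaN).
    area = flies[0]["w"] * flies[0]["h"]
    return 0.0 if area > avg_area else math.nan


two = [{"id": 2, "x": 3, "y": 4, "w": 5, "h": 5}, {"id": 1, "x": 0, "y": 0, "w": 5, "h": 5}]
big = [{"id": 1, "x": 0, "y": 0, "w": 10, "h": 10}]
small = [{"id": 1, "x": 0, "y": 0, "w": 2, "h": 2}]

d_two = frame_distance(two, avg_area=30.0)
d_big = frame_distance(big, avg_area=30.0)
d_small = frame_distance(small, avg_area=30.0)
print(d_two, d_big, d_small)
```

The function assumes a non-empty input, which matches the script because `groupby("t")` never yields an empty group.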
|
|
@@ -7,3 +7,16 @@ DATA_RAW = PROJECT_ROOT / "data" / "raw"
DATA_METADATA = PROJECT_ROOT / "data" / "metadata"
DATA_PROCESSED = PROJECT_ROOT / "data" / "processed"
FIGURES = PROJECT_ROOT / "figures"

# Offline-tracking pipeline paths
VIDEOS_ROOT = Path("/mnt/ethoscope_data/videos")
VIDEO_INFO_XLSX = PROJECT_ROOT.parent / "all_video_info_merged.xlsx"
INVENTORY_CSV = DATA_METADATA / "video_inventory.csv"
# Reason: kept on the local data volume alongside the tracking DBs (out of
# ownCloud sync). See TRACKING_OUTPUT_DIR comment below.
TARGETS_DIR = Path("/mnt/data/projects/cupido/targets")
# Reason: tracking DBs are large binary files that don't belong in
# ownCloud-synced storage (sync conflicts + bandwidth). They live on the
# local data volume instead. Regenerable from videos + target JSONs.
TRACKING_OUTPUT_DIR = Path("/mnt/data/projects/cupido/tracked")
LOGS_DIR = PROJECT_ROOT / "data" / "logs"
181
scripts/export_video_db_index.py
Normal file
@@ -0,0 +1,181 @@
"""Augment all_video_info_merged.xlsx with the input video + tracking DB paths.

Each xlsx row represents one fly (date, machine_name, ROI), observed across a
training session and a testing session. We resolve those two sessions to the
on-disk video files (via the inventory CSV) and to their tracking DBs (under
TRACKING_OUTPUT_DIR), then write the result as TSV.

Output columns added:
    training_video_path, training_db_path,
    testing_video_path, testing_db_path

Empty values mean either no video matched (rare — implies missing inventory
entry) or no DB exists yet (e.g. the one video the completeness gate
rejected).

Usage:
    python export_video_db_index.py
    python export_video_db_index.py --out path/to/output.tsv
"""

from __future__ import annotations

import argparse
import re
from pathlib import Path

import pandas as pd

from config import INVENTORY_CSV, TRACKING_OUTPUT_DIR, VIDEO_INFO_XLSX


_TIME_RE = re.compile(r"^(\d{8})_(\d{1,2})(\d{2})?(AM|PM)$", re.IGNORECASE)


def parse_xlsx_time(value: str) -> tuple[str, int] | None:
    """Convert '20241021_11AM' / '20240918_1030AM' to (YYYY-MM-DD, minutes24).

    Resolution is hour-only when no minutes are given (e.g. '11AM' → 11:00).
    Returns minutes-from-midnight so we can do nearest-neighbor matching.
    """
    if not isinstance(value, str):
        return None
    m = _TIME_RE.match(value.strip())
    if not m:
        return None
    ymd, hh, mm, ampm = m.groups()
    date = f"{ymd[:4]}-{ymd[4:6]}-{ymd[6:8]}"
    hour = int(hh)
    minute = int(mm) if mm else 0
    if ampm.upper() == "PM" and hour != 12:
        hour += 12
    if ampm.upper() == "AM" and hour == 12:
        hour = 0
    return date, hour * 60 + minute


def build_session_index(inventory: pd.DataFrame) -> dict[tuple[str, str], list[dict]]:
    """Index inventory rows by (date, machine_name) → list of session dicts."""
    idx: dict[tuple[str, str], list[dict]] = {}
    for row in inventory.itertuples(index=False):
        h, m, _s = (int(p) for p in str(row.session_time).split("-"))
        key = (row.session_date, row.machine_name)
        idx.setdefault(key, []).append({
            "mp4_path": row.mp4_path,
            "session_datetime": row.session_datetime,
            "minutes": h * 60 + m,
        })
    return idx


def db_path_for_video(mp4_path: str) -> Path | None:
    """Tracker writes <video_stem>_tracking.db under TRACKING_OUTPUT_DIR."""
    stem = Path(mp4_path).stem
    db = TRACKING_OUTPUT_DIR / f"{stem}_tracking.db"
    return db if db.exists() else None


_TIME_TOLERANCE_MIN = 90  # xlsx labels are approximate ("11AM" → 10:51 is fine)


def resolve_session(
    machine_name: str,
    when: str,
    fallback_date: str | None,
    index: dict[tuple[str, str], list[dict]],
) -> tuple[str, str]:
    """Look up the video + db whose start time is closest to `when`.

    Match strategy:
      1. Use the date embedded in `when` (training/testing can fall on a
         different calendar day from the row's ``date`` column).
      2. If no candidates exist for that date, fall back to ``fallback_date``
         (the xlsx row's ``date`` column). Reason: the xlsx contains
         date typos like '20240110_11AM' for an Oct 1 experiment.

    Among candidates, pick the video whose start minute is closest to the
    xlsx-claimed time, within ±_TIME_TOLERANCE_MIN.
    """
    parsed = parse_xlsx_time(when)
    if parsed is None:
        return "", ""
    date, target_min = parsed
    candidates = index.get((date, machine_name), [])
    if not candidates and fallback_date:
        candidates = index.get((fallback_date, machine_name), [])
    if not candidates:
        return "", ""

    def _gap(target: int, c: dict) -> int:
        # Reason: xlsx times like '1230AM' are ambiguous (12 AM vs 12 PM).
        # We try both the literal time AND a +12-hour shift, picking the
        # interpretation that brings us closest to a real session.
        return min(abs(c["minutes"] - target), abs(c["minutes"] - (target + 720) % 1440))

    best = min(candidates, key=lambda c: _gap(target_min, c))
    if _gap(target_min, best) > _TIME_TOLERANCE_MIN:
        return "", ""
    db = db_path_for_video(best["mp4_path"])
    return best["mp4_path"], (str(db) if db else "")


# Variants of "naive" the xlsx has accumulated: 'naïve', 'niave', plus
# trailing whitespace. All collapse to a single canonical 'naive'.
_MALE_NAIVE_VARIANTS = {"naïve", "niave", "naive"}


def _normalize_metadata(df: pd.DataFrame) -> None:
    """Strip whitespace and canonicalize the ``male`` column in place."""
    for col in df.select_dtypes(include=("object", "string")).columns:
        df[col] = df[col].astype(str).str.strip()
    df["male"] = df["male"].apply(
        lambda v: "naive" if v.lower() in _MALE_NAIVE_VARIANTS else v
    )


def main() -> None:
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument(
        "--out",
        type=Path,
        default=VIDEO_INFO_XLSX.with_suffix(".tsv"),
        help="output TSV path (default: alongside the xlsx)",
    )
    args = parser.parse_args()

    inv = pd.read_csv(INVENTORY_CSV)
    inv = inv[inv["in_xlsx"]].copy()
    index = build_session_index(inv)

    df = pd.read_excel(VIDEO_INFO_XLSX)
    _normalize_metadata(df)
    date_iso = pd.to_datetime(df["date"]).dt.strftime("%Y-%m-%d")

    train_videos, train_dbs, test_videos, test_dbs = [], [], [], []
    for fallback, row in zip(date_iso, df.itertuples(index=False)):
        tv, td = resolve_session(row.machine_name, row.training_date_time, fallback, index)
        sv, sd = resolve_session(row.machine_name, row.testing_date_time, fallback, index)
        train_videos.append(tv)
        train_dbs.append(td)
        test_videos.append(sv)
        test_dbs.append(sd)

    df["training_video_path"] = train_videos
    df["training_db_path"] = train_dbs
    df["testing_video_path"] = test_videos
    df["testing_db_path"] = test_dbs

    df.to_csv(args.out, sep="\t", index=False)

    n_rows = len(df)
    n_train_video = sum(bool(v) for v in train_videos)
    n_train_db = sum(bool(v) for v in train_dbs)
    n_test_video = sum(bool(v) for v in test_videos)
    n_test_db = sum(bool(v) for v in test_dbs)
    print(f"wrote {args.out} ({n_rows} rows)")
    print(f" training: {n_train_video} with video, {n_train_db} with DB")
    print(f" testing: {n_test_video} with video, {n_test_db} with DB")


if __name__ == "__main__":
    main()
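A worked example may help make the time matching in `export_video_db_index.py` concrete. The sketch below copies the regular expression and the AM/PM arithmetic from `parse_xlsx_time`, and restates the `_gap` rule as a standalone function named `gap` (that name, and the sample session start times, are made up for illustration).

```python
from __future__ import annotations

import re

_TIME_RE = re.compile(r"^(\d{8})_(\d{1,2})(\d{2})?(AM|PM)$", re.IGNORECASE)


def parse_xlsx_time(value: str) -> tuple[str, int] | None:
    """'20241021_11AM' -> ('2024-10-21', minutes from midnight)."""
    m = _TIME_RE.match(value.strip())
    if not m:
        return None
    ymd, hh, mm, ampm = m.groups()
    hour, minute = int(hh), int(mm) if mm else 0
    if ampm.upper() == "PM" and hour != 12:
        hour += 12
    if ampm.upper() == "AM" and hour == 12:
        hour = 0
    return f"{ymd[:4]}-{ymd[4:6]}-{ymd[6:8]}", hour * 60 + minute


def gap(target: int, session_minutes: int) -> int:
    """Smaller of the literal gap and the gap after a 12-hour shift."""
    return min(abs(session_minutes - target),
               abs(session_minutes - (target + 720) % 1440))


# '11AM' label vs. a video that started at 10:51 (651 minutes).
date_a, target_a = parse_xlsx_time("20241021_11AM")
gap_a = gap(target_a, 10 * 60 + 51)

# '1230AM' parses literally as 00:30; a video at 12:28 only matches via the shift.
_, target_b = parse_xlsx_time("20240918_1230AM")
gap_b = gap(target_b, 12 * 60 + 28)

print(date_a, target_a, gap_a)
print(target_b, gap_b)
```

Both gaps fall well inside the 90-minute tolerance, so each label would resolve to its video. The comparison does not wrap around midnight, so a label at 11:50 PM and a video starting at 00:05 would not be treated as close.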
|
|
@ -1,90 +1,113 @@
|
||||||
import pandas as pd
|
"""Load ROI tracking data from all sessions into one DataFrame.
|
||||||
|
|
||||||
|
Drives off the merged TSV (one row per ROI/fly across training + testing
|
||||||
|
phases). For each TSV row, opens the corresponding tracking DB and pulls
|
||||||
|
the matching ROI table, then attaches the experimental metadata.
|
||||||
|
|
||||||
|
The TSV is the single source of truth for what data exists and how it
|
||||||
|
maps to flies and conditions.
|
||||||
|
"""
|
||||||
|
|
||||||
import sqlite3
|
import sqlite3
|
||||||
import re
|
from pathlib import Path
|
||||||
|
|
||||||
from config import DATA_RAW, DATA_METADATA, DATA_PROCESSED
|
import pandas as pd
|
||||||
|
|
||||||
|
from config import VIDEO_INFO_XLSX
|
||||||
|
|
||||||
|
|
||||||
def load_roi_data():
|
# Metadata columns to copy onto every tracking sample. These are the xlsx
|
||||||
"""Load ROI data from SQLite databases and group by trained/untrained.
|
# fields that describe the experimental condition behind each fly/ROI.
|
||||||
|
# Reason: the ROI column is uppercase ("ROI") for backwards compatibility
|
||||||
|
# with the existing analysis pipeline (calculate_distances.py, notebooks).
|
||||||
|
_META_COLS = (
|
||||||
|
"date",
|
||||||
|
"machine_name",
|
||||||
|
"species",
|
||||||
|
"male",
|
||||||
|
"training_date_time",
|
||||||
|
"testing_date_time",
|
||||||
|
"training_length_hr",
|
||||||
|
"consolidation_length_hr",
|
||||||
|
"memory",
|
||||||
|
"age",
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
def _open_ro(db_path: str, cache: dict) -> sqlite3.Connection | None:
|
||||||
|
"""Cached read-only sqlite connection. Returns None on failure."""
|
||||||
|
if not isinstance(db_path, str) or not db_path:
|
||||||
|
return None
|
||||||
|
if db_path not in cache:
|
||||||
|
try:
|
||||||
|
cache[db_path] = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
|
||||||
|
except sqlite3.Error as e:
|
||||||
|
print(f"failed to open {Path(db_path).name}: {e}")
|
||||||
|
cache[db_path] = None
|
||||||
|
return cache[db_path]
|
||||||
|
|
||||||
|
|
||||||
|
def load_roi_data(meta: pd.DataFrame | None = None) -> pd.DataFrame:
|
||||||
|
"""Load ROI tracking data joined with experimental metadata.
|
||||||
|
|
||||||
|
For each row in ``meta``, reads the matching ROI table from both the
|
||||||
|
training DB and the testing DB (whichever exist), and stamps every
|
||||||
|
sample with the row's metadata plus a ``session`` column
|
||||||
|
(``"training"`` or ``"testing"``). Rows with empty DB paths (unusable
|
||||||
|
videos, or videos that didn't pass the completeness gate) are skipped.
|
||||||
|
|
||||||
|
Args:
|
||||||
|
meta: optional DataFrame with the same schema as
|
||||||
|
``all_video_info_merged.tsv``. Pass a filtered slice to load a
|
||||||
|
subset (e.g. ``meta[meta.species == 'Melanogaster/CS']``).
|
||||||
|
Defaults to the full TSV.
|
||||||
|
|
||||||
Returns:
|
Returns:
|
||||||
tuple: (trained_df, untrained_df) DataFrames with tracking data.
|
DataFrame with columns ``id, t, x, y, w, h, phi, is_inferred,
|
||||||
|
has_interacted, session, <metadata>`` — one row per tracking
|
||||||
|
sample. Empty if nothing could be loaded.
|
||||||
"""
|
"""
|
||||||
metadata = pd.read_csv(DATA_METADATA / '2025_07_15_metadata_fixed.csv')
|
if meta is None:
|
||||||
metadata['machine_name'] = metadata['machine_name'].astype(str)
|
meta = pd.read_csv(VIDEO_INFO_XLSX.with_suffix(".tsv"), sep="\t")
|
||||||
|
|
||||||
trained_rois = metadata[metadata['group'] == 'trained']
|
db_cache: dict = {}
|
||||||
untrained_rois = metadata[metadata['group'] == 'untrained']
|
chunks: list[pd.DataFrame] = []
|
||||||
|
|
||||||
db_files = list(DATA_RAW.glob('*_tracking.db'))
|
for row in meta.itertuples(index=False):
|
||||||
|
for session in ("training", "testing"):
|
||||||
trained_df = pd.DataFrame()
|
conn = _open_ro(getattr(row, f"{session}_db_path"), db_cache)
|
||||||
untrained_df = pd.DataFrame()
|
if conn is None:
|
||||||
|
continue
|
||||||
for db_file in db_files:
|
|
||||||
print(f"Processing {db_file.name}")
|
|
||||||
|
|
||||||
pattern = r'_([0-9a-f]{32})__'
|
|
||||||
match = re.search(pattern, db_file.name)
|
|
||||||
|
|
||||||
if not match:
|
|
||||||
print(f"Could not extract UUID from {db_file.name}")
|
|
||||||
continue
|
|
||||||
|
|
||||||
uuid = match.group(1)
|
|
||||||
metadata_matches = metadata[metadata['path'].str.contains(uuid, na=False)]
|
|
||||||
|
|
||||||
if metadata_matches.empty:
|
|
||||||
print(f"No metadata matches found for UUID {uuid} from {db_file.name}")
|
|
||||||
continue
|
|
||||||
|
|
||||||
machine_id = metadata_matches.iloc[0]['machine_name']
|
|
||||||
print(f"Matched to machine ID: {machine_id}")
|
|
||||||
|
|
||||||
conn = sqlite3.connect(str(db_file))
|
|
||||||
|
|
||||||
machine_trained = trained_rois[trained_rois['machine_name'] == machine_id]
|
|
||||||
machine_untrained = untrained_rois[untrained_rois['machine_name'] == machine_id]
|
|
||||||
|
|
||||||
for _, row in machine_trained.iterrows():
|
|
||||||
roi = row['ROI']
|
|
||||||
try:
|
try:
|
||||||
query = f"SELECT * FROM ROI_{roi}"
|
df = pd.read_sql_query(
|
||||||
roi_data = pd.read_sql_query(query, conn)
|
f"SELECT * FROM ROI_{int(row.roi)}", conn
|
||||||
roi_data['machine_name'] = machine_id
|
)
|
||||||
roi_data['ROI'] = roi
|
|
||||||
roi_data['group'] = 'trained'
|
|
||||||
trained_df = pd.concat([trained_df, roi_data], ignore_index=True)
|
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
print(f"Error loading ROI_{roi} from {db_file.name}: {e}")
|
# Reason: a DB may be missing a ROI table if tracking was
|
||||||
|
# partial — skip rather than abort the whole batch.
|
||||||
|
print(f" ROI_{row.roi} from {session} DB: {e}")
|
||||||
|
continue
|
||||||
|
df["session"] = session
|
||||||
|
df["ROI"] = int(row.roi)
|
||||||
|
for col in _META_COLS:
|
||||||
|
df[col] = getattr(row, col)
|
||||||
|
chunks.append(df)
|
||||||
|
|
||||||
for _, row in machine_untrained.iterrows():
|
for conn in db_cache.values():
|
||||||
roi = row['ROI']
|
if conn is not None:
|
||||||
try:
|
conn.close()
|
||||||
query = f"SELECT * FROM ROI_{roi}"
|
|
||||||
roi_data = pd.read_sql_query(query, conn)
|
|
||||||
roi_data['machine_name'] = machine_id
|
|
||||||
roi_data['ROI'] = roi
|
|
||||||
roi_data['group'] = 'untrained'
|
|
||||||
untrained_df = pd.concat([untrained_df, roi_data], ignore_index=True)
|
|
||||||
except Exception as e:
|
|
||||||
print(f"Error loading ROI_{roi} from {db_file.name}: {e}")
|
|
||||||
|
|
||||||
conn.close()
|
return pd.concat(chunks, ignore_index=True) if chunks else pd.DataFrame()
|
||||||
|
|
||||||
return trained_df, untrained_df
|
|
||||||
|
|
||||||
|
|
||||||
if __name__ == "__main__":
|
if __name__ == "__main__":
|
||||||
trained_data, untrained_data = load_roi_data()
|
data = load_roi_data()
|
||||||
print(f"Trained data shape: {trained_data.shape}")
|
print(f"shape: {data.shape}")
|
||||||
print(f"Untrained data shape: {untrained_data.shape}")
|
if not data.empty:
|
||||||
if not trained_data.empty:
|
print(f"columns: {list(data.columns)}")
|
||||||
print("Trained data columns:", trained_data.columns.tolist())
|
print(f"sessions: {data['session'].value_counts().to_dict()}")
|
||||||
if not untrained_data.empty:
|
print(f"unique machines: {data['machine_name'].nunique()}")
|
||||||
print("Untrained data columns:", untrained_data.columns.tolist())
|
print(
|
||||||
|
f"unique flies (date,machine,roi): "
|
||||||
trained_data.to_csv(DATA_PROCESSED / 'trained_roi_data.csv', index=False)
|
f"{data.groupby(['date','machine_name','roi']).ngroups}"
|
||||||
untrained_data.to_csv(DATA_PROCESSED / 'untrained_roi_data.csv', index=False)
|
)
|
||||||
print("Data saved to trained_roi_data.csv and untrained_roi_data.csv")
|
|
||||||
|
|
|
||||||
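The new `load_roi_data` above calls `_open_ro(path, db_cache)` and later closes every non-`None` entry in `db_cache`, but the helper itself is not shown in this diff. The following is a minimal sketch of what such a helper might look like, assuming it opens each SQLite file read-only once and caches `None` for missing files. The names `open_ro_cached` and `close_all` are hypothetical and are not taken from the repository.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path


def open_ro_cached(path, cache: dict) -> sqlite3.Connection | None:
    """Return a cached read-only connection, or None if the DB is unavailable.

    Hypothetical stand-in for the `_open_ro` helper used by `load_roi_data`.
    """
    # Metadata rows may carry a missing path (None, or NaN from pandas).
    if not isinstance(path, (str, Path)):
        return None
    key = str(path)
    if key not in cache:
        p = Path(key)
        if p.is_file():
            # mode=ro makes SQLite refuse any write through this connection.
            cache[key] = sqlite3.connect(f"{p.resolve().as_uri()}?mode=ro", uri=True)
        else:
            # Remember the miss so the file system is not probed again.
            cache[key] = None
    return cache[key]


def close_all(cache: dict) -> None:
    """Close every open connection in the cache, skipping cached misses."""
    for conn in cache.values():
        if conn is not None:
            conn.close()
    cache.clear()
```

Caching the `None` result as well as real connections keeps the per-row loop cheap when many metadata rows point at the same absent DB, which matches the `if conn is not None` guard in the cleanup loop above.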
176
scripts/monitor_tracking.py
Normal file
|
|
@ -0,0 +1,176 @@
|
||||||
|
"""Live progress + ETA for the offline tracker batch.
|
||||||
|
|
||||||
|
Counts ground-truth (DBs on disk) rather than parsing log lines, so it works
|
||||||
|
whether the batch is running fresh or was resumed after a crash. Errors are
|
||||||
|
parsed out of the most recent *.log file in data/logs/.
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python monitor_tracking.py # one snapshot, exit
|
||||||
|
python monitor_tracking.py --watch # refresh every 10 s
|
||||||
|
python monitor_tracking.py --watch 30 # refresh every 30 s
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import re
|
||||||
|
import time
|
||||||
|
from datetime import datetime, timedelta
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
from config import LOGS_DIR, TARGETS_DIR, TRACKING_OUTPUT_DIR
|
||||||
|
|
||||||
|
|
||||||
|
def count_target_jsons() -> tuple[int, int, list[str]]:
|
||||||
|
"""Return (n_pickable, n_unusable, unusable_video_stems)."""
|
||||||
|
pickable = 0
|
||||||
|
unusable_stems: list[str] = []
|
||||||
|
for j in TARGETS_DIR.glob("*.json"):
|
||||||
|
try:
|
||||||
|
d = json.loads(j.read_text())
|
||||||
|
except Exception:
|
||||||
|
continue
|
||||||
|
if d.get("unusable"):
|
||||||
|
unusable_stems.append(j.stem)
|
||||||
|
elif d.get("reference_points"):
|
||||||
|
pickable += 1
|
||||||
|
return pickable, len(unusable_stems), unusable_stems
|
||||||
|
|
||||||
|
|
||||||
|
def count_tracked_dbs() -> tuple[int, datetime | None, str | None]:
|
||||||
|
"""Return (n_dbs, mtime_of_newest, name_of_newest)."""
|
||||||
|
dbs = list(TRACKING_OUTPUT_DIR.glob("*_tracking.db"))
|
||||||
|
if not dbs:
|
||||||
|
return 0, None, None
|
||||||
|
newest = max(dbs, key=lambda p: p.stat().st_mtime)
|
||||||
|
return len(dbs), datetime.fromtimestamp(newest.stat().st_mtime), newest.stem
|
||||||
|
|
||||||
|
|
||||||
|
def parse_recent_errors(log_dir: Path, tail_lines: int = 5000) -> list[str]:
|
||||||
|
"""Scan the most recent *.log file for lines reporting errors."""
|
||||||
|
if not log_dir.exists():
|
||||||
|
return []
|
||||||
|
logs = sorted(log_dir.glob("*.log"), key=lambda p: p.stat().st_mtime)
|
||||||
|
if not logs:
|
||||||
|
return []
|
||||||
|
latest = logs[-1]
|
||||||
|
try:
|
||||||
|
with latest.open() as f:
|
||||||
|
tail = f.readlines()[-tail_lines:]
|
||||||
|
except Exception:
|
||||||
|
return []
|
||||||
|
out = []
|
||||||
|
for line in tail:
|
||||||
|
if re.search(r":\s*error\b", line) or " error: " in line.lower():
|
||||||
|
out.append(line.rstrip())
|
||||||
|
return out
|
||||||
|
|
||||||
|
|
||||||
|
def db_completion_history() -> list[float]:
|
||||||
|
"""Return mtimes of all tracking DBs, sorted ascending. Used for rate."""
|
||||||
|
return sorted(p.stat().st_mtime for p in TRACKING_OUTPUT_DIR.glob("*_tracking.db"))
|
||||||
|
|
||||||
|
|
||||||
|
def fmt_duration(seconds: float) -> str:
|
||||||
|
if seconds < 60:
|
||||||
|
return f"{int(seconds)} s"
|
||||||
|
if seconds < 3600:
|
||||||
|
return f"{int(seconds // 60)} min"
|
||||||
|
h = int(seconds // 3600)
|
||||||
|
m = int((seconds % 3600) // 60)
|
||||||
|
return f"{h} h {m} min"
|
||||||
|
|
||||||
|
|
||||||
|
def snapshot() -> str:
|
||||||
|
pickable, unusable, _ = count_target_jsons()
|
||||||
|
tracked, last_mtime, last_name = count_tracked_dbs()
|
||||||
|
history = db_completion_history()
|
||||||
|
errors = parse_recent_errors(LOGS_DIR)
|
||||||
|
|
||||||
|
lines = [f"tracking progress @ {datetime.now():%Y-%m-%d %H:%M:%S}"]
|
||||||
|
lines.append(f" pickable JSONs: {pickable}")
|
||||||
|
lines.append(f" unusable JSONs: {unusable} (skipped by tracker)")
|
||||||
|
pct = (tracked / pickable * 100) if pickable else 0
|
||||||
|
lines.append(
|
||||||
|
f" DBs on disk: {tracked} / {pickable} ({pct:.0f}%)"
|
||||||
|
)
|
||||||
|
lines.append(f" errors in log: {len(errors)}")
|
||||||
|
|
||||||
|
# Rate from completions in the last 6 h — robust to gaps from killed /
|
||||||
|
# restarted runs, while wide enough to span multiple parallel-worker
|
||||||
|
# completion bursts. Reason: with 8 workers all started together on
|
||||||
|
# multi-hour videos, completions arrive in tight bursts every ~video-
|
||||||
|
# length apart; a 30-min window catches one burst and overestimates by
|
||||||
|
# ~10×. 6 h spans at least one full burst cycle for typical videos.
|
||||||
|
now_ts = time.time()
|
||||||
|
window_secs = 6 * 3600
|
||||||
|
recent = [t for t in history if t >= now_ts - window_secs]
|
||||||
|
if len(recent) >= 2:
|
||||||
|
# Reason: with N parallel workers, completions arrive in clumps
|
||||||
|
# (all workers finish near-simultaneously). Dividing N by the *burst*
|
||||||
|
# span gives nonsense rates. Use the full window as the denominator
|
||||||
|
# once the batch has been running long enough to fill it; otherwise
|
||||||
|
# use elapsed-since-first-DB. Detection: if every DB on disk also
|
||||||
|
# falls inside the window, the batch is younger than the window.
|
||||||
|
if len(recent) == len(history):
|
||||||
|
elapsed = max(1.0, now_ts - history[0])
|
||||||
|
else:
|
||||||
|
elapsed = float(window_secs)
|
||||||
|
if elapsed > 0:
|
||||||
|
rate_per_hour = len(recent) / elapsed * 3600
|
||||||
|
lines.append(
|
||||||
|
f" rate (last {len(recent)} in {int(window_secs/3600)} h):"
|
||||||
|
f" {rate_per_hour:.1f} videos/hour"
|
||||||
|
)
|
||||||
|
remaining = max(0, pickable - tracked)
|
||||||
|
if rate_per_hour > 0 and remaining > 0:
|
||||||
|
eta_sec = remaining * 3600 / rate_per_hour
|
||||||
|
eta_at = datetime.now() + timedelta(seconds=eta_sec)
|
||||||
|
lines.append(
|
||||||
|
f" ETA remaining: {fmt_duration(eta_sec)} "
|
||||||
|
f"(done by {eta_at:%H:%M %a})"
|
||||||
|
)
|
||||||
|
else:
|
||||||
|
lines.append(" rate: (warming up — check again in a few min)")
|
||||||
|
|
||||||
|
if last_mtime is not None and last_name is not None:
|
||||||
|
ago = (datetime.now() - last_mtime).total_seconds()
|
||||||
|
lines.append(
|
||||||
|
f" most recent DB: {last_name[:60]}... ({fmt_duration(ago)} ago)"
|
||||||
|
)
|
||||||
|
|
||||||
|
if errors:
|
||||||
|
lines.append("")
|
||||||
|
lines.append(f" recent errors ({min(5, len(errors))} of {len(errors)}):")
|
||||||
|
for e in errors[-5:]:
|
||||||
|
lines.append(f" {e[:120]}")
|
||||||
|
|
||||||
|
return "\n".join(lines)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(description=__doc__)
|
||||||
|
parser.add_argument(
|
||||||
|
"--watch", nargs="?", type=int, const=10, default=None,
|
||||||
|
help="refresh every N seconds (default 10 if flag given without value)",
|
||||||
|
)
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
if args.watch is None:
|
||||||
|
print(snapshot())
|
||||||
|
return
|
||||||
|
|
||||||
|
try:
|
||||||
|
while True:
|
||||||
|
# Clear screen and reprint
|
||||||
|
print("\033[2J\033[H", end="")
|
||||||
|
print(snapshot())
|
||||||
|
print(f"\n(refreshing every {args.watch}s — Ctrl-C to exit)")
|
||||||
|
time.sleep(args.watch)
|
||||||
|
except KeyboardInterrupt:
|
||||||
|
print()
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
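The windowed rate and ETA logic inside `snapshot()` can be isolated as two pure functions, which makes the burst-handling rule easier to check by hand. This is a sketch under the same assumptions as the comments in the script (a 6 h trailing window, with elapsed time since the first DB used while the batch is younger than the window). The function names `estimate_rate_per_hour` and `eta_seconds` are hypothetical and do not exist in `monitor_tracking.py`.

```python
from __future__ import annotations


def estimate_rate_per_hour(
    mtimes: list[float], now_ts: float, window_secs: float = 6 * 3600
) -> float | None:
    """Completions per hour over a trailing window, or None if fewer than 2."""
    history = sorted(mtimes)
    recent = [t for t in history if t >= now_ts - window_secs]
    if len(recent) < 2:
        return None
    if len(recent) == len(history):
        # Every DB falls inside the window: the batch is younger than the
        # window, so divide by the time since the first completion.
        elapsed = max(1.0, now_ts - history[0])
    else:
        # Older DBs exist: the window is full, so use it as the denominator.
        elapsed = float(window_secs)
    return len(recent) / elapsed * 3600


def eta_seconds(remaining: int, rate_per_hour: float | None) -> float | None:
    """Seconds until `remaining` videos finish at the given rate, if known."""
    if not rate_per_hour or remaining <= 0:
        return None
    return remaining * 3600 / rate_per_hour
```

For instance, three completions at 0 s, 1800 s and 3600 s observed at 7200 s give 3 / 7200 s, that is about 1.5 videos per hour, rather than the much higher figure obtained by dividing by the span of the completions alone.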
467
scripts/pick_targets.py
Normal file
|
|
@ -0,0 +1,467 @@
|
||||||
|
"""Interactive target picker for offline tracking (matplotlib/Tk GUI).
|
||||||
|
|
||||||
|
Loops through videos that need tracking and lets the user click 3 reference
|
||||||
|
points per video in L-shape order:
|
||||||
|
|
||||||
|
1) TOP target (above the corner)
|
||||||
|
2) CORNER target (the right-angle vertex)
|
||||||
|
3) LEFT target (to the left of the corner)
|
||||||
|
|
||||||
|
These three points are the same reference layout used by ethoscope's
|
||||||
|
`TargetGridROIBuilder`: dst_points = [(0, -1), (0, 0), (-1, 0)] in unit
|
||||||
|
coordinates. Saving them as a JSON sidecar lets the offline tracker build the
|
||||||
|
6-ROI HD mating arena grid without needing auto-target detection.
|
||||||
|
|
||||||
|
Output JSON sidecar: TARGETS_DIR/<video_basename>.json
|
||||||
|
{
|
||||||
|
"video_path": "/mnt/.../*.mp4",
|
||||||
|
"frame_index": <int>,
|
||||||
|
"reference_points": [[x0, y0], [x1, y1], [x2, y2]],
|
||||||
|
"order": ["top", "corner", "left"],
|
||||||
|
"picked_at": "<isoformat>"
|
||||||
|
}
|
||||||
|
|
||||||
|
Keys (in the picker window):
|
||||||
|
LEFT-CLICK add a point (top → corner → left)
|
||||||
|
r reset clicks for current video
|
||||||
|
d skip this video for THIS run only (no JSON written)
|
||||||
|
u mark this video unusable (FOV wrong etc.); skipped forever
|
||||||
|
. / , advance / rewind by 25 frames (≈ 1 s @ 25 fps)
|
||||||
|
] / [ advance / rewind by 5% of the video (~3 min in a 1 h video)
|
||||||
|
# jump to the middle of the video
|
||||||
|
enter save the 3 points and move on
|
||||||
|
q / ESC quit picker
|
||||||
|
|
||||||
|
After the 3rd click, the 6 ROI rectangles are drawn over the frame so you
|
||||||
|
can sanity-check the geometry before pressing ENTER.
|
||||||
|
|
||||||
|
With --redo, if a JSON sidecar exists, its points are pre-loaded so you can
|
||||||
|
nudge them rather than restart from scratch.
|
||||||
|
|
||||||
|
Why matplotlib instead of cv2.imshow:
|
||||||
|
OpenCV's bundled GUI uses Qt, which needs XKeyboard + a fonts directory and
|
||||||
|
is fragile over SSH X11-forwarding. matplotlib's TkAgg backend uses pure
|
||||||
|
Tk/X11 and works out of the box on any DISPLAY (and gives free pan/zoom
|
||||||
|
via the toolbar — useful for clicking small targets precisely).
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import datetime as dt
|
||||||
|
import json
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
# Force TkAgg BEFORE importing matplotlib. We override even if MPLBACKEND is
|
||||||
|
# already set, because the script is unusable with a non-interactive backend.
|
||||||
|
os.environ["MPLBACKEND"] = "TkAgg"
|
||||||
|
|
||||||
|
import cv2 # noqa: E402
|
||||||
|
import matplotlib # noqa: E402
|
||||||
|
import matplotlib.pyplot as plt # noqa: E402
|
||||||
|
import numpy as np # noqa: E402
|
||||||
|
import pandas as pd # noqa: E402
|
||||||
|
|
||||||
|
# matplotlib.backend_bases exposes the cursor identifiers under different
|
||||||
|
# names depending on version: `Cursors` enum on 3.5+, lowercase `cursors`
|
||||||
|
# instance on older releases. Both have the same integer attributes.
|
||||||
|
try:
|
||||||
|
from matplotlib.backend_bases import Cursors as _Cursors # 3.5+
|
||||||
|
except ImportError:
|
||||||
|
try:
|
||||||
|
from matplotlib.backend_bases import cursors as _Cursors # older
|
||||||
|
except ImportError:
|
||||||
|
_Cursors = None
|
||||||
|
|
||||||
|
# Verify we ended up on an interactive backend; bail loud (with a concrete
|
||||||
|
# explanation) if not. matplotlib silently falls back to 'agg' when its
|
||||||
|
# requested backend can't load, which is hard to debug without help.
|
||||||
|
_backend = matplotlib.get_backend()
|
||||||
|
if _backend.lower() in ("agg", "headless", "template", "pdf", "svg", "ps"):
|
||||||
|
diag = []
|
||||||
|
try:
|
||||||
|
import tkinter as _tk
|
||||||
|
try:
|
||||||
|
_tk.Tk().destroy()
|
||||||
|
diag.append("tkinter import + Tk() instantiation: OK")
|
||||||
|
except Exception as e:
|
||||||
|
diag.append(f"tkinter imported but Tk() failed: {e!r}")
|
||||||
|
except Exception as e:
|
||||||
|
diag.append(f"tkinter import FAILED: {e!r}")
|
||||||
|
diag.append(" → on Manjaro/Arch, run: sudo pacman -S tk")
|
||||||
|
print(
|
||||||
|
f"ERROR: matplotlib loaded the non-interactive backend {_backend!r}.\n"
|
||||||
|
f" Expected 'TkAgg'. Diagnostic info:\n"
|
||||||
|
f" DISPLAY = {os.environ.get('DISPLAY')!r}\n"
|
||||||
|
f" MPLBACKEND = {os.environ.get('MPLBACKEND')!r}\n"
|
||||||
|
f" matplotlib ver = {matplotlib.__version__}\n"
|
||||||
|
+ "\n".join(f" {d}" for d in diag),
|
||||||
|
file=sys.stderr,
|
||||||
|
)
|
||||||
|
sys.exit(2)
|
||||||
|
|
||||||
|
from config import INVENTORY_CSV, TARGETS_DIR # noqa: E402
|
||||||
|
from tracking_geometry import compute_roi_polygons # noqa: E402
|
||||||
|
|
||||||
|
# Strip default matplotlib keybindings that would conflict with ours.
|
||||||
|
for k in ("keymap.home", "keymap.save", "keymap.quit", "keymap.fullscreen",
|
||||||
|
"keymap.pan", "keymap.zoom", "keymap.back", "keymap.forward"):
|
||||||
|
try:
|
||||||
|
plt.rcParams[k] = []
|
||||||
|
except KeyError:
|
||||||
|
pass
|
||||||
|
|
||||||
|
CLICK_LABELS = ("TOP", "CORNER", "LEFT")
|
||||||
|
CLICK_COLORS = ("red", "lime", "deepskyblue")
|
||||||
|
|
||||||
|
|
||||||
|
def grab_frame(
|
||||||
|
video_path: Path, frame_idx: int
|
||||||
|
) -> tuple[np.ndarray, int, int] | None:
|
||||||
|
"""Return (RGB frame, actual_frame_idx, n_frames) from the video, or None.
|
||||||
|
|
||||||
|
Clamps frame_idx to [0, n_frames-1] so callers can step blindly.
|
||||||
|
"""
|
||||||
|
cap = cv2.VideoCapture(str(video_path))
|
||||||
|
if not cap.isOpened():
|
||||||
|
return None
|
||||||
|
n = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
|
||||||
|
if n > 0:
|
||||||
|
frame_idx = max(0, min(frame_idx, n - 1))
|
||||||
|
cap.set(cv2.CAP_PROP_POS_FRAMES, frame_idx)
|
||||||
|
ok, frame = cap.read()
|
||||||
|
cap.release()
|
||||||
|
if not ok or frame is None:
|
||||||
|
return None
|
||||||
|
return cv2.cvtColor(frame, cv2.COLOR_BGR2RGB), frame_idx, n
|
||||||
|
|
||||||
|
|
||||||
|
def pick_one(
|
||||||
|
video_path: Path,
|
||||||
|
frame_idx: int,
|
||||||
|
status_prefix: str,
|
||||||
|
initial_points: list[tuple[float, float]] | None = None,
|
||||||
|
) -> dict | None:
|
||||||
|
"""Show the picker UI for a single video; return the result dict or None."""
|
||||||
|
grabbed = grab_frame(video_path, frame_idx)
|
||||||
|
if grabbed is None:
|
||||||
|
print(f" ! cannot read {video_path}")
|
||||||
|
return None
|
||||||
|
frame, frame_idx, n_frames = grabbed
|
||||||
|
# Big-step size for ] / [ : 5% of total length, ~3 min in a 1h video.
|
||||||
|
big_step = max(1, int(round(0.05 * n_frames))) if n_frames > 0 else 250
|
||||||
|
|
||||||
|
fig, ax = plt.subplots(figsize=(14, 8))
|
||||||
|
try:
|
||||||
|
fig.canvas.manager.set_window_title("pick targets")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
# Use a crosshair cursor over the axes so it's obvious where the click
|
||||||
|
# will land. matplotlib's toolbar resets the cursor to POINTER (arrow) on
|
||||||
|
# every mouse-move when no tool is active, so we intercept set_cursor:
|
||||||
|
# whenever it asks for POINTER, we substitute SELECT_REGION (crosshair).
|
||||||
|
# Tool modes (zoom/pan) keep their native cursors.
|
||||||
|
if _Cursors is not None:
|
||||||
|
_orig_set_cursor = fig.canvas.set_cursor
|
||||||
|
|
||||||
|
def _set_cursor_with_crosshair(cursor):
|
||||||
|
if cursor == _Cursors.POINTER:
|
||||||
|
cursor = _Cursors.SELECT_REGION
|
||||||
|
return _orig_set_cursor(cursor)
|
||||||
|
|
||||||
|
fig.canvas.set_cursor = _set_cursor_with_crosshair
|
||||||
|
try:
|
||||||
|
fig.canvas.set_cursor(_Cursors.SELECT_REGION)
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
else:
|
||||||
|
# Last-ditch: just set the Tk widget's cursor once and hope the
|
||||||
|
# toolbar doesn't immediately overwrite it.
|
||||||
|
try:
|
||||||
|
fig.canvas.get_tk_widget().config(cursor="tcross")
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
img_artist = ax.imshow(frame)
|
||||||
|
ax.set_axis_off()
|
||||||
|
fig.tight_layout()
|
||||||
|
|
||||||
|
state = {
|
||||||
|
"points": list(initial_points) if initial_points else [],
|
||||||
|
"action": None, # 'save' | 'skip' | 'quit' | 'unusable'
|
||||||
|
"frame": frame,
|
||||||
|
"frame_idx": frame_idx,
|
||||||
|
"drawn": [], # artists drawn on top of the image
|
||||||
|
}
|
||||||
|
|
||||||
|
def update_title():
|
||||||
|
nb = len(state["points"])
|
||||||
|
nxt = (
|
||||||
|
f"click {CLICK_LABELS[nb]}"
|
||||||
|
if nb < 3
|
||||||
|
else "ENTER=save | r=reset d=skip u=unusable q=quit | . , [ ] # = step frame"
|
||||||
|
)
|
||||||
|
ax.set_title(
|
||||||
|
f'{status_prefix} frame {state["frame_idx"]} | {nxt}',
|
||||||
|
fontsize=10,
|
||||||
|
)
|
||||||
|
|
||||||
|
def redraw_points():
|
||||||
|
for a in state["drawn"]:
|
||||||
|
try:
|
||||||
|
a.remove()
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
state["drawn"].clear()
|
||||||
|
for i, (x, y) in enumerate(state["points"]):
|
||||||
|
color = CLICK_COLORS[i]
|
||||||
|
label = CLICK_LABELS[i]
|
||||||
|
(cross,) = ax.plot(x, y, marker="+", color=color, markersize=22, mew=2)
|
||||||
|
(ring,) = ax.plot(
|
||||||
|
x, y, marker="o", color=color, markersize=22,
|
||||||
|
fillstyle="none", mew=2,
|
||||||
|
)
|
||||||
|
txt = ax.text(
|
||||||
|
x + 14, y - 14, label,
|
||||||
|
color=color, fontsize=10, weight="bold",
|
||||||
|
)
|
||||||
|
state["drawn"].extend([cross, ring, txt])
|
||||||
|
if len(state["points"]) >= 2:
|
||||||
|
(line1,) = ax.plot(
|
||||||
|
[state["points"][0][0], state["points"][1][0]],
|
||||||
|
[state["points"][0][1], state["points"][1][1]],
|
||||||
|
color="white", linewidth=0.7, alpha=0.6,
|
||||||
|
)
|
||||||
|
state["drawn"].append(line1)
|
||||||
|
if len(state["points"]) == 3:
|
||||||
|
(line2,) = ax.plot(
|
||||||
|
[state["points"][1][0], state["points"][2][0]],
|
||||||
|
[state["points"][1][1], state["points"][2][1]],
|
||||||
|
color="white", linewidth=0.7, alpha=0.6,
|
||||||
|
)
|
||||||
|
state["drawn"].append(line2)
|
||||||
|
# ROI overlay — draw the 6 computed rectangles on top of the frame
|
||||||
|
try:
|
||||||
|
polys = compute_roi_polygons(state["points"])
|
||||||
|
except Exception as e:
|
||||||
|
polys = []
|
||||||
|
print(f" (ROI preview failed: {e})")
|
||||||
|
for j, poly in enumerate(polys):
|
||||||
|
# Close the polygon by repeating the first point
|
||||||
|
xs = list(poly[:, 0]) + [poly[0, 0]]
|
||||||
|
ys = list(poly[:, 1]) + [poly[0, 1]]
|
||||||
|
(line,) = ax.plot(
|
||||||
|
xs, ys, color="yellow", linewidth=1.5, alpha=0.9,
|
||||||
|
)
|
||||||
|
state["drawn"].append(line)
|
||||||
|
cx = float(np.mean(poly[:, 0]))
|
||||||
|
cy = float(np.mean(poly[:, 1]))
|
||||||
|
lbl = ax.text(
|
||||||
|
cx, cy, str(j + 1),
|
||||||
|
color="yellow", fontsize=14, weight="bold",
|
||||||
|
ha="center", va="center",
|
||||||
|
)
|
||||||
|
state["drawn"].append(lbl)
|
||||||
|
update_title()
|
||||||
|
fig.canvas.draw_idle()
|
||||||
|
|
||||||
|
def reload_frame(new_idx: int):
|
||||||
|
grabbed = grab_frame(video_path, new_idx)
|
||||||
|
if grabbed is None:
|
||||||
|
return
|
||||||
|
new_frame, new_idx, _ = grabbed
|
||||||
|
state["frame"] = new_frame
|
||||||
|
state["frame_idx"] = new_idx
|
||||||
|
img_artist.set_data(new_frame)
|
||||||
|
# Keep clicked targets + ROI overlay in place across frame-stepping —
|
||||||
|
# press 'r' to clear them explicitly.
|
||||||
|
redraw_points()
|
||||||
|
|
||||||
|
def on_click(event):
|
||||||
|
if event.inaxes is not ax:
|
||||||
|
return
|
||||||
|
if event.button != 1: # left click only
|
||||||
|
return
|
||||||
|
if event.xdata is None or event.ydata is None:
|
||||||
|
return
|
||||||
|
# Skip clicks fired while the toolbar's pan/zoom is active.
|
||||||
|
toolbar = getattr(fig.canvas, "toolbar", None)
|
||||||
|
if toolbar is not None and getattr(toolbar, "mode", ""):
|
||||||
|
return
|
||||||
|
x, y = float(event.xdata), float(event.ydata)
|
||||||
|
if len(state["points"]) < 3:
|
||||||
|
state["points"].append((x, y))
|
||||||
|
else:
|
||||||
|
# 3 points already there — replace the nearest one. Lets the user
|
||||||
|
# nudge pre-loaded targets in --redo mode, or correct a bad click.
|
||||||
|
dists = [(x - px) ** 2 + (y - py) ** 2 for px, py in state["points"]]
|
||||||
|
i_nearest = min(range(3), key=dists.__getitem__)
|
||||||
|
state["points"][i_nearest] = (x, y)
|
||||||
|
redraw_points()
|
||||||
|
|
||||||
|
def on_key(event):
|
||||||
|
k = event.key or ""
|
||||||
|
if k in ("escape", "q"):
|
||||||
|
state["action"] = "quit"
|
||||||
|
plt.close(fig)
|
||||||
|
elif k == "r":
|
||||||
|
state["points"].clear()
|
||||||
|
redraw_points()
|
||||||
|
elif k == "d":
|
||||||
|
state["action"] = "skip"
|
||||||
|
plt.close(fig)
|
||||||
|
elif k == "u":
|
||||||
|
state["action"] = "unusable"
|
||||||
|
plt.close(fig)
|
||||||
|
elif k == "enter":
|
||||||
|
if len(state["points"]) == 3:
|
||||||
|
state["action"] = "save"
|
||||||
|
plt.close(fig)
|
||||||
|
elif k == ".":
|
||||||
|
reload_frame(state["frame_idx"] + 25)
|
||||||
|
elif k == ",":
|
||||||
|
reload_frame(state["frame_idx"] - 25)
|
||||||
|
elif k == "]":
|
||||||
|
reload_frame(state["frame_idx"] + big_step)
|
||||||
|
elif k == "[":
|
||||||
|
reload_frame(state["frame_idx"] - big_step)
|
||||||
|
elif k == "#":
|
||||||
|
if n_frames > 0:
|
||||||
|
reload_frame(n_frames // 2)
|
||||||
|
|
||||||
|
fig.canvas.mpl_connect("button_press_event", on_click)
|
||||||
|
fig.canvas.mpl_connect("key_press_event", on_key)
|
||||||
|
update_title()
|
||||||
|
plt.show() # blocks until the figure is closed
|
||||||
|
|
||||||
|
if state["action"] == "save":
|
||||||
|
return {
|
||||||
|
"action": "save",
|
||||||
|
"frame_idx": state["frame_idx"],
|
||||||
|
"points": state["points"],
|
||||||
|
}
|
||||||
|
if state["action"] == "unusable":
|
||||||
|
return {"action": "unusable", "frame_idx": state["frame_idx"]}
|
||||||
|
if state["action"] in ("skip", "quit"):
|
||||||
|
return {"action": state["action"]}
|
||||||
|
# Window closed via the WM "X" button — treat as quit so the loop stops
|
||||||
|
return {"action": "quit"}
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(description=__doc__)
|
||||||
|
parser.add_argument(
|
||||||
|
"--redo", action="store_true",
|
||||||
|
help="re-pick videos that already have JSON sidecars",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--frame", type=int, default=125,
|
||||||
|
help="default frame index to display (default 125 ≈ 5 s @ 25 fps)",
|
||||||
|
)
|
||||||
|
parser.add_argument(
|
||||||
|
"--limit", type=int, default=None,
|
||||||
|
help="only process the first N videos",
|
||||||
|
)
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
if not INVENTORY_CSV.exists():
|
||||||
|
sys.exit(
|
||||||
|
f"Inventory not found at {INVENTORY_CSV}. "
|
||||||
|
"Run build_video_inventory.py first."
|
||||||
|
)
|
||||||
|
|
||||||
|
inv = pd.read_csv(INVENTORY_CSV)
|
||||||
|
todo = inv[inv["in_xlsx"] & ~inv["already_tracked"]].copy()
|
||||||
|
todo = todo.sort_values(
|
||||||
|
["session_date", "machine_name", "session_time"]
|
||||||
|
).reset_index(drop=True)
|
||||||
|
|
||||||
|
TARGETS_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
def sidecar_for(mp4_path: str) -> Path:
|
||||||
|
return TARGETS_DIR / (Path(mp4_path).stem + ".json")
|
||||||
|
|
||||||
|
if not args.redo:
|
||||||
|
todo = todo[
|
||||||
|
~todo["mp4_path"].apply(lambda p: sidecar_for(p).exists())
|
||||||
|
].reset_index(drop=True)
|
||||||
|
|
||||||
|
if args.limit:
|
||||||
|
todo = todo.head(args.limit)
|
||||||
|
|
||||||
|
n = len(todo)
|
||||||
|
if n == 0:
|
||||||
|
print("Nothing to pick. All eligible videos already have target JSONs.")
|
||||||
|
return
|
||||||
|
|
||||||
|
print(
|
||||||
|
f"Picking targets for {n} videos. "
|
||||||
|
"Window keys: ENTER=save r=reset d=skip u=unusable q=quit "
|
||||||
|
".,[]#=step frame | pan/zoom via toolbar"
|
||||||
|
)
|
||||||
|
saved = skipped = unusable = 0
|
||||||
|
for i, row in todo.iterrows():
|
||||||
|
mp4 = Path(row["mp4_path"])
|
||||||
|
prefix = f"[{i + 1}/{n}] {row['machine_name']} {row['session_datetime']}"
|
||||||
|
print(f"\n{prefix}")
|
||||||
|
|
||||||
|
# If --redo and a JSON sidecar exists, pre-load its points (only for
|
||||||
|
# regular saves — unusable sidecars are left as-is and shown empty).
|
||||||
|
initial_points = None
|
||||||
|
existing = sidecar_for(row["mp4_path"])
|
||||||
|
if args.redo and existing.exists():
|
||||||
|
try:
|
||||||
|
prev = json.loads(existing.read_text())
|
||||||
|
if not prev.get("unusable") and prev.get("reference_points"):
|
||||||
|
initial_points = [tuple(p) for p in prev["reference_points"]]
|
||||||
|
print(f" pre-loaded {len(initial_points)} previous point(s)")
|
||||||
|
except Exception as e:
|
||||||
|
print(f" ! could not read previous sidecar: {e}")
|
||||||
|
|
||||||
|
result = pick_one(mp4, args.frame, prefix, initial_points=initial_points)
|
||||||
|
if result is None or result.get("action") == "quit":
|
||||||
|
print(" quitting picker.")
|
||||||
|
break
|
||||||
|
if result["action"] == "skip":
|
||||||
|
skipped += 1
|
||||||
|
print(" skipped (no JSON written, will be re-asked next run).")
|
||||||
|
continue
|
||||||
|
if result["action"] == "unusable":
|
||||||
|
try:
|
||||||
|
reason = input(" reason for marking unusable (Enter to skip): ").strip()
|
||||||
|
except EOFError:
|
||||||
|
reason = ""
|
||||||
|
payload = {
|
||||||
|
"video_path": str(mp4),
|
||||||
|
"unusable": True,
|
||||||
|
"reason": reason,
|
||||||
|
"marked_at": dt.datetime.now().isoformat(timespec="seconds"),
|
||||||
|
}
|
||||||
|
out_path = sidecar_for(row["mp4_path"])
|
||||||
|
out_path.write_text(json.dumps(payload, indent=2))
|
||||||
|
unusable += 1
|
||||||
|
print(f" marked unusable → {out_path.name}")
|
||||||
|
continue
|
||||||
|
if result["action"] == "save":
|
||||||
|
payload = {
|
||||||
|
"video_path": str(mp4),
|
||||||
|
"frame_index": int(result["frame_idx"]),
|
||||||
|
"reference_points": [list(map(int, p)) for p in result["points"]],
|
||||||
|
"order": ["top", "corner", "left"],
|
||||||
|
"picked_at": dt.datetime.now().isoformat(timespec="seconds"),
|
||||||
|
}
|
||||||
|
out_path = sidecar_for(row["mp4_path"])
|
||||||
|
out_path.write_text(json.dumps(payload, indent=2))
|
||||||
|
saved += 1
|
||||||
|
print(f" saved → {out_path.name}")
|
||||||
|
|
||||||
|
remaining = n - saved - skipped - unusable
|
||||||
|
print(
|
||||||
|
f"\nDone. saved={saved} unusable={unusable} "
|
||||||
|
f"skipped(this run)={skipped} remaining={remaining}"
|
||||||
|
)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
main()
|
||||||
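The click handling in `on_click` (append until three points exist, then replace the nearest one) can be expressed as a small pure function, which is convenient for testing without a GUI. This is only an illustrative sketch. The name `apply_click` and the `max_points` parameter are hypothetical and are not part of `pick_targets.py`.

```python
from __future__ import annotations


def apply_click(
    points: list[tuple[float, float]], x: float, y: float, max_points: int = 3
) -> list[tuple[float, float]]:
    """Return a new point list after a left-click at (x, y).

    While fewer than `max_points` points exist the click is appended, in
    TOP, CORNER, LEFT order. Once the list is full, the click replaces the
    nearest existing point so a target can be nudged without a full reset.
    """
    pts = list(points)  # copy, so the caller's list is left untouched
    if len(pts) < max_points:
        pts.append((x, y))
        return pts
    # Squared Euclidean distance is enough for picking the nearest point.
    dists = [(x - px) ** 2 + (y - py) ** 2 for px, py in pts]
    i_nearest = min(range(len(pts)), key=dists.__getitem__)
    pts[i_nearest] = (x, y)
    return pts
```

Because a replacement keeps the index of the point it overwrites, the saved `order` of `["top", "corner", "left"]` stays valid after any number of nudges.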
283
scripts/track_videos.py
Normal file
|
|
@ -0,0 +1,283 @@
|
||||||
|
"""Headless offline tracker.
|
||||||
|
|
||||||
|
Reads target JSONs produced by `pick_targets.py`, builds the 6 ROIs of the
|
||||||
|
HD mating arena from the L-shape reference points, runs ethoscope's
|
||||||
|
`MultiFlyTracker` against the merged.mp4 file via `MovieVirtualCamera`, and
|
||||||
|
writes a SQLite DB to `TRACKING_OUTPUT_DIR/<video_basename>_tracking.db`.
|
||||||
|
|
||||||
|
Idempotent: skips videos whose tracking DB already exists (unless --redo).
|
||||||
|
|
||||||
|
Usage:
|
||||||
|
python track_videos.py # process all videos with target JSON
|
||||||
|
python track_videos.py --redo # re-track even if DB exists
|
||||||
|
python track_videos.py --jobs 4 # run up to 4 videos in parallel
|
||||||
|
python track_videos.py --max-duration 1800 # cap each video at 30 min (sec)
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import argparse
|
||||||
|
import json
|
||||||
|
import logging
|
||||||
|
import os
|
||||||
|
import sys
|
||||||
|
import traceback
|
||||||
|
from concurrent.futures import ProcessPoolExecutor, as_completed
|
||||||
|
from pathlib import Path
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Import ethoscope from the local source tree (no pip install).
|
||||||
|
ETHOSCOPE_SRC = Path("/home/gg/Code/ethoscope_project/ethoscope/src/ethoscope")
|
||||||
|
sys.path.insert(0, str(ETHOSCOPE_SRC))
|
||||||
|
|
||||||
|
from config import TARGETS_DIR, TRACKING_OUTPUT_DIR # noqa: E402
|
||||||
|
from tracking_geometry import HD_FG_DATA, compute_roi_polygons # noqa: E402
|
||||||
|
|
||||||
|
|
||||||
|
def build_rois_from_targets(reference_points):
|
||||||
|
"""Wrap the shared geometry into ethoscope `ROI` objects."""
|
||||||
|
from ethoscope.core.roi import ROI
|
||||||
|
|
||||||
|
polys = compute_roi_polygons(reference_points)
|
||||||
|
return [ROI(poly.reshape((1, 4, 2)), idx=i + 1) for i, poly in enumerate(polys)]
|
||||||
|
|
||||||
|
|
||||||
|
def track_one(json_path: Path, output_dir: Path, max_duration: float | None,
|
||||||
|
redo: bool) -> tuple[str, str]:
|
||||||
|
"""Track a single video. Returns (status, message). Run in subprocess.
|
||||||
|
|
||||||
|
Statuses: "ok", "skip", "error".
|
||||||
|
"""
|
||||||
|
# Re-import inside subprocess so each worker has its own ethoscope state.
|
||||||
|
import sys as _sys
|
||||||
|
_sys.path.insert(0, str(ETHOSCOPE_SRC))
|
||||||
|
import cv2
|
||||||
|
from ethoscope.core.monitor import Monitor
|
||||||
|
from ethoscope.hardware.input.cameras import MovieVirtualCamera
|
||||||
|
from ethoscope.io.sqlite import SQLiteResultWriter
|
||||||
|
from ethoscope.trackers.multi_fly_tracker import MultiFlyTracker
|
||||||
|
|
||||||
|
import time as _time
|
||||||
|
|
||||||
|
class BGRMovieCamera(MovieVirtualCamera):
|
||||||
|
"""MovieVirtualCamera that keeps BGR frames AND retries on transient
|
||||||
|
read failures.
|
||||||
|
|
||||||
|
Two reasons for the override:
|
||||||
|
|
||||||
|
1. MultiFlyTracker calls cv2.cvtColor(img, COLOR_BGR2GRAY) without
|
||||||
|
checking whether img is already grayscale, so we must feed it
|
||||||
|
3-channel input.
|
||||||
|
|
||||||
|
2. cv2.VideoCapture.read() can return False on transient I/O hiccups
|
||||||
|
(NFS contention when 8 workers pull big mp4s in parallel) without
|
||||||
|
the file actually being at EOF. A naive "False -> StopIteration"
|
||||||
|
handling makes the tracker silently exit mid-video and write a
|
||||||
|
short, lying DB. We retry a few times and only treat persistent
|
||||||
|
failures within the *interior* of the video as real EOF.
|
||||||
|
"""
|
||||||
|
|
||||||
|
_retry_count = 5
|
||||||
|
_retry_backoff_s = 0.25
|
||||||
|
_eof_safety_frames = 50 # near end-of-file, treat False as legitimate
|
||||||
|
|
||||||
|
def _next_image(self):
|
||||||
|
for attempt in range(self._retry_count):
|
||||||
|
ret, frame = self.capture.read()
|
||||||
|
if ret and frame is not None:
|
||||||
|
return frame # BGR, untouched
|
||||||
|
# If we're near the genuine end of the file, accept it.
|
||||||
|
if (
|
||||||
|
self._has_end_of_file
|
||||||
|
and self._frame_idx >= self._total_n_frames - self._eof_safety_frames
|
||||||
|
):
|
||||||
|
return None
|
||||||
|
# Otherwise, this is a suspected transient hiccup — back off
|
||||||
|
# and try again. The capture is still open; cv2 will pick up
|
||||||
|
# the next decoded frame.
|
||||||
|
_time.sleep(self._retry_backoff_s)
|
||||||
|
return None # truly persistent failure
|
||||||
|
|
||||||
|
payload = json.loads(json_path.read_text())
|
||||||
|
if payload.get("unusable"):
|
||||||
|
reason = payload.get("reason") or "no reason given"
|
||||||
|
return "skip", f"marked unusable: {reason}"
|
||||||
|
video_path = Path(payload["video_path"])
|
||||||
|
if not video_path.exists():
|
||||||
|
return "error", f"video missing: {video_path}"
|
||||||
|
|
||||||
|
out_db = output_dir / f"{video_path.stem}_tracking.db"
|
||||||
|
if out_db.exists() and not redo:
|
||||||
|
return "skip", f"DB exists: {out_db.name}"
|
||||||
|
if out_db.exists():
|
||||||
|
out_db.unlink()
|
||||||
|
|
||||||
|
rois = build_rois_from_targets(payload["reference_points"])
|
||||||
|
|
||||||
|
cam_kwargs = {"use_wall_clock": False}
|
||||||
|
if max_duration is not None:
|
||||||
|
cam_kwargs["max_duration"] = max_duration
|
||||||
|
cam = BGRMovieCamera(str(video_path), **cam_kwargs)
|
||||||
|
|
||||||
|
metadata = {
|
||||||
|
"machine_id": payload.get("machine_uuid", "unknown"),
|
||||||
|
"machine_name": payload.get("machine_name", "unknown"),
|
||||||
|
"date_time": int(payload.get("session_epoch", 0)),
|
||||||
|
"frame_width": cam.width,
|
||||||
|
"frame_height": cam.height,
|
||||||
|
"version": "offline-tracker-1",
|
||||||
|
"experimental_info": "{}",
|
||||||
|
"selected_options": json.dumps({
|
||||||
|
"tracker": "MultiFlyTracker",
|
||||||
|
"template": "HD_Mating_Arena_6_ROIS",
|
||||||
|
"fg_data": HD_FG_DATA,
|
||||||
|
"maxN": 2,
|
||||||
|
}),
|
||||||
|
"hardware_info": "{}",
|
||||||
|
"reference_points": str([list(map(int, p)) for p in payload["reference_points"]]),
|
||||||
|
"backup_filename": out_db.name,
|
||||||
|
"result_writer_type": "SQLite3",
|
||||||
|
"sqlite_source_path": str(out_db),
|
||||||
|
}
|
||||||
|
|
||||||
|
tracker_data = {
|
||||||
|
"maxN": 2,
|
||||||
|
"visualise": False,
|
||||||
|
"fg_data": HD_FG_DATA,
|
||||||
|
"adaptive_threshold": True,
|
||||||
|
"min_fg_threshold": 10,
|
||||||
|
"max_fg_threshold": 50,
|
||||||
|
}
|
||||||
|
|
||||||
|
db_credentials = {"name": str(out_db)}
|
||||||
|
rw = SQLiteResultWriter(
|
||||||
|
db_credentials, rois, metadata=metadata,
|
||||||
|
make_dam_like_table=False, take_frame_shots=False, erase_old_db=True,
|
||||||
|
)
|
||||||
|
|
||||||
|
monit = Monitor(
|
||||||
|
cam, MultiFlyTracker, rois,
|
||||||
|
reference_points=payload["reference_points"],
|
||||||
|
data=tracker_data,
|
||||||
|
)
|
||||||
|
|
||||||
|
try:
|
||||||
|
with rw as result_writer:
|
||||||
|
monit.run(result_writer=result_writer, drawer=None, verbose=False)
|
||||||
|
except Exception:
|
||||||
|
return "error", traceback.format_exc(limit=5)
|
||||||
|
finally:
|
||||||
|
try:
|
||||||
|
cam._close()
|
||||||
|
except Exception:
|
||||||
|
pass
|
||||||
|
|
||||||
|
if not out_db.exists():
|
||||||
|
return "error", "tracking finished but DB was not created"
|
||||||
|
|
||||||
|
# Post-tracking sanity check: did we cover most of the source video?
|
||||||
|
# If not (cv2 retry exhausted, codec corruption, etc.), reject the DB so
|
||||||
|
# it doesn't get cached as "done" — better an explicit failure than a
|
||||||
|
# silent partial write.
|
||||||
|
expected_ms = (cam._total_n_frames / 25.0) * 1000.0  # assumes a 25 fps source video
|
||||||
|
if max_duration is not None:
|
||||||
|
expected_ms = min(expected_ms, max_duration * 1000.0)
|
||||||
|
completeness_threshold = 0.90 # require ≥ 90 % of expected duration
|
||||||
|
|
||||||
|
# Use MAX(t) across all ROIs — a single ROI can run dry early if its fly
|
||||||
|
# stops moving, so the latest detection anywhere in the arena is the
|
||||||
|
# better signal of how far the iterator actually got.
|
||||||
|
import sqlite3 as _sqlite3
|
||||||
|
try:
|
||||||
|
_con = _sqlite3.connect(f"file:{out_db}?mode=ro", uri=True)
|
||||||
|
t_max = 0
|
||||||
|
for _i in range(1, len(rois) + 1):
|
||||||
|
_v = _con.execute(f"SELECT MAX(t) FROM ROI_{_i}").fetchone()[0]
|
||||||
|
if _v and _v > t_max:
|
||||||
|
t_max = _v
|
||||||
|
_con.close()
|
||||||
|
except Exception:
|
||||||
|
t_max = 0
|
||||||
|
|
||||||
|
if expected_ms > 0 and t_max < expected_ms * completeness_threshold:
|
||||||
|
out_db.unlink()
|
||||||
|
for sidecar in (str(out_db) + "-wal", str(out_db) + "-shm"):
|
||||||
|
Path(sidecar).unlink(missing_ok=True)
|
||||||
|
ratio = t_max / expected_ms if expected_ms else 0
|
||||||
|
return (
|
||||||
|
"error",
|
||||||
|
f"short output: t_max={t_max} ms vs expected {int(expected_ms)} ms "
|
||||||
|
f"({ratio*100:.0f}%); DB removed",
|
||||||
|
)
|
||||||
|
|
||||||
|
return "ok", str(out_db)
|
||||||
|
|
||||||
|
|
||||||
|
def main() -> None:
|
||||||
|
parser = argparse.ArgumentParser(description=__doc__)
|
||||||
|
parser.add_argument("--redo", action="store_true", help="re-track even if DB exists")
|
||||||
|
parser.add_argument("--jobs", type=int, default=1, help="parallel workers")
|
||||||
|
parser.add_argument(
|
||||||
|
"--max-duration", type=float, default=None,
|
||||||
|
help="cap each video at this many seconds (default: full video)",
|
||||||
|
)
|
||||||
|
parser.add_argument("--limit", type=int, default=None, help="process only first N")
|
||||||
|
parser.add_argument("--video", type=str, default=None,
|
||||||
|
help="track a single video (mp4 path); requires its target JSON")
|
||||||
|
args = parser.parse_args()
|
||||||
|
|
||||||
|
TRACKING_OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
|
||||||
|
|
||||||
|
if args.video:
|
||||||
|
stem = Path(args.video).stem
|
||||||
|
json_path = TARGETS_DIR / f"{stem}.json"
|
||||||
|
if not json_path.exists():
|
||||||
|
sys.exit(f"No target JSON for {args.video}: expected {json_path}")
|
||||||
|
jsons = [json_path]
|
||||||
|
else:
|
||||||
|
jsons = sorted(TARGETS_DIR.glob("*.json"))
|
||||||
|
|
||||||
|
if args.limit:
|
||||||
|
jsons = jsons[: args.limit]
|
||||||
|
|
||||||
|
if not jsons:
|
||||||
|
print("No target JSONs found. Run pick_targets.py first.")
|
||||||
|
return
|
||||||
|
|
||||||
|
print(f"Tracking {len(jsons)} videos (jobs={args.jobs}, redo={args.redo}).")
|
||||||
|
n_ok = n_skip = n_err = 0
|
||||||
|
|
||||||
|
if args.jobs <= 1:
|
||||||
|
for jp in jsons:
|
||||||
|
print(f" → {jp.name}", flush=True)
|
||||||
|
status, msg = track_one(jp, TRACKING_OUTPUT_DIR, args.max_duration, args.redo)
|
||||||
|
print(f" {status}: {msg.splitlines()[-1] if msg else ''}", flush=True)
|
||||||
|
n_ok += status == "ok"
|
||||||
|
n_skip += status == "skip"
|
||||||
|
n_err += status == "error"
|
||||||
|
else:
|
||||||
|
with ProcessPoolExecutor(max_workers=args.jobs) as ex:
|
||||||
|
futs = {
|
||||||
|
ex.submit(track_one, jp, TRACKING_OUTPUT_DIR, args.max_duration, args.redo): jp
|
||||||
|
for jp in jsons
|
||||||
|
}
|
||||||
|
for fut in as_completed(futs):
|
||||||
|
jp = futs[fut]
|
||||||
|
try:
|
||||||
|
status, msg = fut.result()
|
||||||
|
except Exception as e:
|
||||||
|
status, msg = "error", f"future raised: {e}"
|
||||||
|
print(f" {jp.name}: {status} — {msg.splitlines()[-1] if msg else ''}",
|
||||||
|
flush=True)
|
||||||
|
n_ok += status == "ok"
|
||||||
|
n_skip += status == "skip"
|
||||||
|
n_err += status == "error"
|
||||||
|
|
||||||
|
print(f"\nDone. ok={n_ok} skipped={n_skip} errors={n_err}")
|
||||||
|
sys.exit(0 if n_err == 0 else 1)
|
||||||
|
|
||||||
|
|
||||||
|
if __name__ == "__main__":
|
||||||
|
logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
|
||||||
|
main()
|
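The retry logic in `BGRMovieCamera._next_image` can be tried out without a video file. The following is a minimal sketch of the same pattern, not the class from the diff: `read_with_retry`, `read_frame`, `is_near_end` and the injectable `sleep` argument are hypothetical names introduced here so the behaviour can be checked with a fake reader.

```python
import time


def read_with_retry(read_frame, is_near_end, retries=5, backoff_s=0.25,
                    sleep=time.sleep):
    """Call read_frame() until it yields a frame, giving up after `retries` tries.

    read_frame  -> (ok, frame), mirroring the return shape of cv2.VideoCapture.read()
    is_near_end -> True when a failed read is plausibly the real end of the file
    """
    for _ in range(retries):
        ok, frame = read_frame()
        if ok and frame is not None:
            return frame
        if is_near_end():
            return None  # genuine EOF: stop without retrying
        sleep(backoff_s)  # suspected transient failure: wait, then try again
    return None  # failure persisted through every attempt


# Fake reader that fails twice before delivering a frame.
outcomes = iter([(False, None), (False, None), (True, "frame-3")])
waits = []  # records the back-off delays instead of actually sleeping
frame = read_with_retry(lambda: next(outcomes), lambda: False, sleep=waits.append)
```

Passing `sleep` as an argument is only a convenience for testing; the class in the diff calls `time.sleep` directly.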
||||||
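The post-tracking completeness check in `track_one` can also be exercised in isolation. The sketch below uses only the standard library and is an illustration, not the script's code: `coverage_ratio` and its `fps` and `n_rois` parameters are hypothetical names, and it assumes, as the query in the diff does, one table per ROI named `ROI_<i>` with a millisecond timestamp column `t`.

```python
import sqlite3


def coverage_ratio(con, n_frames, fps=25.0, n_rois=6, max_duration=None):
    """Return the latest tracked timestamp divided by the expected duration (ms)."""
    expected_ms = n_frames / fps * 1000.0
    if max_duration is not None:
        expected_ms = min(expected_ms, max_duration * 1000.0)
    if expected_ms <= 0:
        return 0.0
    t_max = 0
    for i in range(1, n_rois + 1):
        value = con.execute(f"SELECT MAX(t) FROM ROI_{i}").fetchone()[0]
        if value is not None and value > t_max:
            t_max = value
    return t_max / expected_ms


# Toy database with two ROIs; ROI_2 stops producing rows early.
con = sqlite3.connect(":memory:")
for i in (1, 2):
    con.execute(f"CREATE TABLE ROI_{i} (t INTEGER)")
con.executemany("INSERT INTO ROI_1 VALUES (?)", [(0,), (40,), (3800,)])
con.executemany("INSERT INTO ROI_2 VALUES (?)", [(0,), (40,)])

# 100 frames at 25 fps is 4000 ms of video.
ratio = coverage_ratio(con, n_frames=100, fps=25.0, n_rois=2)
is_complete = ratio >= 0.90
```

Taking the maximum over all ROI tables means the early stop in `ROI_2` does not cause a false rejection, which is the reasoning given in the comment above the query.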
71
scripts/tracking_geometry.py
Normal file
|
|
@ -0,0 +1,71 @@
|
||||||
|
"""Shared HD-mating-arena ROI geometry, used by both pick_targets.py
|
||||||
|
(for live overlay) and track_videos.py (for actual tracking).
|
||||||
|
|
||||||
|
Pure numpy + cv2; no ethoscope dependency.
|
||||||
|
"""
|
||||||
|
|
||||||
|
from __future__ import annotations
|
||||||
|
|
||||||
|
import itertools
|
||||||
|
|
||||||
|
import cv2
|
||||||
|
import numpy as np
|
||||||
|
|
||||||
|
# Layout from
|
||||||
|
# ethoscope/.../roi_builders/roi_templates/builtin/HD_Mating_Arena_6_ROIS.json
|
||||||
|
HD_MATING_ARENA = {
|
||||||
|
"n_rows": 2,
|
||||||
|
"n_cols": 3,
|
||||||
|
"top_margin": -0.21,
|
||||||
|
"bottom_margin": -0.13,
|
||||||
|
"left_margin": 0.05,
|
||||||
|
"right_margin": 0.05,
|
||||||
|
"horizontal_fill": 0.85,
|
||||||
|
"vertical_fill": 1.3,
|
||||||
|
}
|
||||||
|
|
||||||
|
HD_FG_DATA = {
|
||||||
|
"sample_size": 400,
|
||||||
|
"normal_limits": [800, 2000],
|
||||||
|
"tolerance": 0.8,
|
||||||
|
}
|
||||||
|
|
||||||
|
|
||||||
|
def compute_roi_polygons(reference_points, layout=HD_MATING_ARENA):
|
||||||
|
"""Map 3 L-shape reference points to 6 ROI polygons, in the order ROI 1..6.
|
||||||
|
|
||||||
|
Reference points must be ordered:
|
||||||
|
[TOP, CORNER, LEFT]
|
||||||
|
matching ethoscope's dst_points = [(0, -1), (0, 0), (-1, 0)].
|
||||||
|
|
||||||
|
Returns:
|
||||||
|
list[np.ndarray] # 6 arrays, each shape (4, 2), int32, in image coords
|
||||||
|
"""
|
||||||
|
ref = np.asarray(reference_points, dtype=np.float32)
|
||||||
|
if ref.shape != (3, 2):
|
||||||
|
raise ValueError(f"reference_points must be 3x2, got shape {ref.shape}")
|
||||||
|
|
||||||
|
dst_points = np.array([(0, -1), (0, 0), (-1, 0)], dtype=np.float32)
|
||||||
|
wrap_mat = cv2.getAffineTransform(dst_points, ref)
|
||||||
|
|
||||||
|
n_col = layout["n_cols"]
|
||||||
|
n_row = layout["n_rows"]
|
||||||
|
tm, bm = layout["top_margin"], layout["bottom_margin"]
|
||||||
|
lm, rm = layout["left_margin"], layout["right_margin"]
|
||||||
|
hf, vf = layout["horizontal_fill"], layout["vertical_fill"]
|
||||||
|
|
||||||
|
y_positions = (np.arange(n_row) * 2.0 + 1) * (1 - tm - bm) / (2 * n_row) + tm
|
||||||
|
x_positions = (np.arange(n_col) * 2.0 + 1) * (1 - lm - rm) / (2 * n_col) + lm
|
||||||
|
centres = [np.array([x, y]) for x, y in itertools.product(x_positions, y_positions)]
|
||||||
|
sign_mat = np.array([[-1, -1], [+1, -1], [+1, +1], [-1, +1]])
|
||||||
|
xy_size = np.array([hf / float(n_col), vf / float(n_row)]) / 2.0
|
||||||
|
rectangles = [sign_mat * xy_size + c for c in centres]
|
||||||
|
|
||||||
|
shift = np.dot(wrap_mat, [1, 1, 0]) - ref[1]
|
||||||
|
|
||||||
|
polys = []
|
||||||
|
for r in rectangles:
|
||||||
|
r3 = np.append(r, np.zeros((4, 1)), axis=1)
|
||||||
|
mapped = np.dot(wrap_mat, r3.T).T - shift
|
||||||
|
polys.append(mapped.astype(np.int32))
|
||||||
|
return polys
|
||||||
|
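To see which ROI index lands where, the centre computation in `compute_roi_polygons` can be reproduced in plain Python. This is an illustrative sketch in the unit (pre-affine) coordinate frame only: it copies the margin constants from `HD_MATING_ARENA` and skips the `cv2.getAffineTransform` mapping to image pixels. `roi_centres` is a hypothetical helper name.

```python
import itertools

# Margins copied from HD_MATING_ARENA; the fill factors are not needed for centres.
LAYOUT = {
    "n_rows": 2,
    "n_cols": 3,
    "top_margin": -0.21,
    "bottom_margin": -0.13,
    "left_margin": 0.05,
    "right_margin": 0.05,
}


def roi_centres(layout):
    """Return ROI centres (x, y) in unit coordinates, in ROI order 1..N."""
    n_row, n_col = layout["n_rows"], layout["n_cols"]
    tm, bm = layout["top_margin"], layout["bottom_margin"]
    lm, rm = layout["left_margin"], layout["right_margin"]
    ys = [(2 * i + 1) * (1 - tm - bm) / (2 * n_row) + tm for i in range(n_row)]
    xs = [(2 * j + 1) * (1 - lm - rm) / (2 * n_col) + lm for j in range(n_col)]
    # Same iteration order as the original: x is the outer loop, y the inner one.
    return list(itertools.product(xs, ys))


centres = roi_centres(LAYOUT)
for idx, (x, y) in enumerate(centres, start=1):
    print(f"ROI {idx}: x={x:.3f}, y={y:.3f}")
```

Because `itertools.product` iterates x in the outer loop, the numbering runs through both rows of one column before moving to the next: ROI 1 and ROI 2 share x = 0.2, ROI 3 and ROI 4 share x = 0.5, and ROI 5 and ROI 6 share x = 0.8.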
|
@ -51,6 +51,90 @@ See `docs/bimodal_hypothesis.md` for detailed methodology.
|
||||||
|
- [ ] Consider converting pixel distances to physical units (need calibration)
|
||||||
|
- [ ] The second notebook (`flies_analysis.ipynb`) re-runs from DB extraction - consider deprecating
|
||||||
|
|
||||||
|
## Phase: Offline Tracking of 2024 Video Backlog (added 2026-04-27)
|
||||||
|
|
||||||
|
### Recap
|
||||||
|
|
||||||
|
Tracked so far: 5 sessions, all from 2025-07-15 (machines 076/145/268). The DBs in
|
||||||
|
`data/raw/` use tracker `ConstrainedMultiFlyTracker` and template
|
||||||
|
`HD_Mating_Arena_6_ROIS.json` (2 flies × 6 ROIs per video).
|
||||||
|
|
||||||
|
The metadata file `../all_video_info_merged.xlsx` indexes a different set of
|
||||||
|
experiments: 7 dates from 2024-09-17 → 2024-10-21, 16 ethoscope machines,
|
||||||
|
63 unique (date, machine) sessions = 484 ROI-rows. **None of the already-tracked
|
||||||
|
sessions are in this xlsx — these are fresh recordings to track.**
|
||||||
|
|
||||||
|
Inventory: see `data/metadata/video_inventory.csv` (built by
|
||||||
|
`scripts/build_video_inventory.py`).
|
||||||
|
- 1163 video sessions on disk under `/mnt/ethoscope_data/videos/`
|
||||||
|
- 63/63 xlsx (date, machine) sessions have video on disk
|
||||||
|
- 129 video instances need tracking (some (date, machine) sessions have 2-4 recordings/day)
|
||||||
|
|
||||||
|
### Plan
|
||||||
|
|
||||||
|
The HD-mating-arena videos have no auto-detectable targets — the user must
|
||||||
|
manually click 3 reference points (L-shape: top, corner, left) per video. Once
|
||||||
|
all targets are picked, tracking can run in the background.
|
||||||
|
|
||||||
|
- [x] **Step 1 — Inventory**: `scripts/build_video_inventory.py` →
|
||||||
|
`data/metadata/video_inventory.csv`. 63 (date,machine) sessions match
|
||||||
|
the xlsx, all videos found, 129 video instances need tracking.
|
||||||
|
- [x] **Step 2 — Manual target picker**: `scripts/pick_targets.py`. Loops over
|
||||||
|
videos with `in_xlsx & ~already_tracked & no JSON yet`; per video, shows
|
||||||
|
a representative frame, captures 3 clicks (top, corner, left), saves
|
||||||
|
`data/targets/<video_basename>.json`. Skips videos already done.
|
||||||
|
- [x] **Step 3 — Background tracker**: `scripts/track_videos.py`. Reads target
|
||||||
|
JSONs, builds 6 ROIs from the HD-mating-arena geometry, runs
|
||||||
|
`MovieVirtualCamera` + `MultiFlyTracker` + `SQLiteResultWriter`, writes
|
||||||
|
`data/tracked/<basename>_tracking.db`. Idempotent. Smoke-tested
|
||||||
|
end-to-end: 90s of video → ~3000 rows/ROI, areas in 800-2000 band.
|
||||||
|
- [x] **Step 4 — Tracking deps**: `requirements-tracking.txt`.
|
||||||
|
|
||||||
|
### Still TODO
|
||||||
|
- [ ] User to run `pick_targets.py` (interactive — needs DISPLAY) on the 129
|
||||||
|
pending videos.
|
||||||
|
- [ ] Run `track_videos.py --jobs 4` against the resulting JSONs.
|
||||||
|
- [ ] (Optional) `auto_detect_targets.py` exists as a fallback for videos that
|
||||||
|
DO have visible targets (saves clicks). Confirmed not useful on the
|
||||||
|
2025-07-15 batch — these arenas don't have black target dots — but worth
|
||||||
|
trying on 2024 batches before falling back to manual.
|
||||||
|
- [ ] Decide what to do with the 4 (date, machine) sessions that have 3-4
|
||||||
|
recordings/day instead of 2 (e.g. ETHOSCOPE_086 on 2024-09-17 has 4).
|
||||||
|
One of them is at lower resolution (1280x960) — likely an aborted take.
|
||||||
|
|
||||||
|
### Open questions / risks
|
||||||
|
|
||||||
|
- Some (date, machine) combos have 3-4 recordings (e.g. ETHOSCOPE_086 on
|
||||||
|
2024-09-17). Need to figure out which is the real "test" video vs aborted
|
||||||
|
takes — possibly use video duration or filename pattern.
|
||||||
|
- One mismatched-resolution file: `1280x960@25fps-20q` instead of
|
||||||
|
`1920x1088@25fps-28q` — flag for inspection.
|
||||||
|
- The original `ConstrainedMultiFlyTracker` is no longer in the ethoscope repo;
|
||||||
|
`MultiFlyTracker` is its likely successor. Validate output schema matches
|
||||||
|
what the existing analysis pipeline expects (`load_roi_data.py`, etc.).
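One way to approach the schema validation mentioned in the last item is to compare column lists table by table. The sketch below is a hedged illustration using the standard `sqlite3` module and SQLite's `PRAGMA table_info`; `table_columns` and `schema_diff` are hypothetical helpers, and the toy column names are made up rather than taken from either tracker's real output.

```python
import sqlite3


def table_columns(con, table):
    """Return [(column_name, declared_type), ...] for one table."""
    rows = con.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in rows]


def schema_diff(old_con, new_con, tables):
    """Map each table whose columns differ to its (old, new) column lists."""
    diff = {}
    for table in tables:
        old_cols = table_columns(old_con, table)
        new_cols = table_columns(new_con, table)
        if old_cols != new_cols:
            diff[table] = (old_cols, new_cols)
    return diff


# Toy stand-ins for a legacy DB and a freshly tracked one (made-up columns).
old_con = sqlite3.connect(":memory:")
new_con = sqlite3.connect(":memory:")
old_con.execute("CREATE TABLE ROI_1 (id INTEGER, t INTEGER, x INTEGER, y INTEGER)")
new_con.execute("CREATE TABLE ROI_1 (id INTEGER, t INTEGER, x INTEGER)")

mismatches = schema_diff(old_con, new_con, ["ROI_1"])
```

For real files, the connections could be opened read-only in the same way `track_videos.py` does, with `sqlite3.connect(f"file:{path}?mode=ro", uri=True)`.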
|
||||||
|
|
||||||
|
## Discovered During Work
|
||||||
|
|
||||||
(Add new items here as they come up during analysis)
|
### Barrier-opening annotation for the 2024 batch (added 2026-04-30)
|
||||||
|
The current `flies_analysis*.ipynb` aligns trajectories to a barrier-opening
|
||||||
|
event sourced from `data/metadata/2025_07_15_barrier_opening.csv`. That file
|
||||||
|
covers only the 5 machines in the 2025-07-15 experiment. The 2024 batch
|
||||||
|
(`/mnt/data/projects/cupido/tracked/`, 113 DBs) has no equivalent annotation
|
||||||
|
yet, so all post-alignment cells silently exclude that data.
|
||||||
|
|
||||||
|
- [ ] Build a small picker that lets the user scrub through each tracking
|
||||||
|
DB / video and mark the barrier-opening frame, writing a row to a new
|
||||||
|
`data/metadata/barrier_opening_2024.csv` (or extend the existing
|
||||||
|
file with a date column).
|
||||||
|
- [ ] Once the 2024 entries exist, update `align_to_opening_time` so it
|
||||||
|
pulls from a unified `barrier_opening` table keyed by
|
||||||
|
`(date, machine_name)` rather than `machine_name` alone.
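A unified lookup keyed by `(date, machine_name)` could look roughly like the sketch below. Everything here is an assumption made for illustration: the column names (`date`, `machine_name`, `opening_s`), the machine names, the timing values, and the helper names `load_barrier_openings` and `opening_time`. It is not the existing `align_to_opening_time` code.

```python
import csv
import io

# Made-up rows in a hypothetical unified format; real values must come from annotation.
CSV_TEXT = """date,machine_name,opening_s
2024-09-17,ETHOSCOPE_A,300.0
2024-10-21,ETHOSCOPE_A,420.0
"""


def load_barrier_openings(handle):
    """Read the table into a dict keyed by (date, machine_name)."""
    table = {}
    for row in csv.DictReader(handle):
        key = (row["date"], row["machine_name"])
        if key in table:
            raise ValueError(f"duplicate barrier-opening entry for {key}")
        table[key] = float(row["opening_s"])
    return table


def opening_time(table, date, machine_name):
    """Return the opening time in seconds, or None if the session is not annotated."""
    return table.get((date, machine_name))


openings = load_barrier_openings(io.StringIO(CSV_TEXT))
```

Keying on the pair lets the same machine carry a different opening time on each date, and returning `None` for a missing key gives callers a way to report unannotated sessions instead of dropping them silently.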
|
||||||
|
|
||||||
|
### Metadata vocabulary normalization (done 2026-04-30)
|
||||||
|
The xlsx had inconsistent labels for control flies (`'naïve'`, `'niave'`,
|
||||||
|
`'untrained'` plus trailing whitespace). All sources now use a single
|
||||||
|
canonical `'naive'`. Normalization happens in
|
||||||
|
`scripts/export_video_db_index.py` so re-running it from the xlsx always
|
||||||
|
produces a clean TSV. The 2025-07-15 legacy CSV
|
||||||
|
(`data/metadata/2025_07_15_metadata_fixed.csv`) was edited in place from
|
||||||
|
`'untrained'` → `'naive'`.
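The normalization described above can be sketched as follows. This is not the code in `scripts/export_video_db_index.py`, which is not shown in this diff: `normalize_label` and `CONTROL_VARIANTS` are hypothetical names, the case folding is an extra assumption, and `'trained'` is used only as an example of a label that should pass through unchanged.

```python
import unicodedata

CANONICAL_CONTROL = "naive"
# Variants listed in the note above, after accent stripping and lowercasing.
CONTROL_VARIANTS = {"naive", "niave", "untrained"}


def normalize_label(raw):
    """Strip whitespace, fold accents and case, then collapse control-fly variants."""
    text = unicodedata.normalize("NFKD", raw.strip())
    # Drop combining marks so that 'naïve' becomes 'naive'.
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    return CANONICAL_CONTROL if text in CONTROL_VARIANTS else text


labels = [normalize_label(v) for v in ["naïve", "niave ", " untrained", "trained"]]
```

Running the normalization at export time, as the note describes, keeps the xlsx as the single source of truth while every downstream file sees only the canonical label.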
|
||||||
|
|
|
||||||