cupido/README.md
Giorgio e4da7691d5 Add offline tracking pipeline for video backlog
The 2024 video set in all_video_info_merged.xlsx covers 63 (date, machine)
sessions — 129 video instances — that have no auto-detectable targets, so
ROI placement requires manual reference-point selection. This commit adds
the three-stage pipeline that lets a user click for an hour, then walk
away while the tracker grinds overnight:

  1. build_video_inventory.py — scan /mnt/ethoscope_data/videos/ and join
     against the xlsx, producing data/metadata/video_inventory.csv

  2. pick_targets.py — interactive matplotlib/Tk picker. User clicks
     TOP/CORNER/LEFT (the L-shape ethoscope expects); after the third
     click the 6 ROI rectangles are drawn on top of the frame so geometry
     can be verified before saving. Also supports marking a video
     'unusable' (FOV wrong) so it's permanently skipped, frame stepping
     by ±1s/±5%/midpoint, point editing in --redo mode, and a crosshair
     cursor that survives matplotlib's per-motion cursor reset.

  3. track_videos.py — headless batch tracker. Reads the JSON sidecars,
     builds 6 ROIs from the HD-mating-arena geometry, runs MultiFlyTracker
     against the merged.mp4 via MovieVirtualCamera, writes SQLite DBs to
     data/tracked/. Idempotent (skips done DBs), parallel via --jobs,
     subclasses MovieVirtualCamera so frames stay BGR (MultiFlyTracker
     calls cvtColor(BGR2GRAY) without checking channel count).

Plus auto_detect_targets.py (fallback that runs ethoscope's auto-detector
in case any videos do have visible target dots), monitor_tracking.py
(progress + ETA from data/tracked/ ground truth, --watch for live view),
and tracking_geometry.py (single source of truth for the affine math
shared by picker and tracker).

requirements-tracking.txt pins the extra deps (opencv-python, openpyxl,
gitpython, netifaces, mysql-connector-python) — these are only needed
for the tracking pipeline, not the existing analysis notebooks.

Verified end-to-end on one of the user-picked videos: ~4000 rows/ROI in
a 120s slice, fly bounding boxes in the expected 800-2000 px² band.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 17:25:26 +01:00


Cupido: Drosophila Social Interaction Tracking

Behavioral analysis of trained vs untrained Drosophila melanogaster in a barrier-opening social interaction assay. Part of the Cupido project studying learned social behaviors.

Quick Start

# Clone the repository
git clone ssh://git@git.lab.gilest.ro:222/lab/cupido.git
cd cupido

# Create virtual environment
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Get the data files (not in git - ask lab for copies)
# Place .db files in data/raw/
# Place large .csv files in data/processed/

# Run the main analysis notebook
jupyter notebook notebooks/flies_analysis_simple.ipynb

Project Overview

The Experiment

Pairs of flies are placed in chambers (ROIs) separated by a physical barrier. After a configurable delay, the barrier is removed, allowing flies to interact. We track the distance between flies over time to compare social approach behavior between trained (socially experienced) and untrained (naive) groups.

  • 3 ethoscope machines, 5 recording sessions, 6 ROIs each = 30 ROIs with data
  • 18 trained and 18 untrained ROIs by design (36 total); the 6 from Machine 139 have no tracking data, leaving the 30 above
  • See docs/experimental_design.md for full details

Current Findings

Aggregate analysis shows statistically significant but tiny differences:

  • Post-opening distance: Cohen's d = 0.09 (96% distribution overlap)
  • Max velocity (50-200s): Cohen's d = 0.14

The statistical significance, not the effect size, is inflated by pseudoreplication: the tests treat 230K data points as independent, but they come from at most 18 independent ROIs per group.
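As a quick check on how small these effects are, the sketch below computes Cohen's d with a pooled standard deviation and converts d to a distribution overlap. It assumes two normal distributions with equal variance, in which case the overlap is 2·Φ(−|d|/2); under that assumption d = 0.09 gives roughly the 96% quoted above. The function names are illustrative and are not taken from scripts/statistical_tests.py.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def cohens_d(a, b):
    """Cohen's d (mean of a minus mean of b) using a pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def overlap_from_d(d):
    """Overlap of two equal-variance normal curves whose means differ by d SDs."""
    return 2 * NormalDist().cdf(-abs(d) / 2)


print(f"{overlap_from_d(0.09):.3f}")  # 0.964
```

Note that d itself does not shrink or grow with the number of rows; what pseudoreplication distorts is the p-value, which is why the per-ROI analysis below matters.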

Next Direction: Bimodal Hypothesis

The key insight: not all "trained" flies may have actually learned. The trained group likely contains true learners (showing distinct behavior) and non-learners (indistinguishable from untrained). Testing this requires per-ROI analysis and bimodality testing.

Read docs/bimodal_hypothesis.md for the detailed analysis plan and code sketches.
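One possible shape for the per-ROI step is sketched here: collapse the data to one value per ROI (so each ROI counts once), then compute Sarle's bimodality coefficient on those values. This is only an illustration and not necessarily the method planned in docs/bimodal_hypothesis.md; the input format (pairs of ROI id and value) is an assumption, and with 18 ROIs per group any bimodality measure will have limited power.

```python
from collections import defaultdict
from statistics import mean


def per_roi_means(rows):
    """Collapse (roi_id, value) pairs to a single mean value per ROI."""
    by_roi = defaultdict(list)
    for roi_id, value in rows:
        by_roi[roi_id].append(value)
    return {roi_id: mean(values) for roi_id, values in by_roi.items()}


def bimodality_coefficient(values):
    """Sarle's bimodality coefficient with small-sample corrections (n > 3)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("values are constant")
    g1 = m3 / m2 ** 1.5                 # biased skewness
    g2 = m4 / m2 ** 2 - 3               # biased excess kurtosis
    skew = g1 * (n * (n - 1)) ** 0.5 / (n - 2)
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return (skew ** 2 + 1) / (kurt + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Toy data only: two tight clusters versus a single hump.
two_clusters = [0.0] * 9 + [10.0] * 9
single_hump = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]
print(bimodality_coefficient(two_clusters) > 5 / 9)  # True
print(bimodality_coefficient(single_hump) > 5 / 9)   # False
```

The 5/9 threshold is the value for a uniform distribution and is only a rule of thumb; a formal test such as Hartigan's dip test would be a reasonable complement.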

Offline Tracking Pipeline (added Apr 2026)

For tracking new videos that have no auto-detectable targets, the pipeline is split into three stages (inventory, interactive target picking, batch tracking), so you can sit at the screen and click for an hour, then let the tracker grind through overnight.

# extra deps (ethoscope src must be at /home/gg/Code/ethoscope_project/...)
pip install -r requirements-tracking.txt

# 1) build the inventory (xlsx ↔ /mnt/ethoscope_data/videos/)
python scripts/build_video_inventory.py

# 2) interactive: click TOP, CORNER, LEFT on each video (one frame per video)
python scripts/pick_targets.py             # process all not-yet-picked
python scripts/pick_targets.py --redo      # re-pick already-picked videos
# keys: r=reset  n=skip  f=jump frame  q/ESC=quit  ENTER=save

# 3) batch tracking (idempotent, can run in background)
python scripts/track_videos.py --jobs 4    # parallel
# output → data/tracked/*_tracking.db (SQLite, same schema as data/raw/)

See tasks/todo.md "Offline Tracking" section for the full plan, and data/metadata/video_inventory.csv for the list of videos to process.
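To illustrate why three clicks are enough to place every ROI, here is a minimal sketch of an affine mapping built from the TOP, CORNER and LEFT points. It is not the code in the repository: the axis convention, the function names, and especially the 2×3 grid layout are assumptions made for the example, and the real arena geometry will differ.

```python
def make_affine(top, corner, left):
    """Build f(u, v) -> (x, y) from three clicked pixel positions.

    Assumed convention (hypothetical, not taken from the repository):
    f(0, 0) = corner, f(1, 0) = left, f(0, 1) = top.
    """
    cx, cy = corner
    ux, uy = left[0] - cx, left[1] - cy   # arena axis CORNER -> LEFT
    vx, vy = top[0] - cx, top[1] - cy     # arena axis CORNER -> TOP

    def to_pixels(u, v):
        return (cx + u * ux + v * vx, cy + u * uy + v * vy)

    return to_pixels


def grid_rois(n_cols=2, n_rows=3, margin=0.05):
    """Hypothetical layout: equal cells in the unit square, as (u0, v0, u1, v1)."""
    cell_w, cell_h = 1 / n_cols, 1 / n_rows
    return [
        (c * cell_w + margin * cell_w, r * cell_h + margin * cell_h,
         (c + 1) * cell_w - margin * cell_w, (r + 1) * cell_h - margin * cell_h)
        for r in range(n_rows)
        for c in range(n_cols)
    ]


def roi_polygons(top, corner, left):
    """Map each unit-square rectangle to a 4-corner polygon in pixel coordinates."""
    f = make_affine(top, corner, left)
    return [
        [f(u0, v0), f(u1, v0), f(u1, v1), f(u0, v1)]
        for (u0, v0, u1, v1) in grid_rois()
    ]


polygons = roi_polygons(top=(700, 50), corner=(700, 450), left=(100, 450))
print(len(polygons))  # 6
```

Because the map is affine, a camera that is rotated, scaled or slightly sheared relative to the arena still yields correctly placed ROIs, which is the reason for drawing them over the frame before saving.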

Folder Structure

tracking/
├── README.md              # This file
├── PLANNING.md            # Architecture & conventions
├── requirements.txt       # Python dependencies
├── data/
│   ├── raw/               # SQLite tracking databases (gitignored)
│   ├── metadata/          # Experiment metadata CSVs
│   └── processed/         # Generated analysis CSVs (gitignored)
├── scripts/               # Python analysis scripts
│   ├── config.py          # Shared path constants
│   ├── load_roi_data.py   # Extract data from DBs
│   ├── calculate_distances.py
│   ├── analyze_distances.py
│   ├── statistical_tests.py
│   ├── ml_classification.py
│   └── plot_*.py          # Plotting scripts
├── notebooks/             # Jupyter notebooks
│   ├── flies_analysis_simple.ipynb  # Main analysis (use this one)
│   └── flies_analysis.ipynb         # Full pipeline from DB extraction
├── figures/               # Generated plots (gitignored)
├── docs/                  # Scientific documentation
│   ├── analysis_summary.md
│   ├── bimodal_hypothesis.md
│   └── experimental_design.md
└── tasks/
    ├── todo.md            # Task checklist
    └── lessons.md         # Pitfalls & patterns

Data Pipeline

SQLite DBs (data/raw/)
    │
    ▼  load_roi_data.py / notebook step 1
ROI CSVs (data/processed/*_roi_data.csv)
    │
    ▼  notebook steps 2-4
Aligned Distance CSVs (data/processed/*_distances_aligned.csv)
    │
    ├──▶ Plots (figures/)
    ├──▶ Statistical tests
    └──▶ Identity tracking → Velocity analysis
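The core of the alignment step can be sketched as follows: for each timestamp where both flies in an ROI were detected, compute the Euclidean distance between them and express time relative to the barrier opening. The input structure (one dict of timestamp to (x, y) per fly) is an assumption for the example; the notebook works on the ROI CSVs and may match timestamps differently.

```python
from math import hypot


def aligned_distances(fly_a, fly_b, opening_time):
    """Return (t - opening_time, distance) for timestamps present in both tracks.

    fly_a and fly_b map timestamp -> (x, y); this layout is hypothetical.
    """
    shared = sorted(fly_a.keys() & fly_b.keys())
    return [
        (t - opening_time,
         hypot(fly_a[t][0] - fly_b[t][0], fly_a[t][1] - fly_b[t][1]))
        for t in shared
    ]


fly_a = {0: (0, 0), 1: (0, 0), 2: (1, 1)}
fly_b = {1: (3, 4), 2: (1, 1), 3: (9, 9)}
print(aligned_distances(fly_a, fly_b, opening_time=1))  # [(0, 5.0), (1, 0.0)]
```

Negative aligned times are pre-opening and positive times are post-opening, so recordings with different barrier opening times can be compared on one axis.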

Key Files

| File | Purpose |
|------|---------|
| notebooks/flies_analysis_simple.ipynb | Start here - main analysis notebook |
| docs/bimodal_hypothesis.md | Read next - the new analysis direction |
| data/metadata/2025_07_15_metadata_fixed.csv | ROI-to-group mapping |
| data/metadata/2025_07_15_barrier_opening.csv | Barrier opening times per machine |
| scripts/config.py | Shared path constants for all scripts |

Requirements

  • Python 3.10+
  • See requirements.txt for packages (numpy, pandas, matplotlib, seaborn, scipy, scikit-learn, jupyter)
  • Large data files (~370MB CSVs + ~33MB DBs) must be obtained separately