# Cupido: Drosophila Social Interaction Tracking
Behavioral analysis of trained vs untrained *Drosophila melanogaster* in a barrier-opening social interaction assay. Part of the Cupido project studying learned social behaviors.
## Quick Start
```bash
# Clone the repository
git clone ssh://git@git.lab.gilest.ro:222/lab/cupido.git
cd cupido
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Project data lives outside the repo at /mnt/data/projects/cupido/:
# tracked/ → SQLite tracking DBs
# targets/ → target-point JSONs
# all_video_info_merged.{xlsx,tsv} → metadata spreadsheet
# Generated CSVs land in data/processed/ (gitignored).
# Run the main analysis notebook
jupyter notebook notebooks/flies_analysis_simple.ipynb
```
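Before opening the notebook, it can help to confirm that the external data directory is laid out as described in the comments above. The following is a minimal sketch, not part of the repository: the helper names `expected_paths` and `missing_paths` are hypothetical, and it assumes the TSV variant of the metadata spreadsheet is the one in use.

```python
from pathlib import Path

# Location stated in this README; everything else below is illustrative.
DATA_ROOT = Path("/mnt/data/projects/cupido")


def expected_paths(root=DATA_ROOT):
    """Map a short label to each path the Quick Start comments mention."""
    root = Path(root)
    return {
        "tracked": root / "tracked",                        # SQLite tracking DBs
        "targets": root / "targets",                        # target-point JSONs
        "metadata": root / "all_video_info_merged.tsv",     # assumed TSV variant
    }


def missing_paths(root=DATA_ROOT):
    """Return the sorted labels of expected paths that do not exist."""
    return sorted(
        name for name, path in expected_paths(root).items() if not path.exists()
    )


if __name__ == "__main__":
    print("Missing:", missing_paths() or "nothing")
```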
## Project Overview
### The Experiment
Pairs of flies are placed in chambers (ROIs) separated by a physical barrier. After a configurable delay, the barrier is removed, allowing flies to interact. We track the distance between flies over time to compare social approach behavior between trained (socially experienced) and untrained (naive) groups.
- **3 ethoscope machines**, 5 recording sessions × 6 ROIs each = 30 ROIs with data
- **18 trained and 18 untrained ROIs** were assigned (36 in total); 6 ROIs from Machine 139 have no tracking data, leaving 30
- See `docs/experimental_design.md` for full details
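The inter-fly distance described above can be sketched as follows. This is an illustration only: the track layout (a dict from timestamp to `(x, y)`) and the function name `pairwise_distance` are assumptions, not the format used by the project's scripts.

```python
import math


def pairwise_distance(track_a, track_b):
    """Return [(t, distance)] for timestamps present in both tracks.

    Each track maps a timestamp in seconds to an (x, y) position.
    Timestamps seen for only one fly are skipped rather than interpolated.
    """
    shared = sorted(track_a.keys() & track_b.keys())
    return [(t, math.dist(track_a[t], track_b[t])) for t in shared]


# Toy data: fly B has no detection at t=2.0, fly A has none at t=3.0.
fly_a = {0.0: (10.0, 10.0), 1.0: (12.0, 10.0), 2.0: (15.0, 14.0)}
fly_b = {0.0: (13.0, 14.0), 1.0: (12.0, 15.0), 3.0: (0.0, 0.0)}
distances = pairwise_distance(fly_a, fly_b)
```

Dropping unmatched timestamps keeps the sketch simple; a real pipeline may prefer interpolation or resampling to a common time base.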
### Current Findings
Aggregate analysis shows statistically significant but **tiny** differences:
- Post-opening distance: Cohen's d = 0.09 (96% distribution overlap)
- Max velocity (50-200s): Cohen's d = 0.14
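The statistical significance is inflated by pseudoreplication: roughly 230K data points come from at most 18 independent ROIs per group, so the effective sample size is the number of ROIs, not the number of data points.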
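To make the link between Cohen's d and distribution overlap concrete, here is a small sketch. It assumes two normal distributions with equal variance, in which case the overlapping coefficient is 2Φ(−|d|/2); the function names are illustrative and not taken from `scripts/statistical_tests.py`.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def overlap_from_d(d):
    """Overlapping coefficient of two equal-variance normals separated by d SDs."""
    return 2.0 * normal_cdf(-abs(d) / 2.0)


def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    var_a = sum((x - mean_a) ** 2 for x in a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled_sd


print(f"d = 0.09 -> overlap {overlap_from_d(0.09):.1%}")
print(f"d = 0.14 -> overlap {overlap_from_d(0.14):.1%}")
```

Under these assumptions, d = 0.09 corresponds to roughly 96% overlap, matching the figure quoted above.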
### Next Direction: Bimodal Hypothesis
The key insight: not all "trained" flies may have actually learned. The trained group likely contains **true learners** (showing distinct behavior) and **non-learners** (indistinguishable from untrained). Testing this requires per-ROI analysis and bimodality testing.
**Read `docs/bimodal_hypothesis.md` for the detailed analysis plan and code sketches.**
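As a rough illustration of what "per-ROI analysis and bimodality testing" could look like, the sketch below collapses samples to one mean per ROI and then computes a moment-based bimodality coefficient. This heuristic is swapped in for illustration and is not necessarily the test planned in `docs/bimodal_hypothesis.md`; the function names and input layout are assumptions.

```python
from collections import defaultdict


def per_roi_means(rows):
    """Collapse (roi_id, value) samples to one mean per ROI.

    Using one value per ROI avoids treating correlated time points
    from the same ROI as independent observations.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for roi, value in rows:
        sums[roi][0] += value
        sums[roi][1] += 1
    return {roi: total / n for roi, (total, n) in sums.items()}


def bimodality_coefficient(values):
    """Return (skewness**2 + 1) / kurtosis, using population moments.

    A uniform distribution scores 5/9 (about 0.555); larger values hint
    at bimodality. This is a heuristic, not a formal significance test.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; coefficient is undefined")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # non-excess kurtosis (3 for a normal distribution)
    return (skewness ** 2 + 1.0) / kurtosis
```

With at most 18 ROIs per group, any such statistic has low power, so it is best read alongside per-ROI plots rather than on its own.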
## Offline Tracking Pipeline (added Apr 2026)
For tracking new videos that have **no auto-detectable targets**, the pipeline
is split into two stages: an interactive pass in which you click the targets on
each video (about an hour at the screen), and an unattended batch-tracking pass
that can run overnight.
```bash
# extra deps (set ETHOSCOPE_SRC env var if your ethoscope clone isn't at ~/Code/ethoscope_project/...)
pip install -r requirements-tracking.txt
# 1) build the inventory (xlsx ↔ /mnt/ethoscope_data/videos/)
python scripts/build_video_inventory.py
# 2) interactive: click TOP, CORNER, LEFT on each video (one frame per video)
python scripts/pick_targets.py # process all not-yet-picked
python scripts/pick_targets.py --redo # re-pick already-picked videos
# keys: r=reset n=skip f=jump frame q/ESC=quit ENTER=save
# 3) batch tracking (idempotent, can run in background)
python scripts/track_videos.py --jobs 4 # parallel
# output → /mnt/data/projects/cupido/tracked/*_tracking.db (SQLite)
```
See `tasks/todo.md` "Offline Tracking" section for the full plan, and
`data/metadata/video_inventory.csv` for the list of videos to process.
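The "idempotent" property of the batch step can be pictured as a skip-if-output-exists check. The sketch below is hypothetical and is not the logic of `scripts/track_videos.py`: the helper name `pending_videos` and the assumption that each output is named `<video stem>_tracking.db` are illustrative only.

```python
from pathlib import Path


def pending_videos(videos, output_dir):
    """Return the videos that have no matching tracking DB in output_dir.

    Assumes (for illustration) that a video named foo.mp4 produces
    foo_tracking.db. Re-running after an interruption then only picks
    up the videos that were not finished.
    """
    output_dir = Path(output_dir)
    return [
        video
        for video in map(Path, videos)
        if not (output_dir / f"{video.stem}_tracking.db").exists()
    ]
```

A check based only on file existence treats a partially written DB as finished; writing to a temporary name and renaming on success is one common way to avoid that.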
## Folder Structure
```
tracking/
├── README.md                          # This file
├── PLANNING.md                        # Architecture & conventions
├── requirements.txt                   # Python dependencies
├── data/
│   ├── metadata/                      # Experiment metadata CSVs (small, hand-curated)
│   ├── processed/                     # Generated analysis CSVs (gitignored)
│   └── logs/                          # Tracker logs (gitignored)
├── scripts/                           # Python analysis scripts
│   ├── config.py                      # Shared path constants
│   ├── load_roi_data.py               # Extract data from DBs
│   ├── calculate_distances.py
│   ├── analyze_distances.py
│   ├── statistical_tests.py
│   ├── ml_classification.py
│   └── plot_*.py                      # Plotting scripts
├── notebooks/                         # Jupyter notebooks
│   ├── flies_analysis_simple.ipynb    # Main analysis (use this one)
│   └── flies_analysis.ipynb           # Full pipeline from DB extraction
├── figures/                           # Generated plots (gitignored)
├── docs/                              # Scientific documentation
│   ├── analysis_summary.md
│   ├── bimodal_hypothesis.md
│   └── experimental_design.md
└── tasks/
    ├── todo.md                        # Task checklist
    └── lessons.md                     # Pitfalls & patterns
```
## Data Pipeline
```
SQLite DBs (/mnt/data/projects/cupido/tracked/) + merged TSV
▼ scripts/load_roi_data.py
single DataFrame stamped with experimental metadata
▼ notebooks/flies_analysis_simple.ipynb (steps 2-4)
Aligned distance CSVs (data/processed/*_distances_aligned.csv)
├──▶ Plots (figures/)
├──▶ Statistical tests
└──▶ Identity tracking → Velocity analysis
```
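The "aligned" distance files in the diagram above are distances re-expressed relative to barrier opening. A minimal sketch of that alignment follows; the function names and the `(t, distance)` tuple layout are assumptions for illustration, not the notebook's actual code.

```python
def align_to_opening(samples, opening_time):
    """Shift (t, distance) samples so that t = 0 is the barrier opening."""
    return [(t - opening_time, d) for t, d in samples]


def window_mean(aligned, start, end):
    """Mean distance for aligned samples with start <= t < end.

    Returns None when the window contains no samples.
    """
    selected = [d for t, d in aligned if start <= t < end]
    return sum(selected) / len(selected) if selected else None


# Toy example: barrier opens 120 s into the recording.
raw = [(100.0, 5.0), (130.0, 3.0), (150.0, 1.0)]
aligned = align_to_opening(raw, opening_time=120.0)
post_opening_mean = window_mean(aligned, 0.0, 60.0)
```

Because opening times differ per machine (see `data/metadata/2025_07_15_barrier_opening.csv`), alignment has to happen before ROIs from different machines are pooled.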
## Key Files
| File | Purpose |
|------|---------|
| `notebooks/flies_analysis_simple.ipynb` | **Start here** - main analysis notebook |
| `docs/bimodal_hypothesis.md` | **Read next** - the new analysis direction |
| `data/metadata/2025_07_15_metadata_fixed.csv` | ROI-to-group mapping |
| `data/metadata/2025_07_15_barrier_opening.csv` | Barrier opening times per machine |
| `scripts/config.py` | Shared path constants for all scripts |
## Requirements
- Python 3.10+
- See `requirements.txt` for packages (numpy, pandas, matplotlib, seaborn, scipy, scikit-learn, jupyter)
- Large data files (~370MB CSVs + ~33MB DBs) must be obtained separately