lab/cupido: Automatic analysis of courtship conditioning in Drosophila


Giorgio Gilestro 28b7a227c0 load_roi_data: filter on barrier_opening.csv and stamp opening_s For every session (training and testing alike), the loader now looks up the corresponding row in barrier_opening.csv and: - drops the read if the ROI is in bad_rois (barrier never opened for that fly so its tracking has no biological meaning) - drops the read if the session is flagged unusable - stamps the session's opening_s onto every sample so downstream code can compute t_from_opening = t - opening_s Tested against ETHOSCOPE_082 2024-09-17: training (bad_rois=1,3,5) correctly drops ROIs 1/3/5; testing keeps all six; opening_s differs between sessions as expected (646.8 vs 154.7). Opt out with apply_barrier_filter=False if you need raw data.		2026-05-12 09:45:59 +01:00
data	Annotations: complete barrier_opening.csv for all 110 sessions	2026-05-12 09:42:44 +01:00
docs	Initial commit: organized project structure for student handoff	2026-03-05 16:08:36 +00:00
notebooks	Make flies_analysis_simple robust to bad caches and empty alignment	2026-05-01 09:59:34 +01:00
scripts	load_roi_data: filter on barrier_opening.csv and stamp opening_s	2026-05-12 09:45:59 +01:00
tasks	Remove data/raw/ entirely — all bulky data now under /mnt/data/projects/cupido/	2026-05-01 09:20:25 +01:00
.gitignore	Move personal TSV into repo's data/metadata/ folder	2026-05-01 09:30:22 +01:00
PLANNING.md	Remove data/raw/ entirely — all bulky data now under /mnt/data/projects/cupido/	2026-05-01 09:20:25 +01:00
README.md	Remove data/raw/ entirely — all bulky data now under /mnt/data/projects/cupido/	2026-05-01 09:20:25 +01:00
requirements-tracking.txt	Add offline tracking pipeline for video backlog	2026-04-27 17:25:26 +01:00
requirements.txt	Add tqdm progress bar to load_roi_data	2026-05-01 09:34:42 +01:00

README.md

Behavioral analysis of trained vs untrained Drosophila melanogaster in a barrier-opening social interaction assay. Part of the Cupido project studying learned social behaviors.

Quick Start

# Clone the repository
git clone ssh://git@git.lab.gilest.ro:222/lab/cupido.git
cd cupido

# Create virtual environment
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Project data lives outside the repo at /mnt/data/projects/cupido/:
#   tracked/  → SQLite tracking DBs
#   targets/  → target-point JSONs
#   all_video_info_merged.{xlsx,tsv}  → metadata spreadsheet
# Generated CSVs land in data/processed/ (gitignored).

# Run the main analysis notebook
jupyter notebook notebooks/flies_analysis_simple.ipynb

Project Overview

The Experiment

Pairs of flies are placed in chambers (ROIs) separated by a physical barrier. After a configurable delay, the barrier is removed, allowing flies to interact. We track the distance between flies over time to compare social approach behavior between trained (socially experienced) and untrained (naive) groups.

3 ethoscope machines, 5 recording sessions with 6 ROIs each = 30 ROIs with data
18 trained ROIs + 18 untrained ROIs = 36 in total, of which 6 (from Machine 139) have no tracking data
See docs/experimental_design.md for full details
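
As a rough illustration of the distance measure described above, the sketch below computes the Euclidean distance between the two flies of each ROI at each timestamp. The column names (`roi`, `t`, `fly_id`, `x`, `y`) and the function name are assumptions made for this example; they are not taken from the project's scripts.

```python
import numpy as np
import pandas as pd


def inter_fly_distance(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (roi, t) with the distance between the two flies.

    Assumes (hypothetically) long-format columns: roi, t, fly_id, x, y,
    with at most one row per (roi, t, fly_id) and two flies per ROI.
    """
    # One row per (roi, t); columns become (coordinate, fly_id) pairs.
    wide = df.pivot(index=["roi", "t"], columns="fly_id", values=["x", "y"])
    fly_a, fly_b = wide["x"].columns[:2]
    dx = wide["x"][fly_a] - wide["x"][fly_b]
    dy = wide["y"][fly_a] - wide["y"][fly_b]
    # Timestamps where one fly is missing yield NaN rather than a bogus value.
    return np.hypot(dx, dy).rename("distance").reset_index()


example = pd.DataFrame(
    {
        "roi": [1, 1],
        "t": [0.0, 0.0],
        "fly_id": [0, 1],
        "x": [0.0, 3.0],
        "y": [0.0, 4.0],
    }
)
distances = inter_fly_distance(example)
```

Pivoting to one row per timestamp keeps frames where a fly was not detected as NaN, so they can be dropped or interpolated explicitly later.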

Current Findings

Aggregate analysis shows statistically significant but tiny differences:

Post-opening distance: Cohen's d = 0.09 (96% distribution overlap)
Max velocity (50-200s): Cohen's d = 0.14

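The statistical significance is inflated by pseudoreplication: the tests treat 230K data points as independent, although they come from only 18 independent ROIs per group. The effect sizes themselves remain tiny.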
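
One common way to avoid pseudoreplication is to collapse each ROI to a single summary value before comparing groups, so that the ROI is the unit of analysis. The sketch below does this with a pooled-SD Cohen's d. The column names (`group`, `roi_uid`, `distance`) and the group labels are assumptions for illustration, not the project's actual schema.

```python
import numpy as np
import pandas as pd


def cohens_d(a, b) -> float:
    """Cohen's d with a pooled standard deviation (sample variance, ddof=1)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


def per_roi_effect_size(df: pd.DataFrame, value: str = "distance") -> float:
    """Average within each ROI first, then compare the ROI means between groups.

    Assumes hypothetical columns `group` ("trained"/"untrained") and `roi_uid`.
    """
    roi_means = df.groupby(["group", "roi_uid"])[value].mean()
    return cohens_d(roi_means.loc["trained"], roi_means.loc["untrained"])


# Toy data: three ROIs per group, two samples per ROI.
toy = pd.DataFrame(
    {
        "group": ["trained"] * 6 + ["untrained"] * 6,
        "roi_uid": ["a", "a", "b", "b", "c", "c", "d", "d", "e", "e", "f", "f"],
        "distance": [1, 1, 2, 2, 3, 3, 2, 2, 3, 3, 4, 4],
    }
)
d_roi_level = per_roi_effect_size(toy)
```

With this approach, any significance test run on the ROI means has a sample size equal to the number of ROIs rather than the number of tracked frames.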

Next Direction: Bimodal Hypothesis

The key insight: not all "trained" flies may have actually learned. The trained group likely contains true learners (showing distinct behavior) and non-learners (indistinguishable from untrained). Testing this requires per-ROI analysis and bimodality testing.

Read docs/bimodal_hypothesis.md for the detailed analysis plan and code sketches.
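
As a quick first screen before the analysis in docs/bimodal_hypothesis.md, one option is Sarle's bimodality coefficient computed on one summary value per ROI. This is a heuristic swapped in for illustration, not necessarily the method used in that document; values above 5/9 (about 0.555) are conventionally read as a hint of bimodality. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import kurtosis, skew


def bimodality_coefficient(values) -> float:
    """Sarle's bimodality coefficient for a 1-D sample (NaNs are ignored).

    BC = (g1^2 + 1) / (g2 + 3 (n-1)^2 / ((n-2)(n-3))),
    where g1 is the bias-corrected sample skewness and g2 the
    bias-corrected sample excess kurtosis.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 values")
    g1 = skew(x, bias=False)
    g2 = kurtosis(x, fisher=True, bias=False)
    return float((g1**2 + 1) / (g2 + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))))


# Toy check: two tight clusters versus a single Gaussian.
two_clusters = [0.0] * 9 + [1.0] * 9
one_cluster = np.random.default_rng(0).normal(size=2000)
bc_two = bimodality_coefficient(two_clusters)
bc_one = bimodality_coefficient(one_cluster)
```

With only about 18 ROIs per group the coefficient is noisy, so it is better treated as a prompt for closer inspection (histograms, mixture models) than as a test result.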

Offline Tracking Pipeline (added Apr 2026)

For tracking new videos that have no auto-detectable targets, the pipeline is split in two stages so you can sit at the screen and click for an hour, then let the tracker grind through overnight.

# extra deps (set ETHOSCOPE_SRC env var if your ethoscope clone isn't at ~/Code/ethoscope_project/...)
pip install -r requirements-tracking.txt

# 1) build the inventory (xlsx ↔ /mnt/ethoscope_data/videos/)
python scripts/build_video_inventory.py

# 2) interactive: click TOP, CORNER, LEFT on each video (one frame per video)
python scripts/pick_targets.py             # process all not-yet-picked
python scripts/pick_targets.py --redo      # re-pick already-picked videos
# keys: r=reset  n=skip  f=jump frame  q/ESC=quit  ENTER=save

# 3) batch tracking (idempotent, can run in background)
python scripts/track_videos.py --jobs 4    # parallel
# output → /mnt/data/projects/cupido/tracked/*_tracking.db (SQLite)

See tasks/todo.md "Offline Tracking" section for the full plan, and data/metadata/video_inventory.csv for the list of videos to process.
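
To check what a finished tracking run produced, the SQLite output can be inspected with the standard library alone. The sketch below lists the tables in one database, opened read-only so a tracker still running in the background is not disturbed. The function name is hypothetical and nothing is assumed about the table layout inside the file.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path) -> list[str]:
    """Return the table names in a SQLite file, opened in read-only mode."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, calling `list_tables(p)` for each `p` in `Path("/mnt/data/projects/cupido/tracked").glob("*_tracking.db")` gives a quick overview of which videos have been tracked.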

Folder Structure

tracking/
├── README.md              # This file
├── PLANNING.md            # Architecture & conventions
├── requirements.txt       # Python dependencies
├── data/
│   ├── metadata/          # Experiment metadata CSVs (small, hand-curated)
│   ├── processed/         # Generated analysis CSVs (gitignored)
│   └── logs/              # Tracker logs (gitignored)
├── scripts/               # Python analysis scripts
│   ├── config.py          # Shared path constants
│   ├── load_roi_data.py   # Extract data from DBs
│   ├── calculate_distances.py
│   ├── analyze_distances.py
│   ├── statistical_tests.py
│   ├── ml_classification.py
│   └── plot_*.py          # Plotting scripts
├── notebooks/             # Jupyter notebooks
│   ├── flies_analysis_simple.ipynb  # Main analysis (use this one)
│   └── flies_analysis.ipynb         # Full pipeline from DB extraction
├── figures/               # Generated plots (gitignored)
├── docs/                  # Scientific documentation
│   ├── analysis_summary.md
│   ├── bimodal_hypothesis.md
│   └── experimental_design.md
└── tasks/
    ├── todo.md            # Task checklist
    └── lessons.md         # Pitfalls & patterns

Data Pipeline

SQLite DBs (/mnt/data/projects/cupido/tracked/) + merged TSV
    │
    ▼  scripts/load_roi_data.py
single DataFrame stamped with experimental metadata
    │
    ▼  notebooks/flies_analysis_simple.ipynb (steps 2–4)
Aligned distance CSVs (data/processed/*_distances_aligned.csv)
    │
    ├──▶ Plots (figures/)
    ├──▶ Statistical tests
    └──▶ Identity tracking → Velocity analysis
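
The commit message for `load_roi_data` describes three things the loader does per session: drop ROIs listed in `bad_rois`, drop sessions flagged unusable, and stamp `opening_s` on every sample so that `t_from_opening = t - opening_s` can be computed downstream. The sketch below illustrates that logic on a single session. It is not the actual implementation: the function name, its parameters, and the `roi` and `t` column names are assumptions, and `t` is assumed to be in seconds.

```python
import pandas as pd


def stamp_and_filter(
    samples: pd.DataFrame,
    opening_s: float,
    bad_rois=(),
    usable: bool = True,
) -> pd.DataFrame:
    """Filter one session's samples and align time to the barrier opening.

    Hypothetical sketch: `samples` needs columns `roi` and `t` (seconds).
    """
    if not usable:
        # Whole session flagged unusable: keep the columns, drop every row.
        kept = samples.iloc[0:0].copy()
    else:
        # Barrier never opened for these ROIs, so their tracks are discarded.
        kept = samples[~samples["roi"].isin(list(bad_rois))].copy()
    kept["opening_s"] = opening_s
    kept["t_from_opening"] = kept["t"] - kept["opening_s"]
    return kept


session = pd.DataFrame({"roi": [1, 2, 3], "t": [700.0, 700.0, 700.0]})
aligned = stamp_and_filter(session, opening_s=646.8, bad_rois=(1, 3))
```

Keeping `opening_s` as a column, rather than only the aligned time, lets later steps re-derive or audit the alignment per session.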

Key Files

| File | Purpose |
| --- | --- |
| `notebooks/flies_analysis_simple.ipynb` | Start here - main analysis notebook |
| `docs/bimodal_hypothesis.md` | Read next - the new analysis direction |
| `data/metadata/2025_07_15_metadata_fixed.csv` | ROI-to-group mapping |
| `data/metadata/2025_07_15_barrier_opening.csv` | Barrier opening times per machine |
| `scripts/config.py` | Shared path constants for all scripts |
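
For orientation, a shared path module along these lines keeps the external data location and the in-repo folders in one place. This is a hypothetical sketch built from the paths mentioned in this README, not the contents of `scripts/config.py`; the function and key names are assumptions.

```python
from pathlib import Path

# External data root named in the Quick Start section.
DEFAULT_DATA_ROOT = Path("/mnt/data/projects/cupido")


def project_paths(repo_root, data_root=DEFAULT_DATA_ROOT) -> dict:
    """Collect the directories used by the pipeline (hypothetical layout helper)."""
    repo_root = Path(repo_root)
    data_root = Path(data_root)
    return {
        "tracked": data_root / "tracked",  # SQLite tracking DBs
        "targets": data_root / "targets",  # target-point JSONs
        "metadata": repo_root / "data" / "metadata",  # hand-curated CSVs
        "processed": repo_root / "data" / "processed",  # generated CSVs (gitignored)
        "figures": repo_root / "figures",  # generated plots (gitignored)
    }


paths = project_paths("/path/to/cupido")
```

Passing `data_root` explicitly makes it easy to point the same scripts at a copy of the data on another machine.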

Requirements

Python 3.10+
See requirements.txt for packages (numpy, pandas, matplotlib, seaborn, scipy, scikit-learn, jupyter)
Large data files (~370MB CSVs + ~33MB DBs) must be obtained separately
