Initial commit: organized project structure for student handoff

Reorganized flat 41-file directory into structured layout with:
- scripts/ for Python analysis code with shared config.py
- notebooks/ for Jupyter analysis notebooks
- data/ split into raw/, metadata/, processed/
- docs/ with analysis summary, experimental design, and bimodal hypothesis tutorial
- tasks/ with todo checklist and lessons learned
- Comprehensive README, PLANNING.md, and .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Giorgio Gilestro 2026-03-05 16:08:36 +00:00
commit e7e4db264d
27 changed files with 3105 additions and 0 deletions

README.md

@@ -0,0 +1,111 @@
# Cupido: Drosophila Social Interaction Tracking
Behavioral analysis of trained vs untrained *Drosophila melanogaster* in a barrier-opening social interaction assay. Part of the Cupido project studying learned social behaviors.
## Quick Start
```bash
# Clone the repository
git clone ssh://git@git.lab.gilest.ro:222/lab/cupido.git
cd cupido
# Create virtual environment
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Get the data files (not in git - ask lab for copies)
# Place .db files in data/raw/
# Place large .csv files in data/processed/
# Run the main analysis notebook
jupyter notebook notebooks/flies_analysis_simple.ipynb
```
## Project Overview
### The Experiment
Pairs of flies are placed in chambers (ROIs) separated by a physical barrier. After a configurable delay, the barrier is removed, allowing flies to interact. We track the distance between flies over time to compare social approach behavior between trained (socially experienced) and untrained (naive) groups.
- **3 ethoscope machines**, 5 recording sessions, 6 ROIs each = 30 ROIs with data
- **18 trained ROIs, 18 untrained ROIs** (36 assigned in total; the 6 from Machine 139 have no tracking data, leaving the 30 above)
- See `docs/experimental_design.md` for full details
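The distance measure described above can be sketched as follows. This is a minimal illustration, not the code in `scripts/calculate_distances.py`: the column names (`roi`, `t`, `fly_id`, `x`, `y`) are assumptions, and the real tracking tables may be laid out differently.

```python
import numpy as np
import pandas as pd


def pairwise_distance(tracks: pd.DataFrame) -> pd.DataFrame:
    """Return one row per (roi, t) with the Euclidean distance between two flies.

    Assumes hypothetical columns: roi, t, fly_id (0 or 1), x, y.
    """
    # Put both flies' coordinates side by side: columns become (x, 0), (x, 1), (y, 0), (y, 1)
    wide = tracks.pivot_table(index=["roi", "t"], columns="fly_id", values=["x", "y"])
    # Frames where only one fly was detected cannot yield a distance
    wide = wide.dropna()
    dx = wide[("x", 0)] - wide[("x", 1)]
    dy = wide[("y", 0)] - wide[("y", 1)]
    return np.hypot(dx, dy).rename("distance").reset_index()


# Tiny synthetic example: fly 1 is missing at t=2, so that frame is dropped
tracks = pd.DataFrame({
    "roi": [1, 1, 1, 1, 1],
    "t": [0, 0, 1, 1, 2],
    "fly_id": [0, 1, 0, 1, 0],
    "x": [0.0, 3.0, 1.0, 1.0, 5.0],
    "y": [0.0, 4.0, 1.0, 2.0, 5.0],
})
distances = pairwise_distance(tracks)
```

Dropping frames with a single detection is one possible choice; interpolating the missing fly would be another.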
### Current Findings
Aggregate analysis shows statistically significant but **tiny** differences:
- Post-opening distance: Cohen's d = 0.09 (96% distribution overlap)
- Max velocity (50-200s): Cohen's d = 0.14
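The statistical significance of these differences is inflated by pseudoreplication: the tests treat ~230K data points as independent, but they come from at most 18 independent ROIs per group. The effect sizes themselves remain tiny.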
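For reference, the overlap quoted with Cohen's d above is consistent with the standard overlap coefficient for two equal-variance normal distributions. The sketch below assumes that definition (the README does not state which one was used) and a pooled-standard-deviation Cohen's d; the function names are illustrative.

```python
import numpy as np
from scipy.stats import norm


def cohens_d(a, b) -> float:
    """Cohen's d using the pooled sample standard deviation."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def overlap_coefficient(d: float) -> float:
    """Overlap of two unit-variance normal distributions whose means differ by d."""
    return 2 * norm.cdf(-abs(d) / 2)


overlap = overlap_coefficient(0.09)  # about 0.96, matching the 96% quoted above
```

Note that Cohen's d does not depend directly on sample size, whereas a p-value computed from hundreds of thousands of correlated frames does.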
### Next Direction: Bimodal Hypothesis
The key insight: not all "trained" flies may have actually learned. The trained group likely contains **true learners** (showing distinct behavior) and **non-learners** (indistinguishable from untrained). Testing this requires per-ROI analysis and bimodality testing.
**Read `docs/bimodal_hypothesis.md` for the detailed analysis plan and code sketches.**
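As a rough illustration of what per-ROI analysis and bimodality testing could look like, the sketch below collapses frame-level data to one value per ROI and compares one- and two-component Gaussian mixtures by BIC. This is an assumed approach for illustration, not necessarily the method in `docs/bimodal_hypothesis.md`; the column names (`group`, `roi`, `distance`) are hypothetical, and the data here is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.mixture import GaussianMixture


def per_roi_means(df: pd.DataFrame, value: str = "distance") -> pd.DataFrame:
    """Collapse frame-level data to one mean per ROI so each ROI counts once."""
    return df.groupby(["group", "roi"], as_index=False)[value].mean()


def bic_by_components(values, max_components: int = 2, seed: int = 0) -> dict:
    """Fit 1..max_components Gaussian mixtures and return {k: BIC}; lower is better."""
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    return {
        k: GaussianMixture(n_components=k, random_state=seed).fit(x).bic(x)
        for k in range(1, max_components + 1)
    }


# Synthetic illustration only: 9 "non-learner" and 9 "learner" ROIs, 50 frames each
rng = np.random.default_rng(0)
roi_centres = np.concatenate([np.full(9, 10.0), np.full(9, 4.0)])
frames = pd.DataFrame({
    "group": "trained",
    "roi": np.repeat(np.arange(18), 50),
    "distance": np.repeat(roi_centres, 50) + rng.normal(0, 1.0, 18 * 50),
})
roi_means = per_roi_means(frames)
bic = bic_by_components(roi_means["distance"])  # bic[2] < bic[1] suggests two subgroups
```

With fewer than 20 ROIs per group, any such test has limited power, so a lower two-component BIC should be read as suggestive rather than conclusive.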
## Folder Structure
```
tracking/
├── README.md # This file
├── PLANNING.md # Architecture & conventions
├── requirements.txt # Python dependencies
├── data/
│ ├── raw/ # SQLite tracking databases (gitignored)
│ ├── metadata/ # Experiment metadata CSVs
│ └── processed/ # Generated analysis CSVs (gitignored)
├── scripts/ # Python analysis scripts
│ ├── config.py # Shared path constants
│ ├── load_roi_data.py # Extract data from DBs
│ ├── calculate_distances.py
│ ├── analyze_distances.py
│ ├── statistical_tests.py
│ ├── ml_classification.py
│ └── plot_*.py # Plotting scripts
├── notebooks/ # Jupyter notebooks
│ ├── flies_analysis_simple.ipynb # Main analysis (use this one)
│ └── flies_analysis.ipynb # Full pipeline from DB extraction
├── figures/ # Generated plots (gitignored)
├── docs/ # Scientific documentation
│ ├── analysis_summary.md
│ ├── bimodal_hypothesis.md
│ └── experimental_design.md
└── tasks/
    ├── todo.md # Task checklist
    └── lessons.md # Pitfalls & patterns
```
## Data Pipeline
```
SQLite DBs (data/raw/)
    │
    ▼ load_roi_data.py / notebook step 1
ROI CSVs (data/processed/*_roi_data.csv)
    │
    ▼ notebook steps 2-4
Aligned Distance CSVs (data/processed/*_distances_aligned.csv)
    │
    ├──▶ Plots (figures/)
    ├──▶ Statistical tests
    └──▶ Identity tracking → Velocity analysis
```
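The alignment step presumably re-expresses time relative to each machine's barrier opening, so that recordings can be compared on a common clock. A minimal sketch under that assumption, with hypothetical column names (`machine`, `t`, `opening_time`) and made-up machine labels:

```python
import pandas as pd


def align_to_opening(distances: pd.DataFrame, openings: pd.DataFrame) -> pd.DataFrame:
    """Add t_aligned = t - opening time of the row's machine.

    Hypothetical columns: distances has machine, t; openings has machine, opening_time.
    """
    # validate= raises if a machine appears more than once in the openings table
    merged = distances.merge(openings, on="machine", how="left", validate="many_to_one")
    merged["t_aligned"] = merged["t"] - merged["opening_time"]
    return merged.drop(columns="opening_time")


# Synthetic example with made-up machine labels and times (seconds)
distances = pd.DataFrame({"machine": ["m1", "m1", "m2"], "t": [100, 200, 50]})
openings = pd.DataFrame({"machine": ["m1", "m2"], "opening_time": [150, 50]})
aligned = align_to_opening(distances, openings)
```

After alignment, negative `t_aligned` values are pre-opening and positive values are post-opening.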
## Key Files
| File | Purpose |
|------|---------|
| `notebooks/flies_analysis_simple.ipynb` | **Start here** - main analysis notebook |
| `docs/bimodal_hypothesis.md` | **Read next** - the new analysis direction |
| `data/metadata/2025_07_15_metadata_fixed.csv` | ROI-to-group mapping |
| `data/metadata/2025_07_15_barrier_opening.csv` | Barrier opening times per machine |
| `scripts/config.py` | Shared path constants for all scripts |
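A shared path module typically looks something like the sketch below. This is an illustration of the pattern only: the function name and dictionary keys are hypothetical and are not taken from the actual `scripts/config.py`.

```python
from pathlib import Path


def build_paths(project_root: Path) -> dict[str, Path]:
    """Return the project's data and figure directories relative to a given root."""
    data = project_root / "data"
    return {
        "raw": data / "raw",              # SQLite tracking databases
        "metadata": data / "metadata",    # experiment metadata CSVs
        "processed": data / "processed",  # generated analysis CSVs
        "figures": project_root / "figures",
    }


# In a real scripts/config.py the root could be derived from the file location:
#   PROJECT_ROOT = Path(__file__).resolve().parent.parent
paths = build_paths(Path("/path/to/cupido"))
```

Deriving every path from a single root keeps scripts working regardless of the directory they are launched from.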
## Requirements
- Python 3.10+
- See `requirements.txt` for packages (numpy, pandas, matplotlib, seaborn, scipy, scikit-learn, jupyter)
- Large data files (~370MB CSVs + ~33MB DBs) must be obtained separately