Initial commit: organized project structure for student handoff

Reorganized flat 41-file directory into structured layout with:
- scripts/ for Python analysis code with shared config.py
- notebooks/ for Jupyter analysis notebooks
- data/ split into raw/, metadata/, processed/
- docs/ with analysis summary, experimental design, and bimodal hypothesis tutorial
- tasks/ with todo checklist and lessons learned
- Comprehensive README, PLANNING.md, and .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
commit
e7e4db264d
27 changed files with 3105 additions and 0 deletions
45
PLANNING.md
Normal file
@ -0,0 +1,45 @@
# Planning & Architecture

## Project Overview

Drosophila behavioral tracking analysis for the Cupido project. Compares social interaction patterns (inter-fly distance, velocity) between trained and untrained flies using a barrier-opening assay recorded on ethoscope platforms.

## Architecture

**Pipeline-based**: Raw SQLite DBs -> ROI extraction -> distance calculation -> time alignment -> statistical analysis / visualization.

**Stack**: Python 3.10+, pandas, scipy, scikit-learn, matplotlib/seaborn, Jupyter.
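
The distance-calculation and time-alignment stages of the pipeline can be sketched roughly as follows. This is a minimal standard-library illustration, not the project's actual code: the `Position` record, the function names, and the assumption that two flies in one ROI share timestamps are all hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Position:
    """One tracked sample for one fly (hypothetical schema)."""
    t_ms: int   # timestamp in milliseconds
    x: float    # position in pixels
    y: float    # position in pixels


def inter_fly_distance(fly_a: list[Position], fly_b: list[Position]) -> list[tuple[int, float]]:
    """Return (t_ms, distance_px) for every timestamp present in both tracks."""
    b_by_time = {p.t_ms: p for p in fly_b}
    distances = []
    for a in fly_a:
        b = b_by_time.get(a.t_ms)
        if b is not None:  # skip samples with no partner at the same timestamp
            distances.append((a.t_ms, hypot(a.x - b.x, a.y - b.y)))
    return distances


def align_to_barrier(distances: list[tuple[int, float]], barrier_open_ms: int) -> list[tuple[int, float]]:
    """Shift timestamps so that t = 0 is the barrier opening."""
    return [(t_ms - barrier_open_ms, d) for t_ms, d in distances]


# Toy data: the third sample of fly_a has no partner and is dropped.
fly_a = [Position(0, 0.0, 0.0), Position(1000, 3.0, 0.0), Position(2000, 5.0, 5.0)]
fly_b = [Position(0, 0.0, 4.0), Position(1000, 0.0, 4.0)]
aligned = align_to_barrier(inter_fly_distance(fly_a, fly_b), barrier_open_ms=1000)
```

Negative aligned times are samples recorded before the barrier opened; positive times are after.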

## Code Conventions

- **PEP8** formatting, Google-style docstrings
- **Type hints** on function signatures
- **Time units**: milliseconds in all data (DB stores ms, barrier CSV stores seconds but is converted to ms on load)
- **Distance units**: pixels (no conversion to physical units)
- **Path management**: All scripts import from `scripts/config.py` for consistent paths
- **Notebooks**: Use `Path("..")` relative paths from `notebooks/` directory
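
As an illustration of the time-unit convention, a loader for the barrier CSV might convert seconds to milliseconds at the point of reading, so nothing downstream ever sees seconds. The column names `machine_name` and `opening_time_s` and the function name are assumptions made for this sketch; the real CSV may be laid out differently.

```python
import csv
import io
from typing import TextIO


def load_barrier_times(
    csv_file: TextIO,
    machine_col: str = "machine_name",   # hypothetical column name
    time_col: str = "opening_time_s",    # hypothetical column name
) -> dict[str, int]:
    """Map machine name (as a string) to barrier opening time in milliseconds."""
    reader = csv.DictReader(csv_file)
    return {
        row[machine_col].strip(): round(float(row[time_col]) * 1000)
        for row in reader
    }


# Usage with an in-memory file standing in for the real CSV.
example = io.StringIO("machine_name,opening_time_s\n076,12.5\n")
barrier_ms = load_barrier_times(example)
```

Reading machine names through `csv` keeps them as strings, so a zero-padded value such as `076` is preserved exactly as written.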

## Key Caveats

- **Pseudoreplication**: True N = 18 ROIs per group (not 230K data points). Tests that treat individual data points as independent samples produce inflated significance.
- **Tiny effect sizes**: Cohen's d ~ 0.09 for distance, ~0.14 for velocity. These are statistically significant only because of the massive sample size.
- **Missing data**: Machine 139 (6 ROIs) has metadata but no tracking DB or barrier opening time.
- **Machine name type mismatch**: Metadata stores the machine name as an int (76); the barrier CSV stores it zero-padded (076). Convert both to the same string form before matching.
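
Two of these caveats lend themselves to a short sketch. The first helper collapses samples to one mean per ROI, so the ROI (not the data point) becomes the unit of analysis, and computes Cohen's d with a pooled standard deviation on those means. The second puts machine names into one canonical string form. All function names are hypothetical, `roi_id` is assumed to be unique across machines, and stripping leading zeros (rather than zero-padding) is just one possible convention.

```python
from collections import defaultdict
from collections.abc import Iterable
from math import sqrt
from statistics import mean, variance
from typing import Union


def per_roi_means(rows: Iterable[tuple[str, str, float]]) -> dict[str, list[float]]:
    """Collapse (group, roi_id, value) samples to one mean per ROI, keyed by group."""
    samples: dict[tuple[str, str], list[float]] = defaultdict(list)
    for group, roi_id, value in rows:
        samples[(group, roi_id)].append(value)
    means: dict[str, list[float]] = defaultdict(list)
    for (group, _roi_id), values in samples.items():
        means[group].append(mean(values))
    return dict(means)


def cohens_d(a: list[float], b: list[float]) -> float:
    """Cohen's d using the pooled sample standard deviation (needs >= 2 values each)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def normalize_machine_name(name: Union[int, str]) -> str:
    """Canonical machine name: decimal digits with leading zeros removed."""
    return str(int(str(name).strip()))


# Toy data: five samples, but only two ROIs per group after aggregation.
rows = [
    ("trained", "r1", 1.0), ("trained", "r1", 3.0), ("trained", "r2", 4.0),
    ("untrained", "r3", 1.0), ("untrained", "r4", 2.0),
]
roi_means = per_roi_means(rows)
d = cohens_d(roi_means["trained"], roi_means["untrained"])
```

Any significance test would then be run on the per-ROI means (18 values per group in this project) rather than on the raw samples.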

## Directory Structure

```
tracking/
├── data/raw/          # SQLite DBs (gitignored)
├── data/metadata/     # Small CSVs (tracked)
├── data/processed/    # Large generated CSVs (gitignored)
├── scripts/           # Python scripts with config.py imports
├── notebooks/         # Jupyter analysis notebooks
├── figures/           # Generated plots (gitignored)
├── docs/              # Scientific documentation
└── tasks/             # Task tracking
```
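
A shared path module matching this layout might look like the sketch below. The function name `project_paths` and the dictionary keys are hypothetical; the actual contents of `scripts/config.py` are not shown here and may differ.

```python
from pathlib import Path


def project_paths(root: Path) -> dict[str, Path]:
    """Build the data and figure paths for a given project root."""
    data = root / "data"
    return {
        "raw": data / "raw",              # SQLite DBs (gitignored)
        "metadata": data / "metadata",    # small CSVs (tracked)
        "processed": data / "processed",  # large generated CSVs (gitignored)
        "figures": root / "figures",      # generated plots (gitignored)
    }


# In a script under scripts/, the root could be derived from the file location:
#     PROJECT_ROOT = Path(__file__).resolve().parents[1]
# In a notebook under notebooks/, Path("..") plays the same role:
paths = project_paths(Path(".."))
```

Deriving every path from a single root keeps scripts and notebooks in agreement even though they run from different working directories.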

## Next Direction

The primary next step is testing the **bimodal hypothesis**; see `docs/bimodal_hypothesis.md` for the full plan. The core idea: aggregate analysis fails because the trained group likely contains both true learners and non-learners, diluting the signal.