Reorganized flat 41-file directory into structured layout with:
- scripts/ for Python analysis code with shared config.py
- notebooks/ for Jupyter analysis notebooks
- data/ split into raw/, metadata/, processed/
- docs/ with analysis summary, experimental design, and bimodal hypothesis tutorial
- tasks/ with todo checklist and lessons learned
- Comprehensive README, PLANNING.md, and .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Task List
Completed Work
- Extract ROI data from SQLite databases grouped by trained/untrained
- Calculate inter-fly distances at each time point
- Align data to barrier opening time (t=0)
- Plot average distance over time (entire experiment + 300s window)
- Track fly identities across frames (Hungarian algorithm)
- Calculate max velocity over 10-second moving windows
- Statistical tests (t-tests, Cohen's d) comparing groups
- ML classification attempt (Logistic Regression, Random Forest)
- Clustering analysis (K-means)
- Organize project structure for student handoff
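For whoever picks this up, the identity-tracking step above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `match_identities` and the sample coordinates are hypothetical, and it assumes the same number of flies is detected in both frames.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_identities(prev_xy, curr_xy):
    """Reorder curr_xy so that row i is the same fly as row i of prev_xy.

    Hypothetical helper; assumes both frames contain the same number of flies.
    """
    # cost[i, j] = Euclidean distance from previous fly i to current detection j
    cost = np.linalg.norm(prev_xy[:, None, :] - curr_xy[None, :, :], axis=2)
    # Hungarian algorithm: assignment that minimises total displacement
    row_ind, col_ind = linear_sum_assignment(cost)
    ordered = np.empty_like(curr_xy)
    ordered[row_ind] = curr_xy[col_ind]
    return ordered


# Made-up positions (px): the detections in the current frame arrive swapped.
prev_xy = np.array([[10.0, 10.0], [50.0, 50.0]])
curr_xy = np.array([[52.0, 49.0], [11.0, 12.0]])

tracked = match_identities(prev_xy, curr_xy)
# Inter-fly distance for this frame, computed after identities are aligned
inter_fly_distance = float(np.linalg.norm(tracked[0] - tracked[1]))
print(tracked, inter_fly_distance)
```

Matching on total displacement keeps identities stable when the detector returns flies in a different order from one frame to the next.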
Priority: Bimodal Hypothesis Analysis
See docs/bimodal_hypothesis.md for detailed methodology.
Phase 1: Per-ROI Feature Extraction
- Compute per-ROI summary statistics from aligned distance data
  - Mean distance post-opening (0-300s)
  - Median distance post-opening
  - Fraction of time at distance < 50px ("close proximity")
  - Mean max velocity post-opening
- Create a summary DataFrame with one row per ROI (design: N=18 trained + N=18 untrained)
  - Note: only 30 of the 36 ROIs have data (Machine 139 is missing, so 6 ROIs are lost)
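A possible starting point for Phase 1, as a sketch only. The column names (`group`, `roi`, `t`, `distance`, `max_velocity`) and the function `per_roi_features` are assumptions, not the names used in the existing scripts, and the 0-300s window is treated as inclusive at both ends.

```python
import pandas as pd

CLOSE_PX = 50          # "close proximity" threshold from the task list
WINDOW_S = (0, 300)    # post-opening window in seconds (assumed inclusive)


def per_roi_features(aligned: pd.DataFrame) -> pd.DataFrame:
    """Collapse aligned time series to one row per ROI (hypothetical schema)."""
    post = aligned[(aligned["t"] >= WINDOW_S[0]) & (aligned["t"] <= WINDOW_S[1])]
    return (
        post.groupby(["group", "roi"])
        .agg(
            mean_distance=("distance", "mean"),
            median_distance=("distance", "median"),
            # fraction of post-opening samples closer than CLOSE_PX
            frac_close=("distance", lambda d: (d < CLOSE_PX).mean()),
            mean_max_velocity=("max_velocity", "mean"),
        )
        .reset_index()
    )


# Tiny made-up example: one trained ROI and one untrained ROI.
aligned = pd.DataFrame({
    "group": ["trained"] * 4 + ["untrained"] * 4,
    "roi": [1, 1, 1, 1, 2, 2, 2, 2],
    "t": [-10, 0, 150, 300, -10, 0, 150, 300],
    "distance": [200.0, 40.0, 30.0, 80.0, 200.0, 100.0, 120.0, 140.0],
    "max_velocity": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 2.0, 2.0],
})
features = per_roi_features(aligned)
print(features)
```

ROIs with no tracking data (for example those on Machine 139) simply produce no row, so the resulting table should be checked against the expected ROI count.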
Phase 2: Distribution Visualization
- Plot histograms/KDE of per-ROI metrics for each group
- Look for bimodality in trained group vs unimodality in untrained
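One detail worth getting right in Phase 2 is binning both groups on the same edges, otherwise the histograms are not comparable. A rough sketch, using made-up placeholder values rather than real per-ROI metrics:

```python
import numpy as np

# Placeholder per-ROI mean distances (px); made-up values for illustration only.
trained = np.array([42.0, 47.0, 51.0, 55.0, 148.0, 152.0, 160.0])
untrained = np.array([118.0, 125.0, 131.0, 137.0, 142.0, 150.0])

# Shared bin edges from the pooled data so both histograms line up.
edges = np.histogram_bin_edges(np.concatenate([trained, untrained]), bins=8)
trained_counts, _ = np.histogram(trained, bins=edges)
untrained_counts, _ = np.histogram(untrained, bins=edges)

# Quick text rendering; the same edges can be passed to a plotting library.
for lo, hi, t, u in zip(edges[:-1], edges[1:], trained_counts, untrained_counts):
    print(f"{lo:6.1f}-{hi:6.1f} px  trained {'#' * int(t):<6} untrained {'#' * int(u)}")
```

With only about 15 ROIs per group the bin count strongly affects what the plot suggests, so it is worth trying a few values before reading anything into the shape.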
Phase 3: Formal Bimodality Testing
- Hartigan's dip test on trained per-ROI distributions
- Fit Gaussian Mixture Models (1 vs 2 components) to trained data
- Compare BIC scores to determine optimal number of components
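The GMM comparison in Phase 3 could look something like the sketch below. It uses scikit-learn's `GaussianMixture`, which is an assumption about tooling, and the values are synthetic and deliberately bimodal. The dip test is not shown.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic, clearly bimodal placeholder values (px); not real measurements.
trained_metric = np.array(
    [35, 38, 40, 41, 43, 45, 47, 150, 155, 158, 160, 162, 165, 170], dtype=float
)
X = trained_metric.reshape(-1, 1)  # scikit-learn expects a 2-D array

bic = {}
for k in (1, 2):
    gmm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bic[k] = gmm.bic(X)  # lower BIC = preferred model

best_k = min(bic, key=bic.get)
print(bic, "-> preferred number of components:", best_k)
```

With so few ROIs the BIC difference can be fragile, which is one reason to pair it with the dip test rather than rely on either alone.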
Phase 4: Subgroup Identification
- If bimodal: classify trained ROIs as "learner" vs "non-learner" using GMM posteriors
- Compare learner subgroup vs untrained group (expect larger effect size)
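If Phase 3 supports two components, the posterior-based labelling might be sketched as below. `label_learners` is a hypothetical helper. Treating the lower-mean component as the "learner" group is an assumption made purely for illustration; flip it if the expected direction of the effect is the opposite.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def label_learners(metric, threshold=0.5):
    """Return (posterior of 'learner' component, boolean learner mask).

    Hypothetical helper. ASSUMPTION: the component with the LOWER mean is
    the 'learner' component; this is not established by the source.
    """
    X = np.asarray(metric, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(X)
    # Component order is arbitrary, so pick the learner component by its mean.
    learner_component = int(np.argmin(gmm.means_.ravel()))
    posterior = gmm.predict_proba(X)[:, learner_component]
    return posterior, posterior >= threshold


# Synthetic placeholder values (px); not real measurements.
trained_metric = [35, 38, 40, 41, 43, 45, 47, 150, 155, 158, 160, 162, 165, 170]
posterior, is_learner = label_learners(trained_metric)
print(is_learner)
```

Keeping the posterior alongside the hard label makes it easy to flag ambiguous ROIs (posterior near 0.5) instead of forcing them into a subgroup.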
Phase 5: Effect Size Re-estimation
- Mann-Whitney U test (appropriate for small N)
- Bootstrap confidence intervals for effect sizes
- Account for session as random effect
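A rough sketch of the first two Phase 5 items, using made-up numbers. The percentile bootstrap and the choice of Cohen's d as the effect size are assumptions. The session random effect is not handled here and would need a mixed-effects model.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def cohens_d(a, b):
    """Cohen's d with pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1)) / (na + nb - 2)
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)


def bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for Cohen's d, resampling each group separately."""
    rng = np.random.default_rng(seed)
    stats = np.array([
        cohens_d(rng.choice(a, size=len(a), replace=True),
                 rng.choice(b, size=len(b), replace=True))
        for _ in range(n_boot)
    ])
    return np.quantile(stats, [alpha / 2, 1 - alpha / 2])


# Made-up per-ROI mean distances (px); placeholders, not results.
trained = np.array([40.0, 55.0, 60.0, 48.0, 52.0, 150.0, 45.0, 58.0])
untrained = np.array([120.0, 130.0, 110.0, 140.0, 125.0, 135.0, 115.0, 128.0])

u_stat, p_value = mannwhitneyu(trained, untrained, alternative="two-sided")
d_hat = cohens_d(trained, untrained)
ci_low, ci_high = bootstrap_ci(trained, untrained)
print(f"U={u_stat}, p={p_value:.4f}, d={d_hat:.2f}, 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

Resampling each group separately preserves the group sizes. If ROIs within a session are correlated, resampling whole sessions would be the more defensible unit.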
Maintenance Items
- Investigate missing Machine 139 data (has metadata but no tracking DB)
- Add diptest to requirements.txt when starting bimodal analysis
- Consider converting pixel distances to physical units (need calibration)
- The second notebook (flies_analysis.ipynb) re-runs from DB extraction; consider deprecating
Discovered During Work
(Add new items here as they come up during analysis)