# Task List

## Completed Work

- [x] Extract ROI data from SQLite databases grouped by trained/untrained
- [x] Calculate inter-fly distances at each time point
- [x] Align data to barrier opening time (t=0)
- [x] Plot average distance over time (entire experiment + 300s window)
- [x] Track fly identities across frames (Hungarian algorithm)
- [x] Calculate max velocity over 10-second moving windows
- [x] Statistical tests (t-tests, Cohen's d) comparing groups
- [x] ML classification attempt (Logistic Regression, Random Forest)
- [x] Clustering analysis (K-means)
- [x] Organize project structure for student handoff

## Priority: Bimodal Hypothesis Analysis

See `docs/bimodal_hypothesis.md` for detailed methodology.

### Phase 1: Per-ROI Feature Extraction

- [ ] Compute per-ROI summary statistics from aligned distance data
  - Mean distance post-opening (0-300s)
  - Median distance post-opening
  - Fraction of time at distance < 50px ("close proximity")
  - Mean max velocity post-opening
- [ ] Create a summary DataFrame with N=18 trained + N=18 untrained rows
- [ ] **Note**: Only 30 ROIs have data (Machine 139 missing = 6 ROIs lost)

### Phase 2: Distribution Visualization

- [ ] Plot histograms/KDE of per-ROI metrics for each group
- [ ] Look for bimodality in trained group vs unimodality in untrained

### Phase 3: Formal Bimodality Testing

- [ ] Hartigan's dip test on trained per-ROI distributions
- [ ] Fit Gaussian Mixture Models (1 vs 2 components) to trained data
- [ ] Compare BIC scores to determine optimal number of components

### Phase 4: Subgroup Identification

- [ ] If bimodal: classify trained ROIs as "learner" vs "non-learner" using GMM posteriors
- [ ] Compare learner subgroup vs untrained group (expect larger effect size)

### Phase 5: Effect Size Re-estimation

- [ ] Mann-Whitney U test (appropriate for small N)
- [ ] Bootstrap confidence intervals for effect sizes
- [ ] Account for session as random effect

## Maintenance Items

- [ ] Investigate missing Machine 139 data (has metadata but no tracking DB)
- [ ] Add `diptest` to requirements.txt when starting bimodal analysis
- [ ] Consider converting pixel distances to physical units (need calibration)
- [ ] The second notebook (`flies_analysis.ipynb`) re-runs from DB extraction - consider deprecating

## Discovered During Work

(Add new items here as they come up during analysis)
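
## Reference Sketch: Phase 1 Feature Extraction

The Phase 1 per-ROI summary statistics could be computed roughly as follows. This is a minimal sketch, not the project's implementation: the function name `summarize_roi`, its argument layout, and the output keys are hypothetical. It assumes each ROI's samples are already aligned to barrier opening (t=0), that they are evenly spaced in time (so a fraction of samples stands in for a fraction of time), and that missing values were dropped beforehand. The 50px threshold and the 0-300s window come from the Phase 1 items.

```python
from statistics import mean, median

CLOSE_PX = 50.0          # "close proximity" threshold from Phase 1
WINDOW_S = (0.0, 300.0)  # post-opening window, seconds relative to t=0


def summarize_roi(times, distances, max_velocities=None):
    """Return Phase 1 summary features for a single ROI (hypothetical helper).

    times:          seconds relative to barrier opening (t=0)
    distances:      inter-fly distance in pixels at each time point
    max_velocities: optional max-velocity value at each time point
    """
    start, end = WINDOW_S
    # Keep only samples that fall inside the post-opening window.
    idx = [i for i, t in enumerate(times) if start <= t <= end]
    if not idx:
        raise ValueError("no samples in the post-opening window")

    window = [distances[i] for i in idx]
    features = {
        "mean_distance": mean(window),
        "median_distance": median(window),
        # Assumes evenly spaced samples: share of samples == share of time.
        "frac_close": sum(d < CLOSE_PX for d in window) / len(window),
    }
    if max_velocities is not None:
        features["mean_max_velocity"] = mean(max_velocities[i] for i in idx)
    return features


# Toy values for illustration only (not experimental data).
example = summarize_roi(
    times=[-10, 0, 100, 200, 300, 400],
    distances=[120, 80, 40, 30, 60, 10],
    max_velocities=[1.0, 2.0, 4.0, 3.0, 1.0, 9.0],
)
```

Calling this once per ROI and collecting the resulting dictionaries, together with each ROI's group label, would give the rows of the summary DataFrame described in Phase 1.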