cupido/tasks/todo.md
Giorgio e7e4db264d Initial commit: organized project structure for student handoff
Reorganized flat 41-file directory into structured layout with:
- scripts/ for Python analysis code with shared config.py
- notebooks/ for Jupyter analysis notebooks
- data/ split into raw/, metadata/, processed/
- docs/ with analysis summary, experimental design, and bimodal hypothesis tutorial
- tasks/ with todo checklist and lessons learned
- Comprehensive README, PLANNING.md, and .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 16:08:36 +00:00


Task List

Completed Work

  • Extract ROI data from SQLite databases grouped by trained/untrained
  • Calculate inter-fly distances at each time point
  • Align data to barrier opening time (t=0)
  • Plot average distance over time (entire experiment + 300s window)
  • Track fly identities across frames (Hungarian algorithm)
  • Calculate max velocity over 10-second moving windows
  • Statistical tests (t-tests, Cohen's d) comparing groups
  • ML classification attempt (Logistic Regression, Random Forest)
  • Clustering analysis (K-means)
  • Organize project structure for student handoff
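
For reference when handing off, the identity-tracking step can be sketched as below. This is a minimal illustration, not the code in scripts/: it assumes two flies per ROI, (x, y) positions in pixels, and Euclidean distance as the matching cost. The function name match_identities is hypothetical. It uses scipy.optimize.linear_sum_assignment, which solves the same assignment problem the Hungarian algorithm addresses.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_identities(prev_xy: np.ndarray, curr_xy: np.ndarray) -> np.ndarray:
    """Reorder curr_xy so that row i is the same fly as row i of prev_xy.

    Both arrays have shape (n_flies, 2). The cost of pairing two
    detections is the Euclidean distance between them (an assumption;
    the project code may use a different cost).
    """
    # cost[i, j] = distance from previous fly i to current detection j
    cost = np.linalg.norm(prev_xy[:, None, :] - curr_xy[None, :, :], axis=2)
    # Minimum-total-cost one-to-one assignment; for a square cost matrix
    # the returned row indices are 0..n-1 in order.
    _, col_idx = linear_sum_assignment(cost)
    return curr_xy[col_idx]


# Toy frame pair: the detections in the second frame arrive swapped.
prev_xy = np.array([[10.0, 10.0], [50.0, 40.0]])
curr_xy = np.array([[52.0, 41.0], [11.0, 9.0]])
matched = match_identities(prev_xy, curr_xy)
```

Applied frame by frame, with each matched frame becoming the next prev_xy, this keeps each fly in a fixed row across the recording.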

Priority: Bimodal Hypothesis Analysis

See docs/bimodal_hypothesis.md for detailed methodology.

Phase 1: Per-ROI Feature Extraction

  • Compute per-ROI summary statistics from aligned distance data
    • Mean distance post-opening (0-300s)
    • Median distance post-opening
    • Fraction of time at distance < 50px ("close proximity")
    • Mean max velocity post-opening
  • Create a summary DataFrame with one row per ROI (design: N=18 trained + N=18 untrained)
  • Note: only 30 of the 36 designed ROIs have data (Machine 139 is missing, so 6 ROIs are lost)
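
A possible starting point for Phase 1, sketched with pandas. The column names (roi, group, t, distance, max_velocity) and the function name per_roi_summary are assumptions for illustration; adapt them to whatever the aligned data in data/processed/ actually uses. Here t is seconds relative to barrier opening.

```python
import pandas as pd

CLOSE_PX = 50        # "close proximity" threshold from the task list (pixels)
WINDOW_S = (0, 300)  # post-opening window in seconds, inclusive


def per_roi_summary(aligned: pd.DataFrame) -> pd.DataFrame:
    """Return one row per ROI with the four Phase 1 summary metrics."""
    # Keep only samples inside the post-opening window.
    post = aligned[aligned["t"].between(*WINDOW_S)]
    # Boolean column whose mean is the fraction of time in close proximity.
    post = post.assign(is_close=post["distance"] < CLOSE_PX)
    return (
        post.groupby(["group", "roi"])
        .agg(
            mean_distance=("distance", "mean"),
            median_distance=("distance", "median"),
            frac_close=("is_close", "mean"),
            mean_max_velocity=("max_velocity", "mean"),
        )
        .reset_index()
    )


# Tiny made-up table, only to show the expected input shape.
aligned = pd.DataFrame({
    "roi": [1, 1, 1, 1, 2, 2, 2, 2],
    "group": ["trained"] * 4 + ["untrained"] * 4,
    "t": [-10, 0, 100, 200, -10, 0, 100, 400],
    "distance": [90.0, 40.0, 30.0, 80.0, 100.0, 120.0, 60.0, 10.0],
    "max_velocity": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 4.0, 9.0],
})
summary = per_roi_summary(aligned)
```

ROIs without tracking data simply produce no row, so the resulting table will have fewer rows than the design calls for until the Machine 139 issue is resolved.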

Phase 2: Distribution Visualization

  • Plot histograms/KDE of per-ROI metrics for each group
  • Look for bimodality in trained group vs unimodality in untrained

Phase 3: Formal Bimodality Testing

  • Hartigan's dip test on trained per-ROI distributions
  • Fit Gaussian Mixture Models (1 vs 2 components) to trained data
  • Compare BIC scores to determine optimal number of components
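
The GMM and BIC steps could look roughly like the sketch below, using scikit-learn's GaussianMixture. The function name compare_gmm_bic and the synthetic data are illustrative assumptions, not project results. Hartigan's dip test is not shown because it depends on the diptest package listed under Maintenance Items.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def compare_gmm_bic(values, max_components=2, seed=0):
    """Fit 1-D GMMs with 1..max_components components; return {k: BIC}."""
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    bic = {}
    for k in range(1, max_components + 1):
        # n_init > 1 reduces the chance of a poor local optimum.
        gmm = GaussianMixture(n_components=k, n_init=5, random_state=seed).fit(x)
        bic[k] = gmm.bic(x)  # lower BIC means a better fit/complexity trade-off
    return bic


# Synthetic stand-in for one per-ROI metric in the trained group:
# two well-separated clusters of 9 values each (made-up numbers).
rng = np.random.default_rng(0)
synthetic = np.concatenate([rng.normal(40, 5, 9), rng.normal(120, 5, 9)])
bic = compare_gmm_bic(synthetic)
best_k = min(bic, key=bic.get)
```

With so few ROIs per group, BIC differences should be read cautiously; a small gap between the 1- and 2-component models is weak evidence either way.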

Phase 4: Subgroup Identification

  • If bimodal: classify trained ROIs as "learner" vs "non-learner" using GMM posteriors
  • Compare learner subgroup vs untrained group (expect larger effect size)
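
One way to derive the learner labels from GMM posteriors is sketched here. It assumes, purely for illustration, that the "learner" mode is the component with the smaller mean of the chosen metric; flip learner_is_low if the hypothesis points the other way. The function name label_learners, the 0.5 threshold, and the example values are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def label_learners(values, learner_is_low=True, threshold=0.5, seed=0):
    """Return (posterior of the learner component, boolean learner label)."""
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, n_init=5, random_state=seed).fit(x)
    means = gmm.means_.ravel()
    # ASSUMPTION: which mode counts as "learner" is set by learner_is_low.
    learner_component = int(np.argmin(means) if learner_is_low else np.argmax(means))
    # predict_proba gives the posterior probability of each component per ROI.
    p_learner = gmm.predict_proba(x)[:, learner_component]
    return p_learner, p_learner >= threshold


# Made-up per-ROI mean distances for a trained group (pixels).
trained_metric = [38.0, 42.0, 45.0, 40.0, 118.0, 122.0, 125.0, 120.0]
p_learner, is_learner = label_learners(trained_metric)
```

Keeping the posterior alongside the hard label makes it easy to flag ambiguous ROIs (posteriors near 0.5) instead of forcing them into a subgroup.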

Phase 5: Effect Size Re-estimation

  • Mann-Whitney U test (non-parametric; suits small N because it does not assume normality)
  • Bootstrap confidence intervals for effect sizes
  • Account for session as a random effect
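
The first two Phase 5 items might be sketched as follows, using scipy.stats.mannwhitneyu and a percentile bootstrap for Cohen's d. The helper names, the number of resamples, and the example values are assumptions for illustration. The session random effect is not handled here; it would need a mixed-effects model or a resampling scheme that respects sessions.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = (
        (len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)
    ) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)


def bootstrap_ci(a, b, stat=cohens_d, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI, resampling each group independently."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    boots = np.empty(n_boot)
    for i in range(n_boot):
        boots[i] = stat(
            rng.choice(a, size=len(a), replace=True),
            rng.choice(b, size=len(b), replace=True),
        )
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])


# Made-up per-ROI mean distances (pixels), not project data.
learners = [40.0, 45.0, 50.0, 55.0, 60.0, 42.0, 48.0, 52.0]
untrained = [80.0, 85.0, 90.0, 95.0, 100.0, 82.0, 88.0, 92.0]

result = mannwhitneyu(learners, untrained, alternative="two-sided")
d = cohens_d(learners, untrained)
lo, hi = bootstrap_ci(learners, untrained)
```

Reporting the interval next to the point estimate shows how much the effect size could move with a different sample of ROIs, which matters at this sample size.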

Maintenance Items

  • Investigate missing Machine 139 data (has metadata but no tracking DB)
  • Add diptest to requirements.txt when starting bimodal analysis
  • Consider converting pixel distances to physical units (need calibration)
  • The second notebook (flies_analysis.ipynb) re-runs the pipeline from DB extraction; consider deprecating it

Discovered During Work

(Add new items here as they come up during analysis)