# Processed Data

Large CSV files generated from the analysis pipeline. All files are gitignored (~370MB total) and can be regenerated.

## Files and Regeneration

| File | Description | Generated By |
|------|-------------|--------------|
| `trained_roi_data.csv` | Raw tracking data for trained ROIs | `scripts/load_roi_data.py` or notebook step 1 |
| `untrained_roi_data.csv` | Raw tracking data for untrained ROIs | `scripts/load_roi_data.py` or notebook step 1 |
| `trained_distances.csv` | Pairwise distances (unaligned) | `scripts/calculate_distances.py` |
| `untrained_distances.csv` | Pairwise distances (unaligned) | `scripts/calculate_distances.py` |
| `trained_distances_aligned.csv` | Distances aligned to barrier opening | Notebook step 4 |
| `untrained_distances_aligned.csv` | Distances aligned to barrier opening | Notebook step 4 |
| `trained_tracked.csv` | Identity-tracked fly positions | Notebook step 7 |
| `untrained_tracked.csv` | Identity-tracked fly positions | Notebook step 7 |
| `trained_max_velocity.csv` | Max velocity over 10s windows | Notebook step 7 |
| `untrained_max_velocity.csv` | Max velocity over 10s windows | Notebook step 7 |

## To Regenerate All Data

Run the full notebook `notebooks/flies_analysis_simple.ipynb` with:

```python
recalculate_distances = True
recalculate_tracking = True
```

**Warning**: Identity tracking and velocity calculations take significant time (~30+ minutes).

## Column Reference

### Distance CSVs (`*_distances_aligned.csv`)

- `machine_name`: Ethoscope machine ID (string)
- `ROI`: ROI number (1-6)
- `aligned_time`: Time in ms relative to barrier opening (0 = opening)
- `distance`: Euclidean distance between flies in pixels
- `n_flies`: Number of flies detected at this time point
- `area_fly1`, `area_fly2`: Bounding box areas (w*h) in pixels^2
- `group`: "trained" or "untrained"
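
As a minimal sketch of how the aligned-distance columns might be consumed, the snippet below checks that a CSV has the documented header and averages `distance` over rows at or after barrier opening (`aligned_time >= 0`). The function name `mean_distance_after_opening`, the sample values, and the handling of empty `distance` cells are illustrative assumptions, not part of the pipeline. It uses only the standard library and streams rows, so it does not need to hold a large file in memory.

```python
import csv
import io

# Columns documented for *_distances_aligned.csv in this README.
EXPECTED_COLUMNS = {
    "machine_name", "ROI", "aligned_time", "distance",
    "n_flies", "area_fly1", "area_fly2", "group",
}


def mean_distance_after_opening(handle):
    """Return the mean `distance` over rows with aligned_time >= 0.

    `handle` is any open text file object holding an aligned-distance CSV.
    Rows with an empty `distance` cell are skipped; this is an assumption
    about how missing detections might be stored, not a documented rule.
    """
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    total, count = 0.0, 0
    for row in reader:
        if row["distance"] == "":
            continue
        if float(row["aligned_time"]) >= 0:
            total += float(row["distance"])
            count += 1
    return total / count if count else float("nan")


# Tiny in-memory sample with made-up values, standing in for a real file.
sample = io.StringIO(
    "machine_name,ROI,aligned_time,distance,n_flies,area_fly1,area_fly2,group\n"
    "ETHOSCOPE_X,1,-1000,50.0,2,120,130,trained\n"
    "ETHOSCOPE_X,1,0,40.0,2,118,131,trained\n"
    "ETHOSCOPE_X,1,1000,20.0,2,119,129,trained\n"
)
result = mean_distance_after_opening(sample)
print(result)
```

To run it on a regenerated file, open the CSV with `open(path, newline="")` and pass the handle to the function, adjusting `path` to wherever the file lives in your checkout.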