Remove data/raw/ entirely — all bulky data now under /mnt/data/projects/cupido/

Deleted the 5 stale pre-pipeline tracking DBs and the data/raw/ directory.
Dropped DATA_RAW from config.py; build_video_inventory now scans
TRACKING_OUTPUT_DIR for already-tracked sessions. Notebooks no longer
import DATA_RAW. README, PLANNING and todo updated to reflect that the
repo holds only code + small curated metadata, never bulky DBs.
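
A minimal sketch of how a scan of TRACKING_OUTPUT_DIR for already-tracked sessions might look. This is not the actual build_video_inventory code: the helper name tracked_sessions, the hard-coded path, and the rule that a session name is the file name minus "_tracking.db" are assumptions made for illustration.

```python
from pathlib import Path

# Assumed value, copied from the README paths in this commit; the real
# constant is defined in scripts/config.py.
TRACKING_OUTPUT_DIR = Path("/mnt/data/projects/cupido/tracked")

# File-name suffix of tracker output, per the README ("*_tracking.db").
TRACKING_SUFFIX = "_tracking.db"


def tracked_sessions(tracking_dir: Path = TRACKING_OUTPUT_DIR) -> set[str]:
    """Return the names of sessions that already have a tracking DB.

    Hypothetical helper: the session name is assumed to be the file name
    with the "_tracking.db" suffix removed.
    """
    if not tracking_dir.is_dir():
        # A missing output directory means nothing has been tracked yet.
        return set()
    return {
        path.name[: -len(TRACKING_SUFFIX)]
        for path in tracking_dir.glob(f"*{TRACKING_SUFFIX}")
    }
```

An inventory builder could then mark a video as already tracked whenever its session name appears in the returned set.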

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Giorgio Gilestro 2026-05-01 09:20:25 +01:00
parent 9f3ee24a23
commit 23050360ea
9 changed files with 37 additions and 70 deletions


@@ -14,9 +14,11 @@ python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
-# Get the data files (not in git - ask lab for copies)
-# Place .db files in data/raw/
-# Place large .csv files in data/processed/
+# Project data lives outside the repo at /mnt/data/projects/cupido/:
+# tracked/ → SQLite tracking DBs
+# targets/ → target-point JSONs
+# all_video_info_merged.{xlsx,tsv} → metadata spreadsheet
+# Generated CSVs land in data/processed/ (gitignored).
# Run the main analysis notebook
jupyter notebook notebooks/flies_analysis_simple.ipynb
@@ -66,7 +68,7 @@ python scripts/pick_targets.py --redo # re-pick already-picked videos
# 3) batch tracking (idempotent, can run in background)
python scripts/track_videos.py --jobs 4 # parallel
-# output → /mnt/data/projects/cupido/tracked/*_tracking.db (SQLite, same schema as data/raw/)
+# output → /mnt/data/projects/cupido/tracked/*_tracking.db (SQLite)
```
See `tasks/todo.md` "Offline Tracking" section for the full plan, and
@@ -80,9 +82,9 @@ tracking/
├── PLANNING.md # Architecture & conventions
├── requirements.txt # Python dependencies
├── data/
-│ ├── raw/ # SQLite tracking databases (gitignored)
-│ ├── metadata/ # Experiment metadata CSVs
-│ └── processed/ # Generated analysis CSVs (gitignored)
+│ ├── metadata/ # Experiment metadata CSVs (small, hand-curated)
+│ ├── processed/ # Generated analysis CSVs (gitignored)
+│ └── logs/ # Tracker logs (gitignored)
├── scripts/ # Python analysis scripts
│ ├── config.py # Shared path constants
│ ├── load_roi_data.py # Extract data from DBs
@@ -107,13 +109,13 @@ tracking/
## Data Pipeline
```
-SQLite DBs (data/raw/)
+SQLite DBs (/mnt/data/projects/cupido/tracked/) + merged TSV
-▼ load_roi_data.py / notebook step 1
-ROI CSVs (data/processed/*_roi_data.csv)
+scripts/load_roi_data.py
+single DataFrame stamped with experimental metadata
-▼ notebook steps 2-4
-Aligned Distance CSVs (data/processed/*_distances_aligned.csv)
+▼ notebooks/flies_analysis_simple.ipynb (steps 2-4)
+Aligned distance CSVs (data/processed/*_distances_aligned.csv)
├──▶ Plots (figures/)
├──▶ Statistical tests