Commit graph

23 commits

2b75daa783 Replace fine thumbnail grid with mpv/vlc/ffplay handoff
Watching the video play turns out to be much faster than scanning a
thumbnail grid. The coarse 10-min thumbnail grid still does rough
localisation; after picking, a video player launches paused at
coarse_t - 30 s with frame-accurate scrubbing controls. The analyst reads
the exact opening time off the player's OSD and types it into the
terminal prompt (the default is the coarse pick, so a single Enter keeps
it if the player is hard to use).

Backend auto-detects mpv > vlc > ffplay; gracefully degrades to "use
the coarse pick" if no player is installed.
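
The detection order and the fallback described above could look roughly like the sketch below. The function names `detect_player` and `resolve_opening_time` are hypothetical and not taken from the repository; only the mpv > vlc > ffplay order and the "Enter keeps the coarse pick" behaviour come from this message.

```python
import shutil
from typing import Callable, Optional, Sequence

# Preference order stated in the commit message: mpv > vlc > ffplay.
PLAYER_PREFERENCE = ("mpv", "vlc", "ffplay")


def detect_player(
    candidates: Sequence[str] = PLAYER_PREFERENCE,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first candidate found on PATH, or None if none is installed.

    `which` is injectable so the lookup can be exercised without any player
    being present (an assumption made for this sketch).
    """
    for name in candidates:
        if which(name) is not None:
            return name
    return None


def resolve_opening_time(coarse_t: float, typed: str) -> float:
    """Empty input keeps the coarse pick; otherwise parse the typed seconds."""
    typed = typed.strip()
    return coarse_t if typed == "" else float(typed)
```

When `detect_player()` returns None, the caller would skip the player launch and go straight to the prompt, so the coarse pick remains the default answer.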

New `bad_rois` column captures non-opening sub-arenas (partial-opening
videos like the 2024-10-21 set where only the lower half opens). The
prompt validates that entries are integers in 1..6.
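
A minimal sketch of that validation, assuming the analyst types the ROIs as a comma-separated list (the input format and the name `parse_bad_rois` are assumptions, not confirmed by this message):

```python
from typing import List


def parse_bad_rois(text: str, n_rois: int = 6) -> List[int]:
    """Parse e.g. "4, 5,6" into [4, 5, 6]; raise ValueError on invalid entries."""
    rois = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty input and trailing commas
        value = int(token)  # raises ValueError for non-integer entries
        if not 1 <= value <= n_rois:
            raise ValueError(f"ROI {value} is outside 1..{n_rois}")
        rois.append(value)
    # De-duplicate and sort so the stored column is stable across sessions.
    return sorted(set(rois))
```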

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 12:20:09 +01:00
125f187187 Extend pick_barrier coarse window to 10 min by default
Some videos have a late barrier opening (e.g. 5:46) that fell outside
the original 5-min search window. Default coarse-grid span is now
600 s (10 s spacing in the 60-thumb grid). Add --coarse-span CLI flag
to widen further if needed; auto-suggest scans the same 10-min window.
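
The relation between span and spacing can be sketched as follows. Only the `--coarse-span` flag name, the 600 s default and the 60-thumbnail grid come from this message; `build_parser` and `coarse_spacing` are hypothetical names.

```python
import argparse

N_THUMBS = 60  # 10x6 thumbnail grid


def build_parser() -> argparse.ArgumentParser:
    """CLI with the --coarse-span flag; the default matches the new 10-min window."""
    parser = argparse.ArgumentParser(description="coarse-grid span sketch")
    parser.add_argument(
        "--coarse-span",
        type=float,
        default=600.0,
        help="coarse search window in seconds (default: 600)",
    )
    return parser


def coarse_spacing(span_s: float, n_thumbs: int = N_THUMBS) -> float:
    """Seconds between neighbouring thumbnails for a given span."""
    return span_s / n_thumbs
```

With the default span the spacing is 10 s; widening to 900 s would give 15 s between thumbnails.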

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 12:08:23 +01:00
e8c7f23d4d Replace pick_barrier.py with thumbnail-grid UX
Old version showed inter-fly distance plots and asked the analyst to
click a timeline. The new version reads frames directly from the .mp4
and shows a 10×6 grid of timestamped thumbnails — the analyst just
clicks the frame where the barrier opens.

Two-stage refinement:
  - Coarse grid: 60 thumbs spanning the 5-min search window at ~5 s
    spacing. Pick the rough moment.
  - Fine grid: 60 thumbs at 0.2 s spacing centred on the coarse pick.
    Pick the exact frame.
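
The two grids above can be sketched as timestamp lists. This is an illustration under assumptions: the function names are hypothetical, the coarse grid is taken to start at the beginning of the search window, and the fine grid is clamped at t = 0.

```python
from typing import List


def grid_times(start_s: float, spacing_s: float, n: int = 60) -> List[float]:
    """Timestamps of n thumbnails, rounded to milliseconds."""
    return [round(start_s + i * spacing_s, 3) for i in range(n)]


def coarse_grid(search_start_s: float = 0.0, span_s: float = 300.0, n: int = 60) -> List[float]:
    """60 thumbnails over the 5-min window, i.e. 5 s apart."""
    return grid_times(search_start_s, span_s / n, n)


def fine_grid(coarse_pick_s: float, spacing_s: float = 0.2, n: int = 60) -> List[float]:
    """60 thumbnails at 0.2 s spacing centred on the coarse pick."""
    # Start half a grid before the pick, but never before the video start.
    start = max(0.0, coarse_pick_s - spacing_s * n / 2)
    return grid_times(start, spacing_s, n)
```

A fine grid of 60 thumbnails at 0.2 s covers 12 s, comfortably more than one coarse step of 5 s.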

The auto-detector still feeds the starting position. Frames are decoded
sequentially (one cv2 pass through the relevant range) instead of with a
seek per frame, so each grid loads in a few seconds.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 12:01:34 +01:00
b46c4ac1ba Add pick_barrier.py interactive annotator + seed CSV with 2025-07-15
pick_barrier.py loops over every tracked DB referenced by the merged
TSV, plots windowed mean inter-fly distance for all 6 ROIs in a single
figure, and lets the analyst click the moment the barrier opens. Saves
to data/metadata/barrier_opening.csv after each pick (crash-safe).
The auto-detector's best-effort guess is shown as an orange dotted line; the
analyst always has the final say.

Output schema:
    machine_name, session_date, session_time, opening_s, trim_first_s, notes

`trim_first_s` lets us record misframed starts so downstream code can
ignore the affected window. The five 2025-07-15 entries are seeded from
the original legacy CSV so they're not re-picked.
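
One way to make the save-after-each-pick step crash-safe is to write a temporary file and rename it over the target. The sketch below assumes that approach; the message does not say how the script actually does it, and `save_picks` is a hypothetical name. The column list is the output schema above.

```python
import csv
import os
import tempfile
from pathlib import Path

FIELDS = [
    "machine_name", "session_date", "session_time",
    "opening_s", "trim_first_s", "notes",
]


def save_picks(rows, path) -> None:
    """Rewrite the whole CSV atomically so a crash never leaves a half-written file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(rows)
        os.replace(tmp_name, path)  # atomic swap of old file for new
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # clean up the partial temp file
        raise
```

Calling `save_picks(all_rows_so_far, "data/metadata/barrier_opening.csv")` after every click keeps the on-disk file complete at all times.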

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 11:58:54 +01:00
e20530219b Correct barrier-opening times for 16-31-34 and 16-31-41
Both videos have ~60-69 s of misframed data at the start (arena partially
out of frame). The original times (25 s, 20 s) were measured on
ffmpeg-trimmed copies that no longer exist; on the full untrimmed videos
the actual barriers open at 94 s and 89 s respectively. Confirmed by eye.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 11:53:58 +01:00
2e80b834ca Add video duration_s to inventory and propagate to merged TSV
build_video_inventory.py now opens each mp4 with cv2 to record
duration_s. Cached: a video already in the previous inventory keeps
its computed duration, so re-runs only pay the cv2 cost for new
recordings.
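
The caching rule can be sketched as below. The names `fill_durations` and `probe_with_cv2` are hypothetical, and deriving the duration as frame count divided by FPS is an assumption about how the script measures it.

```python
from typing import Callable, Dict, Iterable


def probe_with_cv2(path: str) -> float:
    """Duration in seconds from container metadata (frame count / fps)."""
    import cv2  # imported lazily so the caching logic works without OpenCV

    cap = cv2.VideoCapture(path)
    try:
        fps = cap.get(cv2.CAP_PROP_FPS)
        frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    finally:
        cap.release()
    return frames / fps if fps > 0 else float("nan")


def fill_durations(
    video_paths: Iterable[str],
    previous: Dict[str, float],
    probe: Callable[[str], float] = probe_with_cv2,
) -> Dict[str, float]:
    """Reuse durations from the previous inventory; probe only new videos."""
    durations = {}
    for path in video_paths:
        if path in previous:
            durations[path] = previous[path]  # cached: no cv2 cost
        else:
            durations[path] = probe(path)
    return durations
```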

export_video_db_index.py looks up the matched video's duration and
writes it as training_video_duration_s / testing_video_duration_s
alongside the existing path columns. Useful for spotting unusually
short or long sessions and for sanity checks on the tracker output.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 11:13:05 +01:00
847d2cbd1b Merge 2025-07-15 batch into the xlsx; tools to detect & re-track
- merge_2025_07_15_into_xlsx.py: pivot the legacy 2025_07_15_metadata_fixed.csv
  into the unified xlsx schema (one row per fly, training_date_time +
  testing_date_time). Backs up the xlsx before writing. 24 new rows
  across machines 076 / 139 / 145 / 268.
- pick_targets.py: --video flag to bypass the inventory's in_xlsx filter,
  so a specific mp4 can be picked outside the normal flow.
- explore_barrier_signal.py: visualises raw y(t), per-frame inter-fly
  distance, and sliding min/mean distance against a known
  barrier-opening time. Used for prototyping the detector.
- detect_barrier_opening.py: per-ROI sliding-window mean-distance
  change-point estimator (median across ROIs). Currently noisy on a
  one-video calibration set; will be re-tuned once the 4 missing
  2025-07-15 videos are re-tracked.
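
As an illustration of the detector idea in the last bullet, here is a minimal sliding-window change-point sketch. It is not the repository's implementation: the scoring rule (largest absolute jump between the mean of the window before and the window after a sample) and all names are assumptions.

```python
from statistics import mean, median
from typing import Dict, Optional, Sequence, Tuple


def change_point(times: Sequence[float], distances: Sequence[float], window: int) -> Optional[float]:
    """Time at which the windowed mean distance changes the most, or None if too short."""
    best_t, best_score = None, -1.0
    for i in range(window, len(distances) - window + 1):
        before = mean(distances[i - window:i])
        after = mean(distances[i:i + window])
        score = abs(after - before)  # direction-agnostic jump size
        if score > best_score:
            best_t, best_score = times[i], score
    return best_t


def estimate_opening(
    per_roi: Dict[int, Tuple[Sequence[float], Sequence[float]]], window: int
) -> Optional[float]:
    """Median of the per-ROI change points, which damps a single noisy ROI."""
    estimates = [change_point(t, d, window) for t, d in per_roi.values()]
    estimates = [e for e in estimates if e is not None]
    return median(estimates) if estimates else None
```

Taking the median across ROIs means one arena with an unusual pair of flies cannot drag the estimate far on its own.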

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 10:28:25 +01:00
8f3c4ca89c Make flies_analysis_simple robust to bad caches and empty alignment
- Cell 6: raise a clear ValueError if no loaded machine has a barrier-
  opening entry, listing what's loaded vs what's annotated. Previously
  alignment quietly produced empty DataFrames and we crashed five cells
  later with a cryptic KeyError on 'distance'.
- Cell 10: validate the cached CSVs (presence + expected columns +
  non-empty) before using them; fall through to recomputation if not.
  Skip writing the cache when results are empty so future runs don't
  pick up a 1-byte placeholder.
- Cell 3: derive a 'group' column from 'male' so downstream helpers
  that reference fly['group'] still work.
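
The Cell 10 check (presence, expected columns, non-empty) might be sketched like this with the standard library. The function name is hypothetical and the notebook presumably works on pandas DataFrames rather than raw CSV readers.

```python
import csv
from pathlib import Path
from typing import Iterable


def cache_is_usable(path, expected_columns: Iterable[str]) -> bool:
    """True only if the CSV exists, has the expected header and at least one data row."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        if not set(expected_columns).issubset(header):
            return False
        # A header-only file counts as empty and forces recomputation.
        return next(reader, None) is not None
```

A caller would recompute whenever this returns False, and skip writing the cache when the recomputed result is itself empty.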

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:59:34 +01:00
b273255dea Make load_roi_data progress bar refresh reliably in JupyterLab
Prefer tqdm.notebook (HTML widget) over tqdm.auto so JupyterLab gets a
proper updating bar even when its text-mode \r refresh doesn't render
in-place. Tick per session (2× per fly) instead of per fly so the bar
advances roughly every second, and add a postfix showing the current
machine + ROI + session — gives visible motion even on slow rows.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:43:12 +01:00
8abb3d5955 Add tqdm progress bar to load_roi_data
Loading the full batch issues 968 SQL queries and takes minutes — show
a tqdm progress bar (one tick per fly/ROI row) and print an upfront
"this takes 1-3 minutes" notice so the user knows to wait. Uses
tqdm.auto so it picks the Jupyter widget when run from a notebook and
plain text on the CLI. New `progress=True` parameter on load_roi_data;
set it to False for silent batch use. tqdm + ipywidgets added to
requirements.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:34:42 +01:00
ac3b8c13f0 Move personal TSV into repo's data/metadata/ folder
Personal copy of all_video_info_merged.tsv now lives at
~/cupido/data/metadata/all_video_info_merged.tsv (gitignored) instead
of ~/cupido_metadata.tsv. That sits next to the other small metadata
CSVs (barrier_opening, etc.) — the natural home for it. Updated all
five notebooks and processed/README accordingly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:30:22 +01:00
f08e4b843d Per-user metadata TSV — auto-prefer ~/cupido_metadata.tsv if present
The shared TSV at /mnt/data/projects/cupido/ is read-only inside the
container, so users who want to customize the `include` column (or any
metadata) need a personal copy. Notebooks now check for
~/cupido_metadata.tsv first and fall back to the shared master if it
doesn't exist. Each user keeps their own edits without stepping on
anyone else's analysis.
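
The lookup order can be sketched in a few lines. `pick_metadata_tsv` is a hypothetical helper, and the shared file name in the usage note is an assumption (this message only names the shared directory).

```python
from pathlib import Path


def pick_metadata_tsv(personal: Path, shared: Path) -> Path:
    """Prefer the user's own copy when it exists, else the read-only shared master."""
    return personal if personal.is_file() else shared
```

Typical use would be `pick_metadata_tsv(Path.home() / "cupido_metadata.tsv", Path("/mnt/data/projects/cupido") / "all_video_info_merged.tsv")`.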

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:25:24 +01:00
23050360ea Remove data/raw/ entirely — all bulky data now under /mnt/data/projects/cupido/
Deleted the 5 stale pre-pipeline tracking DBs and the data/raw/ directory.
Dropped DATA_RAW from config.py; build_video_inventory now scans
TRACKING_OUTPUT_DIR for already-tracked sessions. Notebooks no longer
import DATA_RAW. README, PLANNING and todo updated to reflect that the
repo holds only code + small curated metadata, never bulky DBs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:20:25 +01:00
9f3ee24a23 Add per-row include flag to TSV; expand flies_analysis_simple narrative
- export_video_db_index.py now writes a boolean `include` column
  (default True). Flip it to False to drop a noisy/unusable row from
  analysis without deleting it.
- load_roi_data filters on `include` automatically (back-compat:
  missing column = load everything).
- flies_analysis_simple.ipynb section headers now explain *why* each
  step exists (barrier alignment, body-area baseline, merged-blob
  heuristic, Hungarian identity tracking) rather than just naming
  the step.
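
The back-compat rule for `include` (missing column means load everything) could be sketched as follows on plain dict rows. The real loader presumably filters a pandas DataFrame; the function name and the set of strings treated as false are assumptions.

```python
from typing import Dict, List

# Spellings read as "exclude this row" when the flag arrives as text (assumed).
FALSE_STRINGS = {"false", "0", "no"}


def filter_included(rows: List[Dict]) -> List[Dict]:
    """Keep rows whose `include` flag is true; rows without the column are kept."""
    kept = []
    for row in rows:
        flag = row.get("include", True)  # missing column: load everything
        if isinstance(flag, str):
            flag = flag.strip().lower() not in FALSE_STRINGS
        if flag:
            kept.append(row)
    return kept
```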

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:09:59 +01:00
723d1f3682 Make data paths visible and explicit in flies_analysis notebooks
Define METADATA_TSV and TRACKED_DBS up front in cell 1, assert they
exist before doing anything else, and pass the loaded metadata to
load_roi_data() explicitly. Surfaces path problems immediately with a
readable message instead of failing deep inside the loader.
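
A fail-fast check of this kind might look like the sketch below. `require_paths` is a hypothetical helper; the notebooks are described as using plain assertions in cell 1.

```python
from pathlib import Path


def require_paths(**named_paths) -> None:
    """Raise one readable error that lists every configured path that is missing."""
    missing = {
        name: str(path)
        for name, path in named_paths.items()
        if not Path(path).exists()
    }
    if missing:
        details = ", ".join(f"{name}={path}" for name, path in missing.items())
        raise FileNotFoundError(f"Missing data paths: {details}")
```

In cell 1 this would be called as `require_paths(METADATA_TSV=METADATA_TSV, TRACKED_DBS=TRACKED_DBS)` before anything is loaded.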

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:01:55 +01:00
231c7a437f Remove hardcoded /home/gg paths so the project is portable
Notebooks now use Path.home() / "cupido" for the repo root (works for
any user inside the JupyterLab container), and the offline-tracking
scripts read the ethoscope source-tree location from the new
ETHOSCOPE_SRC config constant — defaulting to ~/Code/ethoscope_project/...
and overridable via the ETHOSCOPE_SRC environment variable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 08:55:44 +01:00
5934dce21e Simplify path setup in flies_analysis notebooks
Replace the cryptic Path("..").resolve() walk-up with explicit DATA_DIR
and REPO_ROOT constants, then import the rest of the path constants
(DATA_RAW, DATA_METADATA, DATA_PROCESSED, FIGURES) directly from
scripts/config.py — single source of truth, easier to read for students.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 08:50:11 +01:00
f176224150 Move metadata xlsx/TSV to /mnt/data/projects/cupido/
Consolidates everything bulky (tracking DBs, targets, metadata
spreadsheet) under a single DATA_VOLUME root outside the ownCloud-synced
repo. Notebooks now use a visible DATA_DIR = Path(...) idiom rather than
walking up the filesystem with PROJECT_ROOT.parent — easier for students
with no Python background to follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 08:47:15 +01:00
ec56e51bf9 Add beginner tutorial notebooks for incoming students
Four guided notebooks under notebooks/getting_started/ aimed at someone
new to Python and data science. The series progresses: project orientation
→ Python/pandas crash course → exploring one tracking DB → first
trained-vs-naive comparison using load_roi_data + Mann-Whitney U.

Each notebook leans heavily on markdown explanations, includes exercises
with empty cells, and links out to canonical references (JupyterLab,
official Python tutorial, pandas 10-min guide, Wikipedia for stats
concepts).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 18:14:17 +01:00
7d09523840 Move TARGETS_DIR to /mnt/data/projects/cupido/targets
Targets relocated alongside the tracking DBs (out of ownCloud sync) so
the docker mount already covers them and ownCloud no longer churns on
JSON sidecars. Updated config, fixed a stale docstring in pick_targets,
and dropped the now-moot data/targets/*.json gitignore rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 17:13:55 +01:00
f60a9d0530 Unify analysis pipeline around the TSV; move tracked DBs out of cloud sync
- Tracked DBs now live at /mnt/data/projects/cupido/tracked/ (out of
  ownCloud to avoid sync conflicts and bandwidth churn). config.py
  TRACKING_OUTPUT_DIR points there; the docker-compose for ethoscope-lab
  mounts it world-readable for JupyterHub users.
- New scripts/export_video_db_index.py joins all_video_info_merged.xlsx
  with the video inventory and the on-disk DBs, producing a TSV that has
  one row per fly/ROI plus training/testing video and DB paths. Handles
  approximate xlsx times, cross-day training/testing, the 12 AM/PM
  ambiguity, and date typos.
- scripts/load_roi_data.py rewritten as a TSV-driven loader returning a
  single DataFrame with session and metadata columns. calculate_distances
  and the two flies_analysis notebooks migrated to use it; downstream
  trained/naive splits remain available via simple equality filters.
- Metadata vocabulary canonicalized: variants such as {naïve, niave,
  untrained, test} all resolve to one of {trained, naive}. Normalization
  happens at the TSV-export boundary (idempotent); the xlsx and the
  2025-07-15 legacy CSV were edited in place to remove the worst variants.
- scripts/monitor_tracking.py rate calculation fixed: with N parallel
  workers, completions arrive in bursts; the old formula divided by burst
  width and reported nonsense rates. Now uses a 6 h window denominator.
- scripts/track_videos.py: BGRMovieCamera retries cv2.read on transient
  NFS hiccups, and a post-tracking completeness gate (≥ 90 % of expected
  duration via MAX(t) across all 6 ROIs) deletes silently truncated DBs.
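
Two of the fixes above reduce to small formulas, sketched here under assumptions: function names are hypothetical, timestamps are taken to be in seconds, and the rate is reported per hour.

```python
from typing import Iterable

WINDOW_S = 6 * 3600  # fixed 6 h denominator, per the monitor_tracking fix


def completion_rate_per_hour(
    completion_times: Iterable[float], now: float, window_s: float = WINDOW_S
) -> float:
    """Completions in the last window divided by the window length, not the burst width."""
    recent = [t for t in completion_times if now - window_s < t <= now]
    return len(recent) / (window_s / 3600)


def is_complete(last_t_s: float, expected_duration_s: float, threshold: float = 0.90) -> bool:
    """Completeness gate: tracked data must reach at least 90 % of the video duration."""
    return expected_duration_s > 0 and last_t_s >= threshold * expected_duration_s
```

With a fixed denominator, three DBs finishing within two seconds of each other count as 0.5 per hour rather than thousands per hour.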

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 15:20:14 +01:00
e4da7691d5 Add offline tracking pipeline for video backlog
The 2024 video set in all_video_info_merged.xlsx covers 63 (date, machine)
sessions — 129 video instances — that have no auto-detectable targets, so
ROI placement requires manual reference-point selection. This commit adds
the three-stage pipeline that lets a user click for an hour, then walk
away while the tracker grinds overnight:

  1. build_video_inventory.py — scan /mnt/ethoscope_data/videos/ and join
     against the xlsx, producing data/metadata/video_inventory.csv

  2. pick_targets.py — interactive matplotlib/Tk picker. User clicks
     TOP/CORNER/LEFT (the L-shape ethoscope expects); after the third
     click the 6 ROI rectangles are drawn on top of the frame so geometry
     can be verified before saving. Also supports marking a video
     'unusable' (FOV wrong) so it's permanently skipped, frame stepping
     by ±1s/±5%/midpoint, point editing in --redo mode, and a crosshair
     cursor that survives matplotlib's per-motion cursor reset.

  3. track_videos.py — headless batch tracker. Reads the JSON sidecars,
     builds 6 ROIs from the HD-mating-arena geometry, runs MultiFlyTracker
     against the merged.mp4 via MovieVirtualCamera, writes SQLite DBs to
     data/tracked/. Idempotent (skips done DBs), parallel via --jobs,
     subclasses MovieVirtualCamera so frames stay BGR (MultiFlyTracker
     calls cvtColor(BGR2GRAY) without checking channel count).

Plus auto_detect_targets.py (fallback that runs ethoscope's auto-detector
in case any videos do have visible target dots), monitor_tracking.py
(progress + ETA from data/tracked/ ground truth, --watch for live view),
and tracking_geometry.py (single source of truth for the affine math
shared by picker and tracker).

requirements-tracking.txt pins the extra deps (opencv-python, openpyxl,
gitpython, netifaces, mysql-connector-python) — these are only needed
for the tracking pipeline, not the existing analysis notebooks.

Verified end-to-end on one of the user-picked videos: ~4000 rows/ROI in
a 120s slice, fly bounding boxes in the expected 800-2000 px² band.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 17:25:26 +01:00
e7e4db264d Initial commit: organized project structure for student handoff
Reorganized a flat 41-file directory into a structured layout with:
- scripts/ for Python analysis code with shared config.py
- notebooks/ for Jupyter analysis notebooks
- data/ split into raw/, metadata/, processed/
- docs/ with analysis summary, experimental design, and bimodal hypothesis tutorial
- tasks/ with todo checklist and lessons learned
- Comprehensive README, PLANNING.md, and .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 16:08:36 +00:00