Watching the video play turns out to be much faster than scanning a
thumbnail grid. The coarse 10-min thumbnail grid still does rough
localisation; after picking, a video player launches at coarse_t-30s
paused with frame-accurate scrubbing controls. The analyst reads the
exact opening time off the player's OSD and types it into the
terminal prompt (default = the coarse pick, so a single Enter keeps
the coarse pick if the player is hard to use).
The backend auto-detects a player in order of preference (mpv, then
vlc, then ffplay) and gracefully degrades to "use the coarse pick" if
no player is installed.
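
A minimal sketch of the detection and fallback described above. Only the
preference order comes from this message; `find_player` and
`resolve_opening` are hypothetical names, and the real script may
structure this differently.

```python
import shutil

# Preference order from the commit message; everything else is illustrative.
PLAYER_PREFERENCE = ("mpv", "vlc", "ffplay")


def find_player(which=shutil.which):
    """Return the first player found on PATH, or None if none is installed."""
    for name in PLAYER_PREFERENCE:
        if which(name):
            return name
    return None


def resolve_opening(coarse_t, typed, player):
    """Return the opening time in seconds.

    Falls back to the coarse pick when no player is available or when the
    analyst just presses Enter (empty input).
    """
    if player is None or not typed.strip():
        return float(coarse_t)
    return float(typed)
```

Passing `which` as a parameter keeps the lookup testable without any
player installed.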
New `bad_rois` column captures non-opening sub-arenas (partial-opening
videos like the 2024-10-21 set, where only the lower half opens). The
prompt validates that entries are integers in 1..6.
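
The validation could look roughly like this. The input format (comma- or
space-separated) and the name `parse_bad_rois` are assumptions; only the
1..6 integer range comes from this message.

```python
def parse_bad_rois(text, n_rois=6):
    """Parse e.g. '4, 5 6' into a sorted list of unique ROI numbers.

    Raises ValueError for non-integers or values outside 1..n_rois.
    An empty string means "no bad ROIs".
    """
    rois = set()
    for token in text.replace(",", " ").split():
        try:
            roi = int(token)
        except ValueError:
            raise ValueError(f"not an integer ROI: {token!r}") from None
        if not 1 <= roi <= n_rois:
            raise ValueError(f"ROI {roi} is outside 1..{n_rois}")
        rois.add(roi)
    return sorted(rois)
```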
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Some videos have a late barrier opening (e.g. 5:46) that fell outside
the original 5-min search window. Default coarse-grid span is now
600 s (10 s spacing in the 60-thumb grid). Add --coarse-span CLI flag
to widen further if needed; auto-suggest scans the same 10-min window.
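
For illustration, the flag and the resulting thumbnail spacing could be
wired up as below. The flag name, the 600 s default and the 60-thumb
grid come from this message; the parser layout and helper names are
assumptions.

```python
import argparse

N_THUMBS = 60  # grid size stated in the commit message


def build_parser():
    """Hypothetical CLI parser exposing only the --coarse-span flag."""
    parser = argparse.ArgumentParser(description="barrier picker (sketch)")
    parser.add_argument(
        "--coarse-span",
        type=float,
        default=600.0,
        help="length of the coarse search window in seconds",
    )
    return parser


def coarse_spacing(span_s, n_thumbs=N_THUMBS):
    """Seconds between neighbouring thumbnails in the coarse grid."""
    return span_s / n_thumbs
```

With the default span this gives 10 s between thumbnails; widening to
900 s would give 15 s.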
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Old version showed inter-fly distance plots and asked the analyst to
click a timeline. The new version reads frames directly from the .mp4
and shows a 10×6 grid of timestamped thumbnails — the analyst just
clicks the frame where the barrier opens.
Two-stage refinement:
- Coarse grid: 60 thumbs spanning the 5-min search window at ~5 s
spacing. Pick the rough moment.
- Fine grid: 60 thumbs at 0.2 s spacing centred on the coarse pick.
Pick the exact frame.
The auto-detector still feeds the starting position. Video decode is
sequential (one cv2 pass through the relevant range) instead of
seek-per-frame, so each grid loads in a few seconds.
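
The two grids reduce to simple timestamp arithmetic. The sketch below
covers only the timestamps, not the cv2 decoding; the function names and
the clamping at 0 s are assumptions.

```python
def grid_times(start_s, spacing_s, n=60):
    """Timestamps (seconds) for n thumbnails starting at start_s."""
    return [round(start_s + i * spacing_s, 3) for i in range(n)]


def coarse_grid(window_start_s=0.0, span_s=300.0, n=60):
    """Coarse stage: n thumbnails spread evenly over the search window."""
    return grid_times(window_start_s, span_s / n, n)


def fine_grid(coarse_pick_s, spacing_s=0.2, n=60):
    """Fine stage: n thumbnails centred on the coarse pick.

    The start is clamped at 0 s so a pick near the beginning of the
    video does not produce negative timestamps (an assumption).
    """
    start = max(0.0, coarse_pick_s - spacing_s * n / 2)
    return grid_times(start, spacing_s, n)
```

A coarse pick at 100 s yields a fine grid from 94.0 s to 105.8 s, with
the pick itself at index 30.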
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pick_barrier.py loops over every tracked DB referenced by the merged
TSV, plots windowed mean inter-fly distance for all 6 ROIs in a single
figure, and lets the analyst click the moment the barrier opens. Saves
to data/metadata/barrier_opening.csv after each pick (crash-safe).
The auto-detector's best-effort guess is shown as an orange dotted line;
the analyst always has the final say.
Output schema:
machine_name, session_date, session_time, opening_s, trim_first_s, notes
`trim_first_s` lets us record misframed starts so downstream code can
ignore the affected window. The five 2025-07-15 entries are seeded from
the original legacy CSV so they're not re-picked.
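
One way to keep the save-after-each-pick step robust is to write to a
temporary file and rename it over the CSV. This is an illustrative
technique, not necessarily what pick_barrier.py does; only the column
names are taken from the schema above.

```python
import csv
import os
import tempfile

FIELDS = ["machine_name", "session_date", "session_time",
          "opening_s", "trim_first_s", "notes"]


def save_picks(rows, path):
    """Rewrite the whole CSV via a temp file plus os.replace.

    A crash mid-write leaves the previous file intact, because the
    rename only happens after the new file is fully written.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(rows)
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```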
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Both videos have ~60-69 s of misframed data at the start (arena partially
out of frame). The original times (25 s, 20 s) were measured on
ffmpeg-trimmed copies that no longer exist; on the full untrimmed videos
the actual barriers open at 94 s and 89 s respectively. Confirmed by eye.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
build_video_inventory.py now opens each mp4 with cv2 to record
duration_s. Cached: a video already in the previous inventory keeps
its computed duration, so re-runs only pay the cv2 cost for new
recordings.
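
The caching idea, sketched with a hypothetical `probe` callable standing
in for the cv2-based measurement (the real inventory is a CSV, not a
dict):

```python
def update_durations(video_paths, previous, probe):
    """Return {path: duration_s}, probing only videos not seen before.

    previous: {path: duration_s} from the last inventory run.
    probe:    callable that opens a video and returns its duration.
    """
    durations = {}
    for path in video_paths:
        cached = previous.get(path)
        durations[path] = cached if cached is not None else probe(path)
    return durations
```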
export_video_db_index.py looks up the matched video's duration and
writes it as training_video_duration_s / testing_video_duration_s
alongside the existing path columns. Useful for spotting unusually
short or long sessions and for sanity checks on the tracker output.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- merge_2025_07_15_into_xlsx.py: pivot the legacy 2025_07_15_metadata_fixed.csv
into the unified xlsx schema (one row per fly, training_date_time +
testing_date_time). Backs up the xlsx before writing. 24 new rows
across machines 076 / 139 / 145 / 268.
- pick_targets.py: --video flag to bypass the inventory's in_xlsx filter,
so a specific mp4 can be picked outside the normal flow.
- explore_barrier_signal.py: visualises raw y(t), per-frame inter-fly
distance, and sliding min/mean distance against a known
barrier-opening time. Used for prototyping the detector.
- detect_barrier_opening.py: per-ROI sliding-window mean-distance
change-point estimator (median across ROIs). Currently noisy on a
one-video calibration set; will be re-tuned once the 4 missing
2025-07-15 videos are re-tracked.
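
A rough sketch of a sliding-window change-point estimator of the kind
described for detect_barrier_opening.py. The scoring rule (largest
absolute jump in windowed mean), the uniform sampling interval `dt` and
all names are assumptions; the actual detector may differ.

```python
from statistics import mean, median


def roi_change_point(dist, window):
    """Index where the mean of the next `window` samples differs most
    from the mean of the previous `window` samples, or None if the
    series is too short."""
    best_i, best_score = None, -1.0
    for i in range(window, len(dist) - window + 1):
        score = abs(mean(dist[i:i + window]) - mean(dist[i - window:i]))
        if score > best_score:
            best_i, best_score = i, score
    return best_i


def estimate_opening(roi_series, window, dt):
    """Median change-point across ROIs, converted to seconds."""
    points = [roi_change_point(d, window) for d in roi_series]
    points = [p for p in points if p is not None]
    if not points:
        return None
    return median(points) * dt
```

Taking the median across ROIs keeps a single noisy arena from dragging
the estimate.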
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Cell 6: raise a clear ValueError if no loaded machine has a barrier-
opening entry, listing what's loaded vs what's annotated. Previously
alignment quietly produced empty DataFrames and we crashed five cells
later with a cryptic KeyError on 'distance'.
- Cell 10: validate the cached CSVs (presence + expected columns +
non-empty) before using them; fall through to recomputation if not.
Skip writing the cache when results are empty so future runs don't
pick up a 1-byte placeholder.
- Cell 3: derive a 'group' column from 'male' so downstream helpers
that reference fly['group'] still work.
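
The Cell 10 cache check could be sketched as follows, using the stdlib
csv module instead of pandas to stay self-contained. Function names and
the row-of-dicts representation are assumptions.

```python
import csv
import os


def cache_is_valid(path, expected_columns):
    """True only if the file exists, has the expected columns and
    contains at least one data row."""
    if not os.path.isfile(path):
        return False
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []
        if not set(expected_columns).issubset(header):
            return False
        return next(reader, None) is not None


def load_or_recompute(path, expected_columns, recompute):
    """Use the cache when valid; otherwise recompute and re-cache."""
    if cache_is_valid(path, expected_columns):
        with open(path, newline="") as fh:
            return list(csv.DictReader(fh))
    rows = recompute()
    if rows:  # never write an empty placeholder cache
        with open(path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
            writer.writeheader()
            writer.writerows(rows)
    return rows
```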
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Prefer tqdm.notebook (HTML widget) over tqdm.auto so JupyterLab gets a
proper updating bar even when its text-mode \r refresh doesn't render
in-place. Tick per session (2× per fly) instead of per fly so the bar
advances roughly every second, and add a postfix showing the current
machine + ROI + session — gives visible motion even on slow rows.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Loading the full batch issues 968 SQL queries and takes minutes — show
a tqdm progress bar (one tick per fly/ROI row) and print an upfront
"this takes 1-3 minutes" notice so the user knows to wait. Uses
tqdm.auto so it picks the Jupyter widget when run from a notebook and
plain text on the CLI. New `progress=True` parameter on load_roi_data;
set it to False for silent batch use. tqdm and ipywidgets added to
requirements.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Personal copy of all_video_info_merged.tsv now lives at
~/cupido/data/metadata/all_video_info_merged.tsv (gitignored) instead
of ~/cupido_metadata.tsv. That sits next to the other small metadata
CSVs (barrier_opening, etc.) — the natural home for it. Updated all
five notebooks and processed/README accordingly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The shared TSV at /mnt/data/projects/cupido/ is read-only inside the
container, so users who want to customize the `include` column (or any
metadata) need a personal copy. Notebooks now check for
~/cupido_metadata.tsv first and fall back to the shared master if it
doesn't exist. Each user keeps their own edits without stepping on
anyone else's analysis.
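
The lookup order amounts to a small helper like the one below. The
function name is hypothetical and the paths are passed in rather than
hard-coded.

```python
from pathlib import Path


def resolve_metadata_tsv(personal, shared):
    """Prefer the user's personal copy, else the shared read-only master."""
    personal = Path(personal).expanduser()
    shared = Path(shared)
    if personal.is_file():
        return personal
    if shared.is_file():
        return shared
    raise FileNotFoundError(
        f"no metadata TSV found at {personal} or {shared}"
    )
```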
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Deleted the 5 stale pre-pipeline tracking DBs and the data/raw/ directory.
Dropped DATA_RAW from config.py; build_video_inventory now scans
TRACKING_OUTPUT_DIR for already-tracked sessions. Notebooks no longer
import DATA_RAW. README, PLANNING and todo updated to reflect that the
repo holds only code + small curated metadata, never bulky DBs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- export_video_db_index.py now writes a boolean `include` column
(default True). Flip it to False to drop a noisy/unusable row from
analysis without deleting it.
- load_roi_data filters on `include` automatically (back-compat:
missing column = load everything).
- flies_analysis_simple.ipynb section headers now explain *why* each
step exists (barrier alignment, body-area baseline, merged-blob
heuristic, Hungarian identity tracking) rather than just naming
the step.
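
The back-compat rule for `include` (missing column means load
everything) can be illustrated on plain row dicts. The real loader works
on a DataFrame, and the set of strings treated as "false" here is an
assumption.

```python
def filter_included(rows):
    """Keep rows whose `include` flag is truthy or absent."""
    kept = []
    for row in rows:
        flag = row.get("include", True)  # missing column: keep the row
        if isinstance(flag, str):
            # TSV values arrive as text; assumed spellings of "false"
            flag = flag.strip().lower() not in {"false", "0", "no"}
        if flag:
            kept.append(row)
    return kept
```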
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Define METADATA_TSV and TRACKED_DBS up front in cell 1, assert they
exist before doing anything else, and pass the loaded metadata to
load_roi_data() explicitly. Surfaces path problems immediately with a
readable message instead of failing deep inside the loader.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Notebooks now use Path.home() / "cupido" for the repo root (works for
any user inside the JupyterLab container), and the offline-tracking
scripts read the ethoscope source-tree location from the new
ETHOSCOPE_SRC config constant — defaulting to ~/Code/ethoscope_project/...
and overridable via the ETHOSCOPE_SRC environment variable.
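
The override pattern, as a sketch. The default location is passed in as
a parameter because the full default path is not spelled out here, and
`ethoscope_src` is a hypothetical helper name.

```python
import os
from pathlib import Path


def ethoscope_src(default, environ=os.environ):
    """Return the ethoscope source tree location.

    The ETHOSCOPE_SRC environment variable wins when set and non-empty;
    otherwise the supplied default is used. `~` is expanded either way.
    """
    return Path(environ.get("ETHOSCOPE_SRC") or default).expanduser()
```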
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the cryptic Path("..").resolve() walk-up with explicit DATA_DIR
and REPO_ROOT constants, then import the rest of the path constants
(DATA_RAW, DATA_METADATA, DATA_PROCESSED, FIGURES) directly from
scripts/config.py — single source of truth, easier to read for students.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Consolidates everything bulky (tracking DBs, targets, metadata
spreadsheet) under a single DATA_VOLUME root outside the ownCloud-synced
repo. Notebooks now use a visible DATA_DIR = Path(...) idiom rather than
walking up the filesystem with PROJECT_ROOT.parent — easier for students
with no Python background to follow.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Four guided notebooks under notebooks/getting_started/ aimed at someone
new to Python and data science. The series progresses: project orientation
→ Python/pandas crash course → exploring one tracking DB → first
trained-vs-naive comparison using load_roi_data + Mann-Whitney U.
Each notebook leans heavily on markdown explanations, includes exercises
with empty cells, and links out to canonical references (JupyterLab,
official Python tutorial, pandas 10-min guide, Wikipedia for stats
concepts).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Targets relocated alongside the tracking DBs (out of ownCloud sync) so
the docker mount already covers them and ownCloud no longer churns on
JSON sidecars. Updated config, fixed a stale docstring in pick_targets,
and dropped the now-moot data/targets/*.json gitignore rule.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Tracked DBs now live at /mnt/data/projects/cupido/tracked/ (out of
ownCloud to avoid sync conflicts and bandwidth churn). config.py
TRACKING_OUTPUT_DIR points there; the docker-compose for ethoscope-lab
mounts it world-readable for JupyterHub users.
- New scripts/export_video_db_index.py joins all_video_info_merged.xlsx
with the video inventory and the on-disk DBs, producing a TSV that has
one row per fly/ROI plus training/testing video and DB paths. Handles
approximate xlsx times, cross-day training/testing, the 12 AM/PM
ambiguity, and date typos.
- scripts/load_roi_data.py rewritten as a TSV-driven loader returning a
single DataFrame with session and metadata columns. calculate_distances
and the two flies_analysis notebooks migrated to use it; downstream
trained/naive splits remain available via simple equality filters.
- Metadata vocabulary canonicalized: {naïve, niave, untrained, test} all
resolve to {trained, naive}. Normalization happens at the TSV-export
boundary (idempotent); the xlsx and the 2025-07-15 legacy CSV were
edited in place to remove the worst variants.
- scripts/monitor_tracking.py rate calculation fixed: with N parallel
workers, completions arrive in bursts; the old formula divided by burst
width and reported nonsense rates. Now uses a 6 h window denominator.
- scripts/track_videos.py: BGRMovieCamera retries cv2.read on transient
NFS hiccups and a post-tracking completeness gate (≥ 90 % of expected
duration via MAX(t) across all 6 ROIs) deletes silent partial DBs.
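
The monitor_tracking.py rate fix can be pictured like this: count
completions inside a fixed window and divide by the window length, not
by the span between the first and last completion. Timestamps in
seconds and the function name are assumptions.

```python
def completion_rate_per_hour(completion_times_s, now_s, window_s=6 * 3600):
    """Completions per hour over a fixed trailing window.

    Dividing by the fixed window length stays stable even when N
    parallel workers finish within seconds of each other.
    """
    recent = [t for t in completion_times_s
              if now_s - window_s < t <= now_s]
    return len(recent) / (window_s / 3600)
```

Four completions arriving within 10 s of each other give about 0.67 per
hour here, whereas dividing by the 9 s burst width would report 1600
per hour.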
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 2024 video set in all_video_info_merged.xlsx covers 63 (date, machine)
sessions — 129 video instances — that have no auto-detectable targets, so
ROI placement requires manual reference-point selection. This commit adds
the three-stage pipeline that lets a user click for an hour, then walk
away while the tracker grinds overnight:
1. build_video_inventory.py — scan /mnt/ethoscope_data/videos/ and join
against the xlsx, producing data/metadata/video_inventory.csv
2. pick_targets.py — interactive matplotlib/Tk picker. User clicks
TOP/CORNER/LEFT (the L-shape ethoscope expects); after the third
click the 6 ROI rectangles are drawn on top of the frame so geometry
can be verified before saving. Also supports marking a video
'unusable' (FOV wrong) so it's permanently skipped, frame stepping
by ±1s/±5%/midpoint, point editing in --redo mode, and a crosshair
cursor that survives matplotlib's per-motion cursor reset.
3. track_videos.py — headless batch tracker. Reads the JSON sidecars,
builds 6 ROIs from the HD-mating-arena geometry, runs MultiFlyTracker
against the merged.mp4 via MovieVirtualCamera, writes SQLite DBs to
data/tracked/. Idempotent (skips done DBs), parallel via --jobs,
subclasses MovieVirtualCamera so frames stay BGR (MultiFlyTracker
calls cvtColor(BGR2GRAY) without checking channel count).
Plus auto_detect_targets.py (fallback that runs ethoscope's auto-detector
in case any videos do have visible target dots), monitor_tracking.py
(progress + ETA from data/tracked/ ground truth, --watch for live view),
and tracking_geometry.py (single source of truth for the affine math
shared by picker and tracker).
requirements-tracking.txt pins the extra deps (opencv-python, openpyxl,
gitpython, netifaces, mysql-connector-python) — these are only needed
for the tracking pipeline, not the existing analysis notebooks.
Verified end-to-end on one of the user-picked videos: ~4000 rows/ROI in
a 120s slice, fly bounding boxes in the expected 800-2000 px² band.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Reorganized the flat 41-file directory into a structured layout with:
- scripts/ for Python analysis code with shared config.py
- notebooks/ for Jupyter analysis notebooks
- data/ split into raw/, metadata/, processed/
- docs/ with analysis summary, experimental design, and bimodal hypothesis tutorial
- tasks/ with todo checklist and lessons learned
- Comprehensive README, PLANNING.md, and .gitignore
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>