Commit graph

30 commits

Author SHA1 Message Date
2623df4172 Picker: identify the analyst (initials) per pick
Each annotation row now carries an `analyst` column. On first visit the
web picker shows a small login modal asking for initials, persists them
in localStorage, and shows the badge in the top-right. Click the badge
to change identities. Submissions without initials are rejected by the
backend (HTTP 400). Skip remains analyst-free.

Backfill: every existing barrier_opening.csv row marked as `GG` since
all current picks were done by Giorgio.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 14:23:57 +01:00
12568b82cc Welcome modal + port 8085
Add a dismissable welcome modal that walks first-time users through the
proper annotation sequence (slider to end → check open ROIs → slider to
start → arrow-key fine-tune → click). Stays hidden after the first
"Got it" via localStorage; the ? button in the header reopens it any
time. Picker keyboard shortcuts are inert while the modal is showing.

The container now publishes host port 8085 instead of 8000. Port 8000
was free, but Giorgio's preferred 8082 is already in use on this host,
and 8085 is the closest free port. The internal port stays 8000, so the
FastAPI app is unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 14:15:42 +01:00
3f0760c98e Picker: simpler keyboard shortcuts (±5 s / ±30 s)
Dropped Ctrl+arrow (±0.1 s) and ,/. frame stepping — too fine for
spotting the barrier opening visually. Shift+arrow now jumps ±30 s
instead of ±1 s, which matches how analysts actually navigate (5 s
for fine, 30 s for skipping ahead). Drag the seekbar if you need
sub-second precision.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 14:11:36 +01:00
53b45e373b Dedupe + canonicalise the merged xlsx, then guard the export
108 of 508 rows in all_video_info_merged.xlsx were duplicates left over
from merging multiple source spreadsheets — same (date, machine, ROI)
appearing under two source_date values, identical data otherwise. The
`male` column was also using a mix of variants ('naïve', 'niave',
'naive', 'trained') with the canonical 'naive' a minority of 12/200.

scripts/cleanup_xlsx.py
    Idempotent one-off: backs up the xlsx, dedupes preferring the row
    whose source_date matches the experiment date, normalises `male`
    spellings, strips whitespace from string columns. Re-running on a
    clean file is a no-op.

scripts/export_video_db_index.py
    New _validate_xlsx() runs first thing in main() and aborts the
    export with an actionable error if duplicates or non-canonical
    male values are present. Prevents silent regressions when the
    xlsx is edited or re-merged in the future.

Result: TSV is now 400 rows (was 508), exactly 200 trained / 200
naive, no duplicates.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 13:39:57 +01:00
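
Note on 53b45e373b: the dedupe rule and the `male` normalisation described above can be sketched in plain Python. This is only an illustration, not the contents of scripts/cleanup_xlsx.py: the column names `date`, `machine` and `roi` are assumed (the message only names `source_date` and `male`), and the real script works on the xlsx rather than on a list of dicts.

```python
import unicodedata


def normalise_male(value):
    """Map spelling variants such as 'naïve' or 'niave' to 'naive'."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop combining marks so that 'naïve' becomes 'naive'.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return {"niave": "naive"}.get(text, text)


def dedupe(rows):
    """Keep one row per (date, machine, roi).

    When two rows share a key, prefer the one whose source_date equals
    the experiment date. Column names here are assumptions.
    """
    best = {}
    for row in rows:
        key = (row["date"], row["machine"], row["roi"])
        kept = best.get(key)
        row_matches = row["source_date"] == row["date"]
        if kept is None or (row_matches and kept["source_date"] != kept["date"]):
            best[key] = row
    return list(best.values())


def clean(rows):
    """Dedupe, then normalise the `male` column on copies of the rows."""
    return [{**row, "male": normalise_male(row["male"])} for row in dedupe(rows)]
```

Running `clean` on its own output returns the same rows, which is the idempotence property the commit message asks for.
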
4ed988a617 Show experimental metadata above the video in the picker
Each video row now carries a `metadata` dict aggregated from the
merged TSV: species, memory (STM/LTM), training_length_hr,
consolidation_length_hr, age, training/testing date-time, and
trained/naive fly counts. The UI renders these as a row of key:value
pills above the video, with the session role (training/testing)
colour-coded so the analyst can see at a glance what they're picking.

The merged TSV currently has duplicate rows per (date, machine, ROI);
the aggregator de-dups on those keys so counts aren't doubled. (The
duplication itself should be cleaned up upstream.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 12:54:40 +01:00
1a7542def2 Add barrier_picker_app — Dockerised web picker for barrier opening
A FastAPI app + plain HTML5 video page that replaces the matplotlib
picker. Browse to http://host:8000/, scrub through each video with
arrow keys (±5 s, ±1 s with Shift, ±0.1 s with Ctrl, ±1 frame with
,/.), and click one of three buttons:
  - All barriers open      — every ROI usable
  - Upper barrier opens    — ROIs 1,3,5 usable; lower row marked bad
  - Lower barrier opens    — ROIs 2,4,6 usable; upper row marked bad

The current playhead time is recorded as opening_s; bad_rois is set
accordingly. Keyboard shortcuts are also available (1/2/3 for the three
modes, s/u for skip/unusable). Refresh-safe: every submission persists
to data/metadata/barrier_opening.csv before advancing.

Server uses byte-range streaming so seeking inside long videos is
fast. Dockerfile + docker-compose.yml mount the data volume RO and
the metadata folder RW.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 12:33:28 +01:00
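
Note on 1a7542def2: byte-range streaming depends on interpreting the browser's `Range` request header. The helper below is a minimal sketch of that parsing step for a single range. The function name is hypothetical and the app's actual server code is not shown in this log.

```python
def parse_byte_range(header, file_size):
    """Return (start, end), inclusive, for a single 'bytes=' range.

    Returns None when the header is missing, malformed, multi-range,
    or cannot be satisfied for a file of `file_size` bytes.
    """
    if not header or not header.startswith("bytes="):
        return None
    spec = header[len("bytes="):]
    if "," in spec:
        return None  # multiple ranges are not handled in this sketch
    start_s, sep, end_s = spec.partition("-")
    if not sep:
        return None
    try:
        if start_s == "":
            # Suffix form "bytes=-N": the last N bytes of the file.
            length = int(end_s)
            if length <= 0:
                return None
            start = max(file_size - length, 0)
            end = file_size - 1
        else:
            start = int(start_s)
            end = int(end_s) if end_s else file_size - 1
    except ValueError:
        return None
    end = min(end, file_size - 1)
    if start < 0 or start > end:
        return None
    return start, end
```

A server would typically answer a valid range with status 206 and a `Content-Range: bytes start-end/size` header, and an unsatisfiable one with status 416.
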
24403e0474 Force interactive matplotlib backend in pick_barrier
Some environments default matplotlib to Agg (non-interactive), which
silently no-ops plt.show() — the picker would print "FigureCanvasAgg
is non-interactive" and never display the thumbnail grid. Probe TkAgg
> QtAgg > Qt5Agg > GTK3Agg before pyplot import.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 12:23:15 +01:00
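
Note on 24403e0474: one way to probe backends before importing pyplot is to check whether the GUI toolkit behind each backend can be found. The sketch below takes that approach. The mapping from backend to toolkit modules and the function names are assumptions, not the code in pick_barrier, and a toolkit being importable does not guarantee that a display is available.

```python
import importlib.util

# Backends in the order given in the commit message, each with the
# toolkit modules assumed to be able to serve it.
BACKEND_MODULES = (
    ("TkAgg", ("tkinter",)),
    ("QtAgg", ("PyQt6", "PySide6", "PyQt5", "PySide2")),
    ("Qt5Agg", ("PyQt5", "PySide2")),
    ("GTK3Agg", ("gi",)),
)


def _module_available(name):
    """True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


def choose_backend(available=_module_available):
    """Return the first backend whose toolkit seems installed, else None."""
    for backend, modules in BACKEND_MODULES:
        if any(available(module) for module in modules):
            return backend
    return None


# Intended use, before any `import matplotlib.pyplot`:
#     backend = choose_backend()
#     if backend is not None:
#         import matplotlib
#         matplotlib.use(backend)
```
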
2b75daa783 Replace fine thumbnail grid with mpv/vlc/ffplay handoff
Watching the video play turns out to be much faster than scanning a
thumbnail grid. The coarse 10-min thumbnail grid still does rough
localisation; after picking, a video player launches at coarse_t-30s
paused with frame-accurate scrubbing controls. The analyst reads the
exact opening time off the player's OSD and types it into the
terminal prompt (default = the coarse pick, so a single Enter keeps
the coarse pick if the player is hard to use).

Backend auto-detects mpv > vlc > ffplay; gracefully degrades to "use
the coarse pick" if no player is installed.

New `bad_rois` column captures non-opening sub-arenas (partial-opening
videos like the 2024-10-21 set where only the lower half opens). The
prompt validates entries are integers in 1..6.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 12:20:09 +01:00
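
Note on 2b75daa783: the mpv > vlc > ffplay preference and the "single Enter keeps the coarse pick" behaviour could look roughly like this. Function names are hypothetical, and launching the player is left out because the exact command lines are not given in the message.

```python
import shutil

# Preference order taken from the commit message.
PLAYER_PREFERENCE = ("mpv", "vlc", "ffplay")


def detect_player(which=shutil.which):
    """Return (name, path) of the first installed player, or None."""
    for name in PLAYER_PREFERENCE:
        path = which(name)
        if path:
            return name, path
    return None


def resolve_opening(coarse_t, typed):
    """Use the typed time if given; a blank entry keeps the coarse pick."""
    typed = typed.strip()
    return float(typed) if typed else float(coarse_t)
```

When `detect_player()` returns None, the caller would skip the player handoff and keep the coarse pick, matching the graceful degradation described above.
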
125f187187 Extend pick_barrier coarse window to 10 min by default
Some videos have late barrier opening (e.g. 5:46) that fell outside
the original 5-min search window. Default coarse-grid span is now
600 s (10 s spacing in the 60-thumb grid). Add --coarse-span CLI flag
to widen further if needed; auto-suggest scans the same 10-min window.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 12:08:23 +01:00
e8c7f23d4d Replace pick_barrier.py with thumbnail-grid UX
Old version showed inter-fly distance plots and asked the analyst to
click a timeline. The new version reads frames directly from the .mp4
and shows a 10×6 grid of timestamped thumbnails — the analyst just
clicks the frame where the barrier opens.

Two-stage refinement:
  - Coarse grid: 60 thumbs spanning the 5-min search window at ~5 s
    spacing. Pick the rough moment.
  - Fine grid: 60 thumbs at 0.2 s spacing centred on the coarse pick.
    Pick the exact frame.

Auto-detector still feeds the starting position. Sequential video
decode (one cv2 pass through the relevant range) instead of seek-per-
frame, so each grid loads in a few seconds.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 12:01:34 +01:00
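
Note on e8c7f23d4d: the two grids are plain arithmetic on timestamps. A small sketch follows, assuming the coarse grid starts at a caller-supplied `start_s` (the message does not say where the search window begins) and that the fine grid places half of its thumbnails before the coarse pick.

```python
def coarse_grid(start_s, span_s=300.0, n_thumbs=60):
    """Timestamps for the coarse grid: n_thumbs evenly spaced over span_s."""
    step = span_s / n_thumbs
    return [start_s + i * step for i in range(n_thumbs)]


def fine_grid(centre_s, step_s=0.2, n_thumbs=60):
    """Timestamps for the fine grid, centred on the coarse pick.

    Times are not clamped, so a pick very close to the start of the
    video would yield negative timestamps in this sketch.
    """
    first = centre_s - step_s * (n_thumbs // 2)
    return [first + i * step_s for i in range(n_thumbs)]
```

With the defaults, the coarse spacing is 300 / 60 = 5 s. Passing `span_s=600.0` gives the 10 s spacing adopted in 125f187187.
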
b46c4ac1ba Add pick_barrier.py interactive annotator + seed CSV with 2025-07-15
pick_barrier.py loops over every tracked DB referenced by the merged
TSV, plots windowed mean inter-fly distance for all 6 ROIs in a single
figure, and lets the analyst click the moment the barrier opens. Saves
to data/metadata/barrier_opening.csv after each pick (crash-safe).
The auto-detector's best-effort guess is shown as an orange dotted
line; the analyst always has the final say.

Output schema:
    machine_name, session_date, session_time, opening_s, trim_first_s, notes

`trim_first_s` lets us record misframed starts so downstream code can
ignore the affected window. The five 2025-07-15 entries are seeded from
the original legacy CSV so they're not re-picked.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 11:58:54 +01:00
e20530219b Correct barrier-opening times for 16-31-34 and 16-31-41
Both videos have ~60-69 s of misframed data at the start (arena partially
out of frame). The original times (25 s, 20 s) were measured on
ffmpeg-trimmed copies that no longer exist; on the full untrimmed videos
the actual barriers open at 94 s and 89 s respectively. Confirmed by eye.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 11:53:58 +01:00
2e80b834ca Add video duration_s to inventory and propagate to merged TSV
build_video_inventory.py now opens each mp4 with cv2 to record
duration_s. Cached: a video already in the previous inventory keeps
its computed duration, so re-runs only pay the cv2 cost for new
recordings.

export_video_db_index.py looks up the matched video's duration and
writes it as training_video_duration_s / testing_video_duration_s
alongside the existing path columns. Useful for spotting unusually
short or long sessions and for sanity checks on the tracker output.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 11:13:05 +01:00
847d2cbd1b Merge 2025-07-15 batch into the xlsx; tools to detect & re-track
- merge_2025_07_15_into_xlsx.py: pivot the legacy 2025_07_15_metadata_fixed.csv
  into the unified xlsx schema (one row per fly, training_date_time +
  testing_date_time). Backs up the xlsx before writing. 24 new rows
  across machines 076 / 139 / 145 / 268.
- pick_targets.py: --video flag to bypass the inventory's in_xlsx filter,
  so a specific mp4 can be picked outside the normal flow.
- explore_barrier_signal.py: visualises raw y(t), per-frame inter-fly
  distance, and sliding min/mean distance against a known
  barrier-opening time. Used for prototyping the detector.
- detect_barrier_opening.py: per-ROI sliding-window mean-distance
  change-point estimator (median across ROIs). Currently noisy on a
  one-video calibration set; will be re-tuned once the 4 missing
  2025-07-15 videos are re-tracked.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 10:28:25 +01:00
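
Note on 847d2cbd1b: a per-ROI sliding-window change-point estimate with a median across ROIs can be sketched as below. This is an illustration of the general idea only. The actual detect_barrier_opening.py is not shown here, and the choice of "largest absolute jump between adjacent window means" is an assumption.

```python
from statistics import mean, median


def change_point(times, distances, window):
    """Time at which the mean distance changes most between two windows.

    Compares the mean of the `window` samples before each index with the
    mean of the `window` samples from that index on. Returns None if the
    signal never changes.
    """
    best_t, best_jump = None, 0.0
    for i in range(window, len(distances) - window + 1):
        before = mean(distances[i - window:i])
        after = mean(distances[i:i + window])
        jump = abs(after - before)
        if jump > best_jump:
            best_t, best_jump = times[i], jump
    return best_t


def estimate_opening(per_roi, window):
    """Median of the per-ROI change points; per_roi is [(times, distances), ...]."""
    estimates = [
        t for t in (change_point(ts, ds, window) for ts, ds in per_roi)
        if t is not None
    ]
    return median(estimates) if estimates else None
```

Taking the median makes the combined estimate tolerant of one or two ROIs whose signal is dominated by noise.
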
8f3c4ca89c Make flies_analysis_simple robust to bad caches and empty alignment
- Cell 6: raise a clear ValueError if no loaded machine has a barrier-
  opening entry, listing what's loaded vs what's annotated. Previously
  alignment quietly produced empty DataFrames and we crashed five cells
  later with a cryptic KeyError on 'distance'.
- Cell 10: validate the cached CSVs (presence + expected columns +
  non-empty) before using them; fall through to recomputation if not.
  Skip writing the cache when results are empty so future runs don't
  pick up a 1-byte placeholder.
- Cell 3: derive a 'group' column from 'male' so downstream helpers
  that reference fly['group'] still work.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:59:34 +01:00
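
Note on 8f3c4ca89c: the cache check for Cell 10 (presence, expected columns, non-empty) might be written along these lines. The function name and the column names in the usage comment are hypothetical.

```python
import csv
from pathlib import Path


def cache_is_usable(path, expected_columns):
    """True if the CSV exists, has every expected column, and has data rows."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open(newline="") as handle:
        reader = csv.DictReader(handle)
        header = reader.fieldnames or []
        if not set(expected_columns).issubset(header):
            return False
        # At least one data row must follow the header.
        return next(reader, None) is not None


# Example (hypothetical file and column names):
#     if not cache_is_usable("distances.csv", ["machine", "distance"]):
#         ...recompute instead of loading the cache...
```
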
b273255dea Make load_roi_data progress bar refresh reliably in JupyterLab
Prefer tqdm.notebook (HTML widget) over tqdm.auto so JupyterLab gets a
proper updating bar even when its text-mode \r refresh doesn't render
in-place. Tick per session (2× per fly) instead of per fly so the bar
advances roughly every second, and add a postfix showing the current
machine + ROI + session — gives visible motion even on slow rows.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:43:12 +01:00
8abb3d5955 Add tqdm progress bar to load_roi_data
Loading the full batch issues 968 SQL queries and takes minutes — show
a tqdm progress bar (one tick per fly/ROI row) and print an upfront
"this takes 1-3 minutes" notice so the user knows to wait. Uses
tqdm.auto so it picks the Jupyter widget when run from a notebook and
plain text on the CLI. New `progress=True` parameter on load_roi_data,
flip to False for silent batch use. tqdm + ipywidgets added to
requirements.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:34:42 +01:00
ac3b8c13f0 Move personal TSV into repo's data/metadata/ folder
Personal copy of all_video_info_merged.tsv now lives at
~/cupido/data/metadata/all_video_info_merged.tsv (gitignored) instead
of ~/cupido_metadata.tsv. That sits next to the other small metadata
CSVs (barrier_opening, etc.) — the natural home for it. Updated all
five notebooks and processed/README accordingly.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:30:22 +01:00
f08e4b843d Per-user metadata TSV — auto-prefer ~/cupido_metadata.tsv if present
The shared TSV at /mnt/data/projects/cupido/ is read-only inside the
container, so users who want to customize the `include` column (or any
metadata) need a personal copy. Notebooks now check for
~/cupido_metadata.tsv first and fall back to the shared master if it
doesn't exist. Each user keeps their own edits without stepping on
anyone else's analysis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:25:24 +01:00
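
Note on f08e4b843d: the "personal copy first, shared master otherwise" lookup is a short helper. In this sketch the shared file name under /mnt/data/projects/cupido/ is an assumption, and the personal path is the one from this commit (it moves in ac3b8c13f0).

```python
from pathlib import Path

# Assumed full path of the shared master; only the directory is given
# in the commit message.
SHARED_TSV = Path("/mnt/data/projects/cupido/all_video_info_merged.tsv")


def resolve_metadata_tsv(personal=None, shared=SHARED_TSV):
    """Return the personal TSV if it exists, otherwise the shared master."""
    if personal is None:
        personal = Path.home() / "cupido_metadata.tsv"
    personal = Path(personal)
    return personal if personal.is_file() else Path(shared)
```
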
23050360ea Remove data/raw/ entirely — all bulky data now under /mnt/data/projects/cupido/
Deleted the 5 stale pre-pipeline tracking DBs and the data/raw/ directory.
Dropped DATA_RAW from config.py; build_video_inventory now scans
TRACKING_OUTPUT_DIR for already-tracked sessions. Notebooks no longer
import DATA_RAW. README, PLANNING and todo updated to reflect that the
repo holds only code + small curated metadata, never bulky DBs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:20:25 +01:00
9f3ee24a23 Add per-row include flag to TSV; expand flies_analysis_simple narrative
- export_video_db_index.py now writes a boolean `include` column
  (default True). Flip it to False to drop a noisy/unusable row from
  analysis without deleting it.
- load_roi_data filters on `include` automatically (back-compat:
  missing column = load everything).
- flies_analysis_simple.ipynb section headers now explain *why* each
  step exists (barrier alignment, body-area baseline, merged-blob
  heuristic, Hungarian identity tracking) rather than just naming
  the step.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:09:59 +01:00
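
Note on 9f3ee24a23: the `include` filter with its back-compat rule (missing column means load everything) can be sketched on rows read from the TSV as dicts. The set of strings treated as false is an assumption, and the real load_roi_data may filter a DataFrame instead.

```python
def filter_included(rows):
    """Drop rows flagged include=False; rows without the flag are kept."""
    kept = []
    for row in rows:
        flag = row.get("include", True)  # missing column: keep the row
        if isinstance(flag, str):
            # Values read from a TSV arrive as text.
            flag = flag.strip().lower() not in ("false", "0", "no")
        if flag:
            kept.append(row)
    return kept
```
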
723d1f3682 Make data paths visible and explicit in flies_analysis notebooks
Define METADATA_TSV and TRACKED_DBS up front in cell 1, assert they
exist before doing anything else, and pass the loaded metadata to
load_roi_data() explicitly. Surfaces path problems immediately with a
readable message instead of failing deep inside the loader.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:01:55 +01:00
231c7a437f Remove hardcoded /home/gg paths so the project is portable
Notebooks now use Path.home() / "cupido" for the repo root (works for
any user inside the JupyterLab container), and the offline-tracking
scripts read the ethoscope source-tree location from the new
ETHOSCOPE_SRC config constant — defaulting to ~/Code/ethoscope_project/...
and overridable via the ETHOSCOPE_SRC environment variable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 08:55:44 +01:00
5934dce21e Simplify path setup in flies_analysis notebooks
Replace the cryptic Path("..").resolve() walk-up with explicit DATA_DIR
and REPO_ROOT constants, then import the rest of the path constants
(DATA_RAW, DATA_METADATA, DATA_PROCESSED, FIGURES) directly from
scripts/config.py — single source of truth, easier to read for students.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 08:50:11 +01:00
f176224150 Move metadata xlsx/TSV to /mnt/data/projects/cupido/
Consolidates everything bulky (tracking DBs, targets, metadata
spreadsheet) under a single DATA_VOLUME root outside the ownCloud-synced
repo. Notebooks now use a visible DATA_DIR = Path(...) idiom rather than
walking up the filesystem with PROJECT_ROOT.parent — easier for students
with no Python background to follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 08:47:15 +01:00
ec56e51bf9 Add beginner tutorial notebooks for incoming students
Four guided notebooks under notebooks/getting_started/ aimed at someone
new to Python and data science. The series progresses: project orientation
→ Python/pandas crash course → exploring one tracking DB → first
trained-vs-naive comparison using load_roi_data + Mann-Whitney U.

Each notebook leans heavily on markdown explanations, includes exercises
with empty cells, and links out to canonical references (JupyterLab,
official Python tutorial, pandas 10-min guide, Wikipedia for stats
concepts).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 18:14:17 +01:00
7d09523840 Move TARGETS_DIR to /mnt/data/projects/cupido/targets
Targets relocated alongside the tracking DBs (out of ownCloud sync) so
the docker mount already covers them and ownCloud no longer churns on
JSON sidecars. Updated config, fixed a stale docstring in pick_targets,
and dropped the now-moot data/targets/*.json gitignore rule.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 17:13:55 +01:00
f60a9d0530 Unify analysis pipeline around the TSV; move tracked DBs out of cloud sync
- Tracked DBs now live at /mnt/data/projects/cupido/tracked/ (out of
  ownCloud to avoid sync conflicts and bandwidth churn). config.py
  TRACKING_OUTPUT_DIR points there; the docker-compose for ethoscope-lab
  mounts it world-readable for JupyterHub users.
- New scripts/export_video_db_index.py joins all_video_info_merged.xlsx
  with the video inventory and the on-disk DBs, producing a TSV that has
  one row per fly/ROI plus training/testing video and DB paths. Handles
  approximate xlsx times, cross-day training/testing, the 12 AM/PM
  ambiguity, and date typos.
- scripts/load_roi_data.py rewritten as a TSV-driven loader returning a
  single DataFrame with session and metadata columns. calculate_distances
  and the two flies_analysis notebooks migrated to use it; downstream
  trained/naive splits remain available via simple equality filters.
- Metadata vocabulary canonicalized: {naïve, niave, untrained, test} all
  resolve to {trained, naive}. Normalization happens at the TSV-export
  boundary (idempotent); the xlsx and the 2025-07-15 legacy CSV were
  edited in place to remove the worst variants.
- scripts/monitor_tracking.py rate calculation fixed: with N parallel
  workers, completions arrive in bursts; the old formula divided by burst
  width and reported nonsense rates. Now uses a 6 h window denominator.
- scripts/track_videos.py: BGRMovieCamera retries cv2.read on transient
  NFS hiccups and a post-tracking completeness gate (≥ 90 % of expected
  duration via MAX(t) across all 6 ROIs) deletes silent partial DBs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-30 15:20:14 +01:00
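
Note on f60a9d0530: the completeness gate (at least 90 % of the expected duration, measured via MAX(t) across the 6 ROIs) could be checked as follows. The table names `ROI_1` to `ROI_6` and the millisecond unit for `t` are assumptions about the tracking DB layout; deleting the partial DB is left to the caller.

```python
import sqlite3
from contextlib import closing


def max_tracked_seconds(db_path, n_rois=6, t_unit_s=0.001):
    """Latest timestamp found in any ROI table, converted to seconds.

    Assumes one table per ROI named ROI_<n> with a numeric column `t`.
    """
    latest = 0.0
    with closing(sqlite3.connect(db_path)) as conn:
        for roi in range(1, n_rois + 1):
            (max_t,) = conn.execute(f"SELECT MAX(t) FROM ROI_{roi}").fetchone()
            if max_t is not None:  # MAX() is NULL for an empty table
                latest = max(latest, max_t * t_unit_s)
    return latest


def is_complete(db_path, expected_duration_s, min_fraction=0.9):
    """True if tracking covers at least min_fraction of the video."""
    return max_tracked_seconds(db_path) >= min_fraction * expected_duration_s
```
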
e4da7691d5 Add offline tracking pipeline for video backlog
The 2024 video set in all_video_info_merged.xlsx covers 63 (date, machine)
sessions — 129 video instances — that have no auto-detectable targets, so
ROI placement requires manual reference-point selection. This commit adds
the three-stage pipeline that lets a user click for an hour, then walk
away while the tracker grinds overnight:

  1. build_video_inventory.py — scan /mnt/ethoscope_data/videos/ and join
     against the xlsx, producing data/metadata/video_inventory.csv

  2. pick_targets.py — interactive matplotlib/Tk picker. User clicks
     TOP/CORNER/LEFT (the L-shape ethoscope expects); after the third
     click the 6 ROI rectangles are drawn on top of the frame so geometry
     can be verified before saving. Also supports marking a video
     'unusable' (FOV wrong) so it's permanently skipped, frame stepping
     by ±1s/±5%/midpoint, point editing in --redo mode, and a crosshair
     cursor that survives matplotlib's per-motion cursor reset.

  3. track_videos.py — headless batch tracker. Reads the JSON sidecars,
     builds 6 ROIs from the HD-mating-arena geometry, runs MultiFlyTracker
     against the merged.mp4 via MovieVirtualCamera, writes SQLite DBs to
     data/tracked/. Idempotent (skips done DBs), parallel via --jobs,
     subclasses MovieVirtualCamera so frames stay BGR (MultiFlyTracker
     calls cvtColor(BGR2GRAY) without checking channel count).

Plus auto_detect_targets.py (fallback that runs ethoscope's auto-detector
in case any videos do have visible target dots), monitor_tracking.py
(progress + ETA from data/tracked/ ground truth, --watch for live view),
and tracking_geometry.py (single source of truth for the affine math
shared by picker and tracker).

requirements-tracking.txt pins the extra deps (opencv-python, openpyxl,
gitpython, netifaces, mysql-connector-python) — these are only needed
for the tracking pipeline, not the existing analysis notebooks.

Verified end-to-end on one of the user-picked videos: ~4000 rows/ROI in
a 120s slice, fly bounding boxes in the expected 800-2000 px² band.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-27 17:25:26 +01:00
e7e4db264d Initial commit: organized project structure for student handoff
Reorganized flat 41-file directory into structured layout with:
- scripts/ for Python analysis code with shared config.py
- notebooks/ for Jupyter analysis notebooks
- data/ split into raw/, metadata/, processed/
- docs/ with analysis summary, experimental design, and bimodal hypothesis tutorial
- tasks/ with todo checklist and lessons learned
- Comprehensive README, PLANNING.md, and .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 16:08:36 +00:00