Unify analysis pipeline around the TSV; move tracked DBs out of cloud sync
- Tracked DBs now live at /mnt/data/projects/cupido/tracked/ (out of
ownCloud to avoid sync conflicts and bandwidth churn). config.py
TRACKING_OUTPUT_DIR points there; the docker-compose for ethoscope-lab
mounts it world-readable for JupyterHub users.
- New scripts/export_video_db_index.py joins all_video_info_merged.xlsx
with the video inventory and the on-disk DBs, producing a TSV that has
one row per fly/ROI plus training/testing video and DB paths. Handles
approximate xlsx times, cross-day training/testing, the 12 AM/PM
ambiguity, and date typos.
- scripts/load_roi_data.py rewritten as a TSV-driven loader returning a
single DataFrame with session and metadata columns. calculate_distances
and the two flies_analysis notebooks migrated to use it; downstream
trained/naive splits remain available via simple equality filters.
- Metadata vocabulary canonicalized: {naïve, niave, untrained, test} all
resolve to {trained, naive}. Normalization happens at the TSV-export
boundary (idempotent); the xlsx and the 2025-07-15 legacy CSV were
edited in place to remove the worst variants.
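- scripts/monitor_tracking.py rate calculation fixed: with N parallel
workers, completions arrive in bursts; the old formula divided by the
span of the last 10 completions, which can sit inside a single burst,
and reported nonsense rates. It now counts completions in the last 6 h
and divides by the window length (or by the time since the first DB
while the batch is younger than 6 h).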
- scripts/track_videos.py: BGRMovieCamera now retries cv2.read on
transient NFS hiccups, and a post-tracking completeness gate (≥ 90 % of
expected duration via MAX(t) across all 6 ROIs) deletes silent partial
DBs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
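A minimal sketch of the completeness gate described in the last bullet, not the code in scripts/track_videos.py. It assumes each tracked DB is a SQLite file with one table per ROI named ROI_1 to ROI_6, each holding a timestamp column t in milliseconds, and it reads "MAX(t) across all 6 ROIs" as the largest timestamp found in any ROI table. The function names and the expected_ms parameter are hypothetical.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

N_ROIS = 6           # from the commit message: 6 ROIs per DB
MIN_FRACTION = 0.90  # from the commit message: >= 90 % of expected duration


def last_timestamp_ms(db_path):
    """Largest t found in any ROI table (assumed names ROI_1..ROI_6)."""
    latest = 0
    with closing(sqlite3.connect(str(db_path))) as conn:
        for roi in range(1, N_ROIS + 1):
            try:
                (t_max,) = conn.execute(
                    f"SELECT MAX(t) FROM ROI_{roi}"
                ).fetchone()
            except sqlite3.OperationalError:
                continue  # table missing: treat as no data for this ROI
            if t_max is not None:  # MAX(t) is NULL for an empty table
                latest = max(latest, t_max)
    return latest


def enforce_completeness(db_path, expected_ms):
    """Delete the DB and return False if it covers < 90 % of expected_ms."""
    db_path = Path(db_path)
    complete = last_timestamp_ms(db_path) >= MIN_FRACTION * expected_ms
    if not complete:
        db_path.unlink()  # a partial DB would otherwise look finished
    return complete
```

Deleting the partial DB rather than flagging it means the next tracking run picks the video up again without extra bookkeeping.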
parent e4da7691d5
commit f60a9d0530
13 changed files with 569 additions and 237 deletions
@@ -97,13 +97,32 @@ def snapshot() -> str:
     )
     lines.append(f" errors in log: {len(errors)}")

-    # Rate from the last 10 completions, when available.
-    if len(history) >= 2:
-        window = history[-min(10, len(history)) :]
-        span = window[-1] - window[0]
-        if span > 0:
-            rate_per_hour = (len(window) - 1) / span * 3600
-            lines.append(f" rate (last {len(window) - 1}): {rate_per_hour:.1f} videos/hour")
+    # Rate from completions in the last 6 h — robust to gaps from killed /
+    # restarted runs, while wide enough to span multiple parallel-worker
+    # completion bursts. Reason: with 8 workers all started together on
+    # multi-hour videos, completions arrive in tight bursts every ~video-
+    # length apart; a 30-min window catches one burst and overestimates by
+    # ~10×. 6 h spans at least one full burst cycle for typical videos.
+    now_ts = time.time()
+    window_secs = 6 * 3600
+    recent = [t for t in history if t >= now_ts - window_secs]
+    if len(recent) >= 2:
+        # Reason: with N parallel workers, completions arrive in clumps
+        # (all workers finish near-simultaneously). Dividing N by the *burst*
+        # span gives nonsense rates. Use the full window as the denominator
+        # once the batch has been running long enough to fill it; otherwise
+        # use elapsed-since-first-DB. Detection: if every DB on disk also
+        # falls inside the window, the batch is younger than the window.
+        if len(recent) == len(history):
+            elapsed = max(1.0, now_ts - history[0])
+        else:
+            elapsed = float(window_secs)
+        if elapsed > 0:
+            rate_per_hour = len(recent) / elapsed * 3600
+            lines.append(
+                f" rate (last {len(recent)} in {int(window_secs/3600)} h):"
+                f" {rate_per_hour:.1f} videos/hour"
+            )
             remaining = max(0, pickable - tracked)
             if rate_per_hour > 0 and remaining > 0:
                 eta_sec = remaining * 3600 / rate_per_hour
@@ -112,6 +131,8 @@ def snapshot() -> str:
                     f" ETA remaining: {fmt_duration(eta_sec)} "
                     f"(done by {eta_at:%H:%M %a})"
                 )
+    else:
+        lines.append(" rate: (warming up — check again in a few min)")

     if last_mtime is not None and last_name is not None:
         ago = (datetime.now() - last_mtime).total_seconds()
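To see why the denominator matters, the two formulas from the hunks above can be pulled out into standalone functions and run on a synthetic history. This is an illustrative sketch only: the function names are hypothetical, and the scenario (12 workers, 3 h videos, completions 30 s apart within a burst) is made up for the example rather than taken from a real run.

```python
def burst_span_rate_per_hour(history, last_n=10):
    """Old formula: completions divided by the span of the last N."""
    window = sorted(history)[-last_n:]
    if len(window) < 2:
        return None
    span = window[-1] - window[0]
    if span <= 0:
        return None
    return (len(window) - 1) / span * 3600


def windowed_rate_per_hour(history, now_ts, window_secs=6 * 3600):
    """New formula: completions in the window divided by the window."""
    history = sorted(history)
    recent = [t for t in history if t >= now_ts - window_secs]
    if len(recent) < 2:
        return None  # caller reports "warming up"
    if len(recent) == len(history):
        # Batch is younger than the window: use time since the first DB.
        elapsed = max(1.0, now_ts - history[0])
    else:
        elapsed = float(window_secs)
    return len(recent) / elapsed * 3600


# Hypothetical batch: 12 workers, one burst of 12 completions every 3 h,
# completions 30 s apart inside each burst. True throughput is 4 videos/h.
HOUR = 3600
history = [
    burst * 3 * HOUR + worker * 30
    for burst in range(1, 5)   # bursts at 3 h, 6 h, 9 h, 12 h
    for worker in range(12)
]
now_ts = 12 * HOUR + 600       # ten minutes after the last burst began

print(f"old: {burst_span_rate_per_hour(history):.1f} videos/hour")
print(f"new: {windowed_rate_per_hour(history, now_ts):.1f} videos/hour")
```

In this scenario the last 10 completions all fall inside the final burst, so the old formula reports 120.0 videos/hour, while the 6 h window catches two full bursts and reports 4.0 videos/hour, matching the true throughput.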