Per-user metadata TSV — auto-prefer ~/cupido_metadata.tsv if present
The shared TSV at /mnt/data/projects/cupido/ is read-only inside the container, so users who want to customize the `include` column (or any metadata) need a personal copy. Notebooks now check for ~/cupido_metadata.tsv first and fall back to the shared master if it doesn't exist. Each user keeps their own edits without stepping on anyone else's analysis. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
parent
23050360ea
commit
f08e4b843d
6 changed files with 17 additions and 14 deletions
|
|
@ -10,12 +10,12 @@
|
|||
"execution_count": null,
|
||||
"metadata": {},
|
||||
"outputs": [],
|
||||
"source": "import sys\nfrom pathlib import Path\n\nimport pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nfrom scipy.spatial.distance import euclidean\nfrom scipy import stats\n\n# ─── Where the data lives ────────────────────────────────────────────────\n# DATA_DIR holds everything bulky/regenerable: the metadata TSV and the\n# tracking SQLite DBs. It's mounted into the container at this fixed path.\n# REPO_ROOT is your checkout of the cupido repo, in your home directory.\n# Path.home() expands to /home/<your-username>, so this works for any\n# user (no hard-coded usernames).\nDATA_DIR = Path(\"/mnt/data/projects/cupido\")\nREPO_ROOT = Path.home() / \"cupido\"\n\nMETADATA_TSV = DATA_DIR / \"all_video_info_merged.tsv\"\nTRACKED_DBS = DATA_DIR / \"tracked\"\n\n# Sanity-check the data location up front so any failure here points at\n# the obvious thing — rather than crashing inside load_roi_data later.\nassert METADATA_TSV.exists(), f\"Metadata TSV not found at {METADATA_TSV}\"\nassert TRACKED_DBS.is_dir(), f\"Tracked-DB directory not found at {TRACKED_DBS}\"\n\n# Pull the in-repo path constants (DATA_METADATA, DATA_PROCESSED, FIGURES)\n# from scripts/config.py — single source of truth.\nsys.path.insert(0, str(REPO_ROOT / \"scripts\"))\nfrom config import DATA_METADATA, DATA_PROCESSED, FIGURES\n\n# Plotting style\nplt.style.use('seaborn-v0_8')\nsns.set_palette(\"husl\")\n\nprint(f\"Data directory: {DATA_DIR}\")\nprint(f\"Repo root: {REPO_ROOT}\")\nprint(f\"Metadata TSV: {METADATA_TSV}\")\nprint(f\"Pandas version: {pd.__version__}\")\nprint(f\"NumPy version: {np.__version__}\")\n"
|
||||
"source": "import sys\nfrom pathlib import Path\n\nimport pandas as pd\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport seaborn as sns\nfrom scipy.spatial.distance import euclidean\nfrom scipy import stats\n\n# ─── Where the data lives ────────────────────────────────────────────────\n# DATA_DIR holds everything bulky/regenerable: the metadata TSV and the\n# tracking SQLite DBs. It's mounted into the container at this fixed path.\n# REPO_ROOT is your checkout of the cupido repo, in your home directory.\n# Path.home() expands to /home/<your-username>, so this works for any\n# user (no hard-coded usernames).\nDATA_DIR = Path(\"/mnt/data/projects/cupido\")\nREPO_ROOT = Path.home() / \"cupido\"\n\nTRACKED_DBS = DATA_DIR / \"tracked\"\n\n# ─── The metadata TSV — shared master vs. your personal copy ─────────────\n# DATA_DIR is mounted read-only inside the container, so the shared TSV\n# at SHARED_TSV cannot be edited. Fine for read-only analysis. But if\n# you want to flip `include` flags (or otherwise customize the metadata\n# for your own analysis), copy it to your home folder ONCE:\n#\n# $ cp /mnt/data/projects/cupido/all_video_info_merged.tsv ~/cupido_metadata.tsv\n#\n# After that, the auto-select line below will pick up your personal copy\n# automatically. Other users are unaffected.\nSHARED_TSV = DATA_DIR / \"all_video_info_merged.tsv\"\nPERSONAL_TSV = Path.home() / \"cupido_metadata.tsv\"\nMETADATA_TSV = PERSONAL_TSV if PERSONAL_TSV.exists() else SHARED_TSV\n\n# Sanity-check the data location up front so any failure here points at\n# the obvious thing — rather than crashing inside load_roi_data later.\nassert METADATA_TSV.exists(), f\"Metadata TSV not found at {METADATA_TSV}\"\nassert TRACKED_DBS.is_dir(), f\"Tracked-DB directory not found at {TRACKED_DBS}\"\n\n# Pull the in-repo path constants (DATA_METADATA, DATA_PROCESSED, FIGURES)\n# from scripts/config.py — single source of truth.\nsys.path.insert(0, str(REPO_ROOT / \"scripts\"))\nfrom config import DATA_METADATA, DATA_PROCESSED, FIGURES\n\n# Plotting style\nplt.style.use('seaborn-v0_8')\nsns.set_palette(\"husl\")\n\nprint(f\"Data directory: {DATA_DIR}\")\nprint(f\"Repo root: {REPO_ROOT}\")\nprint(f\"Metadata TSV: {METADATA_TSV} ({'personal' if METADATA_TSV == PERSONAL_TSV else 'shared (read-only)'})\")\nprint(f\"Pandas version: {pd.__version__}\")\nprint(f\"NumPy version: {np.__version__}\")\n"
|
||||
},
|
||||
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {},
|
||||
"source": "## 1. Load the tracking data\n\n`load_roi_data` opens every tracking DB referenced by the merged TSV\nand returns one big DataFrame stamped with experimental metadata\n(species, male/naïve, age, …). The TSV has a boolean `include` column\n(default `True`) — set it to `False` for any row you want to drop\n(e.g. videos that turned out to be too noisy). The loader respects\nthat flag automatically; nothing else needs to change here.\n\nIf you only want a subset, pre-filter `meta` before passing it in\n(e.g. `load_roi_data(meta[meta.species == 'Melanogaster/CS'])`).\n"
|
||||
"source": "## 1. Load the tracking data\n\n`load_roi_data` opens every tracking DB referenced by the merged TSV\nand returns one big DataFrame stamped with experimental metadata\n(species, male/naïve, age, …). The TSV has a boolean `include` column\n(default `True`); the loader skips rows where it's `False`. To customize\nwhich rows you want included, see the personal-copy comment block in\nthe setup cell above — you edit your **own** copy of the TSV, not the\nshared one.\n\nIf you only want a subset, pre-filter `meta` before passing it in\n(e.g. `load_roi_data(meta[meta.species == 'Melanogaster/CS'])`).\n"
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue