Per-user metadata TSV — auto-prefer ~/cupido_metadata.tsv if present
The shared TSV at /mnt/data/projects/cupido/ is read-only inside the container, so users who want to customize the `include` column (or any metadata) need a personal copy. Notebooks now check for ~/cupido_metadata.tsv first and fall back to the shared master if it doesn't exist. Each user keeps their own edits without stepping on anyone else's analysis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent 23050360ea
commit f08e4b843d
6 changed files with 17 additions and 14 deletions
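
The prefer-personal, fall-back-to-shared rule from the commit message is repeated in each notebook cell. A minimal sketch of how it could be factored into one helper follows. The function name `resolve_metadata_tsv` is hypothetical and not part of this commit. The two paths are the ones used in the changed cells.

```python
from pathlib import Path

# Paths as they appear in the changed notebook cells.
SHARED_TSV = Path("/mnt/data/projects/cupido") / "all_video_info_merged.tsv"
PERSONAL_TSV = Path.home() / "cupido_metadata.tsv"


def resolve_metadata_tsv(personal: Path = PERSONAL_TSV,
                         shared: Path = SHARED_TSV) -> Path:
    """Return the personal copy if it exists, otherwise the shared master.

    Hypothetical helper: it only picks a path and never reads or writes
    either file.
    """
    return personal if personal.exists() else shared
```

A cell could then call `tsv_path = resolve_metadata_tsv()` before `pd.read_csv(tsv_path, sep="\t")`, which would keep the rule in one place if it ever changes.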
@@ -161,15 +161,7 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": [
-"You should see roughly 113 tracking DBs and 130 target JSONs. If those\n",
-"numbers are zero, the storage volume isn't mounted — ask Giorgio.\n",
-"\n",
-"> **Note**: the tracking DBs are read-only inside the JupyterLab\n",
-"> container. You can read them but not modify or delete them. That's a\n",
-"> deliberate safety measure — we don't want analysis code accidentally\n",
-"> corrupting the source data.\n"
-]
+"source": "You should see roughly 113 tracking DBs and 130 target JSONs. If those\nnumbers are zero, the storage volume isn't mounted — ask Giorgio.\n\n> **Note**: the data volume is **read-only** inside the JupyterLab\n> container. You can read everything but not modify or delete it. That's\n> a deliberate safety measure — we don't want analysis code accidentally\n> corrupting the source data.\n\n### Personalising the metadata TSV\n\nBecause the volume is read-only, the shared metadata file\n`all_video_info_merged.tsv` cannot be edited in place. If you want to\nmark a row as \"skip this fly\" — e.g. by flipping its `include` column to\n`False` because the video is too noisy — copy the file to your home\nfolder **once**:\n\n```bash\ncp /mnt/data/projects/cupido/all_video_info_merged.tsv ~/cupido_metadata.tsv\n```\n\nThe notebooks check for `~/cupido_metadata.tsv` first and fall back to\nthe shared master if your personal copy doesn't exist. Each user keeps\ntheir own edits; nobody steps on anyone else's analysis.\n"
 },
 {
 "cell_type": "markdown",
@@ -257,7 +257,7 @@
 "metadata": {},
 "execution_count": null,
 "outputs": [],
-"source": "import pandas as pd\nfrom pathlib import Path\n\n# All the project's bulky data lives under /mnt/data/projects/cupido/.\n# This pattern — define one DATA_DIR variable, then build sub-paths from\n# it — is much easier to read (and to update) than hard-coding long\n# strings everywhere.\nDATA_DIR = Path(\"/mnt/data/projects/cupido\")\ntsv_path = DATA_DIR / \"all_video_info_merged.tsv\"\n\n# Read the project's metadata TSV (Tab-Separated Values).\ndf = pd.read_csv(tsv_path, sep=\"\\t\")\n\n# How big is it?\nprint(f\"Rows: {len(df)}\")\nprint(f\"Columns: {df.shape[1]}\")\n"
+"source": "import pandas as pd\nfrom pathlib import Path\n\n# All the project's bulky data lives under /mnt/data/projects/cupido/.\n# Defining one DATA_DIR variable and building sub-paths from it is much\n# easier to read (and to update) than hard-coding long strings everywhere.\nDATA_DIR = Path(\"/mnt/data/projects/cupido\")\n\n# Pick the metadata TSV: prefer your personal copy if you have one,\n# otherwise fall back to the shared (read-only) master. To make a\n# personal copy you can edit, run ONCE in a terminal:\n# cp /mnt/data/projects/cupido/all_video_info_merged.tsv ~/cupido_metadata.tsv\nSHARED_TSV = DATA_DIR / \"all_video_info_merged.tsv\"\nPERSONAL_TSV = Path.home() / \"cupido_metadata.tsv\"\ntsv_path = PERSONAL_TSV if PERSONAL_TSV.exists() else SHARED_TSV\n\n# Read the project's metadata TSV (Tab-Separated Values).\ndf = pd.read_csv(tsv_path, sep=\"\\t\")\n\n# How big is it?\nprint(f\"Reading from: {tsv_path}\")\nprint(f\"Rows: {len(df)}\")\nprint(f\"Columns: {df.shape[1]}\")\n"
 },
 {
 "cell_type": "markdown",
@@ -66,7 +66,7 @@
 "metadata": {},
 "execution_count": null,
 "outputs": [],
-"source": "# Load the metadata TSV first — it's small and fast.\ntsv_path = DATA_DIR / \"all_video_info_merged.tsv\"\nmeta = pd.read_csv(tsv_path, sep=\"\\t\")\nprint(f\"metadata rows: {len(meta)}\")\n"
+"source": "# Pick the metadata TSV: prefer your personal copy if you have one,\n# otherwise fall back to the shared (read-only) master.\n#\n# To make a personal copy that you can edit (e.g. flip `include` flags\n# for noisy rows), run this ONCE in a terminal:\n# cp /mnt/data/projects/cupido/all_video_info_merged.tsv ~/cupido_metadata.tsv\nSHARED_TSV = DATA_DIR / \"all_video_info_merged.tsv\"\nPERSONAL_TSV = Path.home() / \"cupido_metadata.tsv\"\ntsv_path = PERSONAL_TSV if PERSONAL_TSV.exists() else SHARED_TSV\n\n# Load the metadata TSV first — it's small and fast.\nmeta = pd.read_csv(tsv_path, sep=\"\\t\")\nprint(f\"loaded {tsv_path} ({'personal' if tsv_path == PERSONAL_TSV else 'shared (read-only)'})\")\nprint(f\"metadata rows: {len(meta)}\")\n"
 },
 {
 "cell_type": "markdown",
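
The notebook text above says a personal copy lets you flip a row's `include` column to `False`, but the commit does not show that edit. Here is a minimal sketch of one way to do it with only the standard library. The helper name `set_include` and the idea of picking rows by 0-based position are assumptions for illustration, and it assumes the column stores the literal text `True` / `False`. Run it only on the copy in your home folder, because the shared master is read-only.

```python
import csv
from pathlib import Path


def set_include(tsv_path: Path, row_numbers: set[int], value: bool) -> int:
    """Set `include` for the given 0-based data rows; return how many changed.

    Hypothetical helper, not part of the commit. It rewrites the whole file.
    """
    with tsv_path.open(newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        fieldnames = reader.fieldnames
        rows = list(reader)
    if not fieldnames or "include" not in fieldnames:
        raise KeyError(f"no `include` column in {tsv_path}")

    changed = 0
    for i in row_numbers:
        rows[i]["include"] = str(value)  # "True" or "False"
        changed += 1

    with tsv_path.open("w", newline="") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
    return changed
```

For example, `set_include(Path.home() / "cupido_metadata.tsv", {12, 40}, False)` would mark two rows as excluded (the row numbers are made up). The same edit can be done in pandas with `df.loc[...]` followed by `df.to_csv(path, sep="\t", index=False)`.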