Move metadata xlsx/TSV to /mnt/data/projects/cupido/

Consolidates everything bulky (tracking DBs, targets, metadata
spreadsheet) under a single DATA_VOLUME root outside the ownCloud-synced
repo. Notebooks now use a visible DATA_DIR = Path(...) idiom rather than
walking up the filesystem with PROJECT_ROOT.parent — easier for students
with no Python background to follow.
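
For illustration only, a minimal sketch contrasting the two idioms described
above. The old form is a reconstruction: how the notebooks actually computed
PROJECT_ROOT is not shown in this commit, so it is hard-coded here as an
assumption, using the repo path listed in the welcome notebook. The new form
mirrors the updated notebook cell.

```python
from pathlib import Path

# Old idiom (hypothetical reconstruction): start from the repo root and walk
# up one level to find the metadata table next to the repository.
# Assumption: PROJECT_ROOT pointed at the repo checkout.
PROJECT_ROOT = Path("/home/gg/ownCloud/Work/Projects/coding/cupido/tracking")
old_metadata_tsv = PROJECT_ROOT.parent / "all_video_info_merged.tsv"

# New idiom: one visible, absolute root for all bulky project data.
DATA_DIR = Path("/mnt/data/projects/cupido")
metadata_tsv = DATA_DIR / "all_video_info_merged.tsv"

print(f"old location: {old_metadata_tsv}")
print(f"new location: {metadata_tsv}")
```

The second form states the location in a single line, so a reader does not
need to know where the notebook sits on disk to see where the data lives.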

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Giorgio Gilestro 2026-05-01 08:47:15 +01:00
parent ec56e51bf9
commit f176224150
8 changed files with 102 additions and 160 deletions

@@ -16,11 +16,11 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# 00 \u00b7 Welcome to the Cupido fly-tracking project\n",
"# 00 · Welcome to the Cupido fly-tracking project\n",
"\n",
"Hi! You're about to start working on a project that studies how *Drosophila*\n",
"(fruit flies) form **memories of mating experiences** \u2014 and whether trained\n",
"flies behave differently from na\u00efve ones in their later courtship.\n",
"(fruit flies) form **memories of mating experiences** — and whether trained\n",
"flies behave differently from naïve ones in their later courtship.\n",
"\n",
"**You don't need any prior experience with Python or data science to follow\n",
"along.** This series of notebooks will walk you through everything, one\n",
@@ -48,7 +48,7 @@
"metadata": {},
"source": [
"You should have seen `Hello, fly world!` printed and the number `2`\n",
"appear underneath. If something else happened, ask Giorgio \u2014 that's a\n",
"appear underneath. If something else happened, ask Giorgio — that's a\n",
"sign the environment isn't set up right.\n",
"\n",
"If this is the very first time you're using JupyterLab, take 10 minutes\n",
@@ -61,7 +61,7 @@
" (Python that the computer runs).\n",
"- The **kernel** is the running Python process behind the notebook. It\n",
" remembers everything you've defined. If something gets weird, restart\n",
" the kernel: top menu \u2192 *Kernel* \u2192 *Restart Kernel\u2026*.\n",
" the kernel: top menu → *Kernel* → *Restart Kernel…*.\n",
"- `Shift + Enter` runs a cell and moves to the next one.\n",
"- `Ctrl + Enter` runs a cell and stays put.\n"
]
@@ -74,7 +74,7 @@
"\n",
"Drosophila males court females with a stereotyped sequence (chasing,\n",
"wing-extension, tapping). When a male is rejected by a female (e.g.\n",
"because she's already mated), he **learns** to suppress his courtship \u2014\n",
"because she's already mated), he **learns** to suppress his courtship —\n",
"even toward new, receptive females, for a while. This is a textbook\n",
"example of *non-associative learning* in invertebrates ([review on\n",
"PubMed](https://pubmed.ncbi.nlm.nih.gov/?term=courtship+conditioning+drosophila)).\n",
@@ -85,7 +85,7 @@
" species recorded.)\n",
"- How long does the memory last? (training_length_hr,\n",
" consolidation_length_hr columns in the metadata.)\n",
"- Are there **individual differences** \u2014 do some males learn while others\n",
"- Are there **individual differences** — do some males learn while others\n",
" don't? (The \"bimodal hypothesis\" in `docs/bimodal_hypothesis.md`.)\n",
"\n",
"Your job, broadly, will be to **turn videos of flies into numbers and\n",
@@ -100,17 +100,17 @@
"\n",
"1. **Training**: a male fly is placed with a non-receptive (mated) female.\n",
" He courts, gets rejected, eventually gives up.\n",
"2. *Wait* for some hours (the \"consolidation\" period \u2014 gives memory time\n",
"2. *Wait* for some hours (the \"consolidation\" period — gives memory time\n",
" to form).\n",
"3. **Testing**: same male is placed with a fresh receptive female.\n",
" Does he court her vigorously, or has he learned to give up easily?\n",
"\n",
"Each experiment runs in an **HD mating arena** \u2014 a small chamber with\n",
"Each experiment runs in an **HD mating arena** — a small chamber with\n",
"6 sub-arenas (we call them **ROIs**, for \"regions of interest\"). Each ROI\n",
"contains one couple (a male and a female). A camera films the whole arena\n",
"from above. So one **video** gives us 6 simultaneous experiments.\n",
"\n",
"The setup uses [Ethoscopes](https://www.ethoscope.com/) \u2014 open-source\n",
"The setup uses [Ethoscopes](https://www.ethoscope.com/) — open-source\n",
"behavioural recording boxes built in this lab. Each ethoscope is a\n",
"machine; we have 16 in total, named `ETHOSCOPE_067`, `ETHOSCOPE_076`, etc.\n"
]
@@ -124,7 +124,7 @@
"For each video, the **tracker** (a piece of software that runs after the\n",
"recording) finds the flies frame-by-frame and writes their positions to a\n",
"**SQLite database** (a single file, ending in `.db`). One DB per video.\n",
"Inside each DB there are 6 tables called `ROI_1`, `ROI_2`, \u2026, `ROI_6` \u2014\n",
"Inside each DB there are 6 tables called `ROI_1`, `ROI_2`, …, `ROI_6` —\n",
"one per sub-arena. Each row of an ROI table is **one fly detection at one\n",
"moment in time** with these columns:\n",
"\n",
@@ -139,7 +139,7 @@
"| `has_interacted` | (legacy column, mostly unused) |\n",
"\n",
"If a single ROI has two flies that the tracker can see, you'll get **two\n",
"rows with the same `t`** \u2014 one for each fly. If only one fly is detected\n",
"rows with the same `t`** — one for each fly. If only one fly is detected\n",
"(maybe they're on top of each other), you'll get one row.\n",
"\n",
"That's the heart of the data. Everything else (distances, velocities,\n",
@@ -149,51 +149,25 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Where everything lives\n",
"\n",
"Take a moment to memorize these locations \u2014 you'll come back to them often.\n",
"\n",
"| what | where |\n",
"|---|---|\n",
"| Tracking DBs (SQLite, one per video) | `/mnt/data/projects/cupido/tracked/` |\n",
"| Target JSONs (the user-clicked reference points) | `/mnt/data/projects/cupido/targets/` |\n",
"| Source video files | `/mnt/ethoscope_data/videos/` |\n",
"| Project code (this repo) | `/home/gg/ownCloud/Work/Projects/coding/cupido/tracking/` |\n",
"| The metadata table (xlsx + TSV) | `/home/gg/ownCloud/Work/Projects/coding/cupido/all_video_info_merged.tsv` |\n",
"| Your notebooks | `notebooks/getting_started/` (this folder) |\n",
"\n",
"Let's verify a couple of these from inside Python:\n"
]
"source": "## Where everything lives\n\nTake a moment to memorize these locations — you'll come back to them often.\n\n| what | where |\n|---|---|\n| Tracking DBs (SQLite, one per video) | `/mnt/data/projects/cupido/tracked/` |\n| Target JSONs (the user-clicked reference points) | `/mnt/data/projects/cupido/targets/` |\n| The metadata table (xlsx + TSV) | `/mnt/data/projects/cupido/all_video_info_merged.tsv` |\n| Source video files | `/mnt/ethoscope_data/videos/` |\n| Project code (this repo) | `/home/gg/ownCloud/Work/Projects/coding/cupido/tracking/` |\n| Your notebooks | `notebooks/getting_started/` (this folder) |\n\nNotice the pattern: **everything bulky or regenerable lives under\n`/mnt/data/projects/cupido/`**. The repository itself only stores code,\ndocumentation, and small metadata files. We'll refer to that data\ndirectory as `DATA_DIR` from here on.\n\nLet's verify a couple of these from inside Python:\n"
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"tracked = Path(\"/mnt/data/projects/cupido/tracked\")\n",
"targets = Path(\"/mnt/data/projects/cupido/targets\")\n",
"\n",
"n_dbs = len(list(tracked.glob(\"*_tracking.db\")))\n",
"n_jsons = len(list(targets.glob(\"*.json\")))\n",
"\n",
"print(f\"Tracking DBs available: {n_dbs}\")\n",
"print(f\"Target JSONs available: {n_jsons}\")\n"
]
"source": "from pathlib import Path\n\n# Single root for all the bulky / regenerable project data.\nDATA_DIR = Path(\"/mnt/data/projects/cupido\")\n\ntracked_dir = DATA_DIR / \"tracked\"\ntargets_dir = DATA_DIR / \"targets\"\nmetadata_tsv = DATA_DIR / \"all_video_info_merged.tsv\"\n\nprint(f\"Tracking DBs available: {len(list(tracked_dir.glob('*_tracking.db')))}\")\nprint(f\"Target JSONs available: {len(list(targets_dir.glob('*.json')))}\")\nprint(f\"Metadata TSV exists: {metadata_tsv.exists()}\")\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should see roughly 113 tracking DBs and 130 target JSONs. If those\n",
"numbers are zero, the storage volume isn't mounted \u2014 ask Giorgio.\n",
"numbers are zero, the storage volume isn't mounted — ask Giorgio.\n",
"\n",
"> **Note**: the tracking DBs are read-only inside the JupyterLab\n",
"> container. You can read them but not modify or delete them. That's a\n",
"> deliberate safety measure \u2014 we don't want analysis code accidentally\n",
"> deliberate safety measure — we don't want analysis code accidentally\n",
"> corrupting the source data.\n"
]
},
@@ -203,26 +177,26 @@
"source": [
"## Glossary (refer back as needed)\n",
"\n",
"- **ROI** \u2014 *region of interest*. One sub-arena inside the HD mating\n",
" arena. There are 6 ROIs per video, numbered 1\u20136.\n",
"- **fly** \u2014 one detection in a single (t, ROI) cell. Two flies in the\n",
"- **ROI** — *region of interest*. One sub-arena inside the HD mating\n",
" arena. There are 6 ROIs per video, numbered 1–6.\n",
"- **fly** — one detection in a single (t, ROI) cell. Two flies in the\n",
" same ROI at the same time = two rows with the same `t`.\n",
"- **trained** \u2014 the male had a training session before testing.\n",
"- **naive** \u2014 the male is a control (no training).\n",
"- **training session** \u2014 the recording where the male meets the\n",
"- **trained** — the male had a training session before testing.\n",
"- **naive** — the male is a control (no training).\n",
"- **training session** — the recording where the male meets the\n",
" non-receptive female (he gets rejected).\n",
"- **testing session** \u2014 the recording where the male meets a fresh\n",
"- **testing session** — the recording where the male meets a fresh\n",
" receptive female (we measure his courtship).\n",
"- **t (milliseconds)** \u2014 time within one session, starting at 0.\n",
"- **(x, y) pixels** \u2014 fly position in the image. Top-left is (0, 0); x\n",
"- **t (milliseconds)** — time within one session, starting at 0.\n",
"- **(x, y) pixels** — fly position in the image. Top-left is (0, 0); x\n",
" grows to the right, y grows **downward** (this is the image-coordinate\n",
" convention, opposite of math class).\n",
"- **machine_name** \u2014 which ethoscope recorded the video, e.g.\n",
"- **machine_name** — which ethoscope recorded the video, e.g.\n",
" `ETHOSCOPE_076`.\n",
"- **species** \u2014 `Melanogaster/CS`, `Sechellia`, `Simulans`, `Yakuba`,\n",
"- **species** — `Melanogaster/CS`, `Sechellia`, `Simulans`, `Yakuba`,\n",
" `Erecta`, `Willistoni`, or `CS`.\n",
"\n",
"If you bump into other terms in the code, ask. Don't guess \u2014 biology\n",
"If you bump into other terms in the code, ask. Don't guess — biology\n",
"codebases pick up jargon over the years.\n"
]
},
@@ -234,16 +208,16 @@
"\n",
"When you're ready, open these notebooks **in order**:\n",
"\n",
"1. `01_python_pandas_basics.ipynb` \u2014 just enough Python and pandas to\n",
"1. `01_python_pandas_basics.ipynb` — just enough Python and pandas to\n",
" read and manipulate tabular data.\n",
"2. `02_explore_one_database.ipynb` \u2014 open one tracking DB, plot a fly's\n",
"2. `02_explore_one_database.ipynb` — open one tracking DB, plot a fly's\n",
" trajectory, see what the numbers actually look like.\n",
"3. `03_compare_trained_vs_naive.ipynb` \u2014 your first real analysis,\n",
"3. `03_compare_trained_vs_naive.ipynb` — your first real analysis,\n",
" comparing groups of flies.\n",
"\n",
"After those, the notebooks one level up (`flies_analysis.ipynb`,\n",
"`flies_analysis_simple.ipynb`) contain the analysis pipeline that the\n",
"previous student built \u2014 those will make sense once you've worked\n",
"previous student built — those will make sense once you've worked\n",
"through the tutorials.\n",
"\n",
"Don't try to power through all of them in one sitting. Run a few cells,\n",
@@ -252,4 +226,4 @@
]
}
]
}
}