cupido/notebooks/getting_started/00_welcome.ipynb
Giorgio Gilestro f08e4b843d Per-user metadata TSV — auto-prefer ~/cupido_metadata.tsv if present
The shared TSV at /mnt/data/projects/cupido/ is read-only inside the
container, so users who want to customize the `include` column (or any
metadata) need a personal copy. Notebooks now check for
~/cupido_metadata.tsv first and fall back to the shared master if it
doesn't exist. Each user keeps their own edits without stepping on
anyone else's analysis.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-01 09:25:24 +01:00

{
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 00 · Welcome to the Cupido fly-tracking project\n",
"\n",
"Hi! You're about to start working on a project that studies how *Drosophila*\n",
"(fruit flies) form **memories of mating experiences** — and whether trained\n",
"flies behave differently from naïve ones in their later courtship.\n",
"\n",
"**You don't need any prior experience with Python or data science to follow\n",
"along.** This series of notebooks will walk you through everything, one\n",
"small step at a time.\n",
"\n",
"> **How to read these notebooks**: each notebook is split into \"cells\".\n",
"> Some cells are explanations (like this one), others are code that you\n",
"> can **run** by clicking on the cell and pressing `Shift + Enter`. Try it\n",
"> on the next cell.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# This is a code cell. Click on it and press Shift+Enter to run it.\n",
"print(\"Hello, fly world!\")\n",
"1 + 1\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should have seen `Hello, fly world!` printed and the number `2`\n",
"appear underneath. If something else happened, ask Giorgio — that's a\n",
"sign the environment isn't set up right.\n",
"\n",
"If this is the very first time you're using JupyterLab, take 10 minutes\n",
"to read the [official \"Getting started with JupyterLab\"\n",
"guide](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html).\n",
"The most important things to know are:\n",
"\n",
"- A notebook (`.ipynb` file) is a sequence of **cells**.\n",
"- Each cell is either **Markdown** (formatted text, like this) or **Code**\n",
" (Python that the computer runs).\n",
"- The **kernel** is the running Python process behind the notebook. It\n",
" remembers everything you've defined. If something gets weird, restart\n",
" the kernel: top menu → *Kernel* → *Restart Kernel…*.\n",
"- `Shift + Enter` runs a cell and moves to the next one.\n",
"- `Ctrl + Enter` runs a cell and stays put.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is the project about?\n",
"\n",
"*Drosophila* males court females with a stereotyped sequence (chasing,\n",
"wing extension, tapping). When a male is rejected by a female (e.g.\n",
"because she's already mated), he **learns** to suppress his courtship —\n",
"even toward new, receptive females, for a while. This is a textbook\n",
"example of *non-associative learning* in invertebrates ([review on\n",
"PubMed](https://pubmed.ncbi.nlm.nih.gov/?term=courtship+conditioning+drosophila)).\n",
"\n",
"The lab is interested in:\n",
"\n",
"- Does this learning **transfer across species**? (We have ~7 *Drosophila*\n",
" species recorded.)\n",
"- How long does the memory last? (The `training_length_hr` and\n",
" `consolidation_length_hr` columns in the metadata record how long each\n",
" male was trained and how long he waited before testing.)\n",
"- Are there **individual differences** — do some males learn while others\n",
" don't? (The \"bimodal hypothesis\" in `docs/bimodal_hypothesis.md`.)\n",
"\n",
"Your job, broadly, will be to **turn videos of flies into numbers and\n",
"plots that answer these questions.**\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How an experiment works (the bird's-eye view)\n",
"\n",
"1. **Training**: a male fly is placed with a non-receptive (mated) female.\n",
" He courts, gets rejected, eventually gives up.\n",
"2. *Wait* for some hours (the \"consolidation\" period — gives memory time\n",
" to form).\n",
"3. **Testing**: same male is placed with a fresh receptive female.\n",
" Does he court her vigorously, or has he learned to give up easily?\n",
"\n",
"Each experiment runs in an **HD mating arena** — a small chamber with\n",
"6 sub-arenas (we call them **ROIs**, for \"regions of interest\"). Each ROI\n",
"contains one couple (a male and a female). A camera films the whole arena\n",
"from above. So one **video** gives us 6 simultaneous experiments.\n",
"\n",
"The setup uses [Ethoscopes](https://www.ethoscope.com/), open-source\n",
"behavioural recording boxes built in this lab. Each ethoscope is a\n",
"separate machine with its own ID; we have 16 in total, named\n",
"`ETHOSCOPE_067`, `ETHOSCOPE_076`, etc.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What does the data look like?\n",
"\n",
"For each video, the **tracker** (a piece of software that runs after the\n",
"recording) finds the flies frame-by-frame and writes their positions to a\n",
"**SQLite database** (a single file, ending in `.db`). One DB per video.\n",
"Inside each DB there are 6 tables called `ROI_1`, `ROI_2`, …, `ROI_6` —\n",
"one per sub-arena. Each row of an ROI table is **one fly detection at one\n",
"moment in time** with these columns:\n",
"\n",
"| column | meaning |\n",
"|---|---|\n",
"| `id` | row number (auto-incremented) |\n",
"| `t` | time in **milliseconds** since the video started |\n",
"| `x`, `y` | fly position in **pixels** (top-left corner of the image is 0,0) |\n",
"| `w`, `h` | width and height of the bounding box around the fly, in pixels |\n",
"| `phi` | orientation angle of the fly |\n",
"| `is_inferred` | 1 if the position was guessed (not directly seen), 0 otherwise |\n",
"| `has_interacted` | (legacy column, mostly unused) |\n",
"\n",
"If a single ROI has two flies that the tracker can see, you'll get **two\n",
"rows with the same `t`** — one for each fly. If only one fly is detected\n",
"(maybe they're on top of each other), you'll get one row.\n",
"\n",
"That's the heart of the data. Everything else (distances, velocities,\n",
"group comparisons) is computed from these (t, x, y) traces.\n"
]
},
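{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the table above concrete, here is a small sketch that does **not**\n",
"touch the real data. It builds a throwaway in-memory SQLite table shaped\n",
"like `ROI_1` (same column names as above, invented values), reads it into\n",
"pandas, counts how many flies were detected at each `t`, and computes the\n",
"distance in pixels between the two flies wherever both were seen. The\n",
"values and the `toy_rows` name are made up for illustration; notebook `02`\n",
"shows how to open a real tracking DB.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import sqlite3\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# A throwaway database that lives only in memory (nothing is written to disk).\n",
"con = sqlite3.connect(':memory:')\n",
"con.execute(\n",
"    'CREATE TABLE ROI_1 (id INTEGER PRIMARY KEY, t INTEGER, x REAL, y REAL, '\n",
"    'w REAL, h REAL, phi REAL, is_inferred INTEGER, has_interacted INTEGER)'\n",
")\n",
"\n",
"# Invented detections (t, x, y, w, h, phi, is_inferred, has_interacted):\n",
"# both flies seen at t=0 and t=40 ms, only one blob (flies overlapping) at t=80 ms.\n",
"toy_rows = [\n",
"    (0, 100.0, 50.0, 12.0, 6.0, 10.0, 0, 0),\n",
"    (0, 130.0, 90.0, 11.0, 6.0, 95.0, 0, 0),\n",
"    (40, 104.0, 52.0, 12.0, 6.0, 12.0, 0, 0),\n",
"    (40, 116.0, 57.0, 11.0, 6.0, 90.0, 0, 0),\n",
"    (80, 110.0, 60.0, 20.0, 9.0, 45.0, 0, 0),\n",
"]\n",
"con.executemany(\n",
"    'INSERT INTO ROI_1 (t, x, y, w, h, phi, is_inferred, has_interacted) '\n",
"    'VALUES (?, ?, ?, ?, ?, ?, ?, ?)',\n",
"    toy_rows,\n",
")\n",
"\n",
"# One row per detection, exactly like a real ROI table.\n",
"df = pd.read_sql_query('SELECT * FROM ROI_1 ORDER BY t, id', con)\n",
"\n",
"# Detections per moment: 2 = both flies seen, 1 = only one blob found.\n",
"flies_per_t = df.groupby('t').size()\n",
"print(flies_per_t)\n",
"\n",
"# Keep only the moments with exactly two rows, then take the pixel distance\n",
"# between the first and the second fly at each of those moments.\n",
"both_seen = df[df.groupby('t')['t'].transform('count') == 2]\n",
"pairs = both_seen.groupby('t').agg(\n",
"    x0=('x', 'first'), y0=('y', 'first'),\n",
"    x1=('x', 'last'), y1=('y', 'last'),\n",
")\n",
"distance_px = np.hypot(pairs['x1'] - pairs['x0'], pairs['y1'] - pairs['y0'])\n",
"print(distance_px)\n"
]
},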
{
"cell_type": "markdown",
"metadata": {},
"source": "## Where everything lives\n\nTake a moment to memorize these locations; you'll come back to them often.\n\n| what | where |\n|---|---|\n| Tracking DBs (SQLite, one per video) | `/mnt/data/projects/cupido/tracked/` |\n| Target JSONs (the user-clicked reference points) | `/mnt/data/projects/cupido/targets/` |\n| The metadata table (xlsx + TSV) | `/mnt/data/projects/cupido/all_video_info_merged.tsv` |\n| Source video files | `/mnt/ethoscope_data/videos/` |\n| Project code (this repo) | `~/cupido/` (in your home directory inside the container) |\n| Your notebooks | `~/cupido/notebooks/getting_started/` (this folder) |\n\nNotice the pattern: apart from the source videos, **everything bulky or\nregenerable lives under `/mnt/data/projects/cupido/`**. The repository\nitself, with all the code and notebooks, is checked out into your home\ndirectory inside the JupyterLab container: that's `~/cupido`, where `~`\nis shorthand for your home directory (`/home/<your-username>`).\n\nWe'll refer to those two roots as `DATA_DIR` and `REPO_ROOT` from here on.\n\nLet's verify a couple of these from inside Python:\n"
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": "from pathlib import Path\n\n# The two roots we keep coming back to.\nDATA_DIR = Path(\"/mnt/data/projects/cupido\") # bulky data, mounted into the container\nREPO_ROOT = Path.home() / \"cupido\" # the code repo in your home dir\n\ntracked_dir = DATA_DIR / \"tracked\"\ntargets_dir = DATA_DIR / \"targets\"\nmetadata_tsv = DATA_DIR / \"all_video_info_merged.tsv\"\n\nprint(f\"Repo root: {REPO_ROOT} (exists={REPO_ROOT.exists()})\")\nprint(f\"Tracking DBs available: {len(list(tracked_dir.glob('*_tracking.db')))}\")\nprint(f\"Target JSONs available: {len(list(targets_dir.glob('*.json')))}\")\nprint(f\"Metadata TSV exists: {metadata_tsv.exists()}\")\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "You should see roughly 113 tracking DBs and 130 target JSONs. If those\nnumbers are zero, the storage volume isn't mounted — ask Giorgio.\n\n> **Note**: the data volume is **read-only** inside the JupyterLab\n> container. You can read everything but not modify or delete it. That's\n> a deliberate safety measure — we don't want analysis code accidentally\n> corrupting the source data.\n\n### Personalising the metadata TSV\n\nBecause the volume is read-only, the shared metadata file\n`all_video_info_merged.tsv` cannot be edited in place. If you want to\nmark a row as \"skip this fly\" — e.g. by flipping its `include` column to\n`False` because the video is too noisy — copy the file to your home\nfolder **once**:\n\n```bash\ncp /mnt/data/projects/cupido/all_video_info_merged.tsv ~/cupido_metadata.tsv\n```\n\nThe notebooks check for `~/cupido_metadata.tsv` first and fall back to\nthe shared master if your personal copy doesn't exist. Each user keeps\ntheir own edits; nobody steps on anyone else's analysis.\n"
},
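{
"cell_type": "markdown",
"metadata": {},
"source": [
"The lookup order described above fits in a few lines of Python. This is a\n",
"minimal sketch assuming the two paths given in this notebook; the helper\n",
"name `pick_metadata_path` is hypothetical, and the analysis notebooks may\n",
"implement the same check differently.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"# Shared master copy (read-only) and your optional personal copy.\n",
"SHARED_METADATA = Path('/mnt/data/projects/cupido/all_video_info_merged.tsv')\n",
"PERSONAL_METADATA = Path.home() / 'cupido_metadata.tsv'\n",
"\n",
"\n",
"# Hypothetical helper: personal copy first, shared master as the fallback.\n",
"def pick_metadata_path(personal=PERSONAL_METADATA, shared=SHARED_METADATA):\n",
"    \"\"\"Return the personal copy if it exists, otherwise the shared master.\"\"\"\n",
"    return personal if personal.exists() else shared\n",
"\n",
"\n",
"metadata_path = pick_metadata_path()\n",
"print(f'Using metadata from: {metadata_path}')\n"
]
},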
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Glossary (refer back as needed)\n",
"\n",
"- **ROI** — *region of interest*. One sub-arena inside the HD mating\n",
" arena. There are 6 ROIs per video, numbered 1 to 6.\n",
"- **fly** — in the tracking tables, one detection (one row) at a given\n",
" `t` in a given ROI. Two flies in the same ROI at the same time = two\n",
" rows with the same `t`.\n",
"- **trained** — the male had a training session before testing.\n",
"- **naive** — the male is a control (no training).\n",
"- **training session** — the recording where the male meets the\n",
" non-receptive female (he gets rejected).\n",
"- **testing session** — the recording where the male meets a fresh\n",
" receptive female (we measure his courtship).\n",
"- **t (milliseconds)** — time within one session, starting at 0.\n",
"- **(x, y) pixels** — fly position in the image. Top-left is (0, 0); x\n",
" grows to the right, y grows **downward** (this is the image-coordinate\n",
" convention, opposite of math class).\n",
"- **machine_name** — which ethoscope recorded the video, e.g.\n",
" `ETHOSCOPE_076`.\n",
"- **species** — `Melanogaster/CS`, `Sechellia`, `Simulans`, `Yakuba`,\n",
" `Erecta`, `Willistoni`, or `CS`.\n",
"\n",
"If you bump into other terms in the code, ask. Don't guess — biology\n",
"codebases pick up jargon over the years.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What's next\n",
"\n",
"When you're ready, open these notebooks **in order**:\n",
"\n",
"1. `01_python_pandas_basics.ipynb` — just enough Python and pandas to\n",
" read and manipulate tabular data.\n",
"2. `02_explore_one_database.ipynb` — open one tracking DB, plot a fly's\n",
" trajectory, see what the numbers actually look like.\n",
"3. `03_compare_trained_vs_naive.ipynb` — your first real analysis,\n",
" comparing groups of flies.\n",
"\n",
"After those, the notebooks one level up (`flies_analysis.ipynb`,\n",
"`flies_analysis_simple.ipynb`) contain the analysis pipeline that the\n",
"previous student built — those will make sense once you've worked\n",
"through the tutorials.\n",
"\n",
"Don't try to power through all of them in one sitting. Run a few cells,\n",
"read the explanation, **change a number** to see what happens, **break\n",
"something on purpose** to see the error message. That's how you learn.\n"
]
}
]
}