Personal copy of all_video_info_merged.tsv now lives at ~/cupido/data/metadata/all_video_info_merged.tsv (gitignored) instead of ~/cupido_metadata.tsv. That sits next to the other small metadata CSVs (barrier_opening, etc.) — the natural home for it. Updated all five notebooks and processed/README accordingly. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
221 lines
No EOL
12 KiB
Text
{
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 00 · Welcome to the Cupido fly-tracking project\n",
"\n",
"Hi! You're about to start working on a project that studies how *Drosophila*\n",
"(fruit flies) form **memories of mating experiences**, and whether trained\n",
"flies behave differently from naïve ones in their later courtship.\n",
"\n",
"**You don't need any prior experience with Python or data science to follow\n",
"along.** This series of notebooks will walk you through everything, one\n",
"small step at a time.\n",
"\n",
"> **How to read these notebooks**: each notebook is split into \"cells\".\n",
"> Some cells are explanations (like this one), others are code that you\n",
"> can **run** by clicking on the cell and pressing `Shift + Enter`. Try it\n",
"> on the next cell.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# This is a code cell. Click on it and press Shift+Enter to run it.\n",
"print(\"Hello, fly world!\")\n",
"1 + 1\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should have seen `Hello, fly world!` printed and the number `2`\n",
"appear underneath. If something else happened, ask Giorgio: that's a\n",
"sign the environment isn't set up right.\n",
"\n",
"If this is the very first time you're using JupyterLab, take 10 minutes\n",
"to read the [official \"Getting started with JupyterLab\"\n",
"guide](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html).\n",
"The most important things to know are:\n",
"\n",
"- A notebook (`.ipynb` file) is a sequence of **cells**.\n",
"- Each cell is either **Markdown** (formatted text, like this) or **Code**\n",
" (Python that the computer runs).\n",
"- The **kernel** is the running Python process behind the notebook. It\n",
" remembers everything you've defined. If something gets weird, restart\n",
" the kernel: top menu → *Kernel* → *Restart Kernel…*.\n",
"- `Shift + Enter` runs a cell and moves to the next one.\n",
"- `Ctrl + Enter` runs a cell and stays put.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is the project about?\n",
"\n",
"Drosophila males court females with a stereotyped sequence (chasing,\n",
"wing-extension, tapping). When a male is rejected by a female (e.g.\n",
"because she's already mated), he **learns** to suppress his courtship\n",
"for a while, even toward new, receptive females. This is a textbook\n",
"example of *associative learning* in invertebrates ([review on\n",
"PubMed](https://pubmed.ncbi.nlm.nih.gov/?term=courtship+conditioning+drosophila)).\n",
"\n",
"The lab is interested in:\n",
"\n",
"- Does this learning **transfer across species**? (We have ~7 *Drosophila*\n",
" species recorded.)\n",
"- How long does the memory last? (See the `training_length_hr` and\n",
" `consolidation_length_hr` columns in the metadata.)\n",
"- Are there **individual differences**, i.e. do some males learn while\n",
" others don't? (The \"bimodal hypothesis\" in `docs/bimodal_hypothesis.md`.)\n",
"\n",
"Your job, broadly, will be to **turn videos of flies into numbers and\n",
"plots that answer these questions.**\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How an experiment works (the bird's-eye view)\n",
"\n",
"1. **Training**: a male fly is placed with a non-receptive (mated) female.\n",
" He courts, gets rejected, and eventually gives up.\n",
"2. *Wait* for some hours (the \"consolidation\" period, which gives memory\n",
" time to form).\n",
"3. **Testing**: the same male is placed with a fresh receptive female.\n",
" Does he court her vigorously, or has he learned to give up easily?\n",
"\n",
"Each experiment runs in an **HD mating arena**, a small chamber with\n",
"6 sub-arenas (we call them **ROIs**, for \"regions of interest\"). Each ROI\n",
"contains one couple (a male and a female). A camera films the whole arena\n",
"from above. So one **video** gives us 6 simultaneous experiments.\n",
"\n",
"The setup uses [Ethoscopes](https://www.ethoscope.com/), open-source\n",
"behavioural recording boxes built in this lab. Each ethoscope is a\n",
"machine; we have 16 in total, named `ETHOSCOPE_067`, `ETHOSCOPE_076`, etc.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What does the data look like?\n",
"\n",
"For each video, the **tracker** (a piece of software that runs after the\n",
"recording) finds the flies frame-by-frame and writes their positions to a\n",
"**SQLite database** (a single file, ending in `.db`). One DB per video.\n",
"Inside each DB there are 6 tables called `ROI_1`, `ROI_2`, …, `ROI_6`,\n",
"one per sub-arena. Each row of an ROI table is **one fly detection at one\n",
"moment in time** with these columns:\n",
"\n",
"| column | meaning |\n",
"|---|---|\n",
"| `id` | row number (auto-incremented) |\n",
"| `t` | time in **milliseconds** since the video started |\n",
"| `x`, `y` | fly position in **pixels** (top-left corner of the image is 0,0) |\n",
"| `w`, `h` | width and height of the bounding box around the fly, in pixels |\n",
"| `phi` | orientation angle of the fly |\n",
"| `is_inferred` | 1 if the position was guessed (not directly seen), 0 otherwise |\n",
"| `has_interacted` | (legacy column, mostly unused) |\n",
"\n",
"If a single ROI has two flies that the tracker can see, you'll get **two\n",
"rows with the same `t`**, one for each fly. If only one fly is detected\n",
"(maybe they're on top of each other), you'll get one row.\n",
"\n",
"That's the heart of the data. Everything else (distances, velocities,\n",
"group comparisons) is computed from these (t, x, y) traces.\n"
]
},
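{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the table layout concrete, here is a minimal sketch (not the lab's\n",
"actual loading code). It builds a tiny in-memory SQLite database with one\n",
"`ROI_1` table and the columns listed above, then counts detections per\n",
"timestamp. The rows are made-up toy values and the SQL column types are\n",
"assumptions; real tracking DBs are `.db` files written by the tracker.\n",
"With these toy rows the cell prints `[(0, 2), (40, 1)]`: two detections\n",
"at `t = 0` and one at `t = 40`.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import sqlite3\n",
"\n",
"# Toy stand-in for a tracking DB (assumption: the column types are guesses).\n",
"con = sqlite3.connect(':memory:')\n",
"con.execute(\n",
"    'CREATE TABLE ROI_1 (id INTEGER PRIMARY KEY, t INTEGER, x INTEGER, y INTEGER, '\n",
"    'w INTEGER, h INTEGER, phi INTEGER, is_inferred INTEGER, has_interacted INTEGER)'\n",
")\n",
"\n",
"# Made-up rows: (t, x, y, w, h, phi, is_inferred, has_interacted).\n",
"toy_rows = [\n",
"    (0, 100, 200, 12, 6, 45, 0, 0),   # t = 0 ms, first fly\n",
"    (0, 300, 210, 11, 6, 90, 0, 0),   # t = 0 ms, second fly\n",
"    (40, 102, 201, 12, 6, 50, 0, 0),  # t = 40 ms, only one fly detected\n",
"]\n",
"con.executemany(\n",
"    'INSERT INTO ROI_1 (t, x, y, w, h, phi, is_inferred, has_interacted) '\n",
"    'VALUES (?, ?, ?, ?, ?, ?, ?, ?)',\n",
"    toy_rows,\n",
")\n",
"\n",
"# Two rows sharing a timestamp means both flies were detected at that moment.\n",
"detections_per_t = con.execute(\n",
"    'SELECT t, COUNT(*) FROM ROI_1 GROUP BY t ORDER BY t'\n",
").fetchall()\n",
"print(detections_per_t)\n",
"con.close()\n"
]
},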
{
"cell_type": "markdown",
"metadata": {},
"source": "## Where everything lives\n\nTake a moment to memorize these locations — you'll come back to them often.\n\n| what | where |\n|---|---|\n| Tracking DBs (SQLite, one per video) | `/mnt/data/projects/cupido/tracked/` |\n| Target JSONs (the user-clicked reference points) | `/mnt/data/projects/cupido/targets/` |\n| The metadata table (xlsx + TSV) | `/mnt/data/projects/cupido/all_video_info_merged.tsv` |\n| Source video files | `/mnt/ethoscope_data/videos/` |\n| Project code (this repo) | `~/cupido/` (your home directory inside the container) |\n| Your notebooks | `~/cupido/notebooks/getting_started/` (this folder) |\n\nNotice the pattern: **everything bulky or regenerable lives under\n`/mnt/data/projects/cupido/`**. The repository itself, with all the\ncode and notebooks, is checked out into your home directory inside the\nJupyterLab container — that's `~/cupido`, where `~` is shorthand for\nyour home directory (`/home/<your-username>`).\n\nWe'll refer to those two roots as `DATA_DIR` and `REPO_ROOT` from here on.\n\nLet's verify a couple of these from inside Python:\n"
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": "from pathlib import Path\n\n# The two roots we keep coming back to.\nDATA_DIR = Path(\"/mnt/data/projects/cupido\") # bulky data, mounted into the container\nREPO_ROOT = Path.home() / \"cupido\" # the code repo in your home dir\n\ntracked_dir = DATA_DIR / \"tracked\"\ntargets_dir = DATA_DIR / \"targets\"\nmetadata_tsv = DATA_DIR / \"all_video_info_merged.tsv\"\n\nprint(f\"Repo root: {REPO_ROOT} (exists={REPO_ROOT.exists()})\")\nprint(f\"Tracking DBs available: {len(list(tracked_dir.glob('*_tracking.db')))}\")\nprint(f\"Target JSONs available: {len(list(targets_dir.glob('*.json')))}\")\nprint(f\"Metadata TSV exists: {metadata_tsv.exists()}\")\n"
},
{
"cell_type": "markdown",
"metadata": {},
"source": "You should see roughly 113 tracking DBs and 130 target JSONs. If those\nnumbers are zero, the storage volume isn't mounted — ask Giorgio.\n\n> **Note**: the data volume is **read-only** inside the JupyterLab\n> container. You can read everything but not modify or delete it. That's\n> a deliberate safety measure — we don't want analysis code accidentally\n> corrupting the source data.\n\n### Personalising the metadata TSV\n\nBecause the volume is read-only, the shared metadata file\n`all_video_info_merged.tsv` cannot be edited in place. If you want to\nmark a row as \"skip this fly\" — e.g. by flipping its `include` column to\n`False` because the video is too noisy — copy the file into the repo's\n`data/metadata/` folder **once**:\n\n```bash\ncp /mnt/data/projects/cupido/all_video_info_merged.tsv ~/cupido/data/metadata/\n```\n\nThat location is gitignored, so your edits stay local. The notebooks\ncheck for `~/cupido/data/metadata/all_video_info_merged.tsv` first and\nfall back to the shared master if your personal copy doesn't exist.\nEach user keeps their own edits; nobody steps on anyone else's analysis.\n"
},
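{
"cell_type": "markdown",
"metadata": {},
"source": [
"The lookup order described above could be sketched roughly as follows.\n",
"The function name `resolve_metadata_tsv` is hypothetical, and the helper\n",
"the notebooks really use may look different; the two paths are the ones\n",
"given in this notebook. The cell prints whichever file would be picked.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"\n",
"def resolve_metadata_tsv(personal, shared):\n",
"    # Hypothetical helper: prefer the personal (editable) copy when it exists,\n",
"    # otherwise fall back to the shared read-only master.\n",
"    return personal if personal.exists() else shared\n",
"\n",
"\n",
"personal_tsv = Path.home() / 'cupido' / 'data' / 'metadata' / 'all_video_info_merged.tsv'\n",
"shared_tsv = Path('/mnt/data/projects/cupido/all_video_info_merged.tsv')\n",
"print(resolve_metadata_tsv(personal_tsv, shared_tsv))\n"
]
},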
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Glossary (refer back as needed)\n",
"\n",
"- **ROI**: *region of interest*. One sub-arena inside the HD mating\n",
" arena. There are 6 ROIs per video, numbered 1–6.\n",
"- **fly**: one detection in a single (t, ROI) cell. Two flies in the\n",
" same ROI at the same time = two rows with the same `t`.\n",
"- **trained**: the male had a training session before testing.\n",
"- **naive**: the male is a control (no training).\n",
"- **training session**: the recording where the male meets the\n",
" non-receptive female (he gets rejected).\n",
"- **testing session**: the recording where the male meets a fresh\n",
" receptive female (we measure his courtship).\n",
"- **t (milliseconds)**: time within one session, starting at 0.\n",
"- **(x, y) pixels**: fly position in the image. Top-left is (0, 0); x\n",
" grows to the right, y grows **downward** (this is the image-coordinate\n",
" convention, the opposite of the one from math class).\n",
"- **machine_name**: which ethoscope recorded the video, e.g.\n",
" `ETHOSCOPE_076`.\n",
"- **species**: `Melanogaster/CS`, `Sechellia`, `Simulans`, `Yakuba`,\n",
" `Erecta`, `Willistoni`, or `CS`.\n",
"\n",
"If you bump into other terms in the code, ask. Don't guess: biology\n",
"codebases pick up jargon over the years.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What's next\n",
"\n",
"When you're ready, open these notebooks **in order**:\n",
"\n",
"1. `01_python_pandas_basics.ipynb`: just enough Python and pandas to\n",
" read and manipulate tabular data.\n",
"2. `02_explore_one_database.ipynb`: open one tracking DB, plot a fly's\n",
" trajectory, see what the numbers actually look like.\n",
"3. `03_compare_trained_vs_naive.ipynb`: your first real analysis,\n",
" comparing groups of flies.\n",
"\n",
"After those, the notebooks one level up (`flies_analysis.ipynb`,\n",
"`flies_analysis_simple.ipynb`) contain the analysis pipeline that the\n",
"previous student built; those will make sense once you've worked\n",
"through the tutorials.\n",
"\n",
"Don't try to power through all of them in one sitting. Run a few cells,\n",
"read the explanation, **change a number** to see what happens, **break\n",
"something on purpose** to see the error message. That's how you learn.\n"
]
}
]
}