Add beginner tutorial notebooks for incoming students

Four guided notebooks under notebooks/getting_started/ aimed at someone
new to Python and data science. The series progresses: project orientation
→ Python/pandas crash course → exploring one tracking DB → first
trained-vs-naive comparison using load_roi_data + Mann-Whitney U.

Each notebook leans heavily on markdown explanations, includes exercises
with empty cells, and links out to canonical references (JupyterLab,
official Python tutorial, pandas 10-min guide, Wikipedia for stats
concepts).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Giorgio Gilestro 2026-04-30 18:14:17 +01:00
parent 7d09523840
commit ec56e51bf9
5 changed files with 1607 additions and 0 deletions

@ -0,0 +1,255 @@
{
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 00 \u00b7 Welcome to the Cupido fly-tracking project\n",
"\n",
"Hi! You're about to start working on a project that studies how *Drosophila*\n",
"(fruit flies) form **memories of courtship experiences**, and whether trained\n",
"flies behave differently from na\u00efve ones when they court again.\n",
"\n",
"**You don't need any prior experience with Python or data science to follow\n",
"along.** This series of notebooks will walk you through everything, one\n",
"small step at a time.\n",
"\n",
"> **How to read these notebooks**: each notebook is split into \"cells\".\n",
"> Some cells are explanations (like this one), others are code that you\n",
"> can **run** by clicking on the cell and pressing `Shift + Enter`. Try it\n",
"> on the next cell.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# This is a code cell. Click on it and press Shift+Enter to run it.\n",
"print(\"Hello, fly world!\")\n",
"1 + 1\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should have seen `Hello, fly world!` printed and the number `2`\n",
"appear underneath. If something else happened, ask Giorgio \u2014 that's a\n",
"sign the environment isn't set up right.\n",
"\n",
"If this is the very first time you're using JupyterLab, take 10 minutes\n",
"to read the [official \"Getting started with JupyterLab\"\n",
"guide](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html).\n",
"The most important things to know are:\n",
"\n",
"- A notebook (`.ipynb` file) is a sequence of **cells**.\n",
"- Each cell is either **Markdown** (formatted text, like this) or **Code**\n",
" (Python that the computer runs).\n",
"- The **kernel** is the running Python process behind the notebook. It\n",
" remembers everything you've defined, even in cells you have since edited\n",
" or deleted. If results stop making sense, restart the kernel: top menu\n",
" \u2192 *Kernel* \u2192 *Restart Kernel\u2026*.\n",
"- `Shift + Enter` runs a cell and moves to the next one.\n",
"- `Ctrl + Enter` runs a cell and stays put.\n"
]
},
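{
"cell_type": "markdown",
"metadata": {},
"source": [
"Try the kernel's memory on the small, made-up example below. The cell\n",
"stores a number in a variable called `n_flies` (the name and the value are\n",
"just for illustration) and uses it straight away. After running it, add a\n",
"new cell underneath (the **+** button in the notebook toolbar), type\n",
"`n_flies + 1` and run it: you get `13`, because the kernel still remembers\n",
"`n_flies`. Now restart the kernel and run only that new cell. You get a\n",
"`NameError` instead, because the restart wiped the kernel's memory.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# Store a value in a variable; the kernel keeps it until it is restarted\n",
"n_flies = 12\n",
"\n",
"# Use the variable in a calculation (// is whole-number division)\n",
"print(f\"{n_flies} flies make {n_flies // 2} male-female couples\")\n"
]
},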
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is the project about?\n",
"\n",
"*Drosophila* males court females with a stereotyped sequence of actions\n",
"(orienting towards her, following, tapping, wing extension or \"singing\",\n",
"licking, attempted copulation). When a male is rejected by a female (e.g.\n",
"because she's already mated), he **learns** to suppress his courtship for a\n",
"while, even toward new, receptive females. This paradigm, known as\n",
"*courtship conditioning*, is a classic example of learning and memory in\n",
"invertebrates ([PubMed search for courtship-conditioning\n",
"papers](https://pubmed.ncbi.nlm.nih.gov/?term=courtship+conditioning+drosophila)).\n",
"\n",
"The lab is interested in:\n",
"\n",
"- Does this learning **transfer across species**? (We have ~7 *Drosophila*\n",
" species recorded.)\n",
"- How long does the memory last? (`training_length_hr`,\n",
" `consolidation_length_hr` columns in the metadata.)\n",
"- Are there **individual differences** \u2014 do some males learn while others\n",
" don't? (The \"bimodal hypothesis\" in `docs/bimodal_hypothesis.md`.)\n",
"\n",
"Your job, broadly, will be to **turn videos of flies into numbers and\n",
"plots that answer these questions.**\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How an experiment works (the bird's-eye view)\n",
"\n",
"1. **Training**: a male fly is placed with a non-receptive (mated) female.\n",
" He courts, gets rejected, eventually gives up.\n",
"2. *Wait* for some hours (the \"consolidation\" period, which gives the\n",
" memory time to form).\n",
"3. **Testing**: the same male is placed with a fresh, receptive female.\n",
" Does he court her vigorously, or has he learned to give up easily?\n",
"\n",
"Each experiment runs in an **HD mating arena** \u2014 a small chamber with\n",
"6 sub-arenas (we call them **ROIs**, for \"regions of interest\"). Each ROI\n",
"contains one couple (a male and a female). A camera films the whole arena\n",
"from above. So one **video** gives us 6 simultaneous experiments.\n",
"\n",
"The setup uses [Ethoscopes](https://www.ethoscope.com/), open-source\n",
"behavioural recording machines built in this lab. We have 16 of them, each\n",
"identified by its machine name (`ETHOSCOPE_067`, `ETHOSCOPE_076`, etc.).\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What does the data look like?\n",
"\n",
"For each video, the **tracker** (a piece of software that runs after the\n",
"recording) finds the flies frame-by-frame and writes their positions to a\n",
"**SQLite database** (a single file, ending in `.db`). One DB per video.\n",
"Inside each DB there are 6 tables called `ROI_1`, `ROI_2`, \u2026, `ROI_6` \u2014\n",
"one per sub-arena. Each row of an ROI table is **one fly detection at one\n",
"moment in time** with these columns:\n",
"\n",
"| column | meaning |\n",
"|---|---|\n",
"| `id` | row number (auto-incremented) |\n",
"| `t` | time in **milliseconds** since the video started |\n",
"| `x`, `y` | fly position in **pixels** (top-left corner of the image is 0,0) |\n",
"| `w`, `h` | width and height of the bounding box around the fly, in pixels |\n",
"| `phi` | orientation angle of the fly |\n",
"| `is_inferred` | 1 if the position was guessed (not directly seen), 0 otherwise |\n",
"| `has_interacted` | (legacy column, mostly unused) |\n",
"\n",
"If a single ROI has two flies that the tracker can see, you'll get **two\n",
"rows with the same `t`** \u2014 one for each fly. If only one fly is detected\n",
"(maybe they're on top of each other), you'll get one row.\n",
"\n",
"That's the heart of the data. Everything else (distances, velocities,\n",
"group comparisons) is computed from these (t, x, y) traces.\n"
]
},
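{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see the \"two rows with the same `t`\" idea in action without touching the\n",
"real data, the cell below builds a tiny **in-memory** SQLite database (it\n",
"lives only in RAM and disappears when closed) with one table shaped like\n",
"`ROI_1`. The column names match the table above. The column types and all\n",
"the numbers are made up for illustration. The cell then asks SQLite how many\n",
"detections exist at each `t`. A count of 2 means both flies were seen, and 1\n",
"means only one was. `02_explore_one_database.ipynb` opens a real tracking DB.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import sqlite3\n",
"\n",
"# An in-memory database: it lives only in RAM and vanishes when closed\n",
"con = sqlite3.connect(\":memory:\")\n",
"\n",
"# Same column names as a real ROI table (the types here are a guess)\n",
"con.execute(\n",
"    \"CREATE TABLE ROI_1 (id INTEGER PRIMARY KEY, t INTEGER, x REAL, y REAL, \"\n",
"    \"w REAL, h REAL, phi REAL, is_inferred INTEGER, has_interacted INTEGER)\"\n",
")\n",
"\n",
"# Made-up detections: two flies at t=0 and t=200 ms, only one at t=100 ms\n",
"rows = [\n",
"    (0, 120.0, 85.0, 20.0, 9.0, 45.0, 0, 0),\n",
"    (0, 140.0, 90.0, 21.0, 8.0, 130.0, 0, 0),\n",
"    (100, 125.0, 84.0, 20.0, 9.0, 50.0, 0, 0),\n",
"    (200, 130.0, 83.0, 20.0, 9.0, 55.0, 0, 0),\n",
"    (200, 138.0, 92.0, 21.0, 8.0, 128.0, 0, 0),\n",
"]\n",
"con.executemany(\n",
"    \"INSERT INTO ROI_1 (t, x, y, w, h, phi, is_inferred, has_interacted) \"\n",
"    \"VALUES (?, ?, ?, ?, ?, ?, ?, ?)\",\n",
"    rows,\n",
")\n",
"\n",
"# GROUP BY t puts rows with the same time together; COUNT(*) counts them\n",
"query = \"SELECT t, COUNT(*) FROM ROI_1 GROUP BY t ORDER BY t\"\n",
"for t, n in con.execute(query):\n",
"    print(f\"t = {t} ms: {n} detection(s)\")\n",
"\n",
"con.close()\n"
]
},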
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Where everything lives\n",
"\n",
"Take a moment to memorize these locations \u2014 you'll come back to them often.\n",
"\n",
"| what | where |\n",
"|---|---|\n",
"| Tracking DBs (SQLite, one per video) | `/mnt/data/projects/cupido/tracked/` |\n",
"| Target JSONs (the user-clicked reference points) | `/mnt/data/projects/cupido/targets/` |\n",
"| Source video files | `/mnt/ethoscope_data/videos/` |\n",
"| Project code (this repo) | `/home/gg/ownCloud/Work/Projects/coding/cupido/tracking/` |\n",
"| The metadata table (xlsx + TSV) | `/home/gg/ownCloud/Work/Projects/coding/cupido/all_video_info_merged.tsv` |\n",
"| Your notebooks | `notebooks/getting_started/` (this folder) |\n",
"\n",
"Let's verify a couple of these from inside Python:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"# The two data folders from the table above\n",
"tracked = Path(\"/mnt/data/projects/cupido/tracked\")\n",
"targets = Path(\"/mnt/data/projects/cupido/targets\")\n",
"\n",
"# glob() finds every file whose name matches the pattern; len() counts them\n",
"n_dbs = len(list(tracked.glob(\"*_tracking.db\")))\n",
"n_jsons = len(list(targets.glob(\"*.json\")))\n",
"\n",
"print(f\"Tracking DBs available: {n_dbs}\")\n",
"print(f\"Target JSONs available: {n_jsons}\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At the time of writing you should see roughly 113 tracking DBs and 130\n",
"target JSONs. If those numbers are zero, the storage volume isn't mounted:\n",
"ask Giorgio.\n",
"\n",
"> **Note**: the tracking DBs are read-only inside the JupyterLab\n",
"> container. You can read them but not modify or delete them. That's a\n",
"> deliberate safety measure \u2014 we don't want analysis code accidentally\n",
"> corrupting the source data.\n"
]
},
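{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want the same protection in your own scripts, SQLite can open a file\n",
"in **read-only mode** when it is given a `file:` address ending in\n",
"`?mode=ro`. The cell below is a minimal sketch of this. It creates a\n",
"throwaway database in a temporary folder as a stand-in for a real tracking\n",
"DB, reopens it read-only, lists its tables, and shows that a write is\n",
"refused. The helper name `open_readonly` is made up for this example.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import sqlite3\n",
"import tempfile\n",
"from pathlib import Path\n",
"\n",
"\n",
"def open_readonly(db_path):\n",
"    \"\"\"Open an SQLite file so that any attempt to write raises an error.\"\"\"\n",
"    # as_uri() turns the path into a file:// address; mode=ro means read-only\n",
"    uri = Path(db_path).resolve().as_uri() + \"?mode=ro\"\n",
"    return sqlite3.connect(uri, uri=True)\n",
"\n",
"\n",
"with tempfile.TemporaryDirectory() as tmp:\n",
"    # Build a throwaway DB standing in for a real tracking DB\n",
"    demo_path = Path(tmp) / \"demo_tracking.db\"\n",
"    setup = sqlite3.connect(demo_path)\n",
"    setup.execute(\"CREATE TABLE ROI_1 (t INTEGER, x REAL, y REAL)\")\n",
"    setup.commit()\n",
"    setup.close()\n",
"\n",
"    con = open_readonly(demo_path)\n",
"    try:\n",
"        # sqlite_master is SQLite's built-in list of everything in the file\n",
"        query = \"SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name\"\n",
"        print(\"Tables:\", [name for (name,) in con.execute(query)])\n",
"        con.execute(\"INSERT INTO ROI_1 VALUES (0, 1.0, 2.0)\")\n",
"    except sqlite3.OperationalError as err:\n",
"        print(\"Write refused:\", err)\n",
"    finally:\n",
"        con.close()\n"
]
},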
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Glossary (refer back as needed)\n",
"\n",
"- **ROI** \u2014 *region of interest*. One sub-arena inside the HD mating\n",
" arena. There are 6 ROIs per video, numbered 1\u20136.\n",
"- **fly** \u2014 in the tracking data, one detection (one row of an ROI\n",
" table) at a given time `t`. Two flies in the same ROI at the same time =\n",
" two rows with the same `t`.\n",
"- **trained** \u2014 the male had a training session before testing.\n",
"- **naive** \u2014 the male is a control (no training).\n",
"- **training session** \u2014 the recording where the male meets the\n",
" non-receptive female (he gets rejected).\n",
"- **testing session** \u2014 the recording where the male meets a fresh\n",
" receptive female (we measure his courtship).\n",
"- **t (milliseconds)** \u2014 time within one session, starting at 0.\n",
"- **(x, y) pixels** \u2014 fly position in the image. Top-left is (0, 0); x\n",
" grows to the right, y grows **downward** (this is the image-coordinate\n",
" convention, opposite of math class).\n",
"- **machine_name** \u2014 which ethoscope recorded the video, e.g.\n",
" `ETHOSCOPE_076`.\n",
"- **species** \u2014 `Melanogaster/CS`, `Sechellia`, `Simulans`, `Yakuba`,\n",
" `Erecta`, `Willistoni`, or `CS` (CS is Canton-S, a widely used wild-type\n",
" *D. melanogaster* strain).\n",
"\n",
"If you bump into other terms in the code, ask. Don't guess \u2014 biology\n",
"codebases pick up jargon over the years.\n"
]
},
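{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is how two glossary entries, **t (milliseconds)** and\n",
"**(x, y) pixels**, combine into a speed. The three positions below are\n",
"invented and are assumed to belong to one fly. In real data you first need\n",
"to decide which row is which fly whenever two rows share a `t`. Note the\n",
"division by `1000` that turns milliseconds into seconds. Also note that a\n",
"*smaller* `y` means the fly moved **up** in the image.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import math\n",
"\n",
"# Invented single-fly trace: (t in ms, x in px, y in px)\n",
"trace = [(0, 100.0, 200.0), (100, 103.0, 196.0), (200, 103.0, 190.0)]\n",
"\n",
"# Pair each detection with the next one: (first, second), (second, third)\n",
"for (t0, x0, y0), (t1, x1, y1) in zip(trace, trace[1:]):\n",
"    dt_s = (t1 - t0) / 1000  # milliseconds -> seconds\n",
"    dist_px = math.hypot(x1 - x0, y1 - y0)  # straight-line distance in pixels\n",
"    speed = dist_px / dt_s  # pixels per second\n",
"    # y grows downward, so a smaller y means the fly moved up\n",
"    direction = \"up\" if y1 < y0 else \"down\" if y1 > y0 else \"sideways\"\n",
"    print(f\"{t0}-{t1} ms: {speed:.1f} px/s, moving {direction}\")\n"
]
},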
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What's next\n",
"\n",
"When you're ready, open these notebooks **in order**:\n",
"\n",
"1. `01_python_pandas_basics.ipynb` \u2014 just enough Python and pandas to\n",
" read and manipulate tabular data.\n",
"2. `02_explore_one_database.ipynb` \u2014 open one tracking DB, plot a fly's\n",
" trajectory, see what the numbers actually look like.\n",
"3. `03_compare_trained_vs_naive.ipynb` \u2014 your first real analysis,\n",
" comparing groups of flies.\n",
"\n",
"After those, the notebooks one level up (`flies_analysis.ipynb`,\n",
"`flies_analysis_simple.ipynb`) contain the analysis pipeline that the\n",
"previous student built \u2014 those will make sense once you've worked\n",
"through the tutorials.\n",
"\n",
"Don't try to power through all of them in one sitting. Run a few cells,\n",
"read the explanation, **change a number** to see what happens, **break\n",
"something on purpose** to see the error message. That's how you learn.\n"
]
}
]
}