Four guided notebooks under notebooks/getting_started/ aimed at someone new to Python and data science. The series progresses: project orientation → Python/pandas crash course → exploring one tracking DB → first trained-vs-naive comparison using load_roi_data + Mann-Whitney U. Each notebook leans heavily on markdown explanations, includes exercises with empty cells, and links out to canonical references (JupyterLab, official Python tutorial, pandas 10-min guide, Wikipedia for stats concepts).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
255 lines
11 KiB
Text
{
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 00 \u00b7 Welcome to the Cupido fly-tracking project\n",
"\n",
"Hi! You're about to start working on a project that studies how *Drosophila*\n",
"(fruit flies) form **memories of mating experiences** \u2014 and whether trained\n",
"flies behave differently from na\u00efve ones in their later courtship.\n",
"\n",
"**You don't need any prior experience with Python or data science to follow\n",
"along.** This series of notebooks will walk you through everything, one\n",
"small step at a time.\n",
"\n",
"> **How to read these notebooks**: each notebook is split into \"cells\".\n",
"> Some cells are explanations (like this one), others are code that you\n",
"> can **run** by clicking on the cell and pressing `Shift + Enter`. Try it\n",
"> on the next cell.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# This is a code cell. Click on it and press Shift+Enter to run it.\n",
"print(\"Hello, fly world!\")\n",
"1 + 1\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should have seen `Hello, fly world!` printed and the number `2`\n",
"appear underneath. If something else happened, ask Giorgio \u2014 that's a\n",
"sign the environment isn't set up right.\n",
"\n",
"If this is the very first time you're using JupyterLab, take 10 minutes\n",
"to read the [official \"Getting started with JupyterLab\"\n",
"guide](https://jupyterlab.readthedocs.io/en/stable/getting_started/overview.html).\n",
"The most important things to know are:\n",
"\n",
"- A notebook (`.ipynb` file) is a sequence of **cells**.\n",
"- Each cell is either **Markdown** (formatted text, like this) or **Code**\n",
" (Python that the computer runs).\n",
"- The **kernel** is the running Python process behind the notebook. It\n",
" remembers everything you've defined. If something gets weird, restart\n",
" the kernel: top menu \u2192 *Kernel* \u2192 *Restart Kernel\u2026*.\n",
"- `Shift + Enter` runs a cell and moves to the next one.\n",
"- `Ctrl + Enter` runs a cell and stays put.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What is the project about?\n",
"\n",
"*Drosophila* males court females with a stereotyped sequence (chasing,\n",
"wing-extension, tapping). When a male is rejected by a female (e.g.\n",
"because she's already mated), he **learns** to suppress his courtship \u2014\n",
"even toward new, receptive females, for a while. This behaviour is known\n",
"as *courtship conditioning*, a textbook example of *associative learning*\n",
"in invertebrates ([PubMed search for\n",
"reviews](https://pubmed.ncbi.nlm.nih.gov/?term=courtship+conditioning+drosophila)).\n",
"\n",
"The lab is interested in:\n",
"\n",
"- Does this learning **transfer across species**? (We have ~7 *Drosophila*\n",
" species recorded.)\n",
"- How long does the memory last? (See the `training_length_hr` and\n",
" `consolidation_length_hr` columns in the metadata.)\n",
"- Are there **individual differences** \u2014 do some males learn while others\n",
" don't? (The \"bimodal hypothesis\" in `docs/bimodal_hypothesis.md`.)\n",
"\n",
"Your job, broadly, will be to **turn videos of flies into numbers and\n",
"plots that answer these questions.**\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## How an experiment works (the bird's-eye view)\n",
"\n",
"1. **Training**: a male fly is placed with a non-receptive (mated) female.\n",
" He courts, gets rejected, eventually gives up.\n",
"2. *Wait* for some hours (the \"consolidation\" period \u2014 gives memory time\n",
" to form).\n",
"3. **Testing**: the same male is placed with a fresh receptive female.\n",
" Does he court her vigorously, or has he learned to give up easily?\n",
"\n",
"Each experiment runs in an **HD mating arena** \u2014 a small chamber with\n",
"6 sub-arenas (we call them **ROIs**, for \"regions of interest\"). Each ROI\n",
"contains one couple (a male and a female). A camera films the whole arena\n",
"from above. So one **video** gives us 6 simultaneous experiments.\n",
"\n",
"The setup uses [Ethoscopes](https://www.ethoscope.com/) \u2014 open-source\n",
"behavioural recording boxes built in this lab. Each ethoscope is a\n",
"machine; we have 16 in total, named `ETHOSCOPE_067`, `ETHOSCOPE_076`, etc.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What does the data look like?\n",
"\n",
"For each video, the **tracker** (a piece of software that runs after the\n",
"recording) finds the flies frame-by-frame and writes their positions to a\n",
"**SQLite database** (a single file, ending in `.db`). One DB per video.\n",
"Inside each DB there are 6 tables called `ROI_1`, `ROI_2`, \u2026, `ROI_6` \u2014\n",
"one per sub-arena. Each row of an ROI table is **one fly detection at one\n",
"moment in time** with these columns:\n",
"\n",
"| column | meaning |\n",
"|---|---|\n",
"| `id` | row number (auto-incremented) |\n",
"| `t` | time in **milliseconds** since the video started |\n",
"| `x`, `y` | fly position in **pixels** (top-left corner of the image is 0,0) |\n",
"| `w`, `h` | width and height of the bounding box around the fly, in pixels |\n",
"| `phi` | orientation angle of the fly |\n",
"| `is_inferred` | 1 if the position was guessed (not directly seen), 0 otherwise |\n",
"| `has_interacted` | (legacy column, mostly unused) |\n",
"\n",
"If a single ROI has two flies that the tracker can see, you'll get **two\n",
"rows with the same `t`** \u2014 one for each fly. If only one fly is detected\n",
"(maybe they're on top of each other), you'll get one row.\n",
"\n",
"That's the heart of the data. Everything else (distances, velocities,\n",
"group comparisons) is computed from these (t, x, y) traces.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Where everything lives\n",
"\n",
"Take a moment to memorize these locations \u2014 you'll come back to them often.\n",
"\n",
"| what | where |\n",
"|---|---|\n",
"| Tracking DBs (SQLite, one per video) | `/mnt/data/projects/cupido/tracked/` |\n",
"| Target JSONs (the user-clicked reference points) | `/mnt/data/projects/cupido/targets/` |\n",
"| Source video files | `/mnt/ethoscope_data/videos/` |\n",
"| Project code (this repo) | `/home/gg/ownCloud/Work/Projects/coding/cupido/tracking/` |\n",
"| The metadata table (xlsx + TSV) | `/home/gg/ownCloud/Work/Projects/coding/cupido/all_video_info_merged.tsv` |\n",
"| Your notebooks | `notebooks/getting_started/` (this folder) |\n",
"\n",
"Let's verify a couple of these from inside Python:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"from pathlib import Path\n",
"\n",
"# Folders holding the tracking databases and the target JSON files.\n",
"tracked = Path(\"/mnt/data/projects/cupido/tracked\")\n",
"targets = Path(\"/mnt/data/projects/cupido/targets\")\n",
"\n",
"# Count the files in each folder that match the expected naming pattern.\n",
"n_dbs = len(list(tracked.glob(\"*_tracking.db\")))\n",
"n_jsons = len(list(targets.glob(\"*.json\")))\n",
"\n",
"print(f\"Tracking DBs available: {n_dbs}\")\n",
"print(f\"Target JSONs available: {n_jsons}\")\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You should see roughly 113 tracking DBs and 130 target JSONs. If those\n",
"numbers are zero, the storage volume isn't mounted \u2014 ask Giorgio.\n",
"\n",
"> **Note**: the tracking DBs are read-only inside the JupyterLab\n",
"> container. You can read them but not modify or delete them. That's a\n",
"> deliberate safety measure \u2014 we don't want analysis code accidentally\n",
"> corrupting the source data.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Glossary (refer back as needed)\n",
"\n",
"- **ROI** \u2014 *region of interest*. One sub-arena inside the HD mating\n",
" arena. There are 6 ROIs per video, numbered 1\u20136.\n",
"- **fly** \u2014 one detection in a single (t, ROI) cell. Two flies in the\n",
" same ROI at the same time = two rows with the same `t`.\n",
"- **trained** \u2014 the male had a training session before testing.\n",
"- **naive** \u2014 the male is a control (no training).\n",
"- **training session** \u2014 the recording where the male meets the\n",
" non-receptive female (he gets rejected).\n",
"- **testing session** \u2014 the recording where the male meets a fresh\n",
" receptive female (we measure his courtship).\n",
"- **t (milliseconds)** \u2014 time within one session, starting at 0.\n",
"- **(x, y) pixels** \u2014 fly position in the image. Top-left is (0, 0); x\n",
" grows to the right, y grows **downward** (this is the image-coordinate\n",
" convention, opposite of math class).\n",
"- **machine_name** \u2014 which ethoscope recorded the video, e.g.\n",
" `ETHOSCOPE_076`.\n",
"- **species** \u2014 `Melanogaster/CS`, `Sechellia`, `Simulans`, `Yakuba`,\n",
" `Erecta`, `Willistoni`, or `CS`.\n",
"\n",
"If you bump into other terms in the code, ask. Don't guess \u2014 biology\n",
"codebases pick up jargon over the years.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## What's next\n",
"\n",
"When you're ready, open these notebooks **in order**:\n",
"\n",
"1. `01_python_pandas_basics.ipynb` \u2014 just enough Python and pandas to\n",
" read and manipulate tabular data.\n",
"2. `02_explore_one_database.ipynb` \u2014 open one tracking DB, plot a fly's\n",
" trajectory, see what the numbers actually look like.\n",
"3. `03_compare_trained_vs_naive.ipynb` \u2014 your first real analysis,\n",
" comparing groups of flies.\n",
"\n",
"After those, the notebooks one level up (`flies_analysis.ipynb`,\n",
"`flies_analysis_simple.ipynb`) contain the analysis pipeline that the\n",
"previous student built \u2014 those will make sense once you've worked\n",
"through the tutorials.\n",
"\n",
"Don't try to power through all of them in one sitting. Run a few cells,\n",
"read the explanation, **change a number** to see what happens, **break\n",
"something on purpose** to see the error message. That's how you learn.\n"
]
}
]
}