Add beginner tutorial notebooks for incoming students

Four guided notebooks under notebooks/getting_started/ aimed at someone
new to Python and data science. The series progresses: project orientation
→ Python/pandas crash course → exploring one tracking DB → first
trained-vs-naive comparison using load_roi_data + Mann-Whitney U.

Each notebook leans heavily on markdown explanations, includes exercises
with empty cells, and links out to canonical references (JupyterLab,
official Python tutorial, pandas 10-min guide, Wikipedia for stats
concepts).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Giorgio Gilestro 2026-04-30 18:14:17 +01:00
parent 7d09523840
commit ec56e51bf9
5 changed files with 1607 additions and 0 deletions

@ -0,0 +1,500 @@
{
"nbformat": 4,
"nbformat_minor": 5,
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 01 \u00b7 Python and pandas \u2014 just enough to be dangerous\n",
"\n",
"This notebook teaches the **minimum** Python and `pandas` you need to read\n",
"the rest of the project's code and write your own analyses.\n",
"\n",
"If you've never programmed before, don't try to memorize the syntax.\n",
"Just run each cell, read what it does, and come back when you're stuck on\n",
"something specific. The cheat sheet at the end is the only thing worth\n",
"keeping handy.\n",
"\n",
"External resources, in order of how much time they take:\n",
"\n",
"- \ud83e\udd98 [Python in 10 minutes (very condensed)](https://www.stavros.io/tutorials/python/)\n",
"- \ud83d\udc0d [Official Python tutorial \u2014 chapters 3\u20135](https://docs.python.org/3/tutorial/introduction.html)\n",
"- \ud83d\udc3c [pandas in 10 minutes (official)](https://pandas.pydata.org/docs/user_guide/10min.html)\n",
"- \ud83d\udcda [Python for Data Analysis (the book)](https://wesmckinney.com/book/) \u2014 free online\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Variables\n",
"\n",
"A variable is a named box you put a value into. The `=` is **assignment**,\n",
"not equality. Read it as \"make `name` refer to `value`\".\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"x = 5\n",
"y = 3\n",
"total = x + y\n",
"print(total)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Re-running the cell after changing `x = 5` to `x = 50` gives a different\n",
"answer. Try it.\n",
"\n",
"Variable names may contain letters, digits, and underscores, but can't\n",
"start with a digit. They are case-sensitive: `total` and `Total` are two\n",
"different variables. The convention is lowercase `snake_case`:\n",
"`mean_distance`, not `meanDistance` or `MeanDistance`.\n"
]
},
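{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because `=` means assignment rather than equality, a line like\n",
"`count = count + 1` is perfectly fine in Python even though it would be\n",
"nonsense as an equation: Python works out the right-hand side first, then\n",
"stores the result back under the same name. A small extra example (`count`\n",
"is just a throwaway name):\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"count = 10  # throwaway name for this example\n",
"count = count + 1  # right side first (10 + 1), then store the result in count\n",
"print(count)  # 11\n"
]
},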
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Strings and numbers\n",
"\n",
"A **string** is text in quotes. You can join strings with `+`. You can\n",
"turn a number into a string with `str()`, and vice-versa with `int()` /\n",
"`float()`.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"species = \"Drosophila melanogaster\"\n",
"n_flies = 12\n",
"message = \"We tracked \" + str(n_flies) + \" \" + species + \" males.\"\n",
"print(message)\n",
"\n",
"# A nicer way to build strings \u2014 f-strings (note the leading 'f'):\n",
"print(f\"We tracked {n_flies} {species} males.\")\n"
]
},
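{
"cell_type": "markdown",
"metadata": {},
"source": [
"Going the other way, `int()` and `float()` turn text into numbers. This\n",
"matters because `+` joins strings instead of adding them, and numbers read\n",
"from text can arrive as strings. A small extra example with made-up values:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"n_text = \"12\"  # made-up value: looks like a number, but it's a string\n",
"print(n_text + n_text)  # joins the text: 1212\n",
"n = int(n_text)  # now a real integer\n",
"print(n + n)  # adds the numbers: 24\n",
"print(float(\"3.5\") * 2)  # 7.0\n"
]
},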
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Lists\n",
"\n",
"A list is an ordered collection of things. Square brackets, items\n",
"separated by commas. You can mix types (but usually shouldn't).\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"machines = [\"ETHOSCOPE_076\", \"ETHOSCOPE_082\", \"ETHOSCOPE_086\"]\n",
"print(machines[0]) # first item \u2014 Python counts from 0!\n",
"print(machines[-1]) # last item\n",
"print(len(machines)) # how many items\n",
"print(machines + [\"ETHOSCOPE_140\"]) # concatenate (returns a new list)\n"
]
},
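{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `+` in the cell above *returns a new list* and leaves the original\n",
"alone. To change a list itself, use `.append()`. A small extra example;\n",
"`batch` and `longer` are throwaway names, so `machines` stays as it is:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"batch = [\"ETHOSCOPE_076\", \"ETHOSCOPE_082\"]  # throwaway example list\n",
"longer = batch + [\"ETHOSCOPE_086\"]  # builds a NEW list\n",
"print(len(batch), len(longer))  # 2 3 (batch is unchanged)\n",
"batch.append(\"ETHOSCOPE_086\")  # changes batch itself and returns None\n",
"print(len(batch))  # 3\n"
]
},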
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Dictionaries\n",
"\n",
"A dictionary maps **keys** to **values**. Curly braces, `key: value`\n",
"pairs.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"fly = {\"species\": \"Sechellia\", \"trained\": True, \"age_days\": 5}\n",
"print(fly[\"species\"])\n",
"print(fly[\"age_days\"])\n",
"fly[\"alive\"] = False # add a new key\n",
"print(fly)\n"
]
},
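{
"cell_type": "markdown",
"metadata": {},
"source": [
"Asking for a key that isn't there, e.g. `fly[\"weight\"]`, raises a `KeyError`.\n",
"The `.get()` method returns `None` (or a default you choose) instead, and\n",
"`.items()` lets you loop over key/value pairs. A small extra example using a\n",
"made-up dictionary:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"fly_b = {\"species\": \"Sechellia\", \"trained\": True, \"age_days\": 5}  # made-up dictionary\n",
"print(fly_b.get(\"weight\"))  # None: there is no \"weight\" key\n",
"print(fly_b.get(\"weight\", 0.0))  # 0.0: the default we supplied\n",
"\n",
"for key, value in fly_b.items():\n",
"    print(f\"{key} -> {value}\")\n"
]
},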
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Conditions: if / elif / else\n",
"\n",
"Compare with `==` (equal), `!=` (not equal), `<`, `>`, `<=`, `>=`.\n",
"Combine with `and`, `or`, `not`.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"distance_px = 42\n",
"\n",
"if distance_px < 50:\n",
"    label = \"close\"\n",
"elif distance_px < 200:\n",
"    label = \"medium\"\n",
"else:\n",
"    label = \"far\"\n",
"\n",
"print(label)\n"
]
},
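{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `if` example above doesn't use `and`, `or` or `not`, so here is a small\n",
"extra one. `is_trained` is a made-up flag, just for illustration:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"distance_px = 42\n",
"is_trained = True  # made-up flag, just for this example\n",
"\n",
"if distance_px < 50 and is_trained:\n",
"    print(\"close and trained\")  # printed: both parts are True\n",
"\n",
"if not is_trained or distance_px > 200:\n",
"    print(\"you won't see this\")  # skipped: both parts are False\n"
]
},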
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Loops\n",
"\n",
"`for x in collection:` runs the indented block once per item.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"for m in machines:\n",
"    print(f\"Looking at machine {m}\")\n",
"\n",
"# Looping with an index, when you need it:\n",
"for i, m in enumerate(machines):\n",
"    print(f\"{i}: {m}\")\n"
]
},
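{
"cell_type": "markdown",
"metadata": {},
"source": [
"A very common loop pattern is *accumulating* a result: start a counter at\n",
"zero and update it on every pass through the loop. A sketch with made-up\n",
"distances:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"distances_px = [42, 130, 260, 18]  # made-up values\n",
"n_close = 0\n",
"for d in distances_px:\n",
"    if d < 50:\n",
"        n_close = n_close + 1  # count the close ones\n",
"print(f\"{n_close} of {len(distances_px)} are close\")  # 2 of 4 are close\n"
]
},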
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Functions\n",
"\n",
"A function is a named, reusable chunk of code. `def` declares it. `return`\n",
"sends a value back to whoever called it.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"def fly_age_in_weeks(days):\n",
"    \"\"\"Return age in weeks given age in days.\"\"\"\n",
"    return days / 7\n",
"\n",
"print(fly_age_in_weeks(14))  # 2.0\n",
"print(fly_age_in_weeks(5))   # 0.714\u2026\n"
]
},
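{
"cell_type": "markdown",
"metadata": {},
"source": [
"Beginners often mix up `print` and `return`: `print` only *shows* a value,\n",
"while `return` hands it back so you can store it in a variable. The two\n",
"throwaway functions in the next cell make the difference visible:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"def weeks_printed(days):  # throwaway demo function\n",
"    print(days / 7)  # shows the value, returns nothing (None)\n",
"\n",
"def weeks_returned(days):  # throwaway demo function\n",
"    return days / 7  # hands the value back\n",
"\n",
"a = weeks_printed(14)  # prints 2.0 on screen\n",
"b = weeks_returned(14)  # prints nothing\n",
"print(a, b)  # None 2.0\n"
]
},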
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Importing libraries\n",
"\n",
"A library is somebody else's code. We use `import` to pull it into our\n",
"notebook.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import math\n",
"print(math.sqrt(16)) # 4.0\n",
"print(math.pi)\n"
]
},
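{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two other import forms you'll meet: `from module import name` pulls in a\n",
"single name, and `import module as alias` gives a module a shorter name\n",
"(the same trick as `import pandas as pd` in the next section). A small extra\n",
"example:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"from math import sqrt  # one function, used without the math. prefix\n",
"import math as m  # the whole module, under a shorter name\n",
"\n",
"print(sqrt(16))  # 4.0\n",
"print(m.floor(2.7))  # 2\n"
]
},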
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9. Meet pandas\n",
"\n",
"Real data is rarely a single number \u2014 it's a **table** with rows and\n",
"columns (think Excel). `pandas` is the library that handles tables in\n",
"Python. The two main objects are:\n",
"\n",
"- **`Series`** \u2014 a single column with a name.\n",
"- **`DataFrame`** \u2014 a whole table.\n",
"\n",
"By convention we import pandas as `pd`. Always.\n"
]
},
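{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Read the project's metadata TSV (Tab-Separated Values).\n",
"# This path points to one specific computer: change it to wherever the file\n",
"# lives on yours, otherwise pd.read_csv() below raises a FileNotFoundError.\n",
"tsv_path = \"/home/gg/ownCloud/Work/Projects/coding/cupido/all_video_info_merged.tsv\"\n",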
"df = pd.read_csv(tsv_path, sep=\"\\t\")\n",
"\n",
"# How big is it?\n",
"print(f\"Rows: {len(df)}\")\n",
"print(f\"Columns: {df.shape[1]}\")\n"
]
},
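{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you can't load the TSV yet, you can still see what a `DataFrame` and a\n",
"`Series` look like: `pd.DataFrame` builds a table from a dictionary of\n",
"columns. The values below are **made up** (not project data); `species` and\n",
"`male` mimic real column names, while `distance_px` is invented for this example:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up toy table (NOT project data): three flies.\n",
"toy = pd.DataFrame({\n",
"    \"species\": [\"Melanogaster/CS\", \"Sechellia\", \"Melanogaster/CS\"],\n",
"    \"male\": [\"trained\", \"naive\", \"naive\"],\n",
"    \"distance_px\": [42.0, 130.5, 18.2],  # invented column\n",
"})\n",
"\n",
"print(type(toy).__name__)  # DataFrame\n",
"print(type(toy[\"distance_px\"]).__name__)  # Series\n",
"print(toy.shape)  # (3, 3): rows, columns\n"
]
},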
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10. Looking at the table\n",
"\n",
"`.head()` shows the first 5 rows. `.tail()` the last 5. `.columns` lists\n",
"column names. `.dtypes` shows the type of each column.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"df.head(3)\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"print(\"Column names:\")\n",
"for c in df.columns:\n",
"    print(f\" {c}\")\n"
]
},
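{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `.dtypes` attribute mentioned above tells you how pandas stored each\n",
"column. On the real table it is simply `df.dtypes`; here it is on a tiny\n",
"made-up table. Whole numbers usually show up as `int64`, decimals as\n",
"`float64`, and text as `object` (recent pandas versions may show a dedicated\n",
"string type instead):\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up toy table (NOT project data).\n",
"toy = pd.DataFrame({\n",
"    \"roi\": [1, 2],\n",
"    \"species\": [\"Sechellia\", \"Sechellia\"],\n",
"    \"distance_px\": [42.0, 18.2],\n",
"})\n",
"print(toy.dtypes)  # one line per column: its name and its type\n"
]
},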
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 11. Selecting columns\n",
"\n",
"Two main ways to get one column: bracket-indexing (`df[\"name\"]`) or\n",
"attribute access (`df.name`). The first works for any column name; the\n",
"second only works if the name has no spaces or special characters *and*\n",
"doesn't clash with something a DataFrame already has (a column called\n",
"`count` or `shape`, for example). When in doubt, use brackets.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"df[\"species\"].head()\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"df.species.value_counts() # how many rows per species\n"
]
},
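{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick demonstration of why brackets are the safer habit. The made-up table\n",
"below has a column named `count`, which is also the name of a DataFrame\n",
"method, and a column whose name contains a space:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up toy table (NOT project data).\n",
"toy = pd.DataFrame({\"count\": [3, 5], \"fly id\": [\"a\", \"b\"]})\n",
"\n",
"print(toy[\"count\"].sum())  # 8: brackets always reach the column\n",
"print(callable(toy.count))  # True: toy.count is the method, not the column\n",
"print(toy[\"fly id\"].tolist())  # ['a', 'b']; toy.fly id would be a SyntaxError\n"
]
},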
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 12. Selecting multiple columns\n",
"\n",
"Pass a **list** of names inside the brackets:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"df[[\"machine_name\", \"roi\", \"species\", \"male\"]].head()\n"
]
},
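{
"cell_type": "markdown",
"metadata": {},
"source": [
"The number of brackets matters: one name in single brackets gives a\n",
"`Series`, while a list of names (even a list holding one name) gives a\n",
"`DataFrame`. Checked on a small made-up table:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up toy table (NOT project data).\n",
"toy = pd.DataFrame({\n",
"    \"roi\": [1, 2],\n",
"    \"species\": [\"Sechellia\", \"Melanogaster/CS\"],\n",
"    \"male\": [\"trained\", \"naive\"],\n",
"})\n",
"print(type(toy[\"species\"]).__name__)  # Series\n",
"print(type(toy[[\"species\"]]).__name__)  # DataFrame\n",
"print(toy[[\"roi\", \"male\"]].shape)  # (2, 2)\n"
]
},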
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 13. Filtering rows\n",
"\n",
"The pattern is `df[condition]`. The condition is a Series of `True`/`False`.\n",
"Pandas keeps the rows where it's `True`.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"trained = df[df[\"male\"] == \"trained\"]\n",
"print(f\"trained rows: {len(trained)}\")\n",
"\n",
"mel_only = df[df[\"species\"] == \"Melanogaster/CS\"]\n",
"print(f\"Melanogaster/CS rows: {len(mel_only)}\")\n",
"\n",
"# Combine conditions with & (and) and | (or), not the plain `and`/`or` of\n",
"# section 5, and wrap each part in parentheses (& and | bind more tightly than ==).\n",
"trained_mel = df[(df[\"male\"] == \"trained\") & (df[\"species\"] == \"Melanogaster/CS\")]\n",
"print(f\"trained Mel rows: {len(trained_mel)}\")\n"
]
},
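{
"cell_type": "markdown",
"metadata": {},
"source": [
"Two more filtering tools: `~` flips a True/False Series (it means *not*), and\n",
"`.isin([...])` keeps rows whose value appears in a list, which saves writing\n",
"long chains of `|`. A sketch on a made-up table:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up toy table (NOT project data).\n",
"toy = pd.DataFrame({\n",
"    \"species\": [\"Melanogaster/CS\", \"Sechellia\", \"Melanogaster/CS\", \"Sechellia\"],\n",
"    \"male\": [\"trained\", \"naive\", \"naive\", \"trained\"],\n",
"})\n",
"\n",
"either = toy[(toy[\"male\"] == \"trained\") | (toy[\"species\"] == \"Sechellia\")]\n",
"print(len(either))  # 3: every row except row 2\n",
"\n",
"not_trained = toy[~(toy[\"male\"] == \"trained\")]\n",
"print(len(not_trained))  # 2\n",
"\n",
"mel_or_sec = toy[toy[\"species\"].isin([\"Melanogaster/CS\", \"Sechellia\"])]\n",
"print(len(mel_or_sec))  # 4: both species are in the list\n"
]
},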
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 14. Grouping and counting\n",
"\n",
"`.groupby(\"col\")` splits the table by the values in that column so you can\n",
"compute something per group. `.size()` counts the rows in each group; to\n",
"average a column, pick it first: `.groupby(\"col\")[\"x\"].mean()`.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# How many ROIs per (species, training condition)?\n",
"df.groupby([\"species\", \"male\"]).size()\n"
]
},
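{
"cell_type": "markdown",
"metadata": {},
"source": [
"To average one column per group, pick the column after `.groupby()` and then\n",
"call `.mean()`. A sketch on made-up numbers (`distance_px` is a column\n",
"invented for this example):\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import pandas as pd\n",
"\n",
"# Made-up toy table (NOT project data).\n",
"toy = pd.DataFrame({\n",
"    \"male\": [\"trained\", \"trained\", \"naive\", \"naive\"],\n",
"    \"distance_px\": [40.0, 60.0, 100.0, 140.0],\n",
"})\n",
"\n",
"means = toy.groupby(\"male\")[\"distance_px\"].mean()\n",
"print(means)\n",
"print(means[\"trained\"])  # 50.0\n",
"print(toy.groupby(\"male\").size()[\"naive\"])  # 2\n"
]
},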
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 15. Quick plots\n",
"\n",
"DataFrames know how to draw themselves. Under the hood it's `matplotlib`.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# How many rows per machine?\n",
"df[\"machine_name\"].value_counts().plot(kind=\"bar\", figsize=(10, 4))\n",
"plt.title(\"Number of fly-rows per ethoscope machine\")\n",
"plt.ylabel(\"rows\")\n",
"plt.show()\n"
]
},
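{
"cell_type": "markdown",
"metadata": {},
"source": [
"Grouping and plotting combine nicely: compute one number per group, then call\n",
"`.plot()` on the result. A sketch with made-up numbers (not project data):\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"import pandas as pd\n",
"\n",
"# Made-up toy table (NOT project data).\n",
"toy = pd.DataFrame({\n",
"    \"male\": [\"trained\", \"trained\", \"naive\", \"naive\"],\n",
"    \"distance_px\": [40.0, 60.0, 100.0, 140.0],\n",
"})\n",
"\n",
"toy.groupby(\"male\")[\"distance_px\"].mean().plot(kind=\"bar\")\n",
"plt.title(\"Mean distance per group (made-up data)\")\n",
"plt.ylabel(\"mean distance (px)\")\n",
"plt.show()\n"
]
},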
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 16. Exercises\n",
"\n",
"Don't skip these. They're how you find out what you actually understood.\n",
"\n",
"1. How many rows does `df` have where `age` equals `'5-7'`?\n",
"2. Print the **unique values** of the `memory` column. (Hint: `df[\"memory\"].unique()`)\n",
"3. How many distinct `(date, machine_name)` pairs are in the dataset?\n",
"   (Hint: `len(df.groupby([\"date\", \"machine_name\"]).size())`.)\n",
"4. Make a bar plot of `species` counts. Which species has the most rows?\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# Try exercise 1 here\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# Try exercise 2 here\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# Try exercise 3 here\n"
]
},
{
"cell_type": "code",
"metadata": {},
"execution_count": null,
"outputs": [],
"source": [
"# Try exercise 4 here\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Cheat sheet\n",
"\n",
"```python\n",
"import pandas as pd\n",
"df = pd.read_csv(\"file.tsv\", sep=\"\\t\") # read\n",
"df.head(); df.tail(); df.shape; df.columns # peek\n",
"df[\"col\"]; df[[\"a\", \"b\"]] # select\n",
"df[df[\"col\"] == \"value\"] # filter\n",
"df.groupby(\"col\").size() # count per group\n",
"df.groupby(\"col\")[\"x\"].mean() # mean of x per group\n",
"df[\"col\"].value_counts() # quick counts\n",
"df[\"col\"].unique() # unique values\n",
"df[\"new_col\"] = df[\"w\"] * df[\"h\"] # derived column\n",
"df.sort_values(\"col\", ascending=False) # sort\n",
"df.plot(...) # quick plot\n",
"```\n",
"\n",
"Keep this list open when reading other people's code. Most of pandas is\n",
"just combinations of these primitives. When you need more, the official\n",
"[pandas user guide](https://pandas.pydata.org/docs/user_guide/index.html)\n",
"is excellent.\n"
]
}
]
}