Move metadata xlsx/TSV to /mnt/data/projects/cupido/
Consolidates everything bulky (tracking DBs, targets, metadata spreadsheet) under a single DATA_VOLUME root outside the ownCloud-synced repo. Notebooks now use a visible DATA_DIR = Path(...) idiom rather than walking up the filesystem with PROJECT_ROOT.parent, which is easier for students with no Python background to follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
parent ec56e51bf9
commit f176224150
8 changed files with 102 additions and 160 deletions
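For readers skimming the diff, the path idiom described in the commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: `DATA_DIR` and the TSV file name come from the diff below, while the `PROJECT_ROOT` value and the `legacy_tsv_path` name are hypothetical and only show what "walking up the filesystem" might have looked like.

```python
from pathlib import Path

# New idiom from this commit: one visible root for the bulky data.
DATA_DIR = Path("/mnt/data/projects/cupido")

# Sub-paths are built from DATA_DIR with the / operator.
tsv_path = DATA_DIR / "all_video_info_merged.tsv"

# Older style the commit moves away from: derive the location by walking
# up from a project root. This PROJECT_ROOT value is an assumption made
# for illustration; the real one is not shown in the diff.
PROJECT_ROOT = Path("/home/gg/ownCloud/Work/Projects/coding/cupido/notebooks")
legacy_tsv_path = PROJECT_ROOT.parent / "all_video_info_merged.tsv"

print(tsv_path.as_posix())
print(legacy_tsv_path.as_posix())
```

With the first form, the location of the data is readable at a glance and changes in one place; with the second, the reader has to work out what `.parent` resolves to before knowing where the file lives.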
@@ -16,7 +16,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"# 01 \u00b7 Python and pandas \u2014 just enough to be dangerous\n",
+"# 01 · Python and pandas — just enough to be dangerous\n",
 "\n",
 "This notebook teaches the **minimum** Python and `pandas` you need to read\n",
 "the rest of the project's code and write your own analyses.\n",
@@ -28,10 +28,10 @@
 "\n",
 "External resources, in order of how much time they take:\n",
 "\n",
-"- \ud83e\udd98 [Python in 10 minutes (very condensed)](https://www.stavros.io/tutorials/python/)\n",
-"- \ud83d\udc0d [Official Python tutorial \u2014 chapters 3\u20135](https://docs.python.org/3/tutorial/introduction.html)\n",
-"- \ud83d\udc3c [pandas in 10 minutes (official)](https://pandas.pydata.org/docs/user_guide/10min.html)\n",
-"- \ud83d\udcda [Python for Data Analysis (the book)](https://wesmckinney.com/book/) \u2014 free online\n"
+"- 🦘 [Python in 10 minutes (very condensed)](https://www.stavros.io/tutorials/python/)\n",
+"- 🐍 [Official Python tutorial — chapters 3–5](https://docs.python.org/3/tutorial/introduction.html)\n",
+"- 🐼 [pandas in 10 minutes (official)](https://pandas.pydata.org/docs/user_guide/10min.html)\n",
+"- 📚 [Python for Data Analysis (the book)](https://wesmckinney.com/book/) — free online\n"
 ]
 },
 {
@@ -90,7 +90,7 @@
 "message = \"We tracked \" + str(n_flies) + \" \" + species + \" males.\"\n",
 "print(message)\n",
 "\n",
-"# A nicer way to build strings \u2014 f-strings (note the leading 'f'):\n",
+"# A nicer way to build strings — f-strings (note the leading 'f'):\n",
 "print(f\"We tracked {n_flies} {species} males.\")\n"
 ]
 },
@@ -111,7 +111,7 @@
 "outputs": [],
 "source": [
 "machines = [\"ETHOSCOPE_076\", \"ETHOSCOPE_082\", \"ETHOSCOPE_086\"]\n",
-"print(machines[0]) # first item \u2014 Python counts from 0!\n",
+"print(machines[0]) # first item — Python counts from 0!\n",
 "print(machines[-1]) # last item\n",
 "print(len(machines)) # how many items\n",
 "print(machines + [\"ETHOSCOPE_140\"]) # concatenate (returns a new list)\n"
@@ -212,7 +212,7 @@
 " return days / 7\n",
 "\n",
 "print(fly_age_in_weeks(14)) # 2.0\n",
-"print(fly_age_in_weeks(5)) # 0.714\u2026\n"
+"print(fly_age_in_weeks(5)) # 0.714…\n"
 ]
 },
 {
@@ -242,12 +242,12 @@
 "source": [
 "## 9. Meet pandas\n",
 "\n",
-"Real data is rarely a single number \u2014 it's a **table** with rows and\n",
+"Real data is rarely a single number — it's a **table** with rows and\n",
 "columns (think Excel). `pandas` is the library that handles tables in\n",
 "Python. The two main objects are:\n",
 "\n",
-"- **`Series`** \u2014 a single column with a name.\n",
-"- **`DataFrame`** \u2014 a whole table.\n",
+"- **`Series`** — a single column with a name.\n",
+"- **`DataFrame`** — a whole table.\n",
 "\n",
 "By convention we import pandas as `pd`. Always.\n"
 ]
@@ -257,17 +257,7 @@
 "metadata": {},
 "execution_count": null,
 "outputs": [],
-"source": [
-"import pandas as pd\n",
-"\n",
-"# Read the project's metadata TSV (Tab-Separated Values).\n",
-"tsv_path = \"/home/gg/ownCloud/Work/Projects/coding/cupido/all_video_info_merged.tsv\"\n",
-"df = pd.read_csv(tsv_path, sep=\"\\t\")\n",
-"\n",
-"# How big is it?\n",
-"print(f\"Rows: {len(df)}\")\n",
-"print(f\"Columns: {df.shape[1]}\")\n"
-]
+"source": "import pandas as pd\nfrom pathlib import Path\n\n# All the project's bulky data lives under /mnt/data/projects/cupido/.\n# This pattern — define one DATA_DIR variable, then build sub-paths from\n# it — is much easier to read (and to update) than hard-coding long\n# strings everywhere.\nDATA_DIR = Path(\"/mnt/data/projects/cupido\")\ntsv_path = DATA_DIR / \"all_video_info_merged.tsv\"\n\n# Read the project's metadata TSV (Tab-Separated Values).\ndf = pd.read_csv(tsv_path, sep=\"\\t\")\n\n# How big is it?\nprint(f\"Rows: {len(df)}\")\nprint(f\"Columns: {df.shape[1]}\")\n"
 },
 {
 "cell_type": "markdown",
@@ -368,7 +358,7 @@
 "mel_only = df[df[\"species\"] == \"Melanogaster/CS\"]\n",
 "print(f\"Melanogaster/CS rows: {len(mel_only)}\")\n",
 "\n",
-"# Combine conditions with & (and) | (or) \u2014 and wrap each part in parentheses.\n",
+"# Combine conditions with & (and) | (or) — and wrap each part in parentheses.\n",
 "trained_mel = df[(df[\"male\"] == \"trained\") & (df[\"species\"] == \"Melanogaster/CS\")]\n",
 "print(f\"trained Mel rows: {len(trained_mel)}\")\n"
 ]
@@ -497,4 +487,4 @@
 ]
 }
 ]
-}
+}