Move metadata xlsx/TSV to /mnt/data/projects/cupido/
Consolidates everything bulky (tracking DBs, targets, metadata spreadsheet) under a single DATA_VOLUME root outside the ownCloud-synced repo. Notebooks now use a visible DATA_DIR = Path(...) idiom rather than walking up the filesystem with PROJECT_ROOT.parent, which is easier for students with no Python background to follow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
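A rough illustration of the idiom change this commit describes (a sketch, not the notebooks' exact code). Both file paths are taken from the diff; `PROJECT_ROOT` is assumed to point at the repository checkout, which the diff does not show, and is set by hand here.

```python
from pathlib import PurePosixPath  # pure paths: nothing on disk is touched

# Old idiom (assumed shape): start from the repo and walk up one level.
# PROJECT_ROOT is hard-coded here purely for illustration.
PROJECT_ROOT = PurePosixPath("/home/gg/ownCloud/Work/Projects/coding/cupido/tracking")
old_tsv = PROJECT_ROOT.parent / "all_video_info_merged.tsv"

# New idiom: one visible root, with sub-paths built from it.
DATA_DIR = PurePosixPath("/mnt/data/projects/cupido")
new_tsv = DATA_DIR / "all_video_info_merged.tsv"

print(old_tsv)
print(new_tsv)
```

With the new form, a reader sees the destination directly and does not have to work out where `.parent` ends up.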
parent ec56e51bf9
commit f176224150
8 changed files with 102 additions and 160 deletions
|
|
@ -1,8 +1,8 @@
|
||||||
# Processed Data
|
# Processed Data
|
||||||
|
|
||||||
CSVs derived from the tracking DBs (`/mnt/data/projects/cupido/tracked/`)
|
CSVs derived from the tracking DBs (`/mnt/data/projects/cupido/tracked/`)
|
||||||
and the merged TSV (`../../all_video_info_merged.tsv`). All files are
|
and the merged TSV (`/mnt/data/projects/cupido/all_video_info_merged.tsv`).
|
||||||
gitignored and regenerable.
|
All files are gitignored and regenerable.
|
||||||
|
|
||||||
## Files and Regeneration
|
## Files and Regeneration
|
||||||
|
|
||||||
|
|
@ -23,7 +23,7 @@ from load_roi_data import load_roi_data
|
||||||
data = load_roi_data() # full batch as one DataFrame
|
data = load_roi_data() # full batch as one DataFrame
|
||||||
# Or filter the metadata first:
|
# Or filter the metadata first:
|
||||||
import pandas as pd
|
import pandas as pd
|
||||||
tsv = pd.read_csv("../../all_video_info_merged.tsv", sep="\t")
|
tsv = pd.read_csv("/mnt/data/projects/cupido/all_video_info_merged.tsv", sep="\t")
|
||||||
data = load_roi_data(tsv[tsv.species.str.contains("Melanogaster")])
|
data = load_roi_data(tsv[tsv.species.str.contains("Melanogaster")])
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
|
||||||
|
|
@ -16,11 +16,11 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# 00 \u00b7 Welcome to the Cupido fly-tracking project\n",
|
"# 00 · Welcome to the Cupido fly-tracking project\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Hi! You're about to start working on a project that studies how *Drosophila*\n",
|
"Hi! You're about to start working on a project that studies how *Drosophila*\n",
|
||||||
"(fruit flies) form **memories of mating experiences** \u2014 and whether trained\n",
|
"(fruit flies) form **memories of mating experiences** — and whether trained\n",
|
||||||
"flies behave differently from na\u00efve ones in their later courtship.\n",
|
"flies behave differently from naïve ones in their later courtship.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"**You don't need any prior experience with Python or data science to follow\n",
|
"**You don't need any prior experience with Python or data science to follow\n",
|
||||||
"along.** This series of notebooks will walk you through everything, one\n",
|
"along.** This series of notebooks will walk you through everything, one\n",
|
||||||
|
|
@ -48,7 +48,7 @@
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You should have seen `Hello, fly world!` printed and the number `2`\n",
|
"You should have seen `Hello, fly world!` printed and the number `2`\n",
|
||||||
"appear underneath. If something else happened, ask Giorgio \u2014 that's a\n",
|
"appear underneath. If something else happened, ask Giorgio — that's a\n",
|
||||||
"sign the environment isn't set up right.\n",
|
"sign the environment isn't set up right.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If this is the very first time you're using JupyterLab, take 10 minutes\n",
|
"If this is the very first time you're using JupyterLab, take 10 minutes\n",
|
||||||
|
|
@ -61,7 +61,7 @@
|
||||||
" (Python that the computer runs).\n",
|
" (Python that the computer runs).\n",
|
||||||
"- The **kernel** is the running Python process behind the notebook. It\n",
|
"- The **kernel** is the running Python process behind the notebook. It\n",
|
||||||
" remembers everything you've defined. If something gets weird, restart\n",
|
" remembers everything you've defined. If something gets weird, restart\n",
|
||||||
" the kernel: top menu \u2192 *Kernel* \u2192 *Restart Kernel\u2026*.\n",
|
" the kernel: top menu → *Kernel* → *Restart Kernel…*.\n",
|
||||||
"- `Shift + Enter` runs a cell and moves to the next one.\n",
|
"- `Shift + Enter` runs a cell and moves to the next one.\n",
|
||||||
"- `Ctrl + Enter` runs a cell and stays put.\n"
|
"- `Ctrl + Enter` runs a cell and stays put.\n"
|
||||||
]
|
]
|
||||||
|
|
@ -74,7 +74,7 @@
|
||||||
"\n",
|
"\n",
|
||||||
"Drosophila males court females with a stereotyped sequence (chasing,\n",
|
"Drosophila males court females with a stereotyped sequence (chasing,\n",
|
||||||
"wing-extension, tapping). When a male is rejected by a female (e.g.\n",
|
"wing-extension, tapping). When a male is rejected by a female (e.g.\n",
|
||||||
"because she's already mated), he **learns** to suppress his courtship \u2014\n",
|
"because she's already mated), he **learns** to suppress his courtship —\n",
|
||||||
"even toward new, receptive females, for a while. This is a textbook\n",
|
"even toward new, receptive females, for a while. This is a textbook\n",
|
||||||
"example of *non-associative learning* in invertebrates ([review on\n",
|
"example of *non-associative learning* in invertebrates ([review on\n",
|
||||||
"PubMed](https://pubmed.ncbi.nlm.nih.gov/?term=courtship+conditioning+drosophila)).\n",
|
"PubMed](https://pubmed.ncbi.nlm.nih.gov/?term=courtship+conditioning+drosophila)).\n",
|
||||||
|
|
@ -85,7 +85,7 @@
|
||||||
" species recorded.)\n",
|
" species recorded.)\n",
|
||||||
"- How long does the memory last? (training_length_hr,\n",
|
"- How long does the memory last? (training_length_hr,\n",
|
||||||
" consolidation_length_hr columns in the metadata.)\n",
|
" consolidation_length_hr columns in the metadata.)\n",
|
||||||
"- Are there **individual differences** \u2014 do some males learn while others\n",
|
"- Are there **individual differences** — do some males learn while others\n",
|
||||||
" don't? (The \"bimodal hypothesis\" in `docs/bimodal_hypothesis.md`.)\n",
|
" don't? (The \"bimodal hypothesis\" in `docs/bimodal_hypothesis.md`.)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Your job, broadly, will be to **turn videos of flies into numbers and\n",
|
"Your job, broadly, will be to **turn videos of flies into numbers and\n",
|
||||||
|
|
@ -100,17 +100,17 @@
|
||||||
"\n",
|
"\n",
|
||||||
"1. **Training**: a male fly is placed with a non-receptive (mated) female.\n",
|
"1. **Training**: a male fly is placed with a non-receptive (mated) female.\n",
|
||||||
" He courts, gets rejected, eventually gives up.\n",
|
" He courts, gets rejected, eventually gives up.\n",
|
||||||
"2. *Wait* for some hours (the \"consolidation\" period \u2014 gives memory time\n",
|
"2. *Wait* for some hours (the \"consolidation\" period — gives memory time\n",
|
||||||
" to form).\n",
|
" to form).\n",
|
||||||
"3. **Testing**: same male is placed with a fresh receptive female.\n",
|
"3. **Testing**: same male is placed with a fresh receptive female.\n",
|
||||||
" Does he court her vigorously, or has he learned to give up easily?\n",
|
" Does he court her vigorously, or has he learned to give up easily?\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Each experiment runs in an **HD mating arena** \u2014 a small chamber with\n",
|
"Each experiment runs in an **HD mating arena** — a small chamber with\n",
|
||||||
"6 sub-arenas (we call them **ROIs**, for \"regions of interest\"). Each ROI\n",
|
"6 sub-arenas (we call them **ROIs**, for \"regions of interest\"). Each ROI\n",
|
||||||
"contains one couple (a male and a female). A camera films the whole arena\n",
|
"contains one couple (a male and a female). A camera films the whole arena\n",
|
||||||
"from above. So one **video** gives us 6 simultaneous experiments.\n",
|
"from above. So one **video** gives us 6 simultaneous experiments.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The setup uses [Ethoscopes](https://www.ethoscope.com/) \u2014 open-source\n",
|
"The setup uses [Ethoscopes](https://www.ethoscope.com/) — open-source\n",
|
||||||
"behavioural recording boxes built in this lab. Each ethoscope is a\n",
|
"behavioural recording boxes built in this lab. Each ethoscope is a\n",
|
||||||
"machine; we have 16 in total, named `ETHOSCOPE_067`, `ETHOSCOPE_076`, etc.\n"
|
"machine; we have 16 in total, named `ETHOSCOPE_067`, `ETHOSCOPE_076`, etc.\n"
|
||||||
]
|
]
|
||||||
|
|
@ -124,7 +124,7 @@
|
||||||
"For each video, the **tracker** (a piece of software that runs after the\n",
|
"For each video, the **tracker** (a piece of software that runs after the\n",
|
||||||
"recording) finds the flies frame-by-frame and writes their positions to a\n",
|
"recording) finds the flies frame-by-frame and writes their positions to a\n",
|
||||||
"**SQLite database** (a single file, ending in `.db`). One DB per video.\n",
|
"**SQLite database** (a single file, ending in `.db`). One DB per video.\n",
|
||||||
"Inside each DB there are 6 tables called `ROI_1`, `ROI_2`, \u2026, `ROI_6` \u2014\n",
|
"Inside each DB there are 6 tables called `ROI_1`, `ROI_2`, …, `ROI_6` —\n",
|
||||||
"one per sub-arena. Each row of an ROI table is **one fly detection at one\n",
|
"one per sub-arena. Each row of an ROI table is **one fly detection at one\n",
|
||||||
"moment in time** with these columns:\n",
|
"moment in time** with these columns:\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
|
@ -139,7 +139,7 @@
|
||||||
"| `has_interacted` | (legacy column, mostly unused) |\n",
|
"| `has_interacted` | (legacy column, mostly unused) |\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If a single ROI has two flies that the tracker can see, you'll get **two\n",
|
"If a single ROI has two flies that the tracker can see, you'll get **two\n",
|
||||||
"rows with the same `t`** \u2014 one for each fly. If only one fly is detected\n",
|
"rows with the same `t`** — one for each fly. If only one fly is detected\n",
|
||||||
"(maybe they're on top of each other), you'll get one row.\n",
|
"(maybe they're on top of each other), you'll get one row.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"That's the heart of the data. Everything else (distances, velocities,\n",
|
"That's the heart of the data. Everything else (distances, velocities,\n",
|
||||||
|
|
@ -149,51 +149,25 @@
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": "## Where everything lives\n\nTake a moment to memorize these locations — you'll come back to them often.\n\n| what | where |\n|---|---|\n| Tracking DBs (SQLite, one per video) | `/mnt/data/projects/cupido/tracked/` |\n| Target JSONs (the user-clicked reference points) | `/mnt/data/projects/cupido/targets/` |\n| The metadata table (xlsx + TSV) | `/mnt/data/projects/cupido/all_video_info_merged.tsv` |\n| Source video files | `/mnt/ethoscope_data/videos/` |\n| Project code (this repo) | `/home/gg/ownCloud/Work/Projects/coding/cupido/tracking/` |\n| Your notebooks | `notebooks/getting_started/` (this folder) |\n\nNotice the pattern: **everything bulky or regenerable lives under\n`/mnt/data/projects/cupido/`**. The repository itself only stores code,\ndocumentation, and small metadata files. We'll refer to that data\ndirectory as `DATA_DIR` from here on.\n\nLet's verify a couple of these from inside Python:\n"
|
||||||
"## Where everything lives\n",
|
|
||||||
"\n",
|
|
||||||
"Take a moment to memorize these locations \u2014 you'll come back to them often.\n",
|
|
||||||
"\n",
|
|
||||||
"| what | where |\n",
|
|
||||||
"|---|---|\n",
|
|
||||||
"| Tracking DBs (SQLite, one per video) | `/mnt/data/projects/cupido/tracked/` |\n",
|
|
||||||
"| Target JSONs (the user-clicked reference points) | `/mnt/data/projects/cupido/targets/` |\n",
|
|
||||||
"| Source video files | `/mnt/ethoscope_data/videos/` |\n",
|
|
||||||
"| Project code (this repo) | `/home/gg/ownCloud/Work/Projects/coding/cupido/tracking/` |\n",
|
|
||||||
"| The metadata table (xlsx + TSV) | `/home/gg/ownCloud/Work/Projects/coding/cupido/all_video_info_merged.tsv` |\n",
|
|
||||||
"| Your notebooks | `notebooks/getting_started/` (this folder) |\n",
|
|
||||||
"\n",
|
|
||||||
"Let's verify a couple of these from inside Python:\n"
|
|
||||||
]
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": "from pathlib import Path\n\n# Single root for all the bulky / regenerable project data.\nDATA_DIR = Path(\"/mnt/data/projects/cupido\")\n\ntracked_dir = DATA_DIR / \"tracked\"\ntargets_dir = DATA_DIR / \"targets\"\nmetadata_tsv = DATA_DIR / \"all_video_info_merged.tsv\"\n\nprint(f\"Tracking DBs available: {len(list(tracked_dir.glob('*_tracking.db')))}\")\nprint(f\"Target JSONs available: {len(list(targets_dir.glob('*.json')))}\")\nprint(f\"Metadata TSV exists: {metadata_tsv.exists()}\")\n"
|
||||||
"from pathlib import Path\n",
|
|
||||||
"\n",
|
|
||||||
"tracked = Path(\"/mnt/data/projects/cupido/tracked\")\n",
|
|
||||||
"targets = Path(\"/mnt/data/projects/cupido/targets\")\n",
|
|
||||||
"\n",
|
|
||||||
"n_dbs = len(list(tracked.glob(\"*_tracking.db\")))\n",
|
|
||||||
"n_jsons = len(list(targets.glob(\"*.json\")))\n",
|
|
||||||
"\n",
|
|
||||||
"print(f\"Tracking DBs available: {n_dbs}\")\n",
|
|
||||||
"print(f\"Target JSONs available: {n_jsons}\")\n"
|
|
||||||
]
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You should see roughly 113 tracking DBs and 130 target JSONs. If those\n",
|
"You should see roughly 113 tracking DBs and 130 target JSONs. If those\n",
|
||||||
"numbers are zero, the storage volume isn't mounted \u2014 ask Giorgio.\n",
|
"numbers are zero, the storage volume isn't mounted — ask Giorgio.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"> **Note**: the tracking DBs are read-only inside the JupyterLab\n",
|
"> **Note**: the tracking DBs are read-only inside the JupyterLab\n",
|
||||||
"> container. You can read them but not modify or delete them. That's a\n",
|
"> container. You can read them but not modify or delete them. That's a\n",
|
||||||
"> deliberate safety measure \u2014 we don't want analysis code accidentally\n",
|
"> deliberate safety measure — we don't want analysis code accidentally\n",
|
||||||
"> corrupting the source data.\n"
|
"> corrupting the source data.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|
@ -203,26 +177,26 @@
|
||||||
"source": [
|
"source": [
|
||||||
"## Glossary (refer back as needed)\n",
|
"## Glossary (refer back as needed)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"- **ROI** \u2014 *region of interest*. One sub-arena inside the HD mating\n",
|
"- **ROI** — *region of interest*. One sub-arena inside the HD mating\n",
|
||||||
" arena. There are 6 ROIs per video, numbered 1\u20136.\n",
|
" arena. There are 6 ROIs per video, numbered 1–6.\n",
|
||||||
"- **fly** \u2014 one detection in a single (t, ROI) cell. Two flies in the\n",
|
"- **fly** — one detection in a single (t, ROI) cell. Two flies in the\n",
|
||||||
" same ROI at the same time = two rows with the same `t`.\n",
|
" same ROI at the same time = two rows with the same `t`.\n",
|
||||||
"- **trained** \u2014 the male had a training session before testing.\n",
|
"- **trained** — the male had a training session before testing.\n",
|
||||||
"- **naive** \u2014 the male is a control (no training).\n",
|
"- **naive** — the male is a control (no training).\n",
|
||||||
"- **training session** \u2014 the recording where the male meets the\n",
|
"- **training session** — the recording where the male meets the\n",
|
||||||
" non-receptive female (he gets rejected).\n",
|
" non-receptive female (he gets rejected).\n",
|
||||||
"- **testing session** \u2014 the recording where the male meets a fresh\n",
|
"- **testing session** — the recording where the male meets a fresh\n",
|
||||||
" receptive female (we measure his courtship).\n",
|
" receptive female (we measure his courtship).\n",
|
||||||
"- **t (milliseconds)** \u2014 time within one session, starting at 0.\n",
|
"- **t (milliseconds)** — time within one session, starting at 0.\n",
|
||||||
"- **(x, y) pixels** \u2014 fly position in the image. Top-left is (0, 0); x\n",
|
"- **(x, y) pixels** — fly position in the image. Top-left is (0, 0); x\n",
|
||||||
" grows to the right, y grows **downward** (this is the image-coordinate\n",
|
" grows to the right, y grows **downward** (this is the image-coordinate\n",
|
||||||
" convention, opposite of math class).\n",
|
" convention, opposite of math class).\n",
|
||||||
"- **machine_name** \u2014 which ethoscope recorded the video, e.g.\n",
|
"- **machine_name** — which ethoscope recorded the video, e.g.\n",
|
||||||
" `ETHOSCOPE_076`.\n",
|
" `ETHOSCOPE_076`.\n",
|
||||||
"- **species** \u2014 `Melanogaster/CS`, `Sechellia`, `Simulans`, `Yakuba`,\n",
|
"- **species** — `Melanogaster/CS`, `Sechellia`, `Simulans`, `Yakuba`,\n",
|
||||||
" `Erecta`, `Willistoni`, or `CS`.\n",
|
" `Erecta`, `Willistoni`, or `CS`.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If you bump into other terms in the code, ask. Don't guess \u2014 biology\n",
|
"If you bump into other terms in the code, ask. Don't guess — biology\n",
|
||||||
"codebases pick up jargon over the years.\n"
|
"codebases pick up jargon over the years.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|
@ -234,16 +208,16 @@
|
||||||
"\n",
|
"\n",
|
||||||
"When you're ready, open these notebooks **in order**:\n",
|
"When you're ready, open these notebooks **in order**:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"1. `01_python_pandas_basics.ipynb` \u2014 just enough Python and pandas to\n",
|
"1. `01_python_pandas_basics.ipynb` — just enough Python and pandas to\n",
|
||||||
" read and manipulate tabular data.\n",
|
" read and manipulate tabular data.\n",
|
||||||
"2. `02_explore_one_database.ipynb` \u2014 open one tracking DB, plot a fly's\n",
|
"2. `02_explore_one_database.ipynb` — open one tracking DB, plot a fly's\n",
|
||||||
" trajectory, see what the numbers actually look like.\n",
|
" trajectory, see what the numbers actually look like.\n",
|
||||||
"3. `03_compare_trained_vs_naive.ipynb` \u2014 your first real analysis,\n",
|
"3. `03_compare_trained_vs_naive.ipynb` — your first real analysis,\n",
|
||||||
" comparing groups of flies.\n",
|
" comparing groups of flies.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"After those, the notebooks one level up (`flies_analysis.ipynb`,\n",
|
"After those, the notebooks one level up (`flies_analysis.ipynb`,\n",
|
||||||
"`flies_analysis_simple.ipynb`) contain the analysis pipeline that the\n",
|
"`flies_analysis_simple.ipynb`) contain the analysis pipeline that the\n",
|
||||||
"previous student built \u2014 those will make sense once you've worked\n",
|
"previous student built — those will make sense once you've worked\n",
|
||||||
"through the tutorials.\n",
|
"through the tutorials.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Don't try to power through all of them in one sitting. Run a few cells,\n",
|
"Don't try to power through all of them in one sitting. Run a few cells,\n",
|
||||||
|
|
@ -252,4 +226,4 @@
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
|
|
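The notebook text above says that two flies seen in one ROI at the same moment appear as two rows sharing the same `t`. A minimal sketch of how that could be counted with `sqlite3`, using an in-memory table instead of a real tracking DB. The column types and the sample values are assumptions made for illustration; only the table name `ROI_1` and the column names `id`, `t`, `x`, `y`, `w`, `h` come from the notebooks.

```python
import sqlite3
from collections import Counter

# In-memory stand-in for one tracking DB; no real file is opened.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ROI_1 (id INTEGER, t INTEGER, x INTEGER, y INTEGER, w INTEGER, h INTEGER)"
)

# Made-up detections: two flies at t=0 and t=40, one merged blob at t=80.
rows = [
    (1, 0, 100, 50, 12, 8),
    (2, 0, 140, 60, 11, 9),
    (3, 40, 101, 51, 12, 8),
    (4, 40, 139, 61, 11, 9),
    (5, 80, 120, 55, 20, 14),
]
conn.executemany("INSERT INTO ROI_1 VALUES (?, ?, ?, ?, ?, ?)", rows)

# Number of detections per timestamp.
per_frame = conn.execute(
    "SELECT t, COUNT(*) FROM ROI_1 GROUP BY t ORDER BY t"
).fetchall()

# How many timestamps had 1 detection, how many had 2, and so on.
frames_by_n_flies = Counter(n for _, n in per_frame)

print(per_frame)
print(dict(frames_by_n_flies))
conn.close()
```

Pointing `sqlite3.connect` at a real `*_tracking.db` path would presumably work the same way, but the exact schema should be checked against an actual file first.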
@ -16,7 +16,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# 01 \u00b7 Python and pandas \u2014 just enough to be dangerous\n",
|
"# 01 · Python and pandas — just enough to be dangerous\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This notebook teaches the **minimum** Python and `pandas` you need to read\n",
|
"This notebook teaches the **minimum** Python and `pandas` you need to read\n",
|
||||||
"the rest of the project's code and write your own analyses.\n",
|
"the rest of the project's code and write your own analyses.\n",
|
||||||
|
|
@ -28,10 +28,10 @@
|
||||||
"\n",
|
"\n",
|
||||||
"External resources, in order of how much time they take:\n",
|
"External resources, in order of how much time they take:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"- \ud83e\udd98 [Python in 10 minutes (very condensed)](https://www.stavros.io/tutorials/python/)\n",
|
"- 🦘 [Python in 10 minutes (very condensed)](https://www.stavros.io/tutorials/python/)\n",
|
||||||
"- \ud83d\udc0d [Official Python tutorial \u2014 chapters 3\u20135](https://docs.python.org/3/tutorial/introduction.html)\n",
|
"- 🐍 [Official Python tutorial — chapters 3–5](https://docs.python.org/3/tutorial/introduction.html)\n",
|
||||||
"- \ud83d\udc3c [pandas in 10 minutes (official)](https://pandas.pydata.org/docs/user_guide/10min.html)\n",
|
"- 🐼 [pandas in 10 minutes (official)](https://pandas.pydata.org/docs/user_guide/10min.html)\n",
|
||||||
"- \ud83d\udcda [Python for Data Analysis (the book)](https://wesmckinney.com/book/) \u2014 free online\n"
|
"- 📚 [Python for Data Analysis (the book)](https://wesmckinney.com/book/) — free online\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|
@ -90,7 +90,7 @@
|
||||||
"message = \"We tracked \" + str(n_flies) + \" \" + species + \" males.\"\n",
|
"message = \"We tracked \" + str(n_flies) + \" \" + species + \" males.\"\n",
|
||||||
"print(message)\n",
|
"print(message)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# A nicer way to build strings \u2014 f-strings (note the leading 'f'):\n",
|
"# A nicer way to build strings — f-strings (note the leading 'f'):\n",
|
||||||
"print(f\"We tracked {n_flies} {species} males.\")\n"
|
"print(f\"We tracked {n_flies} {species} males.\")\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|
@ -111,7 +111,7 @@
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"machines = [\"ETHOSCOPE_076\", \"ETHOSCOPE_082\", \"ETHOSCOPE_086\"]\n",
|
"machines = [\"ETHOSCOPE_076\", \"ETHOSCOPE_082\", \"ETHOSCOPE_086\"]\n",
|
||||||
"print(machines[0]) # first item \u2014 Python counts from 0!\n",
|
"print(machines[0]) # first item — Python counts from 0!\n",
|
||||||
"print(machines[-1]) # last item\n",
|
"print(machines[-1]) # last item\n",
|
||||||
"print(len(machines)) # how many items\n",
|
"print(len(machines)) # how many items\n",
|
||||||
"print(machines + [\"ETHOSCOPE_140\"]) # concatenate (returns a new list)\n"
|
"print(machines + [\"ETHOSCOPE_140\"]) # concatenate (returns a new list)\n"
|
||||||
|
|
@ -212,7 +212,7 @@
|
||||||
" return days / 7\n",
|
" return days / 7\n",
|
||||||
"\n",
|
"\n",
|
||||||
"print(fly_age_in_weeks(14)) # 2.0\n",
|
"print(fly_age_in_weeks(14)) # 2.0\n",
|
||||||
"print(fly_age_in_weeks(5)) # 0.714\u2026\n"
|
"print(fly_age_in_weeks(5)) # 0.714…\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|
@ -242,12 +242,12 @@
|
||||||
"source": [
|
"source": [
|
||||||
"## 9. Meet pandas\n",
|
"## 9. Meet pandas\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Real data is rarely a single number \u2014 it's a **table** with rows and\n",
|
"Real data is rarely a single number — it's a **table** with rows and\n",
|
||||||
"columns (think Excel). `pandas` is the library that handles tables in\n",
|
"columns (think Excel). `pandas` is the library that handles tables in\n",
|
||||||
"Python. The two main objects are:\n",
|
"Python. The two main objects are:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"- **`Series`** \u2014 a single column with a name.\n",
|
"- **`Series`** — a single column with a name.\n",
|
||||||
"- **`DataFrame`** \u2014 a whole table.\n",
|
"- **`DataFrame`** — a whole table.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"By convention we import pandas as `pd`. Always.\n"
|
"By convention we import pandas as `pd`. Always.\n"
|
||||||
]
|
]
|
||||||
|
|
@ -257,17 +257,7 @@
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": "import pandas as pd\nfrom pathlib import Path\n\n# All the project's bulky data lives under /mnt/data/projects/cupido/.\n# This pattern — define one DATA_DIR variable, then build sub-paths from\n# it — is much easier to read (and to update) than hard-coding long\n# strings everywhere.\nDATA_DIR = Path(\"/mnt/data/projects/cupido\")\ntsv_path = DATA_DIR / \"all_video_info_merged.tsv\"\n\n# Read the project's metadata TSV (Tab-Separated Values).\ndf = pd.read_csv(tsv_path, sep=\"\\t\")\n\n# How big is it?\nprint(f\"Rows: {len(df)}\")\nprint(f\"Columns: {df.shape[1]}\")\n"
|
||||||
"import pandas as pd\n",
|
|
||||||
"\n",
|
|
||||||
"# Read the project's metadata TSV (Tab-Separated Values).\n",
|
|
||||||
"tsv_path = \"/home/gg/ownCloud/Work/Projects/coding/cupido/all_video_info_merged.tsv\"\n",
|
|
||||||
"df = pd.read_csv(tsv_path, sep=\"\\t\")\n",
|
|
||||||
"\n",
|
|
||||||
"# How big is it?\n",
|
|
||||||
"print(f\"Rows: {len(df)}\")\n",
|
|
||||||
"print(f\"Columns: {df.shape[1]}\")\n"
|
|
||||||
]
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
|
|
@ -368,7 +358,7 @@
|
||||||
"mel_only = df[df[\"species\"] == \"Melanogaster/CS\"]\n",
|
"mel_only = df[df[\"species\"] == \"Melanogaster/CS\"]\n",
|
||||||
"print(f\"Melanogaster/CS rows: {len(mel_only)}\")\n",
|
"print(f\"Melanogaster/CS rows: {len(mel_only)}\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"# Combine conditions with & (and) | (or) \u2014 and wrap each part in parentheses.\n",
|
"# Combine conditions with & (and) | (or) — and wrap each part in parentheses.\n",
|
||||||
"trained_mel = df[(df[\"male\"] == \"trained\") & (df[\"species\"] == \"Melanogaster/CS\")]\n",
|
"trained_mel = df[(df[\"male\"] == \"trained\") & (df[\"species\"] == \"Melanogaster/CS\")]\n",
|
||||||
"print(f\"trained Mel rows: {len(trained_mel)}\")\n"
|
"print(f\"trained Mel rows: {len(trained_mel)}\")\n"
|
||||||
]
|
]
|
||||||
|
|
@ -497,4 +487,4 @@
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
|
|
@ -16,7 +16,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# 02 \u00b7 A first look at one tracking database\n",
|
"# 02 · A first look at one tracking database\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In this notebook we open **one** of the SQLite databases that the tracker\n",
|
"In this notebook we open **one** of the SQLite databases that the tracker\n",
|
||||||
"produced and look at what's actually inside. By the end you'll be able to:\n",
|
"produced and look at what's actually inside. By the end you'll be able to:\n",
|
||||||
|
|
@ -40,7 +40,7 @@
|
||||||
"## Setup\n",
|
"## Setup\n",
|
||||||
"\n",
|
"\n",
|
||||||
"We import the libraries we need. `sqlite3` is part of Python's standard\n",
|
"We import the libraries we need. `sqlite3` is part of Python's standard\n",
|
||||||
"library \u2014 no install needed.\n"
|
"library — no install needed.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|
@ -71,15 +71,7 @@
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": "# Single root for all the project's data. Build sub-paths from it.\nDATA_DIR = Path(\"/mnt/data/projects/cupido\")\ntracked_dir = DATA_DIR / \"tracked\"\n\ndb_files = sorted(tracked_dir.glob(\"*_tracking.db\"))\n\nprint(f\"Found {len(db_files)} tracking DBs.\")\nprint(\"\\nFirst 5 by name:\")\nfor db in db_files[:5]:\n print(f\" {db.name}\")\n"
|
||||||
"tracked_dir = Path(\"/mnt/data/projects/cupido/tracked\")\n",
|
|
||||||
"db_files = sorted(tracked_dir.glob(\"*_tracking.db\"))\n",
|
|
||||||
"\n",
|
|
||||||
"print(f\"Found {len(db_files)} tracking DBs.\")\n",
|
|
||||||
"print(\"\\nFirst 5 by name:\")\n",
|
|
||||||
"for db in db_files[:5]:\n",
|
|
||||||
" print(f\" {db.name}\")\n"
|
|
||||||
]
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
|
|
@ -90,7 +82,7 @@
|
||||||
"\n",
|
"\n",
|
||||||
"```\n",
|
"```\n",
|
||||||
"2024-09-17_10-32-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged_tracking.db\n",
|
"2024-09-17_10-32-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged_tracking.db\n",
|
||||||
"\u2514\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2518\u2514\u2500\u2500\u252c\u2500\u2500\u2518 \u2514\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\u2514\u2500\u2500\u2500\u2500\u2500\u2500\u252c\u2500\u2500\u2500\u2500\u2500\u2500\u2500\u2518\n",
|
"└────┬─────┘└──┬──┘ └────────────────┬───────────────┘└──────┬───────┘\n",
|
||||||
" date time machine UUID video format\n",
|
" date time machine UUID video format\n",
|
||||||
"```\n",
|
"```\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
|
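The filename breakdown in the hunk above (date, time, machine UUID, video format) can be sketched as a small parser. The regular expression is an assumption inferred from that single example filename, the helper name `parse_tracking_name` is hypothetical, and the meaning of the trailing `28q` is not stated in the notebook, so it is matched but not interpreted.

```python
import re
from datetime import datetime

FILENAME = (
    "2024-09-17_10-32-10_076e2825a7274661bd0697c42d6fa4c0"
    "__1920x1088@25fps-28q_merged_tracking.db"
)

# Assumed pattern, derived from the one example above; other files may differ.
PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{2}-\d{2}-\d{2})"
    r"_(?P<machine_uuid>[0-9a-f]{32})"
    r"__(?P<width>\d+)x(?P<height>\d+)@(?P<fps>\d+)fps-(?P<q>\d+)q"
)


def parse_tracking_name(name):
    """Return the fields encoded in a tracking-DB filename, or None if it does not match."""
    m = PATTERN.match(name)
    if m is None:
        return None
    return {
        "recorded_at": datetime.strptime(
            f"{m['date']}_{m['time']}", "%Y-%m-%d_%H-%M-%S"
        ),
        "machine_uuid": m["machine_uuid"],
        "width": int(m["width"]),
        "height": int(m["height"]),
        "fps": int(m["fps"]),
    }


info = parse_tracking_name(FILENAME)
print(info)
```

Returning `None` for a non-matching name, rather than raising, keeps a loop over a directory of files simple: unexpected names can be skipped and reported.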
@ -152,7 +144,7 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"You should see tables like `ROI_1`, `ROI_2`, \u2026, `ROI_6` (one per\n",
|
"You should see tables like `ROI_1`, `ROI_2`, …, `ROI_6` (one per\n",
|
||||||
"sub-arena), plus housekeeping tables like `METADATA`, `ROI_MAP`,\n",
|
"sub-arena), plus housekeeping tables like `METADATA`, `ROI_MAP`,\n",
|
||||||
"`VAR_MAP`, `START_EVENTS`. We mostly care about the `ROI_*` ones.\n",
|
"`VAR_MAP`, `START_EVENTS`. We mostly care about the `ROI_*` ones.\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
|
@ -187,7 +179,7 @@
|
||||||
"- `x`, `y`: fly position in **pixels**. The image origin (0, 0) is the\n",
|
"- `x`, `y`: fly position in **pixels**. The image origin (0, 0) is the\n",
|
||||||
" **top-left** corner. y grows downward.\n",
|
" **top-left** corner. y grows downward.\n",
|
||||||
"- `w`, `h`: bounding-box width/height. Their product (`area = w*h`) is a\n",
|
"- `w`, `h`: bounding-box width/height. Their product (`area = w*h`) is a\n",
|
||||||
" rough proxy for \"how big does this blob look\" \u2014 useful for spotting\n",
|
" rough proxy for \"how big does this blob look\" — useful for spotting\n",
|
||||||
" frames where the tracker merged two flies into one big detection.\n"
|
" frames where the tracker merged two flies into one big detection.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|
@ -245,7 +237,7 @@
|
||||||
"source": [
|
"source": [
|
||||||
"The output tells you, e.g., \"100,000 frames had 2 flies visible, 30,000\n",
|
"The output tells you, e.g., \"100,000 frames had 2 flies visible, 30,000\n",
|
||||||
"had 1 fly visible\". Frames with 1 fly usually mean the two flies are\n",
|
"had 1 fly visible\". Frames with 1 fly usually mean the two flies are\n",
|
||||||
"overlapping or one is occluded \u2014 that's something we'll handle properly\n",
|
"overlapping or one is occluded — that's something we'll handle properly\n",
|
||||||
"in the next notebook.\n"
|
"in the next notebook.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|
@ -257,7 +249,7 @@
|
||||||
"\n",
|
"\n",
|
||||||
"We'll plot the position over the first 5 minutes (300 000 ms). For\n",
|
"We'll plot the position over the first 5 minutes (300 000 ms). For\n",
|
||||||
"clarity we'll only look at frames where there were 2 flies and pick the\n",
|
"clarity we'll only look at frames where there were 2 flies and pick the\n",
|
||||||
"**first** of the two (sorted by `id`) as \"fly 1\" \u2014 this is a rough\n",
|
"**first** of the two (sorted by `id`) as \"fly 1\" — this is a rough\n",
|
||||||
"heuristic; identity tracking is harder than it sounds.\n"
|
"heuristic; identity tracking is harder than it sounds.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|
@ -280,7 +272,7 @@
|
||||||
"plt.gca().invert_yaxis() # because pixel y grows downward\n",
|
"plt.gca().invert_yaxis() # because pixel y grows downward\n",
|
||||||
"plt.xlabel(\"x (pixels)\")\n",
|
"plt.xlabel(\"x (pixels)\")\n",
|
||||||
"plt.ylabel(\"y (pixels)\")\n",
|
"plt.ylabel(\"y (pixels)\")\n",
|
||||||
"plt.title(f\"Fly 1 trajectory \u2014 first 5 min \u2014 {db_path.name[:30]}\u2026\")\n",
|
"plt.title(f\"Fly 1 trajectory — first 5 min — {db_path.name[:30]}…\")\n",
|
||||||
"plt.legend()\n",
|
"plt.legend()\n",
|
||||||
"plt.axis(\"equal\")\n",
|
"plt.axis(\"equal\")\n",
|
||||||
"plt.show()\n"
|
"plt.show()\n"
|
||||||
|
|
@ -293,7 +285,7 @@
|
||||||
"You should see a tangle of lines confined to a roughly rectangular ROI.\n",
|
"You should see a tangle of lines confined to a roughly rectangular ROI.\n",
|
||||||
"That tangle is the fly walking around its sub-arena.\n",
|
"That tangle is the fly walking around its sub-arena.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Notice we did `plt.gca().invert_yaxis()` \u2014 that's because in image\n",
|
"Notice we did `plt.gca().invert_yaxis()` — that's because in image\n",
|
||||||
"coordinates y grows downward, but humans expect plots where y grows\n",
|
"coordinates y grows downward, but humans expect plots where y grows\n",
|
||||||
"upward. Without it the plot would be vertically flipped.\n"
|
"upward. Without it the plot would be vertically flipped.\n"
|
||||||
]
|
]
|
||||||
|
|
@ -318,7 +310,7 @@
|
||||||
"\n",
|
"\n",
|
||||||
"axes[0].plot(fly1[\"t\"] / 1000, fly1[\"x\"], linewidth=0.5)\n",
|
"axes[0].plot(fly1[\"t\"] / 1000, fly1[\"x\"], linewidth=0.5)\n",
|
||||||
"axes[0].set_ylabel(\"x (px)\")\n",
|
"axes[0].set_ylabel(\"x (px)\")\n",
|
||||||
"axes[0].set_title(f\"Fly 1, ROI 1, {db_path.name[:30]}\u2026\")\n",
|
"axes[0].set_title(f\"Fly 1, ROI 1, {db_path.name[:30]}…\")\n",
|
||||||
"\n",
|
"\n",
|
||||||
"axes[1].plot(fly1[\"t\"] / 1000, fly1[\"y\"], linewidth=0.5, color=\"darkorange\")\n",
|
"axes[1].plot(fly1[\"t\"] / 1000, fly1[\"y\"], linewidth=0.5, color=\"darkorange\")\n",
|
||||||
"axes[1].set_ylabel(\"y (px)\")\n",
|
"axes[1].set_ylabel(\"y (px)\")\n",
|
||||||
|
|
@ -344,7 +336,7 @@
|
||||||
"## Distance between the two flies\n",
|
"## Distance between the two flies\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Whenever the ROI has 2 detections at the same `t`, we can compute the\n",
|
"Whenever the ROI has 2 detections at the same `t`, we can compute the\n",
|
||||||
"Euclidean distance between them: `sqrt((x1-x2)\u00b2 + (y1-y2)\u00b2)`.\n"
|
"Euclidean distance between them: `sqrt((x1-x2)² + (y1-y2)²)`.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|
@ -399,7 +391,7 @@
|
||||||
"## Don't forget to close the connection\n",
|
"## Don't forget to close the connection\n",
|
||||||
"\n",
|
"\n",
|
||||||
"If you opened a connection, close it when you're done. (Not strictly\n",
|
"If you opened a connection, close it when you're done. (Not strictly\n",
|
||||||
"necessary in a notebook \u2014 Python tidies up \u2014 but a good habit.)\n"
|
"necessary in a notebook — Python tidies up — but a good habit.)\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|
@ -423,7 +415,7 @@
|
||||||
"2. Plot the distance trace for **ROI 4** instead of ROI 1.\n",
|
"2. Plot the distance trace for **ROI 4** instead of ROI 1.\n",
|
||||||
"3. Compute the **percentage of frames** in ROI 1 that had only 1 fly visible.\n",
|
"3. Compute the **percentage of frames** in ROI 1 that had only 1 fly visible.\n",
|
||||||
"4. The `area = w * h` column is a useful diagnostic. Plot `area` vs `t`\n",
|
"4. The `area = w * h` column is a useful diagnostic. Plot `area` vs `t`\n",
|
||||||
" for fly 1 \u2014 when does the bounding box get unusually large?\n"
|
" for fly 1 — when does the bounding box get unusually large?\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|
@ -436,4 +428,4 @@
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
|
|
@ -16,13 +16,13 @@
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"# 03 \u00b7 Your first real analysis: trained vs naive\n",
|
"# 03 · Your first real analysis: trained vs naive\n",
|
||||||
"\n",
|
"\n",
|
||||||
"In notebook 02 we explored a single database. Now we'll work with **all\n",
|
"In notebook 02 we explored a single database. Now we'll work with **all\n",
|
||||||
"of them at once**, compute a simple per-fly metric, and ask the central\n",
|
"of them at once**, compute a simple per-fly metric, and ask the central\n",
|
||||||
"question of the project:\n",
|
"question of the project:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"> **Do trained males behave differently from na\u00efve males in the testing\n",
|
"> **Do trained males behave differently from naïve males in the testing\n",
|
||||||
"> session?**\n",
|
"> session?**\n",
|
||||||
"\n",
|
"\n",
|
||||||
"By the end you'll have:\n",
|
"By the end you'll have:\n",
|
||||||
|
|
@ -31,7 +31,7 @@
|
||||||
" project's helper function;\n",
|
" project's helper function;\n",
|
||||||
"- reduced each trace to one number per fly (the *median inter-fly\n",
|
"- reduced each trace to one number per fly (the *median inter-fly\n",
|
||||||
" distance*);\n",
|
" distance*);\n",
|
||||||
"- compared the trained group against the na\u00efve group with a histogram\n",
|
"- compared the trained group against the naïve group with a histogram\n",
|
||||||
" and a non-parametric statistical test;\n",
|
" and a non-parametric statistical test;\n",
|
||||||
"- learnt enough to start asking your own questions.\n"
|
"- learnt enough to start asking your own questions.\n"
|
||||||
]
|
]
|
||||||
|
|
@ -48,27 +48,13 @@
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": "import sys\nfrom pathlib import Path\n\nimport numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\nfrom scipy import stats\n\n# Two locations to know about:\n# - DATA_DIR : where the project's data files live (read-only data volume)\n# - REPO_ROOT : where the code repository lives (this notebook is inside it)\n# We build both as Path objects, then derive everything else from them.\nDATA_DIR = Path(\"/mnt/data/projects/cupido\")\nREPO_ROOT = Path(\"/home/gg/ownCloud/Work/Projects/coding/cupido/tracking\")\n\n# Tell Python where to find the project's helper modules (in scripts/).\nsys.path.insert(0, str(REPO_ROOT / \"scripts\"))\n\nfrom load_roi_data import load_roi_data\n"
|
||||||
"import sys\n",
|
|
||||||
"from pathlib import Path\n",
|
|
||||||
"\n",
|
|
||||||
"import numpy as np\n",
|
|
||||||
"import pandas as pd\n",
|
|
||||||
"import matplotlib.pyplot as plt\n",
|
|
||||||
"from scipy import stats\n",
|
|
||||||
"\n",
|
|
||||||
"# Tell Python where to find the project's helper modules.\n",
|
|
||||||
"PROJECT_ROOT = Path(\"..\").resolve().parent # this notebook is in notebooks/getting_started/\n",
|
|
||||||
"sys.path.insert(0, str(PROJECT_ROOT / \"scripts\"))\n",
|
|
||||||
"\n",
|
|
||||||
"from load_roi_data import load_roi_data\n"
|
|
||||||
]
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"source": [
|
"source": [
|
||||||
"## Loading everything at once \u2014 but carefully\n",
|
"## Loading everything at once — but carefully\n",
|
||||||
"\n",
|
"\n",
|
||||||
"`load_roi_data()` opens every tracking DB referenced by the metadata TSV\n",
|
"`load_roi_data()` opens every tracking DB referenced by the metadata TSV\n",
|
||||||
"and returns one big DataFrame. **It can be slow and memory-hungry**\n",
|
"and returns one big DataFrame. **It can be slow and memory-hungry**\n",
|
||||||
|
|
@ -80,12 +66,7 @@
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": "# Load the metadata TSV first — it's small and fast.\ntsv_path = DATA_DIR / \"all_video_info_merged.tsv\"\nmeta = pd.read_csv(tsv_path, sep=\"\\t\")\nprint(f\"metadata rows: {len(meta)}\")\n"
|
||||||
"# Load the metadata TSV first \u2014 it's small and fast.\n",
|
|
||||||
"tsv_path = \"/home/gg/ownCloud/Work/Projects/coding/cupido/all_video_info_merged.tsv\"\n",
|
|
||||||
"meta = pd.read_csv(tsv_path, sep=\"\\t\")\n",
|
|
||||||
"print(f\"metadata rows: {len(meta)}\")\n"
|
|
||||||
]
|
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "markdown",
|
"cell_type": "markdown",
|
||||||
|
|
@ -180,7 +161,7 @@
|
||||||
"Right now each fly contributes **tens of thousands** of (t, x, y) rows.\n",
|
"Right now each fly contributes **tens of thousands** of (t, x, y) rows.\n",
|
||||||
"We can't compare distributions of millions of points across two groups\n",
|
"We can't compare distributions of millions of points across two groups\n",
|
||||||
"in any meaningful way. So we **collapse each (date, machine_name, ROI)\n",
|
"in any meaningful way. So we **collapse each (date, machine_name, ROI)\n",
|
||||||
"trace into a single summary number** \u2014 here, the median distance between\n",
|
"trace into a single summary number** — here, the median distance between\n",
|
||||||
"the two flies during testing.\n",
|
"the two flies during testing.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Why median rather than mean? Because tracker glitches (one fly\n",
|
"Why median rather than mean? Because tracker glitches (one fly\n",
|
||||||
|
|
@ -195,7 +176,7 @@
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Step 1 \u2014 per-frame distance.\n",
|
"# Step 1 — per-frame distance.\n",
|
||||||
"# Take only frames with exactly 2 flies (so we have a real distance).\n",
|
"# Take only frames with exactly 2 flies (so we have a real distance).\n",
|
||||||
"two_fly = testing.groupby([\"date\", \"machine_name\", \"ROI\", \"t\"]).filter(lambda g: len(g) == 2)\n",
|
"two_fly = testing.groupby([\"date\", \"machine_name\", \"ROI\", \"t\"]).filter(lambda g: len(g) == 2)\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
|
@ -220,7 +201,7 @@
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"# Step 2 \u2014 one number per (date, machine_name, ROI).\n",
|
"# Step 2 — one number per (date, machine_name, ROI).\n",
|
||||||
"per_fly = (\n",
|
"per_fly = (\n",
|
||||||
" per_frame\n",
|
" per_frame\n",
|
||||||
" .groupby([\"date\", \"machine_name\", \"ROI\", \"male\"])[\"distance_px\"]\n",
|
" .groupby([\"date\", \"machine_name\", \"ROI\", \"male\"])[\"distance_px\"]\n",
|
||||||
|
|
@ -278,7 +259,7 @@
|
||||||
"\n",
|
"\n",
|
||||||
"ax.set_xlabel(\"median inter-fly distance during testing (px)\")\n",
|
"ax.set_xlabel(\"median inter-fly distance during testing (px)\")\n",
|
||||||
"ax.set_ylabel(\"number of flies\")\n",
|
"ax.set_ylabel(\"number of flies\")\n",
|
||||||
"ax.set_title(\"Trained vs na\u00efve \u2014 Melanogaster/CS \u2014 testing session\")\n",
|
"ax.set_title(\"Trained vs naïve — Melanogaster/CS — testing session\")\n",
|
||||||
"ax.legend()\n",
|
"ax.legend()\n",
|
||||||
"plt.show()\n"
|
"plt.show()\n"
|
||||||
]
|
]
|
||||||
|
|
@ -293,10 +274,10 @@
|
||||||
" trained males are spending less time near the female (i.e. they\n",
|
" trained males are spending less time near the female (i.e. they\n",
|
||||||
" learned to give up).\n",
|
" learned to give up).\n",
|
||||||
"- If the two distributions look identical, no learning effect was\n",
|
"- If the two distributions look identical, no learning effect was\n",
|
||||||
" measurable with this metric \u2014 but that doesn't mean there's no effect,\n",
|
" measurable with this metric — but that doesn't mean there's no effect,\n",
|
||||||
" just that this particular summary didn't capture it.\n",
|
" just that this particular summary didn't capture it.\n",
|
||||||
"- A **bimodal** trained distribution (two humps) would mean some males\n",
|
"- A **bimodal** trained distribution (two humps) would mean some males\n",
|
||||||
" learned and others didn't \u2014 the \"individual differences\" story in\n",
|
" learned and others didn't — the \"individual differences\" story in\n",
|
||||||
" `docs/bimodal_hypothesis.md`.\n"
|
" `docs/bimodal_hypothesis.md`.\n"
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
|
|
@ -353,7 +334,7 @@
|
||||||
"- **Pick a different metric**: instead of median distance, try fraction\n",
|
"- **Pick a different metric**: instead of median distance, try fraction\n",
|
||||||
" of time the flies were within 50 px (a \"close-proximity\" metric), or\n",
|
" of time the flies were within 50 px (a \"close-proximity\" metric), or\n",
|
||||||
" the maximum velocity per fly. (Velocity needs identity tracking, which\n",
|
" the maximum velocity per fly. (Velocity needs identity tracking, which\n",
|
||||||
" is harder \u2014 see `flies_analysis_simple.ipynb` cell 16 for an example.)\n",
|
" is harder — see `flies_analysis_simple.ipynb` cell 16 for an example.)\n",
|
||||||
"- **Look at it per species**: re-run with `species == \"Sechellia\"` and\n",
|
"- **Look at it per species**: re-run with `species == \"Sechellia\"` and\n",
|
||||||
" compare. Does the effect generalize? Where is it strongest?\n",
|
" compare. Does the effect generalize? Where is it strongest?\n",
|
||||||
"- **Look at the bimodality**: a kernel density plot\n",
|
"- **Look at the bimodality**: a kernel density plot\n",
|
||||||
|
|
@ -389,10 +370,10 @@
|
||||||
"`parquet` is a fast columnar format. `pip install pyarrow` if your\n",
|
"`parquet` is a fast columnar format. `pip install pyarrow` if your\n",
|
||||||
"environment doesn't have it.\n",
|
"environment doesn't have it.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"There are also vectorized ways to compute these distances ~100\u00d7 faster\n",
|
"There are also vectorized ways to compute these distances ~100× faster\n",
|
||||||
"that avoid `groupby().apply()`. Don't worry about that yet \u2014 get a\n",
|
"that avoid `groupby().apply()`. Don't worry about that yet — get a\n",
|
||||||
"correct answer first, optimize only if you find yourself waiting.\n"
|
"correct answer first, optimize only if you find yourself waiting.\n"
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
|
|
@ -2,21 +2,26 @@
|
||||||
|
|
||||||
from pathlib import Path
|
from pathlib import Path
|
||||||
|
|
||||||
|
# Where this code repository lives (the directory containing scripts/, notebooks/, ...).
|
||||||
PROJECT_ROOT = Path(__file__).resolve().parent.parent
|
PROJECT_ROOT = Path(__file__).resolve().parent.parent
|
||||||
DATA_RAW = PROJECT_ROOT / "data" / "raw"
|
DATA_RAW = PROJECT_ROOT / "data" / "raw"
|
||||||
DATA_METADATA = PROJECT_ROOT / "data" / "metadata"
|
DATA_METADATA = PROJECT_ROOT / "data" / "metadata"
|
||||||
DATA_PROCESSED = PROJECT_ROOT / "data" / "processed"
|
DATA_PROCESSED = PROJECT_ROOT / "data" / "processed"
|
||||||
FIGURES = PROJECT_ROOT / "figures"
|
FIGURES = PROJECT_ROOT / "figures"
|
||||||
|
|
||||||
# Offline-tracking pipeline paths
|
|
||||||
VIDEOS_ROOT = Path("/mnt/ethoscope_data/videos")
|
|
||||||
VIDEO_INFO_XLSX = PROJECT_ROOT.parent / "all_video_info_merged.xlsx"
|
|
||||||
INVENTORY_CSV = DATA_METADATA / "video_inventory.csv"
|
|
||||||
# Reason: kept on the local data volume alongside the tracking DBs (out of
|
|
||||||
# ownCloud sync). See TRACKING_OUTPUT_DIR comment below.
|
|
||||||
TARGETS_DIR = Path("/mnt/data/projects/cupido/targets")
|
|
||||||
# Reason: tracking DBs are large binary files that don't belong in
|
|
||||||
# ownCloud-synced storage (sync conflicts + bandwidth). They live on the
|
|
||||||
# local data volume instead. Regenerable from videos + target JSONs.
|
|
||||||
TRACKING_OUTPUT_DIR = Path("/mnt/data/projects/cupido/tracked")
|
|
||||||
LOGS_DIR = PROJECT_ROOT / "data" / "logs"
|
LOGS_DIR = PROJECT_ROOT / "data" / "logs"
|
||||||
|
|
||||||
|
# Where the source videos live (read-only NFS mount).
|
||||||
|
VIDEOS_ROOT = Path("/mnt/ethoscope_data/videos")
|
||||||
|
|
||||||
|
# Where the project's bulky data lives — outside the ownCloud-synced repo so
|
||||||
|
# it doesn't churn the cloud sync. This single root holds everything that's
|
||||||
|
# big or regenerable: tracking DBs, target-point JSONs, and the metadata
|
||||||
|
# spreadsheet (xlsx + TSV).
|
||||||
|
DATA_VOLUME = Path("/mnt/data/projects/cupido")
|
||||||
|
TARGETS_DIR = DATA_VOLUME / "targets"
|
||||||
|
TRACKING_OUTPUT_DIR = DATA_VOLUME / "tracked"
|
||||||
|
VIDEO_INFO_XLSX = DATA_VOLUME / "all_video_info_merged.xlsx"
|
||||||
|
VIDEO_INFO_TSV = DATA_VOLUME / "all_video_info_merged.tsv"
|
||||||
|
|
||||||
|
# A small CSV listing every video file we know about (built locally).
|
||||||
|
INVENTORY_CSV = DATA_METADATA / "video_inventory.csv"
|
||||||
|
|
|
||||||
|
|
@ -26,7 +26,7 @@ from pathlib import Path
|
||||||
|
|
||||||
import pandas as pd
|
import pandas as pd
|
||||||
|
|
||||||
from config import INVENTORY_CSV, TRACKING_OUTPUT_DIR, VIDEO_INFO_XLSX
|
from config import INVENTORY_CSV, TRACKING_OUTPUT_DIR, VIDEO_INFO_TSV, VIDEO_INFO_XLSX
|
||||||
|
|
||||||
|
|
||||||
_TIME_RE = re.compile(r"^(\d{8})_(\d{1,2})(\d{2})?(AM|PM)$", re.IGNORECASE)
|
_TIME_RE = re.compile(r"^(\d{8})_(\d{1,2})(\d{2})?(AM|PM)$", re.IGNORECASE)
|
||||||
|
|
@ -138,7 +138,7 @@ def main() -> None:
|
||||||
parser.add_argument(
|
parser.add_argument(
|
||||||
"--out",
|
"--out",
|
||||||
type=Path,
|
type=Path,
|
||||||
default=VIDEO_INFO_XLSX.with_suffix(".tsv"),
|
default=VIDEO_INFO_TSV,
|
||||||
help="output TSV path (default: alongside the xlsx)",
|
help="output TSV path (default: alongside the xlsx)",
|
||||||
)
|
)
|
||||||
args = parser.parse_args()
|
args = parser.parse_args()
|
||||||
|
|
|
||||||
|
|
@ -13,7 +13,7 @@ from pathlib import Path
|
||||||
|
|
||||||
import pandas as pd
|
import pandas as pd
|
||||||
|
|
||||||
from config import VIDEO_INFO_XLSX
|
from config import VIDEO_INFO_TSV
|
||||||
|
|
||||||
|
|
||||||
# Metadata columns to copy onto every tracking sample. These are the xlsx
|
# Metadata columns to copy onto every tracking sample. These are the xlsx
|
||||||
|
|
@ -68,7 +68,7 @@ def load_roi_data(meta: pd.DataFrame | None = None) -> pd.DataFrame:
|
||||||
sample. Empty if nothing could be loaded.
|
sample. Empty if nothing could be loaded.
|
||||||
"""
|
"""
|
||||||
if meta is None:
|
if meta is None:
|
||||||
meta = pd.read_csv(VIDEO_INFO_XLSX.with_suffix(".tsv"), sep="\t")
|
meta = pd.read_csv(VIDEO_INFO_TSV, sep="\t")
|
||||||
|
|
||||||
db_cache: dict = {}
|
db_cache: dict = {}
|
||||||
chunks: list[pd.DataFrame] = []
|
chunks: list[pd.DataFrame] = []
|
||||||
|
|
|
||||||
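Review note on notebook 03: its closing cell says there are vectorized ways to compute the inter-fly distances that avoid `groupby().apply()`. A minimal sketch of one such approach follows. It assumes the DataFrame has the columns the notebooks already use (`date`, `machine_name`, `ROI`, `t`, `id`, `x`, `y`). The function name `per_frame_distance` is hypothetical and is not part of the repository.

```python
import numpy as np
import pandas as pd


def per_frame_distance(df, keys=("date", "machine_name", "ROI", "t")):
    """Distance between the two detections in every frame that has exactly two.

    Hypothetical helper for illustration; not part of the project's scripts.
    """
    keys = list(keys)
    # Count detections per frame with a built-in aggregation (no Python lambda).
    n_detections = df.groupby(keys)["x"].transform("size")
    # Keep two-detection frames; sorting by id makes "first"/"last" deterministic.
    two_fly = df[n_detections == 2].sort_values(keys + ["id"])
    grouped = two_fly.groupby(keys, sort=False)
    first = grouped[["x", "y"]].first()
    last = grouped[["x", "y"]].last()
    # Euclidean distance sqrt((x1-x2)^2 + (y1-y2)^2), computed column-wise.
    dist = np.hypot(first["x"] - last["x"], first["y"] - last["y"])
    return dist.rename("distance_px").reset_index()
```

The result has one row per two-fly frame with a `distance_px` column, so a later per-fly median can be taken with an ordinary `groupby(...)["distance_px"].median()`. Whether this matches the ~100× speed-up quoted in the notebook depends on the data and was not measured here.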