{ "nbformat": 4, "nbformat_minor": 5, "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "name": "python" } }, "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 02 · A first look at one tracking database\n", "\n", "In this notebook we open **one** of the SQLite databases that the tracker\n", "produced and look at what's actually inside. By the end you'll be able to:\n", "\n", "- list the tables in a `.db` file\n", "- read one ROI's tracking trace into a DataFrame\n", "- plot a fly's path through the arena\n", "- count how many flies are visible at each moment\n", "- compute a simple distance between the two flies in a ROI\n", "\n", "If you're curious how SQLite works, the\n", "[SQLite Quickstart](https://www.sqlite.org/quickstart.html) is short and\n", "worth reading. For our purposes, **SQLite is just a file that contains\n", "several tables you can query like a DataFrame**.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "We import the libraries we need. `sqlite3` is part of Python's standard\n", "library — no install needed.\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "import sqlite3\n", "from pathlib import Path\n", "\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Find the databases\n", "\n", "The DBs live at `/mnt/data/projects/cupido/tracked/`. Let's list a few.\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": "# Single root for all the project's data. 
Build sub-paths from it.\nDATA_DIR = Path(\"/mnt/data/projects/cupido\")\ntracked_dir = DATA_DIR / \"tracked\"\n\ndb_files = sorted(tracked_dir.glob(\"*_tracking.db\"))\n\nprint(f\"Found {len(db_files)} tracking DBs.\")\nprint(\"\\nFirst 5 by name:\")\nfor db in db_files[:5]:\n print(f\" {db.name}\")\n" }, { "cell_type": "markdown", "metadata": {}, "source": [ "The filename encodes the date, time, machine UUID, video resolution, and\n", "the suffix `_tracking.db`. For example:\n", "\n", "```\n", "2024-09-17_10-32-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged_tracking.db\n", "└────┬─────┘└──┬──┘ └────────────────┬───────────────┘└──────┬───────┘\n", " date time machine UUID video format\n", "```\n", "\n", "Pick one to explore. Feel free to change the index.\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "db_path = db_files[0]\n", "print(\"Working with:\", db_path.name)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Open the database\n", "\n", "We open it **read-only** as a safety measure. The `?mode=ro` flag is\n", "SQLite's read-only switch.\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "conn = sqlite3.connect(f\"file:{db_path}?mode=ro\", uri=True)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## What tables are inside?\n", "\n", "Every SQLite database has a system table called `sqlite_master` that\n", "lists everything. We can query it like any other table.\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "tables = pd.read_sql_query(\n", " \"SELECT name FROM sqlite_master WHERE type='table' ORDER BY name\", conn\n", ")\n", "tables\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should see tables like `ROI_1`, `ROI_2`, …, `ROI_6` (one per\n", "sub-arena), plus housekeeping tables like `METADATA`, `ROI_MAP`,\n", "`VAR_MAP`, `START_EVENTS`. 
We mostly care about the `ROI_*` ones.\n", "\n", "## Read one ROI\n", "\n", "`pd.read_sql_query()` runs an SQL query against the connection and\n", "returns a DataFrame. The query `SELECT * FROM ROI_1` means *\"give me all\n", "columns and all rows from the table called ROI_1\"*.\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "roi1 = pd.read_sql_query(\"SELECT * FROM ROI_1\", conn)\n", "print(f\"shape: {roi1.shape}\") # (rows, columns)\n", "roi1.head()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Understanding the columns\n", "\n", "Refer back to notebook `00_welcome` for the full column reference. Quick\n", "recap of the important ones:\n", "\n", "- `t`: time in **milliseconds** since the video started.\n", "- `x`, `y`: fly position in **pixels**. The image origin (0, 0) is the\n", " **top-left** corner. y grows downward.\n", "- `w`, `h`: bounding-box width/height. Their product (`area = w*h`) is a\n", " rough proxy for \"how big does this blob look\" — useful for spotting\n", " frames where the tracker merged two flies into one big detection.\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Quick descriptive stats\n", "roi1[[\"t\", \"x\", \"y\", \"w\", \"h\"]].describe()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The minimum `t` should be 0 (start of the video). The maximum tells you\n", "how long the recording was. Convert ms to minutes by dividing by 60000:\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "duration_min = roi1[\"t\"].max() / 60_000\n", "print(f\"Session length: {duration_min:.1f} minutes\")\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How many flies per frame?\n", "\n", "If two flies are visible in this ROI, we get **two rows per `t`**. 
Let's\n", "check.\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "flies_per_frame = roi1.groupby(\"t\").size()\n", "print(flies_per_frame.value_counts().sort_index())\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The output tells you, e.g., \"100,000 frames had 2 flies visible, 30,000\n", "had 1 fly visible\". Frames with 1 fly usually mean the two flies are\n", "overlapping or one is occluded — that's something we'll handle properly\n", "in the next notebook.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot one fly's trajectory\n", "\n", "We'll plot the position over the first 5 minutes (300 000 ms). For\n", "clarity we'll only look at frames where there were 2 flies and pick the\n", "**first** of the two (sorted by `id`) as \"fly 1\" — this is a rough\n", "heuristic; identity tracking is harder than it sounds.\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Filter to the first 5 minutes\n", "sub = roi1[roi1[\"t\"] <= 5 * 60_000]\n", "\n", "# Keep only the time points that have exactly 2 detections\n", "sub = sub[sub.groupby(\"t\")[\"t\"].transform(\"size\") == 2]\n", "\n", "# Pick \"fly 1\" by taking the first row (lowest `id`) at each time point\n", "fly1 = sub.sort_values([\"t\", \"id\"]).drop_duplicates(\"t\", keep=\"first\")\n", "\n", "plt.figure(figsize=(6, 5))\n", "plt.plot(fly1[\"x\"], fly1[\"y\"], color=\"steelblue\", linewidth=0.5, alpha=0.7)\n", "plt.scatter(fly1[\"x\"].iloc[0], fly1[\"y\"].iloc[0], color=\"green\", label=\"start\", zorder=5)\n", "plt.scatter(fly1[\"x\"].iloc[-1], fly1[\"y\"].iloc[-1], color=\"red\", label=\"end\", zorder=5)\n", "plt.gca().invert_yaxis() # because pixel y grows downward\n", "plt.xlabel(\"x (pixels)\")\n", "plt.ylabel(\"y (pixels)\")\n", "plt.title(f\"Fly 1 trajectory — first 5 min — {db_path.name[:30]}…\")\n", "plt.legend()\n", "plt.axis(\"equal\")\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You should see a tangle of lines confined to a roughly rectangular ROI.\n", "That tangle is the 
fly walking around its sub-arena.\n", "\n", "Notice we did `plt.gca().invert_yaxis()` — that's because in image\n", "coordinates y grows downward, but humans expect plots where y grows\n", "upward. Without it the plot would be vertically flipped.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot position over time\n", "\n", "A trajectory plot collapses time into \"shape on a page\". To see *when*\n", "things happen we need time on the x-axis.\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "fig, axes = plt.subplots(2, 1, figsize=(12, 5), sharex=True)\n", "\n", "axes[0].plot(fly1[\"t\"] / 1000, fly1[\"x\"], linewidth=0.5)\n", "axes[0].set_ylabel(\"x (px)\")\n", "axes[0].set_title(f\"Fly 1, ROI 1, {db_path.name[:30]}…\")\n", "\n", "axes[1].plot(fly1[\"t\"] / 1000, fly1[\"y\"], linewidth=0.5, color=\"darkorange\")\n", "axes[1].set_ylabel(\"y (px)\")\n", "axes[1].set_xlabel(\"time (s)\")\n", "axes[1].invert_yaxis()\n", "\n", "plt.tight_layout()\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Bursts of variation = active fly. Long flat stretches = the fly is sitting\n", "still. 
You'll come to recognize courtship vs idling by eye after a while.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Distance between the two flies\n", "\n", "Whenever the ROI has 2 detections at the same `t`, we can compute the\n", "Euclidean distance between them: `sqrt((x1-x2)² + (y1-y2)²)`.\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "two_fly_frames = roi1.groupby(\"t\").filter(lambda g: len(g) == 2)\n", "two_fly_frames = two_fly_frames.sort_values([\"t\", \"id\"])\n", "\n", "# Pivot so each row is one timepoint with x1, y1, x2, y2\n", "def pair_up(g):\n", " g = g.reset_index(drop=True)\n", " return pd.Series({\n", " \"x1\": g.loc[0, \"x\"], \"y1\": g.loc[0, \"y\"],\n", " \"x2\": g.loc[1, \"x\"], \"y2\": g.loc[1, \"y\"],\n", " })\n", "\n", "paired = two_fly_frames.groupby(\"t\").apply(pair_up).reset_index()\n", "paired[\"distance_px\"] = np.hypot(paired[\"x1\"] - paired[\"x2\"], paired[\"y1\"] - paired[\"y2\"])\n", "paired.head()\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "plt.figure(figsize=(12, 4))\n", "plt.plot(paired[\"t\"] / 1000, paired[\"distance_px\"], linewidth=0.4)\n", "plt.xlabel(\"time (s)\")\n", "plt.ylabel(\"inter-fly distance (px)\")\n", "plt.title(\"Distance between the two flies in ROI 1\")\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is the kind of trace that drives the rest of the analysis: a male\n", "courting a female stays close (small distance); a male giving up wanders\n", "off (large distance). The shape of this curve is the behavioural readout.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Don't forget to close the connection\n", "\n", "If you opened a connection, close it when you're done. 
(Not strictly\n", "necessary in a notebook, since the connection is released when the kernel\n", "shuts down, but a good habit.)\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "conn.close()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises\n", "\n", "1. Pick a different DB (change `db_files[0]` to `db_files[10]` for example)\n", "   and re-run the trajectory plot. Is the arena bigger / smaller? Why\n", "   might that be? (Hint: look at the resolution part of the filename.)\n", "2. Plot the distance trace for **ROI 4** instead of ROI 1. (Re-open the\n", "   connection first, since we closed it above.)\n", "3. Compute the **percentage of frames** in ROI 1 that had only 1 fly visible.\n", "4. The bounding-box area `w * h` is a useful diagnostic. Plot it against `t`\n", "   for fly 1: when does the bounding box get unusually large?\n" ] }, { "cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": [ "# Exercise space\n" ] } ] }