{
 "nbformat": 4,
 "nbformat_minor": 5,
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "name": "python"
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 02 · A first look at one tracking database\n",
    "\n",
    "In this notebook we open **one** of the SQLite databases that the tracker\n",
    "produced and look at what's actually inside. By the end you'll be able to:\n",
    "\n",
    "- list the tables in a `.db` file\n",
    "- read one ROI's tracking trace into a DataFrame\n",
    "- plot a fly's path through the arena\n",
    "- count how many flies are visible at each moment\n",
    "- compute a simple distance between the two flies in a ROI\n",
    "\n",
    "If you're curious how SQLite works, the\n",
    "[SQLite Quickstart](https://www.sqlite.org/quickstart.html) is short and\n",
    "worth reading. For our purposes, **a SQLite database is just a file that\n",
    "contains several tables, which you can query with SQL and load into a\n",
    "DataFrame**.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "We import the libraries we need. `sqlite3` is part of Python's standard\n",
    "library — no install needed.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "import sqlite3\n",
    "from pathlib import Path\n",
    "\n",
    "import pandas as pd\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Find the databases\n",
    "\n",
    "The DBs live at `/mnt/data/projects/cupido/tracked/`. Let's list a few.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": "# Single root for all the project's data. Build sub-paths from it.\nDATA_DIR = Path(\"/mnt/data/projects/cupido\")\ntracked_dir = DATA_DIR / \"tracked\"\n\ndb_files = sorted(tracked_dir.glob(\"*_tracking.db\"))\n\nprint(f\"Found {len(db_files)} tracking DBs.\")\nprint(\"\\nFirst 5 by name:\")\nfor db in db_files[:5]:\n    print(f\"  {db.name}\")\n"
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The filename encodes the date, time, machine UUID, video resolution, and\n",
    "the suffix `_tracking.db`. For example:\n",
    "\n",
    "```\n",
    "2024-09-17_10-32-10_076e2825a7274661bd0697c42d6fa4c0__1920x1088@25fps-28q_merged_tracking.db\n",
    "└────┬─────┘└──┬──┘ └────────────────┬───────────────┘└──────┬───────┘\n",
    "   date     time       machine UUID                  video format\n",
    "```\n",
    "\n",
    "Pick one to explore. Feel free to change the index.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "db_path = db_files[0]\n",
    "print(\"Working with:\", db_path.name)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Open the database\n",
    "\n",
    "We open it **read-only** as a safety measure. The `?mode=ro` flag is\n",
    "SQLite's read-only switch.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "conn = sqlite3.connect(f\"file:{db_path}?mode=ro\", uri=True)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What tables are inside?\n",
    "\n",
    "Every SQLite database has a system table called `sqlite_master` that\n",
    "lists everything. We can query it like any other table.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "tables = pd.read_sql_query(\n",
    "    \"SELECT name FROM sqlite_master WHERE type='table' ORDER BY name\", conn\n",
    ")\n",
    "tables\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should see tables like `ROI_1`, `ROI_2`, …, `ROI_6` (one per\n",
    "sub-arena), plus housekeeping tables like `METADATA`, `ROI_MAP`,\n",
    "`VAR_MAP`, `START_EVENTS`. We mostly care about the `ROI_*` ones.\n",
    "\n",
    "## Read one ROI\n",
    "\n",
    "`pd.read_sql_query()` runs an SQL query against the connection and\n",
    "returns a DataFrame. The query `SELECT * FROM ROI_1` means *\"give me all\n",
    "columns and all rows from the table called ROI_1\"*.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "roi1 = pd.read_sql_query(\"SELECT * FROM ROI_1\", conn)\n",
    "print(f\"shape: {roi1.shape}\")     # (rows, columns)\n",
    "roi1.head()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Understanding the columns\n",
    "\n",
    "Refer back to notebook `00_welcome` for the full column reference. Quick\n",
    "recap of the important ones:\n",
    "\n",
    "- `t`: time in **milliseconds** since the video started.\n",
    "- `x`, `y`: fly position in **pixels**. The image origin (0, 0) is the\n",
    "  **top-left** corner. y grows downward.\n",
    "- `w`, `h`: bounding-box width/height. Their product (`area = w*h`) is a\n",
    "  rough proxy for \"how big does this blob look\" — useful for spotting\n",
    "  frames where the tracker merged two flies into one big detection.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Quick descriptive stats\n",
    "roi1[[\"t\", \"x\", \"y\", \"w\", \"h\"]].describe()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The minimum `t` should be at or near 0 (the start of the video). The\n",
    "maximum tells you how long the recording was. Convert ms to minutes by\n",
    "dividing by 60 000:\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "duration_min = roi1[\"t\"].max() / 60_000\n",
    "print(f\"Session length: {duration_min:.1f} minutes\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## How many flies per frame?\n",
    "\n",
    "If two flies are visible in this ROI, we get **two rows per `t`**. Let's\n",
    "check.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "flies_per_frame = roi1.groupby(\"t\").size()\n",
    "print(flies_per_frame.value_counts().sort_index())\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The output tells you, e.g., \"100,000 frames had 2 flies visible, 30,000\n",
    "had 1 fly visible\". Frames with 1 fly usually mean the two flies are\n",
    "overlapping or one is occluded — that's something we'll handle properly\n",
    "in the next notebook.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot one fly's trajectory\n",
    "\n",
    "We'll plot the position over the first 5 minutes (300 000 ms). For\n",
    "clarity we'll only look at frames where there were 2 flies and pick the\n",
    "**first** of the two (sorted by `id`) as \"fly 1\" — this is a rough\n",
    "heuristic; identity tracking is harder than it sounds.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Filter to the first 5 minutes\n",
    "sub = roi1[roi1[\"t\"] <= 5 * 60_000]\n",
    "\n",
    "# Pick \"fly 1\" by taking the first row at each time point\n",
    "fly1 = sub.sort_values([\"t\", \"id\"]).drop_duplicates(\"t\", keep=\"first\")\n",
    "\n",
    "plt.figure(figsize=(6, 5))\n",
    "plt.plot(fly1[\"x\"], fly1[\"y\"], color=\"steelblue\", linewidth=0.5, alpha=0.7)\n",
    "plt.scatter(fly1[\"x\"].iloc[0], fly1[\"y\"].iloc[0], color=\"green\", label=\"start\", zorder=5)\n",
    "plt.scatter(fly1[\"x\"].iloc[-1], fly1[\"y\"].iloc[-1], color=\"red\", label=\"end\", zorder=5)\n",
    "plt.gca().invert_yaxis()   # because pixel y grows downward\n",
    "plt.xlabel(\"x (pixels)\")\n",
    "plt.ylabel(\"y (pixels)\")\n",
    "plt.title(f\"Fly 1 trajectory — first 5 min — {db_path.name[:30]}…\")\n",
    "plt.legend()\n",
    "plt.axis(\"equal\")\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You should see a tangle of lines confined to a roughly rectangular ROI.\n",
    "That tangle is the fly walking around its sub-arena.\n",
    "\n",
    "Notice we did `plt.gca().invert_yaxis()` — that's because in image\n",
    "coordinates y grows downward, but humans expect plots where y grows\n",
    "upward. Without it the plot would be vertically flipped.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot position over time\n",
    "\n",
    "A trajectory plot collapses time into \"shape on a page\". To see *when*\n",
    "things happen we need time on the x-axis.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "fig, axes = plt.subplots(2, 1, figsize=(12, 5), sharex=True)\n",
    "\n",
    "axes[0].plot(fly1[\"t\"] / 1000, fly1[\"x\"], linewidth=0.5)\n",
    "axes[0].set_ylabel(\"x (px)\")\n",
    "axes[0].set_title(f\"Fly 1, ROI 1, {db_path.name[:30]}…\")\n",
    "\n",
    "axes[1].plot(fly1[\"t\"] / 1000, fly1[\"y\"], linewidth=0.5, color=\"darkorange\")\n",
    "axes[1].set_ylabel(\"y (px)\")\n",
    "axes[1].set_xlabel(\"time (s)\")\n",
    "axes[1].invert_yaxis()\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Bursts of variation = active fly. Long flat stretches = the fly is sitting\n",
    "still. You'll come to recognize courtship vs idling by eye after a while.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Distance between the two flies\n",
    "\n",
    "Whenever the ROI has 2 detections at the same `t`, we can compute the\n",
    "Euclidean distance between them: `sqrt((x1-x2)² + (y1-y2)²)`.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Keep only time points with exactly two detections, ordered by time then id\n",
    "n_at_t = roi1.groupby(\"t\")[\"t\"].transform(\"size\")\n",
    "two_fly_frames = roi1[n_at_t == 2].sort_values([\"t\", \"id\"])\n",
    "\n",
    "# After sorting, rows alternate fly 1, fly 2, fly 1, fly 2, ...\n",
    "# so every other row gives one fly (vectorized, no per-group Python calls)\n",
    "first = two_fly_frames.iloc[0::2].reset_index(drop=True)\n",
    "second = two_fly_frames.iloc[1::2].reset_index(drop=True)\n",
    "\n",
    "# One row per time point with x1, y1, x2, y2\n",
    "paired = pd.DataFrame({\n",
    "    \"t\": first[\"t\"],\n",
    "    \"x1\": first[\"x\"], \"y1\": first[\"y\"],\n",
    "    \"x2\": second[\"x\"], \"y2\": second[\"y\"],\n",
    "})\n",
    "paired[\"distance_px\"] = np.hypot(paired[\"x1\"] - paired[\"x2\"], paired[\"y1\"] - paired[\"y2\"])\n",
    "paired.head()\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "plt.figure(figsize=(12, 4))\n",
    "plt.plot(paired[\"t\"] / 1000, paired[\"distance_px\"], linewidth=0.4)\n",
    "plt.xlabel(\"time (s)\")\n",
    "plt.ylabel(\"inter-fly distance (px)\")\n",
    "plt.title(\"Distance between the two flies in ROI 1\")\n",
    "plt.show()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is the kind of trace that drives the rest of the analysis: a male\n",
    "courting a female stays close (small distance); a male giving up wanders\n",
    "off (large distance). The shape of this curve is the behavioural readout.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Don't forget to close the connection\n",
    "\n",
    "If you opened a connection, close it when you're done. (Not strictly\n",
    "necessary in a notebook — Python tidies up — but a good habit.)\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "conn.close()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exercises\n",
    "\n",
    "1. Pick a different DB (change `db_files[0]` to `db_files[10]` for example)\n",
    "   and re-run the trajectory plot. Is the arena bigger / smaller? Why\n",
    "   might that be? (Hint: look at the resolution part of the filename.)\n",
    "2. Plot the distance trace for **ROI 4** instead of ROI 1.\n",
    "3. Compute the **percentage of frames** in ROI 1 that had only 1 fly visible.\n",
    "4. The product `area = w * h` is a useful diagnostic. Compute it for fly 1\n",
    "   and plot `area` vs `t`: when does the bounding box get unusually large?\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "execution_count": null,
   "outputs": [],
   "source": [
    "# Exercise space\n"
   ]
  }
 ]
}