cupido/docs/experimental_design.md
Giorgio e7e4db264d Initial commit: organized project structure for student handoff
2026-03-05 16:08:36 +00:00


# Experimental Design: Barrier-Opening Social Interaction Assay
## Overview
This document describes the design of a *Drosophila* behavioral tracking experiment conducted as part of the Cupido project. The experiment uses a barrier-opening assay to measure social interaction patterns between trained and untrained flies.
**Date**: July 15, 2025
**Species**: *Drosophila melanogaster*, Canton-S (CS) wild-type strain
## Assay Description
The barrier-opening assay places two flies in each Region of Interest (ROI), separated by a physical barrier. The barrier is opened manually after a delay from the start of recording that differs between sessions (20 to 75 s in the sessions listed under Machines and Recording Sessions), allowing the flies to interact socially. The primary behavioral metric is the **distance between the two flies over time** following barrier opening, which serves as a proxy for social engagement.
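As a rough illustration of the primary metric, the sketch below computes the frame-by-frame Euclidean distance between two flies from paired pixel coordinates. It assumes the two trajectories have already been separated and aligned so that each index refers to the same frame; how the tracking output distinguishes the two flies within an ROI is not specified in this document. The function name `pairwise_distance` is hypothetical and the coordinates are made up.

```python
import math


def pairwise_distance(fly_a, fly_b):
    """Return Euclidean distances (in pixels) between two aligned trajectories.

    fly_a, fly_b: sequences of (x, y) pixel positions, one entry per frame.
    Assumption: both sequences cover the same frames in the same order.
    """
    if len(fly_a) != len(fly_b):
        raise ValueError("trajectories must have the same number of frames")
    return [
        math.hypot(xa - xb, ya - yb)
        for (xa, ya), (xb, yb) in zip(fly_a, fly_b)
    ]


# Made-up coordinates purely for illustration (not experimental data).
example = pairwise_distance([(0.0, 0.0), (3.0, 0.0)], [(3.0, 4.0), (3.0, 4.0)])
```

Distances are in pixels; converting them to physical units would need an arena calibration that is not given here.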
## Experimental Groups
- **Trained**: Flies that received prior social experience before the assay.
- **Untrained**: Socially naive flies with no prior social experience.
> **Note**: The exact training protocol (duration, conditions, group sizes) is not described here and should be documented separately.
## Equipment and Recording Parameters
| Parameter | Value |
|-----------|-------|
| Resolution | 1920 x 1088 pixels |
| Frame rate | 25 fps |
| Video codec | H.264 |
| Quality | 28q |
| ROIs per session | 6 (each containing a pair of flies) |
| Tracking output | SQLite databases (one per session) |
## Machines and Recording Sessions
Three ethoscope machines (076, 145, and 268) produced tracking data. A fourth, ETHOSCOPE_139, appears in the metadata but has no tracking data.
| Machine | Session Start Time | Barrier Opening (s) | Status |
|---------|--------------------|----------------------|--------|
| ETHOSCOPE_076 | 16:03:10 | 52 | OK |
| ETHOSCOPE_076 | 16:31:34 | 25 | OK |
| ETHOSCOPE_145 | 16:03:27 | 42 | OK |
| ETHOSCOPE_145 | 16:31:41 | 20 | OK |
| ETHOSCOPE_268 | 16:32:05 | 75 | OK |
| ETHOSCOPE_139 | 16:31:52 | Not recorded | **DATA MISSING** |
**Total sessions**: 6 (5 with tracking data, 1 missing)
## ROI-to-Group Mapping
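Each session contains 6 ROIs. The assignment of trained and untrained groups to ROIs varies across sessions: the groups alternate in the first 076 session, ROIs 1 to 3 are trained and ROIs 4 to 6 untrained in most other sessions, and this order is reversed on machine 268.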
| Machine | Session | ROI 1 | ROI 2 | ROI 3 | ROI 4 | ROI 5 | ROI 6 |
|---------|---------|-------|-------|-------|-------|-------|-------|
| 076 | 16:03:10 | Trained | Untrained | Trained | Untrained | Trained | Untrained |
| 076 | 16:31:34 | Trained | Trained | Trained | Untrained | Untrained | Untrained |
| 145 | 16:03:27 | Trained | Trained | Trained | Untrained | Untrained | Untrained |
| 145 | 16:31:41 | Trained | Trained | Trained | Untrained | Untrained | Untrained |
| 268 | 16:32:05 | Untrained | Untrained | Untrained | Trained | Trained | Trained |
| 139 | 16:31:52 | Trained | Trained | Trained | Untrained | Untrained | Untrained |
## Sample Sizes
| Group | ROIs (total) | ROIs (with data) | ROIs (missing) |
|-------|-------------|-------------------|----------------|
| Trained | 18 | 15 | 3 (Machine 139) |
| Untrained | 18 | 15 | 3 (Machine 139) |
| **Total** | **36** | **30** | **6** |
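The counts above can be cross-checked against the ROI-to-group mapping. The sketch below transcribes that mapping by hand (`T` for Trained, `U` for Untrained) and tallies ROIs per group, with and without the missing session. The names `ROI_GROUPS`, `MISSING`, and `count_rois` are hypothetical and not part of the project's scripts.

```python
from collections import Counter

# ROI-to-group mapping transcribed from the table in this document,
# ROI 1 to ROI 6 from left to right (T = Trained, U = Untrained).
ROI_GROUPS = {
    ("076", "16:03:10"): "TUTUTU",
    ("076", "16:31:34"): "TTTUUU",
    ("145", "16:03:27"): "TTTUUU",
    ("145", "16:31:41"): "TTTUUU",
    ("268", "16:32:05"): "UUUTTT",
    ("139", "16:31:52"): "TTTUUU",
}

# Sessions listed in the metadata but lacking a tracking database.
MISSING = {("139", "16:31:52")}


def count_rois(mapping, missing):
    """Count ROIs per group over all sessions and over sessions with data."""
    total, with_data = Counter(), Counter()
    for session, groups in mapping.items():
        total.update(groups)  # counts each character (one per ROI)
        if session not in missing:
            with_data.update(groups)
    return total, with_data


total, with_data = count_rois(ROI_GROUPS, MISSING)
```

With this mapping, `total` holds 18 ROIs per group and `with_data` holds 15 per group, matching the table.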
## Tracking Database Schema
Each recording session produces a SQLite database file containing tables `ROI_1` through `ROI_6`. Each table has the following columns:
| Column | Type | Description |
|--------|------|-------------|
| `id` | INTEGER | Row identifier |
| `t` | INTEGER | Timestamp in **milliseconds** from start of recording |
| `x` | REAL | Horizontal position in pixels |
| `y` | REAL | Vertical position in pixels |
| `w` | REAL | Width of detected object (pixels) |
| `h` | REAL | Height of detected object (pixels) |
| `phi` | REAL | Angle/orientation of detected object |
| `is_inferred` | INTEGER | Whether the position was inferred (not directly detected) |
| `has_interacted` | INTEGER | Whether an interaction was detected |
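A minimal sketch of reading one ROI table with Python's standard `sqlite3` module, using only the table and column names listed above. The helper name `read_roi` is hypothetical, and treating `is_inferred = 0` as "directly detected" is an assumption about how the flag is encoded. The demo builds a small in-memory database with made-up rows rather than opening a real tracking file.

```python
import sqlite3


def read_roi(conn, roi, skip_inferred=False):
    """Return (t, x, y) rows from table ROI_<roi>, ordered by timestamp.

    conn: an open sqlite3 connection to a tracking database.
    roi: integer from 1 to 6, validated before being placed in the query.
    skip_inferred: if True, keep only rows with is_inferred = 0
                   (assumed to mean directly detected positions).
    """
    if not isinstance(roi, int) or not 1 <= roi <= 6:
        raise ValueError("roi must be an integer between 1 and 6")
    query = f"SELECT t, x, y FROM ROI_{roi}"
    if skip_inferred:
        query += " WHERE is_inferred = 0"
    query += " ORDER BY t"
    return conn.execute(query).fetchall()


# Demo with an in-memory database and made-up rows (not experimental data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ROI_1 (id INTEGER, t INTEGER, x REAL, y REAL, w REAL, "
    "h REAL, phi REAL, is_inferred INTEGER, has_interacted INTEGER)"
)
conn.executemany(
    "INSERT INTO ROI_1 VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 40, 10.0, 5.0, 4.0, 2.0, 0.0, 0, 0),
        (2, 0, 9.0, 5.0, 4.0, 2.0, 0.0, 1, 0),
    ],
)
rows = read_roi(conn, 1)
detected = read_roi(conn, 1, skip_inferred=True)
```

To read a real session, pass a connection opened on the corresponding `*_tracking.db` file instead of the in-memory one.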
## Known Issues and Data Caveats
1. **Machine 139 missing data**: The metadata CSV contains entries for ETHOSCOPE_139 (session 16:31:52), but no corresponding tracking database file is present and no barrier opening time was recorded. This accounts for 6 missing ROIs (3 trained, 3 untrained). The cause has not yet been investigated.
2. **Time unit mismatch between files**: The tracking databases store time (`t`) in **milliseconds**, while `2025_07_15_barrier_opening.csv` stores barrier opening times in **seconds**. The analysis pipeline converts barrier opening times to milliseconds for alignment.
3. **Machine name type inconsistency**: The metadata CSV stores machine identifiers as integers (e.g., `76`, `145`, `268`), and `2025_07_15_barrier_opening.csv` also stores them as integers, so an identifier written as `076` is read as `76`. Identifiers must therefore be converted to zero-padded three-digit strings when matching between files and when constructing tracking database filenames (e.g., `ETHOSCOPE_076`).
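Issues 2 and 3 both come down to small conversions. The sketch below shows one way to express them; the function names are hypothetical and this is not the pipeline's actual code, which is not shown in this document.

```python
def barrier_time_ms(opening_s):
    """Convert a barrier opening time from seconds to milliseconds."""
    return int(round(float(opening_s) * 1000))


def time_since_opening_ms(t_ms, opening_s):
    """Shift a tracking timestamp (ms) so that 0 is the barrier opening."""
    return t_ms - barrier_time_ms(opening_s)


def machine_name(machine_id):
    """Build a zero-padded machine name from an integer or numeric string."""
    return f"ETHOSCOPE_{int(machine_id):03d}"


# Example using the first ETHOSCOPE_076 session (barrier opened at 52 s).
opening_ms = barrier_time_ms(52)
aligned = time_since_opening_ms(60000, 52)
name = machine_name(76)
```

Negative aligned times mark samples recorded before the barrier was opened, which makes it straightforward to select only the post-opening period.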
## Source Files
| File | Description |
|------|-------------|
| `2025_07_15_metadata_fixed.csv` | ROI-to-group mapping (trained/untrained) |
| `2025_07_15_barrier_opening.csv` | Barrier opening times per machine/session |
| `*_tracking.db` | SQLite tracking databases (one per session) |