Sprint Overview
Sprint tracking and deliverables. Updated after each sprint review.
Sprint 1
goal
Set up the project infrastructure — repo, tooling, documentation, architecture decisions, and data structure analysis.
deliverables
GitHub repo structured and scaffolded
36 issues created with labels and milestones
GitHub Project board live
Notion ↔ GitHub sync (auto + manual)
Architecture diagram finalized
Data structure fully documented from real JSON files
Project website live on Vercel
Stack decided: Python + asyncio, Airflow, PostgreSQL
Architecture decisions recorded (ADR-001 to ADR-004)
done
Create the repo
Add teammates as collaborators
Create labels and milestones
Create 36 issues
Link repo to Project board
Link GitHub to Notion
Verify everyone can connect to the VM
Verify MySQL DB (pidb) connection works
Verify access to SFTP / Meteo2 folder
Verify access to Raspberry Pi / sensor JSON endpoint
Look at actual JSON sensor files to understand the structure
pending / carry over
Install chosen tools (Python, KNIME, Power BI Desktop, etc.)
Set up Bronze folder structure on the VM
Gold layer storage engine decision
ETL orchestration tool decision
SAC export mechanism decision
Presence derivation logic threshold
Who owns which part
Definition of Done (DoD) and Definition of Ready (DoR)
Sprint 2
goal
Build and deploy the full ingestion pipeline — Bronze raw files to Silver cleaned tables in PostgreSQL. 15M+ rows ingested, production-ready.
deliverables
Bronze ingestion pipeline (watcher + async downloader)
Silver ETL pipeline (flatten, clean, deduplicate)
15M+ rows ingested end-to-end
Watcher optimization: pipeline cycle time reduced from 5 min to 6 s
Production deployment — pg_dump / pg_restore to domotic_prod
create_silver.py — auto DB creation + admin privileges
Documentation: SETUP.md, ETL.md, ARCHITECTURE.md
Architecture diagram updated
Star schema v2 submitted for teacher feedback
Dashboard mockups — tenant view + admin view
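The "flatten, clean, deduplicate" step of the Silver ETL listed above could look roughly like the sketch below. This is an illustration only, not the project's actual code: the payload layout, the field names (`sensor_id`, `ts`, `data`), and the choice of `(sensor_id, ts)` as the deduplication key are all assumptions.

```python
from __future__ import annotations

from typing import Any, Iterable


def flatten(record: dict[str, Any], parent: str = "", sep: str = "_") -> dict[str, Any]:
    """Collapse nested dicts into one level, joining key names with `sep`."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def deduplicate(
    rows: Iterable[dict[str, Any]], key_fields: tuple[str, ...]
) -> list[dict[str, Any]]:
    """Keep the first row seen for each combination of `key_fields`."""
    seen: set[tuple[Any, ...]] = set()
    unique: list[dict[str, Any]] = []
    for row in rows:
        key = tuple(row.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Hypothetical sensor payloads (assumed shape, not the real JSON files).
raw = [
    {"sensor_id": "s1", "ts": "2024-01-01T00:00:00", "data": {"temp": 21.5}},
    {"sensor_id": "s1", "ts": "2024-01-01T00:00:00", "data": {"temp": 21.5}},
    {"sensor_id": "s2", "ts": "2024-01-01T00:00:00", "data": {"temp": 19.0}},
]
silver_rows = deduplicate((flatten(r) for r in raw), ("sensor_id", "ts"))
```

With 15M+ rows, deduplication would more likely be enforced in PostgreSQL itself (for example through a unique constraint); the in-memory set here only shows the idea.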
done
Bronze folder structure on VM
Async watcher downloading JSON sensor files
Silver tables created in PostgreSQL
Flattening + cleaning pipeline working
Deduplication logic implemented
15M+ rows loaded into Silver
Watcher performance fix: cycle time reduced from 5 min to 6 s
create_silver.py with DB_ADMIN_URL support
pg_dump / pg_restore prod deployment
Documentation written (SETUP, ETL, ARCHITECTURE)
Star schema v2 designed and sent to Cosette
pending / carry over
Gold layer (blocked on star schema v2 validation)
Weather CSV integration (Sacha)
ML model exploration (Johann)
Dashboard mockup finalization
Sprint 3
goal
Build the Gold star schema ETL (Silver → Gold), integrate weather data, start ML presence prediction, and finalize dashboard mockups.
deliverables
Silver → Gold ETL scripts (create_gold.py, populate_dims.py, fact_*.py)
run_gold.py orchestrator
Watermark-based incremental loading
Weather CSV download + Silver integration (Sacha)
ML presence prediction model in KNIME (Johann)
Dashboard mockups refined (Johann)
Watcher fix — DD.MM sorting bug + --scan flag (Dehlya)
Code review — weather_download.py PR (Dehlya)
Site documentation update — per-flow workflow docs
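For reference, the watermark-based incremental loading listed above can be sketched as follows. This is a minimal illustration under stated assumptions: `sqlite3` stands in for PostgreSQL so the example runs on its own, and the table and column names (`silver_readings`, `gold_fact_readings`, `etl_watermark`, `last_id`) are hypothetical, not taken from the project's scripts.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; all table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE silver_readings (id INTEGER PRIMARY KEY, sensor_id TEXT, value REAL);
    CREATE TABLE gold_fact_readings (silver_id INTEGER PRIMARY KEY, sensor_id TEXT, value REAL);
    CREATE TABLE etl_watermark (table_name TEXT PRIMARY KEY, last_id INTEGER NOT NULL);
    INSERT INTO etl_watermark VALUES ('gold_fact_readings', 0);
""")


def load_increment(conn: sqlite3.Connection) -> int:
    """Copy Silver rows above the stored watermark into Gold, then advance it."""
    (last_id,) = conn.execute(
        "SELECT last_id FROM etl_watermark WHERE table_name = 'gold_fact_readings'"
    ).fetchone()
    rows = conn.execute(
        "SELECT id, sensor_id, value FROM silver_readings WHERE id > ? ORDER BY id",
        (last_id,),
    ).fetchall()
    if not rows:
        return 0
    with conn:  # commits the new rows and the new watermark together
        conn.executemany("INSERT INTO gold_fact_readings VALUES (?, ?, ?)", rows)
        conn.execute(
            "UPDATE etl_watermark SET last_id = ? WHERE table_name = 'gold_fact_readings'",
            (rows[-1][0],),
        )
    return len(rows)


conn.executemany(
    "INSERT INTO silver_readings (sensor_id, value) VALUES (?, ?)",
    [("s1", 21.5), ("s2", 19.0)],
)
first = load_increment(conn)   # picks up both initial rows
second = load_increment(conn)  # nothing new since the watermark
conn.execute("INSERT INTO silver_readings (sensor_id, value) VALUES ('s1', 22.0)")
third = load_increment(conn)   # picks up only the newly added row
```

Advancing the watermark in the same transaction as the insert means a failed run leaves both untouched, so the next run retries the same range.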
done
create_silver.py — auto DB creation + admin privileges
Watcher fix — date-based comparison bug resolved
--scan flag added to watcher
Dashboard mockups — tenant + admin views
Code review of Sacha's weather_download.py PR
Silver → Gold ETL workflow documented
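A note on the DD.MM sorting bug fixed in the watcher: the sketch below shows the general failure mode, not the actual watcher code. It assumes folder names of the form `DD.MM` (the example names are hypothetical) and that the year is known from context, since `DD.MM` does not carry one.

```python
from datetime import date


def parse_dd_mm(name: str, year: int) -> date:
    """Turn a 'DD.MM' name into a date; the year must come from elsewhere."""
    day, month = name.split(".")
    return date(year, int(month), int(day))


# Hypothetical folder names in DD.MM format.
folders = ["31.01", "01.02", "15.01"]

# String comparison orders by day first, so 1 February sorts before 15 January.
lexicographic = sorted(folders)

# Comparing parsed dates gives the real chronological order.
chronological = sorted(folders, key=lambda name: parse_dd_mm(name, 2024))
```

Here `lexicographic` is `['01.02', '15.01', '31.01']`, while `chronological` is `['15.01', '31.01', '01.02']`. A watcher that picks "the latest folder" by string order would select the wrong one whenever the month changes.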
pending / carry over
BLOCKER: Gold layer blocked. Star schema v2 was sent to Cosette; no second round of feedback yet
weather_download.py — final fixes after PR review (Sacha)
ML presence model — KNIME workflow in progress (Johann)
Sprint 4 planning
Sprints 4–6 will appear here as they are planned.