Sprint Overview
Sprint tracking and deliverables. Updated after each sprint review.
goal
Set up the project infrastructure — repo, tooling, documentation, architecture decisions, and data structure analysis.
deliverables
GitHub repo structured and scaffolded
36 issues created with labels and milestones
GitHub Project board live
Notion ↔ GitHub sync (auto + manual)
Architecture diagram finalised
Data structure fully documented from real JSON files
Project website live on Vercel
Stack decided: Python + custom watcher loop, PostgreSQL, KNIME, Power BI
Architecture decisions recorded (ADR-001 to 004)
done
Create the repo
Add teammates as collaborators
Create labels and milestones
Create 36 issues
Link repo to Project board
Link GitHub to Notion
Verify everyone can connect to the VM
Verify MySQL DB (pidb) connection works
Verify access to sFTP / Meteo2 folder
Verify access to Raspberry Pi / sensor JSON endpoint
Look at actual JSON sensor files to understand the structure
pending / carry over
Install chosen tools (Python, KNIME, Power BI Desktop, etc.)
Set up Bronze folder structure on the VM
Gold layer storage engine decision
ETL orchestration tool decision
SAC export mechanism decision
Presence derivation logic threshold
Who owns which part
Definition of Done (DoD) and Definition of Ready (DoR)
goal
Build the full ingestion pipeline — Bronze raw files to Silver cleaned tables in PostgreSQL — for the two apartment sources and the MySQL master registry.
deliverables
Bronze raw storage (#3) — timestamped folder layout on the VM
MySQL → Silver static metadata import (#4)
Sensor JSON ingestion: Jimmy (#5) and Jérémie (#6)
Weather forecast → raw store (#7)
Silver clean storage (#13) + load flow with quality checks (#14)
Fast-flow scheduler retrieving the latest data each minute (#11a)
Watcher optimisation: pipeline cycle time cut from 5 min to 6 s
create_silver.py — auto DB creation + admin privileges
Documentation: SETUP.md, ETL.md, ARCHITECTURE.md
Architecture diagram updated
Star schema v2 submitted for teacher feedback
Dashboard mockups — tenant view + admin view
done
Bronze folder structure on VM
Watcher downloading JSON sensor files (Jimmy + Jérémie)
MySQL → Silver static-metadata flow
Silver tables created in PostgreSQL
Flattening + cleaning pipeline working
Deduplication logic implemented
Watcher performance fix: cycle time cut from 5 min to 6 s
create_silver.py with DB_ADMIN_URL support
Documentation written (SETUP, ETL, ARCHITECTURE)
Star schema v2 designed and sent for review
Bronze predictive ingestion: a ~5 ms .exists() check on the expected file replaces a full SMB directory scan
COPY-into-TEMP-TABLE upsert pattern adopted (50–150× faster than per-row INSERT)
silver.etl_watermark table — idempotent re-runs from day one
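The COPY-into-TEMP-TABLE upsert pattern listed above can be sketched as three SQL statements run in one transaction. This is a minimal illustration and not the project's actual code: the function name, the staging table name and the example table `silver.sensor_reading` are assumptions made for the sketch.

```python
from typing import List, Sequence


def build_copy_upsert_sql(
    target: str,
    columns: Sequence[str],
    key_columns: Sequence[str],
    staging: str = "tmp_stage",
) -> List[str]:
    """Return the SQL statements for a COPY-into-temp-table upsert.

    All table and column names are illustrative placeholders.
    """
    cols = ", ".join(columns)
    keys = ", ".join(key_columns)
    # Columns outside the key are overwritten when a row already exists.
    updates = ", ".join(
        f"{c} = EXCLUDED.{c}" for c in columns if c not in key_columns
    )
    conflict_action = f"DO UPDATE SET {updates}" if updates else "DO NOTHING"
    return [
        # 1. Staging table with the target's shape, dropped at commit.
        f"CREATE TEMP TABLE {staging} (LIKE {target} INCLUDING DEFAULTS) ON COMMIT DROP",
        # 2. Bulk-load the whole batch in a single round trip.
        f"COPY {staging} ({cols}) FROM STDIN WITH (FORMAT csv)",
        # 3. Merge into the target; the key needs a unique constraint.
        f"INSERT INTO {target} ({cols}) SELECT {cols} FROM {staging} "
        f"ON CONFLICT ({keys}) {conflict_action}",
    ]


# Hypothetical Silver table, used only to show the generated statements.
for statement in build_copy_upsert_sql(
    "silver.sensor_reading", ["sensor_id", "ts", "value"], ["sensor_id", "ts"]
):
    print(statement)
```

The speed-up over per-row INSERT comes from step 2: the batch crosses the network once instead of once per row. The batch should already be deduplicated on the key, since PostgreSQL rejects an ON CONFLICT DO UPDATE that touches the same target row twice in one statement.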
pending / carry over
Gold layer (blocked on star schema v2 validation)
Weather Bronze→Silver flow (Sacha)
Slow-flow scheduler (#11b)
ML model exploration (Johann)
Dashboard mockup finalisation
goal
Wire the first half of the KNIME data flow (Silver → KNIME import), fix watcher edge cases, and refresh tooling while Gold remains blocked on star schema v2 validation.
deliverables
KNIME data flow — import side (#25a)
create_silver.py — auto DB creation + admin privileges (DB_ADMIN_URL)
Watcher fix — date-based filename comparison (DD.MM sorting bug)
--scan flag added to watcher for full-rescan triggers
Dashboard mockups — tenant + admin views
Code review — Sacha's weather_download.py PR
Per-flow workflow docs on the website
done
#25a — Build a data flow to import (closed Mar 13)
create_silver.py with DB_ADMIN_URL support
Watcher DD.MM sort bug resolved
--scan flag added to watcher
Dashboard mockups delivered (tenant + admin)
Code review of Sacha's weather_download.py PR
Silver → Gold ETL workflow documented on the site
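The DD.MM sort bug resolved in this sprint is the kind of problem that appears when filenames carrying day-first dates are compared as strings. The sketch below shows the failure and a date-based comparison. The filename shape `<prefix>_DD.MM.YYYY.json` and both function names are assumptions; the real watcher's naming scheme is not shown in this log.

```python
import re
from datetime import date
from typing import Iterable, Optional

# Hypothetical filename shape: "<prefix>_DD.MM.YYYY.json".
_DATE_RE = re.compile(r"(\d{2})\.(\d{2})\.(\d{4})")


def file_date(name: str) -> Optional[date]:
    """Extract the DD.MM.YYYY date embedded in a filename, if any."""
    match = _DATE_RE.search(name)
    if match is None:
        return None
    day, month, year = (int(part) for part in match.groups())
    return date(year, month, day)


def newest_file(names: Iterable[str]) -> Optional[str]:
    """Pick the latest file by calendar date, ignoring names without a date."""
    dated = [(file_date(n), n) for n in names]
    dated = [(d, n) for d, n in dated if d is not None]
    return max(dated)[1] if dated else None


files = ["sensors_31.01.2026.json", "sensors_01.02.2026.json"]
print(max(files))          # string comparison picks the January file
print(newest_file(files))  # date comparison picks the February file
```

Day-first names sort correctly as strings only within a single month, which is why a bug like this tends to surface at a month boundary.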
pending / carry over
BLOCKER: Gold layer — star schema v2 sent for review, awaiting 2nd round of feedback
weather_download.py — final fixes after PR review (Sacha)
Presence prediction model — KNIME workflow in progress (Johann)
goal
Unblock Gold (proceed without final external feedback), iterate on clean_weather, and kick off the first ML model-selection workflow on KNIME. An energy pricing decision is required for the cost dimension.
deliverables
Gold layer implementation — first cut (Dehlya)
Energy pricing decision: Oiken tariffs (both apartments on Oiken's network)
clean_weather.py implementation (Sacha)
Code review for Sacha's Bronze→Silver weather work (Dehlya)
Presence model selection workflow on KNIME (#26a)
Scrum management updates (Johann)
done
#26a — Build a workflow to select the best presence model (closed Mar 26)
Gold layer scaffolding in place
Energy pricing — Oiken tariffs adopted
clean_weather.py — first implementation
Bronze→Silver weather code review
ML — model selection workflow for presence prediction
Star schema v2 finalised after teacher feedback
dim_tariff design with provider × year grain (Oiken 2023–2025 @ 0.34 CHF/kWh)
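The provider × year grain of dim_tariff means a consumption figure is priced by looking up one row per provider and calendar year. A minimal sketch of that lookup follows, using the 0.34 CHF/kWh Oiken rate recorded above. The dictionary layout and the function name are assumptions for illustration; the real dimension is a Gold table.

```python
from decimal import Decimal
from typing import Dict, Tuple

# One entry per (provider, year), mirroring the provider x year grain.
# The 0.34 CHF/kWh figure for Oiken 2023-2025 is the one recorded in this log.
DIM_TARIFF: Dict[Tuple[str, int], Decimal] = {
    ("Oiken", year): Decimal("0.34") for year in (2023, 2024, 2025)
}


def energy_cost_chf(kwh: Decimal, provider: str, year: int) -> Decimal:
    """Price a consumption figure with the tariff for that provider and year."""
    try:
        rate = DIM_TARIFF[(provider, year)]
    except KeyError:
        raise LookupError(f"no tariff for {provider} in {year}") from None
    return kwh * rate


print(energy_cost_chf(Decimal("12.5"), "Oiken", 2024))  # 4.250
```

Decimal is used instead of float so that monetary values do not pick up binary rounding noise. A missing (provider, year) pair raises an error instead of silently pricing at zero.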
pending / carry over
Gold fact tables — implementation continuing
weather_download.py — date-handling issues
Energy consumption prediction model — to start
goal
Push Gold facts forward, fix the weather pipeline's date/path handling, advance both ML models, and decide on a prediction storage strategy.
deliverables
Gold fact tables — implementation in progress (Dehlya)
clean_weather.py — updated after code review (Sacha)
Decision on prediction storage strategy (history vs overwrite)
Presence + energy prediction models — continued work (Johann)
done
Code reviews exchanged across the team
Gold fact tables progressing
clean_weather.py — second pass after review feedback
Decision: predictions kept as history (no overwrite on re-run)
sFTP folder-selection bug identified — fix scheduled for next sprint
Weather pipeline date-format issue scoped
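The decision to keep predictions as history means each model run appends rows rather than replacing earlier ones, and consumers pick the latest run when they need a single value. The sketch below illustrates that with an in-memory SQLite database standing in for the warehouse. The table name, column names and the apartment label "A" are all hypothetical.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse
conn.execute(
    """
    CREATE TABLE prediction_history (
        apartment TEXT NOT NULL,
        target_ts TEXT NOT NULL,   -- the moment being predicted
        run_ts    TEXT NOT NULL,   -- when the model produced the value
        predicted REAL NOT NULL,
        PRIMARY KEY (apartment, target_ts, run_ts)
    )
    """
)


def store_run(run_ts: str, rows: List[Tuple[str, str, float]]) -> None:
    """Append one model run; earlier runs for the same target are kept."""
    conn.executemany(
        "INSERT INTO prediction_history VALUES (?, ?, ?, ?)",
        [(apt, target_ts, run_ts, value) for apt, target_ts, value in rows],
    )


def latest_predictions() -> List[Tuple[str, str, float]]:
    """Return the most recent prediction per (apartment, target_ts)."""
    return conn.execute(
        """
        SELECT p.apartment, p.target_ts, p.predicted
        FROM prediction_history AS p
        WHERE p.run_ts = (
            SELECT MAX(q.run_ts) FROM prediction_history AS q
            WHERE q.apartment = p.apartment AND q.target_ts = p.target_ts
        )
        ORDER BY p.apartment, p.target_ts
        """
    ).fetchall()


# Two runs predict the same moment; both rows survive, the later one wins.
store_run("2025-01-01T06:00", [("A", "2025-01-01T12:00", 1.8)])
store_run("2025-01-01T09:00", [("A", "2025-01-01T12:00", 2.1)])
print(latest_predictions())
```

Including the run timestamp in the key is what makes re-runs additive. It also keeps older predictions available for comparing a model's forecasts against what was later observed.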
pending / carry over
sFTP folder selection — date/path bug to resolve
Date formatting in the weather pipeline
ML presence + energy models — finalise
Gold layer completion — fact tables, materialised view
goal
Land Gold end-to-end, complete both weather pipelines, and scaffold both KNIME ML workflows. This was the biggest single sprint, with twelve issues closed.
deliverables
Gold OLAP modelisation (#15)
Gold database created (#16)
Silver → Gold ETL flow (#17)
Weather Sources → Bronze (#7a)
Weather Bronze → Silver (#7b)
Weather raw-store flow (#7) — full chain wired
Slow-flow scheduler — daily weather + nightly catch-up (#11b)
Presence model — workflow with selected model (#26a follow-up)
Energy model — best-model-selection workflow (#27a)
Energy model — workflow with selected model (#27b)
KNIME data flow — export side (#25b)
Mockups delivered (#12)
done
#7 — Weather raw-store flow
#7a — Weather Sources → Bronze
#7b — Weather Bronze → Silver
#11b — Slow-flow scheduler
#12 — Mockups
#15 — OLAP modelisation
#16 — Gold database
#17 — Silver → Gold ETL flow
#25b — Data flow to export (KNIME)
#26a — Workflow with selected presence model
#27a — Best-model-selection workflow (energy)
#27b — Workflow with selected energy model
Gold tables: 7 dims + 5 facts + mv_energy_with_cost
9-step populate process (populate_dimensions, populate_sensors, populate_weather)
KNIME Variable → Credentials pattern adopted for runtime credential injection
silver.weather_forecasts with weather_watermark for idempotent re-runs
scripts/run_knime_predictions.py — batch-mode invocation with stdout/stderr capture
sFTP folder-selection bug fixed alongside the weather pipeline
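The watermark approach behind the idempotent re-runs noted above (weather_watermark, and silver.etl_watermark earlier) can be sketched as follows: remember the newest timestamp already loaded per flow, and skip anything at or before it. An in-memory SQLite database stands in for PostgreSQL here, and the table layouts and the function name are assumptions made for the sketch.

```python
import sqlite3
from typing import Iterable, Tuple

conn = sqlite3.connect(":memory:")  # stand-in for PostgreSQL
conn.executescript(
    """
    CREATE TABLE readings (ts TEXT PRIMARY KEY, value REAL);
    CREATE TABLE etl_watermark (flow TEXT PRIMARY KEY, last_ts TEXT NOT NULL);
    """
)


def load_batch(flow: str, batch: Iterable[Tuple[str, float]]) -> int:
    """Load only rows newer than the flow's watermark, then advance it."""
    row = conn.execute(
        "SELECT last_ts FROM etl_watermark WHERE flow = ?", (flow,)
    ).fetchone()
    watermark = row[0] if row else ""
    # ISO-8601 timestamps compare correctly as plain strings.
    fresh = sorted(r for r in batch if r[0] > watermark)
    if not fresh:
        return 0
    with conn:  # rows and watermark commit together, or not at all
        conn.executemany("INSERT INTO readings VALUES (?, ?)", fresh)
        conn.execute(
            "INSERT OR REPLACE INTO etl_watermark (flow, last_ts) VALUES (?, ?)",
            (flow, fresh[-1][0]),
        )
    return len(fresh)


batch = [("2025-01-01T00:00", 20.5), ("2025-01-01T00:01", 20.6)]
print(load_batch("weather", batch))  # 2
print(load_batch("weather", batch))  # 0, the re-run is a no-op
```

Writing the rows and the new watermark in the same transaction is the important part. If the load fails halfway, the watermark does not move and the next run retries the same batch.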
pending / carry over
Predictions back to Gold (next sprint — #28)
Power BI dashboards (next sprint — #19, #20, #29)
Row-level security (next sprint — #24)
goal
Persist KNIME predictions back to Gold, build the three Power BI dashboards, and implement row-level security per apartment.
deliverables
Predictions written back to gold.fact_prediction_motion / fact_prediction_consumption (#28)
Power BI energy consumption dashboard (#19)
Power BI environment dashboard — temperature, humidity, CO₂, door/window status (#20)
Power BI prediction visualisation dashboard (#29)
Power BI row-level security — apartment-scoped views (#24)
KNIME data export/import flow — both directions complete (#25)
Presence model running headless via run_knime_predictions.py (#26)
Energy/consumption model running headless (#27)
done
#19 — Power BI energy dashboard
#20 — Power BI environment dashboard
#24 — Row-level security per apartment
#25 — KNIME data flow export/import
#26 — Presence prediction model in production
#27 — Energy consumption prediction model in production
#28 — Predictions loaded back to data warehouse
#29 — Power BI prediction dashboard
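Running the models headless via run_knime_predictions.py amounts to launching KNIME in batch mode as a subprocess and keeping its output. The sketch below is not that script. The function names and paths are hypothetical, and the command-line flags are the commonly documented KNIME batch options, assumed here and not taken from the project. The demonstration at the end uses the Python interpreter as a stand-in so the block runs without KNIME installed.

```python
import subprocess
import sys
from typing import List, Sequence


def knime_batch_command(knime_exe: str, workflow_dir: str) -> List[str]:
    """Build a batch-mode command line (flags are an assumption, see above)."""
    return [
        knime_exe,
        "-nosplash",
        "-consoleLog",
        "-application", "org.knime.product.KNIME_BATCH_APPLICATION",
        f"-workflowDir={workflow_dir}",
        "-reset",
        "-nosave",
    ]


def run_captured(
    cmd: Sequence[str], timeout_s: float = 3600
) -> subprocess.CompletedProcess:
    """Run a command, capturing stdout and stderr as text for the logs."""
    result = subprocess.run(
        list(cmd), capture_output=True, text=True, timeout=timeout_s
    )
    if result.returncode != 0:
        # Surface the captured stderr so a failed workflow is diagnosable.
        print(result.stderr, file=sys.stderr)
    return result


# Demonstration with the Python interpreter standing in for KNIME.
demo = run_captured([sys.executable, "-c", "print('workflow finished')"])
print(demo.returncode, demo.stdout.strip())
```

Capturing both streams lets the scheduler log a failing workflow's messages, and the timeout keeps a hung KNIME process from blocking the other flows.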
pending / carry over
Anonymisation / masking (next sprint — #18)
GDPR writeup (next sprint — #32)
Scalability forecast (next sprint — #31)
Customer deployment package (next sprint — #33)
goal
Wrap up customer-facing deliverables: anonymisation, GDPR/ethics writeup, scalability forecast, and the self-contained installer for the customer environment.
deliverables
Data masking / anonymisation at the silver → gold boundary (#18)
GDPR & ethics written assessment (#32)
Scalability forecast — storage + compute projections (#31)
Full data-flow scheduling — all flows in one watcher process (#11)
Self-contained customer deployment package — Python installer (#33)
Compress-after-silver: 10–15× Bronze shrink while preserving the audit trail
Postgres tuning (shared_buffers = 4 GB) + COPY-into-TEMP-TABLE upsert
Drop-constraint backfill script: backfill time cut from ~4 h on first install to ~6 min on re-run
Power BI filters across the dashboards (Sacha)
User guide written (Sacha)
Scrum management, project coordination, documentation review, slide support (Johann)
done
#11 — Full scheduling end-to-end
#18 — Anonymisation: building_name → 'Building <id>', owner_user_id stripped, first names kept as RLS pseudonyms (ADR-005)
#31 — Scalability forecast delivered
#32 — GDPR / ethics assessment delivered
#33 — Customer installer tested end-to-end on the VM
Compress-after-silver shipped (with .json.gz read-back support)
Postgres tuning + COPY upsert path
fast_silver_backfill.py drop-constraint script + pre-flight duplicate check
Sacha — Power BI filters across all dashboards
Sacha — user guide written
Johann — scrum management, project coordination
Johann — documentation contributions and review
Johann — presentation slides structure + content support
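The anonymisation rules recorded under #18 can be sketched as a small row transform applied at the Silver → Gold boundary. This is an illustration only: the function name, the `building_id` and `first_name` field names and the sample record are assumptions. Only `building_name` and `owner_user_id` are named in this log.

```python
from typing import Any, Dict


def anonymise_building(row: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the masking rules listed under #18 to one building record.

    Field names other than building_name and owner_user_id are assumptions.
    """
    masked = dict(row)  # never mutate the Silver record itself
    # building_name becomes a neutral label derived from the id.
    masked["building_name"] = f"Building {row['building_id']}"
    # owner_user_id is stripped entirely, not hashed.
    masked.pop("owner_user_id", None)
    # first_name is left as-is: it serves as the RLS pseudonym.
    return masked


# Hypothetical Silver record, for illustration only.
silver_row = {
    "building_id": 7,
    "building_name": "Example Residence",
    "owner_user_id": 1234,
    "first_name": "Jimmy",
}
print(anonymise_building(silver_row))
```

Stripping the owner id instead of hashing it means there is nothing in Gold that could be joined back to the source registry.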
pending / carry over
Defense prep (next sprint — #36)
User guide polish (next sprint — #34)
Technical documentation polish (next sprint — #35)
Decision: SAP SAC track (#21, #22) abandoned — Power BI focus for the defense
Decision: storage encryption (#9), monitoring/alerts (#10), full IAM (#23), i18n (#30) deferred to future work
goal
Polish the customer-facing deliverables (technical doc, installation guide, user guide), finalise the slide deck, and rehearse the defense.
deliverables
Technical documentation polish (#35) — Word + Markdown, 19 chapters
Installation guide polish (#34) — embedded screenshots
User guide polish — review pass against the actual UI
Defense slide deck (#36) — Bronze→Gold deep-dive owned by Dehlya (slides 7–13), with speaker notes
Personal cheat sheet / study packet for Q&A (19 sections, ~600 lines)
Final installer end-to-end test on the VM
Last-minute fixes from the test run (e.g. .json.gz read-back, KNIME version pin)
done
Documentation regenerated to .docx with screenshots
Slide deck polished (slides 7–13) with technical speaker notes
AI declaration paragraph added to all customer-facing docs
KNIME workflows pinned to VM's KNIME 5.8 version
Bronze .json.gz read-back path fixed (compressed files are now readable, so the audit trail holds in practice)
/scrum/devops page removed (production deployment was abandoned)
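The .json.gz read-back path mentioned above only preserves the audit trail if a reader can open a Bronze file whether or not it has been compressed yet. A minimal sketch of that pairing follows; the function names are hypothetical and the project's own helpers may differ.

```python
import gzip
import json
from pathlib import Path
from typing import Any


def compress_bronze(path: Path) -> Path:
    """Replace a raw .json file with a .json.gz copy of the same bytes."""
    gz_path = path.with_name(path.name + ".gz")
    with gzip.open(gz_path, "wb") as dst:
        dst.write(path.read_bytes())
    path.unlink()  # only after the compressed copy is fully written
    return gz_path


def read_bronze(path: Path) -> Any:
    """Load a Bronze payload whether or not it has been compressed yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    gz_path = path.with_name(path.name + ".gz")
    with gzip.open(gz_path, "rt", encoding="utf-8") as src:
        return json.load(src)
```

Callers keep passing the original `.json` path, and `read_bronze` falls back to the `.gz` sibling when the raw file is gone. Downstream code therefore does not need to know whether compression has already run.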
pending / carry over
Final rehearsal of the defense
Last polish pass on installer prompt copy ("5-course tasting menu ☕")
Light test coverage (#37) — idempotency tested by re-run rather than pytest
Sprints are bounded by the weekly Friday review meetings; see the meeting log for each meeting's agenda and decisions. Sprints 4 and 6 ran two weeks each; the other sprints were one-week cycles.