docs / architecture

Architecture

End-to-end pipeline design for UC2. View decisions →

UC2 - Apartments Domotic

End-to-end architecture - Data: Aug 18 - Oct 27, 2023 (simulated real-time feed, ~1 file/min/apt)
Fast flowevery 1 min
Slow flowdaily ~7am
SourcesExternal
MySQL DB (pidb)
Buildings - Rooms - Sensors
Devices - DIErrors - BuildingType
10.130.25.152:3306 - static
JSON Sensors x2
Jimmy & Jeremie apartments
Plugs - doors - motions - meteos
humidity - consumption
SMB share (Z:\) - every 1 min
Weather CSV
sFTP folder Meteo2
Temp - humidity - rain
solar radiation
PRED_* fields - external
watcher.py (loop 60s) \u2192 bulk_to_bronze.py (16 threads)
weather_download.py - daily cron
BronzeRaw storagedone
Raw JSON - Jimmy
Immutable copy on disk
/bronze/jimmy/YYYY/MM/DD/HH/
~115k files
Raw JSON - Jeremie
Immutable copy on disk
/bronze/jeremie/YYYY/MM/DD/HH/
~130k files
Raw Weather CSVs
Daily forecast files
/bronze/weather/YYYY/MM/DD/
flatten_sensors.py - clean - join - deduplicate - flag outliers - watermark resume
SilverClean datadone
sensor_events
Flattened rows - room + type
value + timestamp - outliers flagged
not removed - is_outlier column
PostgreSQL - silver schema
15M+ rows - both apartments
dim_* tables (10)
buildings - rooms - sensors
devices - building_types
energy_profiles - parameters
import_mysql_to_silver.py
from MySQL pidb - GDPR: users skipped
weather_clean
Standardized forecasts
temp - humidity - rain - radiation
clean_weather.py (Sacha)
etl_watermark
Tracks processed files
resume-capable pipeline
245k+ entries
Silver \u2192 Gold ETL - hourly & daily aggregation - OLAP star schema - KPI computation
GoldOLAP / DWHdone
fact_energy_minute
1 device x 1 min grain
power_w - energy_kwh - cost_chf
Power BI - #19
fact_environment_minute
1 room x 1 min grain
temp - humidity - CO2 - noise
window/door flags - is_anomaly
Power BI - #20
fact_presence_minute
1 room x 1 min grain
motion_count - presence_flag
presence_prob
SAP SAC - #22
fact_device_health_day
1 device x 1 day grain
error_count - uptime_pct
missing_readings - battery
Power BI - #10
fact_weather_day
1 site x 1 horizon x 1 day
temp - humidity - precipitation
radiation
Power BI - #19 overlay
fact_prediction
1 room x 1 model x 1 target
predicted_value - confidence
dim_model + dim_horizon + dim_target
PBI + SAC - #26 #27 #29
BI reads enriched Gold directly
ML / Data Science planned
Reads Silver for training (full resolution)
Writes predictions back to Gold
Energy ForecastPresence PredictionAnomaly Detection
Python - scikit-learn - KNIME - #25 #26 #27 #28
fact_ml_predictions → Gold
direct DB connection
ETL export - load into SAC data model - #21
BI / VizDashboardsplanned
Power BIMicrosoft
Energy Dashboard
kWh + CHF over time
per room / apt - equipment
weather overlay - forecast
#19 - #29 - filter by room, date
Environment Dashboard
Temp - humidity - CO2
windows/doors status by room
#20 - filter by room, time
SAP Analytics CloudSAP
Room Presence Dashboard
Occupation by room over time
usage patterns - filter by
period & room
#22 - FR / ES / RO / PT (#30)
Access Control
Owner vs Renter roles
Row-level security per apt
#23 - #24

Stack (ADR-001 to 004)

--ADR-001Python 3.11 + asyncio for ingestion
--ADR-002PostgreSQL for Silver and Gold (dev + prod)
--ADR-003Airflow for orchestration (later)
--ADR-004File system for Bronze (timestamped paths)

Pipeline scripts

--watcherwatcher.py — loop every 60s, skip if no new files
--ingestbulk_to_bronze.py — SMB → Bronze, 16 threads, resume
--flattenflatten_sensors.py — Bronze → Silver, 8 workers, watermark
--mysqlimport_mysql_to_silver.py — 10 dim/ref tables, idempotent
--weatherclean_weather.py — weather CSV cleaning, in review

Current state (Sprint 3)

--Bronze245k+ JSON files across both apartments
--Silver15M+ sensor_events + 10 dimension tables
--GoldDone — 6 fact tables + 9 dimensions (star schema v2)
--MLPlanned — Sprint 5 (reads Silver, writes Gold)
--BIPlanned — Sprint 4
--Proddomotic_prod deployed on VM — Silver restored

Open questions

--CHF conversion rate for energy cost — decided: Oiken tariffs
--Presence logic — exact threshold for motion → occupied
--SAC export mechanism — direct connector or file-based?
--Airflow setup timing — needed for Sprint 4?

Fast flow — sensors

JSON files from 2 apartments via local network

recup.py polls every minute using asyncio

Raw files stored in Bronze (timestamped paths)

Flattened → Silver → aggregated → Gold

Slow flow — weather

CSV downloaded daily from sFTP (Meteo2) at ~7am

Raw CSV stored in Bronze

Parsed, standardized → Silver

Joined with sensor data in Gold for ML features

ML enrichment

Reads from Gold layer

Energy consumption forecast (time series)

Room presence prediction (classification)

Predictions written back to Gold

BI layer

Power BI — energy & environment dashboards

SAP Analytics Cloud — room presence dashboard

Row-level security per apartment owner

Available in FR, ES, RO, PT