docs / architecture
Architecture
End-to-end pipeline design for UC2. View decisions →
UC2 - Apartments Domotic
End-to-end architecture - Data: Aug 18 - Oct 27, 2023 (simulated real-time feed, ~1 file/min/apt)
Fast flowevery 1 min
Slow flowdaily ~7am
SourcesExternal
MySQL DB (pidb)
Buildings - Rooms - Sensors
Devices - DIErrors - BuildingType
Devices - DIErrors - BuildingType
10.130.25.152:3306 - static
JSON Sensors x2
Jimmy & Jeremie apartments
Plugs - doors - motions - meteos
humidity - consumption
Plugs - doors - motions - meteos
humidity - consumption
SMB share (Z:\) - every 1 min
Weather CSV
sFTP folder Meteo2
Temp - humidity - rain
solar radiation
Temp - humidity - rain
solar radiation
PRED_* fields - external
watcher.py (loop 60s) \u2192 bulk_to_bronze.py (16 threads)
weather_download.py - daily cron
BronzeRaw storagedone
Raw JSON - Jimmy
Immutable copy on disk
/bronze/jimmy/YYYY/MM/DD/HH/
~115k files
Raw JSON - Jeremie
Immutable copy on disk
/bronze/jeremie/YYYY/MM/DD/HH/
~130k files
Raw Weather CSVs
Daily forecast files
/bronze/weather/YYYY/MM/DD/
flatten_sensors.py - clean - join - deduplicate - flag outliers - watermark resume
SilverClean datadone
sensor_events
Flattened rows - room + type
value + timestamp - outliers flagged
not removed - is_outlier column
value + timestamp - outliers flagged
not removed - is_outlier column
PostgreSQL - silver schema
15M+ rows - both apartments
dim_* tables (10)
buildings - rooms - sensors
devices - building_types
energy_profiles - parameters
devices - building_types
energy_profiles - parameters
import_mysql_to_silver.py
from MySQL pidb - GDPR: users skipped
weather_clean
Standardized forecasts
temp - humidity - rain - radiation
temp - humidity - rain - radiation
clean_weather.py (Sacha)
etl_watermark
Tracks processed files
resume-capable pipeline
resume-capable pipeline
245k+ entries
Silver \u2192 Gold ETL - hourly & daily aggregation - OLAP star schema - KPI computation
GoldOLAP / DWHdone
fact_energy_minute
1 device x 1 min grain
power_w - energy_kwh - cost_chf
power_w - energy_kwh - cost_chf
Power BI - #19
fact_environment_minute
1 room x 1 min grain
temp - humidity - CO2 - noise
window/door flags - is_anomaly
temp - humidity - CO2 - noise
window/door flags - is_anomaly
Power BI - #20
fact_presence_minute
1 room x 1 min grain
motion_count - presence_flag
presence_prob
motion_count - presence_flag
presence_prob
SAP SAC - #22
fact_device_health_day
1 device x 1 day grain
error_count - uptime_pct
missing_readings - battery
error_count - uptime_pct
missing_readings - battery
Power BI - #10
fact_weather_day
1 site x 1 horizon x 1 day
temp - humidity - precipitation
radiation
temp - humidity - precipitation
radiation
Power BI - #19 overlay
fact_prediction
1 room x 1 model x 1 target
predicted_value - confidence
dim_model + dim_horizon + dim_target
predicted_value - confidence
dim_model + dim_horizon + dim_target
PBI + SAC - #26 #27 #29
BI reads enriched Gold directly
ML / Data Science planned
Reads Silver for training (full resolution)
Writes predictions back to Gold
Writes predictions back to Gold
Energy ForecastPresence PredictionAnomaly Detection
Python - scikit-learn - KNIME - #25 #26 #27 #28
fact_ml_predictions → Gold
direct DB connection
ETL export - load into SAC data model - #21
BI / VizDashboardsplanned
Power BIMicrosoft
Energy Dashboard
kWh + CHF over time
per room / apt - equipment
weather overlay - forecast
per room / apt - equipment
weather overlay - forecast
#19 - #29 - filter by room, date
Environment Dashboard
Temp - humidity - CO2
windows/doors status by room
windows/doors status by room
#20 - filter by room, time
SAP Analytics CloudSAP
Room Presence Dashboard
Occupation by room over time
usage patterns - filter by
period & room
usage patterns - filter by
period & room
#22 - FR / ES / RO / PT (#30)
Access Control
Owner vs Renter roles
Row-level security per apt
Row-level security per apt
#23 - #24
Stack (ADR-001 to 004)
--ADR-001Python 3.11 + asyncio for ingestion
--ADR-002PostgreSQL for Silver and Gold (dev + prod)
--ADR-003Airflow for orchestration (later)
--ADR-004File system for Bronze (timestamped paths)
Pipeline scripts
--watcherwatcher.py — loop every 60s, skip if no new files
--ingestbulk_to_bronze.py — SMB → Bronze, 16 threads, resume
--flattenflatten_sensors.py — Bronze → Silver, 8 workers, watermark
--mysqlimport_mysql_to_silver.py — 10 dim/ref tables, idempotent
--weatherclean_weather.py — weather CSV cleaning, in review
Current state (Sprint 3)
--Bronze245k+ JSON files across both apartments
--Silver15M+ sensor_events + 10 dimension tables
--GoldDone — 6 fact tables + 9 dimensions (star schema v2)
--MLPlanned — Sprint 5 (reads Silver, writes Gold)
--BIPlanned — Sprint 4
--Proddomotic_prod deployed on VM — Silver restored
Open questions
--CHF conversion rate for energy cost — decided: Oiken tariffs
--Presence logic — exact threshold for motion → occupied
--SAC export mechanism — direct connector or file-based?
--Airflow setup timing — needed for Sprint 4?
Fast flow — sensors
—
JSON files from 2 apartments via local network
—
recup.py polls every minute using asyncio
—
Raw files stored in Bronze (timestamped paths)
—
Flattened → Silver → aggregated → Gold
Slow flow — weather
—
CSV downloaded daily from sFTP (Meteo2) at ~7am
—
Raw CSV stored in Bronze
—
Parsed, standardized → Silver
—
Joined with sensor data in Gold for ML features
ML enrichment
—
Reads from Gold layer
—
Energy consumption forecast (time series)
—
Room presence prediction (classification)
—
Predictions written back to Gold
BI layer
—
Power BI — energy & environment dashboards
—
SAP Analytics Cloud — room presence dashboard
—
Row-level security per apartment owner
—
Available in FR, ES, RO, PT