docs / workflows
Silver → Gold ETL
Star schema population — dimensions from Silver metadata, fact tables from minute-grain aggregation of sensor_events. Idempotent upserts, safe to rerun.
silver.sensor_events
15M+ rows — all sensor readings
apartment, room, sensor_type, field, value, timestamp
silver.dim_buildings
Building metadata (from MySQL)
houseName, id — enriches dim_apartment
silver.etl_watermark
Tracks processed files
Used by Bronze→Silver, not Gold (yet)
etl/silver_to_gold/create_gold.py — run once, idempotent
Creates gold schema (requires DB_ADMIN_URL for privileges)
Grants schema privileges to app user (DB_URL)
Creates all dimension and fact tables with CREATE TABLE IF NOT EXISTS
Creates indexes on date_key, apartment_key, room_key for fast BI queries
Verifies all tables exist after creation
dimension tables (5)
table
grain
key format
source
dim_date
1 day
YYYYMMDD
sensor_events timestamps
dim_datetime
1 minute
YYYYMMDDHHMM
sensor_events timestamps
dim_apartment
apartment
serial
sensor_events + dim_buildings enrichment
dim_room
room
serial
sensor_events rooms per apartment
dim_device
device
serial
synthetic: apartment_room_sensortype
fact tables (4)
table
grain
measures
sensors
fact_energy_minute
1 min / device
power_w, energy_wh, energy_kwh, counter_total, cost_chf, is_valid
plug, consumption
fact_environment_minute
1 min / room
temperature_c, humidity_pct, co2_ppm, noise_db, pressure_hpa, window/door flags, is_anomaly
meteo, humidity, door, window
fact_presence_minute
1 min / room
motion_count, door_open_flag, presence_flag, presence_prob (ML — future)
motion, door
fact_device_health_day
1 day / device
battery_min_pct, battery_avg_pct, error_count, missing_readings, uptime_pct
all (battery field)
Usage —python etl/silver_to_gold/create_gold.py
etl/silver_to_gold/populate_gold.py — 9-step process, idempotent upserts
dim_date—Extract unique dates from sensor_events, compute day_of_week, week, month, year, is_weekend
dim_datetime—Extract minute-level timestamps, truncate to minute grain, compute time attributes
dim_apartment—Distinct apartments from sensor_events, then enrich from silver.dim_buildings (building_name, building_id)
dim_room—Distinct rooms per apartment, linked via apartment_key foreign key
dim_device—Synthetic device_id from apartment + room + sensor_type, linked to room_key
fact_energy_minute—GROUP BY minute/device — MAX(power), MAX(total energy), convert Wh to kWh, BOOL_AND(not outlier) for is_valid
fact_environment_minute—GROUP BY minute/room — MAX per metric, BOOL_OR for window/door open flags, BOOL_OR(outlier) for is_anomaly
fact_presence_minute—GROUP BY minute/room — SUM(motion events), BOOL_OR(door open), presence_flag = motion OR door activity
fact_device_health_day—GROUP BY day/device — MIN/AVG battery percentage from battery field readings
Pattern — All fact tables use GROUP BY date_trunc('minute', timestamp) with CASE WHEN pivots to extract specific fields. Upserts via ON CONFLICT DO UPDATE — safe to rerun at any time.
Usage —python etl/silver_to_gold/populate_gold.py
Minute grain
Preserves temporal resolution for BI dashboards. Aggregation to hourly/daily done at query time.
Synthetic device_id
No real device IDs in sensor data — constructed as apartment_room_sensortype (e.g. jimmy_kitchen_plug).
Presence logic
Simple boolean: motion detected OR door opened = present. ML model (presence_prob) will replace this later.
Outlier passthrough
Silver flags outliers but keeps them. Gold uses is_valid / is_anomaly to let BI dashboards filter.
Full reload
Every run re-processes all Silver data (ON CONFLICT DO UPDATE). Future: watermark-based incremental.
5 dimension tables
datetime (minute), date (day), apartment, room, device
4 fact tables
energy, environment, presence (minute grain) + device health (day grain)