docs / data
Sources & Structure
Three data sources flow through the medallion architecture: Bronze (raw) → Silver (clean) → Gold (aggregated).
Click a source for detailed field definitions and structure.
01 · data sources
JSON Sensors
real-time · every minute
~245,000 JSON files from 2 apartments. 6 sensor categories: plugs, doors/windows, motion, meteo, humidity, consumption.
view details →
MySQL DB (pidb)
static · on-demand
10 reference tables from the school database. Buildings, rooms, sensors, devices, energy profiles, error logs.
view details →
Weather CSV (sFTP)
daily · ~7am
~93 forecast files with 46 prediction groups, 45 Swiss stations, 4 measurements. 153k rows per file.
view details →
02 · bronze layer (raw, immutable)
Files written once, never modified or deleted. Timestamped folder structure.
storage/bronze/jimmy/YYYY/MM/DD/HH/*.json
storage/bronze/jeremie/YYYY/MM/DD/HH/*.json
storage/bronze/weather/YYYY/MM/DD/Pred_YYYY-MM-DD.csv
Files are never modified or deleted after writing
One JSON file per apartment per minute (~245k total)
One weather CSV per day (~93 total)
api_token field is present in raw JSON — removed in Silver
03 · silver layer (clean, full resolution)
Bronze → Silver transformations
JSON flattened into tabular rows (one row per sensor reading)
api_token stripped from all records
Datetime string (DD.MM.YYYY HH:MM) parsed to UTC timestamp
Door/window switch fields normalized to boolean
Room abbreviations standardized (Bhroom → Bathroom, Bdroom → Bedroom)
Outliers flagged with is_outlier = true — not removed
Weather sentinels (-99999.0) replaced with NULL
Watermark tracking for resume capability
silver schema tables
table
description
key columns
sensor_events
15M+ flattened sensor readings
apartment, room, sensor_type, field, timestamp
weather_clean
Standardized weather forecasts
timestamp, site, prediction_date
etl_watermark
Tracks processed sensor files
filename
weather_watermark
Tracks processed weather CSVs
filename
dim_buildings
Building metadata (from MySQL)
id
dim_rooms
Room details (from MySQL)
id
dim_sensors
Sensor-to-room mapping (from MySQL)
id
dim_devices
Appliance inventory (from MySQL)
id
ref_energy_profiles
Reference kWh/yr (from MySQL)
id
ref_parameters
Threshold configs (from MySQL)
id
log_sensor_errors
Sensor error logs (from MySQL)
id
04 · gold layer (star schema)
Aggregated at minute grain for BI dashboards and ML models. See the Star Schema page for full column-level detail.
5 dimensions
dim_datetime, dim_date, dim_apartment, dim_room, dim_device + dim_tariff
4 fact tables
fact_energy_minute, fact_environment_minute, fact_presence_minute, fact_device_health_day
1 materialized view
mv_energy_with_cost (energy × OIKEN tariff)