docs / data

Sources & Structure

Three data sources flow through the medallion architecture: Bronze (raw) Silver (clean) Gold (aggregated).

Click a source for detailed field definitions and structure.

01 · data sources

02 · bronze layer (raw, immutable)

Files written once, never modified or deleted. Timestamped folder structure.

storage/bronze/jimmy/YYYY/MM/DD/HH/*.json

storage/bronze/jeremie/YYYY/MM/DD/HH/*.json

storage/bronze/weather/YYYY/MM/DD/Pred_YYYY-MM-DD.csv

01

Files are never modified or deleted after writing

02

One JSON file per apartment per minute (~245k total)

03

One weather CSV per day (~93 total)

04

api_token field is present in raw JSON — removed in Silver

03 · silver layer (clean, full resolution)

Bronze Silver transformations

01

JSON flattened into tabular rows (one row per sensor reading)

02

api_token stripped from all records

03

Datetime string (DD.MM.YYYY HH:MM) parsed to UTC timestamp

04

Door/window switch fields normalized to boolean

05

Room abbreviations standardized (Bhroom → Bathroom, Bdroom → Bedroom)

06

Outliers flagged with is_outlier = true — not removed

07

Weather sentinels (-99999.0) replaced with NULL

08

Watermark tracking for resume capability

silver schema tables

table

description

key columns

sensor_events

15M+ flattened sensor readings

apartment, room, sensor_type, field, timestamp

weather_clean

Standardized weather forecasts

timestamp, site, prediction_date

etl_watermark

Tracks processed sensor files

filename

weather_watermark

Tracks processed weather CSVs

filename

dim_buildings

Building metadata (from MySQL)

id

dim_rooms

Room details (from MySQL)

id

dim_sensors

Sensor-to-room mapping (from MySQL)

id

dim_devices

Appliance inventory (from MySQL)

id

ref_energy_profiles

Reference kWh/yr (from MySQL)

id

ref_parameters

Threshold configs (from MySQL)

id

log_sensor_errors

Sensor error logs (from MySQL)

id

04 · gold layer (star schema)

Aggregated at minute grain for BI dashboards and ML models. See the Star Schema page for full column-level detail.

5 dimensions

dim_datetime, dim_date, dim_apartment, dim_room, dim_device + dim_tariff

4 fact tables

fact_energy_minute, fact_environment_minute, fact_presence_minute, fact_device_health_day

1 materialized view

mv_energy_with_cost (energy × OIKEN tariff)