reference

Data Structure

Sources, field definitions and layer schemas. Updated as the project progresses.

Sources → Bronze → Silver → Gold

01 · sources

MySQL DB (pidb)

static · on-demand

Access: Direct DB connection

Frequency: Static — queried periodically for snapshots

Format: Relational tables

BuildingsRoomsSensorsDevicesDIErrorsBuildingType

Full schemas TBD — need to query the DB directly.

JSON Sensors

real-time · every minute

Access: Local network via Raspberry Pi (recup.py)

Frequency: Every 1 minute — recup.py must complete in under 1 minute

Format: JSON, one file per collection run · DD.MM.YYYY HHMM_<User>_received.json

plugsdoorsWindowsmotionsmeteoshumiditiesconsumptions

Both apartments share the same JSON structure. Room names differ.

Weather CSV (sFTP)

daily · ~7am

Access: sFTP, folder Meteo2

Frequency: Daily, downloaded at ~7am

Format: CSV

PRED_T_2M_ctrlPRED_RELHUM_2M_ctrlPRED_TOT_PREC_ctrlPRED_GLOB_ctrl

Used as ML features for energy consumption forecasting.

02 · json sensor structure

Apartments

user fieldapartment
JimmyLoupJimmy
JeremieVianinJérémie

Both apartments share the same JSON structure. Room names differ.

Top-level fields

fieldtypedescription
userstringApartment owner identifier
api_tokenstringAuth token — must be masked/removed in Silver
datetimestringCollection timestamp — format DD.MM.YYYY HH:MM
plugsobjectSmart plug readings, keyed by room name
doorsWindowsobjectDoor/window sensor readings, keyed by room name
motionsobjectMotion sensor readings, keyed by room name
meteosobjectEnvironmental sensors, nested under meteo key
humiditiesobjectHumidity sensor readings, keyed by room name
consumptionsobjectWhole-house energy consumption, keyed by House
plugs

Rooms: Office, Livingroom

fieldtypedescription
timePlugintUnix timestamp of the reading
powerfloatCurrent power draw (W)
overpowerfloatOverpower value (W)
is_validboolWhether reading is valid
counter1/2/3floatPer-phase power counters (W)
totalintTotal cumulative energy (Wh)
temperaturefloatPlug internal temperature (°C)
overtemperatureboolOvertemperature flag
switchboolPlug on/off state
doorsWindows

Rooms: Jérémie: Kitchen, Laundry, Bhroom, Office, Bdroom · Jimmy: Office, Bhroom, Bdroom

fieldtypedescription
typestringDoor or Window
switchstring"on" = open · "off" = closed
batteryintBattery level (%)
defenseintDefense mode flag
motions

Rooms: Jérémie: Kitchen, Office · Jimmy: Livingroom, Office

fieldtypedescription
motionboolMotion detected
lightintLight level (lux)
temperaturefloatRoom temperature at sensor (°C)
meteos

Rooms: Livingroom, Outdoor, Office, Bdroom (both apartments)

fieldtypedescription
Temperaturefloat°C
CO2intppm
Humidityint%
NoiseintdB — Livingroom only
PressurefloathPa — Livingroom only
AbsolutePressurefloathPa — Livingroom only
battery_percentintBattery level (%) — indoor stations
Temperaturefloat°C — Outdoor station
Humidityint% — Outdoor station
humidities

Rooms: Jérémie: Laundry, Bhroom · Jimmy: Bhroom

fieldtypedescription
temperaturefloat°C
humidityfloat%
devicePowerintBattery level (%)
consumptions

Rooms: House (whole apartment)

fieldtypedescription
timeConsumptionintUnix timestamp
power1/2/3floatPer-phase power (W)
pf1/2/3floatPower factor per phase
current1/2/3floatCurrent per phase (A)
voltage1/2/3floatVoltage per phase (V)
is_valid1/2/3boolValidity flag per phase
switchboolMain switch state
total_powerfloatTotal house power draw (W)

Room name mapping

Room coverage differs between apartments — not all rooms have all sensor types.

JSON keyfull name
BhroomBathroom
BdroomBedroom
LivingroomLiving room
OfficeOffice
KitchenKitchen
LaundryLaundry room
OutdoorOutdoor station

03 · weather csv structure

CSV fields

fielddescriptionunit
TimeForecast timestampdatetime
ValueForecast valuedepends on Measurement
PredictionPredicted valuedepends on Measurement
SiteLocation identifier
MeasurementMeasurement type
UnitUnit of the value

Measurement types

measurementdescriptionunit
PRED_T_2M_ctrlAir temperature at 2m°C
PRED_RELHUM_2M_ctrlRelative humidity at 2m%
PRED_TOT_PREC_ctrlTotal precipitationmm
PRED_GLOB_ctrlGlobal solar radiationW/m²

04 · bronze layer

Raw, immutable storage

Files written once, never modified or deleted.

/bronze/jimmy/YYYY/MM/DD/HH/DD.MM.YYYY_HHMM_JimmyLoup_received.json

/bronze/jeremie/YYYY/MM/DD/HH/DD.MM.YYYY_HHMM_JeremieVianin_received.json

/bronze/mysql/YYYY/MM/DD/<table>.csv

/bronze/weather/YYYY/MM/DD/meteo2.csv

01

Files are never modified or deleted after writing

02

One JSON file per recup.py execution (~every minute)

03

api_token field is present in raw files — removed in Silver

05 · silver layer

Cleaned, flattened and joined. GDPR masking applied.

Bronze → Silver transformations

01

JSON flattened into tabular rows (one row per sensor reading)

02

api_token stripped

03

datetime string (DD.MM.YYYY HH:MM) parsed to UTC timestamp

04

switch fields on doorsWindows normalized to boolean ("on" → true)

05

Room abbreviations standardized (Bhroom → Bathroom, Bdroom → Bedroom)

06

Joined with apartment metadata from MySQL

07

Outliers flagged with is_outlier = true — not removed

08

Missing readings flagged

sensor_events

One row per sensor reading per timestamp.

columntypedescription
event_idINTPrimary key
apartmentVARCHARjimmy / jeremie
roomVARCHARStandardized room name
sensor_typeVARCHARplug / door / window / motion / meteo / humidity / consumption
fieldVARCHARSpecific field name (e.g. power, CO2, humidity)
valueFLOATNumeric reading
unitVARCHARUnit of the value
timestampDATETIMEUTC timestamp
is_outlierBOOLEANOutlier flag

apartment_metadata

Joined MySQL static tables. Addresses masked for GDPR.

columntypedescription
building_idINT
building_nameVARCHARMasked (GDPR)
room_idINT
room_nameVARCHARStandardized
sensor_idVARCHAR
sensor_typeVARCHAR
device_idVARCHAR

weather_clean

Standardized weather forecast data.

columntypedescription
timestampDATETIMEUTC
siteVARCHARLocation
temperature_cFLOAT°C
humidity_pctFLOAT%
precipitation_mmFLOATmm
radiation_wm2FLOATW/m²

di_errors_clean

Parsed and enriched error log.

columntypedescription
error_idINT
sensor_idVARCHAR
room_idINT
timestampDATETIMEUTC
error_messageVARCHARParsed
severityVARCHARlow / medium / high

06 · gold layer (olap / dwh)

Aggregated, business-ready data. KPIs computed here.

fact_energy

columntypedescription
dateDATE
hourINT0–23
apartmentVARCHAR
roomVARCHAR
consumption_wFLOATAvg power draw (W)
total_power_wFLOATWhole-house total_power (W)
consumption_chfFLOATCHF equivalent (rate TBD)

fact_occupation

columntypedescription
dateDATE
hourINT
apartmentVARCHAR
roomVARCHAR
is_occupiedBOOLEANDerived: motion=true OR door=open
occupied_minutesINTMinutes occupied in that hour

fact_environment

columntypedescription
timestampDATETIME
apartmentVARCHAR
roomVARCHAR
temperature_cFLOAT
humidity_pctFLOAT
co2_ppmINT
noise_dbINTLivingroom only
pressure_hpaFLOATLivingroom only
is_anomalyBOOLEAN

fact_sensor_reliability

columntypedescription
dateDATE
sensor_idVARCHAR
apartmentVARCHAR
roomVARCHAR
error_countINT
uptime_pctFLOAT
missing_readingsINT

fact_ml_predictions

Populated after ML sprint
columntypedescription
dateDATE
apartmentVARCHAR
roomVARCHAR
predicted_consumption_wFLOAT
predicted_occupiedBOOLEAN
model_versionVARCHAR

07 · open questions

Full MySQL table schemas

blocks apartment_metadata Silver table

CHF conversion rate for energy cost

blocks fact_energy

Presence logic — exact threshold for motion → occupied?

blocks fact_occupation

DIErrors sensor ID format — parseable?

blocks di_errors_clean