reference
Data Structure
Sources, field definitions and layer schemas. Updated as the project progresses.
Sources → Bronze → Silver → Gold
01 · sources
MySQL DB (pidb)
static · on-demand
Access: Direct DB connection
Frequency: Static — queried periodically for snapshots
Format: Relational tables
Full schemas TBD — need to query the DB directly.
JSON Sensors
real-time · every minute
Access: Local network via Raspberry Pi (recup.py)
Frequency: Every 1 minute — recup.py must complete in under 1 minute
Format: JSON, one file per collection run · DD.MM.YYYY HHMM_<User>_received.json
Both apartments share the same JSON structure. Room names differ.
Weather CSV (sFTP)
daily · ~7am
Access: sFTP, folder Meteo2
Frequency: Daily, downloaded at ~7am
Format: CSV
Used as ML features for energy consumption forecasting.
02 · json sensor structure
Apartments
| user field | apartment |
|---|---|
| JimmyLoup | Jimmy |
| JeremieVianin | Jérémie |
Both apartments share the same JSON structure. Room names differ.
Top-level fields
| field | type | description |
|---|---|---|
| user | string | Apartment owner identifier |
| api_token | string | Auth token — must be masked/removed in Silver |
| datetime | string | Collection timestamp — format DD.MM.YYYY HH:MM |
| plugs | object | Smart plug readings, keyed by room name |
| doorsWindows | object | Door/window sensor readings, keyed by room name |
| motions | object | Motion sensor readings, keyed by room name |
| meteos | object | Environmental sensors, nested under meteo key |
| humidities | object | Humidity sensor readings, keyed by room name |
| consumptions | object | Whole-house energy consumption, keyed by House |
plugs↓
Rooms: Office, Livingroom
| field | type | description |
|---|---|---|
| timePlug | int | Unix timestamp of the reading |
| power | float | Current power draw (W) |
| overpower | float | Overpower value (W) |
| is_valid | bool | Whether reading is valid |
| counter1/2/3 | float | Per-phase power counters (W) |
| total | int | Total cumulative energy (Wh) |
| temperature | float | Plug internal temperature (°C) |
| overtemperature | bool | Overtemperature flag |
| switch | bool | Plug on/off state |
doorsWindows↓
Rooms: Jérémie: Kitchen, Laundry, Bhroom, Office, Bdroom · Jimmy: Office, Bhroom, Bdroom
| field | type | description |
|---|---|---|
| type | string | Door or Window |
| switch | string | "on" = open · "off" = closed |
| battery | int | Battery level (%) |
| defense | int | Defense mode flag |
motions↓
Rooms: Jérémie: Kitchen, Office · Jimmy: Livingroom, Office
| field | type | description |
|---|---|---|
| motion | bool | Motion detected |
| light | int | Light level (lux) |
| temperature | float | Room temperature at sensor (°C) |
meteos↓
Rooms: Livingroom, Outdoor, Office, Bdroom (both apartments)
| field | type | description |
|---|---|---|
| Temperature | float | °C |
| CO2 | int | ppm |
| Humidity | int | % |
| Noise | int | dB — Livingroom only |
| Pressure | float | hPa — Livingroom only |
| AbsolutePressure | float | hPa — Livingroom only |
| battery_percent | int | Battery level (%) — indoor stations |
| Temperature | float | °C — Outdoor station |
| Humidity | int | % — Outdoor station |
humidities↓
Rooms: Jérémie: Laundry, Bhroom · Jimmy: Bhroom
| field | type | description |
|---|---|---|
| temperature | float | °C |
| humidity | float | % |
| devicePower | int | Battery level (%) |
consumptions↓
Rooms: House (whole apartment)
| field | type | description |
|---|---|---|
| timeConsumption | int | Unix timestamp |
| power1/2/3 | float | Per-phase power (W) |
| pf1/2/3 | float | Power factor per phase |
| current1/2/3 | float | Current per phase (A) |
| voltage1/2/3 | float | Voltage per phase (V) |
| is_valid1/2/3 | bool | Validity flag per phase |
| switch | bool | Main switch state |
| total_power | float | Total house power draw (W) |
Room name mapping
Room coverage differs between apartments — not all rooms have all sensor types.
| JSON key | full name |
|---|---|
| Bhroom | Bathroom |
| Bdroom | Bedroom |
| Livingroom | Living room |
| Office | Office |
| Kitchen | Kitchen |
| Laundry | Laundry room |
| Outdoor | Outdoor station |
03 · weather csv structure
CSV fields
| field | description | unit |
|---|---|---|
| Time | Forecast timestamp | datetime |
| Value | Forecast value | depends on Measurement |
| Prediction | Predicted value | depends on Measurement |
| Site | Location identifier | — |
| Measurement | Measurement type | — |
| Unit | Unit of the value | — |
Measurement types
| measurement | description | unit |
|---|---|---|
| PRED_T_2M_ctrl | Air temperature at 2m | °C |
| PRED_RELHUM_2M_ctrl | Relative humidity at 2m | % |
| PRED_TOT_PREC_ctrl | Total precipitation | mm |
| PRED_GLOB_ctrl | Global solar radiation | W/m² |
04 · bronze layer
Raw, immutable storage
Files written once, never modified or deleted.
/bronze/jimmy/YYYY/MM/DD/HH/DD.MM.YYYY_HHMM_JimmyLoup_received.json
/bronze/jeremie/YYYY/MM/DD/HH/DD.MM.YYYY_HHMM_JeremieVianin_received.json
/bronze/mysql/YYYY/MM/DD/<table>.csv
/bronze/weather/YYYY/MM/DD/meteo2.csv
Files are never modified or deleted after writing
One JSON file per recup.py execution (~every minute)
api_token field is present in raw files — removed in Silver
05 · silver layer
Cleaned, flattened and joined. GDPR masking applied.
Bronze → Silver transformations
JSON flattened into tabular rows (one row per sensor reading)
api_token stripped
datetime string (DD.MM.YYYY HH:MM) parsed to UTC timestamp
switch fields on doorsWindows normalized to boolean ("on" → true)
Room abbreviations standardized (Bhroom → Bathroom, Bdroom → Bedroom)
Joined with apartment metadata from MySQL
Outliers flagged with is_outlier = true — not removed
Missing readings flagged
sensor_events
One row per sensor reading per timestamp.
| column | type | description |
|---|---|---|
| event_id | INT | Primary key |
| apartment | VARCHAR | jimmy / jeremie |
| room | VARCHAR | Standardized room name |
| sensor_type | VARCHAR | plug / door / window / motion / meteo / humidity / consumption |
| field | VARCHAR | Specific field name (e.g. power, CO2, humidity) |
| value | FLOAT | Numeric reading |
| unit | VARCHAR | Unit of the value |
| timestamp | DATETIME | UTC timestamp |
| is_outlier | BOOLEAN | Outlier flag |
apartment_metadata
Joined MySQL static tables. Addresses masked for GDPR.
| column | type | description |
|---|---|---|
| building_id | INT | — |
| building_name | VARCHAR | Masked (GDPR) |
| room_id | INT | — |
| room_name | VARCHAR | Standardized |
| sensor_id | VARCHAR | — |
| sensor_type | VARCHAR | — |
| device_id | VARCHAR | — |
weather_clean
Standardized weather forecast data.
| column | type | description |
|---|---|---|
| timestamp | DATETIME | UTC |
| site | VARCHAR | Location |
| temperature_c | FLOAT | °C |
| humidity_pct | FLOAT | % |
| precipitation_mm | FLOAT | mm |
| radiation_wm2 | FLOAT | W/m² |
di_errors_clean
Parsed and enriched error log.
| column | type | description |
|---|---|---|
| error_id | INT | — |
| sensor_id | VARCHAR | — |
| room_id | INT | — |
| timestamp | DATETIME | UTC |
| error_message | VARCHAR | Parsed |
| severity | VARCHAR | low / medium / high |
06 · gold layer (olap / dwh)
Aggregated, business-ready data. KPIs computed here.
fact_energy
| column | type | description |
|---|---|---|
| date | DATE | — |
| hour | INT | 0–23 |
| apartment | VARCHAR | — |
| room | VARCHAR | — |
| consumption_w | FLOAT | Avg power draw (W) |
| total_power_w | FLOAT | Whole-house total_power (W) |
| consumption_chf | FLOAT | CHF equivalent (rate TBD) |
fact_occupation
| column | type | description |
|---|---|---|
| date | DATE | — |
| hour | INT | — |
| apartment | VARCHAR | — |
| room | VARCHAR | — |
| is_occupied | BOOLEAN | Derived: motion=true OR door=open |
| occupied_minutes | INT | Minutes occupied in that hour |
fact_environment
| column | type | description |
|---|---|---|
| timestamp | DATETIME | — |
| apartment | VARCHAR | — |
| room | VARCHAR | — |
| temperature_c | FLOAT | — |
| humidity_pct | FLOAT | — |
| co2_ppm | INT | — |
| noise_db | INT | Livingroom only |
| pressure_hpa | FLOAT | Livingroom only |
| is_anomaly | BOOLEAN | — |
fact_sensor_reliability
| column | type | description |
|---|---|---|
| date | DATE | — |
| sensor_id | VARCHAR | — |
| apartment | VARCHAR | — |
| room | VARCHAR | — |
| error_count | INT | — |
| uptime_pct | FLOAT | — |
| missing_readings | INT | — |
fact_ml_predictions
Populated after ML sprint| column | type | description |
|---|---|---|
| date | DATE | — |
| apartment | VARCHAR | — |
| room | VARCHAR | — |
| predicted_consumption_w | FLOAT | — |
| predicted_occupied | BOOLEAN | — |
| model_version | VARCHAR | — |
07 · open questions
Full MySQL table schemas
blocks apartment_metadata Silver table
CHF conversion rate for energy cost
blocks fact_energy
Presence logic — exact threshold for motion → occupied?
blocks fact_occupation
DIErrors sensor ID format — parseable?
blocks di_errors_clean