Weather Forecast Data
Daily weather prediction files from the Meteo2 sFTP server. Each file contains a multi-group forecast for 45 Swiss weather stations, covering ~3 days ahead at 3-hour resolution.
file overview
| property | value |
|---|---|
| Filename format | Pred_YYYY-MM-DD.csv |
| Rows per file | ~153,000 |
| Prediction groups | 46 (numbered 00 to 45) |
| Sites | 45 weather stations across Switzerland |
| Measurements | 4 (temperature, humidity, precipitation, radiation) |
| Time horizon | ~3 days (72h) from prediction date |
| Time resolution | 3-hour intervals (predictions 00-33), daily (34-45) |
| Delivery | Daily ~7am via sFTP (/Meteo2) |
| Total files | ~93 (Aug-Oct 2023) |
how to read a weather file — step by step
Each file is named after the day the forecast was issued (e.g. Pred_2023-10-03.csv = forecast issued on October 3rd). Inside, every row is one predicted value for one measurement, at one weather station, at one target time, from one prediction group.
Time,Value,Prediction,Site,Measurement,Unit
// Pred 00, temperature: has real values
2023-10-03 09:00:00+00:00,18.1,00,Sion,PRED_T_2M_ctrl,°C
// Pred 00, radiation: SENTINEL (no data!)
2023-10-03 09:00:00+00:00,-99999.0,00,Sion,PRED_GLOB_ctrl,Watt/m2
// Pred 03, same hour: radiation has real value
2023-10-03 09:00:00+00:00,351.9,03,Sion,PRED_GLOB_ctrl,Watt/m2
// Pred 01, different hours (offset +1)
2023-10-03 01:00:00+00:00,13.5,01,Sion,PRED_T_2M_ctrl,°C
2023-10-03 04:00:00+00:00,13.0,01,Sion,PRED_T_2M_ctrl,°C
2023-10-03 07:00:00+00:00,14.1,01,Sion,PRED_T_2M_ctrl,°C
2023-10-03 10:00:00+00:00,21.0,01,Sion,PRED_T_2M_ctrl,°C
reading the example above
Time = this prediction is for October 3rd at 09:00 UTC
Value = predicted temperature is 18.1°C
Prediction = this comes from prediction group 00
Site = weather station in Sion
Measurement = temperature at 2 meters (PRED_T_2M_ctrl)
example 1 — one prediction group covers specific hours
Prediction group 01 for Sion, temperature — it covers hours 01, 04, 07, 10, 13, 16, 19, 22 (every 3 hours, offset by 1):
| time | value | pred | site | measurement |
|---|---|---|---|---|
| 2023-10-03 01:00 | 13.5°C | 01 | Sion | PRED_T_2M_ctrl |
| 2023-10-03 04:00 | 13.0°C | 01 | Sion | PRED_T_2M_ctrl |
| 2023-10-03 07:00 | 14.1°C | 01 | Sion | PRED_T_2M_ctrl |
| 2023-10-03 10:00 | 21.0°C | 01 | Sion | PRED_T_2M_ctrl |
| 2023-10-03 13:00 | 24.7°C | 01 | Sion | PRED_T_2M_ctrl |
| 2023-10-03 16:00 | 23.1°C | 01 | Sion | PRED_T_2M_ctrl |
| 2023-10-03 19:00 | 17.1°C | 01 | Sion | PRED_T_2M_ctrl |
| 2023-10-03 22:00 | 18.6°C | 01 | Sion | PRED_T_2M_ctrl |
Notice: hours go 01, 04, 07, 10... — not every hour. To get hours 00, 03, 06... you need prediction group 00. To get hours 02, 05, 08... you need prediction group 02. All three together = full 24h coverage.
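The offset pattern can be written down compactly. The following is a minimal sketch, assuming the rule implied by the examples (group number modulo 3 gives the hour offset) holds for every 3-hourly group 00-33; the function name `covered_hours` is hypothetical.

```python
def covered_hours(prediction: int) -> list:
    """Hours of the day covered by a 3-hourly prediction group (00-33)."""
    if not 0 <= prediction <= 33:
        raise ValueError("only groups 00-33 produce 3-hourly forecasts")
    offset = prediction % 3  # 00 -> 0, 01 -> 1, 02 -> 2, 03 -> 0, ...
    return list(range(offset, 24, 3))


# Groups 00, 01 and 02 together cover every hour of the day exactly once.
all_hours = sorted(h for p in (0, 1, 2) for h in covered_hours(p))
print(covered_hours(1))              # [1, 4, 7, 10, 13, 16, 19, 22]
print(all_hours == list(range(24)))  # True
```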
example 2 — multiple predictions for the same hour
For Sion at 09:00, temperature — prediction groups 00, 03, 06, 09, 12, 15 all cover this hour (they all use the 00/03/06/09... offset). Each gives a slightly different forecast:
| time | value | pred | interpretation |
|---|---|---|---|
| 2023-10-03 09:00 | 18.1°C | 00 | Model run A |
| 2023-10-03 09:00 | 19.4°C | 03 | Model run B |
| 2023-10-03 09:00 | 19.5°C | 06 | Model run C |
| 2023-10-03 09:00 | 19.3°C | 09 | Model run D |
| 2023-10-03 09:00 | 19.6°C | 12 | Model run E |
| 2023-10-03 09:00 | 19.6°C | 15 | Model run F |
These are different model initializations giving slightly different forecasts. Their average (19.25°C) can serve as a single best estimate, and the spread (18.1 to 19.6°C) indicates the forecast uncertainty.
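The figures for this example can be reproduced with the standard library. This is only an illustrative sketch: the variable names are hypothetical, and the sample standard deviation is used as one possible measure of spread.

```python
from statistics import mean, stdev

# Sion, 2023-10-03 09:00 UTC, temperature (°C) from groups 00, 03, 06, 09, 12, 15
values = [18.1, 19.4, 19.5, 19.3, 19.6, 19.6]

ensemble_mean = mean(values)
ensemble_min, ensemble_max = min(values), max(values)
ensemble_std = stdev(values)  # sample standard deviation across the six runs

print(f"mean={ensemble_mean:.2f} range={ensemble_min}..{ensemble_max} std={ensemble_std:.2f}")
```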
example 3 — the sentinel problem in prediction 00
For Sion at 09:00, solar radiation — prediction 00 has NO DATA (-99999.0), while predictions 03 and 06 have real values:
| time | value | pred | measurement | problem? |
|---|---|---|---|---|
| 2023-10-03 09:00 | -99999.0 | 00 | PRED_GLOB_ctrl | SENTINEL — no data |
| 2023-10-03 09:00 | 351.9 W/m² | 03 | PRED_GLOB_ctrl | Real value |
| 2023-10-03 09:00 | 354.7 W/m² | 06 | PRED_GLOB_ctrl | Real value |
This is the bug: if the code keeps only run 00 (keep=first), we lose all radiation and precipitation data for hours 00, 03, 06, 09, 12, 15, 18, 21, i.e. 8 out of 24 hours every day. Fix: skip sentinel values and use later runs (03, 06, 09...), which have real data.
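One way to express the fix is to pick the first run whose value is not the sentinel. This is a sketch under the assumption that the values for one target time have already been collected per prediction group; `first_valid` is a hypothetical helper, not existing pipeline code.

```python
from typing import Dict, Optional

SENTINEL = -99999.0


def first_valid(values_by_prediction: Dict[int, float]) -> Optional[float]:
    """Return the value from the lowest-numbered group that is not a sentinel."""
    for prediction in sorted(values_by_prediction):
        value = values_by_prediction[prediction]
        if value != SENTINEL:
            return value
    return None  # every group reported the sentinel


# Sion, 2023-10-03 09:00 UTC, PRED_GLOB_ctrl (W/m²)
radiation = {0: -99999.0, 3: 351.9, 6: 354.7}
print(first_valid(radiation))  # 351.9
```

Compared with keep=first, this keeps run 00 wherever it has data (temperature, humidity) and falls back to run 03 only where run 00 holds the sentinel.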
putting it all together
to reconstruct a full day of weather for one site
Step 1: Take prediction 00 (hours 00,03,06...) + prediction 01 (hours 01,04,07...) + prediction 02 (hours 02,05,08...) = all 24 hours of the day, with each group contributing every third hour
Step 2: For any sentinel values (-99999.0), replace with the average of predictions 03/06/09... for the same hour
Step 3: Optionally, average all model runs per hour for a more robust estimate
Step 4: Interpolate from 3h to 1h or 15min to match sensor data grain
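Steps 1 to 3 can be sketched in a few lines. The example assumes the rows have already been filtered to one site and one measurement, and it folds steps 2 and 3 together by averaging every non-sentinel run per target time instead of patching only the sentinel slots. Step 4 is not covered here. The function name and the row layout are hypothetical.

```python
from collections import defaultdict
from statistics import mean

SENTINEL = -99999.0


def ensemble_by_time(rows):
    """rows: iterable of (time, value, prediction) for one site and one measurement.

    Returns {time: mean of all non-sentinel 3-hourly values for that target time}.
    """
    valid = defaultdict(list)
    for time, value, prediction in rows:
        if prediction > 33:    # groups 34-45 hold daily values, not 3-hourly ones
            continue
        if value == SENTINEL:  # step 2: ignore missing data
            continue
        valid[time].append(value)
    # Step 3: average all runs that target the same time
    return {time: mean(values) for time, values in sorted(valid.items())}


rows = [
    ("2023-10-03 09:00", -99999.0, 0),
    ("2023-10-03 09:00", 351.9, 3),
    ("2023-10-03 09:00", 354.7, 6),
]
result = ensemble_by_time(rows)
print(result)
```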
CSV column structure
Each CSV has a header row followed by ~153,000 data rows. Columns are comma-separated.
| column | type | description |
|---|---|---|
| Time | timestamp | Forecast target timestamp (UTC). Format: YYYY-MM-DD HH:MM:SS+00:00 |
| Value | float | Predicted value. -99999.0 = sentinel (no data for this prediction/measurement) |
| Prediction | int (00-45) | Incremental counter — multiple forecasts computed per day. Each is a separate model run. |
| Site | string | Weather station name (45 sites across Switzerland) |
| Measurement | string | Measurement code (4 types) |
| Unit | string | Physical unit |
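A row with these six columns can be parsed into typed fields using only the standard library. This is a minimal sketch: the `ForecastRow` class and `parse_rows` function are hypothetical names, and the sample text reuses rows shown earlier on this page.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ForecastRow:
    time: datetime      # forecast target time (UTC)
    value: float        # -99999.0 marks "no data"
    prediction: int     # prediction group, 0-45
    site: str
    measurement: str
    unit: str


def parse_rows(text: str) -> list:
    """Parse CSV text (with header row) into a list of ForecastRow."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        ForecastRow(
            time=datetime.fromisoformat(r["Time"]),
            value=float(r["Value"]),
            prediction=int(r["Prediction"]),  # "00" -> 0
            site=r["Site"],
            measurement=r["Measurement"],
            unit=r["Unit"],
        )
        for r in reader
    ]


sample = """Time,Value,Prediction,Site,Measurement,Unit
2023-10-03 09:00:00+00:00,18.1,00,Sion,PRED_T_2M_ctrl,°C
2023-10-03 09:00:00+00:00,-99999.0,00,Sion,PRED_GLOB_ctrl,Watt/m2
"""
parsed = parse_rows(sample)
print(parsed[0])
```

Note that casting Prediction to int drops the leading zero; keep the original string if the two-digit form matters downstream.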
measurements (4 types)
| code | name | unit | sentinel in pred 00? |
|---|---|---|---|
| PRED_T_2M_ctrl | Temperature | °C | No (has values in pred 00) |
| PRED_RELHUM_2M_ctrl | Humidity | % | No (has values in pred 00) |
| PRED_TOT_PREC_ctrl | Precipitation | mm | Yes (-99999.0 in pred 00) |
| PRED_GLOB_ctrl | Solar radiation | W/m² | Yes (-99999.0 in pred 00) |
Sentinel value: -99999.0 means no data. Prediction group 00 has sentinel values for PRED_GLOB_ctrl (radiation) and PRED_TOT_PREC_ctrl (precipitation) but real values for temperature and humidity. All other prediction groups (01-45) have real values for all 4 measurements.
prediction numbers — multiple model runs per day
Official definition: The Prediction column is an incremental counter of the forecast. Multiple forecasts are computed per day, each with slightly different initial conditions. Higher numbers = later model runs within the same day.
Each prediction number represents a separate model run. Runs 00-33 produce 3-hourly forecasts, while runs 34-45 produce daily values only. Different runs cover different time offsets: run 00 outputs hours 00, 03, 06..., run 01 outputs 01, 04, 07..., run 02 outputs 02, 05, 08... Together, a triplet (e.g. 00+01+02) covers every hour. Runs that share the same offset (00, 03, 06, 09...) give slightly different values for the same hours — these can be averaged for a more robust estimate.
| prediction numbers | hours covered | interval | groups | rows/group |
|---|---|---|---|---|
| 00, 03, 06, 09, ..., 33 | 00, 03, 06, 09, 12, 15, 18, 21 | 3-hour | 12 groups | 4,320 each |
| 01, 04, 07, 10, ..., 31 | 01, 04, 07, 10, 13, 16, 19, 22 | 3-hour | 11 groups | 4,320 each |
| 02, 05, 08, 11, ..., 32 | 02, 05, 08, 11, 14, 17, 20, 23 | 3-hour | 11 groups | 4,320 each |
| 34, 35, ..., 45 | Daily only (00:00 or 13:00) | daily | 12 groups | 540 each |
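The rows-per-group figures are consistent with the ~153,000 rows per file quoted in the overview. The sketch below assumes each 3-hourly group holds 3 days x 8 time steps per site and measurement, and each daily group holds 3 values per site and measurement. That decomposition is inferred from the numbers in the table, not stated by the data provider.

```python
SITES = 45
MEASUREMENTS = 4
DAYS = 3
STEPS_PER_DAY = 8  # one value every 3 hours within a single group

three_hourly_groups = 12 + 11 + 11  # groups 00-33
daily_groups = 12                   # groups 34-45

rows_per_3h_group = SITES * MEASUREMENTS * DAYS * STEPS_PER_DAY  # 4,320
rows_per_daily_group = SITES * MEASUREMENTS * DAYS               # 540

rows_per_file = (three_hourly_groups * rows_per_3h_group
                 + daily_groups * rows_per_daily_group)
print(rows_per_file)  # 153360
```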
hour coverage example (predictions 00, 01, 02)
Pred 00
00, 03, 06, 09, 12, 15, 18, 21
Pred 01
01, 04, 07, 10, 13, 16, 19, 22
Pred 02
02, 05, 08, 11, 14, 17, 20, 23
= every hour from 00 to 23 is covered
Runs sharing the same hour offset (e.g. 00, 03, 06, ..., 33 all cover hour 00:00) are independent forecasts for the same target time. With ~12 runs per hour, averaging gives a more reliable estimate and the spread indicates uncertainty.
For ML / energy prediction: To get the best hourly weather forecast: (1) combine runs 00+01+02 so that every hour of the day is covered (each run at 3-hour resolution), (2) average across all runs sharing the same hour for a robust estimate, (3) interpolate to 1h or 15min to match sensor grain. Always filter by prediction_date to only use forecasts available at simulation time.
temporal context — prediction_date
The filename (Pred_YYYY-MM-DD.csv) is the prediction issue date — the day the forecast was generated. Each file forecasts ~3 days ahead. This means multiple files contain predictions for the same target timestamps, but issued on different days.
overlap example
Pred_2023-09-13.csv contains forecasts for Sep 13, 14, 15
Pred_2023-09-14.csv contains forecasts for Sep 14, 15, 16
Pred_2023-09-15.csv contains forecasts for Sep 15, 16, 17
All three files have predictions for Sep 15 — but issued 2 days, 1 day, and 0 days before. For simulation: use the file that was available at the time of the decision.
Requirement: The prediction_date (from filename) MUST be stored in Silver alongside the forecast data. Without it, we cannot simulate real-time decision making, because we would be using future information. The upsert key must include prediction_date: UNIQUE(timestamp, site, prediction_date). If Silver keeps one row per measurement and prediction group, those columns must be part of the key as well.
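Deriving prediction_date from the filename and selecting the file that was available at decision time could look like the sketch below. It assumes each file arrives at 07:00 on its prediction date (the overview only says "~7am", with no time zone) and uses naive datetimes for simplicity. `prediction_date` and `latest_available` are hypothetical helper names.

```python
import re
from datetime import date, datetime, time

FILENAME_RE = re.compile(r"^Pred_(\d{4}-\d{2}-\d{2})\.csv$")


def prediction_date(filename: str) -> date:
    """Extract the issue date from a Pred_YYYY-MM-DD.csv filename."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename}")
    return date.fromisoformat(match.group(1))


def latest_available(filenames, decision_time: datetime, delivery=time(7, 0)):
    """Newest file already delivered at decision_time, or None.

    Assumption: each file is delivered at `delivery` on its prediction date.
    """
    available = [
        f for f in filenames
        if datetime.combine(prediction_date(f), delivery) <= decision_time
    ]
    return max(available, key=prediction_date) if available else None


files = ["Pred_2023-09-13.csv", "Pred_2023-09-14.csv", "Pred_2023-09-15.csv"]
print(latest_available(files, datetime(2023, 9, 15, 6, 0)))  # Pred_2023-09-14.csv
print(latest_available(files, datetime(2023, 9, 15, 8, 0)))  # Pred_2023-09-15.csv
```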
weather stations (45 sites)
For the apartment domotics use case, the relevant station is the one in Valais closest to the apartment (e.g. Sion, Visp, Zermatt, Montana, Evionnaz). The exact station to use depends on the apartment location.
data volume
| metric | value |
|---|---|
| Rows per file | ~153,000 |
| Files (Aug-Oct 2023) | ~93 |
| Total raw rows | ~14.2 million |
| After sentinel removal | ~12.5 million (estimate) |
| After dedup to best hourly | ~1.2 million (3 days x 24h x 45 sites x 4 meas x 93 files) |
| If keeping all prediction groups | ~14M in Silver |
| If aggregating to ensemble mean | ~1.2 million in Silver |
recommended processing strategy
Parse CSV, add prediction_date column from filename (Pred_YYYY-MM-DD.csv)
Remove sentinel values (-99999.0)
Keep prediction group number in Silver for full traceability
For Gold / ML: average across all prediction groups that cover the same target hour (ensemble mean per hour)
Store ensemble spread (std dev) alongside mean for uncertainty estimation
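Always filter by prediction_date <= target_date for simulation scenarios, where target_date is the simulated decision date, so that only forecasts already issued at that point are used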
Interpolate 3-hour forecasts to 1-hour or 15-min to match sensor data grain
alignment with sensor data
| | sensor data | weather forecasts |
|---|---|---|
| Source | JSON files (SMB) | CSV files (sFTP) |
| Frequency | Every 1 minute | Daily file, 3-hour intervals |
| Time resolution | 1 minute | 3 hours (pred 00-33) / daily (pred 34-45) |
| Location | Per apartment (jimmy, jeremie) | Per weather station (45 sites) |
| Storage (Silver) | silver.sensor_events | silver.weather_clean |
| Gold grain | 1 minute (fact tables) | Needs interpolation to match |
Alignment for ML: Sensor data is per minute, while weather data is per 3 hours. For energy prediction models (15-min or hourly), weather must be interpolated (linear or forward-fill) to match the sensor grain. The Gold layer or ML pipeline should handle this interpolation, not the Silver ETL.
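Linear interpolation onto a 15-minute grid can be sketched with the standard library alone. The example assumes the input points come from a single series sorted by time, and that the step divides each interval evenly. `interpolate_linear` is a hypothetical helper; a real Gold pipeline may prefer a dataframe library's resampling instead.

```python
from datetime import datetime, timedelta


def interpolate_linear(points, step=timedelta(minutes=15)):
    """points: non-empty list of (datetime, value) sorted by time.

    Returns (datetime, value) pairs on a regular grid from the first
    to the last point, linearly interpolated between neighbours.
    """
    result = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        span = (t1 - t0).total_seconds()
        t = t0
        while t < t1:
            fraction = (t - t0).total_seconds() / span
            result.append((t, v0 + fraction * (v1 - v0)))
            t += step
    result.append(points[-1])  # close the grid with the final known point
    return result


# Sion temperature, prediction group 01: 07:00 -> 14.1°C, 10:00 -> 21.0°C
points = [(datetime(2023, 10, 3, 7), 14.1), (datetime(2023, 10, 3, 10), 21.0)]
series = interpolate_linear(points)
print(len(series))  # 13 values: 07:00, 07:15, ..., 10:00
```

Linear interpolation suits temperature and humidity. For accumulated quantities such as precipitation, forward-fill or a proportional split may be more appropriate; that choice is left open here.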