
Weather Forecast Data

Daily weather prediction files from the Meteo2 sFTP server. Each file contains a multi-group forecast for 45 Swiss weather stations, covering ~3 days ahead at 3-hour resolution.

file overview

Filename format: Pred_YYYY-MM-DD.csv
Rows per file: ~153,000
Prediction groups: 46 (numbered 00 to 45)
Sites: 45 weather stations across Switzerland
Measurements: 4 (temperature, humidity, precipitation, radiation)
Time horizon: ~3 days (72h) from prediction date
Time resolution: 3-hour intervals (predictions 00-33), daily (34-45)
Delivery: Daily ~7am via sFTP (/Meteo2)
Total files: ~93 (Aug-Oct 2023)

how to read a weather file — step by step

Each file is named after the day the forecast was issued (e.g. Pred_2023-10-03.csv = forecast issued on October 3rd). Inside, every row is one predicted value for one measurement, at one weather station, at one target time, from one prediction group.

Pred_2023-10-03.csv (excerpt: 7 of 153,360 data rows)
Time,Value,Prediction,Site,Measurement,Unit

// Pred 00, temperature — has real values
2023-10-03 09:00:00+00:00,18.1,00,Sion,PRED_T_2M_ctrl,°C

// Pred 00, radiation — SENTINEL (no data!)
2023-10-03 09:00:00+00:00,-99999.0,00,Sion,PRED_GLOB_ctrl,Watt/m2

// Pred 03, same hour — radiation has real value
2023-10-03 09:00:00+00:00,351.9,03,Sion,PRED_GLOB_ctrl,Watt/m2

// Pred 01, different hours (offset +1)
2023-10-03 01:00:00+00:00,13.5,01,Sion,PRED_T_2M_ctrl,°C
2023-10-03 04:00:00+00:00,13.0,01,Sion,PRED_T_2M_ctrl,°C
2023-10-03 07:00:00+00:00,14.1,01,Sion,PRED_T_2M_ctrl,°C
2023-10-03 10:00:00+00:00,21.0,01,Sion,PRED_T_2M_ctrl,°C

reading the example above

Using the first data row of the excerpt:

Time = this prediction is for October 3rd at 09:00 UTC

Value = predicted temperature is 18.1°C

Prediction = this comes from prediction group 00

Site = weather station in Sion

Measurement = temperature at 2 meters (PRED_T_2M_ctrl)

example 1 — one prediction group covers specific hours

Prediction group 01 for Sion, temperature — it covers hours 01, 04, 07, 10, 13, 16, 19, 22 (every 3 hours, offset by 1):

time              value   pred  site  measurement
2023-10-03 01:00  13.5°C  01    Sion  PRED_T_2M_ctrl
2023-10-03 04:00  13.0°C  01    Sion  PRED_T_2M_ctrl
2023-10-03 07:00  14.1°C  01    Sion  PRED_T_2M_ctrl
2023-10-03 10:00  21.0°C  01    Sion  PRED_T_2M_ctrl
2023-10-03 13:00  24.7°C  01    Sion  PRED_T_2M_ctrl
2023-10-03 16:00  23.1°C  01    Sion  PRED_T_2M_ctrl
2023-10-03 19:00  17.1°C  01    Sion  PRED_T_2M_ctrl
2023-10-03 22:00  18.6°C  01    Sion  PRED_T_2M_ctrl

Notice: hours go 01, 04, 07, 10... — not every hour. To get hours 00, 03, 06... you need prediction group 00. To get hours 02, 05, 08... you need prediction group 02. All three together = full 24h coverage.
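The offset pattern can be expressed in a few lines. This is a minimal sketch that assumes the hour offset always equals the group number modulo 3, as the examples suggest; the function name hours_covered is hypothetical.

```python
def hours_covered(prediction: int) -> list[int]:
    """Hours of the day (UTC) covered by a 3-hourly prediction group.

    Assumption: the hour offset equals the group number modulo 3.
    Groups 34-45 are daily-only and are rejected here.
    """
    if not 0 <= prediction <= 33:
        raise ValueError("only groups 00-33 are 3-hourly")
    offset = prediction % 3
    return list(range(offset, 24, 3))


print(hours_covered(1))  # [1, 4, 7, 10, 13, 16, 19, 22]

# Groups 00, 01 and 02 together cover every hour of the day.
combined = sorted(h for p in (0, 1, 2) for h in hours_covered(p))
print(combined == list(range(24)))  # True
```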

example 2 — multiple predictions for the same hour

For Sion at 09:00, temperature — prediction groups 00, 03, 06, 09, 12, 15 all cover this hour (they all use the 00/03/06/09... offset). Each gives a slightly different forecast:

time              value   pred  interpretation
2023-10-03 09:00  18.1°C  00    Model run A
2023-10-03 09:00  19.4°C  03    Model run B
2023-10-03 09:00  19.5°C  06    Model run C
2023-10-03 09:00  19.3°C  09    Model run D
2023-10-03 09:00  19.6°C  12    Model run E
2023-10-03 09:00  19.6°C  15    Model run F

These are different model initializations giving slightly different forecasts. The average of the six values shown (19.25°C) serves as the best estimate, and the spread (18.1 to 19.6°C) indicates the uncertainty.
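As a quick check of the numbers in this example, the mean and spread can be computed with the standard library. The sample standard deviation is added here as one possible spread measure; the source only mentions the min-max range.

```python
from statistics import mean, stdev

# Sion, 2023-10-03 09:00 UTC, temperature in °C, keyed by prediction group
# (values copied from the table in this example).
values_by_group = {0: 18.1, 3: 19.4, 6: 19.5, 9: 19.3, 12: 19.6, 15: 19.6}

values = list(values_by_group.values())
ensemble_mean = round(mean(values), 2)   # 19.25
spread = (min(values), max(values))      # (18.1, 19.6)
ensemble_std = stdev(values)             # sample standard deviation

print(ensemble_mean, spread, round(ensemble_std, 2))
```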

example 3 — the sentinel problem in prediction 00

For Sion at 09:00, solar radiation — prediction 00 has NO DATA (-99999.0), while predictions 03 and 06 have real values:

time              value       pred  measurement     problem?
2023-10-03 09:00  -99999.0    00    PRED_GLOB_ctrl  SENTINEL, no data
2023-10-03 09:00  351.9 W/m²  03    PRED_GLOB_ctrl  Real value
2023-10-03 09:00  354.7 W/m²  06    PRED_GLOB_ctrl  Real value

This is the bug: if the code keeps only run 00 (keep=first), we lose all radiation and precipitation data for hours 00, 03, 06, 09, 12, 15, 18, 21, i.e. 8 out of 24 hours every day. Fix: skip sentinel values and use later runs (03, 06, 09...) which have real data.
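The difference between the two behaviours can be illustrated on the three rows of this example. The helper names keep_first and first_valid are hypothetical; they only mimic the selection logic and are not the actual pipeline code.

```python
SENTINEL = -99999.0

# Sion, 2023-10-03 09:00 UTC, PRED_GLOB_ctrl as (prediction group, value),
# ordered by group number.
candidates = [(0, SENTINEL), (3, 351.9), (6, 354.7)]


def keep_first(rows):
    """The buggy behaviour: always keep the lowest-numbered group."""
    return rows[0]


def first_valid(rows):
    """Keep the lowest-numbered group whose value is not the sentinel."""
    for group, value in rows:
        if value != SENTINEL:
            return group, value
    return None  # every group reported the sentinel


print(keep_first(candidates))   # (0, -99999.0)
print(first_valid(candidates))  # (3, 351.9)
```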

putting it all together

to reconstruct a full day of weather for one site

Step 1: Take prediction 00 (hours 00,03,06...) + prediction 01 (hours 01,04,07...) + prediction 02 (hours 02,05,08...) = all 24 hours of the day (each group contributes every third hour)

Step 2: For any sentinel values (-99999.0), replace with the average of predictions 03/06/09... for the same hour

Step 3: Optionally, average all model runs per hour for a more robust estimate

Step 4: Interpolate from 3h to 1h or 15min to match sensor data grain
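Steps 1 to 3 can be sketched together as follows. This is a simplified variant, not the reference implementation: instead of patching sentinels individually, it drops them and averages whatever valid values remain for each target time. The function name and the tuple layout are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

SENTINEL = -99999.0


def hourly_ensemble_mean(rows):
    """Average all valid values per target time.

    `rows` is an iterable of (time, prediction_group, value) tuples for a
    single site and measurement. Groups with different hour offsets end up
    in the same dictionary (step 1), sentinel values are skipped (step 2),
    and the remaining values are averaged per target time (step 3).
    """
    by_time = defaultdict(list)
    for time, group, value in rows:
        if group > 33:           # daily-only groups are left out
            continue
        if value == SENTINEL:
            continue
        by_time[time].append(value)
    return {time: mean(values) for time, values in sorted(by_time.items())}


rows = [
    ("2023-10-03 09:00", 0, -99999.0),  # sentinel, ignored
    ("2023-10-03 09:00", 3, 351.9),
    ("2023-10-03 09:00", 6, 354.7),
]
print(hourly_ensemble_mean(rows))
```

A target time for which every group reports the sentinel simply does not appear in the result, so downstream code has to decide how to handle such gaps.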

CSV column structure

Each CSV has a header row followed by ~153,000 data rows. Columns are comma-separated.

column       type         description
Time         timestamp    Forecast target timestamp (UTC). Format: YYYY-MM-DD HH:MM:SS+00:00
Value        float        Predicted value. -99999.0 = sentinel (no data for this prediction/measurement)
Prediction   int (00-45)  Incremental counter: multiple forecasts are computed per day, each a separate model run
Site         string       Weather station name (45 sites across Switzerland)
Measurement  string       Measurement code (4 types)
Unit         string       Physical unit
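A minimal parsing sketch using only the standard library is shown below. ForecastRow and parse_rows are hypothetical names, and the sketch assumes that station names containing commas (e.g. "Sattel, SZ") are quoted in the file, which the csv module then handles transparently.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

SENTINEL = -99999.0


@dataclass(frozen=True)
class ForecastRow:            # hypothetical record type, not part of the feed
    time: datetime            # timezone-aware, UTC
    value: Optional[float]    # None when the file holds the sentinel
    prediction: int           # 0-45
    site: str
    measurement: str
    unit: str


def parse_rows(text: str) -> list[ForecastRow]:
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        value = float(rec["Value"])
        rows.append(ForecastRow(
            time=datetime.fromisoformat(rec["Time"]),
            value=None if value == SENTINEL else value,
            prediction=int(rec["Prediction"]),  # "03" -> 3
            site=rec["Site"],
            measurement=rec["Measurement"],
            unit=rec["Unit"],
        ))
    return rows


sample = """Time,Value,Prediction,Site,Measurement,Unit
2023-10-03 09:00:00+00:00,18.1,00,Sion,PRED_T_2M_ctrl,°C
2023-10-03 09:00:00+00:00,-99999.0,00,Sion,PRED_GLOB_ctrl,Watt/m2
"""
for row in parse_rows(sample):
    print(row)
```

Mapping the sentinel to None at parse time keeps -99999.0 from leaking into averages later on.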

measurements (4 types)

code                 name             unit  sentinel in pred 00?
PRED_T_2M_ctrl       Temperature      °C    No (has values in pred 00)
PRED_RELHUM_2M_ctrl  Humidity         %     No (has values in pred 00)
PRED_TOT_PREC_ctrl   Precipitation    mm    Yes (-99999.0 in pred 00)
PRED_GLOB_ctrl       Solar radiation  W/m²  Yes (-99999.0 in pred 00)

Sentinel value: -99999.0 means no data. Prediction group 00 has sentinel values for PRED_GLOB_ctrl (radiation) and PRED_TOT_PREC_ctrl (precipitation) but real values for temperature and humidity. All other prediction groups (01-45) have real values for all 4 measurements.

prediction numbers — multiple model runs per day

Official definition: The Prediction column is an incremental counter of the forecast. Multiple forecasts are computed per day, each with slightly different initial conditions. Higher numbers = later model runs within the same day.

Each prediction number represents a separate model run. Runs 00-33 produce 3-hourly forecasts, while runs 34-45 produce daily values only. Different runs cover different time offsets: run 00 outputs hours 00, 03, 06..., run 01 outputs 01, 04, 07..., run 02 outputs 02, 05, 08... Together, a triplet (e.g. 00+01+02) covers every hour. Runs that share the same offset (00, 03, 06, 09...) give slightly different values for the same hours — these can be averaged for a more robust estimate.

prediction numbers       hours covered                   interval  groups     rows/group
00, 03, 06, 09, ..., 33  00, 03, 06, 09, 12, 15, 18, 21  3-hour    12 groups  4,320 each
01, 04, 07, 10, ..., 31  01, 04, 07, 10, 13, 16, 19, 22  3-hour    11 groups  4,320 each
02, 05, 08, 11, ..., 32  02, 05, 08, 11, 14, 17, 20, 23  3-hour    11 groups  4,320 each
34, 35, ..., 45          Daily only (00:00 or 13:00)     daily     12 groups  540 each
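The per-group row counts in the table add up to the file size quoted earlier. The short calculation below reproduces them under the assumption that each 3-hourly group holds 3 forecast days with 8 target times per day, and each daily group holds one value per day; that decomposition is inferred from the counts, not stated by the data provider.

```python
SITES = 45
MEASUREMENTS = 4
DAYS = 3

three_hourly_groups = 12 + 11 + 11   # groups 00-33
daily_groups = 12                    # groups 34-45

rows_per_3h_group = SITES * MEASUREMENTS * DAYS * 8   # 8 target times per day
rows_per_daily_group = SITES * MEASUREMENTS * DAYS    # 1 value per day

rows_per_file = (three_hourly_groups * rows_per_3h_group
                 + daily_groups * rows_per_daily_group)

print(rows_per_3h_group, rows_per_daily_group, rows_per_file)
# 4320 540 153360
```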

hour coverage example (predictions 00, 01, 02)

Pred 00: 00, 03, 06, 09, 12, 15, 18, 21
Pred 01: 01, 04, 07, 10, 13, 16, 19, 22
Pred 02: 02, 05, 08, 11, 14, 17, 20, 23

= every hour from 00 to 23 is covered

Runs sharing the same hour offset (e.g. 00, 03, 06, ..., 33 all cover hour 00:00) are independent forecasts for the same target time. With 11 or 12 runs per hour (12 for the 00 offset, 11 for the other two), averaging gives a more reliable estimate and the spread indicates uncertainty.

For ML / energy prediction: To get the best hourly weather forecast: (1) combine runs 00+01+02 for full 24h coverage (each run contributes every third hour), (2) average across all runs sharing the same hour for a robust estimate, (3) interpolate to 1h or 15min to match sensor grain. Always filter by prediction_date to only use forecasts available at simulation time.

temporal context — prediction_date

The filename (Pred_YYYY-MM-DD.csv) is the prediction issue date — the day the forecast was generated. Each file forecasts ~3 days ahead. This means multiple files contain predictions for the same target timestamps, but issued on different days.

overlap example

Pred_2023-09-13.csv contains forecasts for Sep 13, 14, 15

Pred_2023-09-14.csv contains forecasts for Sep 14, 15, 16

Pred_2023-09-15.csv contains forecasts for Sep 15, 16, 17

All three files have predictions for Sep 15 — but issued 2 days, 1 day, and 0 days before. For simulation: use the file that was available at the time of the decision.

Requirement: The prediction_date (from filename) MUST be stored in Silver alongside the forecast data. Without it, we cannot simulate real-time decision making, because we would be using future information. The upsert key must include prediction_date: UNIQUE(timestamp, site, prediction_date). If Silver keeps one row per measurement and prediction group, those two columns must be part of the key as well.
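The sketch below shows one way to derive prediction_date from the filename and to pick the newest file that was already delivered at a given decision time. The helper names are hypothetical, and the 07:00 UTC cut-off is an assumption: the source only says "~7am" and does not name the time zone.

```python
import re
from datetime import date, datetime, time, timezone

FILENAME_RE = re.compile(r"^Pred_(\d{4})-(\d{2})-(\d{2})\.csv$")
# Assumption: a file is usable from 07:00 UTC on its prediction date.
DELIVERY_TIME = time(7, 0, tzinfo=timezone.utc)


def prediction_date(filename: str) -> date:
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    year, month, day = map(int, match.groups())
    return date(year, month, day)


def latest_available(filenames, decision_time: datetime):
    """Return the newest file already delivered at `decision_time`, or None."""
    available = [
        name for name in filenames
        if datetime.combine(prediction_date(name), DELIVERY_TIME) <= decision_time
    ]
    return max(available, key=prediction_date, default=None)


files = ["Pred_2023-09-13.csv", "Pred_2023-09-14.csv", "Pred_2023-09-15.csv"]
decision = datetime(2023, 9, 15, 6, 0, tzinfo=timezone.utc)
print(latest_available(files, decision))  # Pred_2023-09-14.csv
```

At 06:00 on Sep 15 the file issued that day has not arrived yet under this assumption, so the Sep 14 forecast is the one a real-time system would have used.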

weather stations (45 sites)

Aadorf / Tänikon; Altdorf; Basel / Binningen; Bern / Zollikofen; Binn; Blatten, Lötschental; Bouveret; Buchs / Aarau; Cham; Chur; Col du Grand St-Bernard; Delémont; Eggishorn; Evionnaz; Evolène / Villa; Fribourg / Grangeneuve; Genève / Cointrin; Giswil; Glarus; Gornergrat; Grenchen; Grächen; Jungfraujoch; La Brévine; Les Attelas; Les Marécottes; Lugano; Luzern; Montagnier, Bagnes; Montana; Monte Rosa-Plattje; Mottec; Möhlin; Neuchâtel; Pully; Sattel, SZ; Schaffhausen; Simplon-Dorf; Sion; St. Gallen; Säntis; Ulrichen; Visp; Zermatt; Zürich / Fluntern

For the apartment domotics use case, the relevant station is the Valais station closest to the apartment (candidates include Sion, Visp, Zermatt, Montana, Evionnaz). The exact station to use depends on the apartment location.

data volume

metric                            value
Rows per file                     ~153,000
Files (Aug-Oct 2023)              ~93
Total raw rows                    ~14.2 million
After sentinel removal            ~12.5 million (estimate)
After dedup to best hourly        ~1.2 million (3 days x 24h x 45 sites x 4 meas x 93 files)
If keeping all prediction groups  ~14M in Silver
If aggregating to ensemble mean   ~1.2 million in Silver

recommended processing strategy

1. Parse CSV, add prediction_date column from filename (Pred_YYYY-MM-DD.csv)
2. Remove sentinel values (-99999.0)
3. Keep prediction group number in Silver for full traceability
4. For Gold / ML: aggregate by averaging across the prediction groups that cover the same target hour (ensemble mean per hour)
5. Store ensemble spread (std dev) alongside mean for uncertainty estimation
6. For simulation scenarios, always filter by prediction_date <= decision date, so that only forecasts already issued at decision time are used
7. Interpolate 3-hour forecasts to 1-hour or 15-min to match sensor data grain

alignment with sensor data

                  sensor data                     weather forecasts
Source            JSON files (SMB)                CSV files (sFTP)
Frequency         Every 1 minute                  Daily file, 3-hour intervals
Time resolution   1 minute                        3 hours (pred 00-33) / daily (pred 34-45)
Location          Per apartment (jimmy, jeremie)  Per weather station (45 sites)
Storage (Silver)  silver.sensor_events            silver.weather_clean
Gold grain        1 minute (fact tables)          Needs interpolation to match

Alignment for ML: Sensor data is per-minute, weather is per-3-hours. For energy prediction models (15-min or hourly), weather must be interpolated (linear or forward-fill) to match the sensor grain. The Gold layer or ML pipeline should handle this interpolation, not the Silver ETL.
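Linear interpolation onto a 15-minute grid could look like the sketch below. It is a standalone illustration with a hypothetical function name, using two consecutive Sion temperature forecasts from prediction group 01; a real pipeline would more likely rely on its dataframe library's resampling.

```python
from datetime import datetime, timedelta, timezone


def interpolate_linear(points, step: timedelta):
    """Linearly interpolate (time, value) points onto a regular grid.

    `points` must be sorted by time and contain at least two entries.
    The grid starts at the first point and ends at the last one.
    """
    result = []
    t = points[0][0]
    i = 0
    while t <= points[-1][0]:
        # Advance to the segment [points[i], points[i + 1]] containing t.
        while i < len(points) - 2 and t >= points[i + 1][0]:
            i += 1
        (t0, v0), (t1, v1) = points[i], points[i + 1]
        weight = (t - t0) / (t1 - t0)   # 0.0 at t0, 1.0 at t1
        result.append((t, v0 + weight * (v1 - v0)))
        t += step
    return result


utc = timezone.utc
forecast = [
    (datetime(2023, 10, 3, 7, 0, tzinfo=utc), 14.1),
    (datetime(2023, 10, 3, 10, 0, tzinfo=utc), 21.0),
]
for t, v in interpolate_linear(forecast, timedelta(minutes=15)):
    print(t.isoformat(), round(v, 2))
```

Linear interpolation suits smooth quantities such as temperature. For precipitation, forward-fill or redistributing the total over the interval may be more appropriate, depending on how the value is defined.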