
Weather Forecast Data

Daily weather prediction files from the Meteo2 sFTP server. Each file contains a multi-group forecast for 45 Swiss weather stations, covering ~3 days ahead at 3-hour resolution.

file overview


Filename format	Pred_YYYY-MM-DD.csv
Rows per file	~153,000
Prediction groups	46 (numbered 00 to 45)
Sites	45 weather stations across Switzerland
Measurements	4 (temperature, humidity, precipitation, radiation)
Time horizon	~3 days (72h) from prediction date
Time resolution	3-hour intervals (predictions 00-33), daily (34-45)
Delivery	Daily ~7am via sFTP (/Meteo2)
Total files	~93 (Aug-Oct 2023)

how to read a weather file — step by step

Each file is named after the day the forecast was issued (e.g. Pred_2023-10-03.csv = forecast issued on October 3rd). Inside, every row is one predicted value for one measurement, at one weather station, at one target time, from one prediction group.

Pred_2023-10-03.csv (excerpt: 7 rows out of 153,360)

Time,Value,Prediction,Site,Measurement,Unit

// Pred 00, temperature — has real values
2023-10-03 09:00:00+00:00,18.1,00,Sion,PRED_T_2M_ctrl,°C

// Pred 00, radiation — SENTINEL (no data!)
2023-10-03 09:00:00+00:00,-99999.0,00,Sion,PRED_GLOB_ctrl,Watt/m2

// Pred 03, same hour — radiation has real value
2023-10-03 09:00:00+00:00,351.9,03,Sion,PRED_GLOB_ctrl,Watt/m2

// Pred 01, different hours (offset +1)
2023-10-03 01:00:00+00:00,13.5,01,Sion,PRED_T_2M_ctrl,°C
2023-10-03 04:00:00+00:00,13.0,01,Sion,PRED_T_2M_ctrl,°C
2023-10-03 07:00:00+00:00,14.1,01,Sion,PRED_T_2M_ctrl,°C
2023-10-03 10:00:00+00:00,21.0,01,Sion,PRED_T_2M_ctrl,°C

reading the example above

Time = this prediction is for October 3rd at 09:00 UTC (first data row of the excerpt)

Value = predicted temperature is 18.1°C

Prediction = this comes from prediction group 00

Site = weather station in Sion

Measurement = temperature at 2 meters (PRED_T_2M_ctrl)
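The field-by-field reading above can be sketched in code. This is a minimal illustration using only the Python standard library; the function name `parse_rows` and the lower-case dictionary keys are assumptions made for this example, not part of any existing pipeline.

```python
import csv
import io
from datetime import datetime

SENTINEL = -99999.0  # "no data" marker used in the files


def parse_rows(csv_text):
    """Parse forecast CSV text into a list of typed dicts (hypothetical helper)."""
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        value = float(raw["Value"])
        rows.append({
            "time": datetime.fromisoformat(raw["Time"]),   # timezone-aware, UTC
            "value": None if value == SENTINEL else value,  # sentinel becomes None
            "prediction": raw["Prediction"],                # kept as text to preserve "00"
            "site": raw["Site"],
            "measurement": raw["Measurement"],
            "unit": raw["Unit"],
        })
    return rows


sample = """Time,Value,Prediction,Site,Measurement,Unit
2023-10-03 09:00:00+00:00,18.1,00,Sion,PRED_T_2M_ctrl,°C
2023-10-03 09:00:00+00:00,-99999.0,00,Sion,PRED_GLOB_ctrl,Watt/m2
"""
rows = parse_rows(sample)
```

Mapping the sentinel to `None` at parse time is one possible design choice; it keeps the "no data" rows visible while making it hard to average them by accident.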

example 1 — one prediction group covers specific hours

Prediction group 01 for Sion, temperature — it covers hours 01, 04, 07, 10, 13, 16, 19, 22 (every 3 hours, offset by 1):

time	value	pred	site	measurement
2023-10-03 01:00	13.5°C	01	Sion	PRED_T_2M_ctrl
2023-10-03 04:00	13.0°C	01	Sion	PRED_T_2M_ctrl
2023-10-03 07:00	14.1°C	01	Sion	PRED_T_2M_ctrl
2023-10-03 10:00	21.0°C	01	Sion	PRED_T_2M_ctrl
2023-10-03 13:00	24.7°C	01	Sion	PRED_T_2M_ctrl
2023-10-03 16:00	23.1°C	01	Sion	PRED_T_2M_ctrl
2023-10-03 19:00	17.1°C	01	Sion	PRED_T_2M_ctrl
2023-10-03 22:00	18.6°C	01	Sion	PRED_T_2M_ctrl

Notice: hours go 01, 04, 07, 10... — not every hour. To get hours 00, 03, 06... you need prediction group 00. To get hours 02, 05, 08... you need prediction group 02. All three together = full 24h coverage.
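The offset pattern can be expressed as a small rule. The sketch below assumes that the hour offset of a 3-hourly group equals its number modulo 3, which is inferred from the examples in this document rather than from an official specification; `hours_covered` is a hypothetical helper name.

```python
def hours_covered(prediction):
    """Return the hours of day covered by a 3-hourly prediction group (00-33).

    Assumption: offset = prediction number modulo 3.
    """
    if not 0 <= prediction <= 33:
        raise ValueError("only groups 00-33 are 3-hourly")
    return list(range(prediction % 3, 24, 3))


# Groups 00, 01 and 02 together should cover every hour of the day.
triplet = sorted(hours_covered(0) + hours_covered(1) + hours_covered(2))
```

Here `hours_covered(1)` yields 01, 04, 07, ..., 22 and `triplet` contains all 24 hours.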

example 2 — multiple predictions for the same hour

For Sion at 09:00, temperature: prediction groups 00, 03, 06, 09, 12, 15 (and the remaining groups with the same offset, up to 33) all cover this hour. Each gives a slightly different forecast:

time	value	pred	interpretation
2023-10-03 09:00	18.1°C	00	Model run A
2023-10-03 09:00	19.4°C	03	Model run B
2023-10-03 09:00	19.5°C	06	Model run C
2023-10-03 09:00	19.3°C	09	Model run D
2023-10-03 09:00	19.6°C	12	Model run E
2023-10-03 09:00	19.6°C	15	Model run F

These are different model initializations giving slightly different forecasts. The average of the six values shown (19.25°C) is a reasonable point estimate. The spread (18.1 to 19.6°C) indicates uncertainty.
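As a quick check of the numbers in this example, the mean and spread of the six values can be computed with the standard library. The variable names are illustrative only.

```python
from statistics import mean, stdev

# Sion, 2023-10-03 09:00, temperature, keyed by prediction group
values = {"00": 18.1, "03": 19.4, "06": 19.5, "09": 19.3, "12": 19.6, "15": 19.6}

ensemble_mean = mean(values.values())                      # about 19.25
spread = (min(values.values()), max(values.values()))      # (18.1, 19.6)
ensemble_std = stdev(values.values())                      # sample standard deviation
```

The standard deviation is an alternative to the min/max range when a single uncertainty number per hour is needed.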

example 3 — the sentinel problem in prediction 00

For Sion at 09:00, solar radiation — prediction 00 has NO DATA (-99999.0), while predictions 03 and 06 have real values:

time	value	pred	measurement	problem?
2023-10-03 09:00	-99999.0	00	PRED_GLOB_ctrl	SENTINEL — no data
2023-10-03 09:00	351.9 W/m²	03	PRED_GLOB_ctrl	Real value
2023-10-03 09:00	354.7 W/m²	06	PRED_GLOB_ctrl	Real value

This is the bug: if the code keeps only run 00 (keep=first), we lose all radiation and precipitation data for hours 00, 03, 06, 09, 12, 15, 18, 21, i.e. 8 out of 24 hours every day. Fix: skip sentinel values and use later runs (03, 06, 09...) which have real data.
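One way to avoid the keep=first problem is to pick the first run whose value is not the sentinel. The sketch below is an illustration under that assumption; `first_valid` is a hypothetical helper, not the existing deduplication code.

```python
SENTINEL = -99999.0


def first_valid(values_by_run):
    """Return (run, value) for the lowest-numbered run with real data, else None."""
    for run in sorted(values_by_run):  # zero-padded run labels sort correctly as text
        value = values_by_run[run]
        if value != SENTINEL:
            return run, value
    return None


# Sion, 2023-10-03 09:00, PRED_GLOB_ctrl
radiation_0900 = {"00": -99999.0, "03": 351.9, "06": 354.7}
chosen = first_valid(radiation_0900)
```

For the radiation rows above this selects run 03 with 351.9 W/m² instead of the sentinel from run 00.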

putting it all together

to reconstruct a full day of weather for one site

Step 1: Take prediction 00 (hours 00,03,06...) + prediction 01 (hours 01,04,07...) + prediction 02 (hours 02,05,08...) = all 24 hours of the day, each group contributing every third hour

Step 2: For any sentinel values (-99999.0), replace with the average of predictions 03/06/09... for the same hour

Step 3: Optionally, average all model runs per hour for a more robust estimate

Step 4: Interpolate from 3h to 1h or 15min to match sensor data grain

CSV column structure

Each CSV has a header row followed by ~153,000 data rows. Columns are comma-separated.

column	type	description
Time	timestamp	Forecast target timestamp (UTC). Format: YYYY-MM-DD HH:MM:SS+00:00
Value	float	Predicted value. -99999.0 = sentinel (no data for this prediction/measurement)
Prediction	int (00-45)	Incremental counter — multiple forecasts computed per day. Each is a separate model run.
Site	string	Weather station name (45 sites across Switzerland)
Measurement	string	Measurement code (4 types)
Unit	string	Physical unit

measurements (4 types)

code	name	unit	sentinel in pred 00?
PRED_T_2M_ctrl	Temperature	°C	No (has values in pred 00)
PRED_RELHUM_2M_ctrl	Humidity	%	No (has values in pred 00)
PRED_TOT_PREC_ctrl	Precipitation	mm	Yes (-99999.0 in pred 00)
PRED_GLOB_ctrl	Solar radiation	W/m²	Yes (-99999.0 in pred 00)

Sentinel value: -99999.0 means no data. Prediction group 00 has sentinel values for PRED_GLOB_ctrl (radiation) and PRED_TOT_PREC_ctrl (precipitation) but real values for temperature and humidity. All other prediction groups (01-45) have real values for all 4 measurements.

prediction numbers — multiple model runs per day

Official definition: the Prediction column is an incremental counter of the forecast. Multiple forecasts are computed per day, each with slightly different initial conditions. Higher numbers = later model runs within the same day.

Each prediction number represents a separate model run. Runs 00-33 produce 3-hourly forecasts, while runs 34-45 produce daily values only. Different runs cover different time offsets: run 00 outputs hours 00, 03, 06..., run 01 outputs 01, 04, 07..., run 02 outputs 02, 05, 08... Together, a triplet (e.g. 00+01+02) covers every hour. Runs that share the same offset (00, 03, 06, 09...) give slightly different values for the same hours — these can be averaged for a more robust estimate.

prediction numbers	hours covered	interval	groups	rows/group
00, 03, 06, 09, ..., 33	00, 03, 06, 09, 12, 15, 18, 21	3-hour	12 groups	4,320 each
01, 04, 07, 10, ..., 31	01, 04, 07, 10, 13, 16, 19, 22	3-hour	11 groups	4,320 each
02, 05, 08, 11, ..., 32	02, 05, 08, 11, 14, 17, 20, 23	3-hour	11 groups	4,320 each
34, 35, ..., 45	Daily only (00:00 or 13:00)	daily	12 groups	540 each
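The rows/group figures in the table can be reproduced with simple arithmetic. The sketch assumes exactly 3 forecast days, 8 time steps per day for the 3-hourly groups, and one value per day for the daily groups; the constant names are made up for this example.

```python
SITES = 45
MEASUREMENTS = 4
DAYS = 3                  # assumption: exactly 3 forecast days per file
STEPS_PER_DAY = 8         # 24h / 3h for the 3-hourly groups
THREE_HOURLY_GROUPS = 34  # groups 00-33 (12 + 11 + 11)
DAILY_GROUPS = 12         # groups 34-45

rows_per_3h_group = SITES * MEASUREMENTS * DAYS * STEPS_PER_DAY   # 4,320
rows_per_daily_group = SITES * MEASUREMENTS * DAYS                # 540
rows_per_file = (THREE_HOURLY_GROUPS * rows_per_3h_group
                 + DAILY_GROUPS * rows_per_daily_group)           # 153,360
```

Under these assumptions the total matches the 153,360 rows quoted for Pred_2023-10-03.csv.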

hour coverage example (predictions 00, 01, 02)

Pred 00

00, 03, 06, 09, 12, 15, 18, 21

Pred 01

01, 04, 07, 10, 13, 16, 19, 22

Pred 02

02, 05, 08, 11, 14, 17, 20, 23

= every hour from 00 to 23 is covered

Runs sharing the same hour offset (e.g. 00, 03, 06, ..., 33 all cover hour 00:00) are independent forecasts for the same target time. With 11 or 12 runs per hour offset (see the table above), averaging gives a more reliable estimate and the spread indicates uncertainty.

For ML / energy prediction: to get the best hourly weather forecast, (1) combine runs 00+01+02 so that every hour of the day is covered (each run on its own is 3-hourly), (2) average across all runs sharing the same hour for a robust estimate, (3) interpolate to 1h or 15min to match sensor grain. Always filter by prediction_date to only use forecasts available at simulation time.

temporal context — prediction_date

The filename (Pred_YYYY-MM-DD.csv) is the prediction issue date — the day the forecast was generated. Each file forecasts ~3 days ahead. This means multiple files contain predictions for the same target timestamps, but issued on different days.

overlap example

Pred_2023-09-13.csv contains forecasts for Sep 13, 14, 15

Pred_2023-09-14.csv contains forecasts for Sep 14, 15, 16

Pred_2023-09-15.csv contains forecasts for Sep 15, 16, 17

All three files have predictions for Sep 15 — but issued 2 days, 1 day, and 0 days before. For simulation: use the file that was available at the time of the decision.
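Choosing the file that was available at decision time can be sketched as follows. This works at day granularity only and ignores the ~7am delivery time, so a decision taken early in the morning would need the previous day's file; `latest_available` is a hypothetical helper.

```python
from datetime import date


def latest_available(prediction_dates, decision_date):
    """Return the most recent prediction_date not after decision_date, or None."""
    candidates = [d for d in prediction_dates if d <= decision_date]
    return max(candidates) if candidates else None


issued = [date(2023, 9, 13), date(2023, 9, 14), date(2023, 9, 15)]
chosen_file_date = latest_available(issued, date(2023, 9, 14))
```

For a decision made on Sep 14, this picks Pred_2023-09-14.csv and never the Sep 15 file, even though that file also contains values for Sep 15.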

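Requirement: the prediction_date (from filename) MUST be stored in Silver alongside the forecast data. Without it, we cannot simulate real-time decision making, because we would be using future information. The upsert key must include prediction_date: UNIQUE(timestamp, site, prediction_date). If Silver stores one row per measurement and keeps the prediction group, the key also needs those two columns, otherwise rows overwrite each other.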
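Deriving prediction_date from the filename could look like the sketch below. It assumes filenames follow Pred_YYYY-MM-DD.csv exactly; the function name is hypothetical.

```python
import re
from datetime import date

FILENAME_RE = re.compile(r"Pred_(\d{4})-(\d{2})-(\d{2})\.csv")


def prediction_date_from_filename(filename):
    """Extract the forecast issue date from a name like Pred_2023-10-03.csv."""
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)  # also rejects impossible dates such as month 13


issue_date = prediction_date_from_filename("Pred_2023-10-03.csv")
```

Raising on unexpected names, rather than silently skipping them, is a design choice that keeps a misnamed file from entering Silver without a prediction_date.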

weather stations (45 sites)

Aadorf / Tänikon; Altdorf; Basel / Binningen; Bern / Zollikofen; Binn; Blatten, Lötschental; Bouveret; Buchs / Aarau; Cham; Chur; Col du Grand St-Bernard; Delémont; Eggishorn; Evionnaz; Evolène / Villa; Fribourg / Grangeneuve; Genève / Cointrin; Giswil; Glarus; Gornergrat; Grenchen; Grächen; Jungfraujoch; La Brévine; Les Attelas; Les Marécottes; Lugano; Luzern; Montagnier, Bagnes; Montana; Monte Rosa-Plattje; Mottec; Möhlin; Neuchâtel; Pully; Sattel, SZ; Schaffhausen; Simplon-Dorf; Sion; St. Gallen; Säntis; Ulrichen; Visp; Zermatt; Zürich / Fluntern

For the apartment domotics use case, the relevant station is the one in or near Valais that is closest to the apartment (e.g. Sion, Visp, Zermatt, Montana, Evionnaz). The exact station to use depends on the apartment location.

data volume

metric	value
Rows per file	~153,000
Files (Aug-Oct 2023)	~93
Total raw rows	~14.2 million
After sentinel removal	~12.5 million (estimate)
After dedup to best hourly	~1.2 million (3 days x 24h x 45 sites x 4 meas x 93 files)
If keeping all prediction groups	~14M in Silver
If aggregating to ensemble mean	~1.2 million in Silver

recommended processing strategy

Parse CSV, add prediction_date column from filename (Pred_YYYY-MM-DD.csv)

Remove sentinel values (-99999.0)

Keep prediction group number in Silver for full traceability

For Gold / ML: aggregate by averaging across all prediction groups that cover the same target hour (ensemble mean per hour)

Store ensemble spread (std dev) alongside mean for uncertainty estimation

Always filter by prediction_date <= simulated decision date for simulation scenarios, so that only forecasts already issued at that time are used

Interpolate 3-hour forecasts to 1-hour or 15-min to match sensor data grain
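The sentinel removal and ensemble aggregation steps above can be sketched with the standard library. The tuple layout and the name `ensemble_stats` are assumptions for this example; the population standard deviation is used so that a group with a single run still yields a value (0.0).

```python
from collections import defaultdict
from statistics import mean, pstdev

SENTINEL = -99999.0


def ensemble_stats(rows):
    """Aggregate (prediction_date, time, site, measurement, value) rows per target hour."""
    groups = defaultdict(list)
    for prediction_date, time, site, measurement, value in rows:
        if value != SENTINEL:  # drop "no data" rows before averaging
            groups[(prediction_date, time, site, measurement)].append(value)
    return {
        key: {"mean": mean(vals), "std": pstdev(vals), "n": len(vals)}
        for key, vals in groups.items()
    }


rows = [
    ("2023-10-03", "2023-10-03 09:00", "Sion", "PRED_GLOB_ctrl", -99999.0),  # run 00
    ("2023-10-03", "2023-10-03 09:00", "Sion", "PRED_GLOB_ctrl", 351.9),     # run 03
    ("2023-10-03", "2023-10-03 09:00", "Sion", "PRED_GLOB_ctrl", 354.7),     # run 06
]
stats = ensemble_stats(rows)
```

Keeping prediction_date in the grouping key means forecasts issued on different days are never averaged together.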

alignment with sensor data

	sensor data	weather forecasts
Source	JSON files (SMB)	CSV files (sFTP)
Frequency	Every 1 minute	Daily file, 3-hour intervals
Time resolution	1 minute	3 hours (pred 00-33) / daily (pred 34-45)
Location	Per apartment (jimmy, jeremie)	Per weather station (45 sites)
Storage (Silver)	silver.sensor_events	silver.weather_clean
Gold grain	1 minute (fact tables)	Needs interpolation to match

Alignment for ML: sensor data is per-minute, weather is 3-hourly. For energy prediction models (15-min or hourly), weather must be interpolated (linear or forward-fill) to match the sensor grain. The Gold layer or ML pipeline should handle this interpolation, not the Silver ETL.
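Linear interpolation from the 3-hour grid to a finer grid can be sketched as below. It assumes the input points are sorted, non-empty, and that the step divides the gap between points evenly; `interpolate_linear` is a hypothetical helper, and forward-fill would be an equally valid choice for some measurements.

```python
from datetime import datetime, timedelta


def interpolate_linear(points, step):
    """Linearly interpolate sorted (datetime, value) points onto a regular grid."""
    result = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        span = (t1 - t0).total_seconds()
        t = t0
        while t < t1:
            frac = (t - t0).total_seconds() / span
            result.append((t, v0 + frac * (v1 - v0)))
            t += step
    result.append(points[-1])  # keep the final original point
    return result


# Sion, prediction group 01, temperature (first three values of example 1)
three_hourly = [
    (datetime(2023, 10, 3, 1), 13.5),
    (datetime(2023, 10, 3, 4), 13.0),
    (datetime(2023, 10, 3, 7), 14.1),
]
hourly = interpolate_linear(three_hourly, timedelta(hours=1))
quarter_hourly = interpolate_linear(three_hourly, timedelta(minutes=15))
```

The original 3-hourly values are preserved exactly at their timestamps; only the points in between are estimated.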