Pipeline Workflows
End-to-end data pipeline following the medallion architecture: Sources → Bronze → Silver → Gold → BI/ML.
pipeline overview
Watcher orchestrator; sensor JSON ingestion from SMB and weather CSV ingestion from SFTP. Prediction-based file discovery across 245k+ files.
Schema creation, sensor flattening (15M+ rows), MySQL dimension import, weather cleaning. Watermark-based incremental processing.
Star schema ETL: dimension population and fact table generation with minute-grain aggregation. A 9-step populate process built on idempotent upserts.
Power BI dashboards (energy, environment), SAP SAC (presence), ML predictions, role-based access control.
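To make "minute-grain aggregation" and "idempotent upserts" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. It is an illustration under stated assumptions, not the pipeline's actual code: the table name `fact_sensor_minute`, its columns, and the helper functions are hypothetical, and the real target database may use different upsert syntax. The `ON CONFLICT ... DO UPDATE` clause requires SQLite 3.24 or newer.

```python
import sqlite3
from datetime import datetime


def minute_bucket(ts: str) -> str:
    """Truncate an ISO-8601 timestamp to minute grain."""
    return datetime.fromisoformat(ts).replace(second=0, microsecond=0).isoformat()


def load_minute_facts(conn: sqlite3.Connection, readings) -> None:
    """Aggregate raw readings per (sensor, minute) and upsert the averages.

    `readings` is an iterable of (sensor_id, iso_timestamp, value) tuples.
    Table and column names are hypothetical, chosen for illustration only.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS fact_sensor_minute (
               sensor_id    TEXT    NOT NULL,
               minute_ts    TEXT    NOT NULL,
               avg_value    REAL    NOT NULL,
               sample_count INTEGER NOT NULL,
               PRIMARY KEY (sensor_id, minute_ts)
           )"""
    )

    # Sum values and count samples for each (sensor, minute) bucket.
    buckets = {}
    for sensor_id, ts, value in readings:
        key = (sensor_id, minute_bucket(ts))
        total, count = buckets.get(key, (0.0, 0))
        buckets[key] = (total + value, count + 1)

    rows = [
        (sensor_id, minute_ts, total / count, count)
        for (sensor_id, minute_ts), (total, count) in buckets.items()
    ]

    # The primary key makes the write idempotent: a rerun overwrites the
    # existing row for a bucket instead of inserting a duplicate.
    conn.executemany(
        """INSERT INTO fact_sensor_minute
               (sensor_id, minute_ts, avg_value, sample_count)
           VALUES (?, ?, ?, ?)
           ON CONFLICT (sensor_id, minute_ts) DO UPDATE SET
               avg_value    = excluded.avg_value,
               sample_count = excluded.sample_count""",
        rows,
    )
    conn.commit()
```

Calling `load_minute_facts` twice with the same readings leaves one row per sensor and minute, which is the property the upsert step relies on.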
key principles
Idempotent
Every script can be run multiple times safely: reruns create no duplicates and lose no data.
Resume-capable
If a script crashes, the next run picks up where it left off, using the watermark system and file-existence checks.
No source modification
Raw data on SMB / SFTP is never modified or deleted.
Prediction over scanning
For the 245k+ file SMB share, we predict filenames instead of scanning the directory.
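The resume and prediction principles above can be sketched together. This is a minimal illustration, not the pipeline's implementation. It assumes one file per minute named `sensor_YYYYMMDD_HHMM.json` and a JSON watermark file with a `last_processed` key; both are hypothetical, as are all function names. The point is that each expected path is probed directly, so the large directory is never listed, and the source file is only read.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path


def predicted_names(start, end, step=timedelta(minutes=1)):
    """Yield (timestamp, filename) for each slot in (start, end]."""
    current = start + step
    while current <= end:
        # Hypothetical naming pattern; the real share may use another one.
        yield current, f"sensor_{current:%Y%m%d_%H%M}.json"
        current += step


def read_watermark(path: Path, default: datetime) -> datetime:
    """Return the last processed timestamp, or `default` on a first run."""
    if path.exists():
        return datetime.fromisoformat(json.loads(path.read_text())["last_processed"])
    return default


def write_watermark(path: Path, ts: datetime) -> None:
    path.write_text(json.dumps({"last_processed": ts.isoformat()}))


def ingest_new_files(source_dir, target_dir, watermark_path, now, default_start):
    """Copy predicted files newer than the watermark; return the copied names."""
    watermark = read_watermark(watermark_path, default_start)
    copied = []
    for ts, name in predicted_names(watermark, now):
        src = source_dir / name
        dst = target_dir / name
        # Probe the predicted path directly instead of listing the directory.
        # The target check keeps a rerun from copying the same file twice.
        if src.exists() and not dst.exists():
            dst.write_bytes(src.read_bytes())  # the source is read, never changed
            copied.append(name)
        # Advance the watermark per slot so a crash resumes from this point.
        write_watermark(watermark_path, ts)
    return copied
```

One simplification to note: the watermark advances past slots whose file is missing, so a file that arrives late would be skipped. A real pipeline would likely need a lookback window or a retry list for such gaps.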