
Pipeline Workflows

End-to-end data pipeline following the medallion architecture: Sources → Bronze → Silver → Gold → BI/ML.

pipeline overview

Sources → Bronze → Silver → Gold → BI / ML

key principles

Idempotent

Every script can be re-run any number of times safely: repeated runs create no duplicates and lose no data.
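One common way to get this property is to write by a natural key with an upsert, so a second run overwrites rows instead of appending them. The sketch below is a minimal illustration under stated assumptions: SQLite stands in for the real target store, and the table `bronze_records` and key column `record_id` are hypothetical names, not part of this pipeline.

```python
import sqlite3
from typing import List, Tuple


def load_batch(conn: sqlite3.Connection, rows: List[Tuple[str, str]]) -> None:
    """Upsert rows keyed on a natural key so re-runs never duplicate data.

    Hypothetical schema: bronze_records(record_id, payload).
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bronze_records ("
        " record_id TEXT PRIMARY KEY,"
        " payload TEXT NOT NULL)"
    )
    # ON CONFLICT turns a duplicate insert into an update of the existing row.
    conn.executemany(
        "INSERT INTO bronze_records (record_id, payload) VALUES (?, ?) "
        "ON CONFLICT(record_id) DO UPDATE SET payload = excluded.payload",
        rows,
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    batch = [("a1", "first"), ("a2", "second")]
    load_batch(conn, batch)
    load_batch(conn, batch)  # second run with the same rows
    count = conn.execute("SELECT COUNT(*) FROM bronze_records").fetchone()[0]
    print(count)  # 2
```

Running the same batch twice leaves two rows, not four; a changed payload for an existing key replaces the old value rather than adding a second copy.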

Resume-capable

If a script crashes, the next run picks up where the previous one left off, using a watermark system and file-existence checks.
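A minimal sketch of how a watermark and file-existence checks can combine, assuming work items have keys that sort chronologically (for example ISO dates), the watermark is a small JSON file, and each item produces one output file. The names `read_watermark`, `write_watermark`, `process_all` and the `.done` output suffix are hypothetical and do not describe the pipeline's actual scripts.

```python
import json
import os
from pathlib import Path
from typing import Callable, List, Optional


def read_watermark(path: Path) -> Optional[str]:
    """Return the last fully processed key, or None on a first run."""
    if not path.exists():
        return None
    return json.loads(path.read_text())["last_processed"]


def write_watermark(path: Path, value: str) -> None:
    """Write to a temp file, then atomically replace, so a crash mid-write
    cannot leave a half-written watermark behind."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps({"last_processed": value}))
    os.replace(tmp, path)


def process_all(
    items: List[str],
    out_dir: Path,
    watermark: Path,
    handle: Callable[[str, Path], None],
) -> List[str]:
    """Process items newer than the watermark; return the keys handled this run."""
    out_dir.mkdir(parents=True, exist_ok=True)
    last = read_watermark(watermark)
    handled = []
    for item in sorted(items):  # assumes keys sort chronologically
        if last is not None and item <= last:
            continue  # already covered by the watermark
        target = out_dir / f"{item}.done"
        if not target.exists():  # output may exist if we crashed before advancing
            handle(item, target)
            handled.append(item)
        write_watermark(watermark, item)  # advance only after the item is done
    return handled
```

The watermark skips finished work cheaply, while the existence check covers the gap where an output was written but the crash happened before the watermark advanced.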

No source modification

Raw data on SMB / sFTP is never modified or deleted.

Prediction over scanning

For the SMB share, which holds 245k+ files, we predict filenames instead of scanning the directory.
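The idea can be sketched as generating the names that should exist for a date range and probing each one directly. This is a minimal illustration under a loud assumption: the daily naming pattern `export_YYYYMMDD.csv` is hypothetical, and the real share's convention will differ.

```python
from datetime import date, timedelta
from pathlib import Path
from typing import Iterator, List

# Hypothetical naming convention; replace with the share's real pattern.
FILENAME_PATTERN = "export_{d:%Y%m%d}.csv"


def predict_filenames(
    start: date, end: date, pattern: str = FILENAME_PATTERN
) -> Iterator[str]:
    """Yield the filename expected for each day in [start, end]."""
    day = start
    while day <= end:
        yield pattern.format(d=day)
        day += timedelta(days=1)


def find_new_files(root: Path, start: date, end: date) -> List[Path]:
    """Probe only the predicted paths instead of listing the whole directory."""
    found = []
    for name in predict_filenames(start, end):
        candidate = root / name
        if candidate.is_file():  # one stat per candidate, no directory listing
            found.append(candidate)
    return found
```

The cost scales with the number of predicted names (one lookup per day in the window) rather than with the size of the directory. In a resume-capable script, `start` would naturally come from the watermark.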