docs / architecture / decisions

Architecture Decisions

Key decisions made during the project with rationale.

ADR-001Python 3.11 for ingestion

Decision: Python 3.11 with ThreadPoolExecutor + ProcessPoolExecutor

Reason: I/O-bound workload, team familiarity, rich ecosystem (pandas, SQLAlchemy, paramiko). Acceptable performance for batch processing.

ADR-002PostgreSQL for Silver and Gold

Decision: PostgreSQL 15+ with separate silver and gold schemas

Reason: Multi-user access, native Power BI connector, proper SQL for OLAP queries, free and production-grade.

ADR-003Custom watcher over Airflow

Decision: Lightweight Python watcher (60s loop + daily weather subprocess)

Reason: Single-VM deployment makes Airflow overkill. Zero infrastructure overhead, trivially restartable, no dependency conflicts.

ADR-004File system for Bronze

Decision: Local file system with YYYY/MM/DD/HH/ folders

Reason: Immutable raw storage, no DB overhead, easy to inspect and replay.

ADR-005Self-contained installer

Decision: Client-side install wizard generates a Python installer with .env baked in

Reason: Brings deploy from ~10 manual steps to one command. Credentials never leave the deployer's machine.

ADR-006Two PostgreSQL roles created at install

Decision: Admin only at install time, app user only in .env

Reason: Pipeline runs as least-privilege user. Admin secret never persists.

ADR-007Mask PII in gold dim_apartment

Decision: Always mask owner_user_id and building_name; keep first-name pseudonym

Reason: Under GDPR Art. 4(1) common first names in isolation are not PII. Power BI RLS depends on a stable column. Truly identifying fields are removed.

ADR-008Watcher revisited at deploy time

Decision: Reaffirm ADR-003 even at the deploy stage

Reason: Adding Airflow now would mean another DB, another web UI, more failure surface. Status is observable via install.log + scripts/status.py + DB queries. Right size for < 10 apartments on a single VM.