Architecture Decisions
Key decisions made during the project, with their rationale.
Decision: Python 3.11 with ThreadPoolExecutor + ProcessPoolExecutor
Reason: I/O-bound workload, team familiarity, rich ecosystem (pandas, SQLAlchemy, paramiko). Acceptable performance for batch processing.
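The decision pairs a thread pool with a process pool. Below is a minimal sketch of one way to combine them. It assumes threads handle the I/O-bound fetches and processes handle any CPU-heavy parsing; the ADR does not state how the work is actually split. `fetch_raw` and `parse_payload` are hypothetical placeholders, not project code.

```python
"""Sketch: threads for I/O-bound fetches, processes for CPU-bound transforms.

fetch_raw() and parse_payload() are hypothetical stand-ins, not project code.
"""
import json
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fetch_raw(source_id: int) -> str:
    # Hypothetical I/O-bound step (e.g. an SFTP download); here it fakes a payload.
    return json.dumps({"source": source_id, "readings": list(range(source_id + 1))})


def parse_payload(payload: str) -> dict:
    # Hypothetical CPU-bound step: parse and aggregate one payload.
    data = json.loads(payload)
    return {"source": data["source"], "total": sum(data["readings"])}


def run_batch(source_ids: list[int], io_workers: int = 8, cpu_workers: int = 2) -> list[dict]:
    # Threads overlap network/disk waits; the GIL is released while blocked on I/O.
    with ThreadPoolExecutor(max_workers=io_workers) as io_pool:
        payloads = list(io_pool.map(fetch_raw, source_ids))
    # Processes sidestep the GIL for CPU-heavy parsing. On spawn-based platforms,
    # call run_batch from under an `if __name__ == "__main__":` guard.
    with ProcessPoolExecutor(max_workers=cpu_workers) as cpu_pool:
        return list(cpu_pool.map(parse_payload, payloads))
```

Calling `run_batch([1, 2, 3])` returns one aggregate dict per source, in input order, because `Executor.map` preserves ordering.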
Decision: PostgreSQL 15+ with separate silver and gold schemas
Reason: Multi-user access, native Power BI connector, full SQL support for analytical (OLAP) queries, and it is free, open source, and production-grade.
Decision: Lightweight Python watcher (60s loop + daily weather subprocess)
Reason: Single-VM deployment makes Airflow overkill. Zero infrastructure overhead, trivially restartable, no dependency conflicts.
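A minimal sketch of the watcher shape described above: a 60-second loop plus a once-per-day weather subprocess. `run_pipeline_tick` and `WEATHER_CMD` are hypothetical; the ADR does not name the real entry points. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
"""Sketch of a 60-second watcher loop that also launches a daily weather job.

run_pipeline_tick and WEATHER_CMD are hypothetical placeholders.
"""
import datetime as dt
import subprocess
import sys
import time
from typing import Callable, Optional

POLL_SECONDS = 60
# Hypothetical weather job; the real script name is not given in the ADR.
WEATHER_CMD = [sys.executable, "-c", "print('weather fetch placeholder')"]


def run_pipeline_tick() -> None:
    # Placeholder for one pass: pick up new raw files, load silver/gold, etc.
    pass


def watch(
    tick: Callable[[], None] = run_pipeline_tick,
    spawn_weather: Callable[[], object] = lambda: subprocess.run(WEATHER_CMD, check=False),
    today: Callable[[], dt.date] = dt.date.today,
    sleep: Callable[[float], None] = time.sleep,
    max_iterations: Optional[int] = None,
) -> None:
    last_weather_day: Optional[dt.date] = None
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        day = today()
        if day != last_weather_day:
            # Run the weather job once per calendar day in a child process
            # (subprocess.run blocks until it exits; a non-zero exit is ignored).
            spawn_weather()
            last_weather_day = day
        try:
            tick()
        except Exception as exc:
            # Log and keep looping; the next tick retries.
            print(f"tick failed: {exc}", file=sys.stderr)
        iterations += 1
        sleep(POLL_SECONDS)
```

Because all state lives in local variables, restarting the process (for example via systemd) is the only recovery step needed, which matches the "trivially restartable" rationale.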
Decision: Raw data stored on the local file system, partitioned into YYYY/MM/DD/HH/ folders
Reason: Immutable raw storage, no DB overhead, easy to inspect and replay.
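A minimal sketch of the hourly folder layout and write-once semantics. `RAW_ROOT` is a hypothetical root directory. Partitioning on UTC is also an assumption; the ADR does not say which time zone the folders use.

```python
"""Sketch: map a capture timestamp to a YYYY/MM/DD/HH/ raw-storage folder."""
from datetime import datetime, timezone
from pathlib import Path

RAW_ROOT = Path("data/raw")  # hypothetical root; the ADR does not name it


def raw_partition(ts: datetime, root: Path = RAW_ROOT) -> Path:
    # Normalise to UTC (assumption) so DST shifts never duplicate or skip an hour.
    # Naive datetimes are interpreted as local time by astimezone().
    ts_utc = ts.astimezone(timezone.utc)
    return root / f"{ts_utc:%Y}" / f"{ts_utc:%m}" / f"{ts_utc:%d}" / f"{ts_utc:%H}"


def write_raw(payload: bytes, name: str, ts: datetime, root: Path = RAW_ROOT) -> Path:
    folder = raw_partition(ts, root)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / name
    # Mode "xb" raises FileExistsError instead of overwriting, keeping raw data immutable.
    with open(target, "xb") as fh:
        fh.write(payload)
    return target
```

Replaying a given hour then amounts to re-reading one folder, for example `raw_partition(ts)` for the hour in question.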
Decision: Client-side install wizard generates a Python installer with .env baked in
Reason: Reduces deployment from ~10 manual steps to one command. Credentials never leave the deployer's machine.
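A minimal sketch of how a wizard might bake the .env into a generated installer. It assumes the installer is a single Python script that writes .env on the target machine. `INSTALLER_TEMPLATE`, `render_installer`, and the omitted setup steps are hypothetical, not the project's actual wizard.

```python
"""Sketch: render a self-contained installer script with the .env text embedded.

Template contents, function names and installer steps are hypothetical.
"""
from pathlib import Path

INSTALLER_TEMPLATE = '''\
"""Generated installer: writes .env, then runs the (hypothetical) setup steps."""
from pathlib import Path

ENV_TEXT = {env_text!r}


def main(target: str = ".") -> None:
    env_path = Path(target) / ".env"
    env_path.write_text(ENV_TEXT, encoding="utf-8")
    env_path.chmod(0o600)  # owner read/write only (ignored on most Windows setups)
    # ... create venv, install requirements, run migrations (omitted) ...


if __name__ == "__main__":
    main()
'''


def render_installer(env: dict[str, str]) -> str:
    env_text = "".join(f"{key}={value}\n" for key, value in env.items())
    # The !r conversion embeds env_text as a valid Python string literal,
    # escaping quotes and newlines.
    return INSTALLER_TEMPLATE.format(env_text=env_text)


def write_installer(env: dict[str, str], out: Path) -> Path:
    out.write_text(render_installer(env), encoding="utf-8")
    return out
```

Embedding the values with `repr()` rather than plain string concatenation keeps the generated script valid even when a password contains quotes or backslashes.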
Decision: Database admin credentials are used only at install time; only the app user's credentials are stored in .env
Reason: Pipeline runs as least-privilege user. Admin secret never persists.
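A minimal sketch of the install-time split. The admin connection is used once to create a least-privilege role, and only that role's credentials are returned for .env. The role name `pipeline_app`, the exact grants, and the `execute` callback are assumptions; a real installer might pass something like a psycopg `cursor.execute` bound to the admin connection.

```python
"""Sketch: use admin credentials once to provision a least-privilege app role.

Role name, grants and the execute() callback are hypothetical.
"""
import secrets
from typing import Callable

APP_SCHEMAS = ("silver", "gold")


def provision_app_user(execute: Callable[[str], None], app_user: str = "pipeline_app") -> dict[str, str]:
    # token_urlsafe only emits [A-Za-z0-9_-], so it is safe inside a SQL literal.
    app_password = secrets.token_urlsafe(24)
    statements = [f"CREATE ROLE {app_user} LOGIN PASSWORD '{app_password}'"]
    for schema in APP_SCHEMAS:
        statements += [
            f"GRANT USAGE, CREATE ON SCHEMA {schema} TO {app_user}",
            f"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {schema} TO {app_user}",
        ]
    for sql in statements:
        execute(sql)  # runs over the admin connection, which the caller then closes
    # Only the app credentials are returned for .env; the admin secret is never written.
    return {"DB_USER": app_user, "DB_PASSWORD": app_password}
```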
Decision: Always mask owner_user_id and building_name; keep first-name pseudonym
Reason: Under GDPR Art. 4(1), common first names in isolation are not PII. Power BI row-level security (RLS) depends on a stable column. Truly identifying fields are removed.
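A minimal sketch of the masking rule applied to dict-shaped rows. `owner_user_id` and `building_name` come from the decision. The pseudonym column name `first_name` and the constant placeholder `***` are assumptions. Dropping the two columns outright would be an equally valid reading of "removed".

```python
"""Sketch: mask identifying columns, keep the first-name pseudonym for Power BI RLS.

The mask value, the first_name column name and dict-based rows are assumptions.
"""
MASKED_COLUMNS = ("owner_user_id", "building_name")
MASK = "***"  # hypothetical placeholder value


def mask_row(row: dict) -> dict:
    # Work on a copy so the caller's record is left untouched.
    out = dict(row)
    for col in MASKED_COLUMNS:
        if col in out:
            out[col] = MASK
    # Every other column, including the first-name pseudonym used as the
    # stable RLS key, passes through unchanged.
    return out
```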
Decision: Reaffirm ADR-003 (lightweight watcher instead of Airflow) at the deploy stage
Reason: Adding Airflow now would mean another DB, another web UI, and more failure surface. Status is observable via install.log, scripts/status.py, and DB queries. This is the right size for fewer than 10 apartments on a single VM.