Setup Guide

How to clone the repository, configure the environment, and run the full pipeline from scratch.

prerequisites

Python 3.11+: Core language for all ETL scripts

PostgreSQL 15+: Silver and Gold layer storage

Node.js 18+: Website (this site), optional
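A quick way to confirm the prerequisites before going further is a small check script. The sketch below is illustrative only: the executable names `psql` and `node` are assumptions about how PostgreSQL and Node.js are installed on your machine. It only checks that the tools are on the PATH, not that they meet the 15+ and 18+ version requirements.

```python
import shutil
import sys

# Minimum Python version taken from the prerequisites list.
MIN_PYTHON = (3, 11)

# Executable names are assumptions: "psql" ships with most PostgreSQL
# installs and "node" with Node.js, but your setup may differ.
TOOLS = {"psql": "PostgreSQL client", "node": "Node.js (optional)"}


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= tuple(minimum)


def find_tools(tools=TOOLS):
    """Map each tool name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in tools}


def report():
    """Print a short summary of what was found."""
    status = "ok" if python_ok() else "too old"
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {status}")
    for name, path in find_tools().items():
        print(f"{TOOLS[name]}: {path or 'not found on PATH'}")
```

Calling `report()` prints one line per prerequisite. A missing `psql` does not necessarily mean PostgreSQL is unavailable, since the server may run on another host.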

python dependencies

sqlalchemy, psycopg2-binary, pymysql, python-dotenv, paramiko, pandas
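After installing, it can help to verify that every dependency is importable. The following is a minimal sketch, not part of the project: the helper name `missing_packages` is hypothetical. Note that two packages import under a different module name than the one given to pip (`psycopg2-binary` imports as `psycopg2`, `python-dotenv` as `dotenv`).

```python
import importlib.util

# pip distribution name -> import (module) name.
IMPORT_NAMES = {
    "sqlalchemy": "sqlalchemy",
    "psycopg2-binary": "psycopg2",
    "pymysql": "pymysql",
    "python-dotenv": "dotenv",
    "paramiko": "paramiko",
    "pandas": "pandas",
}


def missing_packages(import_names=IMPORT_NAMES):
    """Return the pip names whose module cannot be found."""
    return [
        pip_name
        for pip_name, module in import_names.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None
    ]
```

An empty list from `missing_packages()` means all six packages are visible to the current interpreter.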

.env configuration

variable | example | description
--- | --- | ---
DB_URL | postgresql://domotic:pass@localhost:5432/domotic_dev | App user connection string
DB_ADMIN_URL | postgresql://postgres:adminpass@localhost:5432/postgres | Admin connection for DB creation
MYSQL_URL | mysql+pymysql://student:pass@10.130.25.152:3306/pidb | MySQL source (school network)
SMB_PATH | Z:\ | Mounted SMB share with sensor JSON files
BRONZE_ROOT | storage\bronze | Local Bronze storage folder
SFTP_HOST | (hostname) | SFTP server for weather data
SFTP_PORT | 22 | SFTP port
SFTP_USER | (username) | SFTP username
SFTP_PASSWORD | (password) | SFTP password
SFTP_PATH | /Meteo2 | Remote directory for weather CSVs
WEATHER_MIN_YEAR | 2023 | Ignore weather data before this year
LOG_DIR | logs | Directory for ETL log files
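Before running any ETL script, the filled-in `.env` can be sanity-checked. The sketch below is an illustration under stated assumptions: it treats every variable listed above as required and expects `SFTP_PORT` and `WEATHER_MIN_YEAR` to be integers, which may be stricter than the project's scripts actually are. The function names `check_env` and `load_and_check` are hypothetical.

```python
import os

# Assumption: all variables from the configuration table are required.
REQUIRED = [
    "DB_URL", "DB_ADMIN_URL", "MYSQL_URL", "SMB_PATH", "BRONZE_ROOT",
    "SFTP_HOST", "SFTP_PORT", "SFTP_USER", "SFTP_PASSWORD", "SFTP_PATH",
    "WEATHER_MIN_YEAR", "LOG_DIR",
]
# Assumption: these two are parsed as integers by the pipeline.
INTEGER_VARS = ["SFTP_PORT", "WEATHER_MIN_YEAR"]


def check_env(env):
    """Return a list of human-readable problems found in the mapping."""
    problems = [
        f"{name} is missing or empty"
        for name in REQUIRED
        if not env.get(name, "").strip()
    ]
    for name in INTEGER_VARS:
        value = env.get(name, "").strip()
        if value and not value.isdigit():
            problems.append(f"{name} must be an integer, got {value!r}")
    return problems


def load_and_check(path=".env"):
    """Load the file with python-dotenv, then validate os.environ."""
    from dotenv import load_dotenv  # provided by the python-dotenv package

    load_dotenv(path)
    return check_env(os.environ)
```

`load_and_check()` returns an empty list when the file looks complete; otherwise each entry names the variable to fix.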

step-by-step setup

1. Clone the repository

git clone https://github.com/dehlya/data-cycle-domotic.git && cd data-cycle-domotic

2. Install Python dependencies

pip install -r requirements.txt

3. Configure environment

cp .env.example .env # then fill in your values

4. Create Silver schema

python etl/bronze_to_silver/create_silver.py

5. Import MySQL dimensions

python etl/bronze_to_silver/import_mysql_to_silver.py

6. Run initial Bronze ingestion

python ingestion/fast_flow/bulk_to_bronze.py --full

7. Flatten sensors to Silver

python etl/bronze_to_silver/flatten_sensors.py

8. Create Gold schema

python etl/silver_to_gold/create_gold.py

9. Populate Gold tables

python etl/silver_to_gold/populate_gold.py

10. Start the watcher (ongoing)

python ingestion/fast_flow/watcher.py
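Steps 4 to 9 above are one-off commands that must run in order, so they can be chained in a small runner. This is a sketch, not a script shipped with the repository: the names `build_commands` and `run_pipeline` are hypothetical, and it assumes each script exits with a non-zero code on failure and is started from the repository root. The watcher (step 10) is left out because it keeps running until interrupted.

```python
import subprocess
import sys

# Script paths and flags copied from steps 4-9, in order.
STEPS = [
    ["etl/bronze_to_silver/create_silver.py"],
    ["etl/bronze_to_silver/import_mysql_to_silver.py"],
    ["ingestion/fast_flow/bulk_to_bronze.py", "--full"],
    ["etl/bronze_to_silver/flatten_sensors.py"],
    ["etl/silver_to_gold/create_gold.py"],
    ["etl/silver_to_gold/populate_gold.py"],
]


def build_commands(python=sys.executable):
    """Return the full argument list for each step."""
    return [[python, *step] for step in STEPS]


def run_pipeline(dry_run=False):
    """Run each step in order, stopping at the first failure."""
    for command in build_commands():
        print("->", " ".join(command))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps never run against a half-built layer.
            subprocess.run(command, check=True)
```

Calling `run_pipeline(dry_run=True)` prints the commands without executing them, which is a cheap way to confirm the order before the first real run.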