docs / setup

Installation Guide

The complete walkthrough for deploying DataCycle on a fresh Windows machine — exactly as shipped to the customer.

Overview

The complete walkthrough for deploying DataCycle on a fresh Windows machine. Every step the installer takes is documented below with the exact terminal output you should see, so you know whether it's working.

Time budget: about 4 hours on a first install (the silver backfill is the long pole — years of historical sensor data land in one go) or ~15 minutes on a re-install (watermarks let every step short-circuit on already-done work).

│ Treat the long first install like a build pipeline run, not a wizard.

│ Start it, walk away, come back to a finished system. There are no manual

│ steps in the middle — the installer handles all 10 phases automatically

│ and only stops at the very end for optional yes/no prompts — and even those can be pre-answered in the wizard for a fully unattended run.

What gets deployed

For context, here is the end-to-end pipeline the installer is about to set up:

Figure : End-to-end DataCycle pipeline (top half).

Figure : End-to-end DataCycle pipeline (bottom half).

Hardware & software requirements

Machine

Software you need installed first

Important — KNIME version: the workflows are pinned to KNIME 5.8.Installing a newer version (5.9+) on the target machine causes unsupported workflow version errors. Pick 5.8 from the KNIME archive page if you need to roll back.

Postgres tuning (one-time, before running the installer)

The default Postgres setting is shared_buffers = 128MB, which is too small for the unique-index check on silver.sensor_events once the table grows past a few million rows. Raising it to 4 GB keeps the index hot in RAM and brings the full backfill ETA from ~4 hours down to ~6 minutes. Apply this change once, before running the installer.

Open postgresql.conf — default location: C:\Program Files\PostgreSQL\17\data\postgresql.conf. Administrator rights are required to edit it.

Find the line #shared_buffers = 128MB (in the RESOURCE USAGE section). Uncomment it and change the value to:

shared_buffers = 4GB

Save the file.

Restart Postgres so the change takes effect. From an elevated PowerShell:

net stop postgresql-x64-17

net start postgresql-x64-17

Or open services.msc, find postgresql-x64-17, right-click → Restart.

Verify the new value. Open psql and run:

SHOW shared_buffers;

It should return 4GB.

Rule of thumb: shared_buffers should be roughly 25 % of available RAM. The project VM has 16 GB, so 4 GB is right on that line. One-time only — the setting persists in postgresql.conf across re-installs.

Credentials you'll need

Have these ready before opening the install wizard:

Postgres admin account (typically postgres / your admin password)

— used only at install to create the app DB + user. Never written to disk after install.

App user credentials to be created — e.g. domotic / <your choice>.

This is the user the pipeline runs as.

SMB share credentials for the sensor JSON files (UNC path + user/password).

sFTP credentials for the weather forecasts (host, port, user, password).

MySQL credentials for the school apartment registry (pidb).

The installer auto-detects Python, Git, Power BI, and KNIME on Windows.

Step 1 — Generate `data-cycle-installer.py` from the wizard

Open the project's install wizard at /install in your browser.

Figure : The install wizard landing page at /install.

The page summarises what you're about to deploy and lists prerequisites in plain English:

Figure : Prerequisites and what the installer will deploy.

Scroll down and fill in the form. Most fields have sensible defaults; the sections that need your input are credentials (Postgres / MySQL / sFTP) and the SMB share configuration:

Figure : Credentials and SMB form fields in the wizard.

Optional — pre-answer the post-install prompts

Below the credentials form, the wizard has a Post-install actions section with six dropdowns (one per prompt the installer normally asks at the end). Each is a 3-state select:

The six actions you can pre-answer:

Leave them all on Ask during install for the standard interactive flow; flip the ones you've decided about for an unattended deploy.

Unattended install

For an unattended deploy, set every dropdown in the wizard's Post-install actions section to “Yes” or “No” before downloading the installer. The generated data-cycle-installer.py will run end-to-end without pausing, and the watcher (if you set “Start watcher now” to Yes) ends up in its own console window — so even running the installer from a CI script terminates cleanly with no orphan process.

Click Download installer. Every value you typed is now baked into the generated file. If you mis-typed something, the wizard has a Restore from previous installer uploader — drop the .py file in to pre-fill the form.

Tip: the downloaded file contains your passwords in plaintext. Keep it on the install machine only; don't commit it to a shared repo.

Step 2 — Run the installer

From a PowerShell or CMD window:

By default the installer creates ./data-cycle-domotic next to where you ran it. Pass a path to override:

What happens next is a 10-phase pipeline. Each phase prints its progress to the console; the installer never asks for input during phases 1–10 (only optional yes/no prompts at the very end).

Phase 1 — Prerequisite checks (~5 s)

The installer probes for Python ≥ 3.10, git, PostgreSQL psql, Power BI Desktop, and KNIME Analytics Platform. Missing tools produce a clear warning and a download link, but the installer continues — Power BI and KNIME are only needed for their respective steps and can be installed after the fact (just re-run the installer once they're in place).

Figure : Phase 1 console output: prerequisite checks.

Phase 2 — Clone repo (~30–60 s)

Figure : Phase 2 console output: cloning the repo.

If the directory already exists, the installer does git fetch + git checkout -B main FETCH_HEAD instead of cloning fresh — re-runs always get the latest code.

Phase 3 — Write `.env` (<1 s)

All connection strings + credentials + tunables you typed in the wizard land in the project's .env file. The Postgres admin password is deliberately not written here; only the app user's credentials are.

Figure : Phase 3 console output: writing .env.

Phase 4 — Python venv + dependencies (~2–3 min)

Figure : Phase 4 console output: venv and pip install.

The installer creates .venv\ and runs pip install -r requirements.txt. On a re-install pip skips already-installed packages and only catches new ones (e.g. matplotlib if you added it later).

Phase 5 — Pre-flight connectivity checks (~5–10 s)

The installer mounts the SMB drive (net use Z: \\server\share /USER:...), opens a quick Postgres connection, and verifies MySQL + sFTP credentials. If any of these fail it bails early with a clear error — better to fail in 5 seconds than 4 hours in.

Figure : Phase 5 console output: pre-flight connectivity checks.

Phase 6 — Database + schemas (~10–30 s)

Idempotent — uses CREATE IF NOT EXISTS everywhere. Re-running this phase on an existing install is a no-op.

Figure : Phase 6 console output: database and schema creation.

Phase 7 — Bootstrap silver (the long step)

This is what makes the first install long. There are four sub-phases:

(a) MySQL → silver (~25 s): The 10 master tables (buildings, rooms, sensors, devices, profile, dierrors, etc.) are imported into silver as dim/ref tables.

(b) SMB → bronze → silver backfill prompt:

Figure : Phase 7 console output: SMB → bronze → silver backfill prompt.

This is one of the only interactive question during the install. Answer Y on a first install, or N if you want to skip it (e.g. for a quick test on an empty system).

Figure : Phase 7 — confirming the SMB backfill.

The installer first runs bulk_to_bronze (SMB → bronze, file copy) followed by flatten_sensors (bronze → silver, the heavy ETL).

Figure : Phase 7 — flatten_sensors progress bar.

The first ETA is conservative; it tightens as workers warm up and the COPY-into-TEMP-TABLE upsert path gets going. With shared_buffers=4GB tuned (see prerequisites), the full 400k-file backfill finishes in ~6 minutes. Without that tuning, expect ~4 hours.

(d) Weather sFTP download + clean (~30 s): sequential download (sFTP servers dislike parallel sessions) of any new Pred_*.csv files, followed by clean_weather to push them into silver.weather_forecasts.

Phase 8 — Initial gold ETL (~30–60 s)

Once silver is full, the installer runs populate_gold to materialise the star schema:

Figure : Phase 8 — populate_gold running.

Note the PII masking line — the installer enforces GDPR Art. 4(1) at the silver → gold boundary (see DECISIONS.md ADR-005). First names are kept as RLS pseudonyms; building names and user IDs are sanitised.

When done you get a summary of every gold table and its row count:

Figure : Phase 8 — gold ETL summary with row counts.

If any of these are 0, check the silver row counts (the watcher / admin pane shows them) and look at logs/populate_gold.log for errors.

Phase 9 — Verify + auto-config BI/KNIME (~30–60 s)

Figure : Phase 9 — BI / KNIME auto-configuration output.

The installer:

Patches the .knwf workflow files so their PostgreSQL Connector nodes

point at your local DB (host / port / database_name)

Deploys the workflows to your KNIME workspace

Tries to patch the .pbix data source — but Power BI stores the

connection in a binary blob the installer can't safely modify. — hence the “.pbix has no Connections file; skipping” warning. You re-point it once after install, guided by the admin pane wizard (see Step 4).

Pip-installs matplotlib + pandas into Power BI's Python interpreter

so its Python visuals render correctly.

Phase 10 — Auto-start watcher on login (optional)

Figure : Phase 10 — auto-start watcher registration.

Answering Y drops a shortcut into shell:startup so the watcher launches as a hidden background process every time you sign in.

Final post-install prompts

Two of the six wizard-dropdown actions (SMB backfill, autostart watcher) happen earlier inside Phase 7 and Phase 10. The remaining four are batched here at the end.

After the 10 phases, the installer offers six optional one-shot actions. Any of these you pre-answered in the wizard run silently with that answer — only “Ask during install” choices are surfaced here.

Figure : Final post-install prompt menu.

Step 3 — Verify everything is healthy

The Streamlit admin dashboard at <http://localhost:8501> is the easiest way to verify the install:

Figure : Admin dashboard freshness tiles after a successful install.

What to look for on first launch:

🟢 Database indicator (top-left of the status row)

🟢 Watcher process (if you accepted autostart or started it

manually)

Data freshness cards — each gold table updated < 1 day ago

Gold tables section — every fact table shows a row count > 0

If any indicator is red or any count is 0, the same dashboard has one-click action buttons (e.g. Run gold ETL (sensors), Run KNIME predictions) to re-trigger that pipeline. The action runs in a subprocess and logs to storage\admin_logs\ for review.

You can also verify from the command line:

Same checks in CLI form — useful for headless automation.

Step 4 — Re-point the Power BI dashboard (one-time)

The first time the dashboard opens, Power BI is still pointing at the developer's database (because of the binary-blob limitation in Phase 9). The admin pane has a guided Power BI First-Time Setup wizard at the top of the page that walks you through the 30-second re-point:

Figure : Power BI re-point wizard in the admin pane.

Your localhost / domotic / domotic values are pre-filled with one-click copy buttons. Click 📂 Open Power BI Dashboard, then in Power BI:

Click Transform Data → Data source settings

Select the existing PostgreSQL source, click Change Source…

Paste the Server and Database values from the wizard

Click OK, then Close, then Apply changes

When Power BI prompts for credentials:

Authentication kind: choose Database (not Windows)

Username: paste from the wizard

Password: the Postgres password you set during installation

Click Connect

Once done, click ✓ I've configured the connection — don't show again in the wizard. The panel collapses to a small green banner; it'll only come back if you switch DBs later.

After re-pointing, the dashboard is live. Useful tricks:

Press F11 in Power BI for fullscreen presentation mode

Modeling → View as → Other user → Jimmy (or Jeremie, or

unselect for admin) to preview the row-level security as a tenant

The watcher refreshes gold every 15 minutes, so hit Home → Refresh

in Power BI to pull the latest

Common install issues

Re-running the installer

The installer is fully idempotent. Re-running it:

Skips clone if already cloned (does git pull to update)

Skips deps if venv intact (re-runs pip install to catch up on new deps)

Skips DB + schemas if already created (CREATE IF NOT EXISTS)

Skips bronze files already in silver (watermark)

Skips silver files already in gold (set-based merge)

So if anything fails partway, just re-run. Total time on a re-run is typically under 15 minutes.

Uninstall / clean reset

That's a full clean slate. The Postgres tuning (shared_buffers) will remain in postgresql.conf — that's fine, it doesn't hurt anything else.

References

Companion documents

the technical doc — Architecture, schema, pipeline internals, ten ADRs. Read this if you need to understand WHY the installer does what it does.

the user guide — How to use the dashboards and admin pane after install.

the report — Final project report (GDPR, scalability, AI usage declaration).

Internal project artefacts

[1] Herbelin, D., Germanier, S., & von Roten, J. (2026). DataCycle Domotic [Source code repository]. GitHub. https://github.com/dehlya/data-cycle-domotic

[2] Herbelin, D., Germanier, S., & von Roten, J. (2026). DataCycle Domotic install wizard [Web application]. https://github.com/dehlya/DCP-Website Locally accessible at http://localhost:3000/install once the site is deployed on the project VM.

Tooling references

[3] PostgreSQL Global Development Group. (2024). PostgreSQL 17 documentation. https://www.postgresql.org/docs/17/

[4] KNIME AG. (2025). KNIME Analytics Platform 5.8 documentation. https://docs.knime.com/

[5] Microsoft Corporation. (2025). Power BI Desktop documentation. https://learn.microsoft.com/power-bi/

[6] Streamlit Inc. (2025). Streamlit documentation. https://docs.streamlit.io/

Course materials

[7] HES-SO Valais. (2026). Module 64-61 Data Cycle Project — Introduction 2026 [Course materials]. Spring semester 2025–2026.