dataflow is a CLI script running on the gl-calcs Linux server that hosts the
InfluxDB time series database. The script scans folders for files and tries
to assign a filetype to each found file. If a filetype was successfully
assigned to a specific file, dataflow uploads the data of the respective
file using the settings for the respectively assigned filetype.
dataflow scans folders for files and then, for each found file, it:
- reads found files
- scans found files for variables
- uploads found data to the database
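The scan-and-assign part of these steps can be pictured with a minimal sketch. It is not dataflow's actual implementation: the `FILETYPES` mapping, its glob patterns, and the function names `assign_filetype` and `scan_folder` are hypothetical stand-ins for the filetype settings in the configs folder, and the upload step is left out.

```python
from __future__ import annotations

from fnmatch import fnmatch
from pathlib import Path

# Hypothetical filetype definitions: filetype ID -> filename glob pattern.
# The real definitions live in dataflow's configs folder and may look different.
FILETYPES = {
    "EXAMPLE-METEO-10MIN": "meteo_*.csv",
    "EXAMPLE-LOGGER-RAW": "logger_*.dat",
}


def assign_filetype(filename: str) -> str | None:
    """Return the first filetype whose pattern matches the filename, else None."""
    for filetype, pattern in FILETYPES.items():
        if fnmatch(filename, pattern):
            return filetype
    return None


def scan_folder(folder: Path) -> dict[Path, str]:
    """Walk a folder tree and keep only files that got a filetype assigned."""
    assigned = {}
    for path in sorted(folder.rglob("*")):
        if path.is_file():
            filetype = assign_filetype(path.name)
            if filetype is not None:
                assigned[path] = filetype
    return assigned
```

Files without a matching filetype are simply skipped in this sketch, mirroring the rule that only files with a successfully assigned filetype are uploaded.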
The database functionality (reading, scanning and uploading) is built directly into dataflow via
the influxdb-client library; the previously required dbc-influxdb dependency is no longer used.
dataflow configurations, including the different filetypes, are given in the configs folder.
Configurations for accessing the database are not included in the configs folder for security reasons.
dataflow uses uv for dependency management and runs on Python 3.12.
Filetypes are defined in the configs, see here: Filetypes
dataflow uses uv and Python 3.12.
uv sync # create the virtual environment and install all dependencies
uv run dataflow -h # run the CLI inside the managed environment
uv build # build the source archive (.tar.gz) and wheel into ./dist
uv reads the pinned Python version (3.12) and the locked dependencies (uv.lock) automatically and
will download Python 3.12 if it is not already available.
gl-calcs is a Linux computer running Red Hat Enterprise Linux (RHEL) 8.9. dataflow is
installed there as an isolated CLI tool with pipx.
Important
dataflow requires Python 3.12. RHEL 8.9 ships Python 3.6 as its default python3, so pipx
must be pointed at a separately provided Python 3.12 interpreter (see step 1). If you install with
the system Python, the install fails with an "unsupported Python version" / "requires-python" error.
On RHEL 8.9, install Python 3.12 from the AppStream repository (requires sudo):
sudo dnf install -y python3.12
which python3.12 # -> /usr/bin/python3.12
This path is used as <py312> below. Installing python3.12 does not change the system default
python3, so it is safe.
If sudo is not available, let uv provide a standalone Python 3.12 in user space instead (RHEL
8.9's glibc is new enough for uv's prebuilt CPython):
# install uv once (https://docs.astral.sh/uv/getting-started/installation/)
curl -LsSf https://astral.sh/uv/install.sh | sh
uv python install 3.12
uv python find 3.12 # -> e.g. /home/holukas/.local/share/uv/python/cpython-3.12.x/bin/python3.12
<py312> in the pipx install command below is a placeholder: replace it with the full path to
the Python 3.12 executable you just installed. How to find that path:
If you installed via dnf, the executable is python3.12 on the PATH. Get its full path with:
which python3.12
# -> /usr/bin/python3.12
If you installed via uv, ask uv directly:
uv python find 3.12
# -> /home/holukas/.local/share/uv/python/cpython-3.12.x/bin/python3.12
Verify the path you found really is Python 3.12 before using it (replace the path with your own):
/usr/bin/python3.12 --version
# -> Python 3.12.x
Use that exact path as <py312> in step 3. For example, if which python3.12 returned
/usr/bin/python3.12, the install command becomes:
pipx install --python /usr/bin/python3.12 /path/to/dataflow-0.22.0.tar.gz
Tip
If which python3.12 prints nothing, the interpreter is not on your PATH. Re-check step 1, or
use the absolute path that dnf/uv installed it to.
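The lookup and version check described above can also be scripted. The following is a minimal sketch, assuming the interpreter is on the PATH under the name python3.12; the helper names `parse_python_version` and `find_py312` are made up for this example and are not part of dataflow, pipx, or uv.

```python
from __future__ import annotations

import re
import shutil
import subprocess


def parse_python_version(output: str) -> tuple[int, ...]:
    """Turn the output of `python --version`, e.g. 'Python 3.12.4', into (3, 12, 4)."""
    match = re.fullmatch(r"Python (\d+)\.(\d+)\.(\d+)\S*", output.strip())
    if match is None:
        raise ValueError(f"unexpected version output: {output!r}")
    return tuple(int(part) for part in match.groups())


def find_py312() -> str | None:
    """Return the full path of python3.12 on the PATH if it really reports 3.12."""
    path = shutil.which("python3.12")
    if path is None:
        return None  # not on the PATH: use the absolute install path instead
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    version = parse_python_version(result.stdout or result.stderr)
    return path if version[:2] == (3, 12) else None
```

If `find_py312()` returns a path, that path is a candidate for the <py312> placeholder; if it returns None, fall back to the manual steps.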
On the dev machine, build the source archive and wheel:
uv build # writes dataflow-0.22.0.tar.gz (+ .whl) into ./dist
Copy the resulting dist/dataflow-0.22.0.tar.gz to gl-calcs.
pipx install --python <py312> /path/to/dataflow-0.22.0.tar.gz
This creates an isolated environment for dataflow and puts the dataflow command on the PATH
(usually ~/.local/bin; run pipx ensurepath once if it is not on the PATH yet). Verify with:
dataflow -h
Alternatively, install a specific tagged version directly from GitHub (still pinning 3.12):
pipx install --python <py312> https://github.com/holukas/dataflow/archive/refs/tags/v0.22.0.tar.gz
To replace an installed version with a new build, or to uninstall dataflow:
pipx install --force --python <py312> /path/to/dataflow-0.22.0.tar.gz # replace with a new build
pipx uninstall dataflow
Tip
Since the project already uses uv, you can skip pipx entirely and manage the tool with
uv tool install --python 3.12 /path/to/dataflow-0.22.0.tar.gz (and uv tool upgrade /
uv tool uninstall). This installs the same isolated dataflow command without needing pipx.
The CLI help is shown with the help argument, e.g. python .\main.py -h:
usage: main.py [-h] [-y YEAR] [-m MONTH] [-l FILELIMIT] [-n NEWESTFILES] site datatype access filegroup dirconf
dataflow
positional arguments:
  site                  Site abbreviation, e.g. ch-dav, ch-lae
  datatype              Data type: 'raw' for raw data, 'processed' for processed data
  access                Access to data via 'server' address (e.g. outside gl-calcs) or 'mount' path (e.g. on gl-calcs)
  filegroup             Data group, e.g. '10_meteo'
  dirconf               Path to folder with configuration settings
optional arguments:
  -h, --help            show this help message and exit
  -y YEAR, --year YEAR  Year (default: None)
  -m MONTH, --month MONTH
                        Month (default: None)
  -l FILELIMIT, --filelimit FILELIMIT
                        File limit, 0 corresponds to no limit. (default: 0)
  -n NEWESTFILES, --newestfiles NEWESTFILES
                        Consider newest files only, 0 means keep all files, e.g. 3 means keep 3 newest files. Is applied after FILELIMIT was considered. (default: 0)
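For readers who want to see how such an interface is typically declared, here is a hedged reconstruction with argparse, derived only from the help text above. It is a sketch, not dataflow's source: the function name `build_parser` is made up, and parsing YEAR, MONTH, FILELIMIT and NEWESTFILES as integers is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the arguments listed in the help text (reconstruction, not the original)."""
    parser = argparse.ArgumentParser(
        description="dataflow",
        # Appends "(default: ...)" to the help of optional arguments.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("site", help="Site abbreviation, e.g. ch-dav, ch-lae")
    parser.add_argument(
        "datatype",
        help="Data type: 'raw' for raw data, 'processed' for processed data",
    )
    parser.add_argument(
        "access",
        help="Access to data via 'server' address or 'mount' path",
    )
    parser.add_argument("filegroup", help="Data group, e.g. '10_meteo'")
    parser.add_argument("dirconf", help="Path to folder with configuration settings")
    # Assumption: the numeric options are parsed as integers.
    parser.add_argument("-y", "--year", type=int, default=None, help="Year")
    parser.add_argument("-m", "--month", type=int, default=None, help="Month")
    parser.add_argument(
        "-l", "--filelimit", type=int, default=0,
        help="File limit, 0 corresponds to no limit.",
    )
    parser.add_argument(
        "-n", "--newestfiles", type=int, default=0,
        help="Consider newest files only, 0 means keep all files.",
    )
    return parser
```

Parsing the arguments `ch-aws raw mount 10_meteo /home/holukas/source_code/configs -y 2023 -n 10` with this parser yields `year=2023`, `newestfiles=10`, and the defaults `month=None` and `filelimit=0`.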
With the dataflow script installed via pipx (see above), it can be called with:
dataflow ch-aws raw mount 10_meteo /home/holukas/source_code/configs -y 2023 -n 10
- dataflow uses the script installed with pipx
- ch-aws is the site
- raw is the datatype, in this case we want to upload raw data
- mount means we are using the mounted server locations defined in the configs
- 10_meteo is the filegroup, basically this is the subfolder we use to store this kind of data on the raw data server
- /home/holukas/source_code/configs is the location of the config files, in this case we are using the location on the Linux computer
- -y 2023 means that only data for the year 2023 are considered (i.e., searched and uploaded to the database)
- -n 10 means that of all files found, only the newest 10 files are considered
This command can easily be used to automate execution e.g. via cronjobs.
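When several sites or filegroups are uploaded on a schedule, a small wrapper script that a cron job calls can keep the schedule itself short. The following is a minimal sketch under assumptions: `build_command` and `run_jobs` are hypothetical helpers, the job list is illustrative, and the dataflow command is assumed to be on the PATH of the user running the job.

```python
from __future__ import annotations

import subprocess


def build_command(site: str, filegroup: str, dirconf: str, year: int, newest: int) -> list[str]:
    """Assemble the documented call: raw data, accessed via the mounted paths."""
    return [
        "dataflow", site, "raw", "mount", filegroup, dirconf,
        "-y", str(year), "-n", str(newest),
    ]


def run_jobs(jobs, dirconf, year, newest=10, runner=subprocess.run) -> dict:
    """Run one dataflow call per (site, filegroup) pair and collect exit codes."""
    exit_codes = {}
    for site, filegroup in jobs:
        command = build_command(site, filegroup, dirconf, year, newest)
        # check=False: a failing site should not prevent the remaining uploads.
        completed = runner(command, check=False)
        exit_codes[(site, filegroup)] = completed.returncode
    return exit_codes
```

For example, `run_jobs([("ch-aws", "10_meteo")], "/home/holukas/source_code/configs", year=2023)` issues the same command as shown above and returns its exit code keyed by site and filegroup. The `runner` parameter exists only so the wrapper can be tried out without actually starting dataflow.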
Alternatively the script can be called directly from the source code via the uv-managed environment:
uv run python .\main.py ch-aws raw mount 10_meteo /home/holukas/source_code/configs -y 2023 -n 10
This example executes the script on a Windows computer using the CLI.
uv run python .\main.py ch-aws raw server 10_meteo "F:\Sync\luhk_work\20 - CODING\22 - POET\configs" -y 2023 -n 1
- uv run python runs Python 3.12 inside the uv-managed environment for this project
- main.py is the entry point for the script
- ch-aws is the site
- raw is the datatype, in this case we want to upload raw data
- server means we are using the network addresses such as \\serverxyz.ethz.ch\archive\FluxData
- 10_meteo is the filegroup, basically this is the subfolder we use to store this kind of data on the raw data server
- "F:\Sync\luhk_work\20 - CODING\22 - POET\configs" is the location of the config files, in this case we are using a local Windows folder
- -y 2023 means that only data for the year 2023 are considered (i.e., searched and uploaded to the database)
- -n 1 means that of all files found, only the newest file is considered
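The help text states that NEWESTFILES is applied after FILELIMIT was considered, so the order of the two options matters. The sketch below illustrates that order only; it is not dataflow's code. The function `select_files` is hypothetical, and two details are assumptions: that FILELIMIT keeps the first files in the order they were found, and that "newest" is judged by a timestamp such as the modification time.

```python
from __future__ import annotations


def select_files(
    files: list[tuple[str, float]], filelimit: int = 0, newestfiles: int = 0
) -> list[str]:
    """Apply FILELIMIT first, then NEWESTFILES.

    files: (name, timestamp) pairs in the order the files were found.
    A value of 0 disables the respective option.
    """
    selected = list(files)
    if filelimit > 0:
        # Step 1 (assumption): keep the first `filelimit` files as found.
        selected = selected[:filelimit]
    if newestfiles > 0:
        # Step 2: of the remaining files, keep the `newestfiles` newest ones.
        selected = sorted(selected, key=lambda item: item[1])[-newestfiles:]
    return [name for name, _ in selected]
```

Under these assumptions, a file that is cut off by the file limit can no longer be picked as one of the newest files, which is why combining `-l` and `-n` may return different files than `-n` alone.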