Tommi Holmgren
CPO
July 30, 2021 • 1 min read
Most of the customers we work with start their journey with a proof of value or proof of concept project (let's just call them all PoC). That makes a lot of sense, as one of the most critical tasks of a PoC is to understand the state of the available data and to test whether it is likely to bring good results with machine learning predictions.
For us, it often means s**tloads of data cleaning and wrangling. We have chosen to do that work for our customers as part of the paid PoC projects, simply to ensure rapid progress and minimal wasted time. But boy, this tends to take time and frustrate us, as we really can't spend endless hours on data janitoring.
After a couple of projects where we spent too many hours on invoice and accounting data exports from the client's SAP, we realised we needed to improve our efficiency. We wanted to:
We chose to build this on Docker and Postgres. When the container starts, it creates the database, runs the scripts that ingest data from CSV files and clean what is needed, and then runs the scripts that upload the data into Aito. While developing, we can start the container at any time, have the database running, and run queries against it to test different ideas and versions. Super simple. Let's have a detailed look at how we set it up!
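The first step is the Docker Compose file. It defines a single Postgres service, a named volume for the database files, and bind mounts for the init scripts and the customer data: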
version: "3"
services:
  postgres:
    image: postgres:13
    restart: always
    container_name: aito-customer-data-postgres
    environment:
      # Will create the specified user with
      # superuser power and a database with the same name.
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    volumes:
      - type: volume
        # Volume must be present in the 'volumes'-block in this file
        source: postgres
        target: /var/lib/postgresql
      - type: bind
        source: ./tools/docker-postgres
        target: /docker-entrypoint-initdb.d
      - type: bind
        source: ./customer-data
        target: /customer-data
    ports:
      # host_port:container_port
      - "5432:5432"
volumes:
  postgres:
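With a compose file like this, starting the database and waiting until it accepts connections can be scripted. The following is only a minimal sketch, not part of our actual tooling: `retry_until_ok` is a hypothetical helper name, and the usage lines assume `docker-compose` and `psql` are installed on the host and that the service is called `postgres` as in the file above.

```shell
# Hypothetical helper: run a command once per second until it succeeds,
# giving up after the given number of attempts.
retry_until_ok() {
  local attempts="$1"
  shift
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    # Run the remaining arguments as a command, hiding its output.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Usage (not executed here; credentials are the ones from the compose file):
#   docker-compose up -d
#   retry_until_ok 30 docker-compose exec -T postgres pg_isready --username postgres
#   psql "postgresql://postgres:postgres@localhost:5432/postgres" -c 'SELECT 1;'
```

The retry loop is there because the container reports as started before Postgres has finished running its init scripts, so the first connection attempts may fail.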
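The init scripts live in ./tools/docker-postgres, which is mounted at /docker-entrypoint-initdb.d. The Postgres image runs them in sorted file name order the first time the database is initialised. The first one is plain SQL and creates the user and the database:
CREATE USER aito WITH PASSWORD 'aito';
CREATE DATABASE customer_data LC_COLLATE 'en_US.UTF-8' LC_CTYPE 'en_US.UTF-8' ENCODING 'UTF8' TEMPLATE template0;
GRANT ALL PRIVILEGES ON DATABASE customer_data TO aito;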
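The next one is a shell script that enables the pgcrypto extension in the new database:
#!/bin/bash
# Stop at the first failing command and echo each command as it runs.
set -e
set -x
# ON_ERROR_STOP=1 makes psql exit with an error if any statement fails.
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "customer_data" <<-EOSQL
CREATE EXTENSION IF NOT EXISTS pgcrypto;
EOSQL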
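The CSV ingestion mentioned earlier can follow the same pattern in a later init script. The sketch below is an illustration under assumptions rather than our real script: `csv_to_staging_sql`, the file name `invoices.csv`, and the table name `invoices_raw` are all hypothetical, and the helper assumes a comma-separated file whose header row has no quoted fields or embedded commas. It prints SQL for an all-text staging table plus a psql `\copy` command, so type casting and cleaning can happen afterwards in SQL.

```shell
# Hypothetical helper: print SQL that creates an all-text staging table
# matching the header row of a CSV file, followed by a psql \copy command
# that loads the file into that table.
csv_to_staging_sql() {
  local csv_path="$1" table="$2"
  local header columns
  # Read the header row and drop a Windows-style carriage return, if any.
  header="$(head -n 1 "$csv_path" | tr -d '\r')"
  # One column per line, lowercased, with anything outside [a-z0-9_]
  # replaced by "_", then joined back into a column list: "name" text,...
  columns="$(printf '%s\n' "$header" | tr ',' '\n' \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9_]/_/g' -e 's/.*/"&" text/' \
    | paste -sd, -)"
  printf 'CREATE TABLE IF NOT EXISTS %s (%s);\n' "$table" "$columns"
  printf '\\copy %s FROM '\''%s'\'' WITH (FORMAT csv, HEADER true)\n' "$table" "$csv_path"
}

# Usage inside an init script (not executed here; file and table are made up):
#   csv_to_staging_sql /customer-data/invoices.csv invoices_raw \
#     | psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "customer_data"
```

Loading everything as text first keeps the import from failing on messy values; the cleaning step can then cast and fix columns one at a time.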