July 30, 2021
Most of the customers we work with start their journey with a proof-of-value or proof-of-concept project (let's just call them all PoCs). That makes a lot of sense: one of the most critical tasks of a PoC is to understand the state of the available data and to test whether it is likely to produce good machine learning predictions.
For us, it often means s**tloads of data cleaning and wrangling. We have chosen to do that work for our customers as part of the paid PoC projects, simply to ensure rapid progress and as little wasted time as possible. But boy, this tends to take time and frustrate us, as we really can't spend endless hours on data janitoring.
After a couple of projects where we spent too many hours on invoice and accounting data exports from the client's SAP, we realised we needed to improve our efficiency. We wanted to:
We chose to build this on Docker and Postgres. When the container starts, it creates the database and runs the scripts that ingest data from CSV files, clean what is needed, and then upload the data into Aito. While developing, we can start the container at any time, have the database running, and query it to test different ideas and versions. Super simple. Let's have a detailed look at how we set it up!
The first step is the Docker Compose file that defines the Postgres service:
```yaml
version: "3"
services:
  postgres:
    image: postgres:13
    restart: always
    container_name: aito-customer-data-postgres
    environment:
      # Will create the specified user with
      # superuser power and a database with the same name.
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    volumes:
      - type: volume
        # Volume must be present in the 'volumes'-block in this file
        source: postgres
        target: /var/lib/postgresql
      - type: bind
        source: ./tools/docker-postgres
        target: /docker-entrypoint-initdb.d
      - type: bind
        source: ./customer-data
        target: /customer-data
    ports:
      # host_port:container_port
      - "5432:5432"
volumes:
  postgres:
```
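The bind mount to `/docker-entrypoint-initdb.d` is what lets the database set itself up. On the first start with an empty data directory, the official postgres image processes every file in that directory in sorted file-name order: `*.sql` files are run through psql, `*.sh` files are executed (or sourced if not executable), and anything else is ignored. Because the init scripts only run against an empty data directory, re-running them means removing the volume first, for example with `docker-compose down -v`. Since the bash script that enables pgcrypto connects to `customer_data`, the SQL that creates that database has to sort first. The sketch below is a simplified imitation of that loop, not the image's actual entrypoint; the function name `plan_init_files` and the file names `01-create-database.sql` and `02-extensions.sh` are hypothetical.

```shell
#!/usr/bin/env bash
# Simplified imitation of the init-file loop in the official postgres image.
# It only prints what would happen; the real entrypoint runs *.sql files with
# psql, executes or sources *.sh files, and also handles compressed SQL.
set -euo pipefail

plan_init_files() {
  local dir="$1" f
  # Glob results come back sorted, so the file names decide the order.
  for f in "$dir"/*; do
    [ -e "$f" ] || continue  # empty directory: the glob stays unexpanded
    case "$f" in
      *.sh)  echo "run-sh $(basename "$f")" ;;
      *.sql) echo "run-sql $(basename "$f")" ;;
      *)     echo "ignore $(basename "$f")" ;;
    esac
  done
}

# Hypothetical layout of ./tools/docker-postgres
demo=$(mktemp -d)
touch "$demo/02-extensions.sh" "$demo/01-create-database.sql" "$demo/README.md"
plan_init_files "$demo"
# prints:
# run-sql 01-create-database.sql
# run-sh 02-extensions.sh
# ignore README.md
rm -r "$demo"
```

Numeric prefixes are a simple way to make the order explicit instead of relying on whatever the file names happen to be.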
```sql
CREATE USER aito WITH PASSWORD 'aito';

CREATE DATABASE customer_data
  LC_COLLATE 'en_US.UTF-8'
  LC_CTYPE 'en_US.UTF-8'
  ENCODING 'UTF8'
  TEMPLATE template0;

GRANT ALL PRIVILEGES ON DATABASE customer_data TO aito;
```
```bash
#!/bin/bash
set -e
set -x

psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "customer_data" <<-EOSQL
    CREATE EXTENSION IF NOT EXISTS pgcrypto;
EOSQL
```
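The ingest scripts themselves are not shown here. As one possible shape for that step, here is a minimal sketch that assumes the CSV exports sit in the `/customer-data` bind mount and are first loaded into all-text staging tables, with casting and cleaning done afterwards in SQL. The function name `csv_to_staging_sql`, the `raw_` table prefix and the sanitising rules are hypothetical, not Aito's actual scripts, and the header parsing is naive: it assumes a plain comma-separated header with no quoted fields or embedded commas.

```shell
#!/usr/bin/env bash
# Hypothetical ingest helper: for one CSV export, print SQL that creates an
# all-text staging table from the header row and loads the file with \copy.
# Assumes a plain comma-separated header (no quoted fields).
set -euo pipefail

csv_to_staging_sql() {
  local csv="$1" table header cols
  # Table name from the file name: "sap-invoices.csv" -> "raw_sap_invoices"
  table="raw_$(basename "$csv" .csv | sed 's/[^A-Za-z0-9_]/_/g')"
  # First line only; drop a Windows line ending if the export has one
  header=$(head -n 1 "$csv" | tr -d '\r')
  # "Invoice ID,amount" -> "Invoice_ID text, amount text"
  cols=$(printf '%s\n' "$header" \
    | sed -e 's/[^A-Za-z0-9_,]/_/g' -e 's/,/ text, /g' -e 's/$/ text/')
  printf 'CREATE TABLE IF NOT EXISTS %s (%s);\n' "$table" "$cols"
  # \copy is a psql meta-command and must stay on a single line
  printf '%s\n' "\\copy $table FROM '$csv' WITH (FORMAT csv, HEADER true)"
}

# Demo with a throwaway, hypothetical SAP-style export
demo=$(mktemp -d)
printf 'Invoice ID,amount,due date\r\n1001,10.50,2021-07-30\r\n' > "$demo/sap-invoices.csv"
csv_to_staging_sql "$demo/sap-invoices.csv"
# first line printed:
# CREATE TABLE IF NOT EXISTS raw_sap_invoices (Invoice_ID text, amount text, due_date text);
rm -r "$demo"
```

Inside an init script that sorts after the database setup, the output could be piped into psql in the same way as the pgcrypto script, for example `for f in /customer-data/*.csv; do csv_to_staging_sql "$f"; done | psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname customer_data`. Loading everything as text keeps a single malformed date or decimal from aborting the whole load; the cleaning queries can then cast the values into typed tables.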
Aito Intelligence Oy