Analytics Platform installation¶
This guide covers the installation of the Analytics Platform (AP) software. The AP backend server is composed of the following services.
- API gateway
- Identity
- Data pipeline
The key and port of each service are described below. The key refers to the name used in configuration directories and files. The port is the default port on which the service listens for incoming requests.
Name | Key | Port |
---|---|---|
API gateway | bao-api-gateway | 8085 |
Identity | bao-identity | 8086 |
Data pipeline | bao-data-pipeline | 8084 |
User¶
Create an operating system user for running the AP services. This guide uses bao-admin as the username, though any valid username can be used. The user has no password. For security reasons, avoiding password-based login and instead using SSH key-based login is strongly recommended.
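A minimal sketch for creating the user, assuming a standard Linux system with useradd available:
sudo useradd --create-home --shell /bin/bash bao-admin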
For security reasons, the AP services should not run as a privileged user. It may, however, be practical to allow sudo without a password:
sudo usermod -aG sudo bao-admin
echo "bao-admin ALL=(ALL) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/50-ap-users
Create the SSH directory and add the authorized keys file. Add the public keys of the users who should have access.
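A sketch of the directory and file setup, assuming the OpenSSH defaults of ~/.ssh and authorized_keys and a home directory of /home/bao-admin:
sudo -u bao-admin mkdir -p /home/bao-admin/.ssh
sudo -u bao-admin touch /home/bao-admin/.ssh/authorized_keys
sudo -u bao-admin chmod 700 /home/bao-admin/.ssh
sudo -u bao-admin chmod 600 /home/bao-admin/.ssh/authorized_keys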
SSH¶
Carefully confirm that public key authentication to the server succeeds, i.e. that login works without specifying a password.
Disable password-based authentication for enhanced security. Create an SSH daemon configuration file and add the following properties.
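A sketch, assuming an OpenSSH server which reads drop-in files from /etc/ssh/sshd_config.d (the file name 60-ap.conf is an arbitrary choice):
# /etc/ssh/sshd_config.d/60-ap.conf
PubkeyAuthentication yes
PasswordAuthentication no
KbdInteractiveAuthentication no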
Restart the SSH daemon for the changes to take effect.
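The SSH service name varies by distribution, for example ssh on Debian and Ubuntu and sshd on RHEL-based systems:
sudo systemctl restart sshd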
JAR files¶
Each service is available as an executable JAR file.
The JAR files should be installed at the following locations.
JAR file | File location |
---|---|
bao-api-gateway.jar | /var/lib/bao-api-gateway/bao-api-gateway.jar |
bao-identity.jar | /var/lib/bao-identity/bao-identity.jar |
bao-data-pipeline.jar | /var/lib/bao-data-pipeline/bao-data-pipeline.jar |
Create the directories manually and make bao-admin the owner.
sudo mkdir /var/lib/bao-api-gateway
sudo mkdir /var/lib/bao-identity
sudo mkdir /var/lib/bao-data-pipeline
sudo chown bao-admin:bao-admin /var/lib/bao-api-gateway
sudo chown bao-admin:bao-admin /var/lib/bao-identity
sudo chown bao-admin:bao-admin /var/lib/bao-data-pipeline
Place the JAR files in the respective directories and make bao-admin the owner.
sudo cp bao-api-gateway.jar /var/lib/bao-api-gateway
sudo cp bao-identity.jar /var/lib/bao-identity
sudo cp bao-data-pipeline.jar /var/lib/bao-data-pipeline
sudo chown bao-admin:bao-admin /var/lib/bao-api-gateway/bao-api-gateway.jar
sudo chown bao-admin:bao-admin /var/lib/bao-identity/bao-identity.jar
sudo chown bao-admin:bao-admin /var/lib/bao-data-pipeline/bao-data-pipeline.jar
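The systemd service files below start the JAR files directly via ExecStart, which requires the executable bit to be set. If the JAR files are not already executable, a sketch:
sudo chmod +x /var/lib/bao-api-gateway/bao-api-gateway.jar
sudo chmod +x /var/lib/bao-identity/bao-identity.jar
sudo chmod +x /var/lib/bao-data-pipeline/bao-data-pipeline.jar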
Systemd¶
The systemd service manager is used to manage the service processes. Each service has a corresponding systemd service file and a configuration file.
The systemd service files are specified below. The memory allocations should be adjusted to the available server resources. The systemd service files should be located in the /etc/systemd/system directory.
Systemd file | File location |
---|---|
bao-api-gateway.service | /etc/systemd/system/bao-api-gateway.service |
bao-identity.service | /etc/systemd/system/bao-identity.service |
bao-data-pipeline.service | /etc/systemd/system/bao-data-pipeline.service |
The bao-api-gateway.service systemd service file.
[Unit]
Description = AP API Gateway
[Service]
Environment = "JAVA_OPTS=-Xms256M -Xmx512M"
ExecStart = /var/lib/bao-api-gateway/bao-api-gateway.jar
User = bao-admin
[Install]
WantedBy = multi-user.target
The bao-identity.service systemd service file.
[Unit]
Description = AP Identity
[Service]
Environment = "JAVA_OPTS=-Xms1024M -Xmx2048M"
ExecStart = /var/lib/bao-identity/bao-identity.jar
User = bao-admin
[Install]
WantedBy = multi-user.target
The bao-data-pipeline.service systemd service file.
[Unit]
Description = AP Data Pipeline
[Service]
Environment = "JAVA_OPTS=-Xms1024M -Xmx2048M"
ExecStart = /var/lib/bao-data-pipeline/bao-data-pipeline.jar
User = bao-admin
[Install]
WantedBy = multi-user.target
To enable the services on boot, invoke the following commands.
sudo systemctl enable bao-api-gateway
sudo systemctl enable bao-identity
sudo systemctl enable bao-data-pipeline
To start the services using systemd, after the JAR files and configuration files are installed, invoke the following commands. The daemon-reload command makes systemd pick up new or changed service files.
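sudo systemctl daemon-reload
sudo systemctl start bao-api-gateway
sudo systemctl start bao-identity
sudo systemctl start bao-data-pipeline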
To stop a service using systemd, invoke the following command, using the identity service as an example.
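sudo systemctl stop bao-identity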
PostgreSQL¶
The AP identity and data pipeline services use PostgreSQL for persistence. Note that the PostgreSQL database contains metadata for data pipelines, views and more, while the analytical data itself is stored in a data warehouse such as ClickHouse. The names given to the databases and users can be adjusted as preferred; the following names are suggestions.
Database name | Database user | Encoding |
---|---|---|
baoidentity | baoidentity | UTF-8 |
baodatapipeline | baodatapipeline | UTF-8 |
Users¶
Create the required users. Switch to the postgres operating system user and connect to PostgreSQL with the psql CLI.
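A sketch, assuming a standard PostgreSQL installation where the postgres operating system user can connect via peer authentication:
sudo su - postgres
psql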
Create users for the identity and data pipeline services. Replace mypassword1 and mypassword2 with strong passwords, and take note of them securely.
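A sketch using the suggested usernames; mypassword1 and mypassword2 are placeholders:
CREATE USER baoidentity WITH PASSWORD 'mypassword1';
CREATE USER baodatapipeline WITH PASSWORD 'mypassword2';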
Databases¶
Create databases for the identity and data pipeline services. Set encoding to UTF-8.
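A sketch using the suggested names, making each service user the owner of its database; the explicit UTF8 encoding assumes a compatible template database:
CREATE DATABASE baoidentity OWNER baoidentity ENCODING 'UTF8';
CREATE DATABASE baodatapipeline OWNER baodatapipeline ENCODING 'UTF8';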
Exit the CLI with Ctrl+D and then return to the bao-admin user with exit.
Configuration¶
Each service has a corresponding configuration file.
Config file | File location |
---|---|
bao-api-gateway.conf | /opt/bao-api-gateway/bao-api-gateway.conf |
bao-identity.conf | /opt/bao-identity/bao-identity.conf |
bao-data-pipeline.conf | /opt/bao-data-pipeline/bao-data-pipeline.conf |
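If the configuration directories under /opt do not exist yet, a sketch for creating them, mirroring the /var/lib setup above:
sudo mkdir /opt/bao-api-gateway
sudo mkdir /opt/bao-identity
sudo mkdir /opt/bao-data-pipeline
sudo chown bao-admin:bao-admin /opt/bao-api-gateway
sudo chown bao-admin:bao-admin /opt/bao-identity
sudo chown bao-admin:bao-admin /opt/bao-data-pipeline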
API gateway¶
Create the bao-api-gateway.conf configuration file for the API gateway service and restrict its permissions with chmod 600.
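A sketch for creating the empty file with restrictive permissions before adding the content below:
sudo -u bao-admin touch /opt/bao-api-gateway/bao-api-gateway.conf
sudo -u bao-admin chmod 600 /opt/bao-api-gateway/bao-api-gateway.conf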
# ----------------------------------------------------------
# Service to URI mapping
# ----------------------------------------------------------
# Identity service URI
service.identity = http://localhost:8086/
# Data pipeline service URI
service.datapipeline = http://localhost:8084/
# ----------------------------------------------------------
# CORS
# ----------------------------------------------------------
# Allowed origins for CORS
cors.allowed_origins = https://localhost:3000, \
https://localhost:9000
Identity¶
Create the bao-identity.conf configuration file for the identity service. Adjust usernames and passwords to your environment.
# ----------------------------------------------------------
# Database connection
# ----------------------------------------------------------
# JDBC connection URL
connection.url = jdbc:postgresql://127.0.0.1/baoidentity
# JDBC connection username
connection.username = baoidentity
# JDBC connection password (confidential)
connection.password = xxxx
# ----------------------------------------------------------
# Redis
# ----------------------------------------------------------
# Redis hostname / IP address
redis.hostname = 127.0.0.1
# Redis port, optional, default: 6379
redis.port = 6379
# Redis password, optional
redis.password =
# ----------------------------------------------------------
# Apache Pulsar
# ----------------------------------------------------------
# Pulsar hostname / IP address
pulsar.service_url = pulsar://127.0.0.1:6650
# Pulsar TLS authentication plugin, optional, TLS only
# pulsar.tls.auth.plugin =
# Pulsar TLS certificate path, optional, TLS only
# pulsar.tls.trusts.certs.file.path =
# Pulsar TLS certificate file, optional, TLS only
# pulsar.tls.cert.file =
# Pulsar TLS key file, optional, TLS only
# pulsar.tls.key.file =
# ----------------------------------------------------------
# System
# ----------------------------------------------------------
# System hostname / base URL
system.base_url = https://analytics.mydomain.org
# System application title
system.application_title = Analytics Platform
# Log email invitation URLs, disable in prod, debugging only
system.user_invite.logging = off
# Name of issuer for MFA entries
system.mfa_issuer = Analytics Platform
# ----------------------------------------------------------
# Email
# ----------------------------------------------------------
# From address for outgoing emails
email.from.address = noreply@mydomain.org
# ----------------------------------------------------------
# SMTP
# ----------------------------------------------------------
# SMTP hostname or IP address
smtp.host = 127.0.0.1
# SMTP port, default: 587
smtp.port = 587
# SMTP TLS
smtp.tls = true
# SMTP username
smtp.user = myuser
# SMTP password
smtp.password = xxxx
Data pipeline¶
Create the bao-data-pipeline.conf configuration file for the data pipeline service. Adjust usernames and passwords to your environment.
# ----------------------------------------------------------
# Database connection
# ----------------------------------------------------------
# JDBC connection URL
connection.url = jdbc:postgresql://127.0.0.1/baodatapipeline
# JDBC connection username
connection.username = baodatapipeline
# JDBC connection password (confidential)
connection.password = xxxx
# ----------------------------------------------------------
# Redis
# ----------------------------------------------------------
# Redis hostname / IP address
redis.hostname = 127.0.0.1
# Redis port, optional, default: 6379
redis.port = 6379
# Redis password, optional
redis.password =
# ----------------------------------------------------------
# Apache Pulsar
# ----------------------------------------------------------
# Pulsar hostname / IP address
pulsar.service_url = pulsar://127.0.0.1:6650
# Pulsar TLS authentication plugin, optional, TLS only
# pulsar.tls.auth.plugin =
# Pulsar TLS certificate path, optional, TLS only
# pulsar.tls.trusts.certs.file.path =
# Pulsar TLS certificate file, optional, TLS only
# pulsar.tls.cert.file =
# Pulsar TLS key file, optional, TLS only
# pulsar.tls.key.file =
# ----------------------------------------------------------
# System
# ----------------------------------------------------------
# System hostname / base URL
system.base_url = https://analytics.mydomain.org
# Retain temporary data files (debugging only)
system.retain_temp_files = off
# Sample size for dataset column type detection, default: 5k
system.max_sample_size = 500000
# Email address to send alert messages on error
system.error.alert_email = alerts@mydomain.org
# ----------------------------------------------------------
# Blobstore (local filesystem only)
# ----------------------------------------------------------
# Root directory for local file system blob storage
blobstore.root_dir = /var/lib/bao-data-pipeline/data
# ----------------------------------------------------------------------
# OpenAI [Optional]
# ----------------------------------------------------------------------
# OpenAI API key
openai.api_key =
# OpenAI model, can be 'default', 'gpt-4o-mini', 'gpt-4o', 'o3-mini'
openai.model = default
# ----------------------------------------------------------------------
# Google [Optional]
# ----------------------------------------------------------------------
# API key
google.gemini.api_key =
Encryption¶
The data pipeline service encrypts all secrets at the database level and requires an encryption key to be provided.
Note
Store the encryption key in a secure manner!
The encryption key should be stored in a secure and confidential way. If the key is lost, the encrypted database content cannot be recovered. If the key is exposed, an attacker could use the key to decrypt the database secrets.
The Tink Java library is used for encryption. An encryption key can be generated using Tinkey, the Tink CLI.
The encryption key file name is bao-data-pipeline-key.json and the content is in JSON format.
Download Tinkey from the following URL.
Uncompress the tarball in a suitable location. Generate the key with the following command.
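A sketch, assuming the AES256_GCM key template to match the AesGcmKey type shown in the example file below:
./tinkey create-keyset --key-template AES256_GCM --out-format json --out bao-data-pipeline-key.json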
Create and store the encryption key file in the data pipeline configuration directory.
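A sketch, placing the key file in the configuration directory with restrictive permissions:
sudo cp bao-data-pipeline-key.json /opt/bao-data-pipeline/bao-data-pipeline-key.json
sudo chown bao-admin:bao-admin /opt/bao-data-pipeline/bao-data-pipeline-key.json
sudo chmod 600 /opt/bao-data-pipeline/bao-data-pipeline-key.json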
Example encryption key file.
{
"primaryKeyId": 0000000000,
"key": [
{
"keyData": {
"typeUrl": "type.googleapis.com/google.crypto.tink.AesGcmKey",
"value": "{secret}",
"keyMaterialType": "SYMMETRIC"
},
"status": "ENABLED",
"keyId": 0000000000,
"outputPrefixType": "TINK"
}
]
}
Data cache¶
When AP ingests data from various data sources, it caches data in the form of data files, which are temporarily stored on the filesystem of the server where AP is deployed. Depending on the data sources, significant storage capacity is required. However, data is deleted when a data load process completes, meaning the data volume will not grow over time.
The data cache directory name is data-pipeline, and it is located below the configuration directory.
Create the directory manually.
CACHE_DIR="/opt/bao-data-pipeline/data-pipeline"
sudo mkdir $CACHE_DIR
sudo chown bao-admin:bao-admin $CACHE_DIR
sudo chmod 755 $CACHE_DIR
Data storage¶
Note
This section applies only to on-premise server data storage environments.
When deploying AP in on-premise server environments, take care to provision a storage device (disk or SSD) with appropriate capacity. 500 GB is a reasonable starting point. Separate storage devices may be provisioned for the AP software and for the data storage.
The configuration property blobstore.root_dir in bao-data-pipeline.conf defines the root directory for data storage on the local filesystem. It allows for storing data on a dedicated storage device (disk or SSD). The default location is /var/lib/bao-data-pipeline/data. Create the data directory manually.
DATA_DIR="/var/lib/bao-data-pipeline/data"
sudo mkdir $DATA_DIR
sudo chown bao-admin:bao-admin $DATA_DIR
sudo chmod 755 $DATA_DIR
In the following configuration section, the blob store container name is specified per client (tenant). In an on-premise environment, create a directory below the root data directory to represent the container, using the specified container name. This guide uses bao-ap-client-main as the container name for the default client, though any container name can be used. Create the directory manually. The data and client directories should be located on a storage medium with appropriate capacity.
CLIENT_DIR="/var/lib/bao-data-pipeline/data/bao-ap-client-main"
sudo mkdir $CLIENT_DIR
sudo chown bao-admin:bao-admin $CLIENT_DIR
sudo chmod 755 $CLIENT_DIR
The data storage location can be defined with the blobstore.root_dir property in the bao-data-pipeline.conf configuration file.
Read me¶
The following content is convenient to maintain in a readme.md file.
# Analytics Platform
## Redis
redis-cli -h 127.0.0.1
## Apache Pulsar
sudo systemctl status apache-pulsar
sudo systemctl restart apache-pulsar
sudo journalctl -n 500 -f -u apache-pulsar
## Nginx
sudo systemctl status nginx
sudo systemctl restart nginx
sudo tail -f /var/log/nginx/access.log
## Apache Superset
sudo systemctl status apache-superset
sudo systemctl restart apache-superset
sudo journalctl -n 500 -f -u apache-superset
## AP service status
sudo systemctl status bao-api-gateway
sudo systemctl status bao-identity
sudo systemctl status bao-data-pipeline
## AP service restart
sudo systemctl restart bao-api-gateway
sudo systemctl restart bao-identity
sudo systemctl restart bao-data-pipeline
## AP service logging
sudo journalctl -n 500 -f -u bao-api-gateway -u bao-identity -u bao-data-pipeline -o cat
sudo journalctl -n 500 -f -u bao-data-pipeline
Debug¶
To adjust the log level for the Java services, append the following parameter to the ExecStart property in the appropriate systemd service file. The com.bao part of the parameter value refers to the package of the classes to which the logging level applies.
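A sketch, assuming the services accept Spring Boot style logging arguments; the DEBUG level and the identity service are example choices:
ExecStart = /var/lib/bao-identity/bao-identity.jar --logging.level.com.bao=DEBUG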