Middleware installation

This guide covers installation of required middleware for the Analytics Platform (AP). This guide assumes Ubuntu Linux 22.04 LTS is used as the operating system and that the reader has some familiarity with Linux and terminals. The text editor used is nano.

Please consider the following:

  • There are many approaches to hosting a Java-based application such as AP. This guide outlines one of them.
  • Topics including security hardening and backup strategy are important but beyond the scope of this guide.
  • There may be several managed cloud middleware offerings available. This guide is focused on the on-premise installation scenario.

OpenJDK 17

Start by updating the operating system packages.

sudo apt update && sudo apt upgrade -y

Install OpenJDK version 17.

sudo apt install -y openjdk-17-jdk
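
To confirm that the expected Java version is active after installation, the version string can be inspected. The following is a minimal sketch; the helper name java_major is hypothetical and not part of AP or OpenJDK, and it assumes the usual `openjdk version "17.x.y"` output format.

```shell
# Hypothetical helper: extract the major version number from the first
# line printed by `java -version`, e.g. openjdk version "17.0.9" 2023-10-17
java_major() {
  printf '%s\n' "$1" | sed -n 's/.*version "\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage (note that `java -version` writes to stderr):
#   java_major "$(java -version 2>&1 | head -n 1)"
```

If the printed value is not 17, another JDK is probably selected as the system default.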

PostgreSQL 14

Install PostgreSQL version 14. Note that later versions of PostgreSQL are supported. The installation of PostgreSQL is well covered in online installation guides.

sudo apt install -y postgresql-14

The PostgreSQL service is enabled on boot by default after installation. Verify the status of the PostgreSQL process.

sudo systemctl status postgresql

Set the PostgreSQL authentication method to md5.

sudo nano /etc/postgresql/14/main/pg_hba.conf

Make sure the authentication method is set to md5 for localhost connections, typically by modifying the two last lines.

host    all             all             127.0.0.1/32            md5
host    all             all             ::1/128                 md5
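
As a quick sanity check after editing, the localhost rules can be scanned for any method other than md5. This is only a sketch: the function name is hypothetical, and it assumes the rules use the five-column layout shown above with "all" as the database.

```shell
# Hypothetical helper: read pg_hba.conf on stdin and print localhost
# "host all ..." rules whose authentication method is not md5.
# Prints nothing when both localhost rules already use md5.
non_md5_localhost_rules() {
  awk '$1 == "host" && $2 == "all" &&
       ($4 == "127.0.0.1/32" || $4 == "::1/128") &&
       $5 != "md5"'
}

# Usage (the file is typically readable only by the postgres user):
#   sudo cat /etc/postgresql/14/main/pg_hba.conf | non_md5_localhost_rules
```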

Adjust performance settings by creating a new configuration file.

nano 10-perf.conf
# PostgreSQL performance settings
max_connections = 100
shared_buffers = 768MB
work_mem = 16MB
maintenance_work_mem = 256MB
temp_buffers = 16MB
effective_cache_size = 2GB
checkpoint_completion_target = 0.8
wal_writer_delay = 1s
random_page_cost = 1.1
max_locks_per_transaction = 1024
track_activity_query_size = 8192

Set owner and permissions for the configuration file, and move it to the PostgreSQL configuration directory.

sudo chown postgres:postgres 10-perf.conf
sudo chmod 644 10-perf.conf
sudo mv 10-perf.conf /etc/postgresql/14/main/conf.d

Restart PostgreSQL to have changes take effect.

sudo systemctl restart postgresql
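
After the restart, it can be useful to confirm that the server picked up the new values. The sketch below assumes the file was moved to the conf.d directory as described above; the helper name conf_settings is hypothetical.

```shell
# Hypothetical helper: read a postgresql.conf-style file on stdin and print
# "name value" pairs, skipping comments and blank lines.
conf_settings() {
  sed -e 's/#.*//' -e '/^[[:space:]]*$/d' -e 's/[[:space:]]*=[[:space:]]*/ /'
}

# Usage: compare each setting in the file with the value the server reports.
#   conf_settings < /etc/postgresql/14/main/conf.d/10-perf.conf |
#   while read -r name value; do
#     echo "$name: file=$value server=$(sudo -u postgres psql -Atc "SHOW $name" </dev/null)"
#   done
```

A mismatch between the two columns usually means the file is not being included or the service was not restarted.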

nginx

Install nginx.

sudo apt install -y nginx

The nginx service is enabled on boot by default after installation. Verify the status of the nginx process.

sudo systemctl status nginx

Configure a proxy cache inside the http element of the nginx config.

sudo nano /etc/nginx/nginx.conf
http {
  proxy_cache_path  /var/cache/nginx  levels=1:2  keys_zone=ap:20m  inactive=1d;
}

Configure nginx by creating a file analytics-platform.conf and place it in the nginx sites-available directory.

sudo nano /etc/nginx/sites-available/analytics-platform.conf

The example configuration uses SSL and serves the static web app UI from Amazon S3.

  • SSL and certificate configuration are left out, and should be configured appropriately.
  • The apigateway, web and identity services are defined as upstreams and referred to later in the config.
  • The manager and user web apps are served from Amazon S3.
  • Additional security hardening may be appropriate in a production environment.
  • Update server_name from ap.mydomain.org to match your environment.
# Upstream

upstream apigateway {
  server 127.0.0.1:8085;
}

upstream web {
  server 127.0.0.1:8081;
}

upstream identity {
  server 127.0.0.1:8086;
}

# Redirect HTTP to HTTPS
server {
  listen        [::]:80;
  listen        80;
  server_name   ap.mydomain.org;

  return 301 https://$host$request_uri;
}

# HTTPS server
server {
  listen       [::]:443 ssl;
  listen       443 ssl;
  server_name  ap.mydomain.org;

  # Compression
  gzip         on;
  gzip_types   application/json application/javascript text/javascript text/css text/plain;

  # Includes for the default hostname
  include  default.d/*.conf;

  # Includes for the default hostname under HTTPS
  include  default.d/*-https.inc;

  # https://developer.mozilla.org/en-US/docs/Web/HTTP/X-Frame-Options
  add_header  X-Frame-Options DENY;

  # https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP
  add_header  Content-Security-Policy "frame-ancestors 'none';";

  # Enable Strict Transport Security (HSTS) for https
  add_header Strict-Transport-Security "max-age=31536000" always;

  # Redirect the root URL to the manager web app (login page)
  location = / {
    return 301 https://$host/manager/;
  }

  # Proxy settings
  proxy_set_header         host               $http_host;
  proxy_set_header         x-forwarded-host   $host;
  proxy_set_header         x-real-ip          $remote_addr;
  proxy_set_header         x-forwarded-for    $proxy_add_x_forwarded_for;
  proxy_set_header         x-forwarded-proto  $scheme;
  proxy_set_header         x-forwarded-port   $server_port;

  proxy_buffer_size        128k;
  proxy_buffers            8 128k;
  proxy_busy_buffers_size  256k;

  # Proxy forwards

  # Login check and logout
  location /login_check {
    proxy_pass             http://identity/login_check;
  }

  location /session_logout {
    proxy_pass             http://identity/session_logout;
  }

  # App
  location ~* ^/(app|doc|node_modules) {
    rewrite ^/(.*)         /$1 break;
    proxy_pass             http://web;
  }

  # Manager web app to Amazon S3  
  location /manager {
    proxy_intercept_errors on;
    proxy_set_header       X-Real-IP $remote_addr;
    proxy_set_header       X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_hide_header      x-amz-id-2;
    proxy_hide_header      x-amz-request-id;
    proxy_pass             http://bao-cloud-manager-prod.s3-website-us-east-1.amazonaws.com/manager;
    proxy_cache            ap;
  }

  # User web app to Amazon S3
  location /users {
    proxy_intercept_errors on;
    proxy_set_header       x-real-ip $remote_addr;
    proxy_set_header       x-forwarded-for $proxy_add_x_forwarded_for;
    proxy_hide_header      x-amz-id-2;
    proxy_hide_header      x-amz-request-id;
    proxy_pass             http://bao-cloud-manager-prod.s3-website-us-east-1.amazonaws.com/users;
    proxy_cache            ap;
  }

  # Increased max upload size and timeout for file upload API endpoints
  location /api/dataPipelines {
    proxy_pass             http://apigateway/api/dataPipelines;
    client_max_body_size   2048M;
    proxy_read_timeout     600;
    proxy_connect_timeout  600;
    proxy_send_timeout     600;
  }

  # API requests to API gateway service  
  location /api {
    proxy_pass             http://apigateway/api;
  }
}

Enable the server configuration by creating a symlink to the nginx sites-enabled directory.

sudo ln -s /etc/nginx/sites-available/analytics-platform.conf \
/etc/nginx/sites-enabled/analytics-platform.conf

Remove the default server configuration file.

sudo rm /etc/nginx/sites-enabled/default

Restart nginx to make changes take effect.

sudo systemctl restart nginx
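
When the configuration is changed later, it may help to validate the syntax before restarting, so that a broken file does not take the proxy down. A minimal sketch, using the standard `nginx -t` check; the function name is hypothetical.

```shell
# Hypothetical helper: restart nginx only if the configuration test passes.
restart_nginx_if_valid() {
  sudo nginx -t && sudo systemctl restart nginx
}

# Usage:
#   restart_nginx_if_valid || echo "nginx configuration is invalid, not restarted"
```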

Redis

Install the Redis server.

sudo apt install -y redis-server

The Redis service is enabled on boot by default after installation. Verify the status of the Redis process.

sudo systemctl status redis
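
Beyond the service status, a reachable Redis server answers the PING command with PONG. The following sketch wraps that check; the function name and the REDIS_CLI override variable are hypothetical and exist only to make the example easy to adapt.

```shell
# Hypothetical helper: succeed if the local Redis server answers PING with PONG.
# REDIS_CLI is an illustrative override; by default redis-cli is used.
redis_is_up() {
  [ "$("${REDIS_CLI:-redis-cli}" ping 2>/dev/null)" = "PONG" ]
}

# Usage:
#   redis_is_up && echo "Redis is responding"
```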

Edit the Redis configuration file.

sudo nano /etc/redis/redis.conf

Apache Pulsar

Installation

Install Apache Pulsar using the binary distribution. First, download Pulsar with wget and extract the archive with tar. Alternatively, visit the Apache Pulsar downloads page. You may want to check for a later version of Apache Pulsar.

PULSAR_VER="3.3.3"
wget https://archive.apache.org/dist/pulsar/pulsar-${PULSAR_VER}/apache-pulsar-${PULSAR_VER}-bin.tar.gz
tar xvfz apache-pulsar-${PULSAR_VER}-bin.tar.gz
mv apache-pulsar-${PULSAR_VER} apache-pulsar

Set root as owner and make binary files executable.

sudo chown root:root -R apache-pulsar
sudo chmod +x apache-pulsar/bin/pulsar*

Configuration

Optional: Adjust memory usage by modifying pulsar_env.sh.

sudo nano apache-pulsar/conf/pulsar_env.sh

Set the PULSAR_MEM variable and specify memory usage, adjusted to available server resources.

PULSAR_MEM=${PULSAR_MEM:-"-Xms2g -Xmx2g -XX:MaxDirectMemorySize=2g"}

Optional: Change the port of the Pulsar HTTP server so that it does not occupy port 8080.

sudo nano apache-pulsar/conf/standalone.conf

Set the webServicePort property to 8098.

webServicePort=8098
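
For scripted installations, the same change can be made without an interactive editor. The sketch below assumes the property already exists in the file as a `key=value` line and that the value contains no characters special to sed; the function name is hypothetical.

```shell
# Hypothetical helper: rewrite an existing "key=..." line in a properties file.
# $1: file, $2: key, $3: new value (assumed free of "|" and "&" characters)
set_conf_property() {
  sudo sed -i "s|^$2=.*|$2=$3|" "$1"
}

# Usage:
#   set_conf_property apache-pulsar/conf/standalone.conf webServicePort 8098
```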

Move the directory to a suitable installation location.

sudo mv apache-pulsar /var/lib/apache-pulsar

Create a systemd service file called apache-pulsar.service for running Pulsar in standalone mode.

nano apache-pulsar.service
[Unit]
Description = Apache Pulsar

[Service]
ExecStart = /var/lib/apache-pulsar/bin/pulsar standalone -nss

[Install]
WantedBy = multi-user.target

Note

The -nss flag is set due to Pulsar bug 5668.

Set owner and permissions for the service file.

sudo chown root:root apache-pulsar.service
sudo chmod 644 apache-pulsar.service

Move the service file to the systemd directory.

sudo mv apache-pulsar.service /etc/systemd/system/

Reload the systemd daemon.

sudo systemctl daemon-reload

Enable Pulsar on startup.

sudo systemctl enable apache-pulsar

Start Pulsar.

sudo systemctl start apache-pulsar

Verify that the Pulsar service is running.

sudo systemctl status apache-pulsar

View the Pulsar log.

sudo journalctl -f -u apache-pulsar -n 400

You should now have Pulsar running on port 6650.
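
One way to confirm this is to look for a listening socket on that port. The sketch below parses the output of `ss -ltn`; the function name is hypothetical, and it assumes the local address appears in the fourth column, as in the default `ss` output.

```shell
# Hypothetical helper: read `ss -ltn` output on stdin and succeed if any
# listening socket is bound to the given TCP port.
port_is_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }'
}

# Usage:
#   ss -ltn | port_is_listening 6650 && echo "Pulsar is listening on 6650"
```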

Alternatively, run Pulsar manually in the foreground.

sudo /var/lib/apache-pulsar/bin/pulsar standalone

Troubleshooting: If Apache Pulsar fails to start due to local data corruption, one solution is to stop the service, delete the local data directory, and start the service again. Local data will be lost. However, Apache Pulsar topics are not persisted, and the data directory is recreated on the next start.

/var/lib/apache-pulsar/data
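
The recovery steps described above can be sketched as a small shell function. The function name is hypothetical; the service name and default path are the ones used in this guide. Since the data directory is deleted irreversibly, check the path before running it.

```shell
# Hypothetical helper: stop Pulsar, delete its local data directory, start it again.
# $1: data directory (defaults to the location used in this guide)
reset_pulsar_data() {
  data_dir=${1:-/var/lib/apache-pulsar/data}
  sudo systemctl stop apache-pulsar &&
    sudo rm -rf -- "$data_dir" &&
    sudo systemctl start apache-pulsar
}

# Usage:
#   reset_pulsar_data
```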

Extra

The packages from the standard Ubuntu repositories can be installed with a single command.

sudo apt update && \
sudo apt upgrade -y && \
sudo apt install -y openjdk-17-jdk postgresql-14 nginx redis-server unzip