Sysadmin

Welcome to the installation guide for Analytics Platform (AP from now on). This document is intended for system administrators who will be setting up and maintaining the environment required to run AP.

AP requires a Linux operating system. Ubuntu LTS is the recommended distribution. This guide assumes Ubuntu Linux as the operating system and the availability of the systemd system and service manager.
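The operating system requirements above can be verified before installation. The following is a minimal sketch, not part of AP itself; the function names are illustrative. It assumes that the presence of the directory /run/systemd/system indicates a host booted with systemd, which is the same convention sd_booted(3) relies on.

```python
import platform
from pathlib import Path
from typing import Optional


def is_linux(system_name: Optional[str] = None) -> bool:
    """Return True when the operating system reports itself as Linux."""
    name = system_name if system_name is not None else platform.system()
    return name == "Linux"


def systemd_is_running(marker: Path = Path("/run/systemd/system")) -> bool:
    """Return True when the systemd marker directory exists.

    The default path is the directory that exists only on hosts booted
    with systemd; the parameter is there to make the check testable.
    """
    return marker.is_dir()


def host_is_supported() -> bool:
    """Combine both checks: Linux kernel and systemd as init system."""
    return is_linux() and systemd_is_running()


if __name__ == "__main__":
    print("Linux:", is_linux())
    print("systemd:", systemd_is_running())
```

The script only reports whether the host looks suitable; it does not check the distribution or its version.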

AP supports a variety of public cloud providers, data storage providers and data warehouses. It can be deployed in public cloud environments and in Linux-based, on-premise server environments.

Data platform

AP uses a data storage provider to ingest and store raw data files from multiple sources in their native format. The following combinations of infrastructure, data storage and data warehouse are supported.

Infrastructure   Data storage         Data warehouse
AWS              Amazon S3            ClickHouse
AWS              Amazon S3            Amazon Redshift
Azure            Azure Blob Storage   SQL Database
Azure            Azure Blob Storage   Synapse
On-prem          Local filesystem     ClickHouse
On-prem          Local filesystem     PostgreSQL
On-prem          Local filesystem     SQL Server
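
When planning a deployment, it can help to check a proposed combination against the supported ones listed above. The sketch below encodes the table as plain data; the function names are illustrative and are not part of any AP API.

```python
from typing import List

# Each entry is (infrastructure, data storage, data warehouse),
# copied from the table of supported combinations.
SUPPORTED_COMBINATIONS = {
    ("AWS", "Amazon S3", "ClickHouse"),
    ("AWS", "Amazon S3", "Amazon Redshift"),
    ("Azure", "Azure Blob Storage", "SQL Database"),
    ("Azure", "Azure Blob Storage", "Synapse"),
    ("On-prem", "Local filesystem", "ClickHouse"),
    ("On-prem", "Local filesystem", "PostgreSQL"),
    ("On-prem", "Local filesystem", "SQL Server"),
}


def is_supported(infrastructure: str, storage: str, warehouse: str) -> bool:
    """Return True when the exact combination appears in the table."""
    return (infrastructure, storage, warehouse) in SUPPORTED_COMBINATIONS


def warehouses_for(infrastructure: str) -> List[str]:
    """List the data warehouses available on a given infrastructure."""
    return sorted(w for i, _, w in SUPPORTED_COMBINATIONS if i == infrastructure)
```

For example, `is_supported("AWS", "Amazon S3", "PostgreSQL")` is false because that row does not appear in the table.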

Data storage

AP supports three providers for data storage:

  • Amazon S3
  • Azure Blob Storage
  • Local filesystem

Amazon S3 and Azure Blob Storage are scalable, highly durable and cost-effective public cloud storage services that allow users to store and retrieve any amount of data from anywhere on the web. These services integrate well with the wider ecosystem of data services in the AWS and Azure public clouds, respectively.

Local filesystem refers to using a regular server with attached disk storage. This approach uses the server's file system, and data files are stored in regular directories. As high-speed reading is not a priority, HDDs (Hard Disk Drives) are a cost-effective and feasible option, as opposed to faster but more expensive SSDs (Solid State Drives).

Data warehouses

  • ClickHouse
  • Amazon Redshift
  • Azure SQL Database
  • Azure Synapse
  • PostgreSQL
  • Microsoft SQL Server

In on-premise environments, ClickHouse is the preferred data warehouse due to its open source license, well-documented server installation, and high-performance data ingestion and querying.

AP software architecture

Middleware

AP depends on the following middleware components, which must be available on the system.

  • OpenJDK 17: A robust and widely-used open-source implementation of the Java Platform which provides the runtime environment necessary for running Java applications. The AP backend services are written in Java 17.
  • PostgreSQL: Version 14 or later. A powerful, open-source relational database management system that offers advanced features such as complex queries, foreign keys, triggers, and up-to-date compliance with SQL standards. The AP backend services use PostgreSQL databases for persistence of data.
  • nginx: A high-performance, open-source HTTP server and reverse proxy that is essential for handling web traffic, load balancing, and serving static content efficiently.
  • Redis: An in-memory, open-source key-value store that provides lightning-fast data retrieval, making it ideal for caching and supporting real-time analytics, session management, and message brokering.
  • Apache Pulsar: An open-source distributed messaging and streaming platform that enables reliable, scalable, and low-latency data streaming and message queueing, suitable for event-driven applications.
  • ClickHouse: A high-performance, open-source columnar database management system designed for online analytical processing (OLAP) and real-time data analytics at scale. AP uses ClickHouse as a data warehouse for analytical data processing and querying.
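
A quick pre-flight check can confirm that the middleware listed above is present on the server. The sketch below rests on assumptions: the executable names in `ASSUMED_EXECUTABLES` are typical for package-based installations but may differ on your system, and Apache Pulsar is left out because its launcher script often lives in its own installation directory rather than on the PATH. The version parser expects the first line that `java -version` prints, for example `openjdk version "17.0.9" 2023-10-17`.

```python
import re
import shutil
from typing import Callable, Dict, List, Optional

# Assumed executable names, not taken from the AP documentation.
ASSUMED_EXECUTABLES: Dict[str, str] = {
    "OpenJDK": "java",
    "PostgreSQL": "psql",
    "nginx": "nginx",
    "Redis": "redis-server",
    "ClickHouse": "clickhouse-server",
}


def find_missing(
    executables: Dict[str, str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the components whose executable is not found on the PATH."""
    return sorted(name for name, exe in executables.items() if which(exe) is None)


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Handles both the modern scheme ("17.0.9") and the legacy
    scheme ("1.8.0_292", which means Java 8).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major
```

Calling `find_missing(ASSUMED_EXECUTABLES)` on the target server returns the names of components that could not be located. Note that `java -version` writes to standard error, so capture that stream when feeding its output to `parse_java_major`.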

AP is based on several independent services.

  • API Gateway: The API gateway is responsible for routing API requests to the appropriate backend service. It manages authentication and user sessions.
  • Identity: The identity service is responsible for security, authentication, authorization, and for user and client management.
  • Data pipeline: The data pipeline service is the main component of AP and is responsible for the data catalog, data pipelines, views, destinations, workflows and data quality checks.
  • Web UI: The UI is composed of two web apps written in JavaScript with React: the analytics platform web app and the user management web app.

AP is deployed as executable JAR files, managed by the systemd system and service manager. A Docker image is planned but not yet available.
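
To illustrate what running an executable JAR under systemd involves, the sketch below renders a basic unit file. Everything specific in it is an assumption made for illustration: the service description, the `ap` user, the Java path and the JAR location are hypothetical and do not come from the AP distribution.

```python
UNIT_TEMPLATE = """\
[Unit]
Description={description}
After=network.target

[Service]
Type=simple
User={user}
ExecStart={java} -jar {jar_path}
Restart=on-failure

[Install]
WantedBy=multi-user.target
"""


def render_unit(
    description: str,
    jar_path: str,
    user: str = "ap",              # hypothetical service account
    java: str = "/usr/bin/java",   # assumed location of the Java launcher
) -> str:
    """Render the text of a systemd service unit for one executable JAR."""
    return UNIT_TEMPLATE.format(
        description=description, user=user, java=java, jar_path=jar_path
    )


if __name__ == "__main__":
    # Hypothetical example values, shown only to demonstrate the output.
    print(render_unit("AP identity service", "/opt/ap/identity.jar"))
```

A unit file of this kind is typically saved under /etc/systemd/system with a .service extension, followed by `systemctl daemon-reload` and `systemctl enable --now` for the service name.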

Software architecture

AP is multi-tenant, web-based software. Multi-tenancy is an application architecture in which a single instance of the software serves multiple "tenants", also known as clients or organizations. Each tenant's data and configuration are isolated, ensuring security and privacy, but all tenants share the same underlying infrastructure and codebase. This approach allows for efficient use of resources, as the software instance can be maintained and updated centrally while still catering to the unique needs of different tenants. For an on-premise installation used by a single organization, either a single tenant can be configured or, alternatively, individual tenants for development, testing and production. The high-level architecture of AP is shown in the diagram below.

[Figure: AP software architecture]
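
The multi-tenancy idea described in this section, one shared instance with isolated per-tenant configuration, can be sketched in a few lines. This is a conceptual illustration only and does not reflect how AP implements tenants internally; the class names, field names and example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class TenantConfig:
    """Per-tenant settings; the fields are hypothetical examples."""
    storage_location: str
    warehouse_database: str


class TenantRegistry:
    """A single shared instance that keeps each tenant's config separate."""

    def __init__(self) -> None:
        self._tenants: Dict[str, TenantConfig] = {}

    def register(self, tenant_id: str, config: TenantConfig) -> None:
        if tenant_id in self._tenants:
            raise ValueError(f"tenant already registered: {tenant_id}")
        self._tenants[tenant_id] = config

    def config_for(self, tenant_id: str) -> TenantConfig:
        """Look up one tenant's config; other tenants are never exposed."""
        try:
            return self._tenants[tenant_id]
        except KeyError:
            raise LookupError(f"unknown tenant: {tenant_id}") from None


# A single organization with separate development, testing and
# production tenants, as mentioned in the text.
registry = TenantRegistry()
for env in ("dev", "test", "prod"):
    registry.register(env, TenantConfig(f"/data/ap/{env}", f"ap_{env}"))
```

Every request is resolved against exactly one tenant ID, so the shared code path stays the same while storage locations and warehouse databases remain separate per tenant.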

Network architecture

The AP network architecture for on-prem hosting environments is shown in the diagram below. It depicts a typical setup with a DHIS2 instance as the data source, the AP multi-tenant service, and tenant-specific data storage and data warehouse.

[Figure: AP network architecture]

Tech stack

The AP software is built using a client-server architecture, where the client (front-end) communicates with the server (backend) over a REST HTTP API.

  • Database: The transactional database for metadata storage is PostgreSQL.
  • Backend: Backend services are written in Java using OpenJDK 17. Major frameworks are Spring Boot, Hibernate and Apache Commons. Testcontainers and JUnit are used for unit and integration testing.
  • Front-end: The front-end web apps are written in JavaScript with the React framework and the Ant Design UI library.