Data pipelines¶

The following section covers how to view, create, update and remove data pipelines.

View data pipeline¶

Click Data catalog in the left-side menu to view all data pipelines.
Click the name of a data pipeline to view more information.

Create data pipeline¶

The starting point for creating a new data pipeline is the data catalog. The data catalog displays the existing data pipelines, also referred to as datasets.

Click Connect data from the top-right corner.
Choose the data source for which you want to create a data pipeline.

General settings

In the General settings section, enter the following information. This section is present for all data pipeline types.

Field	Description
Name	The name of the data pipeline (required)
Refresh schedule	The interval for when to refresh data from the data source (required)
Description	A description of the data pipeline
Owner	The owner of the source data or system
URL	A URL to the source data or system
Tags	Free-text tags that categorize the source data
Disable pipeline	Whether to disable loading of source data
Reference	A reference text for the data source
Link to source	A URL referring to information about the data source
Link to terms of use	A URL referring to the terms of use for the data source

Data warehouse target

In the Data warehouse target section, enter the following information. This section appears last and is present for all data pipeline types.

There are two options for specifying the data warehouse target.

New schema

By selecting new schema, a dedicated, new schema will be created for the data pipeline, and the data pipeline will create and populate tables in the new schema.

Field	Description
Schema name	The name of the schema to create

Existing schema

By selecting existing schema, the selected schema will be used, and the data pipeline will create and populate tables in the existing schema. A base table name will be used as a prefix for all tables created by the data pipeline, to distinguish them from other tables with the same names.

Field	Description
Table schema	The existing data warehouse schema in which to create tables
Table name	The table name, or for multi-table data pipelines, the base name for all tables

The following sections describe the connection and settings fields for each type of data pipeline.

BHIMA¶

Topic	Value
Connection	Web API
Authentication	API username/password
Data model	Project

Connection

Field	Description
URL	The URL for the BHIMA instance
Username	Username for the BHIMA account
Password	Password for the BHIMA account
Project	Project for which to exchange data

Settings

Stock usage

Field	Description
Depot	The depot to load data for
Inventory	The inventory to load data for
Avg consumption algorithm	The algorithm to use for average consumption calculation
Monthly interval	The monthly interval

Stock satisfaction rate

Field	Description
Start date	Start date for satisfaction rates
End date	End date for satisfaction rates
Depots	The depots to load data for

CommCare¶

Topic	Value
Connection	Web API
Authentication	API username/token
Data model	Application and form

Settings

Field	Description
Domain	The domain (project) to load data for
Application	The application to load data for
Hash column names	Whether to hash column names to ensure uniqueness

DHIS2¶

Topic	Value
Connection	Web API and database connection
Authentication	API token, API username/password, database username/password
Data model	Aggregate data, program, event and enrollment

Web API

Field	Description
Base URL to web API	Base URL of the DHIS2 instance; do not include `/api`
Username	Username for DHIS2 user account
Password	Password for DHIS2 user account
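A common mistake is pasting a URL that already ends in `/api`. The following is a minimal sketch of how a pasted URL could be tidied before entering it in the Base URL field. The function name and the example hostname are hypothetical and are not part of the AP.

```python
def normalize_base_url(url: str) -> str:
    """Strip whitespace, trailing slashes and a trailing /api segment."""
    cleaned = url.strip().rstrip("/")
    if cleaned.endswith("/api"):
        # Drop the /api suffix, then any slash left in front of it.
        cleaned = cleaned[: -len("/api")].rstrip("/")
    return cleaned


# Hypothetical instance URL, used for illustration only.
print(normalize_base_url("https://dhis2.example.org/hmis/api/"))
# prints https://dhis2.example.org/hmis
```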

Database

Providing a database connection URL and credentials will drastically improve performance, and is required to load enrollment and event data. If a database connection cannot be provided, the database section can be skipped, and the data pipeline will work with an API connection only. For the API connection, only metadata, data set completeness and aggregate data are supported.

Field	Description
Hostname	The hostname of the DHIS2 PostgreSQL database server; do not include a protocol prefix
Port	Port number to the database server, default is 5432
SSL	Whether to enable SSL encryption for the database connection
Trust server certificate	Whether to trust the server SSL certificate for the database connection
Database name	The name of the database
Database username	The username of the database user
Database password	The password of the database user
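To illustrate how the hostname, port and database name fields relate to each other, the sketch below assembles them into a standard PostgreSQL JDBC URL. How the AP builds its connection internally is not documented here, so treat this only as an illustration of why the hostname must not carry a protocol prefix. The function names and the example hostname are hypothetical.

```python
import re

DEFAULT_PORT = 5432  # Default PostgreSQL port, as stated in the table above.


def strip_protocol(hostname: str) -> str:
    """Remove a leading scheme such as 'postgresql://' and any trailing slash."""
    without_scheme = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", hostname.strip())
    return without_scheme.rstrip("/")


def postgres_jdbc_url(hostname: str, database: str, port: int = DEFAULT_PORT) -> str:
    """Combine the connection fields into a PostgreSQL JDBC URL."""
    return f"jdbc:postgresql://{strip_protocol(hostname)}:{port}/{database}"


# Hypothetical server and database names, used for illustration only.
print(postgres_jdbc_url("postgresql://db.example.org/", "dhis2"))
# prints jdbc:postgresql://db.example.org:5432/dhis2
```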

Data types

The data types section provides selections for the data types to load.

Field	Description
Metadata	Include metadata, always enabled
Aggregate data	Include aggregate data values and complete data set registrations
Program	Include events and enrollments, use the drop-down to specify which programs to include, or leave the drop-down blank to include all current and future programs
User	Include users, user groups and user roles, excluding user passwords and secrets
Usage statistics	Include usage statistics
Data store	Include data store, including namespaces, keys and JSON values
File resource	Include file resources
Audit	Include metadata audit

Data filters

The data filters section provides filters for the data to load. All filters are optional.

Field	Description
Data element groups	The data element groups to include
Data elements	The data elements to include
Organisation units	The organisation units to include
Data sets	The data sets to include
Period last time unit	The last time periods of the specified unit to load
Include soft deleted data	Whether to include soft deleted data records
Skip wide aggregate data table	Whether to skip the wide aggregate data table
Include zero data values	Whether to include zero data values
Include narrow event table for programs	Whether to include narrow event table for programs

FHIR¶

The FHIR data pipeline allows for retrieving information, typically related to electronic health records.

Learn more about FHIR in general at fhir.org and FHIR development at build.fhir.org.

Topic	Value
Connection	Web API
Authentication	API username/password

The following FHIR resources are supported.

Resource	Documentation
Code system	build.fhir.org/codesystem.html
Condition	build.fhir.org/condition.html
Encounter	build.fhir.org/encounter.html
Location	build.fhir.org/location.html
Medication	build.fhir.org/medication.html
Observation	build.fhir.org/observation.html
Organization	build.fhir.org/organization.html
Patient	build.fhir.org/patient.html
Person	build.fhir.org/person.html
Practitioner	build.fhir.org/practitioner.html
Questionnaire	build.fhir.org/questionnaire.html
Questionnaire response	build.fhir.org/questionnaireresponse.html
Value set	build.fhir.org/valueset.html

Settings

Field	Description
Questionnaires	The questionnaires to include

Global Health Observatory¶

The Global Health Observatory (GHO) data pipeline allows for retrieving indicators and datasets from the GHO public data portal.

Topic	Value
Connection	Web API
Authentication	None, publicly available
Data model	Indicator, dimension, dimension value and data

Settings

Field	Description
Indicators	The indicators to load data for

UNICEF Data¶

The UNICEF Data data pipeline allows for retrieving indicators and datasets from the UNICEF Data portal based on the SDMX standard.

Topic	Value
Connection	Web API
Authentication	None, publicly available
Data model	Data flow, data structure, dimension, code list, data set

Settings

Field	Description
Data flows	The SDMX data flows to load data for
Start period	The start year of the time range to load data from
End period	The end year of the time range to load data for
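In the SDMX REST standard, a time range is expressed with the `startPeriod` and `endPeriod` query parameters on a data query. The sketch below shows how the three settings above could map to such a request. Whether the AP issues requests in exactly this form is an assumption, and the base URL and data flow identifier are hypothetical placeholders.

```python
from urllib.parse import urlencode


def sdmx_data_url(base_url: str, data_flow: str, start_year: int, end_year: int) -> str:
    """Build an SDMX REST data query URL restricted to a range of years."""
    if start_year > end_year:
        raise ValueError("Start period must not be later than end period")
    query = urlencode({"startPeriod": start_year, "endPeriod": end_year})
    return f"{base_url.rstrip('/')}/data/{data_flow}?{query}"


# Placeholder endpoint and data flow, not real UNICEF identifiers.
print(sdmx_data_url("https://sdmx.example.org/rest/", "EXAMPLE_FLOW", 2015, 2020))
# prints https://sdmx.example.org/rest/data/EXAMPLE_FLOW?startPeriod=2015&endPeriod=2020
```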

World Bank Open Data¶

The World Bank Open Data data pipeline allows for retrieving indicators and datasets from the World Bank Open Data portal.

Topic	Value
Connection	Web API
Authentication	None, publicly available
Data model	Indicator and data

Settings

Field	Description
Indicators	The indicators to load data for

Google Sheets¶

The Google Sheets data pipeline allows for retrieving data from a Google Sheet and loading it into the AP.

To connect to a Google Sheet, the first step is to share the sheet with the AP service account email. The service account email address is: google-sheets@dharma-prod.iam.gserviceaccount.com.

To share the file in Google Drive:

Right-click the Google Sheet document and click Share > Share.
In the Add people, groups and calendar events field, enter the service account email address: google-sheets@dharma-prod.iam.gserviceaccount.com.
Set the permission to Viewer.
Click Send.

After sharing the sheet, get the URL to the sheet in the share dialog by clicking Copy link.

Topic	Value
Connection	Web API
Authentication	File shared with service account email
Data model	Sheet

Connection

Field	Description
URL	URL to sheet shared with service account email
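Links copied from the Google Sheets share dialog start with `https://docs.google.com/spreadsheets/d/` followed by the spreadsheet ID. The following sketch checks that a pasted link has this shape before it is entered in the URL field. The function name and the example ID are hypothetical.

```python
import re

# Matches the start of a Google Sheets link and captures the spreadsheet ID.
SHEET_URL = re.compile(r"^https://docs\.google\.com/spreadsheets/d/([A-Za-z0-9_-]+)")


def spreadsheet_id(url: str):
    """Return the spreadsheet ID from a Google Sheets link, or None if it does not match."""
    match = SHEET_URL.match(url.strip())
    return match.group(1) if match else None


# Made-up ID, used for illustration only.
print(spreadsheet_id("https://docs.google.com/spreadsheets/d/abc123_-XYZ/edit?usp=sharing"))
# prints abc123_-XYZ
```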

Settings

Field	Description
Schema	Sheets in JSON format

The schema is a JSON array with one object per sheet. Each object has the following properties:

Property	Description
sheetName	The name of the sheet in Google Sheets. This must match the sheet name in the Google Sheets document exactly
ranges	An array of cell ranges in the sheet

The ranges array has one object per range. Each object has the following properties:

Property	Description
header	The cell range to use as column headers in the data warehouse table, e.g. `A1:J2`. The `header` range can span multiple rows, in which case the rows are flattened into a single column header in the data warehouse
data	The cell range to use as rows in the data warehouse table, e.g. `A3:J100`

Note

The header and data ranges must have the same number of columns.

Example schema

```json
[
  {
    "sheetName": "ANC Data January 2025",
    "ranges": [
      {
        "header": "A1:J2",
        "data": "A3:J100"
      },
      {
        "header": "A120:P121",
        "data": "A122:P200"
      }
    ]
  },
  {
    "sheetName": "TB Data January 2025",
    "ranges": [
      {
        "header": "A1:D1",
        "data": "A2:D100"
      }
    ]
  }
]
```
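Because the header and data ranges must have the same number of columns, it can help to check a schema before saving it. The following is a minimal sketch of such a check, assuming plain A1-style ranges with uppercase column letters. The function names are hypothetical and the check is not part of the AP.

```python
import json
import re

# A1-style range such as "A1:J2": column letters and row number on each side.
A1_RANGE = re.compile(r"^([A-Z]+)(\d+):([A-Z]+)(\d+)$")


def column_index(letters: str) -> int:
    """Convert column letters to a 1-based index (A=1, Z=26, AA=27)."""
    index = 0
    for letter in letters:
        index = index * 26 + (ord(letter) - ord("A") + 1)
    return index


def column_count(cell_range: str) -> int:
    """Return the number of columns spanned by an A1-style range."""
    match = A1_RANGE.match(cell_range)
    if match is None:
        raise ValueError(f"Not an A1-style range: {cell_range!r}")
    first_col, _, last_col, _ = match.groups()
    return column_index(last_col) - column_index(first_col) + 1


def check_schema(schema_json: str) -> list:
    """Return one message per range whose header and data widths differ."""
    problems = []
    for sheet in json.loads(schema_json):
        for cell_range in sheet["ranges"]:
            header, data = cell_range["header"], cell_range["data"]
            if column_count(header) != column_count(data):
                problems.append(
                    f"{sheet['sheetName']}: header {header} and data {data} differ in width"
                )
    return problems
```

An empty list means every header and data pair has matching widths. For example, a header of `A1:J2` with data of `A3:K100` would be reported, since the header spans 10 columns and the data spans 11.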

iHRIS¶

Topic	Value
Connection	Database (JDBC)
Authentication	Database username/password
Data model	Form

MySQL database

Field	Description
Hostname	The hostname for the iHRIS database
Port	The port for the iHRIS database, often 3306
SSL	Whether to enable SSL encryption for the database connection
Trust server certificate	Whether to trust the server SSL certificate for the database connection
Database name	The name of the database
Database username	The username of the database user
Database password	The password of the database user

Settings

Field	Description
Forms	The forms to load data for
References	The references to load data for
Include record history	Whether to load record history

Kobo Toolbox¶

Topic	Value
Connection	Web API
Authentication	API token
Data model	Survey

Connection

Field	Description
URL	URL to Kobo instance
Auth token	Authentication token for Kobo user account

Settings

Field	Description
Survey	The survey to load data for

ODK¶

Topic	Value
Connection	Web API
Authentication	API username/password
Data model	Project and form

Connection

Field	Description
URL	URL to ODK instance
Username	Username for ODK user account
Password	Password for ODK user account

Settings

Field	Description
Project	The project to load data for
Form	The form to load data for

Ona¶

Topic	Value
Connection	Web API
Authentication	API token
Data model	Form

Connection

Field	Description
URL	URL to Ona instance
Auth token	Authentication token for Ona user account

Settings

Field	Description
Form	The form to load data for

Talamus¶

Topic	Value
Connection	Web API
Authentication	API token
Data model	Facility

Connection

Field	Description
URL	URL to Talamus instance
Auth token	Authentication token for Talamus user account

Settings

Field	Description
Facilities	The facilities to load data for
Start date	The start date of the time range to load data for
End date	The end date of the time range to load data for

Amazon S3¶

Amazon S3 refers to files as objects.

Topic	Value
Connection	Web API
Authentication	Access key/secret key
Data model	Bucket and object

Source

Field	Description
Bucket	The bucket name
Object key	The key for the object to load
Access key	The IAM access key
Secret key	The IAM secret key

Azure Blob Storage¶

Azure Blob Storage refers to files as blobs.

Topic	Value
Connection	Web API
Authentication	Connection string
Data model	Container and blob

Source

Field	Description
Container name	The container name
Blob path	The path to the blob to load
Connection string	The connection string for the container

SQL Server¶

Topic	Value
Connection	Database (JDBC)
Authentication	Database username/password
Data model	Table

SQL Server

Field	Description
Hostname	The hostname for the database
Port	The port for the database, often 1433
SSL	Whether to enable SSL encryption for the database connection
Trust server certificate	Whether to trust the server SSL certificate for the database connection
Database name	The name of the database
Database username	The username of the database user
Database password	The password of the database user

Data source

Field	Description
SQL query	The SQL query for retrieving data to load
Tables	The database tables to load

MySQL¶

Topic	Value
Connection	Database (JDBC)
Authentication	Database username/password
Data model	Table

MySQL

Field	Description
Hostname	The hostname for the database
Port	The port for the database, often 3306
SSL	Whether to enable SSL encryption for the database connection
Trust server certificate	Whether to trust the server SSL certificate for the database connection
Database name	The name of the database
Database username	The username of the database user
Database password	The password of the database user

Data source

Field	Description
SQL query	The SQL query for retrieving data to load
Tables	The database tables to load

Oracle RDBMS¶

Topic	Value
Connection	Database (JDBC)
Authentication	Database username/password
Data model	Table

Oracle RDBMS

Field	Description
Hostname	The hostname for the database
Port	The port for the database, often 1521
SSL	Whether to enable SSL encryption for the database connection
Trust server certificate	Whether to trust the server SSL certificate for the database connection
Database name	The name of the database
Database username	The username of the database user
Database password	The password of the database user

Data source

Field	Description
SQL query	The SQL query for retrieving data to load
Tables	The database tables to load

PostgreSQL¶

Topic	Value
Connection	Database (JDBC)
Authentication	Database username/password
Data model	Table

PostgreSQL

Field	Description
Hostname	The hostname for the database
Port	The port for the database, often 5432
SSL	Whether to enable SSL encryption for the database connection
Trust server certificate	Whether to trust the server SSL certificate for the database connection
Database name	The name of the database
Database username	The username of the database user
Database password	The password of the database user

Data source

Field	Description
SQL query	The SQL query for retrieving data to load
Tables	The database tables to load

Amazon Redshift¶

Topic	Value
Connection	Database (JDBC)
Authentication	Database username/password, IAM ARN
Data model	Table

Amazon Redshift

Field	Description
Hostname	The hostname for the database
Port	The port for the database, often 5439
SSL	Whether to enable SSL encryption for the database connection
Trust server certificate	Whether to trust the server SSL certificate for the database connection
Database name	The name of the database
Database username	The username of the database user
Database password	The password of the database user

Data source

Field	Description
SQL query	The SQL query for retrieving data to load
Tables	The database tables to load

CSV file upload¶

Topic	Value
Connection	File upload
Authentication	-
Data model	Table

Settings

Field	Description
CSV files	One or more CSV data files to load
Delimiter	The CSV file delimiter

File format requirements

If uploading multiple files, the schema (columns) must be the same for all files
The first row should be the header defining the column names
Column names must be unique within the file
Column names are recommended to contain only letters and digits and start with a letter
The filename is recommended to contain only letters and digits and start with a letter
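The requirements above can be checked locally before uploading. The following is a minimal sketch using only the Python standard library. It treats "same schema" as identical column names in the same order, which is an assumption, and it reports the recommendations alongside the requirements. The function name is hypothetical.

```python
import csv
import io
import os
import re

# Letters and digits only, starting with a letter.
SIMPLE_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9]*$")


def check_csv_files(files: dict, delimiter: str = ",") -> list:
    """Check CSV contents against the file format requirements.

    `files` maps each filename to the text content of that file.
    Returns a list of findings; an empty list means nothing was flagged.
    """
    findings = []
    first_header = None
    for filename, text in files.items():
        # The first row is taken as the header row.
        header = next(csv.reader(io.StringIO(text), delimiter=delimiter), [])
        if len(set(header)) != len(header):
            findings.append(f"{filename}: column names are not unique")
        if first_header is None:
            first_header = header
        elif header != first_header:
            findings.append(f"{filename}: columns differ from the first file")
        for column in header:
            if not SIMPLE_NAME.match(column):
                findings.append(f"{filename}: column {column!r} has characters other than letters and digits")
        # The recommendation is applied to the filename without its extension.
        stem = os.path.splitext(filename)[0]
        if not SIMPLE_NAME.match(stem):
            findings.append(f"{filename}: filename has characters other than letters and digits")
    return findings
```

To check files on disk, read each file as text and pass the contents in a dictionary keyed by filename, using the same delimiter as in the Delimiter setting.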

Parquet file upload¶

Topic	Value
Connection	File upload
Authentication	-
Data model	Table

Settings

Field	Description
Parquet file	Parquet data file to load

Edit data pipeline¶

Find and click the data pipeline to edit in the list.
Open the context menu by clicking the icon in the top-right corner.
Click Edit.
Edit values in the relevant sections.
Click Save at the bottom of the section.
Close the dialog by clicking the close icon in the top-left corner.

Remove data pipeline¶

Find and click the data pipeline to remove in the list.
Open the context menu by clicking the icon in the top-right corner.
Click Remove.

Manage access for data pipeline¶

Find and click the data pipeline in the list.
Open the context menu by clicking the icon in the top-right corner.
Click Share.
Grant appropriate access levels to users and user groups.
Click Save.