Workflows
Overview
The Analytics Platform (AP) provides a workflow management system for orchestrating complex data pipelines. Workflows in AP automate the movement, transformation, and integration of data from various sources and systems into a unified data store, so that the data can be analyzed together in your chosen visualization tools. This section explains how workflows are structured in AP and how they streamline data operations.
Benefits
Workflows and job orchestration in AP bring the following benefits:
- Efficiency: Automating data tasks reduces manual effort and speeds up data transformation.
- Consistency: Scheduled workflows process data consistently, without gaps or overlaps, which keeps the data reliable.
- Scalability: As organizational needs grow, workflows can be scaled to handle new data sources, larger data volumes and greater complexity without compromising performance.
Workflow Model
- Workflow: A structured sequence of operations that automates data loading, transformation and integration. Each workflow consists of one or more steps.
- Step: A unit of work within a workflow. Each step contains one or more jobs.
- Job: A specific task, such as data extraction, transformation or loading.
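The containment hierarchy above (a workflow holds steps, a step holds jobs) can be sketched with Python dataclasses. The classes `Workflow`, `Step` and `Job` and the example names are hypothetical and do not reflect how AP stores workflows internally; the sketch only mirrors the "one or more" rules described above.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical model of a specific task, such as extraction, transformation or loading."""
    name: str


@dataclass
class Step:
    """Hypothetical model of a unit of work; holds one or more jobs."""
    jobs: list[Job]

    def __post_init__(self) -> None:
        # Each step must contain at least one job.
        if not self.jobs:
            raise ValueError("a step must contain at least one job")


@dataclass
class Workflow:
    """Hypothetical model of a structured sequence of one or more steps."""
    name: str
    steps: list[Step]

    def __post_init__(self) -> None:
        # Each workflow must contain at least one step.
        if not self.steps:
            raise ValueError("a workflow must contain at least one step")

    def job_names(self) -> list[str]:
        # Flatten workflow -> steps -> jobs, preserving step order.
        return [job.name for step in self.steps for job in step.jobs]


# Hypothetical workflow with two steps; the first step holds two jobs.
workflow = Workflow(
    name="Nightly refresh",
    steps=[
        Step(jobs=[Job("Load facility data"), Job("Load population data")]),
        Step(jobs=[Job("Join facility and population data")]),
    ],
)
print(workflow.job_names())
```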
Scheduling
A workflow can be scheduled to run at specific intervals, which automates recurring tasks and keeps data fresh without manual intervention. Scheduling is essential for keeping data views up to date and operations ready in dynamic business environments. Workflows can also be set to update continuously; this setting applies to jobs that support continuous updates, such as DHIS2 data pipelines.
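How AP computes run times is not documented here. As a minimal sketch, assuming a schedule is defined by an anchor time and a fixed interval (both hypothetical parameters, not AP settings), the next run at or after a given moment can be computed like this:

```python
from datetime import datetime, timedelta


def next_run(anchor: datetime, interval: timedelta, now: datetime) -> datetime:
    """Return the first scheduled run at or after `now`.

    Assumes runs occur at anchor, anchor + interval, anchor + 2 * interval, ...
    """
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    if now <= anchor:
        return anchor
    elapsed = now - anchor
    # Ceiling division: number of whole intervals needed to reach or pass `now`.
    periods = -(-elapsed // interval)
    return anchor + periods * interval


# Hypothetical schedule: every 6 hours, starting at 02:00.
anchor = datetime(2024, 1, 1, 2, 0)
print(next_run(anchor, timedelta(hours=6), datetime(2024, 1, 1, 9, 30)))
```

With runs at 02:00, 08:00 and 14:00, a check at 09:30 falls between runs, so the next run is 14:00 on the same day.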
Steps
Each step can be configured to run its jobs either in sequence (serial) or in parallel. Running jobs in parallel reduces the total runtime of a workflow and makes data processing more efficient. If jobs depend on each other, for example when the output of one view is required as input for another view, run them in serial.
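The difference between the two execution modes can be illustrated with a small Python sketch. `run_step` and `make_job` are hypothetical helpers, not AP functions, and the jobs only sleep to simulate work; AP's actual job runner may behave differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Job = Callable[[], str]


def run_step(jobs: list[Job], mode: str) -> list[str]:
    """Run the jobs of one step in 'serial' or 'parallel' mode.

    Results are returned in the order the jobs were listed.
    """
    if mode == "serial":
        # Each job starts only after the previous one has finished,
        # so a job can safely consume the output of an earlier job.
        return [job() for job in jobs]
    if mode == "parallel":
        # Jobs run concurrently; total time is roughly that of the slowest job.
        with ThreadPoolExecutor(max_workers=len(jobs) or 1) as pool:
            futures = [pool.submit(job) for job in jobs]
            return [future.result() for future in futures]
    raise ValueError(f"unknown execution mode: {mode}")


def make_job(name: str, seconds: float) -> Job:
    # Hypothetical stand-in for a real job: sleeps to simulate work.
    def job() -> str:
        time.sleep(seconds)
        return name
    return job


jobs = [make_job("pipeline A", 0.1), make_job("pipeline B", 0.1), make_job("pipeline C", 0.1)]

start = time.perf_counter()
run_step(jobs, "serial")
serial_time = time.perf_counter() - start

start = time.perf_counter()
run_step(jobs, "parallel")
parallel_time = time.perf_counter() - start

print(f"serial: {serial_time:.2f}s, parallel: {parallel_time:.2f}s")
```

Parallel mode finishes in roughly the time of the slowest job, which is why it shortens the total runtime. Serial mode is the safer choice for dependent jobs because each job starts only after the jobs before it have completed.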
Jobs
Jobs within a workflow are categorized into three types:
- Data pipeline: Extracts data from source systems and moves it into the data store.
- View: Joins and integrates datasets to prepare them for analysis.
- Destination: Loads processed data back into operational systems or other destinations for further use.
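The three job types form a natural load, integrate, deliver sequence: views read data that pipelines have loaded, and destinations receive data that views have prepared. One plausible arrangement, assumed here rather than required by AP, is therefore one step per job type in that order. In the Python sketch below, `JobType`, `STAGE_ORDER`, `group_into_steps` and the job names are all hypothetical.

```python
from enum import Enum


class JobType(Enum):
    DATA_PIPELINE = "data pipeline"  # extracts data from source systems into the data store
    VIEW = "view"                    # joins and integrates datasets for analysis
    DESTINATION = "destination"      # loads processed data into other systems


# Assumed stage order: data must be loaded before views can join it,
# and views must be built before results are sent to a destination.
STAGE_ORDER = [JobType.DATA_PIPELINE, JobType.VIEW, JobType.DESTINATION]


def group_into_steps(jobs: list[tuple[str, JobType]]) -> list[list[str]]:
    """Group (name, type) pairs into one step per job type, in stage order.

    Job types with no jobs produce no step.
    """
    steps = []
    for job_type in STAGE_ORDER:
        names = [name for name, kind in jobs if kind is job_type]
        if names:
            steps.append(names)
    return steps


# Hypothetical jobs, listed in arbitrary order.
jobs = [
    ("Population view", JobType.VIEW),
    ("DHIS2 pipeline", JobType.DATA_PIPELINE),
    ("Census pipeline", JobType.DATA_PIPELINE),
    ("Push to DHIS2", JobType.DESTINATION),
]
print(group_into_steps(jobs))
```

Jobs of the same type land in the same step, where they could run in parallel if they do not depend on each other.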
Manage workflows
The following sections cover how to view, create, edit, remove and manually refresh workflows.
View workflow
1. Click Workflows in the left-side menu.
2. Click the name of a workflow to see more information.
Create workflow
1. Click the Create new button in the top-right corner.
2. In the General settings section, enter the following information.

   | Field | Description |
   |---|---|
   | Name | The name of the workflow |
   | Refresh schedule | The interval at which to refresh data from the data source (required) |
   | Description | A description of the workflow |
   | Disable workflow | Whether to disable the workflow |
Create step in workflow
1. In the Steps section, click + Add step to add one or more steps.
2. In the Add step section, enter the following information.

   | Field | Description |
   |---|---|
   | Execution mode | Whether to execute the jobs in the step in serial or in parallel |
   | Disable workflow | Whether to disable the workflow |

3. Click + Add job to add one or more jobs.
4. In the section for the new job, select the following information.

   | Field | Description |
   |---|---|
   | Type | The type of job |
   | Source/Target | For data pipelines, the data source type; for destinations, the target type |
   | Name | The data pipeline, view or destination |

5. Click the check icon to save the job.
6. Repeat from step 3 to create additional jobs.
7. Click Save to save the step.
8. Repeat from step 1 to create additional steps.
Edit workflow
1. In the list, find and click the workflow to edit.
2. Open the context menu by clicking the icon in the top-right corner.
3. Click Edit.
4. Edit the values in the relevant sections.
5. Click Save at the bottom of the section.
6. Close the dialog by clicking the close icon in the top-left corner.
Remove workflow
1. In the list, find and click the workflow to remove.
2. Open the context menu by clicking the icon in the top-right corner.
3. Click Remove.
Refresh data
Workflows can also be triggered manually.
1. Open the context menu by clicking the icon in the top-right corner.
2. Click Refresh data.