Data browser¶
Overview¶
Data browser is a core component of AP, designed to empower users to engage directly with their data through interactive querying across all available datasets. This allows for exploring data in near real-time, enabling users to derive insights and make informed decisions rapidly.
In AP, data pipelines automates the ingestion of data from multiple sources into the central data store. These pipelines are configured to handle diverse data formats and sources, such as applications, databases and data files, ensuring that the data is up-to-date and readily accessible. Once the data is in the platform, users can create queries which span all data sources and datasets, allowing for integrated data exploration and analytics.
The main query language of the bata browser is SQL, or Structured Query Language. SQL is a standardized language for managing and manipulating databases and data warehouses. SQL provides a powerful means to execute queries on data in a declarative style. It allows users to specify exactly what data they need from a database without requiring detailed knowledge of how the database is structured or stored. Typical operations are selecting specific data, aggregating data across dimensions, filtering data on particular values and joining tables together. SQL is widely known and used among data professionals, making it a common language for data exploration and analysis.
Schema navigator¶
The schema navigator is placed on the left-side panel of the data browser. It outlines the entire schema of the data available to the user. The schema is displayed as a hierarchy, where the first level represents table schemas, the second level represents tables and the third level represents table columns.
Level | Description |
---|---|
1 | Schema |
2 | Tables |
3 | Columns |
Expanding an item in the hierarchy will reveal items at the next level. The column data type is displayd next to each column name.
Query editor¶
The query editor is placed at the center of the data browser. This is the area where the query can be specified. There are two types of queries: SQL and Natural text. The type of query to work with can be selected at the top bar.
SQL queries¶
Users can write their SQL queries directly into the query editor area. The editor supports auto-completion of SQL statements to make writing more efficient. To activate auto-complete, press Ctrl + Space (Windows/Linux) or Command + Space (macOS), with the cursor at the relevant position of the query.
After writing a SQL query, click Run.
SQL queries will vary depending on the schema. A simple example of a SQL query that summarizes data values and groups by data item, quarterly time periods and countries:
select
d."DataItem",
d."PT Quarterly",
d."OU Country",
sum(d."Value")
from
demo.demo d
group by
d."DataItem",
d."PT Quarterly",
d."OU Country";
Formatting queries¶
To format a SQL query to make it more readable, click Format on the top bar.
Viewing results¶
The query response will appear in the result area. The query result is displayed as a table, with the name and the data type of each column displayed on the header row. By default, the first 200 rows of the result are displayed. The number of rows to display can be changed from the bottom bar drop-down to 500 or 1000.
Natural text queries¶
Users who are not proficient in SQL can write queries in natural language text. Click Natural text in the top bar to switch to natural text queries. Select one or more schemas from the schema selector at the top bar to narrow down the part of the schema to retrieve data from.
With text queries, a user can ask simple questions about metadata, for example:
The result of metadata queries can be used to ask more sophisticated data questions, for example:
Give me the sum of data values for data items related to TB_ART and TB_PREV by
quarter and OU Region level. Include the data item. Order by data value descending.
Download query result¶
After a query has run successfully, the result of the query can be downloaded to a data file in CSV format. The user is provided with two options.
Download preview
Downloads the rows which are visible in the result area. The number of rows can be changed from the bottom bar drop-down. The format is uncompressed CSV.
Download full dataset
Downloads the entire set of rows produced by the query. This download option will stream results to the web browser. The format is Gzip compressed CSV. Note that downloading extremely large datasets is not recommended.
Tools for decompressing Gzip files are pre-installed on MacOS and Linux. The 7-Zip tool is recommended for Microsoft Windows.
SQL reference¶
SQL is a standard query language defined by ANSI which ensures interoperability across data warehouses supported by AP. Numerous courses and guides exist online for learning purposes.
However, every data warehouse provides a range of specific features and functions. Users writing SQL queries can learn about data warehouse specific functions by consulting the respective SQL reference guides listed below. You can observe the type of data warehouse from the right-side label at the bottom bar, and explore the respective SQL guide by clicking the SQL reference link next to it, or from the table below.
Data warehouse | SQL reference guide |
---|---|
PostgreSQL | www.postgresql.org |
Amazon Redshift | docs.aws.amazon.com |
ClickHouse | clickhouse.com |
SQL Server | learn.microsoft.com |
Synapse | learn.microsoft.com |
Text-to-SQL¶
The following diagram describes the tetx-to-SQL solution at a high level.