Apache Superset Installation
This guide explains how to install Apache Superset using the pip package installer for Python on Ubuntu 22.04. This installation method is economical in terms of server memory and other resources, and is preferable in an on-premise environment.
The guide assumes that a bao-admin user exists, and that PostgreSQL, Redis and nginx are installed.
PostgreSQL
Create a user and database for Superset. Switch to the postgres user and connect to PostgreSQL with the psql command-line tool.
Create the PostgreSQL user for Superset with a strong password. Take note of the password.
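Python's standard `secrets` module offers one way to generate a strong password for this user. This minimal sketch uses `secrets.token_urlsafe`, whose output contains only letters, digits, `-` and `_`, so the password can later be placed in `SQLALCHEMY_DATABASE_URI` without percent-encoding. The 32-byte default length is an assumption; adjust it to your password policy.

```python
import secrets


def generate_password(nbytes: int = 32) -> str:
    """Return a random URL-safe password built from nbytes of random data."""
    # token_urlsafe base64url-encodes nbytes random bytes and strips the '='
    # padding, so the result only contains A-Z, a-z, 0-9, '-' and '_'.
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    print(generate_password())
```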
Create the PostgreSQL database for Superset and exit the command-line tool.
Optional: Verify that PostgreSQL is running and that the database is accessible as the superset user.
Python
Install Python 3 and OS dependencies from the Ubuntu package repository.
sudo apt-get install build-essential libssl-dev libffi-dev python3-dev \
    python3-venv python3-pip libsasl2-dev libldap2-dev libpq-dev
Verify the Python version.
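The version check can also be scripted. The sketch below assumes a minimum of Python 3.9; that floor is an assumption, so confirm it against the requirements of the Superset release you install. Ubuntu 22.04 ships Python 3.10.

```python
import sys

# Assumed minimum version; confirm against the release notes of the
# Superset version being installed.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if the (major, minor) part of version_info meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {found}: {status}")
```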
Upgrade pip.
Create an installation directory.
Create a virtual environment in the installation directory.
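Creating the virtual environment corresponds to `python3 -m venv <install_dir>/venv`; the standard-library `venv` module exposes the same operation programmatically. In this sketch the path `/var/lib/apache-superset/venv` is inferred from the `SUPERSET_CONFIG_PATH` and systemd `ExecStart` paths used later in this guide, and the helper name is hypothetical. `with_pip=True` relies on the `python3-venv` package installed above.

```python
import venv
from pathlib import Path

# Installation directory inferred from the paths used elsewhere in this guide.
INSTALL_DIR = Path("/var/lib/apache-superset")


def create_virtualenv(install_dir: Path = INSTALL_DIR, with_pip: bool = True) -> Path:
    """Create <install_dir>/venv and return the path of its Python interpreter."""
    env_dir = install_dir / "venv"
    # Equivalent to: python3 -m venv <install_dir>/venv
    venv.create(env_dir, with_pip=with_pip)
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    print(create_virtualenv())
```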
Activate the virtual environment.
Verify the Python and pip versions in the virtual environment.
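To confirm from Python itself that the active interpreter belongs to the virtual environment, compare `sys.prefix` with `sys.base_prefix`; the `venv` documentation describes this comparison. A minimal sketch:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-created virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Virtual environment active: {in_virtualenv()}")
```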
Note
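Do not change the location of the virtual environment after installation. Scripts in its bin directory, such as superset and gunicorn, refer to the interpreter by absolute path and stop working if the directory is moved.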
Apache Superset
Installation
Install Apache Superset, Python packages and the ClickHouse database driver.
Other relevant driver packages, such as those for Redshift and MS SQL, can also be installed if required.
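After installation, the installed versions can be checked from inside the virtual environment with `importlib.metadata`. The distribution names are assumptions: `apache-superset` is Superset's PyPI name, and `clickhouse-connect` is assumed to be the ClickHouse driver behind the "ClickHouse Connect (Superset)" option used in the setup section. The helper name is hypothetical.

```python
from importlib import metadata
from typing import Optional

# Assumed distribution names for Superset and the ClickHouse driver.
PACKAGES = ("apache-superset", "clickhouse-connect")


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in PACKAGES:
        print(f"{name}: {installed_version(name) or 'not installed'}")
```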
Configuration
Generate a secret encryption key for the Superset config file.
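The Superset documentation suggests `openssl rand -base64 42` for generating the secret key. A standard-library Python equivalent, as a minimal sketch, base64-encodes 42 random bytes from `secrets.token_bytes`:

```python
import base64
import secrets


def generate_secret_key(nbytes: int = 42) -> str:
    """Return nbytes of cryptographically random data, base64-encoded."""
    # Mirrors `openssl rand -base64 42`: 42 random bytes encode to 56
    # base64 characters with no padding.
    return base64.b64encode(secrets.token_bytes(nbytes)).decode("ascii")


if __name__ == "__main__":
    print(generate_secret_key())
```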
Create a config file named superset_config.py in the root Superset installation directory with the following content. Note that the config file should be located in the root installation directory, not in the venv directory.
Adjust the required configuration values to the environment:
- Set SECRET_KEY to the previously generated secret key.
- Replace <superset_pwd> in the SQLALCHEMY_DATABASE_URI property with the previously created PostgreSQL password.
# Number of worker processes
SUPERSET_WORKERS = 4
# Gunicorn server port
SUPERSET_WEBSERVER_PORT = 8089
# Enable logging to file
ENABLE_TIME_ROTATE = True
# Encryption key
SECRET_KEY = 'mykey'
# Superset PostgreSQL metadata database connection
SQLALCHEMY_DATABASE_URI = 'postgresql://superset:<superset_pwd>@localhost/superset'
# Caching with Redis
CACHE_CONFIG = {
    'CACHE_TYPE': 'RedisCache',
    'CACHE_DEFAULT_TIMEOUT': 3600,
    'CACHE_KEY_PREFIX': 'superset_results',
    'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}
# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = False
CSRF_ENABLED = False
# Add endpoints that need to be exempt from CSRF protection
WTF_CSRF_EXEMPT_LIST = []
# A CSRF token that expires in 1 year
WTF_CSRF_TIME_LIMIT = 60 * 60 * 24 * 365
# Set this API key to enable Mapbox visualizations
MAPBOX_API_KEY = ''
# Enable CORS to allow embedded dashboards
ENABLE_CORS = True
ALLOW_ORIGINS = ['http://localhost:8089']
CORS_OPTIONS = {
    'supports_credentials': True,
    'allow_headers': ['*'],
    'resources': ['*'],
    'origins': ALLOW_ORIGINS
}
# Disable CSP due to bug in Superset
TALISMAN_ENABLED = False
# Enable proxy headers to support nginx
ENABLE_PROXY_FIX = True
# Enable embedded dashboards
FEATURE_FLAGS = {
    "EMBEDDED_SUPERSET": True,
    "DASHBOARD_RBAC": True,
    "EMBEDDABLE_CHARTS": True
}
# Dashboard embedding
GUEST_ROLE_NAME = "Gamma"
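If the PostgreSQL password contains characters that are special in URIs, such as `@`, `:` or `/`, it must be percent-encoded inside `SQLALCHEMY_DATABASE_URI`. A minimal sketch using `urllib.parse.quote_plus`, the escaping approach described in the SQLAlchemy documentation; the helper name `metadata_db_uri` is hypothetical. Because superset_config.py is ordinary Python, such a helper could be defined in the file itself, e.g. `SQLALCHEMY_DATABASE_URI = metadata_db_uri('<superset_pwd>')`.

```python
from urllib.parse import quote_plus


def metadata_db_uri(password: str, user: str = "superset",
                    host: str = "localhost", database: str = "superset") -> str:
    """Build the PostgreSQL URI for SQLALCHEMY_DATABASE_URI with an encoded password."""
    # quote_plus encodes every reserved character (including '/'), so the
    # password cannot be mistaken for URI delimiters.
    return f"postgresql://{user}:{quote_plus(password)}@{host}/{database}"


if __name__ == "__main__":
    print(metadata_db_uri("p@ss:w/rd"))
```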
Set ownership and permissions.
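One possible way to restrict the config file, sketched in Python under stated assumptions: the service runs as bao-admin (as in the systemd unit in this guide), a group with the same name exists, and mode 0600 is chosen because the file holds SECRET_KEY and the database password. The function name is hypothetical, and changing ownership normally requires root.

```python
import os
import shutil
from pathlib import Path
from typing import Optional

# Path taken from SUPERSET_CONFIG_PATH in this guide.
CONFIG_PATH = Path("/var/lib/apache-superset/superset_config.py")


def secure_config(path: Path = CONFIG_PATH, owner: Optional[str] = "bao-admin",
                  mode: int = 0o600) -> None:
    """Give the config file to owner (if set) and make it readable by the owner only."""
    if owner is not None:
        # Assumes a group named like the user exists; requires root privileges.
        shutil.chown(path, user=owner, group=owner)
    os.chmod(path, mode)


if __name__ == "__main__":
    secure_config()
```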
Initialize the PostgreSQL database and create an initial Superset admin user by invoking the following commands. Take note of the provided username and password.
# Set required environment variables
export FLASK_APP=superset
export SUPERSET_CONFIG_PATH=/var/lib/apache-superset/superset_config.py
# Run database migrations
superset db upgrade
# Initialize database
superset init
# Create admin user
superset fab create-admin
Optional: Run Superset with Gunicorn to verify the installation.
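While Gunicorn is running, Superset's `/health` endpoint offers a quick check; it responds with `OK` when the application is up. A minimal standard-library sketch, assuming Gunicorn listens on 127.0.0.1:8089 as configured in this guide; the function name is hypothetical.

```python
import urllib.request

# Assumes the Gunicorn bind address and port used in this guide.
BASE_URL = "http://127.0.0.1:8089"


def superset_is_healthy(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/health answers 200 with the body 'OK'."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            body = response.read().decode("utf-8").strip()
            return response.status == 200 and body == "OK"
    except OSError:
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("healthy" if superset_is_healthy() else "not reachable")
```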
The virtual environment can now be deactivated.
Reset password
The following commands can be used to reset the password for a user, e.g. the bao-admin user. Run them with the virtual environment activated.
# Set required environment variables
export FLASK_APP=superset
export SUPERSET_CONFIG_PATH=/var/lib/apache-superset/superset_config.py
# Reset password for admin user
superset fab reset-password --username bao-admin
Upgrade
To upgrade Superset when a new version is available, navigate to the Superset installation directory and activate the virtual environment.
Upgrade the Superset version by executing the command below.
Upgrade the database schema by running required migrations, if any.
# Set required environment variables
export FLASK_APP=superset
export SUPERSET_CONFIG_PATH=/var/lib/apache-superset/superset_config.py
# Run database migrations
superset db upgrade
Deactivate the virtual environment.
Restart the systemd Superset service.
Systemd
Create a systemd service file for Superset called apache-superset.service.
[Unit]
Description = Apache Superset
After = network.target
[Service]
Type = simple
User = bao-admin
Environment = 'FLASK_APP=superset'
Environment = 'SUPERSET_CONFIG_PATH=/var/lib/apache-superset/superset_config.py'
ExecStart = /var/lib/apache-superset/venv/bin/gunicorn -w 10 -k gevent -t 120 -b 127.0.0.1:8089 "superset.app:create_app()"
Restart = on-failure
RestartSec = 5s
[Install]
WantedBy=multi-user.target
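The unit starts Gunicorn with `-w 10`; for this service the `-w` flag, not `SUPERSET_WORKERS` in the config file, determines the worker count. The `-k gevent` worker class also requires the gevent package in the virtual environment. The Gunicorn documentation suggests (2 x number of CPU cores) + 1 workers as a starting point, which a small hypothetical helper can compute for the target host. Treat the result as a heuristic; memory use per worker may call for a lower value.

```python
import os
from typing import Optional


def gunicorn_workers(cores: Optional[int] = None) -> int:
    """Return a starting worker count using Gunicorn's (2 x cores) + 1 guideline."""
    if cores is None:
        # os.cpu_count() may return None; fall back to one core in that case.
        cores = os.cpu_count() or 1
    return 2 * cores + 1


if __name__ == "__main__":
    print(f"-w {gunicorn_workers()}")
```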
Set ownership and permissions, and move the service file to the systemd service directory.
Reload the systemd daemon.
Enable Superset on boot.
Start Superset.
View status.
View the logs.
nginx
Configure nginx by creating a configuration file named apache-superset.conf. Requests will be proxied to Gunicorn, which runs as the systemd service configured above. This guide assumes that SSL and certificates are configured.
Configure a Superset server.
- SSL and certificate configuration are left out, and should be configured appropriately.
- Additional security hardening may be appropriate in a production environment.
- Update server_name from superset.mydomain.org to match your environment.
# Redirect HTTP to HTTPS
server {
    listen [::]:80;
    listen 80;
    server_name superset.mydomain.org;
    access_log off;
    log_not_found off;
    return 301 https://$host$request_uri;
}
# HTTPS server
server {
    listen [::]:443 ssl;
    listen 443 ssl;
    server_name superset.mydomain.org;
    # Proxy requests to Gunicorn on port 8089
    location / {
        proxy_pass http://127.0.0.1:8089/;
        proxy_redirect off;
        proxy_set_header host $host;
        proxy_set_header x-real-ip $remote_addr;
        proxy_set_header x-forwarded-for $proxy_add_x_forwarded_for;
        proxy_set_header x-forwarded-proto $scheme;
        proxy_set_header x-forwarded-port $server_port;
    }
}
Set ownership and permissions, and move the config file to the nginx sites-available directory.
Enable the server configuration by creating a symlink in the nginx sites-enabled directory.
sudo ln -s /etc/nginx/sites-available/apache-superset.conf \
    /etc/nginx/sites-enabled/apache-superset.conf
Restart nginx for the changes to take effect.
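After the restart, the HTTP-to-HTTPS redirect can be checked without a browser. A minimal sketch with `http.client`, assuming nginx listens on the local machine: it connects to port 80, sends the configured server_name as the Host header so nginx selects the right server block, and returns the Location of a 301 response. The hostname is the example value from the config above and the function name is hypothetical.

```python
import http.client
from typing import Optional

# Example server_name from the nginx config; replace with your own.
SERVER_NAME = "superset.mydomain.org"


def http_redirect_target(server_name: str = SERVER_NAME, address: str = "127.0.0.1",
                         port: int = 80, timeout: float = 5.0) -> Optional[str]:
    """Return the Location header if plain HTTP answers with a 301, else None."""
    # Connection errors (e.g. nginx not running) propagate to the caller.
    conn = http.client.HTTPConnection(address, port, timeout=timeout)
    try:
        conn.request("HEAD", "/", headers={"Host": server_name})
        response = conn.getresponse()
        return response.getheader("Location") if response.status == 301 else None
    finally:
        conn.close()


if __name__ == "__main__":
    print(http_redirect_target())
```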
Setup
This section provides tips for setting up Apache Superset.
ClickHouse
This section assumes that ClickHouse has been installed on the same machine and configured per the instructions in the middleware installation guide. To set up a ClickHouse data warehouse connection:
- Click Database in the top-right corner.
- Under Supported databases, select ClickHouse Connect (Superset).
- In the basic form, specify the following values.
  - Host: 127.0.0.1
  - Port: 8123
  - Database name: baoanalytics
  - Username: baoanalytics
  - Password: the password set during the ClickHouse installation
  - Display name: Analytics Platform - ClickHouse
- In the advanced form, in the Performance tab, specify the following values (in seconds).
  - Chart cache timeout: 900 (or adjust as preferred)
  - Schema cache timeout: 1
  - Table cache timeout: 1
- Click Finish.
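Before entering these values in the form, the ClickHouse credentials can be checked against ClickHouse's HTTP interface on port 8123. This minimal sketch posts `SELECT 1` with the `X-ClickHouse-User` and `X-ClickHouse-Key` headers and selects the database through the `database` URL parameter; the function name is hypothetical and the password must be supplied by you.

```python
import urllib.parse
import urllib.request

# Host and port from the connection form above.
CLICKHOUSE_URL = "http://127.0.0.1:8123/"


def clickhouse_select_one(user: str, password: str, database: str = "baoanalytics",
                          base_url: str = CLICKHOUSE_URL, timeout: float = 5.0) -> bool:
    """Return True if the credentials can run SELECT 1 through the HTTP interface."""
    url = base_url + "?" + urllib.parse.urlencode({"database": database})
    request = urllib.request.Request(
        url,
        data=b"SELECT 1",  # a request body makes this a POST carrying the query
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().strip() == b"1"
    except OSError:
        # Authentication failures arrive as HTTPError, an OSError subclass.
        return False
```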
Apache Superset
Various considerations when configuring Apache Superset are described below.
- Database connection cache: By default, the schema and table metadata of a database connection are cached indefinitely. This means that schemas, tables and columns added to the database after the connection was created will not appear in the UI. For this reason, the schema and table cache timeouts can be set to 1 second, as in the ClickHouse connection above.
- Dataset cache: When creating datasets, the schema and table drop-downs in the add new dataset screen may not refresh properly. To force a refresh, click the Force icon next to each drop-down.
- User roles: Ensure that the Public role does not have any permissions, and in particular does not have the can read on dashboard and can read on chart permissions. If these are granted, API requests to the /api/v1/dashboard/ endpoint will return no dashboards, which prevents embedded dashboards from working properly.