
Apache Superset Installation

This guide explains how to install Apache Superset on Ubuntu 22.04 using pip, the package installer for Python. This pip-based installation is efficient in terms of server memory requirements and is preferable in an on-premise environment.

The guide assumes that a bao-admin user exists, and that PostgreSQL, Redis and nginx are installed.

PostgreSQL

Create a user and database for Superset. Switch to the postgres user and connect to PostgreSQL with the psql command-line tool.

sudo su postgres
psql

Create the PostgreSQL user for Superset with a strong password. Take note of the password.

create user superset with password 'mypassword';

Create the PostgreSQL database for Superset, then exit the command-line tool and the postgres user session.

create database superset with owner superset encoding 'utf8';
\q
exit

Optional: Verify that PostgreSQL is running and the database is accessible as the superset user.

psql -d superset -U superset -h 127.0.0.1

Python

Install Python 3 and OS dependencies from the Ubuntu package repository.

sudo apt update && sudo apt upgrade -y
sudo apt-get install build-essential libssl-dev libffi-dev python3-dev \
python3-venv python3-pip libsasl2-dev libldap2-dev libpq-dev

Verify the Python version.

python3 -V

Upgrade pip.

pip3 install --upgrade pip

Create an installation directory.

sudo mkdir -p /var/lib/apache-superset
sudo chown bao-admin:bao-admin /var/lib/apache-superset

Create a virtual environment in the installation directory.

cd /var/lib/apache-superset
python3 -m venv venv

Activate the virtual environment.

source venv/bin/activate

Verify the Python and pip versions in the virtual environment.

python -V
pip -V

Note

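Do not change the location of the virtual environment after installation, as the scripts in venv/bin, such as superset and gunicorn, contain absolute paths to the environment.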

Apache Superset

Installation

Install Apache Superset, Python packages and the ClickHouse database driver.

pip install apache-superset psycopg2 gunicorn gevent flask_cors pillow
pip install clickhouse-connect

Other database driver packages, such as those for Redshift and MS SQL Server, can also be installed if required.

pip install sqlalchemy-redshift pymssql

Configuration

Generate a secret encryption key for the Superset config file.

openssl rand -base64 42
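If openssl is not at hand, a key of the same shape can be generated with Python's standard library. This is a minimal sketch that assumes the only requirement is 42 cryptographically random bytes encoded as base64, matching the openssl command above; the function name generate_secret_key is hypothetical.

```python
import base64
import secrets


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return num_bytes of cryptographically secure random data as base64 text.

    Equivalent in spirit to `openssl rand -base64 42`: 42 random bytes
    encode to a 56-character base64 string.
    """
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed value into SECRET_KEY in superset_config.py
    print(generate_secret_key())
```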

Create a config file named superset_config.py in the Superset installation directory (/var/lib/apache-superset) with the following content. Note that the config file must be located in the installation directory itself, not in the venv directory.

nano superset_config.py

Adjust the required configuration values to the environment.

  • Set SECRET_KEY to the previously generated secret key.
  • Replace <superset_pwd> in the SQLALCHEMY_DATABASE_URI property with the previously created PostgreSQL password.
# Number of worker threads
SUPERSET_WORKERS = 4

# Gunicorn server port
SUPERSET_WEBSERVER_PORT = 8089

# Enable logging to file
ENABLE_TIME_ROTATE = True

# Encryption key
SECRET_KEY = 'mykey'

# Superset PostgreSQL metadata database connection
SQLALCHEMY_DATABASE_URI = 'postgresql://superset:<superset_pwd>@localhost/superset'

# Caching with Redis
CACHE_CONFIG = {
  'CACHE_TYPE': 'redis',
  'CACHE_DEFAULT_TIMEOUT': 3600,
  'CACHE_KEY_PREFIX': 'superset_results',
  'CACHE_REDIS_URL': 'redis://localhost:6379/0',
}

# Flask-WTF flag for CSRF
WTF_CSRF_ENABLED = False
CSRF_ENABLED = False

# Add endpoints that need to be exempt from CSRF protection
WTF_CSRF_EXEMPT_LIST = []

# A CSRF token that expires in 1 year
WTF_CSRF_TIME_LIMIT = 60 * 60 * 24 * 365

# Set this API key to enable Mapbox visualizations
MAPBOX_API_KEY = ''

# Enable CORS to allow embedded dashboards
ENABLE_CORS = True
ALLOW_ORIGINS = ['http://localhost:8089']
CORS_OPTIONS = {
    'supports_credentials': True,
    'allow_headers': ['*'],
    'resources':['*'],
    'origins': ALLOW_ORIGINS
}

# Disable CSP due to bug in Superset
TALISMAN_ENABLED = False

# Enable proxy headers to support nginx
ENABLE_PROXY_FIX = True

# Enable embedded dashboards
FEATURE_FLAGS = {
    "EMBEDDED_SUPERSET": True,
    "DASHBOARD_RBAC": True,
    "EMBEDDABLE_CHARTS": True
}

# Dashboard embedding
GUEST_ROLE_NAME = "Gamma"
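The SQLALCHEMY_DATABASE_URI value embeds the password directly, so a password containing characters such as @, : or / would be misread as part of the URI structure. Below is a minimal sketch, assuming such a password, of percent-encoding the credentials with Python's urllib.parse.quote before pasting the result into the config file. The helper name postgres_uri and the example password are hypothetical, and the port is written out explicitly as PostgreSQL's default, 5432.

```python
from urllib.parse import quote


def postgres_uri(user: str, password: str, host: str, database: str, port: int = 5432) -> str:
    """Build a SQLAlchemy PostgreSQL URI with percent-encoded credentials.

    safe="" makes quote() encode '/' as well, so '@', ':' and '/' in the
    password cannot be mistaken for URI delimiters.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


if __name__ == "__main__":
    # Hypothetical password containing URI delimiter characters
    print(postgres_uri("superset", "p@ss:w/rd", "localhost", "superset"))
```

SQLAlchemy decodes the percent-encoded credentials when it parses the URI, so PostgreSQL still receives the original password.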

Set ownership and permissions.

sudo chown bao-admin:bao-admin superset_config.py
sudo chmod 644 superset_config.py
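Before initializing the database, it can be worth confirming that the file loads as Python and that the example values were replaced. The sketch below is one way to do that, assuming the placeholder values mykey and <superset_pwd> from the example config; the script and function names are hypothetical. Loading the file executes it, so a syntax error in the config surfaces immediately.

```python
from __future__ import annotations

import importlib.util
import sys

# Example values from the sample config that should not remain in use
PLACEHOLDER_SECRET = "mykey"
PLACEHOLDER_PASSWORD = "<superset_pwd>"


def load_config(path: str):
    """Execute superset_config.py as a standalone module and return it."""
    spec = importlib.util.spec_from_file_location("superset_config", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load config from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def check_config(path: str) -> list[str]:
    """Return human-readable problems found in the config file (empty if none)."""
    cfg = load_config(path)
    problems = []
    secret = getattr(cfg, "SECRET_KEY", "")
    if not secret or secret == PLACEHOLDER_SECRET:
        problems.append("SECRET_KEY is missing or still set to the example value")
    uri = getattr(cfg, "SQLALCHEMY_DATABASE_URI", "")
    if not uri or PLACEHOLDER_PASSWORD in uri:
        problems.append("SQLALCHEMY_DATABASE_URI is missing or still contains <superset_pwd>")
    return problems


# Only runs when a config path is passed on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    for problem in check_config(sys.argv[1]):
        print("WARNING:", problem)
```

For example, saved as check_config.py: python3 check_config.py /var/lib/apache-superset/superset_config.py. No output means neither placeholder was found.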

Initialize the PostgreSQL database and create an initial Superset admin user by invoking the following commands. Take note of the provided username and password.

# Set required environment variables
export FLASK_APP=superset
export SUPERSET_CONFIG_PATH=/var/lib/apache-superset/superset_config.py

# Run database migrations
superset db upgrade

# Create default roles and permissions
superset init

# Create admin user
superset fab create-admin

Optional: Run Superset with Gunicorn to verify the installation.

gunicorn -w 10 -k gevent -t 120 -b 127.0.0.1:8089 "superset.app:create_app()"
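While Gunicorn is running in the foreground, the installation can also be checked from a second terminal. This sketch assumes the /health endpoint that recent Superset versions expose, which answers with the plain-text body OK; the function name is hypothetical and only the standard library is used.

```python
from urllib.request import urlopen


def superset_is_healthy(base_url: str = "http://127.0.0.1:8089", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers 200 with the body 'OK'."""
    try:
        with urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and resp.read().decode().strip() == "OK"
    except OSError:
        # URLError and HTTPError (refused, timeout, 4xx/5xx) are OSError subclasses
        return False
```

For example, superset_is_healthy() should return True once Gunicorn is serving on 127.0.0.1:8089.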

The virtual environment can now be deactivated.

deactivate

Reset password

The following commands can be used to reset the password for a user, for example the bao-admin user. Run them from the installation directory with the virtual environment activated.

# Set required environment variables
export FLASK_APP=superset
export SUPERSET_CONFIG_PATH=/var/lib/apache-superset/superset_config.py

# Reset password for admin user
superset fab reset-password --username bao-admin

Upgrade

To upgrade Superset when a new version is available, navigate to the Superset installation directory and activate the virtual environment.

source venv/bin/activate

Upgrade the Superset version by executing the command below.

pip install apache-superset --upgrade
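To confirm which version pip actually installed, similar to pip show apache-superset, the package metadata can be read with the standard library. This is a minimal sketch; it must be run with the virtual environment's interpreter (for example venv/bin/python) to see the packages installed there, and the function name is hypothetical.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "apache-superset") -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Run with venv/bin/python so the virtual environment's packages are inspected
    print("apache-superset:", installed_version() or "not installed")
```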

Upgrade the database schema by running required migrations, if any.

# Set required environment variables
export FLASK_APP=superset
export SUPERSET_CONFIG_PATH=/var/lib/apache-superset/superset_config.py

# Run database migrations
superset db upgrade

Deactivate the virtual environment.

deactivate

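Restart the Superset systemd service, described in the Systemd section below, for the upgrade to take effect.

sudo systemctl restart apache-superset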

Systemd

Create a systemd service file for Superset called apache-superset.service.

nano apache-superset.service
[Unit]
Description = Apache Superset
After = network.target

[Service]
Type = simple
User = bao-admin
Environment = 'FLASK_APP=superset'
Environment = 'SUPERSET_CONFIG_PATH=/var/lib/apache-superset/superset_config.py'
ExecStart = /var/lib/apache-superset/venv/bin/gunicorn -w 10 -k gevent -t 120 -b 127.0.0.1:8089 "superset.app:create_app()"
Restart = on-failure
RestartSec = 5s

[Install]
WantedBy = multi-user.target

Set ownership and permissions, and move the service file to the systemd service directory.

sudo chown root:root apache-superset.service
sudo chmod 644 apache-superset.service
sudo mv apache-superset.service /etc/systemd/system/

Reload the systemd daemon.

sudo systemctl daemon-reload

Enable Superset on boot.

sudo systemctl enable apache-superset

Start Superset.

sudo systemctl start apache-superset

View status.

sudo systemctl status apache-superset

View the logs.

sudo journalctl -n 500 -f -u apache-superset

nginx

Configure nginx by creating a configuration file named apache-superset.conf. Requests will be proxied to Gunicorn, which runs as the systemd service set up above. This guide assumes that SSL and certificates are configured.

nano apache-superset.conf

Configure a Superset server.

  • SSL and certificate configuration are left out, and should be configured appropriately.
  • Additional security hardening may be appropriate in a production environment.
  • Update server_name from superset.mydomain.org to match your environment.
# Redirect HTTP to HTTPS
server {
  listen        [::]:80;
  listen        80;
  server_name   superset.mydomain.org;

  access_log    off;
  log_not_found off;

  return 301 https://$host$request_uri;
}

# HTTPS server
server {
  listen       [::]:443 ssl;
  listen       443 ssl;
  server_name  superset.mydomain.org;

  # Proxy requests to Gunicorn on port 8089
  location / {
    proxy_pass             http://127.0.0.1:8089/;
    proxy_redirect         off;
    proxy_set_header       host               $host;
    proxy_set_header       x-real-ip          $remote_addr;
    proxy_set_header       x-forwarded-for    $proxy_add_x_forwarded_for;
    proxy_set_header       x-forwarded-proto  $scheme;
    proxy_set_header       x-forwarded-port   $server_port;
  }
}
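The proxy_set_header lines matter because Superset only sees connections coming from nginx on 127.0.0.1. With ENABLE_PROXY_FIX = True in superset_config.py, Superset wraps its app in Werkzeug's ProxyFix middleware, which trusts these forwarded headers. The sketch below is a simplified illustration of that idea, not Werkzeug's implementation: it assumes a single trusted proxy (nginx), and the function names are hypothetical.

```python
from __future__ import annotations


def _lower_keys(headers: dict[str, str]) -> dict[str, str]:
    # HTTP header names are case-insensitive; the nginx config sends them in lowercase
    return {name.lower(): value for name, value in headers.items()}


def client_ip(headers: dict[str, str], remote_addr: str, trusted_hops: int = 1) -> str:
    """Return the client address as recorded by the trusted proxy.

    nginx's $proxy_add_x_forwarded_for appends the address it received the
    request from, so with one trusted proxy the right-most entry is reliable;
    entries further left were supplied by the client and can be spoofed.
    """
    forwarded = _lower_keys(headers).get("x-forwarded-for", "")
    hops = [hop.strip() for hop in forwarded.split(",") if hop.strip()]
    if trusted_hops > 0 and len(hops) >= trusted_hops:
        return hops[-trusted_hops]
    return remote_addr  # header missing or too short: fall back to the socket peer


def external_base_url(headers: dict[str, str], fallback_scheme: str = "http") -> str:
    """Rebuild the public scheme://host[:port] from forwarded headers (IPv6 hosts not handled)."""
    h = _lower_keys(headers)
    scheme = h.get("x-forwarded-proto", fallback_scheme)
    host = h.get("host", "localhost")
    port = h.get("x-forwarded-port")
    default_port = {"http": "80", "https": "443"}.get(scheme)
    # Only append non-default ports, and never when Host already carries one
    if port and port != default_port and ":" not in host:
        host = f"{host}:{port}"
    return f"{scheme}://{host}"
```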

Set ownership and permissions, and move the config file to the nginx sites-available directory.

sudo chown root:root apache-superset.conf
sudo chmod 644 apache-superset.conf
sudo mv apache-superset.conf /etc/nginx/sites-available/

Enable the server configuration by creating a symlink to the nginx sites-enabled directory.

sudo ln -s /etc/nginx/sites-available/apache-superset.conf \
/etc/nginx/sites-enabled/apache-superset.conf

Test the configuration, then restart nginx for the changes to take effect.

sudo nginx -t
sudo systemctl restart nginx

Setup

This section provides tips for setting up Apache Superset.

ClickHouse

This section assumes that ClickHouse has been installed on the same machine and configured per the instructions in the middleware installation guide. To set up a ClickHouse data warehouse connection:

  • Click Database in the top-right corner.
  • Under Supported databases, select ClickHouse Connect (Superset).
  • In the basic form, specify the following values.
    • Host: 127.0.0.1
    • Port: 8123
    • Database name: baoanalytics
    • Username: baoanalytics
    • Password: Use password from installation
    • Display name: Analytics Platform - ClickHouse
  • In the advanced form, in the Performance tab, specify the following values.
    • Chart cache timeout: 900 (or adjust as preferred)
    • Schema cache timeout: 1
    • Table cache timeout: 1
  • Click Finish.
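Before creating the connection in Superset, the credentials can be verified against ClickHouse directly. This sketch assumes ClickHouse's HTTP interface on port 8123, as used above, together with its X-ClickHouse-User and X-ClickHouse-Key authentication headers and database URL parameter. It uses only the standard library rather than the clickhouse-connect driver, and the function name is hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen


def clickhouse_select_one(
    password: str,
    host: str = "127.0.0.1",
    port: int = 8123,
    user: str = "baoanalytics",
    database: str = "baoanalytics",
    timeout: float = 5.0,
) -> bool:
    """Run SELECT 1 through ClickHouse's HTTP interface; True if it returns 1."""
    params = urlencode({"query": "SELECT 1", "database": database}, quote_via=quote)
    request = Request(
        f"http://{host}:{port}/?{params}",
        # Credentials are sent as headers rather than as URL parameters
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.read().decode().strip() == "1"
    except OSError:
        # Connection refused, timeout, or an HTTP error status such as bad credentials
        return False
```

For example, clickhouse_select_one("<password from installation>") should return True when the values in the form above are correct.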

Apache Superset

Various considerations when configuring Apache Superset are described below.

  • Database connection cache: By default, the schema and table metadata of a database connection are cached indefinitely. As a result, schemas, tables and columns added to the database after the connection was created will not appear in the UI. To avoid this, set the schema and table cache timeout values to 1 (second).
  • Datasets cache: On the add dataset screen, the lists of schemas and tables in the respective drop-downs may not refresh properly. To force a refresh, click the Force icon next to each drop-down.
  • User roles: Ensure that the Public role does not have any permissions, and in particular does not have the can read on dashboard and can read on chart permissions. If these are granted, API requests to the /api/v1/dashboard/ endpoint will return no dashboards, which prevents embedded dashboards from working properly.
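When embedded dashboards show nothing, one way to narrow down the cause is to ask the REST API directly which dashboards a given account can see. The sketch below assumes Superset's /api/v1/security/login endpoint with the db auth provider and the /api/v1/dashboard/ list endpoint, whose JSON response includes a count field; the function names and the account used are hypothetical.

```python
from __future__ import annotations

import json
from urllib.request import Request, urlopen


def _call(url: str, payload: dict | None = None, token: str | None = None, timeout: float = 10.0) -> dict:
    """Send a JSON request (POST if payload is given, else GET) and decode the JSON reply."""
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    data = None if payload is None else json.dumps(payload).encode()
    request = Request(url, data=data, headers=headers, method="GET" if data is None else "POST")
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode())


def visible_dashboard_count(base_url: str, username: str, password: str) -> int:
    """Log in via the database auth provider and count the dashboards the API returns."""
    login = _call(
        f"{base_url}/api/v1/security/login",
        {"username": username, "password": password, "provider": "db", "refresh": True},
    )
    listing = _call(f"{base_url}/api/v1/dashboard/", token=login["access_token"])
    return listing["count"]
```

A count of 0 for an account that should see dashboards points at role permissions, such as those on the Public role described above.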