Databricks

Databricks is a data analytics platform built on top of Apache Spark.

Databricks combines the robustness of a data warehouse with the flexibility of a data lake through its lakehouse architecture. It provides a unified platform with a wide variety of services for storing and processing large amounts of data quickly and efficiently, including real-time analytics.

Databricks can be especially useful for machine learning/artificial intelligence, data engineering, and data science applications.

How to get set up

  1. Sign up for a Databricks account or trial and create a workspace. The Databricks documentation has instructions for setting up an account and workspace.
  2. Databricks automatically configures a Starter Warehouse for new users. You can find it by clicking the SQL Warehouses link in the left sidebar of your Databricks workspace. To create or configure a different warehouse, follow the instructions in the Databricks documentation.
  3. Generate a personal access token in Databricks for a user with access to the data you want to query:
    • In the top right corner of your Databricks workspace, click the user icon and navigate to User Settings.
    • In the left pane, open the Developer controls, then click the Manage button in the Access tokens section.
    • Click Generate New Token. You will be asked to configure the token's lifespan. Finally, click Generate.
    • Copy the token and save it somewhere safe. You will not be able to access it again.
  4. In the sidebar of your workspace, click SQL Warehouses under the SQL section to see a list of your warehouses. Click the name of the warehouse you want to connect to, then open the Connection details tab to find its server hostname and HTTP path.
  5. In Hashboard, go to the Data sources page.
  6. Click + Add connection, select Databricks and fill out the fields below.
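
Before entering the values in Hashboard, it can help to confirm that the hostname, HTTP path, and token from steps 3 and 4 actually work together. The following is a minimal sketch, assuming the open-source `databricks-sql-connector` Python package is installed (`pip install databricks-sql-connector`); the package is not part of the Hashboard setup, and the function name `check_databricks_connection` is hypothetical.

```python
def check_databricks_connection(server_hostname: str, http_path: str, access_token: str) -> bool:
    """Run `SELECT 1` against the SQL warehouse and report whether it returned 1."""
    # Imported inside the function so this file still loads when the
    # connector is not installed (assumption: databricks-sql-connector).
    from databricks import sql

    with sql.connect(
        server_hostname=server_hostname,  # e.g. my-host-name.cloud.databricks.com
        http_path=http_path,              # e.g. /sql/1.0/warehouses/my-http-path
        access_token=access_token,        # personal access token from step 3
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT 1")
            row = cursor.fetchone()
            return row is not None and row[0] == 1
```

Calling the function with your own hostname, HTTP path, and token should return `True` if the warehouse is reachable and the token is valid; an exception from the connector usually points to a wrong value or an expired token.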

Settings

  • Connection name: A nickname for your connection. Not used to connect to your database.
  • Host: The server hostname of the SQL warehouse. Can be found in the Connection details tab of your SQL warehouse (see step 4 above), or as the URL of your Databricks workspace (excluding "https://"). Example: my-host-name.cloud.databricks.com
  • Port: (optional) The port to connect to. Defaults to 443.
  • HTTP path: The HTTP path of the SQL warehouse. Can be found in the Connection details tab of your SQL warehouse (see step 4 above). Example: /sql/1.0/warehouses/my-http-path
  • Access token: A personal access token generated in Databricks for a user with access to the data you want to query.
  • Catalog: The Databricks catalog within the warehouse to connect to.
  • Schema: (optional) The schema to use for the connection. If no schema is set, all schemas in the catalog will be available.
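
The rules in the list above (no "https://" in the host, port 443 by default, optional schema) can be sketched as a small helper that tidies values before you paste them into the form. This is an illustration only, not part of Hashboard: the names `DatabricksSettings` and `normalize_settings` are hypothetical, and adding a leading slash to the HTTP path is an assumption based on the example format shown above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatabricksSettings:
    host: str
    http_path: str
    access_token: str
    catalog: str
    port: int = 443               # default port from the Settings list
    schema: Optional[str] = None  # None means every schema in the catalog


def normalize_settings(
    host: str,
    http_path: str,
    access_token: str,
    catalog: str,
    port: Optional[int] = None,
    schema: Optional[str] = None,
) -> DatabricksSettings:
    """Clean up copied values so they match the formats described in Settings."""
    # Host: drop the URL scheme and any trailing slash.
    host = host.strip()
    for prefix in ("https://", "http://"):
        if host.lower().startswith(prefix):
            host = host[len(prefix):]
    host = host.rstrip("/")
    if not host or "/" in host:
        raise ValueError("host must be a bare hostname, e.g. my-host-name.cloud.databricks.com")

    # HTTP path: assumed to start with a slash, as in the example above.
    http_path = http_path.strip()
    if not http_path:
        raise ValueError("http_path is required")
    if not http_path.startswith("/"):
        http_path = "/" + http_path

    access_token = access_token.strip()
    if not access_token:
        raise ValueError("access_token is required")

    catalog = catalog.strip()
    if not catalog:
        raise ValueError("catalog is required")

    # Schema is optional; treat blank input the same as "not set".
    schema = schema.strip() if schema else None

    return DatabricksSettings(
        host=host,
        http_path=http_path,
        access_token=access_token,
        catalog=catalog,
        port=443 if port is None else int(port),
        schema=schema or None,
    )
```

For example, `normalize_settings("https://my-host-name.cloud.databricks.com/", "sql/1.0/warehouses/my-http-path", "token", "main")` yields a host without the scheme, an HTTP path with a leading slash, port 443, and no schema. The catalog name `main` is a placeholder.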