Databricks Connections

How Ridge works with Delta Lake

Ridge AI's Databricks connector enables direct access to your Unity Catalog tables and Delta Lake data. Query your Databricks data using either simple table references or custom SQL, with automated ingestion to Ridge AI's analytics platform.

Requirements

You'll need a Databricks workspace with Unity Catalog enabled, and access to the catalogs, schemas, and tables you want to query.

Key Capabilities

  • Unity Catalog integration
  • Delta Lake table support
  • Flexible query patterns (table references or SQL)
  • Scheduled data ingestion

Connecting to Databricks

What You'll Need

You'll need four pieces of information to connect:

1. Access Token
Generate a personal access token or service principal token:

  • In Databricks, go to Settings → User Settings → Access Tokens
  • Click Generate New Token
  • Set an expiration period and save the token securely
  • For production use, create a service principal token instead

2. Workspace Endpoint
Your workspace URL in the format:
https://dbc-xxxxx-xxxx.cloud.databricks.com

Find this in your browser's address bar when logged into Databricks, or in your workspace settings.

3. AWS Region
The AWS region hosting your workspace (e.g., us-west-2, us-east-1, eu-west-1).

Check your workspace URL or Databricks account console for region information.

4. Catalog Name
The Unity Catalog name you want to access (e.g., main, production, analytics).

Unity Catalog uses a three-level namespace:
catalog.schema.table
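If you're unsure of the exact three-level names, you can discover them from a Databricks SQL editor. A minimal sketch, using this page's running example names (main, sales):

-- List the catalogs your principal can see
SHOW CATALOGS;
-- Then drill down to schemas and tables
SHOW SCHEMAS IN main;
SHOW TABLES IN main.sales;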

Ridge Data Connections manage authentication; Ridge Data Sets are sourced from Connections or Files.

How to Connect to Data in Ridge

First, Set Up a Connection.

Connections manage authorization and authentication to a data source, such as Databricks.

  1. Navigate to the Data page
  2. Click New Connection
  3. Select Databricks as the connection type
  4. Enter your credentials:
    • Token: Your Databricks access token
    • Endpoint: Your workspace URL
    • AWS Region: Your workspace region
    • Catalog: The Unity Catalog name
  5. Click Save

Next, Set Up a Data Set.

Once you have a Connection, create a Data Set. Data Sets point to specific tables and may specify queries and refresh schedules. To set up a Data Set:

  1. Navigate to the Data page
  2. Click New
  3. Select Source: Connection
  4. Select the Connection you just established

Then Select a Table or Set Up a Query.

Option 1: Table Reference (simplest)

catalog.schema.table_name

Example: main.sales.transactions

Option 2: Custom SQL (for filtering or transformations)

SELECT * FROM catalog.schema.table_name WHERE date > '2024-01-01'

Example:

SELECT
  customer_id,
  SUM(amount) AS total_amount
FROM main.sales.transactions
WHERE year = 2024
GROUP BY customer_id

Finally, Set Up Ingestion.

  1. Click New Data Source from your Databricks connection
  2. Choose your query pattern and enter:
    • Table reference: catalog.schema.table
    • Custom SQL: Full SELECT statement
  3. Set an Ingestion Schedule using cron syntax:
    • Daily at midnight: 0 0 * * *
    • Every hour: 0 * * * *
    • Weekdays at 8am: 0 8 * * 1-5
  4. Click Save

When ingestion runs:

  1. Ridge AI executes your query against Databricks Unity Catalog
  2. Results are exported to Parquet format
  3. Data is stored in Ridge AI's object storage (R2)
  4. Data becomes available for visualization and analysis

Best Practices

Use service principals for production: Personal access tokens expire and are tied to individual users. Service principals provide better security and stability for automated ingestion.

Scope queries appropriately: Query only the data you need. Use WHERE clauses to filter by date ranges or other criteria to improve performance.
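For instance, selecting only the columns you need and filtering on a date keeps each ingestion small. A sketch, with illustrative column names:

-- Pull only the columns and rows the analysis needs
SELECT customer_id, amount, order_date
FROM main.sales.transactions
WHERE order_date >= '2024-01-01'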

Leverage Unity Catalog permissions: Use Databricks' built-in access controls rather than sharing credentials broadly.
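For example, a dedicated read-only principal can be scoped to exactly the data Ridge AI ingests. A sketch, where the ridge-ingest principal and object names are illustrative:

-- Allow the principal to see the catalog and schema, and read one table
GRANT USE CATALOG ON CATALOG main TO `ridge-ingest`;
GRANT USE SCHEMA ON SCHEMA main.sales TO `ridge-ingest`;
GRANT SELECT ON TABLE main.sales.transactions TO `ridge-ingest`;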

Monitor ingestion schedules: Set schedules based on how frequently your source data updates. Over-scheduling wastes resources; under-scheduling delays insights.

Test queries in Databricks first: Before configuring a data source in Ridge AI, run your SQL query in the Databricks SQL editor to verify it works correctly.
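A quick way to do this is to run the exact query with a LIMIT appended, which returns fast even on large tables:

-- Sanity-check the query shape and permissions before scheduling it
SELECT * FROM main.sales.transactions LIMIT 10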

Supported Features & Limitations

Supported

  • Unity Catalog tables (see supported table types below)
  • Delta Lake format
  • Standard SQL data types
  • Both table references and custom SQL queries
  • Scheduled ingestion (cron-based)
  • Filtered queries (WHERE, GROUP BY, etc.)

Supported Table Types

  • TABLE_DB_STORAGE
  • TABLE_EXTERNAL
  • TABLE_DELTA_EXTERNAL

Limitations

  • No incremental loading: Each ingestion performs a full refresh. Use SQL WHERE clauses to limit data volumes (see the sketch after this list).
  • Unity Catalog required: Legacy Hive metastore is not supported
  • Query-time only: Cannot stream or receive real-time updates
  • No write-back: Ridge AI is read-only; cannot modify Databricks data
  • No Delta Sharing support: The integration currently connects directly to Unity Catalog tables and does not support Delta Sharing shares
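Because each run is a full refresh, a rolling time window is one way to approximate incremental behavior: every ingestion re-reads only recent data. A sketch, with an illustrative 30-day window and column name:

-- Re-ingest only the trailing 30 days on each scheduled run
SELECT *
FROM main.sales.transactions
WHERE order_date >= date_sub(current_date(), 30)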

Troubleshooting

Authentication Errors

Symptom: "Invalid token" or "Authentication failed"

Solutions:

  • Verify your token hasn't expired
  • Check that the token has the correct permissions
  • Ensure you copied the token completely (no extra spaces)
  • For service principals, verify the principal has been granted catalog access

Permission Errors

Symptom: "Access denied" or "Insufficient privileges"

Solutions:

  • Grant catalog access: GRANT USE CATALOG ON CATALOG catalog_name TO user
  • Grant schema access: GRANT USE SCHEMA ON SCHEMA schema_name TO user
  • Grant table read access: GRANT SELECT ON TABLE table_name TO user (see the verification sketch below)
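To confirm what a principal has actually been granted, you can inspect grants at each level (securable names are illustrative):

-- Show grants on the catalog, schema, and table
SHOW GRANTS ON CATALOG main;
SHOW GRANTS ON SCHEMA main.sales;
SHOW GRANTS ON TABLE main.sales.transactions;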

Connection Errors

Symptom: "Cannot connect" or "Endpoint unreachable"

Solutions:

  • Verify the endpoint URL format: https://dbc-xxxxx-xxxx.cloud.databricks.com
  • Ensure the AWS region matches your workspace region
  • Check for typos in the endpoint URL
  • Confirm your workspace is accessible (not behind a VPN-only restriction)

Query Errors

Symptom: "Table not found" or "SQL syntax error"

Solutions:

  • Use the full three-level name: catalog.schema.table (not just the table name)
  • Verify the table exists: run the query in the Databricks SQL editor first (see the sketch below)
  • Check SQL syntax if using custom queries
  • Ensure the catalog is attached (Unity Catalog must be enabled)
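For example, to confirm a table resolves under its full three-level name, describe it in the Databricks SQL editor (names illustrative):

-- Errors if the three-level name doesn't resolve
DESCRIBE TABLE main.sales.transactions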

Ready to learn more?

Sign up to get product updates. Beta coming soon!
