Databricks Connections

How Ridge works with Delta Lake

Ridge AI's Databricks connector enables direct access to your Unity Catalog tables and Delta Lake data. Query your Databricks data using either simple table references or custom SQL, with automated ingestion to Ridge AI's analytics platform.

Requirements

You'll need a Databricks workspace with Unity Catalog enabled, and access to the catalogs, schemas, and tables you want to query.

Key Capabilities

  • Unity Catalog integration
  • Delta Lake table support
  • Flexible query patterns (table references or SQL)
  • Scheduled data ingestion

Connecting to Databricks

What You'll Need

You'll need four pieces of information to connect:

1. Access Token
Generate a personal access token or service principal token:

  • In Databricks, go to Settings → User Settings → Access Tokens
  • Click Generate New Token
  • Set an expiration period and save the token securely
  • For production use, create a service principal token instead

2. Workspace Endpoint
Your workspace URL in the format:
https://dbc-xxxxx-xxxx.cloud.databricks.com

Find this in your browser's address bar when logged into Databricks, or in your workspace settings.

3. AWS Region
The AWS region hosting your workspace (e.g., us-west-2, us-east-1, eu-west-1).

Check your workspace URL or Databricks account console for region information.

4. Catalog Name
The Unity Catalog name you want to access (e.g., main, production, analytics).

Unity Catalog uses a three-level namespace:
catalog.schema.table
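If you're unsure of the exact three-level names, you can discover them from a Databricks SQL editor. A minimal sketch, using this page's running example names (main, sales):

-- List the catalogs your principal can see
SHOW CATALOGS;
-- Then drill down to schemas and tables
SHOW SCHEMAS IN main;
SHOW TABLES IN main.sales;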

Ridge Data Connections manage authentication; Ridge Data Sets are sourced from Connections or Files.

How to Connect to Data in Ridge

First, Set Up a Connection.

Connections manage authorization and authentication to a data source, such as Databricks.

  1. Navigate to the Data page
  2. Click New Connection
  3. Select Databricks as the connection type
  4. Enter your credentials:
    • Token: Your Databricks access token
    • Endpoint: Your workspace URL
    • AWS Region: Your workspace region
    • Catalog: The Unity Catalog name
  5. Click Save

Next, Set Up a Data Set.

Once you have a Connection, create a Data Set. Data Sets point to specific tables and may specify queries and refresh schedules. To set up a Data Set:

  1. Navigate to the Data page
  2. Click New
  3. Select Source: Connection
  4. Select the Connection you just established

Then Select a Table or Set Up a Query.

Option 1: Table Reference (simplest)

catalog.schema.table_name

Example: main.sales.transactions

Option 2: Custom SQL (for filtering or transformations)

SELECT * FROM catalog.schema.table_name WHERE date > '2024-01-01'

Example:

SELECT
  customer_id,
  SUM(amount) AS total_amount
FROM main.sales.transactions
WHERE year = 2024
GROUP BY customer_id

Finally, Set Up Ingestion.

  1. Click New Data Source from your Databricks connection
  2. Choose your query pattern and enter:
    • Table reference: catalog.schema.table
    • Custom SQL: Full SELECT statement
  3. Set an Ingestion Schedule using cron syntax:
    • Daily at midnight: 0 0 * * *
    • Every hour: 0 * * * *
    • Weekdays at 8am: 0 8 * * 1-5
  4. Click Save

When ingestion runs:

  1. Ridge AI executes your query against Databricks Unity Catalog
  2. Results are exported to Parquet format
  3. Data is stored in Ridge AI's object storage (R2)
  4. Data becomes available for visualization and analysis

Best Practices

Use service principals for production: Personal access tokens expire and are tied to individual users. Service principals provide better security and stability for automated ingestion.

Scope queries appropriately: Query only the data you need. Use WHERE clauses to filter by date ranges or other criteria to improve performance.
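For instance, selecting only the columns you need and filtering on a date keeps each ingestion small. A sketch, with illustrative column names:

-- Pull only the columns and rows the analysis needs
SELECT customer_id, amount, order_date
FROM main.sales.transactions
WHERE order_date >= '2024-01-01'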

Leverage Unity Catalog permissions: Use Databricks' built-in access controls rather than sharing credentials broadly.
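For example, a dedicated read-only principal can be scoped to exactly the data Ridge AI ingests. A sketch, where the ridge-ingest principal and object names are illustrative:

-- Allow the principal to see the catalog and schema, and read one table
GRANT USE CATALOG ON CATALOG main TO `ridge-ingest`;
GRANT USE SCHEMA ON SCHEMA main.sales TO `ridge-ingest`;
GRANT SELECT ON TABLE main.sales.transactions TO `ridge-ingest`;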

Monitor ingestion schedules: Set schedules based on how frequently your source data updates. Over-scheduling wastes resources; under-scheduling delays insights.

Test queries in Databricks first: Before configuring a data source in Ridge AI, run your SQL query in the Databricks SQL editor to verify it works correctly.
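A quick way to do this is to run the exact query with a LIMIT appended, which returns fast even on large tables:

-- Sanity-check the query shape and permissions before scheduling it
SELECT * FROM main.sales.transactions LIMIT 10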

Supported Features & Limitations

Supported

  • Unity Catalog tables (see supported table types below)
  • Delta Lake format
  • Standard SQL data types
  • Both table references and custom SQL queries
  • Scheduled ingestion (cron-based)
  • Filtered queries (WHERE, GROUP BY, etc.)

Supported Table Types

  • TABLE_DB_STORAGE
  • TABLE_EXTERNAL
  • TABLE_DELTA_EXTERNAL

Limitations

  • No incremental loading: Each ingestion performs a full refresh. Use SQL WHERE clauses to limit data volumes (see the sketch after this list).
  • Unity Catalog required: Legacy Hive metastore is not supported
  • Query-time only: Cannot stream or receive real-time updates
  • No write-back: Ridge AI is read-only; cannot modify Databricks data
  • No Delta Sharing support: The integration currently connects directly to Unity Catalog tables and does not support Delta Sharing shares
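Because each run is a full refresh, a rolling time window is one way to approximate incremental behavior: every ingestion re-reads only recent data. A sketch, with an illustrative 30-day window and column name:

-- Re-ingest only the trailing 30 days on each scheduled run
SELECT *
FROM main.sales.transactions
WHERE order_date >= date_sub(current_date(), 30)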

Troubleshooting

Authentication Errors

Symptom: "Invalid token" or "Authentication failed"

Solutions:

  • Verify your token hasn't expired
  • Check that the token has the correct permissions
  • Ensure you copied the token completely (no extra spaces)
  • For service principals, verify the principal has been granted catalog access

Permission Errors

Symptom: "Access denied" or "Insufficient privileges"

Solutions:

  • Grant catalog access: GRANT USE CATALOG ON CATALOG catalog_name TO user
  • Grant schema access: GRANT USE SCHEMA ON SCHEMA schema_name TO user
  • Grant table read access: GRANT SELECT ON TABLE table_name TO user (see the verification sketch below)
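To confirm what a principal has actually been granted, you can inspect grants at each level (securable names are illustrative):

-- Show grants on the catalog, schema, and table
SHOW GRANTS ON CATALOG main;
SHOW GRANTS ON SCHEMA main.sales;
SHOW GRANTS ON TABLE main.sales.transactions;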

Connection Errors

Symptom: "Cannot connect" or "Endpoint unreachable"

Solutions:

  • Verify the endpoint URL format: https://dbc-xxxxx-xxxx.cloud.databricks.com
  • Ensure the AWS region matches your workspace region
  • Check for typos in the endpoint URL
  • Confirm your workspace is accessible (not behind a VPN-only restriction)

Query Errors

Symptom: "Table not found" or "SQL syntax error"

Solutions:

  • Use the full three-level name: catalog.schema.table (not just the table name)
  • Verify the table exists: run the query in the Databricks SQL editor first (see the sketch below)
  • Check SQL syntax if using custom queries
  • Ensure the catalog is attached (Unity Catalog must be enabled)
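For example, to confirm a table resolves under its full three-level name, describe it in the Databricks SQL editor (names illustrative):

-- Errors if the three-level name doesn't resolve
DESCRIBE TABLE main.sales.transactions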

Ready to learn more?

Sign up to get product updates. Beta coming soon!
