Connection to Databricks using Cube.py #8436

Open
GitMedic opened this issue Jul 4, 2024 · 3 comments
Labels: driver:databricks, question

@GitMedic

GitMedic commented Jul 4, 2024

I am using the code below to test multitenancy in Cube Cloud. I have two catalogs, which I pass as the `customer_group_code`.

My `cube.py` settings are:

```python
import os

from cube import config


@config('driver_factory')
def driver_factory(ctx: dict):
    # Each tenant passes its Databricks catalog as customer_group_code
    try:
        customer_group_code = ctx['securityContext']['customer_group_code']
    except KeyError:
        raise ValueError('No customer_group_code found in Security Context!')

    catalog = customer_group_code
    databricks_token = os.getenv('CUBEJS_DB_DATABRICKS_TOKEN')
    jdbc_url = os.getenv('CUBEJS_DB_DATABRICKS_URL')

    # Fail fast if the connection settings are missing
    if not databricks_token or not jdbc_url:
        raise ValueError('Databricks token or URL not found in environment variables!')

    return {
        "type": "databricks-jdbc",
        "url": jdbc_url,
        "catalog": catalog
    }
```

Please help me out with this issue.

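A minimal sketch of the same `driver_factory` logic, assuming the `databricks-jdbc` driver accepts `url`, `token`, and `catalog` keys in the returned dict (worth confirming against the Cube Databricks driver docs). The code above reads the token but never passes it on; this version forwards it. It is written as a plain function so it can be tested outside Cube; in `cube.py` it would carry the `@config('driver_factory')` decorator.

```python
import os


def driver_factory(ctx: dict) -> dict:
    """Build per-tenant Databricks driver options from the security context.

    Assumption: the 'databricks-jdbc' driver accepts 'url', 'token' and
    'catalog' keys. In cube.py, decorate this with @config('driver_factory').
    """
    try:
        catalog = ctx['securityContext']['customer_group_code']
    except KeyError:
        raise ValueError('No customer_group_code found in Security Context!') from None

    token = os.getenv('CUBEJS_DB_DATABRICKS_TOKEN')
    url = os.getenv('CUBEJS_DB_DATABRICKS_URL')
    if not token or not url:
        raise ValueError('Databricks token or URL not found in environment variables!')

    return {
        'type': 'databricks-jdbc',
        'url': url,
        'token': token,      # forwarded instead of being read and dropped
        'catalog': catalog,  # one catalog per tenant
    }
```

With both environment variables set, a context such as `{'securityContext': {'customer_group_code': 'catalog_a'}}` yields a config whose `catalog` is `catalog_a`.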
@GitMedic
Author

GitMedic commented Jul 4, 2024

Also, for a dynamic table name, can I pass it like this?

```yaml
cubes:
  - name: some_report
    sql_table: "{securityContext.customer_code}.tb.some_report"
    joins: []
```
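Whether `sql_table` can interpolate the security context with this exact syntax is not confirmed in this thread. Independently of that, when tenant input is spliced into a table name it should be validated. A minimal sketch of a hypothetical helper, `tenant_table`, that builds `<catalog>.<schema>.<table>` and rejects anything that is not a plain identifier (how it would be exposed to the YAML model is left out):

```python
import re

# Plain SQL identifier: letters, digits, underscores; must not start with a digit.
_IDENT = re.compile(r'[A-Za-z_][A-Za-z0-9_]*')


def tenant_table(security_context: dict, schema: str, table: str,
                 key: str = 'customer_code') -> str:
    """Return '<catalog>.<schema>.<table>' for the tenant in security_context.

    Hypothetical helper: 'customer_code' is the key used in the YAML above.
    Raises ValueError for a missing key or a non-identifier part, so a
    crafted token cannot inject SQL through the table name.
    """
    if key not in security_context:
        raise ValueError(f'No {key} found in Security Context!')
    parts = [str(security_context[key]), schema, table]
    for part in parts:
        if not _IDENT.fullmatch(part):
            raise ValueError(f'Invalid identifier: {part!r}')
    return '.'.join(parts)
```

For example, `tenant_table({'customer_code': 'test1'}, 'tb', 'some_report')` returns `test1.tb.some_report`.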

@igorlukanin
Member

Hi @GitMedic 👋

What is not working for you, exactly? I don't see a description of that or an error message in the issue text.

Also, note that you have to define `context_to_orchestrator_id` for `driver_factory` to work correctly: https://cube.dev/docs/reference/configuration/config#context_to_orchestrator_id
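Why this matters, as I understand it: drivers are kept per orchestrator, so `driver_factory` only runs when a new orchestrator ID is seen. If every tenant maps to the same ID, the first tenant's driver (and its catalog) is reused for everyone. The toy model below is not Cube's implementation, just a sketch of that caching behavior using a hypothetical `ToyOrchestratorCache` class:

```python
from typing import Callable, Dict


class ToyOrchestratorCache:
    """Toy model, not Cube's code: one driver config cached per orchestrator ID."""

    def __init__(self, to_orchestrator_id: Callable[[dict], str],
                 driver_factory: Callable[[dict], dict]) -> None:
        self._to_id = to_orchestrator_id
        self._factory = driver_factory
        self._drivers: Dict[str, dict] = {}

    def driver_for(self, ctx: dict) -> dict:
        key = self._to_id(ctx)
        if key not in self._drivers:  # the factory runs only for unseen IDs
            self._drivers[key] = self._factory(ctx)
        return self._drivers[key]


def catalog_driver(ctx: dict) -> dict:
    return {'catalog': ctx['securityContext']['tenant_id']}


tenant_a = {'securityContext': {'tenant_id': 'test1'}}
tenant_b = {'securityContext': {'tenant_id': 'test2'}}

# Same ID for everyone: tenant B silently gets tenant A's catalog.
shared = ToyOrchestratorCache(lambda ctx: 'default', catalog_driver)
shared.driver_for(tenant_a)
print(shared.driver_for(tenant_b)['catalog'])  # test1

# One ID per tenant, as context_to_orchestrator_id would provide.
per_tenant = ToyOrchestratorCache(
    lambda ctx: f"CUBE_APP_{ctx['securityContext']['tenant_id']}", catalog_driver)
per_tenant.driver_for(tenant_a)
print(per_tenant.driver_for(tenant_b)['catalog'])  # test2
```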

@igorlukanin added the question and driver:databricks labels on Jul 5, 2024
@GitMedic
Author

GitMedic commented Jul 8, 2024

Hello, this is my code for `cube.py`:

```python
import os
from cube import config

@config('scheduled_refresh_contexts')
def scheduled_refresh_contexts() -> list[dict]:
    return [
        {
            'securityContext': {
                'tenant_id': "test_1",
                'bucket': 'demo'
            }
        }
    ]

@config('pre_aggregations_schema')
def pre_aggregations_schema(ctx: dict) -> str:
    try:
        return ctx['securityContext']['tenant_id']
    except KeyError:
        raise ValueError('APP_ID_ERROR:No tenant_id found in Security Context!')

@config('context_to_app_id')
def context_to_app_id(ctx: dict) -> str:
    try:
        return f"CUBE_APP_{ctx['securityContext']['tenant_id']}"
    except KeyError:
        raise ValueError('APP_ID_ERROR:No tenant_id found in Security Context!')

@config('context_to_orchestrator_id')
def context_to_orchestrator_id(ctx: dict) -> str:
    try:
        return f"CUBE_APP_{ctx['securityContext']['tenant_id']}"
    except KeyError:
        raise ValueError('ORCHESTRATOR_ID_ERROR:No tenant_id found in Security Context!')

@config('driver_factory')
def driver_factory(ctx: dict):
    try:
        tenant_id = ctx['securityContext']['tenant_id']
    except KeyError:
        raise ValueError('No tenant_id found in Security Context!')
    
    catalog = tenant_id
    databricks_token = os.getenv('CUBEJS_DB_DATABRICKS_TOKEN')
    jdbc_url = os.getenv('CUBEJS_DB_DATABRICKS_URL')

    if not databricks_token or not jdbc_url:
        raise ValueError('Databricks token or URL not found in environment variables!')
    
    return {
        "type": "databricks-jdbc",
        "database": catalog,
        "catalog": catalog
    }

class CubeConfig:
    def __init__(self, security_context: dict):
        self.security_context = security_context
        self.validate_security_context()

    def validate_security_context(self):
        if 'tenant_id' not in self.security_context:
            raise ValueError('No tenant_id found in Security Context!')

# Example security context for testing
if __name__ == "__main__":
    security_context = {'tenant_id': 'silo_dev_mk'}
    cube_config = CubeConfig(security_context)
    print(f"App ID: {context_to_app_id({'securityContext': security_context})}")
    print(f"Scheduled Refresh Contexts: {scheduled_refresh_contexts()}")
```
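`scheduled_refresh_contexts` above lists only `test_1`, so, as I understand it, background refresh would only run for that one tenant. A minimal sketch that returns one context per tenant, assuming a hypothetical hard-coded `TENANTS` list (in practice this might come from a config table or an environment variable); in `cube.py` it would carry `@config('scheduled_refresh_contexts')`:

```python
# Hypothetical tenant list; each entry is used as a catalog name by driver_factory.
TENANTS = ['test1', 'test2']


def scheduled_refresh_contexts() -> list[dict]:
    """Return one refresh context per tenant so each catalog gets refreshed."""
    return [{'securityContext': {'tenant_id': tenant}} for tenant in TENANTS]
```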

And here is my data model:

```yaml
cubes:
  - name: report1
    sql_table: "test1.dev.report1"
    joins: []

    dimensions:
      - name: id
        sql: "{CUBE}.`#`"
        type: number
        title: "#"
        primary_key: true
        shown: true
```

So the problem is that the security context is passed via the token, but I am not able to switch to a different catalog in Databricks when using the `driver_factory` setting. Can you help me achieve this? When I remove the catalog name `test1` from `sql_table`, I get a "table or view not found" error for `dev.report1`.
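One thing worth checking: `driver_factory` uses `tenant_id` directly as the catalog name, while the scheduled refresh context uses `test_1` and the table lives in catalog `test1`. If those names differ, Databricks would be pointed at a catalog that does not exist. It is also not clear from the thread whether the driver applies `catalog` as the session default; fully qualifying table names avoids depending on that. A minimal sketch of an explicit, allowlisted tenant-to-catalog mapping, where `TENANT_CATALOGS` and `catalog_for` are hypothetical names:

```python
# Hypothetical mapping from security-context tenant IDs to Databricks catalogs.
TENANT_CATALOGS = {
    'test_1': 'test1',
    'test_2': 'test2',
}


def catalog_for(ctx: dict) -> str:
    """Resolve the Databricks catalog for a request, rejecting unknown tenants."""
    tenant_id = ctx.get('securityContext', {}).get('tenant_id')
    if tenant_id is None:
        raise ValueError('No tenant_id found in Security Context!')
    try:
        return TENANT_CATALOGS[tenant_id]
    except KeyError:
        raise ValueError(f'Unknown tenant_id: {tenant_id!r}') from None
```

`driver_factory` could then use `catalog_for(ctx)` instead of `tenant_id` when building the `catalog` entry.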

@igorlukanin self-assigned this on Jul 15, 2024