Module tripleblind.report_asset

Specialized Asset representing a parameterized database report.

The ReportAsset wraps a generic database query, providing a mechanism to publish a limited query (such as an SQL database query) with very tightly controlled options. These ReportAssets can then be easily executed by others to perform the limited variety of queries upon demand. Like all other Assets, the report can be manually approved or access can be granted via an Agreement.

Reports only allow changes to the limited parameters and validate provided parameters before usage, ensuring the report consumer can only utilize the intended range of options. This enables powerful but strongly controlled reporting against even the most sensitive of databases.

Parameters must be an integer, float or a string from an explicit list of options. Multiple parameters can be used by a report, including optional values and default. The parameters are used by the query_template, utilizing the Mustache templating language (see https://mustache.github.io).

Connection strings can be templated using Mustache as well, allowing for secrets to be used in the connection string. For example, a connection string could be defined as: "mssql+pyodbc://{{secret_username}}:{{secret_password}}@myserver:3306/payroll" Or secrets could be included in the parameters: username="{{secret_username}}", password="{{secret_password}}" if using a create method which accepts those parameters. See "Using Named Secrets" under https://deveval.tripleblind.app/portal/docs/user-guide/asset-owner-operations for more details.

Example report definition:

demographic_param = ReportParameter.create_string(
    name="demographic",
    description="Specific demographic to filter the report on",
    options=[
        ParameterOption("Gender"),
        ParameterOption("Ethnicity"),
    ],
)

query_template = '''
    select Dept_Name, {{demographic}}, avg(Total_Gross) as average_Pay
       from tripleblind_datasets.city_of_somerville_payroll
    group by Dept_Name, {{demographic}};
'''

report = tb.report_asset.BigQueryReport.create(
    gcp_project_id,
    bigquery_dataset,
    credentials,
    query_template=query_template,
    params=[demographic_param],
    name=f"EXAMPLE- Payroll Report",
)

Example report usage:

# Run a payroll report summarizing by Gender.
report = tb.ReportAsset.find("EXAMPLE- Payroll Report")
print(report.run({"demographic": "Gender"}).raw_content)

Classes

class AggregationMethod (value, names=None, *, module=None, qualname=None, type=None, start=1)

The aggregation methods available for use in a federated report.

Ancestors

  • enum.Enum

Class variables

var COUNT
var MAX
var MEAN
var MIN
var SUM
class AggregationRules (group_by: Union[str, List[str], ForwardRef(None)], aggregates: Optional[Mapping[str, AggregationMethod]], sort_columns: Optional[List[str]] = None, sort_order: Optional[str] = 'asc')

Federated Blind Reports require an AggregationRules object which defines the way the results from the federation members get combined. A single report can report multiple values, and each value can be aggregated in a different way. Grouping is optional and will be applied to all the results the same.

Class variables

var aggregates : Optional[Mapping[str, AggregationMethod]]

A map of column names to the aggregation method to use.

var group_by : Union[str, List[str], ForwardRef(None)]

The column(s) to group the results by.

var sort_columns : Optional[List[str]]

The columns to sort the results by.

var sort_order : Optional[str]

The order to sort the results by. Default is 'asc'.

Static methods

def create(group_by: Optional[List[str]] = None, aggregates: Optional[Mapping[str, AggregationMethod]] = None, sort_columns: Optional[List[str]] = None, sort_order: Optional[str] = 'asc')

Define the aggregation rules for a federated report. At least one group_by or aggregate is required.

Args

group_by : str or List[str]
The column(s) to group the results by.
aggregates : Mapping[str, AggregationMethod]
A map of column names to the aggregation method to use.
sort_columns : List[str], optional
The columns to sort the results by.
sort_order : str, optional
The order to sort the results by. Default is 'asc'.

Returns

AggregationRules
The new AggregationRules object.
def from_dict(kvs: Union[dict, list, str, int, float, bool, ForwardRef(None)], *, infer_missing=False) -> ~A
def from_json(s: Union[str, bytes, bytearray], *, parse_float=None, parse_int=None, parse_constant=None, infer_missing=False, **kw) -> ~A
def schema(*, infer_missing: bool = False, only=None, exclude=(), many: bool = False, context=None, load_only=(), dump_only=(), partial: bool = False, unknown=None) -> dataclasses_json.mm.SchemaF[~A]

Methods

def to_dict(self, encode_json=False) -> Dict[str, Union[dict, list, str, int, float, bool, ForwardRef(None)]]
def to_json(self, *, skipkeys: bool = False, ensure_ascii: bool = True, check_circular: bool = True, allow_nan: bool = True, indent: Union[int, str, ForwardRef(None)] = None, separators: Tuple[str, str] = None, default: Callable = None, sort_keys: bool = False, **kw) -> str
class BigQueryReport (uuid: UUID, query_template: str = '', params: List[preprocessor.report_parameters.ReportParameter] = typing.List, connection: Optional[str] = None, connection_opts: Optional[dict] = None, federation_group: Union[str, uuid.UUID, ForwardRef(None)] = None)

A parameterized database report asset backed by a view from a BigQuery database.

Ancestors

Static methods

def create(gcp_project: str, bigquery_dataset: str, credentials: Union[str, pathlib.Path], query_template: str, params: List[preprocessor.report_parameters.ReportParameter], name: str, desc: str, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, validate_sql: Optional[bool] = True, post_processing: Optional[str] = None) -> DatabaseReport

Creates a parameterized database report using a connection to a BigQuery database.

Args

gcp_project : str
The project name of your Google Cloud Project which will be used to cover any query access costs.
bigquery_dataset : str
The BigQuery dataset name.
credentials : str or Path
The path of your keyfile.json. See the Google documentation for more details. These credentials will be stored securely on your Access Point; neither TripleBlind nor anyone using your dataset will have access to it.
query_template : str
The SQL query template defined using the Mustache templating language for generating the report.
params : List[ReportParameter]
List of ReportParameters defining the configurable report parameters.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
validate_sql : bool, optional
If True (the default) the query syntax is checked for common SQL syntax errors.
post_processing : str, optional
A string holding a Python function or a filename containing a Python function to use as a post-processing operation on the report output.

Raises

SystemExit
SQL syntax errors were found in query.

Returns

DatabaseReport
New DatabaseReport asset on the Router, or None on failure

Inherited members

class DatabaseReport (uuid: UUID, query_template: str = '', params: List[preprocessor.report_parameters.ReportParameter] = typing.List, connection: Optional[str] = None, connection_opts: Optional[dict] = None, federation_group: Union[str, uuid.UUID, ForwardRef(None)] = None)

A parameterized database report asset backed by an SQL view of a database

When running a Federated Report, the same SQL query is distributed to be executed against multiple databases. Most SQL is compatible with all databases, but some specific situations require accessing features are not completely standardized. To handle this, the query_template can use DIALECT_xxx values to customize the SQL for the specific database. For example, to use the SQL Server TOP keyword in SQL Server, but LIMIT in Postgres, you could use the following query_template:

SELECT *
FROM my_table
{{#DIALECT_MSSQL}} TOP {{limit}} {{/DIALECT_MSSQL}}
{{#DIALECT_POSTGRESQL}} LIMIT {{limit}} {{/DIALECT_POSTGRESQL}}

Or you could only use the TOP keyword in SQL Server and LIMIT elsewhere:

SELECT *
FROM my_table
{{#DIALECT_MSSQL}} TOP {{limit}} {{/DIALECT_MSSQL}}
{{^DIALECT_MSSQL}} LIMIT {{limit}} {{/DIALECT_MSSQL}}

Supported DIALECT_ values are:

  • DIALECT_BIGQUERY
  • DIALECT_DATABRICKS
  • DIALECT_MONGO
  • DIALECT_MSSQL
  • DIALECT_MYSQL
  • DIALECT_ORACLE
  • DIALECT_POSTGRESQL
  • DIALECT_REDSHIFT
  • DIALECT_SNOWFLAKE
  • DIALECT_SQLITE

The active database will be given a value of "1", others will be undefined.

Ancestors

Subclasses

Class variables

var connection : Optional[str]
var connection_opts : Optional[dict]
var federation_group : Union[str, uuid.UUID, ForwardRef(None)]
var params : List[preprocessor.report_parameters.ReportParameter]
var query_template : str

Static methods

def cast(asset: Asset) -> DatabaseReport

Convert a generic Asset into a DatabaseReport

This should only be used on an asset known to be a DatabaseReport, no validation occurs during the cast.

Args

asset : Asset
A generic Asset

Returns

DatabaseReport
A DatabaseReport object
def create(query_template: str, params: List[preprocessor.report_parameters.ReportParameter], name: str, desc: str, connection: Optional[str] = None, connection_opts: Optional[dict] = None, credentials_info: Optional[dict] = None, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, validate_sql: Optional[bool] = True, federation_group: Union[ForwardRef('FederationGroup'), str, uuid.UUID, ForwardRef(None)] = None, federation_aggregation: Optional[AggregationRules] = None, post_processing: Optional[str] = None)

Creates a parameterized database report.

Args

connection : str
The connection string to the remote database connection. Secrets can be included in the connection string using Mustache templating of named secrets, e.g. "mssql+pyodbc://{{secret_username}}:{{secret_password}}@myserver.acme.com:3306/payroll". Any portion of the connection string can be templated, including the host, username, password, database, etc.
query_template : str
The SQL query template defined using the Mustache templating language for generating the report. In addition to your query parameters, your template can also include DIALECT_xxx values to customize the SQL for different databases. See the DatabaseReport documentation for more details.
params : List[ReportParameter]
List of ReportParameters defining the configurable report parameters.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown).
connection_opts : dict, optional
Optional dictionary of database connection options.
credentials_info : dict, optional
Dictionary of credentials information if not provided in the connection string.
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
validate_sql : bool, optional
If True (the default) the query syntax is checked for common SQL syntax errors.
federation_group : FederationGroup, str or UUID, optional
The group, group name or UUID of the federation group to use when executing this report. If not specified, this is a standalone report.
federation_aggregation : AggregationRules, optional
Defines the aggregation to use when executing this report. Required for federated reports, invalid otherwise.
post_processing : str, Optional
A Python function or the filename containing a function to run after the report has been executed. The function must have the signature: def postprocess(df: pd.Dataframe, ctx: dict) The two arguments are the report output data frame and a dict holding the user-selected report parameters as context.

Raises

Exception
Name must be unique for each parameter.
SystemExit
SQL syntax errors were found in query.

Returns

DatabaseReport
New DatabaseReport asset on the Router, or None on failure

Inherited members

class DatabricksReport (uuid: UUID, query_template: str = '', params: List[preprocessor.report_parameters.ReportParameter] = typing.List, connection: Optional[str] = None, connection_opts: Optional[dict] = None, federation_group: Union[str, uuid.UUID, ForwardRef(None)] = None)

A table asset backed by a view from a Databricks database.

Ancestors

Static methods

def create(access_token: str, server_hostname: str, http_path: str, catalog: str, schema: str, query_template: str, params: List[preprocessor.report_parameters.ReportParameter], name: str, desc: str, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, validate_sql: Optional[bool] = True, post_processing: Optional[str] = None) -> DatabaseReport

Create a connection to a Databricks database

You can find the connection details for your Databricks cluster in the Databricks UI. Under the Compute in the sidebar, choose your target cluster. Under the Configuration tab for that cluster expand Advanced Options and choose the JDBC/ODBC tab, where you will find the needed values. See the Databricks documentation for more details: https://docs.databricks.com/en/integrations/compute-details.html

Args

access_token : str
A Databricks access token or a secret name. For example, "dapi1234567890abcdef"
server_hostname : str
The Databricks server name or a secret name. For example, "community.cloud.databricks.com"
http_path : str
The Databricks server name or a secret name. For example, "/sql/protocolv1/o/1234567890123456/0123-456789-abc123"
catalog : str
The Databricks catalog name or a secret name. For example, "default"
schema : str
The Databricks schema name or a secret name. For example, "default"
query_template : str
The SQL query template defined using the Mustache templating language for generating the report.
params : List[ReportParameter]
List of ReportParameters defining the configurable report parameters.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
validate_sql : bool, optional
If True (the default) the query syntax is checked for common SQL syntax errors.
post_processing : str, optional
A string holding a Python function or a filename containing a Python function to use as a post-processing operation on the report output.

Raises

SystemExit
SQL syntax errors were found in query.

Returns

DatabaseReport
New DatabaseReport asset on the Router, or None on failure

Inherited members

class MSSQLReport (uuid: UUID, query_template: str = '', params: List[preprocessor.report_parameters.ReportParameter] = typing.List, connection: Optional[str] = None, connection_opts: Optional[dict] = None, federation_group: Union[str, uuid.UUID, ForwardRef(None)] = None)

A parameterized database report backed by a view of a Microsoft SQL database.

Ancestors

Static methods

def create(host: str, port: int, database: str, query_template: str, params: List[preprocessor.report_parameters.ReportParameter], name: str, desc: str, username: Optional[str] = None, password: Optional[str] = None, options: Optional[dict] = {}, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, validate_sql: Optional[bool] = True, post_processing: Optional[str] = None) -> DatabaseReport

Creates a parameterized database report using a connection to a Microsoft SQL database.

Args

host : str
The host name of the Microsoft SQL database or a secret name. Example: testsqlserver123.database.windows.net
port : int
The port number of the Microsoft SQL database.
database : str
The name of the Microsoft SQL database to connect to or a secret name. Example: "dev" or "{{secret_database_name}}".
query_template : str
The SQL query template defined using the Mustache templating language for generating the report.
params : List[ReportParameter]
List of ReportParameters defining the configurable report parameters.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
username : str, optional
Username to use in the database connec