Module tripleblind.asset
Assets are the primary and most valuable objects stored on the Router.
An asset represents either an Algorithm or Data. Every asset has an owner who controls access to it. Additionally, every asset has a price associated with its use.
To assist in working with different assets, there is a hierarchy of helper classes:
Asset
DatasetAsset
DicomDataset
ImageDataset
NumPyDataset
TabularDataset
CSVDataset
DatabaseDataset
AzureDataLakeStorageDataset
BigQueryDatabase
DatabricksDatabase
MongoDatabase
MSSQLDatabase
OracleDatabase
RedshiftDatabase
SnowflakeDatabase
S3Dataset
AlgorithmAsset
NeuralNetwork
PMMLRegression
ReportAsset
DatabaseReport
BigQueryReport
DatabricksReport
RedshiftReport
MSSQLReport
OracleReport
SnowflakeReport
XGBoostModel
ModelAsset
Regression
RegressionModel
PSIVerticalRegressionModel
ModelTrainerAsset
Most of these have a create() method to assist you in building assets easily and properly. See the specific class for more details.
Global variables
var CNN
-
The built-in CNN Operation
var DISTRIBUTED_INFERENCE
-
The built-in Distributed Inference Operation
var FEDERATED_LEARNING_PROTOCOL
-
The built-in Federated Learning Operation
var NAMESPACE_DEFAULT_USER
-
Special UUID which represents the current user
var PSI_VERTICAL_PARTITION_TRAINING
-
The built-in PSI Vertical Partition Training Operation
var PYTORCH_INFERENCE
-
The built-in PyTorch Inference Operation
var RANDOM_FOREST_INFERENCE
-
The built-in Random Forest Inference Operation
var REGRESSION_INFERENCE
-
The built-in Regression Inference Operation
var ROI_DETECTOR_INFERENCE
-
The built-in Region of Interest Detector Inference Operation
var SKLEARN_INFERENCE
-
The built-in Scikit-learn Inference Operation
var SPLIT_LEARNING_TRAINING
-
The built-in Split Learning Training Operation
var VERTICAL_PARTITION_SPLIT_TRAINING
-
The built-in Vertical Partition Split Training Operation
var XGBOOST_INFERENCE_FED
-
The built-in XGBoost Inference with FED security Operation
var XGBOOST_INFERENCE_SMPC
-
The built-in XGBoost Inference with SMPC security Operation
Functions
def create_frame(data, opcode, fin=1)
-
Monkey patch websocket-client library to skip masking.
Masking is a protocol-level security feature that is redundant with TLS connections. This is monkey patched because the gevent-websocket server-side library is very slow at unmasking, causing roughly a 4x slowdown. See: https://stackoverflow.com/a/32290330/2395133
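To make concrete what is being skipped, the following is a minimal sketch of WebSocket payload masking as defined in RFC 6455: each payload byte is XORed with a 4-byte key, and applying the same operation again restores the original. The function name mask_payload is hypothetical and is not part of this module or of websocket-client.

```python
import os


def mask_payload(payload: bytes, key: bytes) -> bytes:
    """XOR every payload byte with the 4-byte masking key (RFC 6455, section 5.3)."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(byte ^ key[i % 4] for i, byte in enumerate(payload))


# Masking is symmetric: applying it twice with the same key is a no-op.
key = os.urandom(4)
masked = mask_payload(b"hello router", key)
restored = mask_payload(masked, key)
print(restored == b"hello router")
```

The per-byte loop on the receiving side is the work the patch avoids; over TLS the payload is already encrypted in transit.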
Classes
class AlgorithmAsset (uuid: UUID)
-
An abstract Asset used to perform a calculation.
This could be a trained neural network, a prebuilt protocol, or a Python or SQL script to be executed against a DatasetAsset.
Ancestors
Subclasses
Inherited members
class Asset (uuid: UUID)
-
Points to a dataset or an algorithm indexed on the TripleBlind Router.
Subclasses
Class variables
var uuid : uuid.UUID
-
Identifier for this asset.
Static methods
def find(search: Optional[Union[str, re.Pattern]], namespace: Optional[UUID] = None, owned: Optional[bool] = False, owned_by: Optional[int] = None, dataset: Optional[bool] = None, algorithm: Optional[bool] = None, session: Optional[Session] = None, exact_match: Optional[bool] = True) -> Asset
-
Search the Router index for an asset matching the given search
Args
search
:str
or re.Pattern
, optional- The search pattern applied to asset names. A simple string will be used as a substring search if exact_match is False, otherwise it will only return exact matches.
namespace
:UUID
, optional- The UUID of the user to which this asset belongs. None indicates any user, NAMESPACE_DEFAULT_USER indicates the current API user.
owned
:bool
, optional- Only return owned assets (either personally or by the current user's team)
owned_by
:int
, optional- Only return assets owned by the given team ID
dataset
:bool
, optional- Set to True to search for datasets. Default is to search for both data and algorithms.
algorithm
:bool
, optional- Set to True to search for algorithms. Default is to search for both data and algorithms.
session
:Session
, optional- A connection session. If not specified, the default session is used.
exact_match
:bool
, optional- When the 'search' is a string, setting this to True will perform an exact match. Ignored for regex patterns, defaults to True.
Raises
TripleblindAssetError
- Thrown when multiple assets are found which match the search.
Returns
Asset
- A single asset, or None if no match found
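The interaction between search, exact_match, and the "multiple matches" error described above can be sketched in plain Python. This is only an illustration of the documented rules applied to a local list of names; name_matches and find_one are hypothetical helpers, ValueError stands in for TripleblindAssetError, and details such as case sensitivity or how the Router evaluates a regular expression are assumptions.

```python
import re
from typing import List, Optional, Union


def name_matches(search: Union[str, "re.Pattern"], name: str, exact_match: bool = True) -> bool:
    """Apply the documented matching rules to a single asset name."""
    if isinstance(search, re.Pattern):
        # exact_match is ignored for regex patterns.
        return search.search(name) is not None
    if exact_match:
        return name == search
    return search in name  # substring search


def find_one(search, names: List[str], exact_match: bool = True) -> Optional[str]:
    """Return the single matching name, None if nothing matches, error if ambiguous."""
    hits = [n for n in names if name_matches(search, n, exact_match)]
    if len(hits) > 1:
        raise ValueError(f"multiple assets match {search!r}: {hits}")
    return hits[0] if hits else None


names = ["census_2020", "census_2021", "claims"]
print(find_one("claims", names))
print(find_one("census", names))  # exact match requested, so nothing is found
print(find_one(re.compile(r"_2021$"), names))
```

Because an ambiguous search raises rather than returning the first hit, callers who expect several results would use find_all() instead.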
def find_all(search: Optional[Union[str, re.Pattern]], namespace: Optional[UUID] = None, owned: Optional[bool] = False, owned_by: Optional[int] = None, dataset: Optional[bool] = None, algorithm: Optional[bool] = None, max: Optional[int] = 500, session: Optional[Session] = None) -> List[Asset]
-
Search the Router index for assets matching the given search
Args
search
:Optional[Union[str, re.Pattern]]
- Either an asset ID or a search pattern applied to asset names and descriptions. A simple string will match substrings, or a regular expression can be passed for complex searches.
namespace
:UUID
, optional- The UUID of the user to which this asset belongs. None indicates any user, NAMESPACE_DEFAULT_USER indicates the current API user.
owned
:bool
, optional- Only return owned assets (either personally or by the current user's team)
owned_by
:Optional[int]
, optional- Only return assets owned by the given team ID
dataset
:Optional[bool]
, optional- Set to True to search for datasets. Default is to search for both data and algorithms.
algorithm
:Optional[bool]
, optional- Set to True to search for algorithms. Default is to search for both data and algorithms.
max
:Optional[int]
, optional- Maximum number of results to return. If not specified, defaults to 500.
session
:Session
, optional- A connection session. If not specified, the default session is used.
Returns
List[Asset]
- A list of found assets, or None if no match found
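Since find_all() is documented as returning either a list or None, callers may want to normalize the result before iterating. A minimal sketch follows, using only the parameters listed above. The helper name list_owned_datasets is hypothetical, and the class is passed in as an argument (for example, Asset imported from tripleblind.asset) so the sketch itself has no dependency on the SDK.

```python
import re
from typing import List


def list_owned_datasets(asset_cls, pattern: str, limit: int = 50) -> List[str]:
    """Return the names of owned datasets whose names match the regex pattern."""
    found = asset_cls.find_all(
        re.compile(pattern),
        owned=True,     # only assets owned personally or by the user's team
        dataset=True,   # restrict the search to datasets
        max=limit,
    )
    # find_all() is documented as returning None when nothing matches.
    return [asset.name for asset in (found or [])]
```

With the SDK installed and a session configured, a call might look like list_owned_datasets(Asset, r"^census_"), assuming Asset has been imported from tripleblind.asset.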
def position(file_handle: Union[str, Path, Package, io.BufferedReader], name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, is_dataset: bool = True, custom_protocol: Optional[Any] = None, metadata: dict = {}, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True, asset_type: Optional[str] = None)
-
Place data on your Access Point for use by yourself or others.
If the dataset is a .csv file, the first row is assumed to be a header containing column names, and column validation will be performed. Use CSVDataset.position() for more options, including auto-renaming.
Args
file_handle
:str, Path, Package
or io.BufferedReader
- File handle or path to the data to place on the API user's associated Access Point.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
is_dataset
:bool
, optional- Is this a dataset? (False == algorithm)
custom_protocol
- Internal use
metadata
:dict
- Custom metadata to include in the asset
unmask_columns
:[str]
, optional- When is_dataset=True, list of column names that will be initially unmasked. Default is to mask all columns.
validate_sql
:bool
, optional- If True (the default) the query syntax is checked for common SQL syntax errors.
asset_type
:str
, optional- Type of asset to be positioned. Default is 'dataset'. Other options: 'algorithm' or 'report'.
Raises
SystemExit
- SQL syntax errors were found in query.
Returns
Asset
- New asset on the Router, or None on failure
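The k_grouping argument above is described as the minimum count of records with like values required for reporting. The idea can be illustrated with a small local sketch that drops any group smaller than k. This shows the concept only; reportable_counts is a hypothetical helper, and how the platform actually enforces k-grouping (suppression, error, or something else) is not stated here.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def reportable_counts(values: Iterable[Hashable], k_grouping: int = 5) -> Dict[Hashable, int]:
    """Count like values and keep only groups with at least k_grouping records."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n >= k_grouping}


# Illustrative data: one common value and one rare value.
diagnoses = ["flu"] * 7 + ["rare_condition"] * 2
print(reportable_counts(diagnoses, k_grouping=5))
```

With the default of 5, the two-record group is withheld, since reporting it could make individual records easier to identify.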
def upload(file_handle: io.BufferedReader, name: str, desc: str, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, is_dataset: bool = True, custom_protocol: Optional[Any] = None)
-
Deprecated, use
Asset.position()
instead.
Instance variables
var accesspoint_filename : str
-
str: Disk filename on the Access Point which holds this asset.
var activate_date : dt.datetime
-
str: Date when this asset became active
var deactivate_date : dt.datetime
-
str: Date when this asset was archived (deleted)
var desc : str
-
str: Longer description of the asset
var filename : str
-
str: Filename associated with the asset
var hash : str
-
str: Hash that was registered at the router when this asset was positioned.
var is_active : bool
-
bool: Is this an active asset?
var is_discoverable : bool
-
bool: True if anyone can discover the asset on the Router
var is_valid : bool
-
Verify that this Asset object points to a valid dataset.
Returns
bool
- True if this is a valid Asset on the Router
var k_grouping : Optional[int]
-
int: The minimum count of records with like values required for reporting.
var metadata : dict
-
str: Asset metadata (e.g. 'cols' on some datasets)
var name : str
-
str: Simple short name of the asset
var namespace : UUID
-
str: Namespace which contains the asset. Each user has a namespace, so generally assets exist under the namespace of the creator. Think of this like a personal folder within your organization to hold your assets.
var team : str
-
str: Name of the team that owns the asset
var team_id : str
-
str: ID of the team that owns the asset
Methods
def add_agreement(self, with_team: int, operation: "Union[Operation, UUID, 'Asset']" = None, expiration_date: str = None, num_uses: int = None, algorithm_security: str = 'smpc', session: Optional[Session] = None)
-
Establish Agreement allowing another team to use this Asset
Args
with_team
:int
- ID of the partner team, or "ANY" to make an Asset available to everyone without explicit permission.
operation
:Operation, UUID
or Asset
- The action being enabled by this Agreement against this Asset. If an Asset is provided, it will be treated as an algorithmic operation applied to this Asset (e.g. allowing a trained model to run against a dataset)
expiration_date
:str
- ISO formatted date on which the Agreement becomes invalid, or None for no expiration.
num_uses
:int
- The number of jobs that can be created under the Agreement before it becomes invalid, or None for no limit.
algorithm_security
:str
- "smpc" or "fed". Specifies the level of algorithm security required to run the operation.
session
:Session
, optional- A connection session. If not specified the default session is used.
Returns
Agreement
- New agreement, None if unable to create
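Because expiration_date is an ISO formatted string, it can be convenient to compute it from a number of days. The sketch below does that with the standard library and passes it to add_agreement() using only the documented parameters. The helpers expiration_in and share_with_team are hypothetical, and whether the API expects a date-only string or a full timestamp is an assumption.

```python
from datetime import date, timedelta
from typing import Optional


def expiration_in(days: int, today: Optional[date] = None) -> str:
    """Return an ISO formatted date the given number of days after today."""
    start = today or date.today()
    return (start + timedelta(days=days)).isoformat()


def share_with_team(asset, team_id: int, days: int = 90, num_uses: int = 10):
    """Create a time-limited, use-limited Agreement on an existing asset object."""
    return asset.add_agreement(
        with_team=team_id,
        expiration_date=expiration_in(days),  # assumption: a date-only ISO string is accepted
        num_uses=num_uses,
        algorithm_security="smpc",
    )


print(expiration_in(30, today=date(2024, 1, 1)))
```

The return value should still be checked, since add_agreement() is documented as returning None when the Agreement cannot be created.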
def archive(self, session: Optional[Session] = None, remote_delete: bool = False) -> bool
-
Remove asset from the Router index (and optionally the Access Point)
Args
session
:Session
, optional- A connection session. If not specified the default session is used.
remote_delete
:bool
, optional- Delete the underlying file from the Access Point? Default is to leave the file on the Access Point's attached storage.
Raises
TripleblindAssetError
- Thrown when unable to archive the asset.
TripleblindAPIError
- Thrown when unable to talk to Router
Returns
bool
- Was asset successfully deleted.
def delete(self, session: Optional[Session] = None) -> bool
-
Deprecated, use
Asset.archive()
instead.
def download(self, save_as: Optional[str] = None, overwrite: Optional[bool] = False, show_progress: Optional[bool] = False, session: Optional[Session] = None) -> str
-
Deprecated. Use
Asset.retrieve()
instead.
def list_agreements(self, session: Optional[Session] = None)
-
List agreements governing your Asset.
Args
session
:Session
, optional- A connection session. If not specified the default session is used.
Returns
list
- List of Agreement objects connected to the Asset.
def publish_to_team(self, to_team: int, session: Optional[Session] = None, algorithm_security: str = 'smpc')
-
Expose the existence of this Asset to a specified team, while still requiring explicit usage approval.
Args
to_team
:int
- ID of the partner team.
session
:Session
, optional- A connection session. If not specified the default session is used.
algorithm_security
:str
, optional- Acceptable security level of algorithms to run with ("fed" or "smpc").
Returns
Agreement
- The new Agreement object
def retrieve(self, save_as: Optional[str] = None, overwrite: Optional[bool] = False, show_progress: Optional[bool] = False, session: Optional[Session] = None) -> str
-
Fetch an asset package and save locally.
NOTE: Asset packages are .zip file format. These files can be easily accessed via tb.Package.load(filename) or the ZipFile default Python library.
Args
save_as
:str
, optional- Filename to save under, None to use default filename in the current directory.
overwrite
:bool
, optional- Should this overwrite an already existing file?
show_progress
:bool
, optional- Display progress bar?
session
:Session
, optional- A connection session. If not specified, the default session is used.
Raises
TripleblindAPIError
- Authentication failure
IOError
- File already exists and no 'overwrite' flag
TripleblindAssetError
- Failed to retrieve
Returns
str
- Absolute path to the saved file
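As the note above says, a retrieved package is a .zip file, so the standard zipfile module can inspect it. The sketch below lists the members of an archive. To stay self-contained it builds a small in-memory archive with made-up member names; in practice the path returned by retrieve() would be passed instead. list_package_members is a hypothetical helper.

```python
import io
import zipfile
from typing import List


def list_package_members(path_or_file) -> List[str]:
    """Return the member names stored in a .zip asset package."""
    with zipfile.ZipFile(path_or_file) as package:
        return package.namelist()


# Stand-in archive; the member names here are illustrative only.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.txt", "placeholder contents")
buffer.seek(0)
print(list_package_members(buffer))
```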
class AzureBlobStorageDataset (uuid: UUID)
-
A table stored in CSV format file inside an Azure Blob Storage Account.
Ancestors
Static methods
def create(storage_account_name: str, storage_key: str, file_system: str, key: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None) -> AzureDataLakeStorageDataset
-
Create a dataset connection to a CSV file in Azure Blob Storage.
This dataset is 'live': any updates made to the file in Azure Blob Storage will be available to this dataset the next time it is used.
Args
storage_account_name
:str
- The Azure storage account to reference.
storage_key
:str
- An access token for reading from the storage account.
file_system
:str
- The file system defined in the Azure control panel for the storage account.
key
:str
- The key associated with the data in Azure Blob Storage.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
unmask_columns
:[str]
, optional- List of column names that will be initially unmasked. Default is to mask all columns.
Returns
AzureBlobStorageDataset
- New asset on the Router, or None on failure
Inherited members
class AzureDataLakeStorageDataset (uuid: UUID)
-
A table stored in CSV format at a given file path inside an Azure Data Lake Instance.
Ancestors
Static methods
def create(storage_account_name: str, storage_key: str, file_system: str, path: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None) -> AzureDataLakeStorageDataset
-
Create a dataset connected to a CSV file within Azure Data Lake Storage.
This dataset is 'live': any updates made to the file in the Data Lake will be available to this dataset the next time it is used.
Args
storage_account_name
:str
- The Azure storage account to reference.
storage_key
:str
- An access token for reading from the storage account.
file_system
:str
- The file system defined in the Azure control panel for the storage account.
path
:str
- The full path to the file within Azure Data Lake Storage.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
unmask_columns
:[str]
, optional- List of column names that will be initially unmasked. Default is to mask all columns.
Returns
AzureDataLakeStorageDataset
- New asset on the Router, or None on failure
Inherited members
class BigQueryDatabase (uuid: UUID)
-
A table asset backed by a view from a BigQuery database.
Ancestors
Static methods
def create(gcp_project: str, bigquery_dataset: str, credentials: Union[str, Path], query: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset
-
Create a connection to a BigQuery database
Args
gcp_project
:str
- The project name of your Google Cloud Project which will be used to cover any query access costs.
bigquery_dataset
:str
- The BigQuery dataset name.
credentials
:str
or Path
- The path of your keyfile.json. See the Google documentation for more details. These credentials will be stored securely on your Access Point; neither TripleBlind nor anyone using your dataset will have access to it.
query
:str
- The SQL query to generate a view on the database.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
unmask_columns
:[str]
, optional- List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql
:bool
, optional- If True (the default) the query syntax is checked for common SQL syntax errors.
Raises
SystemExit
- SQL syntax errors were found in query.
Returns
DatabaseDataset
- New asset on the Router, or None on failure
Inherited members
class CSVDataset (uuid: UUID)
-
A static table asset, typically a CSV text file.
Ancestors
Static methods
def create(datafile: Union[str, Path], name: str, desc: str, header: List[str] = None, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, auto_rename_columns: Optional[bool] = False) -> TabularDataset
-
Place a CSV (comma-separated values) file on your Access Point.
Args
datafile
:str, Path
- Path to the data to place on the API user's associated Access Point.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
header
:List[str]
, optional- A list of names to use as column headers. If None, headers will be detected in first row. If none exist, generic headers will be created for the dataset.
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
unmask_columns
:[str]
, optional- List of column names that will be initially unmasked. Default is to mask all columns.
Returns
TabularDataset
- New asset on the Router, or None on failure
def position(file_handle: Union[str, Path, Package, io.BufferedReader], name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, is_dataset: bool = True, custom_protocol: Optional[Any] = None, metadata: dict = {}, unmask_columns: Optional[List[str]] = None, header: List[str] = None, has_header: Optional[bool] = True, check_col_names: Optional[bool] = True, auto_rename_columns: Optional[bool] = False)
-
Place tabular data on your Access Point for use by yourself or others.
Args
file_handle
:str, Path, Package
or io.BufferedReader
- File handle or path to the data to place on the API user's associated Access Point.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
is_dataset
:bool
, optional- Is this a dataset? (False == algorithm)
custom_protocol
- Internal use
metadata
:dict
- Custom metadata to include in the asset
unmask_columns
:[str]
, optional- When is_dataset=True, list of column names that will be initially unmasked. Default is to mask all columns.
header
:List[str]
, optional- A list of names to use as column headers. If None, headers will be detected in first row. If none exist, generic headers will be created for the dataset.
has_header
:bool, Optional
- Does the dataset have a header row containing column names?
check_col_names
:bool, Optional
- Should column name validation be performed at position time? Enabled by default.
auto_rename_columns
:bool, Optional
- Should invalid column names be altered to be made legal at position time? Disabled by default.
Returns
Asset
- New asset on the Router, or None on failure
Raises
Exception("Error creating asset: Invalid field names.")
def rename_columns(invalid_col_names: list, df: Optional[pd.Dataframe] = None)
-
Adjust column names in dataframe to pass required validations
Tested/corrected name characteristics:
- Name cannot be longer than 64 chars
- Name cannot start with a number
- Name can only contain alphanumeric characters and underscores
Args
invalid_col_names
:list
- Names determined to be invalid
df
:pd.Dataframe
, optional- Dataframe to be corrected
Returns
If dataframe is provided, renames columns in place. If no dataframe is provided, returns list of new column names.
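The three name rules listed above can be checked locally before positioning a file. The following is a minimal sketch, not the library's implementation: is_valid_column_name and suggest_column_name are hypothetical helpers, "alphanumeric" is assumed to mean ASCII letters and digits, and the replacements rename_columns() actually chooses may differ.

```python
import re

_ALLOWED = re.compile(r"[A-Za-z0-9_]+")  # assumption: ASCII letters, digits, underscore
MAX_LENGTH = 64


def is_valid_column_name(name: str) -> bool:
    """Check a name against the three documented rules."""
    return (
        0 < len(name) <= MAX_LENGTH
        and not name[0].isdigit()
        and _ALLOWED.fullmatch(name) is not None
    )


def suggest_column_name(name: str) -> str:
    """Produce one possible legal replacement for an invalid name."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned[:MAX_LENGTH]


for column in ["age", "2nd visit", "total-cost"]:
    print(column, is_valid_column_name(column), suggest_column_name(column))
```

Note that two different invalid names can map to the same suggestion under this scheme, so a real renaming pass would also need to resolve collisions.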
Inherited members
class DatabaseDataset (uuid: UUID)
-
A live table asset backed by a database.
Most implementations of this class use a connection string to define the database, user and credentials to access the data view represented by the asset. Connection strings can be templated using Mustache, allowing secrets to be used in the connection string. For example, a connection string could be defined as:
"mssql+pyodbc://{{secret_username}}:{{secret_password}}@myserver:3306/payroll"
Or secrets could be included in the parameters:
username="{{secret_username}}", password="{{secret_password}}"
if using a create method which accepts those parameters. See "Using Named Secrets" under https://deveval.tripleblind.app/portal/docs/user-guide/asset-owner-operations for more details.
Certain implementations may have helper methods to create the connection string. In those cases, the fields which allow for secrets are documented in their create() method signature.
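To illustrate what the {{name}} placeholders in a templated connection string stand for, here is a small sketch that substitutes values into such a template. It only mimics simple Mustache variable tags with a regular expression; render_template and the secret values are hypothetical, and this is not how the platform itself resolves named secrets.

```python
import re
from typing import Mapping

_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, secrets: Mapping[str, str]) -> str:
    """Replace each {{name}} tag with the matching value; raise KeyError if missing."""
    def substitute(match: "re.Match") -> str:
        return secrets[match.group(1)]

    return _PLACEHOLDER.sub(substitute, template)


template = "mssql+pyodbc://{{secret_username}}:{{secret_password}}@myserver:3306/payroll"
# Made-up values, for illustration only.
example_secrets = {"secret_username": "svc_reader", "secret_password": "example-only"}
print(render_template(template, example_secrets))
```

When creating an asset, it is the template itself that is passed as the connection string, so the real credentials never need to appear in your script.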
Ancestors
Subclasses
- AzureBlobStorageDataset
- AzureDataLakeStorageDataset
- BigQueryDatabase
- DatabricksDatabase
- MSSQLDatabase
- MongoDatabase
- OracleDatabase
- RedshiftDatabase
- SnowflakeDatabase
Static methods
def create(connection: str, query: str, name: str, desc: str, options: Optional[dict] = None, credentials_info: Optional[dict] = None, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset
-
Create a dataset connected to a traditional database.
Unlike other Datasets, a DatabaseDataset is 'live'. Every computation will query the connected database using the given query.
Args
connection
:str
- A supported connection string, such as "snowflake://youruser:yourpassword@account/yourdatabase"
query
:str
- The SQL query to generate a view on the database.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown).
options
:dict
, optional- Connection options to provide to the database connection.
credentials_info
:dict
, optional- Optional dictionary containing the credentials to use for the database connection. This is only necessary for certain databases where the connection string does not contain the credentials.
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
unmask_columns
:[str]
, optional- List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql
:bool
, optional- If True (the default) the query syntax is checked for common SQL syntax errors.
Raises
SystemExit
- SQL syntax errors were found in query.
Returns
DatabaseDataset
- New asset on the Router, or None on failure
Methods
def unmask_columns(self, col_names: Union[str, List[str]], session: Optional[Session] = None) -> bool
-
Unmask columns identified by the supplied list of col_names.
Args
col_names
:List
or str
- Column name or list of names to unmask.
session
:Session
, optional- A connection session. If not specified, the default session is used.
Returns
True if the operation succeeded, otherwise False.
Inherited members
class DatabricksDatabase (uuid: UUID)
-
A table asset backed by a view from a Databricks database.
Ancestors
Static methods
def create(access_token: str, server_hostname: str, http_path: str, catalog: str, schema: str, query: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset
-
Create a connection to a Databricks database
You can find the connection details for your Databricks cluster in the Databricks UI. Under Compute in the sidebar, choose your target cluster. Under the Configuration tab for that cluster, expand Advanced Options and choose the JDBC/ODBC tab, where you will find the needed values. See the Databricks documentation for more details: https://docs.databricks.com/en/integrations/compute-details.html
Args
access_token
:str
- A Databricks access token or a secret name. For example, "dapi1234567890abcdef"
server_hostname
:str
- The Databricks server name or a secret name. For example, "community.cloud.databricks.com"
http_path
:str
- The Databricks HTTP path or a secret name. For example, "/sql/protocolv1/o/1234567890123456/0123-456789-abc123"
catalog
:str
- The Databricks catalog name or a secret name. For example, "default"
schema
:str
- The Databricks schema name or a secret name. For example, "default"
query
:str
- The SQL query to generate a view on the database.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
unmask_columns
:[str]
, optional- List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql
:bool
, optional- If True (the default) the query syntax is checked for common SQL syntax errors.
Raises
SystemExit
- SQL syntax errors were found in query.
Returns
DatabaseDataset
- New asset on the Router, or None on failure
Inherited members
class DatasetAsset (uuid: UUID)
-
An abstract Asset containing a set of data of some form.
Ancestors
Subclasses
Inherited members
class DicomDataset (uuid: UUID)
-
An Asset containing DICOM imaging files.
DICOM is a standard for serializing medical imaging data, such as X-ray, CT, MRI, ultrasound, etc.
Ancestors
Inherited members
class ImageDataset (uuid: UUID)
-
An Asset containing a set of images and optionally labels.
Ancestors
Inherited members
class MSSQLDatabase (uuid: UUID)
-
A table asset backed by a view from a Microsoft SQL database.
Ancestors
Static methods
def create(host: str, port: int, database: str, query: str, name: str, desc: str, username: Optional[str] = None, password: Optional[str] = None, options: Optional[dict] = {}, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset
-
Create a connection to a Microsoft SQL Database.
Args
host
:str
- The host name of the Microsoft SQL database or a secret name. Example: testsqlserver123.database.windows.net
port
:int
- The port number of the Microsoft SQL database.
database
:str
- The name of the Microsoft SQL database to connect to or a secret name. Example: "dev" or "{{secret_database_name}}".
query
:str
- The SQL query to generate a view on the database.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
username
:str
, optional- Username to use in the database connection, like "myuser" or a secret name like "{{secret_username}}".
password
:str
, optional- Password to use in the database connection or a secret name.
options
:dict
, optional- Dictionary of connection options for connecting to the Microsoft SQL database. For supported connection options see https://learn.microsoft.com/en-us/sql/connect/odbc/dsn-connection-string-attribute?view=sql-server-ver16#supported-dsnconnection-string-keywords-and-connection-attributes NOTE: The driver parameter is not required and the connection will use the access point's version of the driver. Example: options={ "authentication": "ActiveDirectoryMsi" }
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
unmask_columns
:[str]
, optional- List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql
:bool
, optional- If True (the default) the query syntax is checked for common SQL syntax errors.
Raises
SystemExit
- SQL syntax errors were found in query.
Returns
DatabaseDataset
- New asset on the Router, or None on failure
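As a rough sketch of how these arguments fit together, the snippet below assembles keyword arguments for create() with secret names standing in for the credentials. The host, secret names, query, port value, and the helper build_mssql_args are illustrative assumptions. The commented call also assumes the class is importable as tripleblind.asset.MSSQLDatabase, as the module hierarchy suggests.

```python
from typing import Any, Dict


def build_mssql_args(name: str, query: str) -> Dict[str, Any]:
    """Assemble keyword arguments matching the create() signature above.

    Every concrete value here (host, secret names, description) is a placeholder.
    """
    return {
        "host": "testsqlserver123.database.windows.net",
        "port": 1433,  # assumption: the conventional SQL Server port
        "database": "{{secret_database_name}}",
        "query": query,
        "name": name,
        "desc": "Demo view over the orders table",
        "username": "{{secret_username}}",
        "password": "{{secret_password}}",
        "k_grouping": 5,
        "unmask_columns": ["region"],
    }


args = build_mssql_args("Orders by region", "SELECT region, total FROM orders")

# With the SDK installed, the call would look roughly like this (not executed here):
#   from tripleblind.asset import MSSQLDatabase
#   asset = MSSQLDatabase.create(**args)
print(sorted(args))
```

Keeping the arguments in a dictionary makes it easy to review exactly what will be sent before the asset is registered.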
Inherited members
class MongoDatabase (uuid: UUID)
-
A table asset backed by a view from a Mongo database.
Ancestors
Static methods
def create(connection_str: str, query: dict, database: str, collection: str, name: str, desc: str, projection: dict = None, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, limit: Optional[int] = None, sort: Optional[List[Tuple]] = None)
-
Create a connection to a MongoDB database
Args
connection_str
:str
- A MongoDB connection URI, such as "mongodb://user:password@mongo.host:27017/". Secrets can be included in the connection string using Mustache templating of named secrets, e.g. "mongodb://{{MY_USER}}:{{MY_PWD}}@mongo.host:{{MY_PORT}}/". Any portion of the connection string can be templated, including the host, username, password, and database.
query
:dict
- A MongoDB JSON query to generate a view on the database.
database
:str
- Name of the MongoDB database.
collection
:str
- Collection inside of MongoDB database to query
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
projection
:dict
, optional- MongoDB projection to manipulate output structure. Default is None.
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
unmask_columns
:[str]
, optional- List of column names that will be initially unmasked. Default is to mask all columns.
limit
:int
, optional- Maximum number of documents returned by the query.
sort
:[Tuple]
, optional- List of tuples representing how the query results should be sorted. For more details, see https://www.mongodb.com/docs/manual/reference/method/cursor.sort/#std-label-sort-asc-desc.
Returns
Asset
- New asset on the Router, or None on failure
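The following sketch illustrates two points from the argument list above: how Mustache-style placeholders in a connection string map to named secrets, and what the query, projection, sort, and limit arguments might look like. The render_template helper and the FAKE_SECRETS dictionary are hypothetical and exist only for illustration, since real secrets are resolved by the platform rather than in client code. The field names are made up, and the (key, direction) tuple shape for sort is assumed from the MongoDB documentation linked above.

```python
import re
from typing import Dict

# Hypothetical stand-in for the secret store, for illustration only.
FAKE_SECRETS: Dict[str, str] = {
    "MY_USER": "reader",
    "MY_PWD": "s3cret",
    "MY_PORT": "27017",
}


def render_template(template: str, secrets: Dict[str, str]) -> str:
    """Replace each {{NAME}} placeholder with its value from secrets."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: secrets[m.group(1)], template)


connection_str = "mongodb://{{MY_USER}}:{{MY_PWD}}@mongo.host:{{MY_PORT}}/"
resolved = render_template(connection_str, FAKE_SECRETS)

# Arguments shaped like those documented above.
query = {"age": {"$gte": 18}}                 # filter: age of 18 or more
projection = {"_id": 0, "age": 1, "city": 1}  # keep two fields, drop _id
sort = [("age", 1), ("city", -1)]             # age ascending, then city descending
limit = 100

print(resolved)
```

Only the templated connection_str would be passed to create(); the resolved string is shown here just to make the substitution visible.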
Inherited members
class NeuralNetwork (uuid: UUID)
-
A neural network that takes one or more Datasets as input
Ancestors
Static methods
def create(model: Union[str, Path], name: str, desc: str, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None) -> Asset
-
Place a pretrained Neural Network file on your Access Point.
Args
model
:str, Path
- Path to the neural network. This accepts a Keras .h5 model file.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
Returns
Asset
- New asset on the Router, or None on failure
Inherited members
class NumPyDataset (uuid: UUID)
-
A generic n-dimensional array. Used to represent arbitrary data.
Ancestors
Static methods
def create()
Inherited members
class OracleDatabase (uuid: UUID)
-
A table asset backed by a view from an Oracle database.
Ancestors
Static methods
def create(host: str, port: int, database: str, query: str, name: str, desc: str, username: Optional[str] = None, password: Optional[str] = None, options: Optional[dict] = {}, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset
-
Create a connection to an Oracle Database.
Args
host
:str
- The host name of the Oracle database or a secret name. Example: testoracle123.database.net
port
:int
- The port number of the Oracle database. The port for most Oracle databases is 1521.
database
:str
- The name of the Oracle database to connect to or a secret name. Example: "dev" or "{{secret_database_name}}".
query
:str
- The SQL query to generate a view on the database.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
username
:str
, optional- Username to use in the database connection, like "myuser" or a secret name like "{{secret_username}}".
password
:str
, optional- Password to use in the database connection or a secret name.
options
:dict
, optional- Dictionary of connection options for connecting to the Oracle database. For supported connection options see https://www.oracle.com/database/technologies/appdev/python/quickstartpython.html#connect-python-cx_oracle-connecting
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
unmask_columns
:[str]
, optional- List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql
:bool
, optional- If True (the default) the query syntax is checked for common SQL syntax errors.
Raises
SystemExit
- SQL syntax errors were found in query.
Returns
DatabaseDataset
- New asset on the Router, or None on failure
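Because SQL validation failures are documented as raising SystemExit, a plain `except Exception` handler will not catch them. The sketch below shows one way to guard a call. Here create_stub is a hypothetical stand-in that only mimics the documented failure mode, and its "validation" is deliberately trivial.

```python
from typing import Optional


def create_stub(query: str, validate_sql: bool = True) -> str:
    """Hypothetical stand-in for create(); mimics the documented SystemExit."""
    if validate_sql and "SELECT" not in query.upper():
        raise SystemExit("SQL syntax errors were found in query.")
    return "asset-created"


def safe_create(query: str) -> Optional[str]:
    """Return the asset, or None if validation aborted with SystemExit."""
    try:
        return create_stub(query)
    except SystemExit as exc:
        # SystemExit derives from BaseException, so `except Exception` misses it.
        print(f"validation failed: {exc}")
        return None


ok = safe_create("SELECT id FROM patients")
bad = safe_create("SELEC id FROM patients")
```

Catching SystemExit explicitly can be useful in notebooks or long-running services, where an unhandled exit would end the whole process.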
Inherited members
class PMMLRegression (uuid: UUID)
-
A Predictive Model Markup Language (PMML) model that takes one or more Datasets as input
Ancestors
Static methods
def create(model: Union[str, Path], name: str, desc: str, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None) -> Asset
-
Place a PMML Regression model file on your Access Point.
Args
model
:str, Path
- Path to the PMML model file.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
Returns
Asset
- New asset on the Router, or None on failure
Inherited members
class PMMLTree (uuid: UUID)
-
Creates asset for PMML Tree models.
Ancestors
Static methods
def create(model: Union[str, Path], name: str, desc: str, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None) -> Asset
-
Place a PMML Tree model file on your Access Point.
Args
model
:str, Path
- Path to the PMML model file.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
Returns
Asset
- New asset on the Router, or None on failure
Inherited members
class RedshiftDatabase (uuid: UUID)
-
A table asset backed by a view from an AWS Redshift database.
Ancestors
Static methods
def create(host: str, port: int, database: str, query: str, name: str, desc: str, username: Optional[str] = None, password: Optional[str] = None, options: Optional[dict] = {}, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset
-
Create a connection to a Redshift database
Args
host
:str
- The host name of the Redshift database or a secret name. Example: default.528.us-east-2.redshift-serverless.amazonaws.com
port
:int
- The port number of the Redshift database.
database
:str
- The name of the Redshift database to connect to or a secret name. Example: "dev" or "{{secret_database_name}}".
query
:str
- The SQL query to generate a view on the database.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
username
:str
, optional- Username to use in the database connection, like "myuser" or a secret name like "{{secret_username}}".
password
:str
, optional- Password to use in the database connection or a secret name.
options
:dict
, optional- Dictionary of connection options for connecting to the Redshift database. Supported options are described at https://docs.aws.amazon.com/redshift/latest/mgmt/python-configuration-options.html. Example using IAM keys: options={ "iam": True, "access_key_id": "AKFCXNRSVRCFGMRQCAQR", "secret_access_key": "bEGzX7QnOb7eK9CRt4CV97n4e/bKOtQUFd9/pgIc" }
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
unmask_columns
:[str]
, optional- List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql
:bool
, optional- If True (the default) the query syntax is checked for common SQL syntax errors.
Raises
SystemExit
- SQL syntax errors were found in query.
Returns
DatabaseDataset
- New asset on the Router, or None on failure
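One plausible reading of the k_grouping argument above is a minimum group size: values shared by fewer than k_grouping records are left out of any report. The sketch below illustrates that idea in plain Python. It is an assumption about the behavior, not the platform's actual enforcement logic, and grouped_counts is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable


def grouped_counts(values: Iterable[str], k_grouping: int = 5) -> Dict[str, int]:
    """Count like values and keep only groups with at least k_grouping records."""
    counts = Counter(values)
    return {value: n for value, n in counts.items() if n >= k_grouping}


diagnoses = ["flu"] * 7 + ["cold"] * 5 + ["rare-condition"] * 2
report = grouped_counts(diagnoses, k_grouping=5)
print(report)  # {'flu': 7, 'cold': 5}
```

With the default of 5, the two "rare-condition" records are suppressed because a group that small could help identify individuals.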
Inherited members
class S3Dataset (uuid: UUID)
-
A table asset stored in an Amazon Web Services S3 bucket.
Ancestors
Static methods
def create(bucket_name: str, region: str, object_name: str, aws_access_key_id: str, aws_secret_access_key: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None) -> TabularDataset
-
Creates an asset from an AWS S3 bucket.
Args
bucket_name
:str
- The name of the AWS S3 bucket that contains the file.
region
:str
- The AWS region containing this bucket (e.g. "us-east-1")
object_name
:str
- The name of the file in the S3 bucket
aws_access_key_id
:str
- This account's AWS Access Key ID
aws_secret_access_key
:str
- This account's AWS Secret Access Key
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
unmask_columns
:[str]
, optional- List of column names that will be initially unmasked. Default is to mask all columns.
Returns
TabularDataset
- New asset on the Router, or None on failure
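Since create() takes the AWS credentials as plain string arguments, it may be preferable to read them from the environment rather than hard-code them in a script. The sketch below builds the keyword arguments that way. The helper s3_args_from_env and the generated name and description are illustrative assumptions; AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the conventional AWS environment variable names.

```python
import os
from typing import Dict


def s3_args_from_env(bucket_name: str, object_name: str, region: str) -> Dict[str, object]:
    """Build create() keyword arguments, reading credentials from the environment."""
    return {
        "bucket_name": bucket_name,
        "region": region,
        "object_name": object_name,
        # A KeyError here means the variable is not set, so the script fails fast.
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
        "name": f"{object_name} from {bucket_name}",
        "desc": "Tabular data stored in S3",
    }
```

The resulting dictionary could then be unpacked into the call, for example `S3Dataset.create(**s3_args_from_env("my-bucket", "data.csv", "us-east-1"))`, assuming the class is imported from tripleblind.asset.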
Inherited members
class SnowflakeDatabase (uuid: UUID)
-
A table asset backed by a view from a Snowflake database.
Ancestors
Static methods
def create(snowflake_username: str, snowflake_password: str, snowflake_account: str, snowflake_warehouse: str, snowflake_database: str, snowflake_schema: str, role: str, query: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True) -> DatabaseDataset
-
Create a connection to a Snowflake database
Args
snowflake_username
:str
- Your Snowflake username, like "myuser" or a secret name like "{{secret_username}}".
snowflake_password
:str
- Your Snowflake password or a secret name like "{{secret_password}}".
snowflake_account
:str
- Your Snowflake account or a secret name. This is the start of the URL when you visit your console. For example, if the URL is https://ab12345.us-central1.gcp.snowflakecomputing.com/ then your snowflake_account is "ab12345.us-central1.gcp".
snowflake_warehouse
:str
- The name of the Snowflake warehouse you are connecting to for the query or a secret name.
snowflake_database
:str
- The name of the Snowflake database you are connecting to for the query or a secret name.
snowflake_schema
:str
- The name of the Snowflake schema you are connecting to for the query or a secret name.
role
:str
- The role of the Snowflake user you are using to connect to the Snowflake database or a secret name.
query
:str
- The SQL query to generate a view on the database.
name
:str
- Name of the new asset.
desc
:str
- Description of the new asset (can include markdown)
is_discoverable
:bool
, optional- Should this asset be listed in the Router index to be found and used by others?
k_grouping
:int
, optional- The minimum count of records with like values required for reporting.
allow_overwrite
:bool
, optional- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session
:Session
, optional- A connection session. If not specified, the default session is used.
unmask_columns
:[str]
, optional- List of column names that will be initially unmasked. Default is to mask all columns.
validate_sql
:bool
, optional- If True (the default) the query syntax is checked for common SQL syntax errors.
Raises
SystemExit
- SQL syntax errors were found in query.
Returns
DatabaseDataset
- New asset on the Router, or None on failure
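The snowflake_account rule described above (take the start of the console URL) can be sketched with the standard library. The helper name account_from_url is hypothetical, and the sketch assumes the console host always ends in ".snowflakecomputing.com", as in the documented example.

```python
from urllib.parse import urlparse

SNOWFLAKE_SUFFIX = ".snowflakecomputing.com"


def account_from_url(console_url: str) -> str:
    """Extract the account identifier from a Snowflake console URL."""
    host = urlparse(console_url).hostname or ""
    if not host.endswith(SNOWFLAKE_SUFFIX):
        raise ValueError(f"not a Snowflake console URL: {console_url!r}")
    # Drop the shared suffix, leaving only the account portion of the host.
    return host[: -len(SNOWFLAKE_SUFFIX)]


account = account_from_url("https://ab12345.us-central1.gcp.snowflakecomputing.com/")
print(account)  # ab12345.us-central1.gcp
```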
Inherited members
class TabularDataset (uuid: UUID)
-
An abstraction for data stored in rows and columns, like a spreadsheet.
Each column is a field and each row is a single record with one value for each column.
Ancestors
Subclasses
Inherited members
class XGBoostModel (uuid: UUID)
-
A Scikit-learn XGBoost model
Ancestors
Static methods
def train(training_data: Union[Asset, List[Asset]], datatype: str, target_var: str, variables: Union[str, List[str]], custom_preprocessor: "'TabularPreprocessor'" = None, silent: bool = False, is_regression: bool = False, job_name: Optional[str] = None, session: Optional[Session] = None) -> XGBoostModel
-
Create an XGBoost model trained on the provided dataset(s).
Args
training_data
:Union[Asset, List[Asset]]
- Table(s) to use as training data. Can be Assets or paths to local files.
datatype
:str
- The format of the data, using numpy dtypes, e.g. "float32", "int64", etc. Ignored if custom_preprocessor is provided.
target_var
:str
- The name of the column containing the training target value. Ignored if custom_preprocessor is provided.
variables
:str, List[str]
- A list of column names containing variables to include in training, or "ALL" for every column. Ignored if custom_preprocessor is provided.
custom_preprocessor
:TabularPreprocessor
, optional- A custom preprocessor, overriding the standard one built from datatype, target_var and variables.
silent
:bool
, optional- Suppress status messages during execution? Default is to show messages.
is_regression
:bool
, optional- Is this a regression? Otherwise a classification model is built.
job_name
:Optional[str]
, optional- The name associated with the job. Default name is "XGBoost Training".
session
:Optional[Session]
, optional- A connection session. If not specified, the default session is used.
Raises
TripleblindTrainingError
- XGBoost Model training failed
Returns
XGBoostModel
- The trained model, or None if training fails.
Methods
def predict(self, inference_data: Union[Asset, List[Asset]], use_smpc: bool, custom_preprocessor: "'TabularPreprocessor'" = None, silent: bool = False, job_name: Optional[str] = None, session: Optional[Session] = None)
-
Perform an inference against a trained XGBoost model.
Output is the most likely classification. See predict_proba() if you want information on the likelihood of the classification.
Args
inference_data
:Union[Asset, List[Asset]]
- Table(s) to use as inference data. Can be Assets or paths to local files.
use_smpc
:bool
- Flag to indicate whether the user wants to use SMPC or FED. If True, SMPC is used. If False, FED is used.
custom_preprocessor
:TabularPreprocessor
, optional- A custom preprocessor.
silent
:bool
, optional- Suppress status messages during execution? Default is to show messages.
job_name
:Optional[str]
, optional- The name associated with the job. Default name is "XGBoost Inference".
session
:Optional[Session]
, optional- A connection session. If not specified, the default session is used.
Raises
TripleblindProcessError
- XGBoost Remote Inference failed
TripleblindProcessError
- Unable to create XGBoost Inference job
Returns
pd.DataFrame
- A pandas dataframe.
def predict_proba(self, inference_data: Union[Asset, List[Asset]], use_smpc: bool, custom_preprocessor: "'TabularPreprocessor'" = None, silent: bool = False, job_name: Optional[str] = None, session: Optional[Session] = None)
-
Perform a predict_proba inference against a trained XGBoost model.
Output of predict_proba() is the probability of each classification. For example, a model that classifies into three categories might return [0.2, 0.1, 0.7], meaning the first two classes are 20% and 10% likely for the given input, and the last class is 70% likely.
See predict() if you simply want the most likely classification.
Args
inference_data
:Union[Asset, List[Asset]]
- Table(s) to use as inference data. Can be Assets or paths to local files.
use_smpc
:bool
- Flag to indicate whether the user wants to use SMPC or FED. If True, SMPC is used. If False, FED is used.
custom_preprocessor
:TabularPreprocessor
, optional- A custom preprocessor.
silent
:bool
, optional- Suppress status messages during execution? Default is to show messages.
job_name
:Optional[str]
, optional- The name associated with the job. Default name is "XGBoost Inference".
session
:Optional[Session]
, optional- A connection session. If not specified, the default session is used.
Raises
TripleblindProcessError
- XGBoost Remote Inference failed
TripleblindProcessError
- Unable to create XGBoost Inference job
Returns
pd.DataFrame
- A pandas dataframe.
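To make the relationship between predict() and predict_proba() concrete, the sketch below picks the most likely class from rows of per-class probabilities, which is conceptually what predict() reports. Plain lists stand in for the returned DataFrame, whose exact column layout is not described here, and most_likely_class is a hypothetical helper.

```python
from typing import List, Sequence


def most_likely_class(probabilities: Sequence[float]) -> int:
    """Return the index of the highest probability in one row."""
    return max(range(len(probabilities)), key=lambda i: probabilities[i])


# Each row holds the per-class probabilities for one input record.
rows: List[List[float]] = [[0.2, 0.1, 0.7], [0.6, 0.3, 0.1]]
labels = [most_likely_class(row) for row in rows]
print(labels)  # [2, 0]
```

For the documented example row [0.2, 0.1, 0.7], the third class (index 2) is selected because its probability of 70% is the highest.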
Inherited members