Module tripleblind.asset

Assets are the primary and most valuable objects stored on the Router.

An asset represents either an Algorithm or Data. Every asset has an owner who controls access to it, and every asset has a price associated with its utilization.

To assist in working with different assets, there is a hierarchy of helper classes:

Asset
    DatasetAsset
        DicomDataset
        ImageDataset
        NumPyDataset
        TabularDataset
            CSVDataset
            DatabaseDataset
                AzureDataLakeStorageDataset
                BigQueryDatabase
                DatabricksDatabase
                MongoDatabase
                MSSQLDatabase
                OracleDatabase
                RedshiftDatabase
                SnowflakeDatabase
            S3Dataset

    AlgorithmAsset
        NeuralNetwork
        PMMLRegression
        ReportAsset
            DatabaseReport
                BigQueryReport
                DatabricksReport
                RedshiftReport
                MSSQLReport
                OracleReport
                SnowflakeReport
        XGBoostModel

    ModelAsset
        Regression
            RegressionModel
            PSIVerticalRegressionModel

    ModelTrainerAsset

Most of these have a create() method to assist you in building assets easily and properly. See the specific class for more details.
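Because the helpers form an ordinary Python class hierarchy, generic code can accept a base type and branch on the most specific helper. A minimal sketch using stand-in classes (not the real `tripleblind.asset` types) to illustrate the pattern:

```python
# Stand-in classes mirroring a slice of the documented hierarchy.
# Illustrative only -- the real types live in tripleblind.asset.
class Asset:
    def __init__(self, uuid):
        self.uuid = uuid

class DatasetAsset(Asset): ...
class TabularDataset(DatasetAsset): ...
class CSVDataset(TabularDataset): ...

def describe(asset: Asset) -> str:
    """Generic code can branch on the most specific helper type."""
    if isinstance(asset, TabularDataset):
        return "tabular data"
    if isinstance(asset, DatasetAsset):
        return "dataset"
    return "asset"

csv = CSVDataset(uuid="0000-demo")
print(describe(csv))  # → "tabular data"; a CSVDataset is also a TabularDataset
```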

Global variables

var CNN

The built-in CNN Operation

var DISTRIBUTED_INFERENCE

The built-in Distributed Inference Operation

var FEDERATED_LEARNING_PROTOCOL

The built-in Federated Learning Operation

var NAMESPACE_DEFAULT_USER

Special UUID which represents the current user

var PSI_VERTICAL_PARTITION_TRAINING

The built-in PSI Vertical Partition Training Operation

var PYTORCH_INFERENCE

The built-in PyTorch Inference Operation

var RANDOM_FOREST_INFERENCE

The built-in Random Forest Inference Operation

var REGRESSION_INFERENCE

The built-in Regression Inference Operation

var ROI_DETECTOR_INFERENCE

The built-in Region of Interest Detector Inference Operation

var SKLEARN_INFERENCE

The built-in Scikit-learn Inference Operation

var SPLIT_LEARNING_TRAINING

The built-in Split Learning Training Operation

var VERTICAL_PARTITION_SPLIT_TRAINING

The built-in Vertical Partition Split Training Operation

var XGBOOST_INFERENCE_FED

The built-in XGBoost Inference with FED security Operation

var XGBOOST_INFERENCE_SMPC

The built-in XGBoost Inference with SMPC security Operation

Functions

def create_frame(data, opcode, fin=1)

Monkey patch websocket-client library to skip masking.

https://github.com/websocket-client/websocket-client/blob/df275d351f9887fba2774e2e1aa79ff1e5a24bd1/websocket/_abnf.py#L194

Masking is a protocol-level security feature that is redundant over TLS connections. This is monkey patched because the gevent-websocket server-side library is extremely slow at unmasking, causing roughly a 4x slowdown. See: https://stackoverflow.com/a/32290330/2395133

Classes

class AlgorithmAsset (uuid: UUID)

An abstract Asset used to perform a calculation.

This could be a trained neural network, a prebuilt protocol, or a Python or SQL script to be executed against a DatasetAsset.

Ancestors

Subclasses

Inherited members

class Asset (uuid: UUID)

Points to a dataset or an algorithm indexed on the TripleBlind Router.

Subclasses

Class variables

var uuid : uuid.UUID

Identifier for this asset.

Static methods

def find(search: Optional[Union[str, re.Pattern]], namespace: Optional[UUID] = None, owned: Optional[bool] = False, owned_by: Optional[int] = None, dataset: Optional[bool] = None, algorithm: Optional[bool] = None, session: Optional[Session] = None, exact_match: Optional[bool] = True) -> Asset

Search the Router index for an asset matching the given search criteria.

Args

search : str or re.Pattern, optional
The search pattern applied to asset names. A simple string will be used as a substring search if exact_match is False, otherwise it will only return exact matches.
namespace : UUID, optional
The UUID of the user to which this asset belongs. None indicates any user, NAMESPACE_DEFAULT_USER indicates the current API user.
owned : bool, optional
Only return owned assets (either personally or by the current user's team)
owned_by : int, optional
Only return owned assets owned by the given team ID
dataset : bool, optional
Set to True to search for datasets. Default is to search for both data and algorithms.
algorithm : bool, optional
Set to True to search for algorithms. Default is to search for both data and algorithms.
session : Session, optional
A connection session. If not specified, the default session is used.
exact_match : bool, optional
When the 'search' is a string, setting this to True will perform an exact match. Ignored for regex patterns, defaults to True.

Raises

TripleblindAssetError
Thrown when multiple assets are found which match the search.

Returns

Asset
A single asset, or None if no match found
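The interaction of search and exact_match described above can be summarized in plain Python. This is a sketch of the documented matching rules only, not the Router's actual implementation:

```python
import re
from typing import Union

def name_matches(name: str, search: Union[str, re.Pattern],
                 exact_match: bool = True) -> bool:
    # Regex patterns are always applied as patterns; exact_match is ignored.
    if isinstance(search, re.Pattern):
        return search.search(name) is not None
    # Plain strings: exact comparison by default, substring when exact_match=False.
    if exact_match:
        return name == search
    return search in name

assert name_matches("hospital_data_2023", "hospital_data_2023")           # exact hit
assert not name_matches("hospital_data_2023", "hospital")                 # exact by default
assert name_matches("hospital_data_2023", "hospital", exact_match=False)  # substring
assert name_matches("hospital_data_2023", re.compile(r"data_\d{4}"))      # regex
```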
def find_all(search: Optional[Union[str, re.Pattern]], namespace: Optional[UUID] = None, owned: Optional[bool] = False, owned_by: Optional[int] = None, dataset: Optional[bool] = None, algorithm: Optional[bool] = None, max: Optional[int] = 500, session: Optional[Session] = None) -> List[Asset]

Search the Router index for assets matching the given search criteria.

Args

search : Optional[Union[str, re.Pattern]]
Either an asset ID or a search pattern applied to asset names and descriptions. A simple string will match substrings, or a regular expression can be passed for complex searches.
namespace : UUID, optional
The UUID of the user to which this asset belongs. None indicates any user, NAMESPACE_DEFAULT_USER indicates the current API user.
owned : bool, optional
Only return owned assets (either personally or by the current user's team)
owned_by : Optional[int], optional
Only return owned assets owned by the given team ID
dataset : Optional[bool], optional
Set to True to search for datasets. Default is to search for both data and algorithms.
algorithm : Optional[bool], optional
Set to True to search for algorithms. Default is to search for both data and algorithms.
max : Optional[int], optional
Maximum number of results to return. If not specified, defaults to 500.
session : Session, optional
A connection session. If not specified, the default session is used.

Returns

List[Asset]
A list of found assets, or None if no match found
def position(file_handle: Union[str, Path, Package, io.BufferedReader], name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, is_dataset: bool = True, custom_protocol: Optional[Any] = None, metadata: dict = {}, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True, asset_type: Optional[str] = None)

Place data on your Access Point for use by yourself or others.

If the file is a .csv dataset, the first row is assumed to be a header containing column names. Column validation will be performed. Use CSVDataset.position() for more options, including auto-renaming.

Args

file_handle : str, Path, Package or io.BufferedReader
File handle or path to the data to place on the API user's associated Access Point.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
is_dataset : bool, optional
Is this a dataset? (False == algorithm)
custom_protocol
Internal use
metadata : dict
Custom metadata to include in the asset
unmask_columns : [str], optional
When is_dataset=True, list of column names that will be initially unmasked. Default is to mask all columns.
validate_sql : bool, optional
If True (the default) the query syntax is checked for common SQL syntax errors.
asset_type : str, optional
Type of asset to be positioned. Default is 'dataset'. Other options: 'algorithm' or 'report'.

Raises

SystemExit
SQL syntax errors were found in query.

Returns

Asset
New asset on the Router, or None on failure
def upload(file_handle: io.BufferedReader, name: str, desc: str, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, is_dataset: bool = True, custom_protocol: Optional[Any] = None)

Deprecated, use Asset.position() instead.

Instance variables

var accesspoint_filename : str

str: Disk filename on the Access Point which holds this asset.

var activate_date : dt.datetime

datetime: Date when this asset became active

var deactivate_date : dt.datetime

datetime: Date when this asset was archived (deleted)

var desc : str

str: Longer description of the asset

var filename : str

str: Filename associated with the asset

var hash : str

str: Hash that was registered at the router when this asset was positioned.

var is_active : bool

bool: Is this an active asset?

var is_discoverable : bool

bool: True if anyone can discover the asset on the Router

var is_valid : bool

Verify that this Asset object points to a valid asset on the Router.

Returns

bool
True if this is a valid Asset on the Router
var k_grouping : Optional[int]

int: The minimum count of records with like values required for reporting.

var metadata : dict

dict: Asset metadata (e.g. 'cols' on some datasets)

var name : str

str: Simple short name of the asset

var namespace : UUID

UUID: Namespace which contains the asset. Each user has a namespace, so assets generally exist under the namespace of their creator. Think of this as a personal folder within your organization that holds your assets.

var team : str

str: Name of the team that owns the asset

var team_id : str

str: ID of the team that owns the asset

Methods

def add_agreement(self, with_team: int, operation: "Union[Operation, UUID, 'Asset']" = None, expiration_date: str = None, num_uses: int = None, algorithm_security: str = 'smpc', session: Optional[Session] = None)

Establish Agreement allowing another team to use this Asset

Args

with_team : int
ID of the partner team, or "ANY" to make an Asset available to everyone without explicit permission.
operation : Operation, UUID or Asset
The action being enabled by this Agreement against this Asset. If an Asset is provided, it will be treated as an algorithmic operation applied to this Asset (e.g. allowing a trained model to run against a dataset)
expiration_date : str
ISO formatted date on which the Agreement becomes invalid, or None for no expiration.
num_uses : int
The number of jobs that can be created under the Agreement before it becomes invalid, or None for no limit.
algorithm_security : str
"smpc" or "fed". Specifies the level of algorithm security required to run the operation.
session : Session, optional
A connection session. If not specified the default session is used.

Returns

Agreement
New agreement, None if unable to create
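The expiration_date argument expects an ISO formatted date string, which the standard datetime module can produce. The commented call is a hypothetical usage; the team ID and asset are placeholders, not real values:

```python
from datetime import date, timedelta

# Build an ISO formatted date 90 days from today for use as expiration_date.
expiration = (date.today() + timedelta(days=90)).isoformat()
print(expiration)  # e.g. "2024-09-30"

# Hypothetical usage against an already-found asset (placeholders only):
# agreement = asset.add_agreement(with_team=42,
#                                 expiration_date=expiration,
#                                 num_uses=10,
#                                 algorithm_security="smpc")
```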
def archive(self, session: Optional[Session] = None, remote_delete: bool = False) -> bool

Remove asset from the Router index (and optionally the Access Point)

Args

session : Session, optional
A connection session. If not specified the default session is used.
remote_delete : bool, optional
Delete the underlying file from the Access Point? Default is to leave the file on the Access Point's attached storage.

Raises

TripleblindAssetError
Thrown when unable to archive the asset.
TripleblindAPIError
Thrown when unable to talk to Router

Returns

bool
True if the asset was successfully archived.
def delete(self, session: Optional[Session] = None) -> bool

Deprecated, use Asset.archive() instead.

def download(self, save_as: Optional[str] = None, overwrite: Optional[bool] = False, show_progress: Optional[bool] = False, session: Optional[Session] = None) -> str

Deprecated. Use Asset.retrieve() instead

def list_agreements(self, session: Optional[Session] = None)

List agreements governing your Asset.

Args

session : Session, optional
A connection session. If not specified the default session is used.

Returns

list
List of Agreement objects connected to the Asset.
def publish_to_team(self, to_team: int, session: Optional[Session] = None, algorithm_security: str = 'smpc')

Expose the existence of this Asset to a specified team, while still requiring explicit usage approval.

Args

to_team : int
ID of the partner team.
session : Session, optional
A connection session. If not specified the default session is used.
algorithm_security : str, optional
Acceptable security level of algorithms to run with ("fed" or "smpc").

Returns

Agreement
The new Agreement object
def retrieve(self, save_as: Optional[str] = None, overwrite: Optional[bool] = False, show_progress: Optional[bool] = False, session: Optional[Session] = None) -> str

Fetch an asset package and save it locally.

NOTE: Asset packages use the .zip file format. They can be accessed via tb.Package.load(filename) or Python's standard zipfile library.

Args

save_as : str, optional
Filename to save under, None to use default filename in the current directory.
overwrite : bool, optional
Should this overwrite an already existing file?
show_progress : bool, optional
Display progress bar?
session : Session, optional
A connection session. If not specified, the default session is used.

Raises

TripleblindAPIError
Authentication failure
IOError
File already exists and no 'overwrite' flag
TripleblindAssetError
Failed to retrieve

Returns

str
Absolute path to the saved file
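Since asset packages are ordinary .zip files, a retrieved package can be inspected with Python's standard zipfile module. Here an in-memory placeholder zip stands in for a file saved by retrieve(); the member names are invented for illustration:

```python
import io
import zipfile

# Placeholder package standing in for a file saved by Asset.retrieve().
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("manifest.json", '{"name": "example-asset"}')
    zf.writestr("data.csv", "id,value\n1,3.14\n")

# Inspect the package contents just as you would a downloaded one.
with zipfile.ZipFile(buf) as zf:
    names = zf.namelist()
    manifest = zf.read("manifest.json").decode()

print(names)     # → ['manifest.json', 'data.csv']
print(manifest)  # → {"name": "example-asset"}
```

With a real package, replace the in-memory buffer with the path returned by retrieve().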
class AzureBlobStorageDataset (uuid: UUID)

A table stored as a CSV format file inside an Azure Blob Storage account.

Ancestors

Static methods

def create(storage_account_name: str, storage_key: str, file_system: str, key: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None) -> AzureBlobStorageDataset

Create a dataset connection to a CSV file in Azure Blob Storage.

This dataset is 'live': any updates made to the file in Azure Blob Storage will be reflected the next time the dataset is used.

Args

storage_account_name : str
The Azure storage account to reference.
storage_key : str
An access token for reading from the storage account.
file_system : str
The file system defined in the Azure control panel for the storage account.
key : str
The key associated with the data in Azure Blob Storage.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.

Returns

AzureBlobStorageDataset
New asset on the Router, or None on failure

Inherited members

class AzureDataLakeStorageDataset (uuid: UUID)

A table stored in CSV format at a given file path inside an Azure Data Lake Instance.

Ancestors

Static methods

def create(storage_account_name: str, storage_key: str, file_system: str, path: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None) -> AzureDataLakeStorageDataset

Create a dataset connected to a CSV file within Azure Data Lake Storage.

This dataset is 'live': any updates made to the file in the Data Lake will be reflected the next time the dataset is used.

Args

storage_account_name : str
The Azure storage account to reference.
storage_key : str
An access token for reading from the storage account.
file_system : str
The file system defined in the Azure control panel for the storage account.
path : str
The full path to the file within Azure Data Lake Storage.
name : str
Name of the new asset.
desc : str
Description of the new asset (can include markdown)
is_discoverable : bool, optional
Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
A connection session. If not specified, the default session is used.
unmask_columns : [str], optional
List of column names that will be initially unmasked. Default is to mask all columns.

Returns

AzureDataLakeStorageDataset
New asset on the Router, or None on failure

Inherited members