Module tripleblind.asset
Assets are the primary and most valuable objects stored on the Router.
An asset represents either an Algorithm or Data. Every asset has an owner who controls access to it, and every asset has a price associated with its utilization.
To assist in working with different assets, there is a hierarchy of helper classes:
Asset
DatasetAsset
DicomDataset
ImageDataset
NumPyDataset
TabularDataset
CSVDataset
DatabaseDataset
AzureDataLakeStorageDataset
BigQueryDatabase
DatabricksDatabase
MongoDatabase
MSSQLDatabase
OracleDatabase
RedshiftDatabase
SnowflakeDatabase
S3Dataset
AlgorithmAsset
NeuralNetwork
PMMLRegression
ReportAsset
DatabaseReport
BigQueryReport
DatabricksReport
RedshiftReport
MSSQLReport
OracleReport
SnowflakeReport
XGBoostModel
ModelAsset
Regression
RegressionModel
PSIVerticalRegressionModel
ModelTrainerAsset
Most of these have a create() method to assist you in building assets easily and properly. See the specific class for more details.
Global variables
var CNN
-
The built-in CNN Operation
var DISTRIBUTED_INFERENCE
-
The built-in Distributed Inference Operation
var FEDERATED_LEARNING_PROTOCOL
-
The built-in Federated Learning Operation
var NAMESPACE_DEFAULT_USER
-
Special UUID which represents the current user
var PSI_VERTICAL_PARTITION_TRAINING
-
The built-in PSI Vertical Partition Training Operation
var PYTORCH_INFERENCE
-
The built-in PyTorch Inference Operation
var RANDOM_FOREST_INFERENCE
-
The built-in Random Forest Inference Operation
var REGRESSION_INFERENCE
-
The built-in Regression Inference Operation
var ROI_DETECTOR_INFERENCE
-
The built-in Region of Interest Detector Inference Operation
var SKLEARN_INFERENCE
-
The built-in Scikit-learn Inference Operation
var SPLIT_LEARNING_TRAINING
-
The built-in Split Learning Training Operation
var VERTICAL_PARTITION_SPLIT_TRAINING
-
The built-in Vertical Partition Split Training Operation
var XGBOOST_INFERENCE_FED
-
The built-in XGBoost Inference with FED security Operation
var XGBOOST_INFERENCE_SMPC
-
The built-in XGBoost Inference with SMPC security Operation
Functions
def create_frame(data, opcode, fin=1)
-
Monkey patch the websocket-client library to skip masking.
Masking is a protocol-level security feature that is redundant over TLS connections. It is monkey patched because gevent-websocket, the server-side websocket library, is extremely slow at unmasking, causing roughly a 4x slowdown. See: https://stackoverflow.com/a/32290330/2395133
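A minimal sketch of how such a patch might look against websocket-client's ABNF class is shown below; the actual implementation in this module may differ in detail.

    import websocket

    def create_frame(data, opcode, fin=1):
        # Build an unmasked frame (mask bit = 0). TLS already encrypts the
        # traffic, so masking only adds CPU cost on the server side.
        if opcode == websocket.ABNF.OPCODE_TEXT and isinstance(data, str):
            data = data.encode("utf-8")
        # Positional arguments: fin, rsv1, rsv2, rsv3, opcode, mask, data
        return websocket.ABNF(fin, 0, 0, 0, opcode, 0, data)

    # Replace the library's frame builder so outgoing frames skip masking.
    websocket.ABNF.create_frame = staticmethod(create_frame)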
Classes
class AlgorithmAsset (uuid: UUID)
-
An abstract Asset used to perform a calculation.
This could be a trained neural network, a prebuilt protocol, or a Python or SQL script to be executed against a DatasetAsset.
Ancestors
Subclasses
Inherited members
class Asset (uuid: UUID)
-
Points to a dataset or an algorithm indexed on the TripleBlind Router.
Subclasses
Class variables
var uuid : uuid.UUID
-
Identifier for this asset.
Static methods
def find(search: Optional[Union[str, re.Pattern]], namespace: Optional[UUID] = None, owned: Optional[bool] = False, owned_by: Optional[int] = None, dataset: Optional[bool] = None, algorithm: Optional[bool] = None, session: Optional[Session] = None, exact_match: Optional[bool] = True) -> Asset
-
Search the Router index for a single asset matching the given search criteria.
Args
search : str or re.Pattern, optional
- The search pattern applied to asset names. A simple string will be used as a substring search if exact_match is False; otherwise it will only return exact matches.
namespace : UUID, optional
- The UUID of the user to which this asset belongs. None indicates any user, NAMESPACE_DEFAULT_USER indicates the current API user.
owned : bool, optional
- Only return owned assets (either personally or by the current user's team)
owned_by : int, optional
- Only return assets owned by the given team ID
dataset : bool, optional
- Set to True to search for datasets. Default is to search for both data and algorithms.
algorithm : bool, optional
- Set to True to search for algorithms. Default is to search for both data and algorithms.
session : Session, optional
- A connection session. If not specified, the default session is used.
exact_match : bool, optional
- When the 'search' is a string, setting this to True will perform an exact match. Ignored for regex patterns, defaults to True.
Raises
TripleblindAssetError
- Thrown when multiple assets are found which match the search.
Returns
Asset
- A single asset, or None if no match found
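Example (a minimal sketch; it assumes the package is imported as tb, as the tb.Package reference later on this page suggests, that a default session is configured, and that the asset names shown are purely illustrative):

    import re
    import tripleblind as tb
    from tripleblind.asset import NAMESPACE_DEFAULT_USER

    # Exact-name lookup among datasets owned by you or your team.
    vitals = tb.Asset.find("Patient Vitals 2023", dataset=True, owned=True)
    if vitals is not None:
        print(vitals.uuid, vitals.name)

    # Regex search for an algorithm within the current API user's namespace.
    model = tb.Asset.find(
        re.compile(r"churn.*v2"),
        namespace=NAMESPACE_DEFAULT_USER,
        algorithm=True,
    )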
def find_all(search: Optional[Union[str, re.Pattern]], namespace: Optional[UUID] = None, owned: Optional[bool] = False, owned_by: Optional[int] = None, dataset: Optional[bool] = None, algorithm: Optional[bool] = None, max: Optional[int] = 500, session: Optional[Session] = None) -> List[Asset]
-
Search the Router index for all assets matching the given search criteria.
Args
search : Optional[Union[str, re.Pattern]]
- Either an asset ID or a search pattern applied to asset names and descriptions. A simple string will match substrings, or a regular expression can be passed for complex searches.
namespace : UUID, optional
- The UUID of the user to which this asset belongs. None indicates any user, NAMESPACE_DEFAULT_USER indicates the current API user.
owned : bool, optional
- Only return owned assets (either personally or by the current user's team)
owned_by : Optional[int], optional
- Only return assets owned by the given team ID
dataset : Optional[bool], optional
- Set to True to search for datasets. Default is to search for both data and algorithms.
algorithm : Optional[bool], optional
- Set to True to search for algorithms. Default is to search for both data and algorithms.
max : Optional[int], optional
- Maximum number of results to return. If not specified, defaults to 500.
session : Session, optional
- A connection session. If not specified, the default session is used.
Returns
List[Asset]
- A list of found assets, or None if no match found
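Example (a minimal sketch under the same assumptions as the Asset.find example above; the search term is illustrative):

    import tripleblind as tb

    # List up to 50 datasets whose names or descriptions mention "vitals".
    matches = tb.Asset.find_all("vitals", dataset=True, max=50)
    for asset in matches or []:
        print(asset.uuid, asset.name, asset.team)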
def position(file_handle: Union[str, Path, Package, io.BufferedReader], name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, is_dataset: bool = True, custom_protocol: Optional[Any] = None, metadata: dict = {}, unmask_columns: Optional[List[str]] = None, validate_sql: Optional[bool] = True, asset_type: Optional[str] = None)
-
Place data on your Access Point for use by yourself or others.
If the dataset is a .csv file, the first row is assumed to be a header containing column names, and column validation will be performed. Use CSVDataset.position() for more options, including auto-renaming.
Args
file_handle : str, Path, Package or io.BufferedReader
- File handle or path to the data to place on the API user's associated Access Point.
name : str
- Name of the new asset.
desc : str
- Description of the new asset (can include markdown)
is_discoverable : bool, optional
- Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
- The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
- A connection session. If not specified, the default session is used.
is_dataset : bool, optional
- Is this a dataset? (False == algorithm)
custom_protocol
- Internal use
metadata : dict
- Custom metadata to include in the asset
unmask_columns : List[str], optional
- When is_dataset=True, list of column names that will be initially unmasked. Default is to mask all columns.
validate_sql : bool, optional
- If True (the default) the query syntax is checked for common SQL syntax errors.
asset_type : str, optional
- Type of asset to be positioned. Default is 'dataset'. Other options: 'algorithm' or 'report'.
Raises
SystemExit
- SQL syntax errors were found in query.
Returns
Asset
- New asset on the Router, or None on failure
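Example (a minimal sketch; the file name, asset name, and column name are illustrative, and a default session is assumed):

    import tripleblind as tb

    dataset = tb.Asset.position(
        "vitals.csv",
        name="Patient Vitals 2023",
        desc="De-identified vitals, one row per encounter.",
        is_discoverable=True,
        k_grouping=10,
        unmask_columns=["age_band"],
    )
    if dataset is None:
        raise RuntimeError("Failed to position the dataset")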
def upload(file_handle: io.BufferedReader, name: str, desc: str, is_discoverable: Optional[bool] = False, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, is_dataset: bool = True, custom_protocol: Optional[Any] = None)
-
Deprecated, use Asset.position() instead.
Instance variables
var accesspoint_filename : str
-
str: Disk filename on the Access Point which holds this asset.
var activate_date : dt.datetime
-
dt.datetime: Date when this asset became active
var deactivate_date : dt.datetime
-
dt.datetime: Date when this asset was archived (deleted)
var desc : str
-
str: Longer description of the asset
var filename : str
-
str: Filename associated with the asset
var hash : str
-
str: Hash that was registered at the Router when this asset was positioned.
var is_active : bool
-
bool: Is this an active asset?
var is_discoverable : bool
-
bool: True if anyone can discover the asset on the Router
var is_valid : bool
-
Verify that this Asset object points to a valid dataset.
Returns
bool
- True if this is a valid Asset on the Router
var k_grouping : Optional[int]
-
int: The minimum count of records with like values required for reporting.
var metadata : dict
-
dict: Asset metadata (e.g. 'cols' on some datasets)
var name : str
-
str: Simple short name of the asset
var namespace : UUID
-
UUID: Namespace which contains the asset. Each user has a namespace, so assets generally exist under the namespace of their creator. Think of this as a personal folder within your organization that holds your assets.
var team : str
-
str: Name of the team that owns the asset
var team_id : str
-
str: ID of the team that owns the asset
Methods
def add_agreement(self, with_team: int, operation: "Union[Operation, UUID, 'Asset']" = None, expiration_date: str = None, num_uses: int = None, algorithm_security: str = 'smpc', session: Optional[Session] = None)
-
Establish an Agreement allowing another team to use this Asset.
Args
with_team : int
- ID of the partner team, or "ANY" to make an Asset available to everyone without explicit permission.
operation : Operation, UUID or Asset
- The action being enabled by this Agreement against this Asset. If an Asset is provided, it will be treated as an algorithmic operation applied to this Asset (e.g. allowing a trained model to run against a dataset).
expiration_date : str
- ISO formatted date on which the Agreement becomes invalid, or None for no expiration.
num_uses : int
- The number of jobs that can be created under the Agreement before it becomes invalid, or None for no limit.
algorithm_security : str
- "smpc" or "fed". Specifies the level of algorithm security required to run the operation.
session : Session, optional
- A connection session. If not specified, the default session is used.
Returns
Agreement
- New agreement, None if unable to create
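Example (a minimal sketch; the team ID and asset names are illustrative, and it assumes you own the dataset asset):

    import tripleblind as tb

    dataset = tb.Asset.find("Patient Vitals 2023", dataset=True, owned=True)
    model = tb.Asset.find("churn-model-v2", algorithm=True)

    # Allow team 4321 to run the trained model against this dataset,
    # at most 10 times, under SMPC algorithm security.
    agreement = dataset.add_agreement(
        with_team=4321,
        operation=model,
        num_uses=10,
        algorithm_security="smpc",
    )

    # Review all Agreements currently attached to the dataset.
    for a in dataset.list_agreements():
        print(a)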
def archive(self, session: Optional[Session] = None, remote_delete: bool = False) -> bool
-
Remove asset from the Router index (and optionally the Access Point)
Args
session : Session, optional
- A connection session. If not specified, the default session is used.
remote_delete : bool, optional
- Delete the underlying file from the Access Point? Default is to leave the file on the Access Point's attached storage.
Raises
TripleblindAssetError
- Thrown when unable to archive the asset.
TripleblindAPIError
- Thrown when unable to talk to Router
Returns
bool
- Was asset successfully deleted.
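Example (a minimal sketch; dataset is an Asset you own):

    # Remove the asset from the Router index and delete the underlying
    # file from the Access Point's attached storage.
    if dataset.archive(remote_delete=True):
        print("Asset archived")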
def delete(self, session: Optional[Session] = None) -> bool
-
Deprecated, use Asset.archive() instead.
def download(self, save_as: Optional[str] = None, overwrite: Optional[bool] = False, show_progress: Optional[bool] = False, session: Optional[Session] = None) -> str
-
Deprecated. Use Asset.retrieve() instead.
def list_agreements(self, session: Optional[Session] = None)
-
List agreements governing your Asset.
Args
session : Session, optional
- A connection session. If not specified, the default session is used.
Returns
list
- List of Agreement objects connected to the Asset.
def publish_to_team(self, to_team: int, session: Optional[Session] = None, algorithm_security: str = 'smpc')
-
Expose the existence of the Asset to a specified team, while still requiring explicit usage approval.
Args
to_team : int
- ID of the partner team.
session : Session, optional
- A connection session. If not specified, the default session is used.
algorithm_security : str, optional
- Acceptable security level of algorithms to run with ("fed" or "smpc").
Returns
Agreement
- The new Agreement object
def retrieve(self, save_as: Optional[str] = None, overwrite: Optional[bool] = False, show_progress: Optional[bool] = False, session: Optional[Session] = None) -> str
-
Fetch an asset package and save it locally.
NOTE: Asset packages use the .zip file format. These files can be easily accessed via tb.Package.load(filename) or Python's standard zipfile library.
Args
save_as : str, optional
- Filename to save under, or None to use the default filename in the current directory.
overwrite : bool, optional
- Should this overwrite an already existing file?
show_progress : bool, optional
- Display progress bar?
session : Session, optional
- A connection session. If not specified, the default session is used.
Raises
TripleblindAPIError
- Authentication failure
IOError
- File already exists and no 'overwrite' flag
TripleblindAssetError
- Failed to retrieve
Returns
str
- Absolute path to the saved file
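Example (a minimal sketch; model is an Asset and the save path is illustrative):

    import zipfile

    path = model.retrieve(save_as="churn-model.zip", overwrite=True, show_progress=True)

    # Asset packages are ordinary .zip archives, so the standard library
    # can inspect them just as well as tb.Package.load(path).
    with zipfile.ZipFile(path) as pkg:
        print(pkg.namelist())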
class AzureBlobStorageDataset (uuid: UUID)
-
A table stored as a CSV-format file inside an Azure Blob Storage account.
Ancestors
Static methods
def create(storage_account_name: str, storage_key: str, file_system: str, key: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None) -> AzureDataLakeStorageDataset
-
Create a dataset connection to a CSV file in Azure Blob Storage.
This dataset is 'live': any updates made to the file resting in Azure Blob Storage will be available to this dataset the next time it is used.
Args
storage_account_name : str
- The Azure storage account to reference.
storage_key : str
- An access token for reading from the storage account.
file_system : str
- The file system defined in the Azure control panel for the storage account.
key : str
- The key associated with the data in Azure Blob Storage.
name : str
- Name of the new asset.
desc : str
- Description of the new asset (can include markdown)
is_discoverable : bool, optional
- Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
- The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
- A connection session. If not specified, the default session is used.
unmask_columns : List[str], optional
- List of column names that will be initially unmasked. Default is to mask all columns.
Returns
AzureBlobStorageDataset
- New asset on the Router, or None on failure
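Example (a minimal sketch; the storage account, key, container, and blob key are placeholders, and the class is assumed to be re-exported at the package top level):

    import tripleblind as tb

    blob_dataset = tb.AzureBlobStorageDataset.create(
        storage_account_name="examplestorageacct",
        storage_key="<storage-access-key>",
        file_system="clinical-data",
        key="vitals/2023/vitals.csv",
        name="Vitals (Azure Blob)",
        desc="Live connection to vitals.csv in Azure Blob Storage.",
        is_discoverable=True,
        unmask_columns=["age_band"],
    )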
Inherited members
class AzureDataLakeStorageDataset (uuid: UUID)
-
A table stored in CSV format at a given file path inside an Azure Data Lake Instance.
Ancestors
Static methods
def create(storage_account_name: str, storage_key: str, file_system: str, path: str, name: str, desc: str, is_discoverable: Optional[bool] = False, k_grouping: Optional[int] = 5, allow_overwrite: Optional[bool] = False, session: Optional[Session] = None, unmask_columns: Optional[List[str]] = None) -> AzureDataLakeStorageDataset
-
Create a dataset connected to a CSV file within Azure Data Lake Storage.
This dataset is 'live': any updates made to the file resting in the Data Lake will be available to this dataset the next time it is used.
Args
storage_account_name : str
- The Azure storage account to reference.
storage_key : str
- An access token for reading from the storage account.
file_system : str
- The file system defined in the Azure control panel for the storage account.
path : str
- The full path to the file within Azure Data Lake Storage.
name : str
- Name of the new asset.
desc : str
- Description of the new asset (can include markdown)
is_discoverable : bool, optional
- Should this asset be listed in the Router index to be found and used by others?
k_grouping : int, optional
- The minimum count of records with like values required for reporting.
allow_overwrite : bool, optional
- If False an exception will be thrown if the asset name already exists. If True, an existing asset will be overwritten.
session : Session, optional
- A connection session. If not specified, the default session is used.
unmask_columns : List[str], optional
- List of column names that will be initially unmasked. Default is to mask all columns.
Returns
AzureDataLakeStorageDataset
- New asset on the Router, or None on failure
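Example (a minimal sketch mirroring the Blob Storage example above; all connection values are placeholders):

    import tripleblind as tb

    adls_dataset = tb.AzureDataLakeStorageDataset.create(
        storage_account_name="examplestorageacct",
        storage_key="<storage-access-key>",
        file_system="clinical-data",
        path="vitals/2023/vitals.csv",
        name="Vitals (ADLS)",
        desc="Live connection to vitals.csv in Azure Data Lake Storage.",
    )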
Inherited members