Module tripleblind.model_asset

Specialized Asset representing trained models, such as a neural network.

The ModelAsset wraps a generic asset, allowing the complexity of creating jobs to be completely hidden. Common operations can happen with just a few lines of code.

For example:

import tripleblind as tb

# Use a trained model to privately analyze a patient xray
model = tb.ModelAsset("diagnose_disease_model")
result = model.infer(data="xray.jpg")

print(result.table.dataframe)

Classes

class ModelAsset (uuid: UUID)

Points to a dataset or an algorithm indexed on the TripleBlind Router.

Ancestors

Subclasses

Static methods

def cast(asset: Asset) -> ModelAsset

Convert a generic Asset into a ModelAsset

This should only be used on an asset known to be a model; no validation occurs during the cast.

Args

asset : Asset
A generic Asset

Returns

ModelAsset
A ModelAsset object
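
For example, an existing handle can be re-typed so the model helpers become available. A minimal sketch, assuming the generic Asset class offers a find() lookup similar to ModelAsset.find() (the asset name is illustrative):

import tripleblind as tb

# Locate a plain Asset handle; the Asset.find() lookup and the asset name
# are illustrative assumptions.
generic = tb.Asset.find("diagnose_disease_model")

# Re-type the handle so model-specific helpers such as infer() are available.
# No validation occurs, so only cast assets known to be models.
model = tb.ModelAsset.cast(generic)
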
def find(search: Optional[Union[str, re.Pattern]], namespace: Optional[UUID] = None, owned: Optional[bool] = False, owned_by: Optional[int] = None, session: Optional[Session] = None, exact_match: Optional[bool] = True) -> ModelAsset

Search the Router index for an asset matching the given search

Args

search : str or re.Pattern, optional
Either an asset ID or a search pattern applied to asset names and descriptions. A simple string matches as a substring (or the entire string when exact_match is True); a regular expression can be passed for complex searches.
namespace : UUID, optional
The UUID of the user to which this asset belongs. None indicates any user, NAMESPACE_DEFAULT_USER indicates the current API user.
owned : bool, optional
Only return owned assets (either personally or by the current user's organization)
owned_by : int, optional
Only return owned assets owned by the given organization ID
session : Session, optional
A connection session. If not specified, the default session is used.
exact_match : bool, optional
When the 'search' is a string, setting this to True will perform an exact match. Ignored for regex patterns, defaults to True.

Raises

TripleblindAssetError
Thrown when multiple assets are found which match the search.

Returns

ModelAsset
A single asset, or None if no match found
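
For example, a lookup by name (the model name shown is illustrative):

import re
import tripleblind as tb

# Look up a model asset by exact name; find() returns None when nothing
# matches and raises TripleblindAssetError if more than one asset matches.
model = tb.ModelAsset.find("diagnose_disease_model", owned=True)
if model is None:
    raise RuntimeError("Model not found on the Router")

# A regular expression can be passed for broader searches.
model = tb.ModelAsset.find(re.compile(r"diagnose_.*_model"))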

Methods

def infer(self, data: Union[Asset, TableAsset, str, Path, List[Asset], List[TableAsset], List[str], List[Path]], preprocessor: Optional[Union[TabularPreprocessor, List[TabularPreprocessor], TabularPreprocessorBuilder, List[TabularPreprocessorBuilder], ImagePreprocessorBuilder, List[ImagePreprocessorBuilder], NumpyInputPreprocessor, List[NumpyInputPreprocessor], NumpyInputPreprocessorBuilder, List[NumpyInputPreprocessorBuilder]]] = None, params: Optional[Dict] = None, job_name: Optional[str] = None, silent: Optional[bool] = False, session: Optional[Session] = None, stream_output: bool = False, identifier_columns: Optional[Union[List[str], str]] = None) -> Union[JobResult, StatusOutputStream]

Perform an inference using a model

NOTE: For inferences which produce textual output, such as a classifier, the result can be easily accessed via code like this:

r = model.infer("data.csv")
print(r.table)

Alternatively, r.table.dataframe can be used as a standard Pandas DataFrame.

Args

data : Asset, str, Path, or list of same
The data to infer against. Can be an Asset or a path to a data file; a list may be passed to infer against multiple datasets.
preprocessor : Preprocessor, optional
A preprocessor to apply to the data. If not defined, the dataset is used directly.
params : dict
Dictionary of unique parameters for the model. Typically, this is not needed.
job_name : str, optional
Reference name for the job which performs this task.
silent : bool, optional
If True, suppress status messages during execution. Default is to show messages.
session : Session, optional
A connection session. If not specified, the default session is used.
stream_output : bool, optional
Whether to start the job and return a StatusOutputStream, or wait for job completion and return a JobResult (the default).
identifier_columns : str, List[str], optional
Column or columns which will be returned alongside results. Default is None.

Raises

TripleblindAPIError
Inference failed

Returns

When stream_output is set to False (the default), a JobResult is returned once the job completes. If successful, the inference output is found at result.asset and/or result.table.

If stream_output is set to True, a StatusOutputStream object is immediately returned and can be used as a Generator that outputs the status messages produced while the job is running.
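
For example, both return modes in use; the file name and identifier column are illustrative:

import tripleblind as tb

model = tb.ModelAsset.find("diagnose_disease_model")

# Default mode: wait for completion and receive a JobResult.
result = model.infer("patients.csv", identifier_columns="patient_id")
print(result.table.dataframe)

# Streaming mode: the call returns a StatusOutputStream immediately, which
# can be iterated to follow the job's status messages as they are produced.
stream = model.infer("patients.csv", stream_output=True)
for message in stream:
    print(message)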

def psi_infer(self, data: Union[Asset, List[Asset], TableAsset, List[TableAsset]], match_column: Union[str, List[str]], regression_type: Optional[RegressionType] = None, preprocessor: Optional[Union[TabularPreprocessor, List[TabularPreprocessor], TabularPreprocessorBuilder, List[TabularPreprocessorBuilder]]] = None, params: Optional[Dict] = None, job_name: Optional[str] = None, silent: Optional[bool] = False, session: Optional[Session] = None, stream_output: bool = False) -> Union[JobResult, StatusOutputStream]

Perform an inference using a model on distributed data matched with PSI

NOTE: For inferences which produce textual output, such as a classifier, the result can be easily accessed via code like this:

r = model.psi_infer("data.csv", match_column="id")  # match_column is required; "id" is an illustrative column name
print(r.table)

Alternatively, r.table.dataframe can be used as a standard Pandas DataFrame.

Args

data : Asset, TableAsset, or list of same
The data to infer against; pass a list when the data is distributed across multiple datasets.
match_column : Union[str, List[str]]
Name of the column to match. If not the same in all datasets, a list of the matching column names, starting with the initiator asset and then listing a name in each dataset.
regression_type : RegressionType
The type of regression to perform. If populated, a regression inference will be performed; one of tb.RegressionType.LINEAR or tb.RegressionType.LOGISTIC.
preprocessor : Union[TabularPreprocessor, List[TabularPreprocessor], TabularPreprocessorBuilder, List[TabularPreprocessorBuilder]], optional
A preprocessor to apply to the data. If not defined, the dataset is used directly.
params : dict
Dictionary of unique parameters for the model. Typically, this is not needed.
job_name : str, optional
Reference name for the job which performs this task.
silent : bool, optional
If True, suppress status messages during execution. Default is to show messages.
session : Session, optional
A connection session. If not specified, the default session is used.
stream_output : bool, optional
Whether to start the job and return a StatusOutputStream, or wait for job completion and return a JobResult (the default).

Returns

When stream_output is set to False (the default), a JobResult is returned once the job completes. If successful, the inference output is found at result.asset and/or result.table.

If stream_output is set to True, a StatusOutputStream object is immediately returned and can be used as a Generator that outputs the status messages produced while the job is running.
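
A sketch of a PSI-matched inference across two data providers; the asset names, the match column, and the generic Asset.find() lookups are illustrative assumptions:

import tripleblind as tb

model = tb.ModelAsset.find("readmission_risk_model")

# One asset from each participating party.
initiator_data = tb.Asset.find("hospital_a_records")
partner_data = tb.Asset.find("hospital_b_records")

# Records are matched privately on the shared ID column before inference.
result = model.psi_infer(
    data=[initiator_data, partner_data],
    match_column="patient_id",
)
print(result.table.dataframe)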

Inherited members

class ModelTrainerAsset (uuid: UUID)

Points to a dataset or an algorithm indexed on the TripleBlind Router.

Ancestors

Static methods

def cast(asset: Asset) -> ModelTrainerAsset

Convert a generic Asset into a ModelTrainerAsset

This should only be used on an asset known to be a model trainer; no validation occurs during the cast.

Args

asset : Asset
A generic Asset

Returns

ModelTrainerAsset
A ModelTrainerAsset object
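
A minimal sketch; the generic Asset.find() lookup and the asset name are illustrative assumptions:

import tripleblind as tb

# "generic" is a plain Asset known to reference a model trainer.
generic = tb.Asset.find("diagnose_disease_trainer")
trainer = tb.ModelTrainerAsset.cast(generic)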

Methods

def train(self, data: Optional[Union[Asset, str, Path, Package, List[Asset], List[str], List[Path], List[Package]]], data_type: str = 'table', epochs: int = 1, model_output: str = None, data_shape: Optional[List[int]] = None, batch_size: Optional[int] = None, test_size: Optional[float] = None, preprocessor: Union[TabularPreprocessor, List[TabularPreprocessor], TabularPreprocessorBuilder, List[TabularPreprocessorBuilder], ImagePreprocessorBuilder, List[ImagePreprocessorBuilder], NumpyInputPreprocessor, List[NumpyInputPreprocessor], NumpyInputPreprocessorBuilder, List[NumpyInputPreprocessorBuilder]] = None, loss_name: str = None, loss_params: Optional[Dict] = None, optimizer_name: str = None, optimizer_params: Optional[Dict] = None, lr_scheduler_name: Optional[str] = None, lr_scheduler_params: Optional[Dict] = None, params: Optional[Dict] = None, delete_trainer: Optional[bool] = False, job_name: Optional[str] = None, silent: Optional[bool] = False, session: Optional[Session] = None, stream_output: bool = False) -> Union[JobResult, StatusOutputStream]

Train this model using the data and parameters specified

Args

data : Asset, str, Path, Package, or list of same
One or more datasets to use for training. Datasets can be specified as Assets or as filenames; when a filename is given it is automatically converted into a temporary Asset, which is deleted at the completion of the Job.
data_type : str
The type of the training data. Valid values are "table", "image", and "numpy".
epochs : int, optional
Number of passes to make through the training data.
model_output : str
The type of result generated by the model. Valid values are "regression", "multiclass", and "binary".
data_shape : List[int], optional
Description of the training data, depending on the data_type:
table - number of columns of data, e.g. [cols]
image - image dimensions, e.g. [width, height, bytes-per-pixel]
numpy - not used
batch_size : int, optional
Number of data samples to pass at one time during training.
test_size : float, optional
A percentage of the data to be reserved for accuracy testing and reporting with each epoch.
preprocessor : Preprocessor or List[Preprocessor], optional
A single preprocessor to apply to all data, or a list of preprocessors to apply to each dataset. If a list of preprocessors is given, the count must match the number of datasets.
loss_name : str, optional
A loss function name, consistent with PyTorch. See https://pytorch.org/docs/stable/nn.html#loss-functions
loss_params : dict, optional
Dictionary of parameters appropriate for the loss function.
optimizer_name : str, optional
An optimizer function name, consistent with PyTorch. See https://pytorch.org/docs/stable/optim.html
optimizer_params : dict, optional
Dictionary of parameters appropriate for the optimizer_name.
lr_scheduler_name : str, optional
A learning rate scheduler function name, either "CyclicLR" or "CyclicCosineDecayLR". Default is to use a constant learning rate.
lr_scheduler_params : dict, optional

Dictionary of parameters appropriate for the lr_scheduler_name. Legal values depend on the lr_scheduler_name. For "CyclicLR"::

{
    "step_size": 10,       # Number of epochs over which the cycle is completed.
    "base_lr": 0.0001,     # Starting rate, lower boundary in the cycle.
    "max_lr": 0.01,        # Upper boundary in the cycle.
    "mode": "triangular",  # or "triangular2", or "exp_range"
    "gamma": 0.99,         # Multiplicative decay of the learning rate at the end of each cycle, default=0.99
}

For "CyclicCosineDecayLR"::

{
    "init_decay_epochs": 10,  # Number of initial decay epochs.
    "min_decay_lr": 0.0001,   # Learning rate at the end of decay.
    "restart_interval": 3,    # Restart interval for fixed cycles, or None to disable cycles.
    "restart_interval_multiplier": 1.5,  # Multiplication coefficient for geometrically increasing cycles.
    "restart_lr": 0.01,       # Learning rate when a cycle restarts.
    "warmup_epochs": None,    # Number of warmup epochs, default is None.
    "warmup_start_lr": ...,   # Learning rate at the beginning of warmup.
}
params : dict, optional
Additional custom parameters.
delete_trainer : bool, optional
Set to True to delete the training model after training completes. Ignored if stream_output is set to True.
job_name : str, optional
Reference name for the job which performs this task. Default is "Model training - "
silent : bool, optional
If True, suppress status messages during execution. Default is to show messages.
session : Session, optional
A connection session. If not specified, the default session is used.
stream_output : bool, optional
Whether to start the job and return a StatusOutputStream, or wait for job completion and return a JobResult (the default).

Raises

TripleblindTrainingError
Model training failed

Returns

When stream_output is set to False (the default), a JobResult is returned once the job completes. If successful, the training output is found at result.asset and/or result.table.

If stream_output is set to True, a StatusOutputStream object is immediately returned and can be used as a Generator that outputs the status messages produced while the job is running.
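
Putting it together, a sketch of a tabular training run; the dataset files, shapes, and hyperparameters are illustrative choices rather than defaults, and the generic Asset.find() lookup is an assumption:

import tripleblind as tb

trainer = tb.ModelTrainerAsset.cast(tb.Asset.find("diagnose_disease_trainer"))

result = trainer.train(
    data=["clinic_a.csv", "clinic_b.csv"],  # files become temporary Assets
    data_type="table",
    data_shape=[12],                        # 12 columns of tabular data
    model_output="binary",
    epochs=5,
    batch_size=32,
    test_size=0.2,
    loss_name="BCELoss",                    # PyTorch loss function name
    optimizer_name="Adam",                  # PyTorch optimizer name
    optimizer_params={"lr": 0.001},
    lr_scheduler_name="CyclicCosineDecayLR",
    lr_scheduler_params={
        "init_decay_epochs": 3,
        "min_decay_lr": 0.0001,
        "restart_interval": None,           # no cycle restarts
    },
)

# The trained model is available through the job result.
print(result.asset)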

Inherited members