Data Format Support

TripleBlind currently supports the following data formats. Many other data formats can be preprocessed into a supported format and then positioned as an Asset. If there is a format you would like to work with natively, please let us know by contacting your Customer Success Manager or submitting a request to Customer Support.

Tabular Data

Any data consisting of records with the same number of fields can be represented as grids with column headings, or “tables”. This simple format is extremely versatile and is used in a vast number of computer applications.
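To make the "same number of fields" requirement concrete, the following Python sketch (standard library only) checks that every record in a block of comma-separated text has as many fields as the header row. The function name check_rectangular is hypothetical and is not part of the TripleBlind SDK.

```python
import csv
import io


def check_rectangular(text: str) -> tuple[list[str], int]:
    """Return (column headings, number of data records) if every record
    has the same number of fields as the header; otherwise raise ValueError.

    Hypothetical helper for illustration; not part of the TripleBlind SDK.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        raise ValueError("no header row found")
    header, records = rows[0], rows[1:]
    # Row 1 is the header, so data records are numbered from row 2.
    for row_no, record in enumerate(records, start=2):
        if len(record) != len(header):
            raise ValueError(
                f"row {row_no} has {len(record)} fields, expected {len(header)}"
            )
    return header, len(records)


good = "id,age,city\n1,34,Austin\n2,51,Denver\n"
print(check_rectangular(good))  # (['id', 'age', 'city'], 2)

bad = "id,age,city\n1,34\n"
try:
    check_rectangular(bad)
except ValueError as err:
    print(err)  # row 2 has 2 fields, expected 3
```

Using csv.reader rather than splitting on commas means quoted fields that contain commas are counted as a single field.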

Tabular Data in CSV Format

Tabular data stored in comma-separated values (CSV) format can be positioned directly as a Dataset Asset. Many other formats, including spreadsheets, can be exported or preprocessed into CSV and then positioned.
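Spreadsheet exports do not always use commas; some locales export with a semicolon as the delimiter. A minimal preprocessing sketch, assuming a semicolon-delimited export and using only Python's csv module, rewrites such text as standard comma-separated values. The function name to_standard_csv is illustrative only.

```python
import csv
import io


def to_standard_csv(text: str, delimiter: str = ";") -> str:
    """Re-emit delimited text as comma-separated values.

    The csv writer quotes any field containing a comma, a quote character,
    or a newline, so a value such as "Smith, J." survives the conversion.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


export = "name;city\nSmith, J.;Austin\nLee;Denver\n"
print(repr(to_standard_csv(export)))
# 'name,city\n"Smith, J.",Austin\nLee,Denver\n'
```

The resulting text (or a file written from it) is ordinary CSV and can then be positioned like any other CSV file.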

Please refer to the following examples in the TripleBlind SDK to learn more about how to use tabular data in CSV format:

Gene_Regression
PSI
PSI_Vertical_Partition
Random_Forest
Table_Search
Tabular_Data
Transfer_Learning
XGBoost
XGBoost_Regression

Tabular Data Stored in Databases

Views of tabular data stored in databases can be positioned as a Database Asset. The asset contains a description of the connection and of the data content; the actual data is read from the database each time the asset is used, so it is "live". The following databases are currently supported:

Microsoft SQL Server
MongoDB
MySQL
Oracle
Postgres
SQLite

If support for another database is needed, please let us know.
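The "live" behaviour described above can be illustrated with Python's built-in sqlite3 module. In this sketch, a hypothetical LiveView object stores only a connection description (a SQLite file path) and a query, and it re-runs the query every time it is read, so rows inserted after the view was defined appear on the next read. LiveView illustrates the idea only; it is not the TripleBlind Database Asset API.

```python
import os
import sqlite3
import tempfile
from contextlib import closing
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveView:
    """Hypothetical stand-in for a Database Asset: where to connect plus what to read."""

    database: str  # connection description (here, a SQLite file path)
    query: str     # the view of the data to expose

    def read(self) -> list[tuple]:
        # Open a fresh connection and run the query on every use,
        # so the result reflects the database as it is right now.
        with closing(sqlite3.connect(self.database)) as conn:
            return conn.execute(self.query).fetchall()


db_path = os.path.join(tempfile.mkdtemp(), "clinic.db")
with closing(sqlite3.connect(db_path)) as conn:
    conn.execute("CREATE TABLE visits (id INTEGER PRIMARY KEY, age INTEGER)")
    conn.execute("INSERT INTO visits VALUES (1, 34)")
    conn.commit()

view = LiveView(db_path, "SELECT id, age FROM visits ORDER BY id")
print(view.read())  # [(1, 34)]

# A row written after the view was defined shows up on the next read.
with closing(sqlite3.connect(db_path)) as conn:
    conn.execute("INSERT INTO visits VALUES (2, 51)")
    conn.commit()
print(view.read())  # [(1, 34), (2, 51)]
```

The design choice to store a query rather than a snapshot of rows is what makes the view "live"; the trade-off is that results can differ between uses if the underlying table changes.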

Please refer to the following examples in the TripleBlind SDK to learn more about how to use tabular data stored in databases:

tutorials/notebooks/1b_Database_Assets
examples/Data_Connectors
examples/Data_Munging

Tabular Data Stored in Data Warehouses

Tabular data stored in data warehouses can also be positioned as a Database Asset. As with databases, the asset contains connection information, and the actual data is read from the data warehouse each time the asset is used, so it is "live". The following data warehouses are currently supported:

Amazon Redshift
Amazon S3
Databricks
Google BigQuery
Microsoft Azure Blob Storage
Microsoft Azure Data Lake
Snowflake

If support for another data warehouse is needed, please let us know.

Please refer to the following examples in the TripleBlind SDK to learn more about how to use tabular data stored in data warehouses:

examples/Data_Connectors

Image Formats

Images can be positioned as Dataset Assets. The following image formats are supported:

Any image format supported by Pillow (the Python Imaging Library), including JPEG, PNG, BMP and many more.
DICOM X-ray images
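Before positioning a folder of images, it can help to screen out files that are not images at all. A minimal standard-library sketch checks each file's leading "magic bytes" for PNG, JPEG and BMP, and the b"DICM" marker that follows the 128-byte preamble in DICOM Part 10 files. This is only a pre-check under those assumptions: DICOM data without the Part 10 preamble is not recognized, the helper names are hypothetical, and actual decoding is still left to Pillow or a DICOM library.

```python
from pathlib import Path
from typing import Optional

# Leading "magic bytes" for a few common raster formats.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG",
    b"\xff\xd8\xff": "JPEG",
    b"BM": "BMP",
}


def sniff_image(data: bytes) -> Optional[str]:
    """Guess an image format from a file's first bytes; None if unrecognized."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    # DICOM Part 10 files: 128-byte preamble followed by the marker b"DICM".
    if data[128:132] == b"DICM":
        return "DICOM"
    return None


def sniff_file(path: Path) -> Optional[str]:
    """Read only the first 132 bytes, enough for every signature above."""
    with path.open("rb") as fh:
        return sniff_image(fh.read(132))


print(sniff_image(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))  # PNG
print(sniff_image(b"\x00" * 128 + b"DICM"))            # DICOM
print(sniff_image(b"hello"))                           # None
```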

Please refer to the following examples in the TripleBlind SDK to learn more about how to use image data:

Multimodal_AI
Cifar
Federated_Learning
Image_Data
Object_Detection

NumPy Binary Format

The NumPy NPY format is the standard binary file format for persisting a single NumPy array to disk. NumPy binary files can be positioned directly as Dataset Assets.
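In practice an NPY file is produced with numpy.save and read back with numpy.load. To show what the format looks like at the byte level, the following dependency-free sketch writes and parses a version 1.0 NPY file for a one-dimensional float64 array: a magic string, a version, a little-endian header length, a Python-literal header dictionary padded to a 64-byte boundary, and then the raw data. The helper names are hypothetical, and the sketch deliberately handles only 1-D little-endian float64 data.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"


def write_npy_f8(values: list[float]) -> bytes:
    """Encode a 1-D list of floats as an NPY v1.0 file (dtype '<f8')."""
    header = "{'descr': '<f8', 'fortran_order': False, 'shape': (%d,), }" % len(values)
    # magic (6) + version (2) + header length (2) + header must total a
    # multiple of 64: pad with spaces and terminate with a newline.
    unpadded = len(MAGIC) + 2 + 2 + len(header) + 1
    header += " " * (-unpadded % 64) + "\n"
    return (
        MAGIC
        + bytes([1, 0])
        + struct.pack("<H", len(header))
        + header.encode("latin1")
        + struct.pack("<%dd" % len(values), *values)
    )


def read_npy_f8(data: bytes) -> tuple[dict, list[float]]:
    """Parse an NPY v1.0 file holding a 1-D little-endian float64 array."""
    if data[:6] != MAGIC:
        raise ValueError("not an NPY file")
    if data[6] != 1:
        raise ValueError(f"unsupported NPY version {data[6]}.{data[7]}")
    (header_len,) = struct.unpack("<H", data[8:10])
    header = ast.literal_eval(data[10 : 10 + header_len].decode("latin1"))
    if header["descr"] != "<f8" or len(header["shape"]) != 1:
        raise ValueError("this sketch only handles 1-D '<f8' arrays")
    (count,) = header["shape"]
    values = struct.unpack_from("<%dd" % count, data, 10 + header_len)
    return header, list(values)


blob = write_npy_f8([1.5, 2.0, -3.25])
header, values = read_npy_f8(blob)
print(header["shape"], values)  # (3,) [1.5, 2.0, -3.25]
```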

Please refer to the following examples in the TripleBlind SDK to learn more about how to use NumPy binary data:

LSTM
Network_Provisioning
PMML

Text Files

Collections of text (.txt) files, such as doctor’s notes, can be positioned as a Dataset Asset.
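Before positioning, it can be worth checking exactly which files a collection will contain. A minimal sketch using pathlib, assuming a flat folder of UTF-8 notes, gathers the .txt files and ignores everything else. The folder layout and the function name load_text_collection are illustrative assumptions, not SDK requirements.

```python
import tempfile
from pathlib import Path


def load_text_collection(folder: Path) -> dict[str, str]:
    """Read every .txt file directly inside `folder` into {file name: text}.

    Files are read as UTF-8; undecodable bytes are replaced rather than
    aborting the whole collection.
    """
    return {
        path.name: path.read_text(encoding="utf-8", errors="replace")
        for path in sorted(folder.glob("*.txt"))
    }


notes_dir = Path(tempfile.mkdtemp())
(notes_dir / "visit_001.txt").write_text("Patient reports mild headache.", encoding="utf-8")
(notes_dir / "visit_002.txt").write_text("Follow-up in two weeks.", encoding="utf-8")
(notes_dir / "summary.csv").write_text("not a note", encoding="utf-8")

collection = load_text_collection(notes_dir)
print(sorted(collection))  # ['visit_001.txt', 'visit_002.txt']
```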

Please refer to the following examples in the TripleBlind SDK to learn more about how to use text data:

Redact

Compressed Files

The preprocessor.package (also available as tb.Package) is a general packaging tool for efficiently handling groups of files, such as images, text files, and NumPy binary arrays. A Package requires a specific internal structure, including a .meta.json file, but is otherwise a regular ZIP archive.
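Because a Package is otherwise a regular ZIP archive, Python's zipfile module can be used to sanity-check one before positioning. The sketch below assumes the .meta.json manifest is stored at the archive root and only verifies that it exists and parses as JSON; the manifest's actual schema is defined by the TripleBlind SDK and is not reproduced here. inspect_package is a hypothetical helper, and packages themselves should still be built with tb.Package.

```python
import io
import json
import zipfile


def inspect_package(source) -> list[str]:
    """Return the data-file names in a package-style ZIP archive.

    Assumption (hypothetical): the .meta.json manifest sits at the archive
    root. Its required contents are not checked beyond being valid JSON.
    """
    if not zipfile.is_zipfile(source):
        raise ValueError("not a ZIP archive")
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        if ".meta.json" not in names:
            raise ValueError("missing .meta.json")
        json.loads(zf.read(".meta.json"))  # raises if the manifest is not JSON
        # Skip the manifest itself and any directory entries.
        return [n for n in names if n != ".meta.json" and not n.endswith("/")]


buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(".meta.json", json.dumps({"placeholder": True}))  # NOT the real schema
    zf.writestr("images/scan_001.png", b"placeholder bytes")
print(inspect_package(buf))  # ['images/scan_001.png']
```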

Please refer to the following examples in the TripleBlind SDK to learn more about how to use compressed packages of data files:

Federated_Learning
Image_Data
LSTM
Multimodal_AI