Data Format Support

TripleBlind currently supports the following data formats. In addition to those directly supported, many other data formats can be preprocessed into a supported format before being positioned as an Asset. If there are formats you would like to work with natively, please let us know by contacting your Customer Success Manager or submitting a request to Customer Support.

Tabular Data

Any data consisting of records with the same number of fields can be represented as grids with column headings, or “tables”. This simple format is extremely versatile and is used in a vast number of computer applications.

Tabular Data in CSV Format

Tabular data stored in comma-separated value (CSV) format can be directly positioned as a Dataset Asset. Many other data formats, including spreadsheets, can be exported or preprocessed into CSV format and then positioned.
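As a rough illustration of the preprocessing step, the sketch below uses only the Python standard library to serialize in-memory records to CSV text, rejecting any row whose field count differs from the header. The function name records_to_csv and the sample columns are hypothetical, and the sketch does not call the TripleBlind SDK; positioning the resulting file is a separate step covered by the SDK examples.

```python
import csv
import io


def records_to_csv(header, rows):
    """Serialize rows to CSV text; every row must match the header length."""
    for i, row in enumerate(rows):
        if len(row) != len(header):
            raise ValueError(
                f"row {i} has {len(row)} fields, expected {len(header)}"
            )
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)   # column headings come first
    writer.writerows(rows)    # fields containing commas are quoted automatically
    return buf.getvalue()


# Hypothetical sample data, for illustration only.
header = ["patient_id", "age", "note"]
rows = [[1, 34, "routine visit"], [2, 51, "follow-up, fasting"]]
csv_text = records_to_csv(header, rows)
print(csv_text)
```

Validating the field count up front keeps the output a true table: each record has the same number of fields as there are column headings.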

Please refer to the following examples in the TripleBlind SDK to learn more about how to use tabular data in CSV format:

  • Gene_Regression
  • PSI
  • PSI_Vertical_Partition
  • Random_Forest
  • Table_Search
  • Tabular_Data
  • Transfer_Learning
  • XGBoost
  • XGBoost_Regression

Tabular Data Stored in Databases

Views of tabular data stored in databases can be positioned as a Database Asset. The asset contains a description of the connection and the data content; the actual data is read from the database each time the asset is used, so it is “live”. The following databases are currently supported:

  • Microsoft SQL Server
  • MongoDB
  • MySQL
  • Oracle
  • Postgres
  • SQLite

If support for another database is needed, please let us know.
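To make the idea of a “live” view concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table, view, and column names are hypothetical, and the sketch shows only the database side; it does not show how a Database Asset is created or how TripleBlind connects to the database.

```python
import sqlite3

# In-memory SQLite database with a hypothetical table and view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE visits (patient_id INTEGER, name TEXT, age INTEGER, diagnosis TEXT);
    INSERT INTO visits VALUES (1, 'A. Example', 34, 'J45'), (2, 'B. Example', 51, 'E11');
    -- The view exposes only the columns intended for sharing.
    CREATE VIEW visits_shared AS
        SELECT patient_id, age, diagnosis FROM visits;
""")

cur = conn.execute("SELECT * FROM visits_shared ORDER BY patient_id")
columns = [d[0] for d in cur.description]
view_rows = cur.fetchall()
print(columns, view_rows)

# A row added after the view is defined appears on the next read,
# which is the sense in which view-backed data is "live".
conn.execute("INSERT INTO visits VALUES (3, 'C. Example', 29, 'I10')")
live_count = conn.execute("SELECT COUNT(*) FROM visits_shared").fetchone()[0]
print(live_count)
```

Because a view is evaluated at query time, anything that reads through it sees the current contents of the underlying table rather than a snapshot.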

Please refer to the following examples in the TripleBlind SDK to learn more about how to use tabular data stored in databases:

  • tutorials/notebooks/1b_Database_Assets
  • examples/Data_Connectors
  • examples/Data_Munging

Tabular Data Stored in Data Warehouses

Tabular data stored in data warehouses can be positioned as a Database Asset. As with databases, the asset contains connection information, and the actual data is read from the data warehouse at the time of use, so it is “live”. The following data warehouses are currently supported:

  • Amazon Redshift
  • Amazon S3
  • Google BigQuery
  • Microsoft Azure Blob Storage
  • Microsoft Azure Data Lake
  • Snowflake

If support for another data warehouse is needed, please let us know.

Please refer to the following examples in the TripleBlind SDK to learn more about how to use tabular data stored in data warehouses:

  • examples/Data_Connectors

Image Formats

Images can be positioned as Dataset Assets. The following image formats are supported:

Please refer to the following examples in the TripleBlind SDK to learn more about how to use image data:

  • Multimodal_AI
  • Cifar
  • Federated_Learning
  • Image_Data
  • Object_Detection

NumPy Binary Format

The NumPy NPY format is the standard binary file format for persisting a single NumPy array to disk. NumPy binary files can be directly positioned as Dataset Assets.
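For readers unfamiliar with the format, the following sketch round-trips a small array through NPY using NumPy's own np.save and np.load. It writes to an in-memory buffer rather than a file so that it is self-contained; the array contents are arbitrary, and nothing here involves the TripleBlind SDK.

```python
import io

import numpy as np

# Arbitrary example array.
arr = np.arange(6, dtype=np.float32).reshape(2, 3)

# np.save writes the NPY format: a short header followed by the raw array data.
buf = io.BytesIO()
np.save(buf, arr)

# Every NPY file begins with the magic bytes \x93NUMPY.
raw = buf.getvalue()
has_magic = raw[:6] == b"\x93NUMPY"

# np.load restores the array with its original shape and dtype.
buf.seek(0)
restored = np.load(buf)
print(has_magic, restored.shape, restored.dtype)
```

To produce a file on disk instead, pass a path such as "data.npy" to np.save; one NPY file holds exactly one array.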

Please refer to the following examples in the TripleBlind SDK to learn more about how to use NumPy binary data:

  • LSTM
  • Network_Provisioning
  • PMML

Text Files

Collections of text (.txt) files, such as doctor’s notes, can be positioned as a Dataset Asset.

Please refer to the following examples in the TripleBlind SDK to learn more about how to use text data:

  • Redact

Compressed Files

The preprocessor.package (aka tb.Package) is a general packaging tool for efficient handling of groups of files, such as images, text files, and NumPy binary arrays. A Package requires a specific internal structure, including a .meta.json file, but is otherwise a regular ZIP archive.
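Because a Package is a regular ZIP archive, its contents can be inspected with standard tools. The sketch below uses Python's zipfile module to list the members of an archive and check for a .meta.json entry. The helper summarize_archive is hypothetical, and the toy archive built here is an assumption for illustration only: its .meta.json is an empty placeholder, not the real metadata schema, so it is not a valid Package. Real Packages should be created with the SDK's own tooling.

```python
import io
import json
import zipfile

META_NAME = ".meta.json"  # member name from the text above; its schema is not shown here


def summarize_archive(data: bytes) -> dict:
    """List the members of a ZIP archive and report whether META_NAME is present."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
    return {
        "has_meta": META_NAME in names,
        "members": sorted(n for n in names if n != META_NAME),
    }


# Build a toy archive in memory (placeholder contents, NOT a valid Package).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(META_NAME, json.dumps({}))  # empty placeholder metadata
    zf.writestr("notes/note_001.txt", "example text")
    zf.writestr("notes/note_002.txt", "more example text")

summary = summarize_archive(buf.getvalue())
print(summary)
```

A check like this can be a quick sanity test before positioning, but it confirms only that the metadata entry exists, not that the internal structure is correct.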

Please refer to the following examples in the TripleBlind SDK to learn more about how to use compressed packages of data files:

  • Federated_Learning
  • Image_Data
  • LSTM
  • Multimodal_AI