AI REGIO-Synthetic Data Generation Engine (SynData)

A flexible and extensible synthetic data generation engine based on mainstream statistical distributions and on time-series generative AI techniques.

As a Service

Suite5 Data Intelligence Solutions Limited

Developed by

Suite5 Data Intelligence Solutions Limited

License

Other

Commercial License. Part of the development of this asset was supported by the AI REGIO project, which is funded by the European Union Framework Programme for Research and Innovation Horizon 2020 under Grant Agreement n° 952003.

Main Characteristic

Generation of (a) fully synthetic data from scratch and (b) partially synthetic data to complement or augment existing datasets, based on:
- statistical distributions that are relevant for certain columns, e.g. for discrete data: Poisson, Binomial (Bernoulli for trials = 1), Negative Binomial, Uniform Integer; for continuous data: Normal, Gamma (Exponential for shape = 1), Beta, Weibull, Uniform; and
- machine learning techniques, e.g. a time-series generative adversarial network (GAN), to detect and analyse patterns and “inject” outliers in the synthetic dataset.
User-friendly user interface allowing for:
- configuring the synthetic data generation;
- downloading the created synthetic dataset as a CSV file while previewing a sample online to confirm that it meets the user’s expectations;
- identifying any errors or problems that prevented the synthetic data generation through informative messages.
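The two special cases noted in the distribution list (Binomial with trials = 1 behaving as Bernoulli, and Gamma with shape = 1 behaving as Exponential) can be checked empirically. The following is a minimal sketch using only the Python standard library; it does not reflect SynData's own implementation, and the helper name `binomial`, the seed and the parameter values are assumptions chosen for illustration.

```python
import random
import statistics


def binomial(rng, trials, p):
    """Draw one Binomial(trials, p) value as a sum of Bernoulli trials."""
    return sum(1 for _ in range(trials) if rng.random() < p)


rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Binomial with trials = 1 reduces to Bernoulli: every draw is 0 or 1,
# and the sample mean approaches p.
p = 0.3
bernoulli_draws = [binomial(rng, trials=1, p=p) for _ in range(10_000)]

# Gamma with shape = 1 reduces to Exponential: the sample mean
# approaches the scale parameter.
scale = 2.0
gamma_draws = [rng.gammavariate(1.0, scale) for _ in range(10_000)]

print("distinct Bernoulli values:", sorted(set(bernoulli_draws)))
print("Bernoulli sample mean:", round(statistics.mean(bernoulli_draws), 3))
print("Gamma(shape=1) sample mean:", round(statistics.mean(gamma_draws), 3))
```

In `random.gammavariate(alpha, beta)`, `alpha` is the shape and `beta` is the scale, so the expected mean is `alpha * beta`.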

Research areas

Integrative AI

Technical Categories

AI services, Machine learning

Business Categories

Manufacturing

Keywords

Last updated

14.08.2023 - 09:57

Detailed Description

The AI REGIO Synthetic Data Generation Engine (SynData) asset allows users to generate appropriate time-series (IoT) data on demand, according to their needs, through a combined data-driven and process-driven approach. The value of synthetic data lies in its ability to produce features that meet specific needs or conditions which would otherwise not be available in real-world data (e.g. edge or sporadic cases not yet encountered, or data restricted by confidentiality or privacy concerns). SynData therefore aims to address the lack of representative IoT data for training artificial intelligence and machine learning (AI/ML) models.

Through SynData, users can define the exact structure of the synthetic dataset they want to create (if it has not been derived from an existing dataset). For each column, the user configures the rules to apply, which depend on the selected data type, the available options (from the supported synthetic data generation techniques) and the number of rows to be generated in the dataset.
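The per-column configuration described above can be pictured with a small sketch. This is not SynData's actual configuration format or API: the schema keys (`name`, `distribution`, `options`), the function names and the example column names are all hypothetical, and only a few of the listed distributions are covered, using the Python standard library.

```python
import csv
import io
import random

# Hypothetical samplers keyed by distribution name (illustrative subset).
SAMPLERS = {
    "normal": lambda rng, o: rng.gauss(o["mean"], o["std"]),
    "uniform": lambda rng, o: rng.uniform(o["low"], o["high"]),
    # randint includes both endpoints.
    "uniform_integer": lambda rng, o: rng.randint(o["low"], o["high"]),
    # weibullvariate takes the scale first, then the shape.
    "weibull": lambda rng, o: rng.weibullvariate(o["scale"], o["shape"]),
}


def generate_rows(columns, n_rows, seed=None):
    """Generate n_rows dictionaries, one value per configured column."""
    # Validate the configuration first so the user gets an informative error.
    for col in columns:
        if col["distribution"] not in SAMPLERS:
            raise ValueError(
                f"Column {col['name']!r}: unsupported distribution "
                f"{col['distribution']!r}"
            )
    rng = random.Random(seed)
    return [
        {c["name"]: SAMPLERS[c["distribution"]](rng, c["options"]) for c in columns}
        for _ in range(n_rows)
    ]


def to_csv(rows, columns):
    """Serialise the generated rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[c["name"] for c in columns])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Example configuration: two sensor-like columns, five rows.
columns = [
    {"name": "temperature", "distribution": "normal",
     "options": {"mean": 70.0, "std": 2.5}},
    {"name": "cycle_count", "distribution": "uniform_integer",
     "options": {"low": 1, "high": 10}},
]
rows = generate_rows(columns, n_rows=5, seed=7)
csv_text = to_csv(rows, columns)
print(csv_text)
```

Validating the whole configuration before any row is generated mirrors the idea of reporting problems through informative messages rather than failing partway through generation.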

Related Projects

AI REGIO