ODSS: An Open Dataset of Synthetic Speech
A multilingual, multispeaker dataset of synthetic and natural speech, designed to foster research on and benchmarking of synthetic speech detection

ODSS comprises audio utterances generated from text by state-of-the-art synthesis methods, paired with their natural counterparts. It covers 156 voices spanning three languages – English, German, and Spanish – with balanced gender representation. The dataset contains 18,993 audio utterances synthesized from text, amounting to approximately 17 hours of audio generated with two state-of-the-art text-to-speech (TTS) methods: a two-stage FastPitch + HiFi-GAN pipeline and the end-to-end VITS architecture. It also includes the preprocessed original utterances, with silence trimmed and noise spikes mitigated.
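To illustrate the kind of preprocessing mentioned above, the sketch below trims leading and trailing silence from a waveform using a simple frame-level RMS threshold. This is an assumption-laden illustration, not the actual ODSS pipeline; the function name, frame length, and threshold are hypothetical choices.

```python
import numpy as np

def trim_silence(y, frame_len=1024, threshold_db=-40.0):
    """Drop leading/trailing frames whose RMS energy falls below
    threshold_db relative to the loudest frame (illustrative only,
    not the ODSS preprocessing implementation)."""
    n_frames = len(y) // frame_len
    frames = y[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    ref = rms.max() + 1e-12                      # reference: loudest frame
    db = 20.0 * np.log10(rms / ref + 1e-12)      # per-frame level in dB
    keep = np.where(db > threshold_db)[0]
    if keep.size == 0:
        return y[:0]                             # all silence
    return y[keep[0] * frame_len : (keep[-1] + 1) * frame_len]

# Example: a 1 s 440 Hz tone padded with half a second of silence each side.
sr = 16000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
padded = np.concatenate([np.zeros(sr // 2), tone, np.zeros(sr // 2)])
trimmed = trim_silence(padded)
```

Frame-based trimming keeps the boundaries coarse (to within one frame), which is usually acceptable for detection pipelines that operate on spectrogram frames anyway.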
The ODSS dataset can be used for the development and benchmarking of synthetic speech detection methods. It incorporates tailored data distributions ready for training and provides multiple dimensions for evaluating and analyzing generalizability. The utterances are uncompressed and free of background noise, so audio augmentation techniques can also be applied to improve or test robustness to various transformations.
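Since the utterances are clean and uncompressed, one common augmentation is mixing in noise at a controlled signal-to-noise ratio. The sketch below shows this idea; the function name and Gaussian-noise choice are assumptions for illustration, not part of ODSS.

```python
import numpy as np

def add_noise_snr(y, snr_db, rng=None):
    """Mix white Gaussian noise into y at a target SNR in dB
    (illustrative augmentation; hypothetical helper, not ODSS code)."""
    if rng is None:
        rng = np.random.default_rng(0)
    sig_power = np.mean(y ** 2)
    # Solve SNR_dB = 10 * log10(sig_power / noise_power) for noise_power.
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise

# Example: corrupt a clean 220 Hz tone at 20 dB SNR.
sr = 16000
clean = 0.5 * np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
noisy = add_noise_snr(clean, snr_db=20.0)
```

The same pattern extends to other robustness probes (codec simulation, reverberation, time stretching), each applied on the fly during training or evaluation.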