Adaptation of ASR for Impaired Speech with minimum resources (AdAIS)
This micro-project studied the adaptation of automatic speech recognition (ASR) systems for impaired speech. Specifically, it focused on improving ASR for speech from subjects with dysarthria of varying degrees of severity.
The work was developed using the UASpeech English dataset, which comprises 13 healthy control speakers and 15 dysarthric speakers, and a German dataset comprising only 130 hours of untranscribed doctor-patient conversations.
We propose a simple yet effective adaptation (fine-tuning) strategy, named the "wav2vec2 adapter", to incorporate speaker adaptation into wav2vec2 (a code sketch of the general idea follows the list below):
- We hypothesize that our adapter approach is independent of the pretraining data domain of the wav2vec2 model, and we verify this empirically
- The wav2vec2 adapter is shown to be flexible with respect to auxiliary features such as fMLLR and x-vectors
- The effect of speaker-dependent and speaker-independent fine-tuning on wav2vec2 models is shown by comparison with hybrid systems (DNN-HMM and LF-MMI)
- Cross-lingual experiments with English (EN) and German (DE) are conducted using the wav2vec2 adapter framework
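The summary above does not spell out the adapter architecture, so the following is a minimal PyTorch sketch of the general idea, assuming a residual bottleneck adapter attached to wav2vec2 encoder outputs, with an optional speaker-level auxiliary vector (e.g. an x-vector, or fMLLR-derived features) broadcast over time. Module names, dimensions, and the fusion point are illustrative assumptions, not the exact design from the paper.

```python
# Minimal sketch: bottleneck adapters for a pretrained wav2vec2 encoder,
# optionally conditioned on a per-speaker auxiliary vector. All names and
# sizes below are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Residual bottleneck adapter: the pretrained representation passes
    through unchanged and the adapter learns only a small correction."""

    def __init__(self, dim: int, bottleneck: int = 256):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim) hidden states from a wav2vec2 layer
        return x + self.up(self.act(self.down(x)))


class SpeakerAdapter(nn.Module):
    """Adapter that also conditions on a speaker-level auxiliary vector
    (e.g. an x-vector) by broadcasting it over the time axis."""

    def __init__(self, dim: int, aux_dim: int, bottleneck: int = 256):
        super().__init__()
        self.down = nn.Linear(dim + aux_dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim); aux: (batch, aux_dim)
        aux_t = aux.unsqueeze(1).expand(-1, x.size(1), -1)
        h = torch.cat([x, aux_t], dim=-1)
        return x + self.up(self.act(self.down(h)))


# Toy usage: adapt hidden states of width 768 with a 512-dim x-vector.
hidden = torch.randn(2, 100, 768)  # (batch, frames, dim)
xvec = torch.randn(2, 512)         # one auxiliary vector per utterance
adapted = SpeakerAdapter(768, 512)(hidden, xvec)
```

In this sketch, only the adapter parameters would be trained while the pretrained wav2vec2 weights stay frozen, which keeps the number of speaker-specific parameters small and fits the "minimum resources" setting of the project.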
Output
- Paper submitted to Interspeech 2022: https://arxiv.org/abs/2204.00770
- Code released at https://github.com/creatorscan/Dysarthric-ASR, https://www.ai4europe.eu/research/ai-catalog/dysarthric-speech-asr-repo…
This HumanE-AI-Net micro-project was carried out by Murali Karthick Baskar, Mireia Diez, and Honza Černocký from Brno University of Technology (BUT), and Tim Herzig, Diana Nguyen, and Tim Polzehl from Technische Universität Berlin (TUB).
Assets related to Adaptation of ASR for Impaired Speech with minimum resources (AdAIS)
Brno University of Technology
Dysarthric speech ASR repository
Repository for training ASR models for dysarthric speech