Children’s Speech Recognition Using Noise-Robust ASR

Children’s Speech Recognition

Children’s speech recognition has been a long-standing research area. However, most systems built on adult data perform poorly on children’s speech. The reasons include vocal tract differences and the distinct acoustic and phonetic properties of the child’s voice. Several age-specific acoustic models have been proposed to improve recognition performance for children. For example, Hagen, Pellom, and Cole developed acoustic models designed for speech by children of different ages. Learn more: https://www.soapboxlabs.com/language-learning/

However, most of these systems still fail to recognize children’s voices, for a number of reasons. Examples include the phonological and syntactic structure of children’s speech, as well as disfluencies. These factors make it difficult to develop robust ASR for children’s voices.

To improve ASR for children’s voices, we propose a noise-robust approach with two components. First, acoustic models are trained on the baseline data. Second, speed perturbation-based data augmentation is applied.
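Speed perturbation can be illustrated with a small sketch. The version below resamples a waveform by linear interpolation so that it plays back faster or slower, which changes both duration and pitch. The function names `speed_perturb` and `augment` are hypothetical, and the factors 0.9, 1.0, and 1.1 are an assumption (a commonly used three-way setup); the source does not state which factors or which tool were used.

```python
import math


def speed_perturb(samples, factor):
    """Return `samples` resampled so that playback is `factor` times faster.

    Uses plain linear interpolation; a production pipeline would normally
    use a proper band-limited resampler instead.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    last = len(samples) - 1
    n_out = max(1, int(round(len(samples) / factor)))
    out = []
    for i in range(n_out):
        # Position in the original signal that output sample i reads from.
        pos = min(i * factor, last)
        lo = int(math.floor(pos))
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append((1.0 - frac) * samples[lo] + frac * samples[hi])
    return out


def augment(samples, factors=(0.9, 1.0, 1.1)):
    """Build one perturbed copy per factor (factors are an assumption)."""
    return {f: speed_perturb(samples, f) for f in factors}


# Example input: a one-second 440 Hz tone sampled at 16 kHz.
sr = 16000
tone = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
copies = augment(tone)
```

Each perturbed copy keeps the original transcript, so with these assumed factors the amount of training audio is roughly tripled without any new recordings.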

From early literacy to math and language learning, speech recognition has various applications in the classroom.

For acoustic modeling, we use a triphone model consisting of 600 senones. We train the GMM-HMM acoustic model on a total of 588 utterances of children’s read speech.

The training set contains read speech and spontaneous speech. We also explore the interaction of age, semantic context, and listening background for children’s speech recognition. Among these factors, we also consider the between-subject factor.

After training the acoustic model, we perform discriminative training using the Kaldi toolkit. The discriminatively trained model is built with several training criteria, including feature-space MMI and boosted MMI.
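For reference, the boosted MMI criterion is usually written in the following form. The notation is an assumption, since the text above does not define any symbols: \(\mathbf{x}_r\) is the acoustic observation sequence of utterance \(r\), \(s_r\) its reference transcription, \(\kappa\) the acoustic scale, \(P(s)\) the language model probability, \(A(s, s_r)\) the accuracy of hypothesis \(s\) measured against the reference, and \(b \ge 0\) the boosting factor.

```latex
\[
\mathcal{F}_{\mathrm{bMMI}}(\lambda)
  = \sum_{r=1}^{R} \log
    \frac{p_\lambda(\mathbf{x}_r \mid s_r)^{\kappa}\, P(s_r)}
         {\sum_{s} p_\lambda(\mathbf{x}_r \mid s)^{\kappa}\, P(s)\, e^{-b\, A(s, s_r)}}
\]
```

Setting \(b = 0\) recovers the standard MMI objective. A positive \(b\) gives more weight in the denominator to hypotheses with more errors, so training concentrates on the competitors that are most damaging.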
