Cross-Lingual Phone Mapping for LVCSR of Under-Resourced Languages

Hai Do

ICSI / Nanyang Technological University

Tuesday, March 12, 2013
12:30 PM, Conference Room 5A

Abstract:

In this talk, I will present a novel acoustic modeling technique for large vocabulary automatic speech recognition of under-resourced languages that leverages well-trained acoustic models from other languages (called source languages). The idea is to use a source language acoustic model to score the speech feature vectors of the target language, and then map those scores to posteriors over the target phones using a classifier. The target phone posteriors are then used for decoding in the usual hybrid acoustic modeling fashion. The motivation for this strategy is that human languages usually share similar phone sets, so it is easier to predict the target phone posteriors from scores generated by source language acoustic models than to train an acoustic model for the under-resourced language directly. The proposed method is evaluated by building an English acoustic model (Aurora-4 task) with less than 1 hour of training data. Two types of source language acoustic models are considered: hybrid HMM/MLP and conventional HMM/GMM. In addition, we also use triphone tied states in the mapping. Our experimental results show that, by leveraging well-trained Malay and Hungarian acoustic models, we achieve a 9.10% word error rate (WER) with only 55 minutes of English training data. This is close to the 7.85% WER obtained with the full 15 hours of training data, and much better than the 14.36% WER obtained by conventional acoustic modeling techniques on the same 55 minutes of training data.
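The mapping step described above can be illustrated with a minimal sketch. All names and dimensions here are hypothetical: the "source scores" are simulated stand-ins for frame-level outputs of a well-trained source acoustic model, and a simple softmax classifier (trained by gradient descent on the small amount of labeled target data) plays the role of the score-to-posterior mapping; the actual system in the talk may use a different classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 40 source-language units, 10 target phones,
# 500 labeled target-language frames (standing in for ~55 min of data).
N_SOURCE, N_TARGET, N_FRAMES = 40, 10, 500

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Stand-in for the source model's frame scores: in the real system these
# would come from a well-trained HMM/MLP or HMM/GMM source acoustic model.
source_scores = softmax(rng.normal(size=(N_FRAMES, N_SOURCE)))

# Simulated frame-level target phone labels for the target training data.
target_labels = rng.integers(0, N_TARGET, size=N_FRAMES)
onehot = np.eye(N_TARGET)[target_labels]

# Train a softmax classifier that maps source scores -> target posteriors.
W = np.zeros((N_SOURCE, N_TARGET))
b = np.zeros(N_TARGET)
lr = 0.5
for _ in range(200):
    probs = softmax(source_scores @ W + b)        # target phone posteriors
    err = probs - onehot                          # cross-entropy gradient
    W -= lr * (source_scores.T @ err) / N_FRAMES
    b -= lr * err.mean(axis=0)

# These posteriors would then feed a hybrid decoder in the usual way.
target_posteriors = softmax(source_scores @ W + b)
```

Each row of `target_posteriors` is a valid distribution over the target phones, which is exactly what a hybrid decoder consumes (after dividing by the phone priors to obtain scaled likelihoods).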

Bio:

Hai Do (Van Hai Do) is currently a visiting researcher with the speech group at ICSI. He received the B.Eng. and M.Sc. degrees from Hanoi University of Science and Technology, Vietnam, in 2002 and 2006, respectively. Since August 2009 he has been pursuing his Ph.D. at Nanyang Technological University, Singapore, under the supervision of Dr. Eng Siong Chng and Dr. Haizhou Li, where he works on hybrid acoustic models and cross-lingual speech recognition for under-resourced languages.