Martins-2010e

Article

Title Dynamic language modeling for European Portuguese
Author Ciro Martins, António J. S. Teixeira, João Paulo Neto
Journal Computer Speech and Language
Volume 24
Number 4
Pages 750-773
Month October
Year 2010
DOI 10.1016/j.csl.2010.02.003
Group
Group (before 2015) Signal Processing Laboratory, Transverse Activity on Innovative Biomedical Technologies
Indexed by ISI Yes

This paper reports on daily vocabulary and language model adaptation for a European Portuguese broadcast news transcription system. The proposed adaptation framework takes into account characteristics of European Portuguese such as its high level of inflection and its complex verbal system. It is a multi-pass speech recognition framework that uses contemporary written texts available daily on the Web. Morpho-syntactic knowledge (part-of-speech information) about an in-domain training corpus drives the daily selection of an optimal vocabulary. Using an information retrieval engine with the automatic speech recognition (ASR) hypotheses as query material, relevant documents are extracted from a large, dynamic dataset to generate a story-based language model. Applied to a daily closed-captioning system for live TV broadcasts, the framework proved effective, yielding relative reductions of 69% in out-of-vocabulary (OOV) word rate and 12.0% in word error rate (WER) compared with the baseline system using the same vocabulary size.
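
The two headline figures are relative reductions, so it may help to see how such a number is computed. The following is a minimal sketch, not the authors' evaluation code: the function names, the toy sentence, and both toy vocabularies are assumptions made for illustration, and the vocabularies are deliberately kept the same size to mirror the comparison described in the abstract.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of running words (tokens) that are missing from the vocabulary."""
    if not tokens:
        raise ValueError("tokens must be non-empty")
    vocab = set(vocabulary)
    misses = sum(1 for token in tokens if token not in vocab)
    return misses / len(tokens)


def relative_reduction(baseline, adapted):
    """Relative reduction of a rate: (baseline - adapted) / baseline."""
    if baseline <= 0:
        raise ValueError("baseline rate must be positive")
    return (baseline - adapted) / baseline


# Toy data (hypothetical, not taken from the paper).
tokens = "o governo anunciou hoje novas medidas para o sector".split()

# Both vocabularies hold six words, so only their content differs.
baseline_vocab = {"o", "governo", "hoje", "para", "ontem", "antigas"}
adapted_vocab = {"o", "governo", "hoje", "para", "anunciou", "medidas"}

baseline_oov = oov_rate(tokens, baseline_vocab)  # 4 of 9 tokens are OOV
adapted_oov = oov_rate(tokens, adapted_vocab)    # 2 of 9 tokens are OOV
reduction = relative_reduction(baseline_oov, adapted_oov)

print(f"baseline OOV rate:  {baseline_oov:.3f}")
print(f"adapted OOV rate:   {adapted_oov:.3f}")
print(f"relative reduction: {reduction:.1%}")
```

The same `relative_reduction` helper applies to WER: a 12.0% relative reduction means the adapted system's WER is 88% of the baseline WER, not 12 percentage points lower.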