Abstract: This paper presents our ongoing work on a domain
adaptation system for real-world voice retrieval. The work addresses
two characteristics of voice retrieval applications.
First, the search content spans a wide variety of domains. A
general language model that tries to cover as many domains as possible
always performs worse than a domain-specific one, yet building
domain-specific language models is costly because in-domain data must
be collected manually. Second, popular search terms change constantly
and new words keep appearing. To maintain the quality of the language
models, these changes must be incorporated into them. The proposed
system is designed to address both problems. It uses a block-based
language model: the web training data are automatically divided into
several blocks, which are then gathered according to their degree of
domain relatedness. To keep pace with changes in search content, the
system is updated every month with the latest web data. Besides the
language model, the lexicon is also updated with newly extracted
words, and information entropy is used for the word extraction.
Experiments on the entertainment domain verified the effectiveness of
the system, which achieved a relative reduction of 6.1%. The system
has also been successfully applied to other domains, such as weather
and online shopping.
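The abstract states only that information entropy is used for word extraction. A common way to apply entropy to this task is branching (boundary) entropy: a string that occurs often and is preceded and followed by many different characters is likely to be a standalone word. The sketch below assumes that interpretation. `extract_new_words`, its thresholds, and the `<s>`/`</s>` boundary markers are illustrative choices, not details from the paper.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (bits) of a neighbour-character distribution."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def extract_new_words(sentences, lexicon, max_len=4, min_count=2, min_entropy=1.0):
    """Hypothetical branching-entropy word extractor (assumption, not the
    paper's exact method). Returns candidate strings of length 2..max_len
    that occur at least min_count times, are not already in the lexicon,
    and whose left and right neighbour entropies both reach min_entropy."""
    freq = Counter()
    left = defaultdict(Counter)   # candidate -> counts of preceding characters
    right = defaultdict(Counter)  # candidate -> counts of following characters
    for s in sentences:
        for n in range(2, max_len + 1):
            for i in range(len(s) - n + 1):
                cand = s[i:i + n]
                freq[cand] += 1
                # Sentence boundaries are counted as their own neighbour symbol.
                left[cand][s[i - 1] if i > 0 else "<s>"] += 1
                right[cand][s[i + n] if i + n < len(s) else "</s>"] += 1

    words = []
    for cand, count in freq.items():
        if count < min_count or cand in lexicon:
            continue
        if min(entropy(left[cand]), entropy(right[cand])) >= min_entropy:
            words.append(cand)
    # Most frequent candidates first; ties broken alphabetically.
    return sorted(words, key=lambda w: (-freq[w], w))
```

For example, in the sentences `["axyzb", "cxyzd", "exyzf", "gxyzh"]` the string `"xyz"` has four distinct left and right neighbours (2 bits each) and is kept, while its fragment `"xy"` is always followed by `"z"`, has zero right entropy, and is rejected. Strings that pass and are missing from the current lexicon would be the candidates for a periodic lexicon update.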
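The abstract does not say how the domain-related degree of web data is measured. One plausible reading, sketched here under that assumption, scores each tokenised web document by the cross-entropy difference between an in-domain unigram model and a general unigram model built from the web data (in the style of Moore-Lewis data selection), ranks documents by that score, and cuts the ranking into blocks. The function names, the add-one smoothed unigram models, and `num_blocks` are hypothetical; the paper's actual block construction may differ.

```python
import math
from collections import Counter


def unigram_logprob(tokens, counts, total, vocab_size):
    """Average add-one smoothed unigram log-probability of a token list."""
    return sum(math.log((counts[t] + 1) / (total + vocab_size)) for t in tokens) / len(tokens)


def split_into_blocks(web_docs, in_domain_docs, num_blocks=3):
    """Hypothetical block division: rank web documents by domain relatedness
    and split the ranking into num_blocks contiguous blocks, most related first."""
    in_counts = Counter(t for d in in_domain_docs for t in d)
    gen_counts = Counter(t for d in web_docs for t in d)
    vocab_size = len(set(in_counts) | set(gen_counts))
    in_total = sum(in_counts.values())
    gen_total = sum(gen_counts.values())

    def score(doc):
        # Cross-entropy difference: higher means more like the in-domain data.
        return (unigram_logprob(doc, in_counts, in_total, vocab_size)
                - unigram_logprob(doc, gen_counts, gen_total, vocab_size))

    # Empty documents are skipped to avoid dividing by zero.
    ranked = sorted((d for d in web_docs if d), key=score, reverse=True)

    # Split into num_blocks nearly equal blocks; earlier blocks get the remainder.
    q, r = divmod(len(ranked), num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        end = start + q + (1 if b < r else 0)
        blocks.append(ranked[start:end])
        start = end
    return blocks
```

Each block could then train its own language model for interpolation, or blocks could be merged cumulatively starting from the most domain-related one; the abstract does not specify which strategy the system uses.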
Mengzhe Chen; Qingqing Zhang; Zhichao Wang; Jielin Pan; Yonghong Yan. Domain Adaptation for Language Model with Web Data for Voice Retrieval. 2015, 12(18): 6883-6892.