Domain Adaptation for Language Model with Web Data for Voice Retrieval

doi:10.12733/jics20150111

Abstract: This paper presents our ongoing work on a domain adaptation system for real-world voice retrieval. The work addresses two characteristics of voice retrieval applications. First, search content spans many domains. A general language model that tries to cover as many domains as possible typically performs worse than a domain-specific one, yet building domain-specific language models incurs a high cost for manually collecting in-domain data. Second, hot search terms change constantly and new words keep being coined; to maintain the quality of the language models, these changes must be incorporated into them. The system is designed to solve both problems. It introduces a block-based language model: web training data are automatically divided into blocks, which are then grouped according to their degree of domain relatedness. To keep pace with changing search content, the system is updated every month with the latest web data. Alongside the language model, the lexicon is updated with newly extracted words, and information entropy is used for the word extraction. In experiments on the entertainment domain, the system proved effective, achieving a relative reduction of 6.1%. The system has also been applied successfully to other domains, such as weather and online shopping.
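
The abstract does not say which entropy measure is used for word extraction. As an illustration only, the sketch below assumes one common choice, boundary (branching) entropy: a candidate string is more likely to be a real word when the characters on its left and right vary a lot. The function names, the threshold, and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def neighbor_entropy(corpus, candidate, side):
    """Shannon entropy (in bits) of the characters adjacent to `candidate`.

    side="left" looks at the character before each occurrence,
    side="right" at the character after it.
    """
    neighbors = Counter()
    start = corpus.find(candidate)
    while start != -1:
        pos = start - 1 if side == "left" else start + len(candidate)
        if 0 <= pos < len(corpus):
            neighbors[corpus[pos]] += 1
        # Step by one character so overlapping occurrences are counted too.
        start = corpus.find(candidate, start + 1)
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in neighbors.values())


def looks_like_word(corpus, candidate, threshold=1.0):
    """Accept a candidate only if both boundaries are varied enough.

    The threshold is an assumed value chosen for this toy example.
    """
    left = neighbor_entropy(corpus, candidate, "left")
    right = neighbor_entropy(corpus, candidate, "right")
    return min(left, right) >= threshold


if __name__ == "__main__":
    # Toy corpus: "XY" appears in four different contexts.
    toy = "abXYcd efXYgh ijXYkl mnXYop"
    for cand in ("XY", "X"):
        print(cand, looks_like_word(toy, cand))
```

In the toy corpus, "XY" is surrounded by different characters on both sides, while "X" is always followed by "Y", so only "XY" passes the check. A production system would presumably combine such a score with frequency and other statistics.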

Key words: Hot Search Term; Domain-specific; Block-based Language Model; Voice Retrieval

Cite this article:

Mengzhe Chen;Qingqing Zhang;Zhichao Wang;Jielin Pan;Yonghong Yan. Domain Adaptation for Language Model with Web Data for Voice Retrieval. , 2015, 12(18): 6883-6892.

Link to this article:

http://manu35.magtech.com.cn/Jwk_ics/CN/10.12733/jics20150111 or http://manu35.magtech.com.cn/Jwk_ics/CN/Y2015/V12/I18/6883