- Authors: 吳宗憲; 陳昭宏
- Affiliation: Institute of Information Engineering, National Cheng Kung University (成功大學資訊工程研究所)
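- Chinese abstract: In this paper, we propose a Mandarin keyword spotting system that uses context-dependent phonelike units. In this system, users can define their own keywords and frequently used non-keywords without retraining. A set of 176 monosyllables and 483 balanced words and balanced sentences serves as the training database. In the training process, we propose a modified K-means algorithm to reduce the training time. Frequently used non-keywords are further divided into two types, preceding and following non-keywords; each type is represented by 6 initial and 2 final segmental Bayesian networks, while extraneous speech is represented by 15 initial and 5 final Bayesian networks. In the recognition process, we use a final preprocessor to eliminate unreasonable combinations and so reduce the recognition time. In an experiment with 20 keywords, tested on 525 query utterances from 15 speakers (10 male, 5 female), the keyword recognition rate reaches 98.1% for isolated keywords and 91.2% for keywords embedded in extraneous speech.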
- English abstract: In this paper, a continuous Mandarin speech keyword spotting system based on context-dependent phonelike units (PLU) is presented. In this vocabulary-independent system, users can define their own keywords and most frequently occurring non-keywords without retraining the system. A set of 176 monosyllables and 483 balanced words or sentences is used to establish the context-dependent PLUs, i.e., initials or finals in Mandarin speech. Each PLU is represented by a proposed segmental Bayesian network (SBN) model. In the training process, a modified K-means algorithm is proposed to reduce the training time. The most frequently occurring non-keywords are divided into keyword predecessors and successors. Each type of keyword predecessor and successor is modeled by 6 initial part SBNs and 2 final part SBNs as the garbage models. For extraneous speech, 15 initial part SBNs and 5 final part SBNs are established as the extraneous speech garbage models. In the recognition process, a final part preprocessor is used to screen out unreasonable hypotheses in order to reduce the recognition time. Using a test set of 525 conversational speech utterances from 15 speakers (10 males and 5 females), word spotting rates of 97.4% on isolated keywords, and 92.0% when the vocabulary word was embedded in unconstrained extraneous speech, were obtained for a user-defined 20-keyword vocabulary.
- English keywords: Mandarin speech keyword spotting; context-dependent phonelike unit; segmental Bayesian network; K-means algorithm
- Chinese keywords: --