Electronic Journal System of STPI (Domestic Academic Electronic Journal System)

Authors: 吳宗憲; 陳昭宏
Author affiliation: Institute of Information Engineering, National Cheng Kung University
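Chinese abstract (translated): In this paper, we propose a Mandarin keyword spotting system that uses context-dependent phone-like units. In this system, users can define their own keywords and frequently used non-keywords without retraining. A set of 176 monosyllables and 483 balanced words and balanced sentences serves as the training database. In the training process, we propose a modified K-means algorithm to reduce training time. Frequently used non-keywords are further divided into two types, keyword predecessors and keyword successors; each type is represented by 6 initial and 2 final segmental Bayesian networks, while extraneous speech is represented by 15 initial and 5 final Bayesian networks. In the recognition process, a final-part preprocessor removes unreasonable combinations to reduce recognition time. In an experiment with 20 keywords, tested on 525 query utterances from 15 speakers (10 male, 5 female), the keyword recognition rate reaches 98.1% for isolated keywords and 91.2% for keywords embedded in extraneous speech.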
English abstract: In this paper, a continuous Mandarin speech keyword spotting system based on context-dependent phonelike units (PLU) is presented. In this vocabulary-independent system, users can define their own keywords and most frequently occurring non-keywords without retraining the system. A set of 176 monosyllables and 483 balanced words or sentences is used to establish the context-dependent PLUs, i.e., initials or finals in Mandarin speech. Each PLU is represented by a proposed segmental Bayesian network (SBN) model. In the training process, a modified K-means algorithm is proposed to reduce the training time. The most frequently occurring non-keywords are divided into keyword predecessors and successors. Each type of keyword predecessor and successor is modeled by 6 initial part SBNs and 2 final part SBNs as the garbage models. For extraneous speech, 15 initial part SBNs and 5 final part SBNs are established as the extraneous speech garbage models. In the recognition process, a final part preprocessor is used to screen out unreasonable hypotheses in order to reduce the recognition time. Using a test set of 525 conversational speech utterances from 15 speakers (10 males and 5 females), word spotting rates of 97.4% on isolated keywords, and 92.0% when the vocabulary word was embedded in unconstrained extraneous speech, were obtained for a user-defined 20 keyword vocabulary.
Chinese keywords: Mandarin speech keyword spotting; context-dependent phonelike unit; segmental Bayesian network; K-means algorithm
English keywords: --
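The abstract states that a final part preprocessor screens out unreasonable hypotheses before full recognition, but it does not describe the mechanism. The following is a minimal sketch of one way such a screening step could work, not the authors' method. It assumes each keyword is a sequence of (initial, final) syllable pairs and that some earlier stage has produced a set of plausible finals for each syllable position. The function name `screen_keywords`, the `final_candidates` structure, and the example keywords are all hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Assumption: a Mandarin syllable is modeled as an (initial, final) pair of labels.
Syllable = Tuple[str, str]


def screen_keywords(
    keywords: Dict[str, Sequence[Syllable]],
    final_candidates: Sequence[Set[str]],
) -> List[Tuple[str, int]]:
    """Return (keyword, start_position) pairs that pass the final-part check.

    final_candidates[i] is a hypothetical set of plausible finals for the
    i-th syllable position of the utterance. A keyword placement survives
    only if every one of its finals appears in the matching candidate set.
    """
    survivors = []
    n_positions = len(final_candidates)
    for name, syllables in keywords.items():
        length = len(syllables)
        # Try every alignment of the keyword against the utterance.
        for start in range(n_positions - length + 1):
            if all(
                final in final_candidates[start + offset]
                for offset, (_initial, final) in enumerate(syllables)
            ):
                survivors.append((name, start))
    return survivors


if __name__ == "__main__":
    # Illustrative two-keyword vocabulary (hypothetical, not from the paper).
    keywords = {
        "taipei": [("t", "ai"), ("b", "ei")],
        "tainan": [("t", "ai"), ("n", "an")],
    }
    # Hypothetical candidate finals for a three-syllable utterance.
    final_candidates = [{"uo", "o"}, {"ai", "an"}, {"an", "ang"}]
    print(screen_keywords(keywords, final_candidates))  # [('tainan', 1)]
```

Only the surviving placements would then need to be scored by the full models, which is where a saving in recognition time would come from under these assumptions.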