Title: Design of QbE-STD system: audio representation and matching perspective
Researcher: Madhavi, Maulik C.
Guide(s): Hemant A. Patil
Keywords: Spoken Content Retrieval Systems
Keyword Spotting System
GMM Framework
Vocal Tract Length Normalization
Gaussian Mixture Model
Detection Subsystem
University: Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT)
Completed Date: 
Abstract: The retrieval of spoken documents and the detection of a query (keyword) within an audio document have attracted considerable research interest. The problem of retrieving audio documents and detecting a query using its spoken form is widely known as Query-by-Example Spoken Term Detection (QbE-STD). This thesis presents the design of a QbE-STD system from the representation and matching perspectives.

A speech spectrum is known to be affected by variations in the length of a speaker's vocal tract, due to the inverse relation between formant frequencies and vocal tract length. The process of compensating for this spectral variation is popularly known as Vocal Tract Length Normalization (VTLN), especially in the speech recognition literature. VTLN is an important speaker normalization technique for speech recognition tasks. In this context, this thesis proposes the use of Gaussian posteriorgrams of VTL-warped spectral features for the QbE-STD task. The study presents a novel use of the Gaussian Mixture Model (GMM) framework for estimating the VTLN warping factor. In particular, the presented GMM framework does not require phoneme-level transcription and hence is useful for unsupervised tasks. In addition, we propose the use of a mixture of GMMs for posteriorgram design. Speech data exhibits acoustically similar broad phonetic structures. To capture this broad phonetic structure, we exploit supplementary knowledge of broad phoneme classes (such as vowels, semi-vowels, nasals, fricatives, and plosives) during GMM training. The mixture of GMMs is tied to the GMMs of these broad phoneme classes. A GMM trained without supervision assigns a uniform prior to each Gaussian component, whereas a mixture of GMMs assigns prior probabilities based on the broad phoneme class.
The novelty of our work lies in the prior probability assignments (as the weights of the mixture of GMMs) for a better Gaussian posteriorgram design.
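The two ideas in the abstract can be illustrated with a minimal sketch: selecting a VTLN warping factor as the one whose warped features score highest under a GMM (no transcriptions needed), and computing a Gaussian posteriorgram whose component priors come from broad-phoneme-class weights rather than the uniform priors of a purely unsupervised GMM. This is an assumption-laden toy in numpy, not the thesis's actual implementation: the GMM parameters, the warp grid, and the function names (`gaussian_posteriorgram`, `select_warp_factor`) are all illustrative, and real systems would estimate the GMM by EM on warped spectral features.

```python
import numpy as np

def log_gauss_diag(X, means, variances):
    """Per-component diagonal-covariance Gaussian log-densities.
    X: (T, D) feature frames; means, variances: (K, D). Returns (T, K)."""
    X = X[:, None, :]                                   # (T, 1, D)
    ll = -0.5 * (np.log(2 * np.pi * variances)
                 + (X - means) ** 2 / variances)
    return ll.sum(axis=-1)                              # (T, K)

def gaussian_posteriorgram(X, means, variances, priors):
    """Posterior probability of each Gaussian component per frame.
    With a mixture of GMMs tied to broad phoneme classes, `priors`
    would be class_prior * within-class weight per component,
    instead of the uniform 1/K used by an unsupervised GMM."""
    log_joint = np.log(priors) + log_gauss_diag(X, means, variances)
    shifted = log_joint - log_joint.max(axis=1, keepdims=True)
    post = np.exp(shifted)
    return post / post.sum(axis=1, keepdims=True)       # rows sum to 1

def select_warp_factor(features_by_alpha, means, variances, priors):
    """Pick the VTLN warping factor alpha whose warped features are most
    likely under the GMM (maximum-likelihood grid search, unsupervised).
    features_by_alpha: dict mapping alpha -> (T, D) warped feature matrix."""
    def total_loglik(X):
        lj = np.log(priors) + log_gauss_diag(X, means, variances)  # (T, K)
        m = lj.max(axis=1, keepdims=True)               # log-sum-exp per frame
        return float((m[:, 0] + np.log(np.exp(lj - m).sum(axis=1))).sum())
    return max(features_by_alpha, key=lambda a: total_loglik(features_by_alpha[a]))

if __name__ == "__main__":
    # Toy 2-component, 2-dimensional GMM with hand-picked parameters.
    means = np.array([[0.0, 0.0], [5.0, 5.0]])
    variances = np.ones((2, 2))
    priors = np.array([0.5, 0.5])
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 0.1, size=(10, 2))              # frames near component 0
    P = gaussian_posteriorgram(X, means, variances, priors)
    print(P.shape, P.sum(axis=1))                       # each row sums to 1
    # Simulated warp grid: alpha=1.0 leaves features at the model's mode.
    warped = {0.9: X + 3.0, 1.0: X}
    print(select_warp_factor(warped, means, variances, priors))
```

In the thesis's setting the warped features would come from re-computing spectral features with a frequency-warped filterbank for each alpha on a grid (e.g. 0.88 to 1.12); the grid search above only shows the likelihood-based selection step.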
Pagination: xxiv, 199 p.
Appears in Departments: Department of Information and Communication Technology

Files in This Item:
01_title.pdf (85.66 kB, Adobe PDF)
02_declaration and certificate.pdf (235.4 kB, Adobe PDF)
03_acknowledgements.pdf (84.7 kB, Adobe PDF)
04_contents.pdf (122.38 kB, Adobe PDF)
05_abstract.pdf (105.61 kB, Adobe PDF)
06_list of symbol and accronyms.pdf (123.79 kB, Adobe PDF)
07_list of tables.pdf (93.5 kB, Adobe PDF)
08_list of figures.pdf (112.56 kB, Adobe PDF)
09_chapter 1.pdf (311.04 kB, Adobe PDF)
10_chapter 2.pdf (1.28 MB, Adobe PDF)
11_chapter 3.pdf (413.32 kB, Adobe PDF)
12_chapter 4.pdf (1.22 MB, Adobe PDF)
13_chapter 5.pdf (591.08 kB, Adobe PDF)
14_chapter 6.pdf (336.01 kB, Adobe PDF)
15_chapter 7.pdf (846.36 kB, Adobe PDF)
16_reference.pdf (176.45 kB, Adobe PDF)

Items in Shodhganga are protected by copyright, with all rights reserved, unless otherwise indicated.