Please use this identifier to cite or link to this item: http://hdl.handle.net/10603/184865
Title: Developing a pilot Hindi Treebank based on Computational Paninian Grammar
Researcher: M.A. Rafiya Begum
Guide(s): Dipti Misra Sharma
Keywords: Dependency Grammar; Karaka Relations; TreeBank; Verb Frames
University: International Institute of Information Technology, Hyderabad
Completed Date: 06/11/2017
Abstract: The Penn Treebank has demonstrated the importance of treebanks as a linguistic resource for NLP. This research presents an effort to develop a pilot treebank for Hindi that can serve as the basis for a large-scale Hindi treebank. Building a treebank requires a computational grammar framework, an annotation scheme based on the chosen grammar, guidelines for annotating the various types of constructions in the language concerned, and related resources such as verb frames. Because Hindi has relatively free word order, a dependency grammar formalism is well suited to it. We therefore chose the Computational Paninian Grammar framework [36]; Panini's grammar is a dependency grammar [99, 162]. The scheme for annotating treebanks for Indian languages was accordingly developed on the basis of this framework. As part of this study, a pilot treebank for Hindi (HyDT, the Hyderabad Dependency Treebank for Hindi) [21] was developed and released for ICON-2009 (International Conference on Natural Language Processing-2009) [86]. The scheme [21] and the guidelines for Hindi treebank annotation developed during this study were later modified and are being used for a multi-layered and multi-representational treebank for Hindi and Urdu [39, 42, 188], which is a collaborative project between several universities.

Along with creating the Hindi Treebank (HyDT), I also created a supplementary resource of verb frames for 687 Hindi verbs. I present the work on verb frames [22] for Hindi verbs and describe the methodology used to prepare these frames and the criteria followed for classifying Hindi verbs. The main goal of this work is to create a linguistic resource that will be useful for a range of NLP applications. I have also worked on the mapping between Propbank annotation and dependency annotation, based on the Paninian Grammatical Framework [21, 36].

I have also discussed the use of HyDT data [21] in various experiments.
Pagination: xx,272
URI: http://hdl.handle.net/10603/184865
Appears in Departments: Computational Linguistics

Files in This Item:
File                Description      Size         Format
01_titlepage.pdf    Attached File    548.06 kB    Adobe PDF
02_chapter1.pdf                      138.93 kB    Adobe PDF
03_chapter2.pdf                      305.1 kB     Adobe PDF
04_chapter3.pdf                      304.16 kB    Adobe PDF
05_chapter4.pdf                      386.2 kB     Adobe PDF
06_chapter5.pdf                      477.09 kB    Adobe PDF
07_chapter6.pdf                      1.05 MB      Adobe PDF
08_chapter7.pdf                      564.16 kB    Adobe PDF
09_chapter8.pdf                      452.28 kB    Adobe PDF
10_chapter9.pdf                      85.94 kB     Adobe PDF
11_references.pdf                    232.95 kB    Adobe PDF


Items in Shodhganga are protected by copyright, with all rights reserved, unless otherwise indicated.