This dissertation investigates learning dependency grammars for statistical natural language parsing from corpora without parse tree annotations. Most successful work in unsupervised dependency grammar induction has assumed that the input consists of sequences of parts of speech, ignoring words and using extremely simple probabilistic models. Supervised parsing, however, has long shown the value of more sophisticated models that use lexical features. These models require probability distributions with complex conditioning information, which must be smoothed to avoid sparsity issues.

In this work we explore several dependency grammars that use smoothing and lexical features. We investigate a variety of smoothing regimens and find that smoothing helps even unlexicalized models such as the Dependency Model with Valence. Furthermore, adding lexical features yields the highest-accuracy dependency induction on the Penn Treebank WSJ10 corpus to date. In sum, this dissertation extends unsupervised grammar induction by incorporating lexical conditioning information and by investigating smoothing in an unsupervised framework.
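The sparsity problem the abstract describes can be illustrated with a minimal sketch: a lexicalized estimate P(child | head) is interpolated with a coarser backoff distribution P(child), so head–child pairs never seen in training still receive nonzero probability. The toy counts, the function name, and the interpolation weight below are hypothetical illustrations, not the dissertation's actual model.

```python
from collections import Counter

# Toy head -> child dependency events (hypothetical data for illustration).
events = [("saw", "dog"), ("saw", "cat"), ("saw", "dog"), ("ran", "dog")]

pair_counts = Counter(events)                 # count(head, child)
head_counts = Counter(h for h, _ in events)   # count(head)
child_counts = Counter(c for _, c in events)  # count(child)
total = sum(child_counts.values())

def p_child_given_head(child, head, lam=0.7):
    """Interpolate the sparse lexicalized estimate P(child | head)
    with the backoff distribution P(child), so that unseen
    (head, child) pairs still get nonzero probability."""
    p_lex = pair_counts[(head, child)] / head_counts[head] if head_counts[head] else 0.0
    p_back = child_counts[child] / total
    return lam * p_lex + (1 - lam) * p_back
```

Here the unseen pair ("ran", "cat") gets probability 0.3 · P("cat") = 0.075 rather than zero, which is the basic effect smoothing provides for the richer lexicalized conditioning the dissertation studies.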
Headden, William P. "Unsupervised Bayesian Lexicalized Dependency Grammar Induction" (2012).
Computer Science Theses and Dissertations.
Brown Digital Repository. Brown University Library.
https://doi.org/10.7301/Z0N29V7J