
Unsupervised Bayesian Lexicalized Dependency Grammar Induction

Description

Abstract:
This dissertation investigates learning dependency grammars for statistical natural language parsing from corpora without parse tree annotations. Most successful work in unsupervised dependency grammar induction has assumed that the input consists of sequences of parts of speech, ignoring words and using extremely simple probabilistic models. However, supervised parsing has long shown the value of more sophisticated models that use lexical features. These more sophisticated models, however, require probability distributions with complex conditioning information, which must be smoothed to avoid sparsity issues.

In this work we explore several dependency grammars that use smoothing and lexical features. We examine a variety of smoothing regimens and find that smoothing is helpful even for unlexicalized models such as the Dependency Model with Valence. Furthermore, adding lexical features yields the highest-accuracy dependency induction on the Penn Treebank WSJ10 corpus to date. In sum, this dissertation extends unsupervised grammar induction by incorporating lexical conditioning information and by investigating smoothing in an unsupervised framework.
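The smoothing the abstract describes is, in spirit, interpolation between a sparse lexicalized estimate and a coarser unlexicalized backoff. Below is a minimal Python sketch of that idea for a head-argument attachment distribution; the function names, the interpolation weight lam, the add-alpha backstop, and the toy counts are illustrative assumptions, not the dissertation's exact parameterization.

    from collections import defaultdict, Counter

    def add_alpha_prob(counts, context, arg, vocab_size, alpha=1.0):
        """Relative frequency with an add-alpha backstop so unseen
        contexts still get nonzero probability (alpha and vocab_size
        are assumptions, not values from the dissertation)."""
        c = counts[context]
        return (c[arg] + alpha) / (sum(c.values()) + alpha * vocab_size)

    def smoothed_attach_prob(arg_pos, head_word, head_pos, direction,
                             fine, coarse, lam=0.9, alpha=1.0, n_pos=45):
        """P(arg_pos | head, direction): interpolate a lexicalized
        estimate conditioned on (head word, head POS, direction) with
        an unlexicalized one conditioned on (head POS, direction)."""
        p_fine = add_alpha_prob(
            fine, (head_word, head_pos, direction), arg_pos, n_pos, alpha)
        p_coarse = add_alpha_prob(
            coarse, (head_pos, direction), arg_pos, n_pos, alpha)
        return lam * p_fine + (1.0 - lam) * p_coarse

    # Toy usage: one observed left-direction NN attachment to the
    # VBD head "barked"; n_pos defaults to ~45 Penn Treebank tags.
    fine = defaultdict(Counter)
    coarse = defaultdict(Counter)
    fine[("barked", "VBD", "L")]["NN"] += 1
    coarse[("VBD", "L")]["NN"] += 1
    print(smoothed_attach_prob("NN", "barked", "VBD", "L", fine, coarse))

In an actual induction system the interpolation weights would themselves be estimated inside the EM or Bayesian inference loop rather than fixed; the point of the sketch is only the backoff structure from a lexicalized context to an unlexicalized one.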
Notes:
Thesis (Ph.D.)--Brown University, 2012.

Access Conditions

Rights
In Copyright
Restrictions on Use
Collection is open for research.

Citation

Headden, William P., "Unsupervised Bayesian Lexicalized Dependency Grammar Induction" (2012). Computer Science Theses and Dissertations. Brown Digital Repository. Brown University Library. https://doi.org/10.7301/Z0N29V7J

Relations

Collection: Computer Science Theses and Dissertations