Abstract
The availability of large, syntactically-bracketed corpora such as the Penn Tree Bank affords us the opportunity to automatically build or train broad-coverage grammars, and in particular to train probabilistic grammars. A number of recent parsing experiments have also indicated that grammars whose production probabilities are dependent on the context can be more effective than context-free grammars in selecting a correct parse. To make maximal use of context, we have automatically constructed, from the Penn Tree Bank version 2, a grammar in which the symbols S and NP are the only real non-terminals, and the other non-terminals or grammatical nodes are in effect embedded into the right-hand sides of the S and NP rules. For example, one of the rules extracted from the tree bank would be S -> NP VBX JJ CC VBX NP [1] (where NP is a non-terminal and the other symbols are terminals: part-of-speech tags of the Tree Bank). The most common structure in the Tree Bank associated with this expansion is (S NP (VP (VP VBX (ADJ JJ) CC (VP VBX NP)))) [2]. So if our parser uses rule [1] in parsing a sentence, it will generate structure [2] for the corresponding part of the sentence. Using 94% of the Penn Tree Bank for training, we extracted 32,296 distinct rules (23,386 for S, and 8,910 for NP). We also built a smaller version of the grammar based on higher-frequency patterns for use as a back-up when the larger grammar is unable to produce a parse due to memory limitations. We applied this parser to 1,989 Wall Street Journal sentences (separate from the training set and with no limit on sentence length). Of the parsed sentences (1,899), the percentage of no-crossing sentences is 33.9%, and Parseval recall and precision are 73.43% and 72.61%.
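The rule extraction described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`parse_bracketed`, `extract_rules`), the input format (bracketed strings whose leaves are bare part-of-speech tags), and the assumption that every tree is rooted in S or NP are all hypothetical choices made for illustration. Each S or NP node yields one flat rule; every other node is folded into the stored structure.

```python
from collections import Counter, defaultdict

# Per the abstract, only S and NP are kept as real non-terminals.
REAL_NONTERMINALS = {"S", "NP"}


def parse_bracketed(text):
    """Parse '(S (NP DT NN) (VP VBX))' into nested lists.

    Assumption: leaves are bare POS tags, as in the abstract's example.
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok              # a bare POS tag
        node = [tokens[pos]]        # node label
        pos += 1
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                    # skip the closing ")"
        return node

    return read()


def extract_rules(tree, rules):
    """Record one flat rule per S/NP node (assumes the root is S or NP).

    rules maps (lhs, rhs) to a Counter of the bracketed structures
    observed for that expansion.
    """
    label = tree[0]
    rhs = []

    def walk(child):
        if isinstance(child, str):
            rhs.append(child)            # terminal (POS tag)
            return child
        if child[0] in REAL_NONTERMINALS:
            extract_rules(child, rules)  # nested S/NP gets its own rule
            rhs.append(child[0])
            return child[0]
        # Any other node is embedded into the stored structure.
        inner = " ".join([child[0]] + [walk(c) for c in child[1:]])
        return "(" + inner + ")"

    skeleton = "(" + " ".join([label] + [walk(c) for c in tree[1:]]) + ")"
    rules[(label, tuple(rhs))][skeleton] += 1


rules = defaultdict(Counter)
tree = parse_bracketed(
    "(S (NP DT NN) (VP (VP VBX (ADJ JJ) CC (VP VBX (NP NNS)))))"
)
extract_rules(tree, rules)

for (lhs, rhs), structures in rules.items():
    best = structures.most_common(1)[0][0]  # most frequent structure
    print(lhs, "->", " ".join(rhs), "|", best)
```

Keeping a counter of structures per rule mirrors the abstract's statement that the parser emits the most common Tree Bank structure associated with an expansion; how ties or rare structures were handled in the original work is not stated here.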
Original language | English (US) |
---|---|
Pages | 216-223 |
Number of pages | 8 |
State | Published - 1995 |
Event | 4th International Workshop on Parsing Technologies, IWPT 1995 - Prague and Karlovy Vary, Czech Republic; Duration: Sep 20 1995 → Sep 24 1995 |
Conference
Conference | 4th International Workshop on Parsing Technologies, IWPT 1995 |
---|---|
Country/Territory | Czech Republic |
City | Prague and Karlovy Vary |
Period | 9/20/95 → 9/24/95 |
ASJC Scopus subject areas
- Artificial Intelligence
- Human-Computer Interaction
- Linguistics and Language