Parse thicket

A parse thicket is a graph that represents the syntactic structure of a paragraph of text in natural language processing. It includes a parse tree for each sentence of the paragraph, plus arcs for relations between words other than syntactic ones.[1] Parse thickets can be constructed from either constituency parse trees or dependency parse trees. The relations that link parse trees within a parse thicket include coreferences, entity-entity links, and rhetorical and speech act relations.[2]
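As a rough illustration of this structure, a parse thicket can be sketched as a set of per-sentence dependency trees held in one graph, with extra labeled arcs between them. The class and method names below (`Node`, `ParseThicket`, `add_sentence`, `link`) and the dependency labels are hypothetical and chosen only for this example; they are not taken from the cited papers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    sent: int  # index of the sentence within the paragraph
    idx: int   # index of the token within its sentence
    word: str


@dataclass
class ParseThicket:
    nodes: list = field(default_factory=list)
    arcs: list = field(default_factory=list)  # (source, target, label, kind)

    def add_sentence(self, sent, words, deps):
        """Add one dependency tree; deps are (head_idx, dependent_idx, label)."""
        new = [Node(sent, i, w) for i, w in enumerate(words)]
        self.nodes.extend(new)
        for head, dep, label in deps:
            self.arcs.append((new[head], new[dep], label, "syntactic"))
        return new

    def link(self, source, target, label):
        """Add a non-syntactic arc, such as a coreference."""
        self.arcs.append((source, target, label, "non-syntactic"))

    def inter_sentence_arcs(self):
        """Return the arcs whose endpoints lie in different sentences."""
        return [a for a in self.arcs if a[0].sent != a[1].sent]


# Two toy sentences with illustrative (assumed) dependency labels.
pt = ParseThicket()
s0 = pt.add_sentence(0, ["Mary", "bought", "a", "car"],
                     [(1, 0, "nsubj"), (1, 3, "obj"), (3, 2, "det")])
s1 = pt.add_sentence(1, ["She", "likes", "it"],
                     [(1, 0, "nsubj"), (1, 2, "obj")])
pt.link(s1[0], s0[0], "coreference")  # She -> Mary
pt.link(s1[2], s0[3], "coreference")  # it -> car
```

Here the two coreference arcs are the only arcs that cross sentence boundaries; the remaining arcs are the ordinary edges of the two parse trees.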

To assess the similarity between texts, such as a question and its candidate answers, parse thickets can be generalized.[3]

In the image of the parse thicket, coreferences and entity-entity links are shown in solid red, and rhetorical and speech act relations are shown in dotted red. The ETAP parser and tree visualization software were used.[4]

Figure: Parse thicket

To compute the generalization of two parse thickets, one needs to find their maximum common subgraph (sub-thicket).[5]
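The idea of a maximum common subgraph can be illustrated with a brute-force sketch on two tiny labeled graphs. This is not the algorithm of the cited papers: it is an exhaustive search, exponential in the number of nodes, that assumes nodes match only when their labels (here, coarse part-of-speech tags) are equal and arcs match only when their labels are equal. The function name `max_common_subgraph` and the toy data are hypothetical.

```python
from itertools import combinations, permutations


def max_common_subgraph(nodes_a, edges_a, nodes_b, edges_b):
    """Brute-force maximum common edge subgraph of two small labeled digraphs.

    nodes_*: dict mapping node id -> node label
    edges_*: set of (source_id, target_id, arc_label)
    Returns (mapping, shared), where mapping sends A-nodes to B-nodes and
    shared lists the arcs of A that are preserved under the mapping.
    """
    best_map, best_edges = {}, []
    ids_a, ids_b = list(nodes_a), list(nodes_b)
    for k in range(1, min(len(ids_a), len(ids_b)) + 1):
        for subset in combinations(ids_a, k):
            for image in permutations(ids_b, k):
                # Skip candidate mappings that pair nodes with different labels.
                if any(nodes_a[a] != nodes_b[b] for a, b in zip(subset, image)):
                    continue
                m = dict(zip(subset, image))
                shared = [(s, t, l) for s, t, l in edges_a
                          if s in m and t in m and (m[s], m[t], l) in edges_b]
                if len(shared) > len(best_edges):
                    best_map, best_edges = m, shared
    return best_map, best_edges


# Thicket A: "Mary bought (a) car. She smiled." with one coreference arc.
nodes_a = {"Mary": "NOUN", "bought": "VERB", "car": "NOUN",
           "She": "PRON", "smiled": "VERB"}
edges_a = {("bought", "Mary", "nsubj"), ("bought", "car", "obj"),
           ("smiled", "She", "nsubj"), ("She", "Mary", "coref")}

# Thicket B: "John sold (a) house. He left quickly." with one coreference arc.
nodes_b = {"John": "NOUN", "sold": "VERB", "house": "NOUN",
           "He": "PRON", "left": "VERB", "quickly": "ADV"}
edges_b = {("sold", "John", "nsubj"), ("sold", "house", "obj"),
           ("left", "He", "nsubj"), ("left", "quickly", "advmod"),
           ("He", "John", "coref")}

mapping, shared = max_common_subgraph(nodes_a, edges_a, nodes_b, edges_b)
```

In this toy case the common sub-thicket keeps the subject, object, and coreference arcs shared by both paragraphs and drops the adverbial arc that appears only in the second one. A practical implementation would need a far more efficient search and a softer notion of node similarity than exact label equality.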

Figure: Generalizing two parse thickets. A fragment showing particular cases of generalization.

References

  1. Galitsky B, Kuznetsov SO, Usikov DA. Parse Thicket Representation for Multi-sentence Search. Lecture Notes in Computer Science. 2013;7735:1072-1091. doi:10.1007/978-3-642-35786-2_12.
  2. Galitsky B, Ilvovsky D, Kuznetsov SO, Strok F. Matching sets of parse trees for answering multi-sentence questions. Recent Advances in Natural Language Processing. 2013.
  3. Galitsky B. Machine learning of syntactic parse trees for search and classification of text. Engineering Applications of Artificial Intelligence. 2013;26(3):153-172. doi:10.1016/j.engappai.2012.09.017.
  4. Boguslavsky I, Iomdin L, Sizov V. Interactive enconversion by means of the ETAP-3 system. Culture, Language and Information Technologies. 2003.
  5. Galitsky B, Ilvovsky D, Kuznetsov SO, Strok F. Finding Maximal Common Sub-parse Thickets for Multi-sentence Search. Lecture Notes in Artificial Intelligence. 2013;8323.