We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel corpus analysis appli...
详细信息
We introduce (1) a novel stochastic inversion transduction grammar formalism for bilingual language modeling of sentence-pairs, and (2) the concept of bilingual parsing with a variety of parallel corpus analysis applications. Aside from the bilingual orientation, three major features distinguish the formalism from the finite-state transducers more traditionally found in computational linguistics: it skips directly to a context-free rather than finite-state base, it permits a minimal extra degree of ordering flexibility, and its probabilistic formulation admits an efficient maximum-likelihood bilingual pausing algorithm. A convenient normal form is shown to exist. Analysis of the formalism's expressiveness suggests that it is particularly well suited to modeling ordering shifts between languages, balancing needed flexibility against complexity constraints. We discuss a number of examples of how stochastic inversion transduction grammars bring bilingual constraints to bear upon problematic corpus analysis tasks such as segmentation, bracketing, phrasal alignment, and parsing.
We describe an extension of Earley's parser for stochastic context-free grammars that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive p...
详细信息
We describe an extension of Earley's parser for stochastic context-free grammars that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive prefixes being generated by the grammar;b) probabilities of substrings being generated by the nonterminals, including the entire string being generated by the grammar;c) most likely (Viterbi) parse of the string;d) posterior expected number of applications of each grammar production, as required for reestimating rule probabilities. Probabilities (a) and (b) are computed incrementally in a single left-to-right pass over the input. Our algorithm compares favorably to standard bottom-up parsing methods for SCFGs in that it works efficiently on sparse grammars by making use of Earley's top-down control structure. If can process any context-free rule format without conversion to some normal form, and combines computations for (a) through (d) in a single algorithm. Finally, the algorithm has simple extensions for processing partially bracketed inputs, and for finding partial parses and their likelihoods on ungrammatical inputs.
The graph structure is a strong formalism for representing pictures in syntactic pattern recognition. Many models for graph grammars have been proposed as a kind of hyper-dimensional generating systems, whereas the us...
详细信息
The graph structure is a strong formalism for representing pictures in syntactic pattern recognition. Many models for graph grammars have been proposed as a kind of hyper-dimensional generating systems, whereas the use of such grammars for pattern recognition is relatively infrequent. One of the reasons is the difficulty of building a syntax analyzer for such graph grammars. In this paper, we define a subclass of nPCE graph grammars and present a parsing algorithm of O(n) for both sequential and parallel cases.
Plex structures specified by plex grammars are sets of symbols which interconnect in multidirection. We propose a new parsing scheme for plex grammars, which consists of two phases for symbols and their interconnectio...
详细信息
Plex structures specified by plex grammars are sets of symbols which interconnect in multidirection. We propose a new parsing scheme for plex grammars, which consists of two phases for symbols and their interconnections respectively in an input plex structure. The parsing of plex structures is simplified by our parsing scheme, because we are required to parse only a set of symbols instead of a plex structure. algorithms for the first phase are primarily discussed in this paper. Earley"s algorithm is extended to recognize a set of symbols. It is anticipated that the two-phase scheme of parsing may suggest a method for the parsing of other high-dimensional pattern grammars.
An edNLC-graph grammar, introduced by Janssens,(4) is a strong formalism for generating scene representations. This grammar generates directed node- and edge-labelled graphs, EDG-graphs. A method of construction of un...
详细信息
An edNLC-graph grammar, introduced by Janssens,(4) is a strong formalism for generating scene representations. This grammar generates directed node- and edge-labelled graphs, EDG-graphs. A method of construction of unambiguous string EDG-graph representation is briefly described. The characteristics of edNLC-graph grammar for syntactic pattern recognition allows us to construct the parsing algorithm. The deterministic top-down syntax analyzer is constructed for the subfamily of an edNLC-graph grammar, called an ETL/1-graph grammar. An ETL/1-graph grammar is parallel to a finite state string grammar. The notions introduced in the paper are useful for researches in less restricted edNLC-graph grammars, for example grammars analogical to context-free string grammars.
This study proposes a practical method for the construction of a more compact matrix structure of the precedence information used in a new weak precedence parsing. The parsing algorithm is different from the conventi...
详细信息
This study proposes a practical method for the construction of a more compact matrix structure of the precedence information used in a new weak precedence parsing. The parsing algorithm is different from the conventional weak precedence algorithm. It is possible to use the method for any weak precedence grammars without degrading the good error detection capability of the traditional weak precedence parsers. The empirical results serve to demonstrate that the obtained matrices are the very reasonable size and that the presented parsing algorithm is quite efficient.
Defines a class of context-free grammars called the simple LR(k) or SLR(k) grammars. Inclusion of weak precedence and simple precedence grammars as proper subsets in SLR(k) grammars; Construction and implementations o...
详细信息
Defines a class of context-free grammars called the simple LR(k) or SLR(k) grammars. Inclusion of weak precedence and simple precedence grammars as proper subsets in SLR(k) grammars; Construction and implementations of parsers for the SLR(k) grammars; Superiority over precedence techniques in terms of the speed of parser construction and in size and speed of resulting parsers.
暂无评论