ISBN: 9781479979837 (print)
A graph grammar is a formal tool that provides a rigorous yet intuitive way to define visual languages. However, the difficulty of describing and implementing graph grammars hinders their wide application. This paper first introduces an XML-based description mechanism for specifying the existing Edge-based Graph Grammar (EGG). Building on that mechanism, it then proposes a new parsing algorithm that is easy to design and implement with XML techniques.
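To make the idea of an XML description of a graph-grammar production concrete, the following is a minimal sketch in Python. The element and attribute names (`production`, `left`, `right`, `node`, `edge`, `source`, `target`, `label`) are invented for illustration and are not the schema used in the paper; the abstract does not specify one.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of one production: a left graph rewritten
# to a right graph. All tag and attribute names are assumptions.
PRODUCTION_XML = """
<production id="p1">
  <left>
    <node id="n1" label="Stmt"/>
  </left>
  <right>
    <node id="n2" label="If"/>
    <node id="n3" label="Stmt"/>
    <edge source="n2" target="n3" label="then"/>
  </right>
</production>
"""


def load_graph(side):
    """Read one side of a production into (nodes, edges)."""
    nodes = {n.get("id"): n.get("label") for n in side.findall("node")}
    edges = [
        (e.get("source"), e.get("target"), e.get("label"))
        for e in side.findall("edge")
    ]
    return nodes, edges


def load_production(xml_text):
    """Parse a production document into plain Python structures."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "left": load_graph(root.find("left")),
        "right": load_graph(root.find("right")),
    }


production = load_production(PRODUCTION_XML)
```

Once productions are held as plain node and edge collections, a parser can match the right graph against a host graph and replace it with the left graph; that matching step is not shown here.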
Background: Searching for members of characterized ncRNA families containing pseudoknots is an important component of genome-scale ncRNA annotation. However, state-of-the-art search for known ncRNAs is based on context-free grammars (CFGs), which cannot effectively model pseudoknots. Existing CFG-based ncRNA identification tools therefore usually ignore pseudoknots during search. As a result, these tools report dozens of sequences that do not contain the native pseudoknots. When pseudoknot structures are vital to the functions of the ncRNAs, these sequences may not be true members. Results: In this work, we design a pseudoknot search tool using multiple simple sub-structures, which are derived from knot-free and bifurcation-free structural motifs in the underlying family. We test our tool on a contiguous 22-Mb region of the maize genome. The experimental results show that our work competes favorably with other pseudoknot search methods. Conclusions: Our sub-structure based tool can conduct genome-scale pseudoknot-containing ncRNA search effectively and efficiently. It provides a pseudoknot search tool complementary to Infernal. The source code is available at http://***/~chengy/knotsearch.
Software tools are developed for the computer realization of syntactic, semantic, and morphological models of natural language texts, using rule-based programming. The tools are efficient for languages such as Georgian that have free word order and a rich morphological structure. For instance, a Georgian verb has several thousand verb forms. Expressing the rules of morphological analysis as a finite automaton is very difficult and would also be inefficient. Some problems in the full morphological analysis of Georgian words cannot be solved by a finite automaton at all. Splitting some Georgian verb forms into morphemes requires a non-deterministic search algorithm that backtracks many times. To minimize backtracking, the constraints that hold among morphemes must be stated and verified as early as possible, so that false search directions are avoided. The software tool for syntactic analysis provides a means of reducing rules that have the same members in a different order. The authors used the tool for semantic analysis as well. The proposed software tools thus offer many means to construct an efficient parser and to test and correct it. The authors carried out morphological and syntactic analysis of Georgian texts with these tools. In this paper, the authors describe the software tools and their application to the Georgian language.
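The interplay of backtracking search and early constraint checking can be sketched as follows. This is a minimal illustration in Python, not the authors' tool: the lexicon entries are placeholder strings rather than real Georgian morphemes, and the class names (`PREFIX`, `ROOT`, `SUFFIX`) and the adjacency table `ALLOWED_NEXT` are assumptions made for the example.

```python
# Hypothetical toy inventory: surface form -> possible morpheme classes.
LEXICON = {
    "a": ["PREFIX"],
    "ab": ["PREFIX", "ROOT"],
    "b": ["SUFFIX"],
    "ba": ["ROOT"],
    "bab": ["ROOT"],
}

# Constraint among adjacent morphemes: which class may follow which.
# None stands for "start of word".
ALLOWED_NEXT = {
    None: {"PREFIX", "ROOT"},
    "PREFIX": {"PREFIX", "ROOT"},
    "ROOT": {"SUFFIX"},
    "SUFFIX": {"SUFFIX"},
}

# A complete analysis must end in one of these classes.
FINAL = {"ROOT", "SUFFIX"}


def segment(word, start=0, prev=None):
    """Yield every segmentation of word[start:] consistent with the constraints.

    The adjacency constraint is checked before recursing, so a branch
    that violates it is abandoned immediately instead of being explored
    to the end of the word and only then rejected.
    """
    if start == len(word):
        if prev in FINAL:
            yield []
        return
    for end in range(start + 1, len(word) + 1):
        piece = word[start:end]
        for cls in LEXICON.get(piece, ()):
            if cls not in ALLOWED_NEXT[prev]:
                continue  # prune: constraint fails, no deeper search
            for rest in segment(word, end, cls):
                yield [(piece, cls)] + rest


analyses = list(segment("abab"))
```

For the toy word `abab` the search returns three analyses; branches such as `a` followed by the suffix `b` are cut off at the second morpheme because a suffix may not directly follow a prefix.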
Background: The Cancer Genome Anatomy Project (CGAP) xProfiler and cDNA Digital Gene Expression Displayer (DGED) were made available to the scientific community over a decade ago and have since been widely used to find genes that are differentially expressed between cancer and normal tissues. The tissue types are usually chosen according to the ontology hierarchy developed by NCBI. The xProfiler uses an internally available flat-file database to determine the presence or absence of genes in the chosen libraries, while cDNA DGED uses the publicly available UniGene Expression and Gene relational databases to count the sequences found for each gene in the presented libraries. Results: We discovered that the CGAP approach often includes libraries from dependent or irrelevant tissues (on average one third of the libraries were incorrect, and for some tissue searches no correct libraries were selected at all). We also discovered that the CGAP approach reports genes from outside the selected libraries and may omit genes found within the libraries. Other errors include incorrect estimation of the significance values and inaccurate settings for the library size cut-off values. We advocate a revised approach to finding libraries associated with tissues, under which libraries from dependent or irrelevant tissues are not included in the final library pool. We also revised the method for determining the presence or absence of a gene by searching the UniGene relational database, revised the calculation of statistical significance, and fixed the library cut-off filter. Conclusion: Our results justify re-evaluation of all previously reported results where NCBI CGAP expression data and tools were used.
ISBN: 9783540851097 (print)
In this paper, we present a Machine Translation (MT) system from English to Indonesian that applies the Link Grammar (LG) formalism. The Annotated Disjunct (ADJ) technique available in the LG formalism is used to map English sentences into equivalent Indonesian sentences. ADJ is a promising technique for target languages, such as Indonesian, for which no grammar formalism, parser, or corpus is available. An experimental evaluation shows that applying LG to Indonesian works as expected. We also discuss some significant issues to be considered in future development.
An improved chart parser based on active edges that share their leftmost common elements is presented. An analysis of how the traditional chart parsing algorithm avoids redundant work reveals that active edges with the same leftmost common elements are handled inefficiently. A new representation of the active edge is therefore proposed that shares these leftmost elements, yielding an improved chart parser that generates far fewer active edges. Experimental results on the Chinese Treebank show that the improved chart parser significantly outperforms the packed chart parser in both speed (about 10 times faster) and space consumption.
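One generic way to share the leftmost common elements of rules is to store right-hand sides in a prefix tree (trie), so that a single active edge pointing at a trie node stands for every rule that begins with the same symbols. The sketch below uses that generic technique; it is an assumption for illustration and is not necessarily the representation proposed in the paper. The grammar `RULES` is a made-up example.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One shared prefix of rule right-hand sides."""
    children: dict = field(default_factory=dict)   # symbol -> TrieNode
    completes: list = field(default_factory=list)  # LHS of rules ending here


def build_rule_trie(rules):
    """Insert each rule's right-hand side into a trie keyed by symbol."""
    root = TrieNode()
    for lhs, rhs in rules:
        node = root
        for symbol in rhs:
            node = node.children.setdefault(symbol, TrieNode())
        node.completes.append(lhs)
    return root


def count_nodes(node):
    """Number of trie nodes in the subtree, including the node itself."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


# Hypothetical grammar with overlapping left corners.
RULES = [
    ("VP", ("V", "NP")),
    ("VP", ("V", "NP", "PP")),
    ("VP", ("V", "NP", "NP")),
    ("NP", ("Det", "N")),
    ("NP", ("Det", "Adj", "N")),
]

trie = build_rule_trie(RULES)

# Classic dotted rules: one state per rule per advanced dot position.
dotted_states = sum(len(rhs) for _, rhs in RULES)
# Trie-based states: one per non-root trie node, shared across rules.
shared_states = count_nodes(trie) - 1
```

In this toy grammar the 13 advanced dotted-rule states collapse into 8 shared states, because `V NP` and `Det` are stored once. At each chart span a parser would extend one edge per trie node rather than one per rule.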
In a companion paper [P.R.J. Asveld, Fuzzy context-free languages-Part 1: Generalized fuzzy context-free grammars, Theoret. Comput. Sci. (2005)] we used fuzzy context-free grammars in order to model grammatical errors resulting in erroneous inputs for robust recognizing and parsing algorithms for fuzzy context-free languages. In particular, this approach enables us to distinguish between small errors ("tiny mistakes") and big errors ("capital blunders"). In this paper, we present some algorithms to recognize fuzzy context-free languages: particularly, a modification of Cocke-Younger-Kasami's algorithm and some recursive descent algorithms. Then we extend these recognition algorithms to corresponding parsing algorithms for fuzzy context-free languages. These parsing algorithms happen to be robust in some very elementary sense. (c) 2005 Elsevier B.V. All rights reserved.
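A modified Cocke-Younger-Kasami recognizer for a fuzzy grammar can be sketched as below. This is a minimal illustration under stated assumptions, not the algorithm from the paper: the grammar is assumed to be in Chomsky normal form, every rule carries a degree in [0, 1], the degree of a derivation is the minimum of its rule degrees, and the degree of a string is the maximum over its derivations. The rules and degrees in the example are invented.

```python
def fuzzy_cyk(tokens, lexical, binary, start="S"):
    """Return the membership degree of `tokens` in the fuzzy language.

    lexical: {(lhs, terminal): degree} for rules lhs -> terminal
    binary:  {(lhs, left, right): degree} for rules lhs -> left right
    Assumes max-min semantics (an assumption, see lead-in).
    """
    n = len(tokens)
    if n == 0:
        return 0.0
    # table[i][j] maps a nonterminal to its best degree over tokens[i:j].
    table = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, token in enumerate(tokens):
        cell = table[i][i + 1]
        for (lhs, terminal), degree in lexical.items():
            if terminal == token:
                cell[lhs] = max(cell.get(lhs, 0.0), degree)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = table[i][j]
            for k in range(i + 1, j):
                left, right = table[i][k], table[k][j]
                for (lhs, b, c), degree in binary.items():
                    if b in left and c in right:
                        candidate = min(degree, left[b], right[c])
                        if candidate > cell.get(lhs, 0.0):
                            cell[lhs] = candidate
    return table[0][n].get(start, 0.0)


# Hypothetical rules: degree 1.0 is fully grammatical, lower degrees
# model errors ("run" for "runs" is a small one, swapped order a big one).
LEXICAL = {("NP", "she"): 1.0, ("VP", "runs"): 1.0, ("VP", "run"): 0.6}
BINARY = {("S", "NP", "VP"): 1.0, ("S", "VP", "NP"): 0.3}

degree_correct = fuzzy_cyk(["she", "runs"], LEXICAL, BINARY)
degree_small_error = fuzzy_cyk(["she", "run"], LEXICAL, BINARY)
degree_big_error = fuzzy_cyk(["runs", "she"], LEXICAL, BINARY)
```

With these toy degrees the recognizer separates a "tiny mistake" (degree 0.6) from a "capital blunder" (degree 0.3), while a string with no derivation at all gets degree 0.0. Storing a back-pointer alongside each best degree would turn the recognizer into a parser.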
In this paper, we propose a feature-based Korean grammar that uses learned constraint rules to improve parsing efficiency. The proposed grammar consists of feature structures, feature operations, and constraint rules, and it has the following characteristics. First, a feature structure includes several features that express linguistic information useful for Korean parsing. Second, a feature operation, which generates a new feature structure, is restricted to a binary-branching form that can handle Korean properties such as variable word order and constituent ellipsis. Third, constraint rules improve efficiency by preventing feature operations from generating spurious feature structures. Moreover, these rules are learned from a Korean treebank by a decision tree learning algorithm. The experimental results show that the feature-based Korean grammar can reduce the number of candidates by up to a third and that it runs 1.5 to 2 times faster than a CFG on a statistical parser.
In this paper, we investigate parsing algorithms for QTAGs and their extension, multifoot QTAGs (MFQTAGs). A QTAG is a kind of tree adjoining grammar that generates the set of quadtrees. The complexity of our parsing algorithms is O(N^2), achieved by exploiting properties of quadtrees. Here the variable N is the diameter of the input image rather than the number of pixels. In other words, parsing for the languages of QTAGs has linear time complexity in the number of pixels. (C) 1999 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
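Why a bound in the image diameter N translates to linear time in the pixel count can be seen from the size of a quadtree itself. The sketch below is not the QTAG parser; it only builds a region quadtree and counts its nodes, under the assumptions that "diameter" means the side length of a square image and that N is a power of two. A full quadtree over an N x N image has at most (4N^2 - 1)/3 nodes, which is O(N^2), the same order as the number of pixels.

```python
def build_quadtree(image, x=0, y=0, size=None):
    """Build a region quadtree for a square image given as a list of rows.

    A leaf is a pixel value; an internal node is a 4-tuple of subtrees
    in the order NW, NE, SW, SE. Four equal leaves merge into one leaf.
    """
    if size is None:
        size = len(image)
    if size == 1:
        return image[y][x]
    half = size // 2
    kids = (
        build_quadtree(image, x, y, half),
        build_quadtree(image, x + half, y, half),
        build_quadtree(image, x, y + half, half),
        build_quadtree(image, x + half, y + half, half),
    )
    # Merge only when all four children are the same leaf value.
    if not isinstance(kids[0], tuple) and all(k == kids[0] for k in kids):
        return kids[0]
    return kids


def count_nodes(tree):
    """Count leaves and internal nodes of a quadtree."""
    if not isinstance(tree, tuple):
        return 1
    return 1 + sum(count_nodes(kid) for kid in tree)


def max_nodes(n):
    """Upper bound on quadtree size for an n x n image, n a power of two."""
    return (4 * n * n - 1) // 3


IMAGE = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

tree = build_quadtree(IMAGE)
```

For the 4 x 4 example the tree has 9 nodes against a bound of 21; a 2 x 2 checkerboard reaches its bound of 5 exactly. Any algorithm that does constant work per quadtree node therefore runs in O(N^2) time, which is linear in the N^2 pixels.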
This paper describes an object-oriented lexical representation language based on Unification Categorial Grammar (UCG) that encodes linguistic and semantic information uniformly as classes and objects, together with an efficient bottom-up parsing method for UCG that uses the selection-sets technique. The lexical representation language, implemented in the logic and object-oriented programming language LIFE, introduces several new information-sharing mechanisms that enable natural, declarative, modular, and economical construction of large and complex computational lexicons. The selection sets are deduced from a transformation between UCG and Context-Free Grammar (CFG) and are used to reduce the search space of the table-driven algorithm. Experimental tests on a spoken English corpus show that the hierarchical lexicon achieves a dramatic reduction in redundant information and that selection sets significantly improve UCG parsing with polynomial time complexity.