Text analytics is becoming an increasingly important tool used in biomedical research, While advances continue to be made in the core algorithms for entity identification and relation extraction, a need for practical ...
详细信息
ISBN:
(纸本)9812704175
Text analytics is becoming an increasingly important tool used in biomedical research, While advances continue to be made in the core algorithms for entity identification and relation extraction, a need for practical applications of these technologies arises. We developed a system that allows users to explore the US Patent corpus using molecular information. the core of our system contains three main technologies: A high performing chemical annotator which identifies chemical terms and converts them to structures, a similarity search engine based on the emerging IUPAC international Chemical Identifier (InChI) standard, and a set of on demand data mining tools. By leveraging this technology we were able to rapidly identify and index 3, 623, 248 unique chemical structures from 4,375,036 US Patents and Patent Applications. Using this system a user may go to a web page, draw a molecule, search for related Intellectual Property (IP) and analyze the results. Our results prove that this is a far more effective way for identifying IP than traditional keyword based approaches.
the proceedings contain 26 papers. the topics discussed include: static analysis in disjunctive numerical domains;static analysis of numerical algorithms;abstract regular tree model checking of complex dynamic data st...
详细信息
ISBN:
(纸本)3540377565
the proceedings contain 26 papers. the topics discussed include: static analysis in disjunctive numerical domains;static analysis of numerical algorithms;abstract regular tree model checking of complex dynamic datastructures;existential label flow inference via CFL reachability;abstract interpretation with specialized definitions;combining widening and acceleration in linear relation analysis;beyond reachability: shape abstraction in the presence of pointer arithmetic;interprocedural shape analysis with separated heap abstraction;catching and identifying bugs in register allocation;certificate translation for optimizing compilers;proving the properties of communicating imperfectly-clocked synchronous systems;parametric and termination-sensitive control dependence;memory leak analysis by contradiction;and path-sensitive dataflow analysis with iterative refinement.
We consider history independent datastructures as proposed for study by Naor and Teague [3]. In a history independent data structure, nothing can be learned from the memory representation of the data structure except...
详细信息
We consider history independent datastructures as proposed for study by Naor and Teague [3]. In a history independent data structure, nothing can be learned from the memory representation of the data structure except for what is available from the abstract data structure. We show that for the most part, strong history independent datastructures have canonical representations. We provide a natural alternative definition of strong history independence that is less restrictive than [3] and characterize how it restricts allowable representations. We also give a general formula for creating dynamically resizing history independent datastructures and give a related impossibility result.
A new form of optimality for comparison-based static dictionaries is introduced. this type of optimality, key-independent optimality, is motivated by applications that assign key values randomly. It is shown that any ...
详细信息
A new form of optimality for comparison-based static dictionaries is introduced. this type of optimality, key-independent optimality, is motivated by applications that assign key values randomly. It is shown that any data structure that is key-independently optimal is expected to execute any access sequence where the key values are assigned arbitrarily to unordered data as fast as any offline binary search tree algorithm, within a multiplicative constant. Asymptotically tight upper and lower bounds are presented for key-independent optimality. Splay trees are shown to be key-independently optimal.
A new priority queue structure, the queap, is introduced. the queap executes insertion in 0(l) amortized time and extract-min in O(log(k+2)) amortized time if there are k items that have been in the heap longer than t...
详细信息
A new priority queue structure, the queap, is introduced. the queap executes insertion in 0(l) amortized time and extract-min in O(log(k+2)) amortized time if there are k items that have been in the heap longer than the item to be extracted. thus if the operations on the queap are first-in first-out, as on a queue, each operation will execute in constant time. this idea of trying to make operations on the least recently accessed items fast, which we call the queueish property, is a natural complement to the working set property of certain datastructures, such as splay trees and pairing heaps, where operations on the most recently accessed data execute quickly. However, we show that the queueish property is in some sense more difficult than the working set property by demonstrating that it is impossible to create a queueish binary search tree, but that many search datastructures can be made almost queueish with a O (log log n) amortized extra cost per operation.
A new form of optimality for comparison-based static dictionaries is introduced. this type of optimality, key-independent optimality, is motivated by applications that assign key values randomly. It is shown that any ...
详细信息
ISBN:
(纸本)3540001425
A new form of optimality for comparison-based static dictionaries is introduced. this type of optimality, key-independent optimality, is motivated by applications that assign key values randomly. It is shown that any data structure that is key-independently optimal is expected to execute any access sequence where the key values are assigned arbitrarily to unordered data as fast as any offline binary search tree algorithm, within a multiplicative constant. Asymptotically tight upper and lower bounds are presented for key-independent optimality. Splay trees are shown to be key-independently optimal.
A new priority queue structure, the queap, is introduced. the queap executes insertion in 0(l) amortized time and extract-min in O(log(k+2)) amortized time if there are k items that have been in the heap longer than t...
详细信息
ISBN:
(纸本)3540001425
A new priority queue structure, the queap, is introduced. the queap executes insertion in 0(l) amortized time and extract-min in O(log(k+2)) amortized time if there are k items that have been in the heap longer than the item to be extracted. thus if the operations on the queap are first-in first-out, as on a queue, each operation will execute in constant time. this idea of trying to make operations on the least recently accessed items fast, which we call the queueish property, is a natural complement to the working set property of certain datastructures, such as splay trees and pairing heaps, where operations on the most recently accessed data execute quickly. However, we show that the queueish property is in some sense more difficult than the working set property by demonstrating that it is impossible to create a queueish binary search tree, but that many search datastructures can be made almost queueish with a O (log log n) amortized extra cost per operation.
Multi-resolution is a useful tool for managing the complexity of huge terrain and geological data sets. Since encoding large data sets may easily exceed main memory capabilities, datastructures and algorithms capable...
详细信息
Multi-resolution is a useful tool for managing the complexity of huge terrain and geological data sets. Since encoding large data sets may easily exceed main memory capabilities, datastructures and algorithms capable of efficiently working in external memory are needed. In our work, we aim at developing an out-of-core multi-resolution model dimension-independent, that can be used for both terrains, represented by Triangulated Irregular Networks(TINs), and 3D data, such as geological data, represented by tetrahedral meshes. We have based our approach on a general multi-resolution model, that we have proposed in our previous work, which supports the extraction of variable-resolution representations. As first step, we have developed, in a prototype simulation system, a large number of clustering techniques for the modifications in a multi-resolution model. Here, we describe such techniques, and analyze and evaluate them experimentally. the result of this investigation has led us to select a specific clustering approach as the basis for an efficient out-of-core data structure. Copyright 2005 ACM.
We propose space-efficient datastructures for text retrieval systems that have merits of boththeoretical datastructures like suffix trees and practical ones like inverted files. Traditional text retrieval systems u...
详细信息
ISBN:
(纸本)3540001425
We propose space-efficient datastructures for text retrieval systems that have merits of boththeoretical datastructures like suffix trees and practical ones like inverted files. Traditional text retrieval systems use the inverted files and support ranking queries based on the tf*idf (term frequency times inverse document frequency) scores of documents that contain given keywords, which cannot be solved by using only the suffix trees. A drawback of the systems is that the scores can be computed for only predetermined keywords. We extend the data structure so that the scores can-be computed for any pattern efficiently while keeping the size of the datastructures moderate. the size is comparable withthe text size, which is an improvement from existing methods using 0 (n log n) bit space for a text collection of length n.
the proceedings contain 56 papers. the topics discussed include: on the comparison-addition complexity of all-pairs shortest paths;on the clique-width of graphs in hereditary classes;the probability of a rendezvous is...
ISBN:
(纸本)3540001425
the proceedings contain 56 papers. the topics discussed include: on the comparison-addition complexity of all-pairs shortest paths;on the clique-width of graphs in hereditary classes;the probability of a rendezvous is minimal in complete graphs;on the minimum volume of a perturbed unit cube;non-Delaunay-based curve reconstruction;cutting a country for smallest square fit;quantum multi-prover interactive proof systems with limited prior entanglement;a framework for network reliability problems on graphs of bounded treewidth;a faster approximation algorithm for 2-edge-connectivity augmentation;tree spanners on chordal graphs: complexity, algorithms, open problems;an asymptotic fully polynomial time approximation scheme for bin covering;a better approximation for the two-stage assembly scheduling problem with two machines at the first stage;and characterizing history independent datastructures.
暂无评论