ISBN: 9781510836358 (print)
We show that the compressed suffix array and the compressed suffix tree of a string T can be built in O(n) deterministic time using O(n log σ) bits of space, where n is the string length and σ is the alphabet size. Previously described deterministic algorithms either run in time that depends on the alphabet size or need ω(n log σ) bits of working space. Our result has immediate applications to other problems; for example, it yields the first deterministic linear-time LZ77 and LZ78 parsing algorithms that use O(n log σ) bits.
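To make the objects named in this abstract concrete, the following is a minimal sketch of a suffix array and of an LZ78 parse. It is not the algorithm of the paper: the suffix array is built by plain sorting and the LZ78 dictionary is an ordinary hash map, so neither meets the stated O(n) time or O(n log σ)-bit space bounds. The function names `naive_suffix_array` and `lz78_parse` are illustrative choices, not taken from the source.

```python
def naive_suffix_array(text):
    """Return the suffix array of `text` by sorting suffix start positions.

    Illustration only: sorting suffixes directly is far from linear time.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def lz78_parse(text):
    """Return the LZ78 parse of `text` as (phrase_index, next_char) pairs.

    Phrase index 0 is the empty phrase. If the text ends in the middle of a
    known phrase, the last pair carries an empty string as its character.
    """
    trie = {}      # (parent phrase index, char) -> phrase index
    phrases = []   # emitted (phrase_index, next_char) pairs
    node = 0       # index of the longest known phrase matched so far
    for ch in text:
        child = trie.get((node, ch))
        if child is not None:
            node = child                      # keep extending the match
        else:
            phrases.append((node, ch))        # emit a new phrase
            trie[(node, ch)] = len(phrases)   # new phrases are numbered from 1
            node = 0
    if node != 0:
        phrases.append((node, ""))            # leftover match at end of text
    return phrases
```

For example, `lz78_parse("abababa")` splits the text into the phrases a, b, ab, aba, each encoded as a reference to an earlier phrase plus one new character.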
ISBN: 9781622765034 (print)
We propose a complete probabilistic discriminative framework for performing sentence-level discourse analysis. Our framework comprises a discourse segmenter, based on a binary classifier, and a discourse parser, which applies an optimal CKY-like parsing algorithm to probabilities inferred from a Dynamic Conditional Random Field. We show on two corpora that our approach outperforms the state of the art, often by a wide margin.
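The CKY-like step can be illustrated with a small dynamic program that picks the most probable binary bracketing over a sequence of discourse units. This is a hedged sketch, not the authors' parser: it assumes a hypothetical callback `merge_prob(i, k, j)` that returns the probability of joining spans [i..k] and [k+1..j], standing in for the probabilities the abstract says are inferred from the Dynamic Conditional Random Field, and it ignores relation labels.

```python
import math


def cky_best_tree(n, merge_prob):
    """Return (log_probability, tree) of the best binary tree over units 0..n-1.

    `merge_prob(i, k, j)` is an assumed interface: the probability of
    combining spans [i..k] and [k+1..j] into one constituent.
    Leaves are unit indices; internal nodes are (left, right) tuples.
    """
    best = {(i, i): 0.0 for i in range(n)}   # log-prob of the best subtree per span
    back = {}                                 # best split point per span
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            best_score, best_split = -math.inf, None
            for k in range(i, j):
                p = merge_prob(i, k, j)
                if p <= 0.0:
                    continue                  # impossible merge
                score = best[(i, k)] + best[(k + 1, j)] + math.log(p)
                if score > best_score:
                    best_score, best_split = score, k
            best[(i, j)] = best_score
            back[(i, j)] = best_split

    def build(i, j):
        if i == j:
            return i
        k = back[(i, j)]
        if k is None:
            raise ValueError(f"no valid parse for span ({i}, {j})")
        return (build(i, k), build(k + 1, j))

    return best[(0, n - 1)], build(0, n - 1)
```

Working in log space keeps long products of probabilities from underflowing; the table is filled from short spans to long ones, so each span reuses the best subtrees already computed for its parts.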
ISBN: 9781937284190 (print)
In this paper we deal with Named Entity Recognition (NER) on transcriptions of French broadcast data. Two aspects make the task more difficult than previous NER tasks: i) the named entities annotated in this work have a tree structure, so the task cannot be tackled as a sequence labelling task; ii) the data used are noisier than the data used for previous NER tasks. We approach the task in two steps, involving Conditional Random Fields and Probabilistic Context-Free Grammars, integrated in a single parsing algorithm. We analyse the effect of using several tree representations. Our system outperforms the best system of the evaluation campaign by a significant margin.