ISBN (print): 9781509009978
Data stream processing addresses the need for high-throughput, near real-time data processing, which can be considered one part of Big Data or Fast Data. In this paper, we study the local parallelization of stream processing on a single multi-core Central Processing Unit (CPU) computer system, which, in our opinion, has not yet been sufficiently addressed. In distributed systems, optimizing the local throughput can help to improve the overall system. In less resource-demanding scenarios, it may be beneficial to use more lightweight local parallelization instead of more complex distributed approaches. We present our work in progress on locally parallelizing stream processing on multiple CPU cores and on ways of further improving the local data processing. In order to study the fundamental mechanisms and effects, we focused on pleasingly parallel workloads. While pleasingly parallel tasks, by definition, can be easily parallelized, our results show that stream processing adds important aspects and that the outcomes vary strongly depending on the use case and parallelization approach. Furthermore, we present early stages of a stream transformation Domain-Specific Language and of a self-adaptive mechanism for easing and optimizing the processing. We published our implementations as Open Source Software.
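As a rough illustration of the kind of local parallelization discussed above (not the authors' implementation), the following Python sketch fans a pleasingly parallel per-record transformation out across the CPU cores of a single machine; the record source and the transform function are hypothetical stand-ins.

```python
# Minimal sketch: pleasingly parallel stream transformation spread across the
# cores of a single machine with the standard library's multiprocessing pool.
import multiprocessing as mp

def transform(record: str) -> str:
    # Hypothetical per-record transformation; records are independent,
    # which is what makes the workload pleasingly parallel.
    return record.upper()

def source():
    # Stand-in for an unbounded input stream.
    for i in range(1_000_000):
        yield f"record-{i}"

if __name__ == "__main__":
    with mp.Pool(processes=mp.cpu_count()) as pool:
        # imap_unordered keeps memory bounded and returns results as soon as
        # any worker finishes, which favors throughput over ordering.
        for result in pool.imap_unordered(transform, source(), chunksize=1024):
            pass  # hand the result to a sink, e.g. a downstream operator
```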
ISBN (print): 9783319410098; 9783319410081
Literature-Based Discovery (LBD), a kind of knowledge discovery approach proposed by Don R. Swanson, can assist researchers in recognizing implicit knowledge connections and thereby accelerate the generation of new knowledge. However, most algorithms in the field of LBD start from the co-occurrence of terms to find connections between terms and barely consider the semantic relations that actually exist between pairs of terms. In this paper, a directional recognition algorithm for semantic relations is put forward to recognize the directionality of the semantic relation existing between a pair of terms. The algorithm automatically judges the direction of a semantic relation based on WordNet and JWNL. The experimental results indicate that the proposed algorithm can effectively recognize the directionality of semantic relations.
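The paper builds on WordNet accessed through JWNL; as a hedged illustration of one plausible directionality test, the sketch below uses NLTK's WordNet interface instead and judges a pair as directed from the more specific to the more general term when one term appears in the other's hypernym closure. The `direction` helper is a hypothetical illustration, not the paper's algorithm.

```python
# Hedged sketch of a directionality test between two terms, using NLTK's
# WordNet interface (the paper itself uses WordNet through JWNL).
# Requires: nltk.download("wordnet")
from nltk.corpus import wordnet as wn

def direction(term_a: str, term_b: str) -> str:
    synsets_a, synsets_b = set(wn.synsets(term_a)), set(wn.synsets(term_b))
    # If some sense of term_a lies in the hypernym closure of term_b,
    # judge the relation as directed from the specific to the general term.
    for sb in synsets_b:
        if synsets_a & set(sb.closure(lambda s: s.hypernyms())):
            return f"{term_b} -> {term_a}"
    for sa in synsets_a:
        if synsets_b & set(sa.closure(lambda s: s.hypernyms())):
            return f"{term_a} -> {term_b}"
    return "undetermined"

print(direction("dog", "animal"))  # expected: dog -> animal
```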
ISBN (print): 9781941643730
Discourse parsing is the process of discovering the latent relational structure of a long-form piece of text and remains a significant open challenge. One of the most difficult tasks in discourse parsing is the classification of implicit discourse relations. Most state-of-the-art systems do not leverage the great volume of unlabeled text available on the web; they rely instead on human-annotated training data. By incorporating a mixture of labeled and unlabeled data, we are able to improve relation classification accuracy and reduce the need for annotated data, while still retaining the capacity to use labeled data to ensure that specific desired relations are learned. We achieve this using a latent variable model that is trained in a reduced-dimensionality subspace using spectral methods. Our approach achieves an F1 score of 0.485 on the implicit relation labeling task for the Penn Discourse Treebank.
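A hedged sketch of the general idea (labeled and unlabeled data shaping a low-dimensional subspace, with the classifier trained only on labeled projections) is given below using truncated SVD from scikit-learn; the feature extraction, example argument pairs, and PDTB sense labels are hypothetical placeholders, not the paper's spectral estimator.

```python
# Hedged sketch: learn a low-dimensional subspace from labeled AND unlabeled
# argument pairs, then train the relation classifier on labeled projections
# only. Texts, senses, and features are hypothetical placeholders.
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

labeled_texts = ["it was raining [SEP] the game was cancelled",
                 "he bought apples [SEP] he also bought pears"]
labels = ["Contingency", "Expansion"]                       # hypothetical PDTB senses
unlabeled_texts = ["the market fell [SEP] investors sold shares",
                   "she studied hard [SEP] she passed the exam"]

vec = CountVectorizer(ngram_range=(1, 2))
X_all = vec.fit_transform(labeled_texts + unlabeled_texts)  # unlabeled data used here

svd = TruncatedSVD(n_components=2).fit(X_all)               # spectral subspace estimate
X_labeled = svd.transform(vec.transform(labeled_texts))

clf = LogisticRegression().fit(X_labeled, labels)           # supervised step, labeled only
```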
ISBN (print): 9781941643723
Stubs on Wikipedia often lack comprehensive information. The huge cost of editing Wikipedia and the presence of only a limited number of active contributors curb the consistent growth of Wikipedia. In this work, we present WikiKreator, a system that is capable of generating content automatically to improve existing stubs on Wikipedia. The system has two components. First, a text classifier built using topic distribution vectors is used to assign content from the web to various sections of a Wikipedia article. Second, we propose a novel abstractive summarization technique based on an optimization framework that generates section-specific summaries for Wikipedia stubs. Experiments show that WikiKreator is capable of generating well-formed, informative content. Further, automatically generated content from our system has been appended to Wikipedia stubs and has been retained successfully, demonstrating the effectiveness of our approach.
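The first component lends itself to a short sketch: represent text by its topic distribution and train a classifier that maps retrieved web content to section headings. The pipeline below is a hedged illustration with hypothetical section names and training snippets, not WikiKreator's actual classifier.

```python
# Hedged sketch of the first component only: topic-distribution features
# feeding a section classifier. Section names and snippets are hypothetical.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

section_texts = [
    "the species inhabits wetlands and feeds on small fish",
    "the species was first described in 1837 by john gould",
]
section_labels = ["Ecology", "Taxonomy"]

pipeline = make_pipeline(
    CountVectorizer(stop_words="english"),
    LatentDirichletAllocation(n_components=2, random_state=0),  # topic distribution vectors
    LogisticRegression(),                                       # assigns content to a section
)
pipeline.fit(section_texts, section_labels)

# Assigns a retrieved web snippet to one of the hypothetical sections.
print(pipeline.predict(["it inhabits rivers and feeds on insects"]))
```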
ISBN (print): 9781941643723
Relation triples produced by open domain information extraction (open IE) systems are useful for question answering, inference, and other IE tasks. Traditionally these are extracted using a large set of patterns; however, this approach is brittle on out-of-domain text and long-range dependencies, and gives no insight into the substructure of the arguments. We replace this large pattern set with a few patterns for canonically structured sentences, and shift the focus to a classifier which learns to extract self-contained clauses from longer sentences. We then run natural logic inference over these short clauses to determine the maximally specific arguments for each candidate triple. We show that our approach outperforms a state-of-the-art open IE system on the end-to-end TAC-KBP 2013 Slot Filling task.
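As a hedged illustration of the pattern-over-short-clauses idea (omitting the clause classifier and the natural logic argument shortening), the sketch below pulls a (subject, relation, object) triple from a canonically structured clause using spaCy dependency labels; it is not the paper's system.

```python
# Hedged sketch: a single dependency pattern for canonically structured clauses
# (the clause classifier and natural logic argument shortening are omitted).
# Requires the spaCy model: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_triple(clause: str):
    doc = nlp(clause)
    for token in doc:
        if token.dep_ == "ROOT" and token.pos_ == "VERB":
            subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
            objects = [c for c in token.children if c.dep_ in ("dobj", "obj", "attr")]
            if subjects and objects:
                return (subjects[0].text, token.lemma_, objects[0].text)
    return None  # longer sentences would first be split into short clauses

print(extract_triple("The cat chased the mouse."))  # ('cat', 'chase', 'mouse')
```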
The continuous growth of unstructured textual information on the web implies the need for novel, semantically aware content processing and information retrieval (IR) methods. Following the evolution and wide adoption ...
ISBN (print): 9781509034307
Efficiency of the organization of psychiatric care increases when the usage of information technologies intensifies and the interaction of the various institutions providing psychiatric care improves. Development and improvement of electronic forms of medical documentation, automation of the psychiatrist's workplace, and data transmission over protected information channels provide continuity in mental health care. However, the informatization of medicine in some cases assumes a basic change in the technology of the doctor's work: algorithms, techniques for collecting and processing information, and decision-making. To study doctors' readiness to work in this direction, we analyzed the opinions of practicing psychiatrists. We also investigated how fully the opportunities for applying information technologies in psychiatric practice are being used, compared data on the integration of the psychiatric community in the use of information technologies at the present stage, and estimated the dynamics of this process. Survey data from 100 psychiatrists working in outpatient and inpatient facilities of the city of Volgograd were used for the research. Conclusions are drawn on promising directions of interdepartmental cooperation between the medical psychiatric community and information technology professionals: development of high-quality Russian-language software for patients' electronic diaries, a multi-purpose automated questionnaire for a complex of organizational and methodical actions for carrying out routine inspections at a new technological level, and the creation and improvement of a medical information network for psychiatric services. The fastest introduction of the information technologies facilitating control and supervision of separate groups of patients, for example follow-up of dispensary supervision (DS), carrying out medical commissions, training, and remote supervision for the areas of the region, and also ensuring the principle of continuity in psychiatric practice, is espec…
ISBN (print): 9781941643990
In this paper, we present our Crossword Puzzle Resolution System (SACRY), which exploits syntactic structures for clue reranking and answer extraction. SACRY uses a database (DB) containing previously solved crossword puzzles (CPs) in order to generate the list of candidate answers. Additionally, it uses innovative features, such as the answer position in the rank, and aggregated information such as the min, max, and average clue reranking scores. Our system is based on webCrow, one of the most advanced systems for automatic crossword puzzle resolution. Our extensive experiments over our two-million-clue dataset show that our approach greatly improves the quality of the answer list, enabling unprecedented results on the complete CP resolution task, i.e., an accuracy of 99.17%.
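A hedged sketch of the aggregated reranking features mentioned above (rank position plus min, max, and average scores per candidate answer) follows; the retrieved (answer, score) pairs standing in for hits from the solved-clue DB are hypothetical.

```python
# Hedged sketch: rank position plus min/max/average of clue reranking scores
# as aggregated features per candidate answer. The retrieved (answer, score)
# pairs standing in for solved-clue DB hits are hypothetical.
from collections import defaultdict

retrieved = [("NILE", 0.91), ("NILE", 0.74), ("AMAZON", 0.55), ("NILE", 0.62)]

scores_by_answer = defaultdict(list)
for answer, score in retrieved:
    scores_by_answer[answer].append(score)

ranked = sorted(scores_by_answer.items(), key=lambda kv: max(kv[1]), reverse=True)
for rank, (answer, scores) in enumerate(ranked, start=1):
    features = {"rank": rank, "min": min(scores),
                "max": max(scores), "avg": sum(scores) / len(scores)}
    print(answer, features)
```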
ISBN (print): 9781941643730
We present a new release of the Paraphrase Database. PPDB 2.0 includes a discriminatively re-ranked set of paraphrases that achieve a higher correlation with human judgments than PPDB 1.0's heuristic rankings. Each paraphrase pair in the database now also includes fine-grained entailment relations, word embedding similarities, and style annotations.
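As a hedged sketch of how the new entailment annotations might be consumed, the snippet below filters paraphrase pairs from a PPDB-style text file; the |||-separated field layout and the "Equivalence" label are assumptions about the distributed format, so check the release documentation before relying on them.

```python
# Hedged sketch: filter paraphrase pairs by entailment label from a PPDB-style
# text file. The |||-separated layout and the "Equivalence" label are
# assumptions about the distributed format, not a documented API.
def equivalences(path: str):
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = [field.strip() for field in line.split("|||")]
            phrase, paraphrase, entailment = fields[1], fields[2], fields[-1]
            if entailment == "Equivalence":
                yield phrase, paraphrase

# for p, q in equivalences("ppdb-2.0-s-lexical"):   # hypothetical file name
#     print(p, "<->", q)
```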
ISBN (print): 9781479960491
Currently, very large amounts of data are transferred from everywhere over the World Wide Web. Consequently, information extraction systems have arisen, and much research has focused on utilizing those data. These systems are very useful for data pre-processing and cleaning for real-time applications. Moreover, they enable other systems to analyze the data in real time, for tasks such as social network mining, web mining, and data mining, or even specialized tasks such as false advertisement detection, demand forecasting, and comment extraction from product and service reviews. In this paper, we focus on extracting the content data of web pages on e-commerce web sites based on subject detection and node density. The experimental results indicate that our proposed method is well suited to extracting the data-rich region of data-intensive pages in an automatic fashion.
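A hedged sketch of a node-density heuristic in this spirit (not the paper's exact subject-detection step) is shown below: score container nodes by text length relative to the number of descendant tags and keep the densest one as the data-rich region.

```python
# Hedged sketch: pick the container node with the highest text density
# (text length per descendant tag) as the data-rich region of the page.
from bs4 import BeautifulSoup

def densest_container(html: str):
    soup = BeautifulSoup(html, "html.parser")
    best, best_density = None, 0.0
    for node in soup.find_all(["div", "article", "section", "table"]):
        text_len = len(node.get_text(strip=True))
        tag_count = 1 + len(node.find_all(True))     # node itself plus descendants
        density = text_len / tag_count
        if density > best_density:
            best, best_density = node, density
    return best

html = ("<html><body><div id='nav'><a>Home</a><a>Cart</a></div>"
        "<div id='item'><h1>USB-C cable</h1><p>2 m braided cable, 60 W.</p></div>"
        "</body></html>")
print(densest_container(html).get("id"))  # expected: item
```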