ISBN: 9781665423311 (print)
PDF is perhaps the most popular format for sharing non-editable documents. PDF documents are often untagged. In particular, this means that the positions and cell structure of tables are not designated explicitly. PDF table detection predicts bounding boxes of tables on document pages. Some of the predictions inevitably turn out to be false, which negatively affects the accuracy of table structure recognition. We argue that page layout analysis in pre- and post-processing can refine table detection. We suggest pre-processing algorithms for recognizing headings, running titles, paragraphs, and images in PDF pages. This allows selecting areas of interest inside pages where real tables can be placed. We then use deep neural networks to predict tables only in these areas. We also propose post-processing algorithms to verify predictions and filter out false table candidates after table detection. Our empirical study shows that the proposed approach reduces errors in table detection and improves PDF table extraction overall.
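The post-processing idea of filtering out false table candidates can be illustrated with a minimal sketch. It is not the authors' algorithm: the `Box` type, the `filter_table_candidates` function, and the `max_overlap` threshold are hypothetical names chosen for illustration, and the rule used here (drop a candidate when most of its area is covered by regions already recognized as headings, paragraphs, or images) is only an assumption about what such a filter might look like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def intersection(self, other: "Box") -> float:
        """Area shared by two boxes, or 0 if they do not overlap."""
        w = min(self.x1, other.x1) - max(self.x0, other.x0)
        h = min(self.y1, other.y1) - max(self.y0, other.y0)
        return max(0.0, w) * max(0.0, h)


def filter_table_candidates(candidates, non_table_regions, max_overlap=0.5):
    """Keep candidates that are not mostly covered by non-table regions.

    Assumes the non-table regions do not overlap each other, so that
    summing their intersections with a candidate does not double count.
    """
    kept = []
    for cand in candidates:
        if cand.area() == 0:
            continue  # degenerate prediction, nothing to verify
        covered = sum(cand.intersection(r) for r in non_table_regions)
        if covered / cand.area() <= max_overlap:
            kept.append(cand)
    return kept


# Illustrative data: one paragraph region and two predicted tables.
paragraphs = [Box(50, 100, 550, 300)]
predicted = [Box(60, 120, 540, 280), Box(50, 350, 550, 600)]
print(filter_table_candidates(predicted, paragraphs))
```

In this example the first prediction lies entirely inside a recognized paragraph and is discarded, while the second does not touch any non-table region and is kept.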
A Monte Carlo simulation of the caffeine-DNA interaction in aqueous solution at room temperature was carried out using parallel calculations on a supercomputer. Very large simulation boxes were used, containing a superhelical B-DNA fragment surrounded by caffeine and water molecules. The most probable binding sites of caffeine molecules on the DNA surface, as well as structural features of the respective caffeine-DNA complexes, were revealed for several solution concentrations.
Dynamic symbolic execution (DSE) is a powerful method for path exploration during hybrid fuzzing and automatic bug detection. We propose security predicates to effectively detect undefined behavior and memory access v...
In this paper, we present a novel stochastic method for solving variational inequalities (VI) in the context of Markovian noise. By leveraging the Extragradient technique, we can productively solve VI optimization problem...
The two-dimensional problem of viscous laminar flow around Zhukovsky airfoils at an angle of attack is considered. Based on the local similarity approach, which was proposed by Kochin and Loytsyansky for the equations of the laminar boundary layer, we find the shear stresses at the airfoil and the coordinates of the separation points. Assuming the values of the velocities at the separation points to be equal, we find the value of the circulation. A complete solution to the problem of the velocity and pressure field outside the boundary layer is also constructed. The theoretical results are compared with the available experimental data and numerical simulations of the Navier-Stokes equations.
ISBN: 9781509030071 (print)
Based on an analysis of the Monte-Carlo Tree Search (MCTS) method and the specific features of its behavior in various use cases, the paper proposes a new variant of the method, called Monte-Carlo Tree Search with Tree Shape Control (MCTS-TSC). It uses original Depth-Width Criteria (DWCs) both for tree shape estimation and control during search and for the estimation and selection of potentially better options for search continuation. The proposed Tree Shape Control (TSC) technique can be used together with some other tuning, pruning, and learning techniques. In addition, it can provide better scheduling of MCTS parallelization.
ISBN: 9781728112763; 9781728112756 (print)
In this work, we tackle the problem of Armenian named entity recognition, providing silver- and gold-standard datasets as well as establishing baseline results on popular models. We present a 163000-token named entity corpus automatically generated and annotated from Wikipedia, and another 53400-token corpus of news sentences with manual annotation of person, organization, and location named entities. The corpora were used to train and evaluate several popular named entity recognition models. Alongside the datasets, we release 50-, 100-, 200-, and 300-dimensional GloVe word embeddings trained on a collection of Armenian texts from Wikipedia, news, blogs, and encyclopedia.
ISBN: 9798350376968 (digital); 9798350376975 (print)
The causes of sudden cardiac death (SCD) have not yet been completely studied. At the same time, its share of mortality from heart disease is constantly growing. The use of artificial intelligence (AI) technology for timely diagnosis of the risk of sudden cardiac death, in particular for ECG analysis, is showing impressive and promising results. Simple ECG analysis alone, however, is not enough for an accurate prediction. AI can therefore be a powerful tool for testing hypotheses about additional medical factors that can improve diagnostic accuracy. In practice, the primary issue is the absence of established standards for AI use, coupled with the technical challenge of promptly transmitting ECG readings to the diagnostic center and receiving a real-time diagnosis from the AI system.
We present a native XML database management system, Sedna, which is implemented from scratch as a full-featured database management system for storing large amounts of XML data. We believe that the key contribution of...
In this paper, we propose a new dataset for information extraction from news web pages. Accurate collection of news articles is necessary to build systems that aggregate and analyze data from a large number of news sources. However, existing news page datasets are designed for the problem of article content extraction and do not consider other metadata such as dates or authors, while modern generic approaches for structured web data extraction are evaluated on other domains, mostly e-commerce product websites. Our dataset contains 724 web pages from 114 Russian news websites. On each page, we manually annotated the title, text, publication date, tags, and other article attributes. We describe the data collection and annotation process and present evaluation results for open-source tools and neural-network-based approaches. The dataset is publicly available on GitHub.