The taxing computational effort involved in solving some high-dimensional statistical problems, in particular problems involving non-convex optimization, has popularized the development and analysis of algorithms that run efficiently (in polynomial time) but with no general guarantee of statistical consistency. In light of ever-increasing compute power and decreasing costs, a more useful characterization of algorithms is by their ability to calibrate the invested computational effort to various characteristics of the input at hand and to the available computational resources. We propose a new greedy algorithm for the ℓ0-sparse PCA problem which supports this calibration principle. We provide both a rigorous analysis of our algorithm in the spiked covariance model and simulation results with comparisons to other existing methods. Our findings show that our algorithm recovers the spike in SNR regimes where all polynomial-time algorithms fail, while running in reasonable parallel time on a cluster.
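One natural greedy scheme for ℓ0-sparse PCA (an illustrative sketch, not necessarily the paper's exact algorithm) grows the support coordinate by coordinate, each time adding the coordinate that most increases the top eigenvalue of the restricted covariance submatrix:

```python
import numpy as np

def greedy_sparse_pca_support(cov, k):
    """Grow a support set of size k greedily: at each step, add the coordinate
    that maximizes the top eigenvalue of the restricted covariance submatrix.
    Illustrative sketch of a greedy scheme for l0-sparse PCA."""
    d = cov.shape[0]
    support = []
    for _ in range(k):
        best_j, best_val = None, float("-inf")
        for j in range(d):
            if j in support:
                continue
            idx = support + [j]
            top_eig = np.linalg.eigvalsh(cov[np.ix_(idx, idx)])[-1]
            if top_eig > best_val:
                best_j, best_val = j, top_eig
        support.append(best_j)
    return sorted(support)

# Exact spiked covariance: a sparse spike on coordinates {0, 1, 2} plus identity
d, k, snr = 20, 3, 5.0
v = np.zeros(d)
v[:k] = 1.0 / np.sqrt(k)
cov = snr * np.outer(v, v) + np.eye(d)
print(greedy_sparse_pca_support(cov, k))  # -> [0, 1, 2]
```

With the exact (noiseless) covariance, coordinates in the spike strictly increase the restricted top eigenvalue, so the greedy scan recovers the planted support; the statistical subtlety the abstract addresses arises when only a sample covariance in a low-SNR regime is available.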
ISBN:
(print) 9781510636446
In the fields of document analysis and recognition using mobile devices for capturing, and of object recognition in a video stream, an important problem is determining when the capturing process should be stopped. Efficient stopping influences not only the total time spent on recognition and data entry, but also the expected accuracy of the result. This paper extends the stopping method based on modelling the next integrated recognition result, so that it can be used within a string recognition result model with per-character alternatives. The stopping method and notes on its extension are described, and an experimental evaluation is performed on the open dataset MIDV-500. The method was compared with previously published methods based on clustering of the input observations. The obtained results indicate that the stopping method based on modelling the next integrated result achieves higher accuracy, even when compared with the best achievable configuration of the competing methods.
An approximate discovery of closed itemsets is usually based on either setting a frequency threshold or computing a sequence of projections. Both approaches, being incremental, do not provide any estimate of the size ...
The scope of uses of automated document recognition has extended, and as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of particular interest. However, it is not always possible to ensure controlled capturing conditions and, consequently, high quality of the input images. Unlike specialized scanners, mobile cameras allow using a video stream as input, thus obtaining several images of the recognized object captured with varying characteristics. In this case, the problem of combining the information from multiple input frames arises. In this paper, we propose a weighting model for the process of combining per-frame recognition results, two approaches to the weighted combination of text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera under different conditions, including perspective distortion of the document image and low lighting. The experimental results show that the weighted combination can improve the quality of the text recognition result in a video stream, and that the per-character weighting method with input image focus estimation as the base criterion achieves the best results on the datasets analyzed.
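A minimal sketch of a per-character weighted combination of the kind described above; the data layout, scores, and focus-derived frame weights here are illustrative assumptions, not the paper's exact model:

```python
def combine_per_character(frames):
    """Per-character weighted vote over recognition results from several frames.
    frames: list of (weight, alternatives), where alternatives[i] maps each
    candidate character at string position i to its recognition score, and the
    frame weight could come e.g. from a focus estimate (illustrative sketch)."""
    length = max(len(alts) for _, alts in frames)
    out = []
    for i in range(length):
        votes = {}
        for w, alts in frames:
            if i < len(alts):
                for ch, score in alts[i].items():
                    votes[ch] = votes.get(ch, 0.0) + w * score
        out.append(max(votes, key=votes.get))  # best weighted alternative
    return "".join(out)

# A sharp frame outweighs a blurry one that confuses 'A' with '4' and 'B' with '8'
sharp = (0.9, [{"A": 0.8, "4": 0.2}, {"B": 0.9}])
blurry = (0.2, [{"4": 0.9, "A": 0.1}, {"B": 0.6, "8": 0.4}])
print(combine_per_character([sharp, blurry]))  # -> "AB"
```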
Anytime algorithms are well suited for time-limited pathfinding problems: they find a feasible sub-optimal solution quickly and then continually work on improving it until time runs out. In this paper, a new anytime algorithm called Anytime Rectangle Expansion A* (AREA*) is introduced, which is the anytime variant of basic Rectangle Expansion A* (REA*). AREA* runs an accelerated sub-optimal search, after which an incomplete REA* repairs the sub-optimality. AREA* not only returns the first feasible solution much faster than existing anytime algorithms but also converges to the optimal solution at almost the same speed as REA*, which is an order of magnitude or more faster than A*. AREA* also provides narrower and more accurate sub-optimality bounds for its solutions. Experimental results on typical benchmark problem sets show that, compared with existing anytime techniques and the basic REA*, the new algorithm yields a significant performance improvement for time-limited pathfinding problems.
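The generic anytime pattern the abstract relies on (quickly produce a bounded-suboptimal solution, then tighten it toward optimality) can be sketched with plain weighted A* standing in for REA*; rectangle expansion itself is not implemented here:

```python
import heapq

def weighted_astar(grid, start, goal, w):
    """Weighted A* on a 4-connected grid of 0 (free) / 1 (blocked) cells.
    Returns (cost, path); the cost is within factor w of optimal."""
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan heuristic
    g = {start: 0}
    frontier = [(w * h(start), 0, start, [start])]
    while frontier:
        f, gc, p, path = heapq.heappop(frontier)
        if p == goal:
            return gc, path
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            q = (p[0] + dx, p[1] + dy)
            if (0 <= q[0] < len(grid) and 0 <= q[1] < len(grid[0])
                    and grid[q[0]][q[1]] == 0
                    and gc + 1 < g.get(q, float("inf"))):
                g[q] = gc + 1
                heapq.heappush(frontier, (g[q] + w * h(q), g[q], q, path + [q]))
    return None

def anytime_astar(grid, start, goal, weights=(4.0, 2.0, 1.0)):
    """Anytime scheme: rerun weighted A* with a shrinking weight, keeping the
    best (incumbent) solution; the current weight bounds its sub-optimality."""
    best = None
    for w in weights:
        res = weighted_astar(grid, start, goal, w)
        if res and (best is None or res[0] < best[0]):
            best = res
        yield w, best

grid = [[0] * 4 for _ in range(4)]
for w, best in anytime_astar(grid, (0, 0), (3, 3)):
    print(w, best[0])
```

The final w = 1.0 pass is plain A*, so interrupting at any earlier yield still returns a feasible path with a known bound, while running to completion yields the optimum.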
ISBN:
(print) 9781713832621
Multi-Agent Path Finding (MAPF) is the challenging problem of computing collision-free paths for a cooperative team of moving agents. Algorithms for solving MAPF can be categorized on a spectrum. At one end are (bounded-sub)optimal algorithms that can find high-quality solutions for small problems. At the other end are unbounded-suboptimal algorithms (including prioritized and rule-based algorithms) that can solve very large practical problems but usually find low-quality solutions. In this paper, we consider a third approach that combines both advantages: anytime algorithms that quickly find an initial solution, including for large problems, and subsequently improve it toward near-optimality as time progresses. To improve the solution, we replan subsets of agents using Large Neighborhood Search, a popular meta-heuristic often applied in combinatorial optimization. Empirically, we compare our algorithm MAPF-LNS to the state-of-the-art anytime MAPF algorithm Anytime BCBS and report significant gains in scalability, runtime to the first solution, and speed of improving solutions.
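The Large Neighborhood Search loop at the core of this approach (destroy part of the incumbent solution, repair it, keep improvements) can be illustrated on a toy assignment problem standing in for multi-agent replanning; this is a generic LNS sketch, not the paper's implementation:

```python
import random

def lns_assignment(cost, iters=200, k=2, seed=0):
    """LNS on a toy assignment problem (agent -> slot, minimize summed cost):
    repeatedly 'destroy' k agents' assignments and greedily repair them using
    the freed slots, accepting only improvements. The destroyed subset plays
    the role of the agents whose paths get replanned in MAPF-LNS."""
    rng = random.Random(seed)
    n = len(cost)
    sol = list(range(n))  # sol[agent] = slot; start from the identity
    def total(s):
        return sum(cost[a][s[a]] for a in range(n))
    best_val = total(sol)
    for _ in range(iters):
        removed = rng.sample(range(n), k)          # destroy: pick a neighborhood
        free = [sol[a] for a in removed]
        cand = sol[:]
        for a in removed:                          # repair: cheapest free slot
            slot = min(free, key=lambda s: cost[a][s])
            cand[a] = slot
            free.remove(slot)
        val = total(cand)
        if val < best_val:                         # accept only improvements
            sol, best_val = cand, val
    return best_val, sol

# Each agent a is cheap only on slot (a + 1) % 4, so the identity start is poor
cost = [[5, 0, 5, 5], [5, 5, 0, 5], [5, 5, 5, 0], [0, 5, 5, 5]]
print(lns_assignment(cost))
```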
This work is about speeding up retrieval in Case-Based Reasoning (CBR) for large-scale case bases (CBs) comprised of temporally related cases in metric spaces. A typical example is a CB of electronic health records where consecutive sessions of a patient form a sequence of related cases. k-Nearest Neighbors (kNN) search is a widely used algorithm in CBR retrieval. However, brute-force kNN is impractical for large CBs. As a contribution to efforts to speed up kNN search, we introduce an anytime kNN search methodology and algorithm. Anytime Lazy kNN finds exact kNNs when allowed to run to completion, with a remarkable gain in execution time achieved by avoiding unnecessary neighbor assessments. For applications where the gain of exact kNN search may not suffice, it can be interrupted earlier, returning best-so-far kNNs together with a confidence value attached to each neighbor. We describe the algorithm and a methodology for constructing a probabilistic model that we use both to estimate confidence upon interruption and to automate interruption at desired confidence thresholds. We present the results of experiments conducted with publicly available datasets. The results show superior gains compared to brute-force search: we reach an average gain of 87.18% with 0.98 confidence and of 96.84% with 0.70 confidence. (C) 2020 Elsevier B.V. All rights reserved.
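An interruptible kNN scan of the kind the abstract describes can be sketched as follows; the "confidence" here is simply the fraction of the case base examined, a crude stand-in for the paper's probabilistic model:

```python
import heapq

def anytime_knn(query, data, k, dist, budget=None):
    """Interruptible brute-force kNN: scan candidates one by one, keeping a
    max-heap of the k best so far. If a 'budget' of distance evaluations is
    exhausted, return the best-so-far neighbours plus the fraction of the case
    base examined (illustrative stand-in for a real confidence estimate)."""
    heap = []  # max-heap via negated distances
    seen = 0
    for x in data:
        if budget is not None and seen >= budget:
            break  # "interruption": stop early, answer with what we have
        d = dist(query, x)
        seen += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d, x))
        elif -d > heap[0][0]:
            heapq.heapreplace(heap, (-d, x))
    neighbours = sorted((-nd, x) for nd, x in heap)  # (distance, item) pairs
    return neighbours, seen / len(data)

data = [5, 1, 9, 3, 7]
print(anytime_knn(4, data, 2, lambda a, b: abs(a - b)))  # -> ([(1, 3), (1, 5)], 1.0)
```

Run to completion, the answer is exact; interrupted, it is best-so-far with a quantified coverage, which is the contract an anytime retrieval algorithm offers.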
The paper describes the problem of stopping the text field recognition process in a video stream, a novel problem particularly relevant to real-time mobile document recognition systems. A decision-theoretic framework for this problem is provided, and similarities with existing stopping rule problems are explored. Following theoretical work on monotone stopping rule problems, a strategy is proposed based on thresholding an estimate of the expected difference between consecutive recognition results. The efficiency of this strategy is evaluated on an openly accessible dataset. The results show that this method outperforms previously published methods based on thresholding the size of clusters of identical results. Notes on future work include incorporating recognition result confidence estimations into the proposed model and a more precise evaluation of the observation cost.
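The thresholding strategy can be sketched as follows, with a deliberately simplified next-result model (a uniform draw over frames already seen) standing in for the paper's estimator:

```python
def expected_next_difference(current, observed, dist):
    """Estimate the expected distance between the current integrated result and
    the result after one more frame, modelling the next frame as a uniform draw
    from the frames already seen (a simplified stand-in for the paper's model)."""
    if not observed:
        return float("inf")
    return sum(dist(current, o) for o in observed) / len(observed)

def should_stop(current, observed, dist, threshold=0.1):
    """Monotone-rule-style stopping: stop capturing once the estimated expected
    change of the recognition result drops below the threshold."""
    return expected_next_difference(current, observed, dist) <= threshold

def char_dist(a, b):
    """Normalized per-character mismatch for equal-length strings (toy metric)."""
    return sum(x != y for x, y in zip(a, b)) / max(len(a), 1)

# After one noisy frame ('O' misread as '0') and three clean ones, the expected
# change is small, so capturing can stop
frames = ["HELL0", "HELLO", "HELLO", "HELLO"]
print(should_stop("HELLO", frames, char_dist, threshold=0.1))  # -> True
```

The threshold encodes the observation cost: a higher cost per extra frame justifies a higher threshold and hence earlier stopping.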
ISBN:
(print) 9783030109288; 9783030109271
Subgroup discovery is the task of discovering patterns that accurately discriminate one class label from the others. Existing approaches uncover such patterns through either an exhaustive or an approximate exploration of the pattern search space. However, exhaustive exploration is generally infeasible, whereas approximate approaches provide no guarantee bounding the error on the best pattern quality, nor on the progression of the exploration ("how far are we from an exhaustive search?"). We design here an algorithm for mining numerical data with three key properties w.r.t. the state of the art: (i) it progressively yields interval patterns whose quality improves over time; (ii) it can be interrupted anytime and always gives a guarantee bounding the error on the top pattern quality; and (iii) it always bounds the distance to the exhaustive exploration. After reporting experiments showing the effectiveness of our method, we discuss its generalization to other kinds of patterns. Code related to this paper is available at: https://***/Adnene93/RefineAndMine.
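The coarse-to-fine anytime idea can be illustrated on a single numeric attribute: refine candidate cut-points progressively, always keeping the best interval found so far and a measure of how much of the finest grid has been explored. This is a much-simplified sketch (single attribute, fixed resolution schedule), not the RefineAndMine algorithm itself:

```python
from itertools import combinations

def wracc(inside, pos_total, n_total):
    """Weighted relative accuracy of the subgroup covering 'inside' labels."""
    n_in = len(inside)
    if n_in == 0:
        return 0.0
    return (n_in / n_total) * (sum(inside) / n_in - pos_total / n_total)

def anytime_interval_mining(xs, ys, resolutions=(2, 4, 8)):
    """Evaluate all intervals over progressively finer cut-point grids, yielding
    after each level: (best interval, best WRAcc, fraction of the finest grid's
    intervals explored). Assumes each resolution divides the next, so coarser
    cut-points are a subset of finer ones."""
    pos, n = sum(ys), len(ys)
    lo, hi = min(xs), max(xs)
    finest = resolutions[-1]
    total_pairs = (finest + 1) * finest // 2
    best, best_q = None, float("-inf")
    for r in resolutions:
        cuts = [lo + (hi - lo) * i / r for i in range(r + 1)]
        for a, b in combinations(cuts, 2):
            inside = [y for x, y in zip(xs, ys) if a <= x <= b]
            q = wracc(inside, pos, n)
            if q > best_q:
                best, best_q = (a, b), q
        explored = (r + 1) * r // 2  # distinct intervals seen so far
        yield best, best_q, explored / total_pairs

# Binary labels perfectly separated at x = 4.5
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
for best, q, frac in anytime_interval_mining(xs, ys):
    print(best, round(q, 3), round(frac, 3))
```

Interrupting after any level returns the incumbent interval together with the explored fraction, which is the kind of progression guarantee ("how far from exhaustive") the abstract emphasizes; the paper's actual guarantee additionally bounds the quality error.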