Computing intersections among sets of one-dimensional intervals is a ubiquitous problem in computational geometry with important applications in bioinformatics, where the size of typical inputs is large and it is therefore important to use efficient algorithms. In this paper we propose a parallel algorithm for the 1D intersection-counting problem, that is, the problem of counting the number of intersections between each interval in a given set A and every interval in a set B. Our algorithm is suitable for shared-memory architectures (e.g., multicore CPUs) and GPUs. The algorithm is work-efficient because it performs the same amount of work as the best serial algorithm for this kind of problem. Our algorithm has been implemented in C++ using the Thrust parallel algorithms library, enabling the generation of optimized programs for multicore CPUs and GPUs from the same source code. The performance of our algorithm is evaluated on synthetic and real datasets, showing good scalability on different generations of hardware.
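For illustration, the sketch below shows one standard Thrust-based formulation of 1D intersection counting: sort the start and end points of B, then use vectorized binary searches to count, for each interval in A, how many intervals in B it intersects. This is a minimal sketch and not necessarily the paper's algorithm; the separate lo/hi arrays, closed-interval intersection test, and example data are assumptions made here.

```cpp
// Minimal sketch (not the paper's algorithm): for each interval [a_lo, a_hi] in A,
// count intervals [b_lo, b_hi] in B with b_lo <= a_hi and b_hi >= a_lo, i.e.
//   count = #{b : b_lo <= a_hi} - #{b : b_hi < a_lo}
// using sorted endpoints of B and Thrust's vectorized binary searches.
// The same source compiles for GPUs (CUDA backend) or multicore CPUs (OpenMP/TBB backends).
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sort.h>
#include <thrust/binary_search.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <vector>
#include <iostream>

int main() {
    // Example intervals (closed), stored as separate lo/hi arrays.
    std::vector<int> a_lo_h{0, 5, 10},    a_hi_h{4, 9, 20};
    std::vector<int> b_lo_h{1, 6, 15, 30}, b_hi_h{3, 12, 25, 40};
    thrust::device_vector<int> a_lo(a_lo_h.begin(), a_lo_h.end());
    thrust::device_vector<int> a_hi(a_hi_h.begin(), a_hi_h.end());
    thrust::device_vector<int> b_lo(b_lo_h.begin(), b_lo_h.end());
    thrust::device_vector<int> b_hi(b_hi_h.begin(), b_hi_h.end());

    // Sort B's start points and end points independently.
    thrust::sort(b_lo.begin(), b_lo.end());
    thrust::sort(b_hi.begin(), b_hi.end());

    // starts_leq[i] = #{b : b_lo <= a_hi[i]},  ends_lt[i] = #{b : b_hi < a_lo[i]}
    thrust::device_vector<int> starts_leq(a_lo.size()), ends_lt(a_lo.size()), counts(a_lo.size());
    thrust::upper_bound(b_lo.begin(), b_lo.end(), a_hi.begin(), a_hi.end(), starts_leq.begin());
    thrust::lower_bound(b_hi.begin(), b_hi.end(), a_lo.begin(), a_lo.end(), ends_lt.begin());

    // Per-interval intersection counts: starts_leq - ends_lt.
    thrust::transform(starts_leq.begin(), starts_leq.end(), ends_lt.begin(),
                      counts.begin(), thrust::minus<int>());

    thrust::host_vector<int> h_counts = counts;
    for (int c : h_counts) std::cout << c << ' ';   // expected: 1 1 2
    std::cout << '\n';
    return 0;
}
```

Each query costs two O(log |B|) binary searches after an O(|B| log |B|) sort, and all queries over A run in parallel, which is one way a Thrust implementation can target both CPU and GPU backends from the same source.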
ISBN (print): 9781450388979
The increasing availability of large volumes of traffic data has led to the development of several short-term traffic prediction models. Training these models is a computationally intensive process due to the volume of available traffic data. Therefore, having effective methods for accelerating this process is considered necessary. In this paper, we propose an efficient method for accelerating the training process of multiple short-term traffic prediction models in large-scale traffic networks. In particular, the traffic data is organized into separate files so that the training process for one model is independent of the others. These files are distributed in the cores of a shared-memory multicore processor so as to train multiple models simultaneously. Appropriate measures have been taken to limit the memory footprint of the proposed method, as well as to enhance its load balancing capabilities. The proposed method was applied to five short-term traffic prediction models, and evaluated using large-scale real-world traffic data. Preliminary experimental results indicate that the proposed method exhibits nearly linear speedup for the training process of all models, while maintaining their prediction performance.
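To make the parallel training scheme concrete, the sketch below distributes independent per-model training tasks, one per data file, across the cores of a shared-memory machine using a dynamic work queue. This is a minimal sketch under assumptions made here, not the paper's implementation: train_model() and the file names are hypothetical placeholders, and the load-balancing and memory-limiting measures are reduced to "each worker claims and loads one file at a time".

```cpp
// Minimal sketch (not the paper's implementation): train multiple independent
// short-term traffic prediction models concurrently on a shared-memory machine.
// Dynamic scheduling via an atomic work index provides basic load balancing;
// each worker handles one file at a time, which bounds the memory footprint.
#include <atomic>
#include <string>
#include <thread>
#include <vector>
#include <iostream>

// Hypothetical placeholder: load one data file and train the corresponding model.
void train_model(const std::string& data_file) {
    std::cout << "training model on " + data_file + "\n";
}

int main() {
    // One file per model, so each training task is independent of the others.
    std::vector<std::string> files = {"link_001.csv", "link_002.csv",
                                      "link_003.csv", "link_004.csv"};

    std::atomic<std::size_t> next{0};
    unsigned workers = std::max(1u, std::thread::hardware_concurrency());

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&] {
            // Each worker repeatedly claims the next unprocessed file, so
            // faster workers automatically take on more tasks (load balancing).
            for (std::size_t i = next.fetch_add(1); i < files.size();
                 i = next.fetch_add(1)) {
                train_model(files[i]);
            }
        });
    }
    for (auto& t : pool) t.join();
    return 0;
}
```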