Multicore architectures make it possible to increase system performance through parallel processing. One of the challenges of a multicore embedded system is the correct usage of the processor cores. It is possible to achieve a balanced processor load across the cores, but the communication bandwidth between the cores is often a bottleneck. Passing large amounts of data between tasks mapped to different processor cores can result in cache misses in the local cache of a processor core. This paper introduces an analysis method based on runtime-generated data flow graphs to find the data paths of an algorithm. It shows that spectral cluster analysis can help to discover data-independent subsets in the algorithm under test. Finding the data-independent parts helps to partition the program into multiple slices such that inter-slice communication is kept as low as possible. With the proposed method the communication bottleneck can be avoided in a multicore, multitask implementation, possibly resulting in better performance.
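As a rough illustration of the partitioning idea, the sketch below splits a weighted data flow graph into two slices by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue of the graph Laplacian). This is plain spectral bisection, a simplification of spectral cluster analysis and not necessarily the paper's procedure. The function name `spectral_bisect`, the task names, and the edge weights (standing for the amount of data passed between tasks) are assumptions made for the example.

```python
import random


def spectral_bisect(nodes, edges, iters=500, seed=0):
    """Split an undirected weighted graph in two by the sign of the Fiedler vector.

    nodes: list of hashable task names (hypothetical).
    edges: list of (task_a, task_b, weight) tuples; weight models data volume.
    """
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    # Symmetric weighted adjacency matrix W.
    w = [[0.0] * n for _ in range(n)]
    for a, b, weight in edges:
        i, j = index[a], index[b]
        w[i][j] += weight
        w[j][i] += weight
    degree = [sum(row) for row in w]
    # Every Laplacian eigenvalue is at most 2 * max degree, so
    # (shift * I - L) has only positive eigenvalues.
    shift = 2.0 * max(degree) + 1.0

    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Remove the constant component (the trivial eigenvector of L).
        mean = sum(x) / n
        x = [v - mean for v in x]
        # Power iteration step: y = (shift * I - L) x, with L = D - W.
        y = [(shift - degree[i]) * x[i] + sum(w[i][j] * x[j] for j in range(n))
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]

    part_a = {nodes[i] for i in range(n) if x[i] >= 0}
    part_b = set(nodes) - part_a
    return part_a, part_b
```

Tasks that exchange a lot of data end up in the same slice, so each slice could be mapped to its own core with little traffic crossing between them. Which of the two returned sets holds which slice depends on the arbitrary sign of the eigenvector.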
Floating-point computing with more than one TFLOP of peak performance is already a reality in recent Field-Programmable Gate Arrays (FPGAs). General-Purpose Graphics Processing Units (GPGPUs) and recent many-core CPUs have also taken advantage of recent technological innovations in integrated circuit (IC) design and have dramatically improved their peak performance. In this paper, we compare the trends of these computing architectures for high-performance computing and survey these platforms in the execution of algorithms belonging to different scientific application domains. Trends in peak performance, power consumption and sustained performance for particular applications show that FPGAs are falling further behind GPUs and many-core CPUs, which moves them away from high-performance computing with intensive floating-point calculations. FPGAs become competitive for custom floating-point or fixed-point representations, for smaller input sizes of certain algorithms, for combinational logic problems and for parallel map-reduce problems.
Several supercomputer vendors now offer reconfigurable computing (RC) systems, combining general-purpose processors with field-programmable gate arrays (FPGAs). The FPGAs can be configured as custom computing architectures for the computationally intensive parts of each application. In this paper we present an RC-based hardware accelerator for an important medical imaging algorithm: iterative sparse Fourier image reconstruction. We transform the algorithm to exploit the massive parallelism available in the FPGA fabric. Our design allows different ways of chaining custom pipelined vector engines, so that different computations can be carried out without reconfiguration overhead. Measured runtime performance data show that we achieve up to a 10-fold speedup compared to the software-only version. The design is estimated to provide even more speedup on a next-generation RC platform.
The amount of space debris has increased tremendously in the last decade, arousing the interest of experts in the field. Space surveillance is a first step in monitoring the traffic of floating objects and has several applications, such as correcting the orbit coordinates of satellites or collision avoidance. This paper proposes an improved and flexible framework for real-time detection of satellites using a cheap optical surveillance system. The detection method is based on the Radon transform. The satellite candidates obtained after processing the Radon space are validated by imposing constraints on the satellite streak length and brightness, and on the stereo matching. We additionally propose a parallel approach for the Radon transform on the GPU in order to meet the real-time constraints. We test our method on a large and varied data set containing satellites from different orbit ranges, namely medium and high orbits. An average accuracy above 95% was obtained for real-time satellite detection, with minimal false positives.
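To make the role of the Radon transform concrete, the following sketch accumulates pixel intensities along lines parameterised by an angle and an offset, then reports the strongest line. It is a coarse, CPU-only, nearest-bin approximation written for illustration, not the GPU implementation described in the abstract. The function names and the 180-angle resolution are assumptions.

```python
import math


def radon_accumulate(image, n_angles=180):
    """Nearest-bin discrete Radon accumulator.

    acc[t][rho + offset] sums the pixels close to the line
    x*cos(theta) + y*sin(theta) = rho, with theta = t * pi / n_angles.
    """
    height, width = len(image), len(image[0])
    offset = int(math.ceil(math.hypot(height, width)))
    acc = [[0.0] * (2 * offset + 1) for _ in range(n_angles)]
    for t in range(n_angles):
        theta = math.pi * t / n_angles
        c, s = math.cos(theta), math.sin(theta)
        for y in range(height):
            for x in range(width):
                value = image[y][x]
                if value:
                    rho = int(round(x * c + y * s))
                    acc[t][rho + offset] += value
    return acc, offset


def strongest_line(image, n_angles=180):
    """Return (accumulated intensity, angle in degrees, rho) of the peak bin."""
    acc, offset = radon_accumulate(image, n_angles)
    best = (0.0, 0.0, 0)
    for t, row in enumerate(acc):
        for r, value in enumerate(row):
            if value > best[0]:
                best = (value, 180.0 * t / n_angles, r - offset)
    return best
```

A straight satellite streak shows up as a sharp peak in the accumulator. Because each angle is processed independently, the outer loop is the natural unit to distribute across GPU threads. On small images several neighbouring angles can tie for the peak, so the reported angle is only accurate to within a few degrees.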
Any digitization system must be preceded by an anti-aliasing filter. For wideband high-frequency applications, parallel multi-rate conversion systems such as time-interleaved analog-to-digital converters (TI-ADCs) or hybrid filter bank converters (HFBs) are attractive solutions. This paper compares the robustness of both techniques with respect to non-idealities of the anti-aliasing filter (AAF). Theoretical results show that the signal-to-noise ratio (SNR) degradation due to out-of-band signals is smaller for HFBs than for TI-ADCs, provided that the analysis filters of the HFB are selective enough. Simulation results show that this is the case even for low-order analysis filters in a four-channel HFB.
Matching vehicle location data onto a road map is essential for many ITS (Intelligent Transportation System) applications. However, with the growing deployment of GPS devices in vehicles, the accumulation of huge amounts of GPS data poses a great challenge to the efficiency and scalability of traditional serial map matching algorithms. In this paper we address the challenge by presenting a novel parallel map matching algorithm for high-performance processing of GPS data. The main idea is to adapt the serial map matching algorithm to a cloud computing environment by recasting its data-intensive or I/O-intensive computing stages in the MapReduce paradigm. We implemented the algorithm on the Hadoop platform and tested its performance on a large dataset exceeding 120 billion GPS records. Experimental results show that our approach is highly efficient and scalable for processing massive historical GPS data.
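The MapReduce recasting can be pictured with a small in-memory sketch: the map step keys each GPS record by a grid cell, and the reduce step snaps every point in a cell to the nearest road segment indexed under that cell. This is not the Hadoop API and not the paper's algorithm. The grid size `CELL`, the record layout `(record_id, x, y)`, and the road index `roads_by_cell` are all assumptions chosen for the example.

```python
import math
from collections import defaultdict

CELL = 1.0  # hypothetical grid cell size, in the same units as the coordinates


def cell_of(x, y):
    """Grid cell that contains the point (x, y)."""
    return (int(x // CELL), int(y // CELL))


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection parameter so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def map_phase(records):
    """Emit (cell, record) pairs so nearby points reach the same reducer."""
    for record_id, x, y in records:
        yield cell_of(x, y), (record_id, x, y)


def reduce_phase(cell, points, roads_by_cell):
    """Match each point of one cell to its nearest road segment."""
    candidates = roads_by_cell.get(cell, [])
    for record_id, x, y in points:
        if not candidates:
            yield record_id, None
            continue
        best = min(candidates,
                   key=lambda road: point_segment_distance((x, y), road[1], road[2]))
        yield record_id, best[0]


def run_map_matching(records, roads_by_cell):
    """Simulate map -> shuffle -> reduce in a single process."""
    groups = defaultdict(list)
    for key, value in map_phase(records):
        groups[key].append(value)
    matched = {}
    for cell, points in groups.items():
        matched.update(reduce_phase(cell, points, roads_by_cell))
    return matched
```

Each cell is reduced independently, which is the property that lets the work spread over a cluster. A real map matcher would also look at neighbouring cells and at trajectory continuity, both of which this sketch leaves out.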
With the rapid development of Internet technology, various network attack methods emerge one after another. SQL injection has become one of the most severe threats to Web applications and seriously threatens vario...
In this paper, Hamming distance is used to control individual difference when creating the initial population, and a peak-depot is established to preserve information about different peak-points. Some new methods are also put forward to improve the optimization performance of a genetic algorithm (GA), such as the point-cast method and a neighborhood search strategy around peak-points. These methods are applied in the genetic operations, including crossover and mutation, in order to obtain a global optimum solution and avoid premature convergence of the GA. By means of a set of control rules and the peak-depot, the new algorithm carries out an optimum search around several peak-points. As the individuals of the population evolve, the fitness of the peak-points in the peak-depot increases continually, and a global optimum solution can be obtained. Because the new algorithm searches around several peak-points, the probability of obtaining the best global optimum solution increases. Results on test examples indicate that the modified genetic algorithm is effective in solving both linear and non-linear optimization problems with restrictive functions. (C) 2002 Elsevier Science B.V. All rights reserved.
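One way to read "Hamming distance is used to control individual difference" is rejection sampling: a random bit string joins the initial population only if it differs from every accepted individual in at least a minimum number of bits. The sketch below follows that reading. The threshold `min_distance`, the retry limit, and the function names are assumptions, and the abstract does not confirm that the paper uses this exact rule.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def diverse_population(size, length, min_distance, seed=0, max_tries=10000):
    """Build an initial GA population whose members are pairwise at least
    min_distance apart in Hamming distance (hypothetical acceptance rule)."""
    rng = random.Random(seed)
    population = []
    tries = 0
    while len(population) < size:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not satisfy the distance constraint")
        candidate = tuple(rng.randint(0, 1) for _ in range(length))
        if all(hamming(candidate, other) >= min_distance for other in population):
            population.append(candidate)
    return population
```

Spreading the initial individuals apart in this way gives the search several distinct starting regions, which fits the abstract's goal of exploring around multiple peak-points and avoiding premature convergence.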
As one of the most pervasive problems in computer science, string matching is the kernel algorithm in many applications, especially within the communities of information retrieval and computational biology. Meanwhile, the CPU+GPU heterogeneous parallel platform is becoming more and more popular for computing-intensive applications. This paper implements a webpage matching system with G-AC, a GPU-based advanced Aho-Corasick (AC) algorithm, which achieves almost 28 times the peak performance of the original AC algorithm taken from Snort.
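For readers unfamiliar with the baseline, here is a compact CPU-only sketch of the classic Aho-Corasick multi-pattern matcher. It is neither the G-AC variant nor Snort's implementation, and the function names are assumptions. It builds a trie with failure links and then scans the text once.

```python
from collections import deque


def build_automaton(patterns):
    """Build goto, failure and output tables for the Aho-Corasick automaton."""
    goto, fail, out = [{}], [0], [[]]
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)
    # Breadth-first traversal: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit the patterns that end at the failure target.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence in text."""
    goto, fail, out = build_automaton(patterns)
    state, matches = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return matches
```

The scan visits each character of the text once regardless of the number of patterns. GPU variants typically split the text into overlapping chunks and run one such scan per thread.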
Image inpainting refers to an image restoration process that reconstructs a damaged image, recovering its lost information from the information that remains. The PDE-based approach is commonly used for image interpolation, especially inpainting. Since the PDE process involves convolution and continuous change, the approach may take a lot of computational resources and run slowly on a standard CPU. To overcome this, a GPU parallel computing method for PDE-based image inpainting is proposed. Several convenient platforms and frameworks for using the GPU already exist, such as CUDA, Theano and TensorFlow. CUDA is a well-known parallel computing platform and programming model that works with programming languages such as C/C++. Theano and TensorFlow, on the other hand, are somewhat different: both are Python-based machine learning frameworks that can also use the GPU. Although Theano and TensorFlow are specialized for machine learning and deep learning, they are general enough to be applied to computational processes such as image inpainting. This work benchmarks PDE image inpainting running on the CPU using C++, Theano and TensorFlow, and on the GPU using CUDA, Theano and TensorFlow. The benchmark shows that PDE image inpainting accelerated by parallel computing runs faster on the GPU, whether with CUDA, Theano or TensorFlow, than on the CPU.
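A very small instance of PDE-based inpainting is isotropic diffusion: damaged pixels are repeatedly replaced by the average of their four neighbours, which is a Jacobi iteration for the Laplace equation inside the hole. The sketch below is plain Python on the CPU and is meant only to show the update rule. It is not the specific PDE model benchmarked in the paper, and the function name and iteration count are assumptions.

```python
def inpaint_diffusion(image, mask, iterations=200):
    """Fill masked pixels by iterating the 4-neighbour average (Laplace equation).

    image: 2D list of floats; values under the mask are ignored as data.
    mask:  2D list of booleans, True where the pixel is damaged.
    """
    height, width = len(image), len(image[0])
    current = [row[:] for row in image]
    holes = [(y, x) for y in range(height) for x in range(width) if mask[y][x]]
    for _ in range(iterations):
        updated = [row[:] for row in current]
        for y, x in holes:
            neighbours = [
                current[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < height and 0 <= nx < width
            ]
            # Jacobi step: read only from the previous iterate.
            updated[y][x] = sum(neighbours) / len(neighbours)
        current = updated
    return current
```

Within one iteration every damaged pixel is updated from the previous iterate only, so the updates are independent of each other. That independence is what makes the scheme a good fit for one-thread-per-pixel execution on a GPU.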