ISBN (print): 9781538651636
Low-latency image processing and a high frame rate (FPS, frames per second) are essential for high-resolution decision making in many object recognition applications. Reading frames in between the processing of a video is slow and sluggish because reading and decoding the frames are done in the main processing thread. Packages such as imutils provide off-the-shelf image processing algorithms that apply multi-threading to achieve low latency. However, these algorithms cannot handle computationally expensive image processing operations. In this paper, we apply a parallel processing technique based on co-incident multi-threading to decrease the latency for computationally expensive cases. The technique is evaluated using a prototype of a smart car to show that the FPS rate is increased and the time complexity of the algorithms is reduced by an order of n.
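As a minimal sketch of the baseline idea the abstract attributes to packages such as imutils (reading and decoding frames on a separate thread so the main thread only processes them), the Python below uses a bounded producer/consumer queue. It is not the paper's co-incident multi-threading technique; `ThreadedFrameReader` and `slow_source` are hypothetical names, and an iterable of integers stands in for real video frames.

```python
import queue
import threading
import time


class ThreadedFrameReader:
    """Hypothetical reader: frames are read on a background thread into a queue.

    `source` is any iterable of frames, standing in for a camera or video decoder.
    """

    def __init__(self, source, maxsize=64):
        self._source = iter(source)
        self._frames = queue.Queue(maxsize=maxsize)  # bounded buffer of decoded frames
        self._end = object()  # sentinel marking the end of the stream
        self._thread = threading.Thread(target=self._read_loop, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _read_loop(self):
        # Reading/decoding runs here, off the main processing thread.
        for frame in self._source:
            self._frames.put(frame)  # blocks while the buffer is full
        self._frames.put(self._end)

    def __iter__(self):
        # The main thread only takes frames that are already decoded.
        while True:
            frame = self._frames.get()
            if frame is self._end:
                return
            yield frame


def slow_source(n, delay=0.001):
    """Hypothetical decoder stand-in: each frame takes `delay` seconds to produce."""
    for i in range(n):
        time.sleep(delay)
        yield i


reader = ThreadedFrameReader(slow_source(100)).start()
results = [frame * 2 for frame in reader]  # placeholder per-frame processing
print(len(results))  # 100
```

The bounded queue keeps memory in check: if processing falls behind, the reader thread blocks instead of buffering the whole video. This decoupling hides read/decode latency but does not by itself speed up an expensive processing step, which is the gap the abstract says the paper addresses.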
ISBN (print): 0769521355
Efficient load balancing algorithms are the key to many efficient parallel applications. Until now, research in this area has mainly focused on static networks. However, observations show that diffusive algorithms, originally designed for these networks, can also be applied in non-static scenarios. In this paper, we prove that the general diffusion scheme can be deployed on dynamic networks and show that its convergence rate depends on the average value of the quotient of the second smallest eigenvalue and the maximum vertex degree of the networks occurring during the iterations. In the presented experiments, we illustrate that even if communication links of static networks fail with high probability, load can still be balanced quite efficiently. Simulating diffusion on ad-hoc networks, we demonstrate that diffusive schemes provide a reliable and efficient load balancing strategy in mobile environments as well.
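A first-order diffusion scheme on a changing network can be sketched as below, assuming a synchronous model in which, every round, each link of a fixed base graph independently fails with probability `fail_prob`. The update is x ← x − αL_k x, where L_k is the Laplacian of the links alive in round k. The step size α = 1/(max degree + 1) is a common textbook choice and not necessarily the one analyzed in the paper, whose general diffusion scheme covers more variants; the function names and the ring example are hypothetical.

```python
import random


def diffusion_step(load, edges, alpha):
    """One first-order diffusion step, x <- x - alpha * L x, over the given edges."""
    new_load = list(load)
    for u, v in edges:
        flow = alpha * (load[u] - load[v])  # load moves from the heavier to the lighter node
        new_load[u] -= flow
        new_load[v] += flow
    return new_load


def balance_on_dynamic_network(load, edges, rounds, fail_prob, seed=0):
    """Diffuse load while each link independently fails with probability fail_prob per round."""
    rng = random.Random(seed)
    degree = [0] * len(load)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # alpha = 1 / (max degree + 1) keeps each step a weighted average of neighbours;
    # surviving subgraphs never have a larger degree, so it stays valid every round.
    alpha = 1.0 / (max(degree) + 1)
    for _ in range(rounds):
        alive = [e for e in edges if rng.random() >= fail_prob]  # network in this round
        load = diffusion_step(load, alive, alpha)
    return load


# Ring of 16 processors, all 160 units of load initially on processor 0, 50% link failures.
ring = [(i, (i + 1) % 16) for i in range(16)]
final = balance_on_dynamic_network([160.0] + [0.0] * 15, ring, rounds=2000, fail_prob=0.5)
print(round(sum(final), 6), round(max(final) - min(final), 6))
```

Because every edge moves the same amount out of one node and into the other, total load is conserved in every round regardless of which links are up; link failures only slow convergence, in line with the abstract's observation that load still balances when links fail with high probability.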
ISBN (print): 9781479904945; 9781479904938
Complex networks are a technique for the modeling and analysis of large data sets in many scientific and engineering disciplines. Due to their excessive size, conventional algorithms and single-core processors struggle with the efficient processing of such networks. Employing multi-core graphics processing units (GPUs) could provide sufficient processing power for their analysis. However, commonly designed algorithms cannot exploit this massively parallel processing power. In this paper, we present the Multi Layer Network Decomposition (MLND) approach, a general method for parallel network analysis on multi-core processors that efficiently partitions and maps networks onto GPU architectures. An evaluation on a 336-core GPU graphics card demonstrated a 16x speed-up in complex network analysis relative to a CPU-based approach.
Reducing communication overhead has been widely recognized as a requirement for achieving efficient mappings, which substantially reduce the execution time of parallel algorithms. This paper presents an iterative heuri...
ISBN (print): 9783540729044
Analytical models for adaptive routing in multicomputer interconnection networks with traditional non-bursty Poisson traffic have been widely reported in the literature. However, traffic loads generated by many real-world parallel applications may exhibit bursty and batch arrival properties, which can significantly affect network performance. This paper develops a new and concise analytical model for hypercubic networks in the presence of bursty and batch arrival traffic, modelled by the Compound Poisson Process (CPP) with geometrically distributed batch sizes. The computational complexity of the model is independent of network size. The analytical results are validated by comparison with those obtained from simulation experiments. The model is then used to evaluate the effects of bursty traffic with batch arrivals on the performance of interconnection networks.
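A traffic source of the kind the model assumes can be simulated as below: batches arrive with exponentially distributed gaps (a Poisson process with rate `rate`), and each batch size is geometric on {1, 2, ...} with parameter `p`, so the mean batch size is 1/p and the mean message rate is rate/p. This is only an illustrative generator, not the paper's analytical model; the function names and parameter values are assumptions.

```python
import random


def geometric_batch_size(rng, p):
    """Batch size B on {1, 2, ...} with P(B = k) = (1 - p)**(k - 1) * p, so E[B] = 1/p."""
    size = 1
    while rng.random() >= p:  # each extra message is added with probability 1 - p
        size += 1
    return size


def compound_poisson_batches(rate, p, horizon, seed=0):
    """(time, batch size) pairs of a compound Poisson process on [0, horizon).

    Batches arrive as a Poisson process with the given rate (exponential gaps);
    each batch carries a geometrically distributed number of messages.
    """
    rng = random.Random(seed)
    t, batches = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return batches
        batches.append((t, geometric_batch_size(rng, p)))


batches = compound_poisson_batches(rate=2.0, p=0.5, horizon=10_000.0)
messages = sum(size for _, size in batches)
# Expected message rate is rate * E[B] = rate / p = 4 messages per time unit.
print(len(batches) / 10_000.0, messages / 10_000.0)
```

Compared with plain Poisson traffic at the same mean message rate, batching concentrates messages into bursts, which is the property the abstract says can significantly affect network performance.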
ISBN (print): 9783540729044
Separate grid systems are becoming new information islands as more and more grid systems are deployed. Grid interoperation is one way to address this problem. This paper describes the implementation of data interoperation between ChinaGrid and SRB. The data interoperation between them is divided into two parts: data access from SRB to ChinaGrid, and data access from ChinaGrid to SRB. This paper also considers performance optimization issues, and the optimization measures yield satisfactory experimental results.
FFT processors are used in high-speed signal processing applications; to meet the required speed, an FFT processor uses pipelining and parallel processing algorithms to provide such high-speed appli...
The systolic screen is a very natural parallel architecture for image processing. A √n × √n systolic screen consists of a √n × √n mesh-of-processors, with each processor repres...
ISBN (print): 9783540729044
There are only three real "dimensions" to increasing processor performance beyond Moore's law: clock frequency, superscalar instruction issue, and multiprocessing. The first two have been pushed to their logical limits, so we must focus on multiprocessing. SMT (simultaneous multithreading) [1] and CMP (chip multiprocessing) [2] are two architectural approaches to exploiting thread-level parallelism using available on-chip resources. SMT processors execute instructions from different threads in the same cycle, which gives them the unique ability to exploit ILP (instruction-level parallelism) and TLP (thread-level parallelism) simultaneously. EPIC (explicitly parallel instruction computing) emphasizes the importance of synergy between the compiler and the hardware. In this paper, we present our efforts to design and implement a parallel environment that includes OpenUH, an optimizing, portable parallel compiler, and EDSMT, an SMT architecture based on IA-64. The performance is evaluated using the NAS Parallel Benchmarks.(1)
Author: Huang, H (Chinese Acad Sci, Supercomp Ctr, Comp Network Informat Ctr, Beijing 100080, Peoples R China)
ISBN (print): 0769515126
In this paper, we use a new language, TPL (Tensor product Language), to compute the Fast Fourier Transform (FFT). TPL can provide good performance and portability. We detail the method and its application to the FFT in TPL, and extend it to the Sande-Tukey FFT algorithm.
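TPL itself is not reproduced here. As a point of reference, the Sande-Tukey algorithm is the radix-2 decimation-in-frequency FFT, sketched recursively in plain Python below and checked against a direct DFT. The function names are hypothetical; production Sande-Tukey implementations usually work in place and produce bit-reversed output, whereas this version interleaves the even- and odd-indexed outputs directly.

```python
import cmath


def fft_sande_tukey(x):
    """Radix-2 decimation-in-frequency (Sande-Tukey) FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    half = n // 2
    # DIF butterfly: sums feed the even outputs; differences, scaled by the
    # twiddle factor W_n^k = exp(-2*pi*i*k/n), feed the odd outputs.
    sums = [x[k] + x[k + half] for k in range(half)]
    diffs = [(x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n) for k in range(half)]
    out = [0j] * n
    out[0::2] = fft_sande_tukey(sums)   # X[2m]
    out[1::2] = fft_sande_tukey(diffs)  # X[2m + 1]
    return out


def dft(x):
    """Direct O(n^2) DFT, used only as a reference to check the FFT."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]


signal = [1, 2, 3, 4, 0, 0, 0, 0]
print(max(abs(a - b) for a, b in zip(fft_sande_tukey(signal), dft(signal))))
```

The decimation-in-frequency split applies the butterflies before the recursive half-size transforms, which is the mirror image of the more familiar decimation-in-time (Cooley-Tukey) ordering.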