Image coding is imperative for the successful implementation of visual communications applications. With the emergence of enhanced multimedia technology, numerous video applications have appeared, such as multimedia conferencing, video on demand, and DVDs. Coding is an essential component of all video applications, and improved coding techniques are needed for faster applications. This paper discusses how parallel video coding on load-balanced multiprocessor systems can help incorporate efficient coding techniques such as vector quantization into practical applications. Two parallel processing platforms are discussed, namely a heterogeneous network of workstations and TI C40 DSP chips. The software platforms used for these are the Parallel Virtual Machine (PVM) programming model and Parallel C, respectively. An integration of the two programming models through a PVM-to-Parallel C translation, and the effect of load balancing on performance, are also discussed.
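A minimal sketch of load-balanced vector quantization, assuming the image has already been cut into fixed-size blocks (flattened to tuples) and a codebook is given. Each worker encodes a contiguous, near-equal share of the blocks by nearest-codeword search. The names `partition` and `parallel_vq_encode` are hypothetical, and Python threads merely stand in for the PVM workstations or C40 processors; this is not the paper's implementation.

```python
# Illustrative sketch; helper names are hypothetical, not taken from the paper.
from concurrent.futures import ThreadPoolExecutor


def nearest_codeword(block, codebook):
    """Index of the codeword with the smallest squared Euclidean distance to `block`."""
    return min(range(len(codebook)),
               key=lambda i: sum((b - c) ** 2 for b, c in zip(block, codebook[i])))


def partition(n_items, n_workers):
    """Split range(n_items) into n_workers contiguous (start, end) chunks
    whose sizes differ by at most one (static load balancing)."""
    base, extra = divmod(n_items, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        end = start + base + (1 if w < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def parallel_vq_encode(blocks, codebook, n_workers=4):
    """Each worker encodes one chunk of blocks; threads stand in for processors."""
    def encode(chunk):
        lo, hi = chunk
        return [nearest_codeword(b, codebook) for b in blocks[lo:hi]]

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(encode, partition(len(blocks), n_workers)))
    # Concatenate the per-worker results back into image order.
    return [index for part in parts for index in part]
```

Equal-size static chunks balance the load only when every worker is equally fast; on a heterogeneous network of workstations, chunk sizes would have to be weighted by machine speed or handed out dynamically.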
ISBN (print): 9781509066117
Big Data has become a pervasive technology for managing ever-increasing volumes of data. Among Big Data solutions, scalable data stores play an important role, especially key-value data stores, owing to their large scalability (thousands of nodes). The typical workflow of a Big Data application includes two phases. The first is loading the data into the data store, typically as part of an ETL (Extract-Transform-Load) process. The second is processing the data itself. BigTable and HBase are the preferred key-value solutions based on range-partitioned data stores. However, their loading phase is inefficient and creates a single-node bottleneck. In this paper, we identify and quantify this bottleneck and propose a tool for parallel massive data loading that satisfactorily resolves it, exploiting the full parallelism and throughput of the underlying key-value data store during the loading phase as well. The proposed solution has been implemented as a tool for parallel massive data loading over HBase, the key-value data store of the Hadoop ecosystem.
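The single-node bottleneck can be illustrated with a toy model of range partitioning, assuming string row keys. A freshly created range-partitioned table starts with a single region, so every loader writes to the same node; splitting the key space in advance at sampled quantiles spreads the writes. HBase does allow a table to be created with explicit split keys, but the helpers below (`split_points`, `region_of`, `load_distribution`) are hypothetical and do not describe the paper's tool.

```python
# Toy model of a range-partitioned store; helper names are hypothetical.
from bisect import bisect_right
from collections import Counter


def split_points(sample_keys, n_regions):
    """Choose n_regions - 1 boundary keys at evenly spaced quantiles of a key sample."""
    keys = sorted(sample_keys)
    return [keys[(i * len(keys)) // n_regions] for i in range(1, n_regions)]


def region_of(key, splits):
    """Range partitioning: region i holds keys k with splits[i-1] <= k < splits[i]."""
    return bisect_right(splits, key)


def load_distribution(keys, splits):
    """Number of rows each region (and hence each region server) must absorb."""
    return Counter(region_of(k, splits) for k in keys)
```

With 1000 keys `row00000` to `row00999`, an empty split list sends all 1000 rows to region 0, whereas `split_points(keys, 4)` yields one quarter of the rows per region.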
This paper presents a deadlock detection algorithm under the OR request model in distributed systems. The initiator of the algorithm constructs a reduced wait-for graph through propagation of probes and receiving repl...
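The abstract is truncated before the algorithm's details, but what a reduced wait-for graph means under the OR model can be shown with a centralized sketch: a blocked process is released as soon as any one of the processes it waits on can proceed, and whatever can never be released is deadlocked. This is a sequential graph reduction under that assumption, not the paper's distributed probe-and-reply protocol; the function name is hypothetical.

```python
# Centralized OR-model reduction; not the paper's distributed algorithm.
def or_model_deadlocked(wait_for):
    """Reduce an OR-model wait-for graph.

    wait_for maps every process to the set of processes it waits on; under the
    OR model a reply from any one of them unblocks it. Processes with an empty
    set are active. Returns the processes that can never be unblocked.
    """
    unblocked = {p for p, deps in wait_for.items() if not deps}
    changed = True
    while changed:
        changed = False
        for p, deps in wait_for.items():
            # OR semantics: one unblocked dependency is enough to release p.
            if p not in unblocked and deps & unblocked:
                unblocked.add(p)
                changed = True
    return set(wait_for) - unblocked
```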
ISBN (print): 0769521320
A parallel algorithm for the Euclidean distance transform (EDT) on a linear array with a reconfigurable pipelined bus system (LARPBS) is presented. For an image with n × n pixels, the algorithm can complete the EDT in O(n log n / (c(n) log d(n))) time using n·d(n)·c(n) processors, where c(n) and d(n) are parameters satisfying 1 ≤ c(n) ≤ n and 1 ≤ …, the algorithm can be completed in O(1) time using n^(2+ε) processors. To the best of our knowledge, this is the most efficient constant-time EDT algorithm on LARPBS.
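To make the output of the EDT concrete, here is a brute-force sequential reference that returns, for every pixel, the squared Euclidean distance to the nearest feature pixel (assumed to be any nonzero pixel). It takes O(n^4) time on an n × n image and only defines what the LARPBS algorithm computes; it is not a sketch of the parallel algorithm.

```python
# Sequential reference definition of the (squared) EDT; not the LARPBS algorithm.
def edt_brute_force(image):
    """Squared Euclidean distance from each pixel to the nearest feature pixel.

    `image` is a list of rows; nonzero entries are feature pixels. Pixels in an
    image with no feature pixels get float('inf').
    """
    features = [(r, c) for r, row in enumerate(image) for c, v in enumerate(row) if v]
    return [[min(((r - fr) ** 2 + (c - fc) ** 2 for fr, fc in features),
                 default=float("inf"))
             for c in range(len(row))]
            for r, row in enumerate(image)]
```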
ISBN (print): 9781479911141
Classification is one of the most important applications in the field of remote sensing, and improving classification accuracy is a critical topic that has long occupied researchers. In this paper, a fusion method based on the synergistic use of hyperspectral data and polarimetric SAR (PolSAR) data is presented. The method consists of two main parts: feature-level fusion and decision-level fusion. At the feature level, a parallel feature combination strategy is introduced for the classification of remote sensing images. The results of feature-level fusion are used as inputs to decision-level fusion based on fuzzy set theory. The final results are compared with those obtained from single-level processing and from a single data set, and they show that the proposed synergistic method performs better in the joint classification of hyperspectral and polarimetric SAR data.
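Decision-level fuzzy fusion can be sketched as combining the per-class membership degrees produced from each data source with a fuzzy operator, then assigning the class with the highest fused membership. Using the minimum (the standard fuzzy intersection) as the default operator is an assumption for illustration; the abstract does not state which operator the paper uses, and the function names are hypothetical.

```python
# Hypothetical decision-level fusion; the paper's exact fuzzy operator is not given.
def fuse_memberships(per_source, combine=min):
    """Fuse per-class fuzzy memberships from several data sources.

    per_source: list of dicts mapping class -> membership degree in [0, 1],
    one dict per source (e.g. hyperspectral, PolSAR). `combine` aggregates the
    degrees for one class; min is the standard fuzzy intersection (assumed here).
    """
    classes = per_source[0].keys()
    return {c: combine(src[c] for src in per_source) for c in classes}


def classify(per_source, combine=min):
    """Assign the class with the largest fused membership degree."""
    fused = fuse_memberships(per_source, combine)
    return max(fused, key=fused.get)
```

Passing `statistics.mean` as `combine` gives an averaging rule instead; the min rule is more conservative, because a class is favored only if every source supports it.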
ISBN (print): 9783540747413
Application Specific Instruction Processors (ASIPs) have the potential to meet the high-performance demands of multimedia applications such as image processing, audio and video encoding, speech processing, and digital signal processing. To achieve lower cost and better energy efficiency in high-performance embedded systems built from ASIPs, subword parallelism optimization is becoming an important means of accelerating multimedia applications. A major problem, however, is how to exploit subword parallelism on ASIPs with limited resources. This paper shows that loop transformations such as loop unrolling and variable expansion can be used to create opportunities for subword parallelism, and presents a novel approach to recognizing and extracting subword parallelism based on the Cost Subgraph (CSG). The approach is evaluated on the Transport Triggered Architecture (TTA), a customizable processor architecture that is particularly suitable for tailoring hardware resources to the requirements of an application. In our experiments, 63.58% of loops, and 85.64% of the instructions in these loops, can exploit subword parallelism. The results indicate that our method can exploit a significant amount of the available subword parallelism.
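Subword parallelism packs several narrow values into one machine word so that a single word-wide operation processes all of them. The sketch below, assuming 8-bit lanes in a 32-bit word, adds four bytes at once with the standard SWAR (SIMD within a register) masking trick, and shows how unrolling a byte-addition loop by four exposes exactly the four independent additions that can be packed. It illustrates the general idea only; it does not implement the paper's Cost Subgraph method or target TTA, and the helper names are hypothetical.

```python
# Generic SWAR illustration; not the paper's CSG-based extraction.
LOW7 = 0x7F7F7F7F   # low seven bits of each 8-bit lane
HIGH1 = 0x80808080  # top bit of each 8-bit lane


def pack4(values):
    """Pack four 8-bit values into one 32-bit word (lane 0 in the lowest byte)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]


def add4_swar(a, b):
    """Four independent modulo-256 byte additions in one word-wide operation.

    Adding only the low seven bits of each lane can never carry into the next
    lane; the XOR then adds each lane's two top bits into bit 7 modulo 2.
    """
    return ((a & LOW7) + (b & LOW7)) ^ ((a ^ b) & HIGH1)


def add_bytes_unrolled(xs, ys):
    """Byte-wise xs + ys (mod 256), loop unrolled by four so each iteration
    issues one packed add; a scalar loop handles the leftover elements."""
    n4 = len(xs) - len(xs) % 4
    out = []
    for i in range(0, n4, 4):
        out += unpack4(add4_swar(pack4(xs[i:i + 4]), pack4(ys[i:i + 4])))
    for i in range(n4, len(xs)):
        out.append((xs[i] + ys[i]) & 0xFF)
    return out
```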
ISBN (print): 9781665412469
This work presents a parallel algorithm for implementing the nonuniform fast Fourier transform (NUFFT) on Google's Tensor Processing Units (TPUs). The TPU is a hardware accelerator originally designed for deep learning applications. NUFFT is considered the main computational bottleneck in magnetic resonance (MR) image reconstruction when k-space data are sampled on a nonuniform grid. The computation of NUFFT consists of three operations: apodization, an FFT, and interpolation, all formulated as tensor operations in order to fully exploit the TPU's strength in matrix multiplication. The implementation uses TensorFlow. Numerical examples show a 20× to 80× acceleration of NUFFT on a single-card TPU compared with CPU implementations. Strong scaling analysis shows close-to-linear scaling of NUFFT on up to 64 TPU cores. The proposed implementation of NUFFT on TPUs is promising for accelerating MR image reconstruction and achieving practical runtimes for clinical applications.
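Why a Fourier transform maps well onto matrix hardware can be shown directly: the 2D DFT of an n × n array X equals F X Fᵀ = F X F, where F is the symmetric n-point DFT matrix, so the whole transform is two dense matrix products. Expressing it this way trades the FFT's lower operation count for work that matrix units execute very efficiently; whether the paper formulates its FFT step exactly like this is an assumption. The pure-Python sketch omits apodization and interpolation, does not use TensorFlow, and its helper names are hypothetical.

```python
# Illustrates the DFT-as-matmul identity only; not the paper's TPU implementation.
import cmath


def dft_matrix(n):
    """n-point DFT matrix F[j][k] = exp(-2*pi*i*j*k/n); note that F is symmetric."""
    return [[cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n)] for j in range(n)]


def matmul(a, b):
    """Plain dense matrix product, the operation a TPU's matrix unit accelerates."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dft2_as_matmul(x):
    """2D DFT of an n x n array as F @ X @ F^T = F @ X @ F (two matrix products)."""
    f = dft_matrix(len(x))
    return matmul(matmul(f, x), f)
```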
ISBN (print): 0818620528
A prototype of an object-oriented system implemented in C_Prolog is described. Its main objective is to demonstrate system features that would support efficient management of objects and object-oriented databases in a persistent and distributed environment. Mechanisms at the low level of the system were considered to support object distribution, mobility control, and configuration management in a simple and uniform way. Objects exist in clusters, which are transparent to the applications. The prototype is a framework for a self-organizing object-oriented distributed system.
This paper is the first to present a parallelization of a highly efficient best-first branch-and-bound algorithm to solve large symmetric traveling salesman problems on a massively parallel computer containing 1024 processors. The underlying sequential branch-and-bound algorithm is based on 1-tree relaxation. The parallelization of the branch-and-bound algorithm is fully distributed. Every processor performs the same sequential algorithm but on a different part of the solution tree. To distribute subproblems among the processors we use a new direct-neighbor dynamic load-balancing strategy. The general principle can be applied to all other branch-and-bound algorithms leading to an 'automatic' parallelization. At present we can efficiently solve traveling salesman problems up to a size of 318 cities on networks of up to 1024 transputers. On hard problems we achieve an almost linear speed-up.
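The 1-tree relaxation that bounds each subproblem can be sketched as follows, assuming a symmetric distance matrix with at least three cities: take a minimum spanning tree over all cities except a chosen special city, then add the two cheapest edges incident to that city. Every tour is a Hamiltonian path through the other cities (a spanning tree, so at least the MST weight) plus two edges at the special city, so this value never exceeds the optimal tour length. This is the plain bound; practical solvers usually strengthen it with Held-Karp node penalties, and the abstract does not say which variant is used. The helper names are hypothetical.

```python
# Plain 1-tree lower bound; helper names are hypothetical.
from itertools import permutations


def mst_weight(nodes, dist):
    """Weight of a minimum spanning tree over `nodes` (Prim's algorithm, complete graph)."""
    nodes = list(nodes)
    if len(nodes) < 2:
        return 0
    # best[v] = cheapest edge from the growing tree to v
    best = {v: dist[nodes[0]][v] for v in nodes[1:]}
    total = 0
    while best:
        v = min(best, key=best.get)
        total += best.pop(v)
        for u in best:
            best[u] = min(best[u], dist[v][u])
    return total


def one_tree_bound(dist, special=0):
    """Plain 1-tree lower bound on the optimal tour length (symmetric `dist`, >= 3 cities)."""
    others = [v for v in range(len(dist)) if v != special]
    two_cheapest = sorted(dist[special][v] for v in others)[:2]
    return mst_weight(others, dist) + sum(two_cheapest)


def optimal_tour_length(dist):
    """Brute-force optimum, only for checking the bound on tiny instances."""
    n = len(dist)
    return min(sum(dist[a][b] for a, b in zip(t, t[1:]))
               for t in ((0,) + p + (0,) for p in permutations(range(1, n))))
```

In a best-first branch-and-bound, subproblems are expanded in order of this bound, and any subproblem whose bound is at least the best tour found so far can be discarded.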
We propose two new self-stabilizing distributed algorithms for proper Δ+1 (Δ is the maximum degree of a node in the graph) coloring of arbitrary system graphs. Both algorithms are capable of working with multiple ty...
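The abstract is truncated before the algorithms are described, so the sketch below shows only the classic single-rule self-stabilizing Δ+1 coloring under a central daemon (one enabled node moves per step), not the paper's algorithms. A node is enabled if it shares a color with a neighbor or holds a color larger than its degree; when it moves, it takes the smallest color unused by its neighbors, which is at most its degree and hence at most Δ. A move never creates a conflict for another node, so the number of enabled nodes strictly decreases and the system stabilizes after at most n moves from any initial assignment of nonnegative integer colors. The helper names are hypothetical.

```python
# Classic central-daemon Delta+1 coloring; not the algorithms proposed in the paper.
import random


def smallest_free_color(v, graph, color):
    """Smallest color not used by any neighbor of v; at most deg(v) <= Delta."""
    used = {color[u] for u in graph[v]}
    c = 0
    while c in used:
        c += 1
    return c


def is_enabled(v, graph, color):
    """v must move if it clashes with a neighbor or its color is above its degree."""
    return color[v] > len(graph[v]) or any(color[u] == color[v] for u in graph[v])


def stabilize(graph, color, seed=0):
    """Central daemon: while some node is enabled, one enabled node (chosen at random) moves.

    graph maps each node to its set of neighbors; color is any initial assignment of
    nonnegative integers, e.g. one corrupted by a transient fault.
    Returns the final coloring and the number of moves.
    """
    rng = random.Random(seed)
    color = dict(color)
    moves = 0
    while True:
        enabled = [v for v in graph if is_enabled(v, graph, color)]
        if not enabled:
            return color, moves
        v = rng.choice(enabled)
        color[v] = smallest_free_color(v, graph, color)
        moves += 1
```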