A two-level fast search algorithm to reduce the encoding time for hexagonal-based fractal image compression is presented. The design of the sequential algorithm is based on the distribution of matched domains in a given image. The first search level previews various portions of the image and identifies promising domains among all possible domains. The second search level picks out domain blocks in the image portion where the first level gives positive results, and compares them with a given range block for encoding. The algorithm is parallelized by a dynamic range distribution scheme to achieve load balancing. Experimental results show that by running the parallelized encoding algorithm on multiple processors, the encoding time is drastically reduced while the quality of image reconstruction is retained. A speed-up of about 9 can be obtained by using 13 processors.
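The two-level idea can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' encoder. It uses square blocks instead of hexagonal ones. A mean-removed squared error stands in for the usual contrast/brightness fit. The names `two_level_search`, `region`, `stride` and `threshold` are hypothetical. Level 1 previews only stride-aligned domains in each image portion. Level 2 searches every domain, but only in portions whose preview error falls below the threshold.

```python
from itertools import product

def block(img, r, c, size):
    """Return the size x size block whose top-left corner is (r, c)."""
    return [row[c:c + size] for row in img[r:r + size]]

def shrink(blk):
    """Downsample a 2n x 2n domain block to n x n by averaging 2x2 cells."""
    n = len(blk) // 2
    return [[(blk[2 * i][2 * j] + blk[2 * i][2 * j + 1]
              + blk[2 * i + 1][2 * j] + blk[2 * i + 1][2 * j + 1]) / 4
             for j in range(n)] for i in range(n)]

def distance(a, b):
    """Mean-removed squared error (assumed stand-in for a contrast/brightness fit)."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    ma, mb = sum(fa) / len(fa), sum(fb) / len(fb)
    return sum(((x - ma) - (y - mb)) ** 2 for x, y in zip(fa, fb))

def two_level_search(img, rng_blk, rsize, region, stride, threshold):
    """Return (best_error, (row, col)) of the best domain for one range block."""
    dsize = 2 * rsize  # domains are twice the range size, then shrunk
    h, w = len(img), len(img[0])
    positions = [(r, c) for r in range(h - dsize + 1) for c in range(w - dsize + 1)]
    best = (float("inf"), None)
    for r0, c0 in product(range(0, h, region), range(0, w, region)):
        local = [(r, c) for r, c in positions
                 if r0 <= r < r0 + region and c0 <= c < c0 + region]
        # Level 1: preview only the stride-aligned domains in this image portion.
        preview = [(r, c) for r, c in local if r % stride == 0 and c % stride == 0]
        if not preview or min(distance(rng_blk, shrink(block(img, r, c, dsize)))
                              for r, c in preview) > threshold:
            continue  # portion judged unpromising; skip it entirely
        # Level 2: compare every domain in the promising portion.
        for r, c in local:
            d = distance(rng_blk, shrink(block(img, r, c, dsize)))
            if d < best[0]:
                best = (d, (r, c))
    return best
```

The threshold trades reconstruction quality for encoding time. A lower threshold skips more portions. The dynamic range distribution used to parallelize the encoder is not modeled here.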
ISBN: (print) 9781538653982
Clustering algorithms group a dataset into clusters whose members share common features. Clustering has applications in computer vision, data mining, market segmentation, and other areas. The k-means clustering algorithm is one of the most popular clustering algorithms; it uses the mean as the prototype of each cluster. In this paper, we explore accelerating k-means clustering on NVIDIA Graphics Processing Units (GPUs) programmed with CUDA C. Several optimization techniques are applied, such as using shared memory for image data and constant memory for cluster data. Performance is evaluated on images ranging from small (256x256 pixels) to large (1024x1024 pixels), with the number of clusters ranging from 4 to 256. We find that, on average, the parallel implementation achieves a 9x speedup over the sequential version for 4 clusters, and the speedup increases to 57x as the number of clusters increases to 256. This implementation also outperforms a reference implementation from Northwestern University/UC Berkeley.
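For reference, this is a sequential k-means baseline in the spirit of the one the GPU version is compared against. It is a plain Python sketch of Lloyd's algorithm on grayscale intensities, not the paper's CUDA code. The function name `kmeans_1d` and the evenly spaced initialization are assumptions.

```python
def kmeans_1d(pixels, k, iters=20):
    """Lloyd's k-means on scalar pixel intensities; returns (centroids, labels)."""
    lo, hi = min(pixels), max(pixels)
    # Assumed initialization: k centroids evenly spaced over the intensity range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iters):
        # Assignment step: every pixel is independent of the others.
        labels = [min(range(k), key=lambda j: abs(p - centroids[j])) for p in pixels]
        # Update step: each centroid becomes the mean of its member pixels.
        sums, counts = [0.0] * k, [0] * k
        for p, j in zip(pixels, labels):
            sums[j] += p
            counts[j] += 1
        new = [sums[j] / counts[j] if counts[j] else centroids[j] for j in range(k)]
        if new == centroids:  # converged: assignments can no longer change
            break
        centroids = new
    return centroids, labels
```

For example, `kmeans_1d([0, 1, 2, 100, 101, 102], k=2)` converges to centroids `[1.0, 101.0]`.

The assignment step does about `len(pixels) * k` distance evaluations per iteration. It is the natural step to map onto one GPU thread per pixel. The centroids are small and read-only during that step, which plausibly explains why the abstract places cluster data in constant memory.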
The problem of efficiently processing queries that manipulate sets is considered, with the objective of minimizing the processing cost by reducing the size of transmitted data as much as possible. The semantics of set operations is used to achieve this goal. A set query has the general form SET 1 op SET 2. For two sets to be related by a set operation, their sizes must satisfy a necessary condition: for the two sets to be equal, they must have the same size, and for SET 1 to be a subset of SET 2, its size must be less than or equal to the size of SET 2. In the relational model, given two attributes, the size of the set of values of one attribute that is associated with a value of the other attribute can be determined using the functional dependency between the two attributes. Using these semantics, a distributed set query can be converted into a distributed nonset query. When both sets have size greater than one, however, the query cannot be converted into a nonset query; instead, it is converted into another distributed set query. The size of the data transmitted to answer the new query is reduced as much as possible by sending only the sets that satisfy the necessary condition of the set operation.
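The size-based filtering step can be sketched as follows. This is a hypothetical illustration, not the paper's algorithm. The names `may_satisfy`, `sets_to_send` and `group_values` are invented. The operator spellings `"="`, `"subset"` and `"superset"` are assumptions. Only the necessary size condition is modeled; the sketch does not cover the conversion to nonset queries or the network transfer itself.

```python
from collections import defaultdict

def may_satisfy(op, n1, n2):
    """Necessary size condition for SET1 op SET2, given |SET1| = n1, |SET2| = n2."""
    if op == "=":
        return n1 == n2
    if op == "subset":    # SET1 is a subset of SET2
        return n1 <= n2
    if op == "superset":  # SET1 is a superset of SET2
        return n1 >= n2
    raise ValueError(f"unsupported set operator: {op}")

def group_values(rows, key_attr, val_attr):
    """Group a relation into {key value: set of associated values}."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[key_attr]].add(row[val_attr])
    return dict(groups)

def sets_to_send(groups, op, target_size):
    """Keep only the sets whose size could satisfy `set op target`; the rest need not be shipped."""
    return {key: s for key, s in groups.items() if may_satisfy(op, len(s), target_size)}
```

For example, suppose one site holds `{"s1": {"p1", "p2"}, "s2": {"p1"}, "s3": {"p2", "p3"}}` and the other site's set has two elements. For an equality query, only `s1` and `s3` need to be transmitted. Their contents are then compared at the receiving site.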
ISBN: (print) 9781479904020; 9781479904037
We present in this paper the detailed architecture of the replica localization module implemented inside the Distributed Operation Application Framework (DOAF). DOAF is a development framework designed to speed up the development and deployment of distributed applications. The replica localization module (PathFinder) provides standardized interfaces and a default implementation for replica management and localization operations. Besides the localization and management of replicas, the module offers support for concerns such as secured connections, data transmission, and malicious replica substitution. The PathFinder management and localization operations are designed to support system-wide optimization techniques by being tightly coupled with the Optimizer module.
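The pattern of a standardized interface plus a default implementation can be illustrated with Python's `abc` module. Everything here is hypothetical: `ReplicaLocator`, `InMemoryLocator` and the optional `cost` hook are invented names, not PathFinder's API. The `cost` callable is only an assumed stand-in for how an optimizer could rank replicas.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional

class ReplicaLocator(ABC):
    """Hypothetical standardized interface for replica management and localization."""

    @abstractmethod
    def register(self, object_id: str, node: str) -> None: ...

    @abstractmethod
    def unregister(self, object_id: str, node: str) -> None: ...

    @abstractmethod
    def locate(self, object_id: str,
               cost: Optional[Callable[[str], float]] = None) -> list: ...

class InMemoryLocator(ReplicaLocator):
    """Hypothetical default implementation backed by an in-process replica table."""

    def __init__(self):
        self._replicas = {}  # object id -> set of node names holding a replica

    def register(self, object_id, node):
        self._replicas.setdefault(object_id, set()).add(node)

    def unregister(self, object_id, node):
        self._replicas.get(object_id, set()).discard(node)

    def locate(self, object_id, cost=None):
        nodes = sorted(self._replicas.get(object_id, ()))
        # An optimizer-supplied cost function (assumed hook) reorders candidates.
        return sorted(nodes, key=cost) if cost else nodes
```

Callers program against `ReplicaLocator`. A different strategy, such as one that verifies replica authenticity, could then replace the default without changing client code.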
A visual cryptography scheme (VCS) is a cryptographic technique in which visual information is encrypted in such a way that decryption can be performed by the human visual system, simply by stacking the encrypted shares. R...
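Decryption by stacking can be illustrated with the classic Naor–Shamir (2,2) construction. It uses two subpixels per secret pixel. This is the textbook scheme, not necessarily the one the truncated abstract goes on to describe. The function names are hypothetical. Each share on its own shows exactly one black subpixel per pixel, so it reveals nothing. Stacking the shares is modeled as a logical OR.

```python
import random

PATTERNS = [(0, 1), (1, 0)]  # subpixel pairs; 1 = black, 0 = white

def make_shares(secret, rng=random):
    """Split a binary image (1 = black) into two shares of doubled width."""
    share1, share2 = [], []
    for row in secret:
        r1, r2 = [], []
        for pixel in row:
            p = rng.choice(PATTERNS)  # random pattern hides the pixel in each share
            # White pixel: identical patterns; black pixel: complementary patterns.
            q = p if pixel == 0 else (1 - p[0], 1 - p[1])
            r1.extend(p)
            r2.extend(q)
        share1.append(r1)
        share2.append(r2)
    return share1, share2

def stack(s1, s2):
    """Stack two transparencies: a subpixel is black if it is black on either share."""
    return [[a | b for a, b in zip(r1, r2)] for r1, r2 in zip(s1, s2)]
```

After stacking, a white secret pixel shows one black subpixel out of two, so it looks gray. A black pixel shows two out of two, so it looks black. The eye perceives this contrast difference directly.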
ISBN: (print) 9781467308069
The proceedings contain 105 papers. The topics discussed include: NUMA-aware graph mining techniques for performance and energy efficiency; classifying soft error vulnerabilities in extreme-scale scientific applications using a binary instrumentation tool; containment domains: a scalable, efficient, and flexible resilience scheme for exascale systems; critical lock analysis: diagnosing critical section bottlenecks in multithreaded applications; code generation for parallel execution of a class of irregular loops on distributed memory systems; data-intensive spatial filtering in large numerical simulation datasets; parallel particle advection and FTLE computation for time-varying flow fields; parallel I/O, analysis, and visualization of a trillion particle simulation; forward and adjoint simulations of seismic wave propagation on emerging large-scale GPU architectures; and a divide and conquer strategy for scaling weather simulations with multiple regions of interest.
ISBN: (print) 0818678763
The C* language is a data-parallel extension of the C language that incorporates parallel data types. Since C++ provides operator overloading, a C++ library can implement the C* parallel extensions with similar syntax. Although library implementations are highly portable, some overheads make them impractical. The two major overheads are the temporaries created in each operator application and the inability to detect regular communication patterns. The C++ overloading mechanism forces a temporary for each operator application, and regular communications in C* are syntactically indistinguishable from general point-to-point communications. We tackled these problems extensively in a library. The template mechanism, a type parameterization facility in C++, is used to eliminate temporaries by delaying operator application and evaluating the entire expression at once. The polymorphic type dispatch mechanism is used to detect regular communications by assigning particular types to potentially regular communications. We implemented the library on the CM-5 and compared its performance with the C* compiler using three simple examples. The techniques presented offer performance comparable to the C* compiler: the library is roughly on par with or 1.5 times slower than the compiler in two examples, and even faster in the third.
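The idea of delaying operator application can be shown with a small Python analogue. It is a sketch only: the C++ library resolves the expression tree at compile time through templates, whereas this version builds and walks the tree at run time. The classes `Expr`, `Vec` and `BinOp` are hypothetical. Each `+` or `*` returns a lightweight node instead of a new vector. `assign` then evaluates the whole expression once per element, so no intermediate vectors are allocated.

```python
import operator

class Expr:
    """Node of a lazily evaluated elementwise expression."""
    def __add__(self, other):
        return BinOp(operator.add, self, other)

    def __mul__(self, other):
        return BinOp(operator.mul, self, other)

class BinOp(Expr):
    """Deferred binary operation; nothing is computed until at(i) is called."""
    def __init__(self, fn, lhs, rhs):
        self.fn, self.lhs, self.rhs = fn, lhs, rhs

    def at(self, i):
        return self.fn(self.lhs.at(i), self.rhs.at(i))

class Vec(Expr):
    """A parallel-variable stand-in holding one value per element."""
    def __init__(self, data):
        self.data = list(data)

    def at(self, i):
        return self.data[i]

    def assign(self, expr):
        # Single pass over the elements: the entire expression is evaluated
        # per index, so a + b * a creates no temporary vector for b * a.
        self.data = [expr.at(i) for i in range(len(self.data))]
        return self
```

For example, with `a = Vec([1, 2, 3])`, `b = Vec([4, 5, 6])` and `c = Vec([0, 0, 0])`, evaluating `c.assign(a + b * a)` fills `c` with `[5, 12, 21]` in one loop.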
Due to the strong increase in the number of processing units available to end users, expressing the parallelism of an algorithm is a major challenge for many researchers. Parallel applications are often expressed using a task-parall...
The convergence of computer systems and communication technologies is driving a move to switched, high-performance modular system architectures based on high-speed switched interconnections. Multi-core processors are becoming a more promising path to high-performance systems, and traditional parallel bus system architectures (VME/VXI, cPCI/PXI) are moving to new, higher-speed serial switched interconnections. The fundamentals of system architecture development are a compact modular component strategy, low-power processors, new serial high-speed interface chips on the board, and a high-speed switched fabric for SAN architectures. An overview is given of advanced modular concepts and of new international standards for developing high-performance embedded and compact modular systems for real-time applications. (c) 2006 Elsevier B.V. All rights reserved.
ISBN: (print) 9781467388153
The proceedings contain 87 papers. The topics discussed include: the vectorization of the Tersoff multi-body potential: an exercise in performance portability; increasing molecular dynamics simulation rates with an 8-fold increase in electrical power efficiency; TrueNorth ecosystem for brain-inspired computing: scalable systems, software, and applications; evaluating HPC networks via simulation of parallel workloads; PFEAST: a high performance sparse eigenvalue solver using distributed-memory linear solvers; block iterative methods and recycling for improved scalability of linear solvers; pinpointing scale-dependent integer overflow bugs in large-scale parallel applications; compiler-directed lightweight checkpointing for fine-grained guaranteed soft error recovery; simulation and performance analysis of the ECMWF tape library system; real-time synthesis of compression algorithms for scientific data; serf: efficient scheduling for fast deep neural network serving via judicious parallelism; graph colouring as a challenge problem for dynamic graph processing on distributed systems; an exploration of optimization algorithms for high performance tensor completion; and designing MPI library with on-demand paging (ODP) of InfiniBand: challenges and benefits.