ISBN (Print): 9780889868649
Now that multi-core processors are widely available, speeding up sequential programs through thread-level parallelization has become quite important. In general, thread-level parallelization is performed on the source code of the target program, but the source code is not always available. To parallelize existing sequential programs without their source code, we have developed a software system that automatically parallelizes the executable binary code of a program at thread level using binary translation. Thread-level parallelization requires correctly analyzing the data dependencies of variables between threads. However, since memory accesses that reference variables specify their target addresses through registers, and the register values are generally unknown until execution, it is not easy to identify the variables located in memory and to examine their data dependencies. This paper discusses the analysis of data dependencies among variables located in memory, and the thread-level parallel processing based on the analysis results, in our automatic thread-level parallelization system based on binary translation. The binary-level variable analysis statically identifies the variables in memory in order to examine the data dependencies between threads; it identifies a variable by comparing the calculation trees that represent the target addresses of variables. In addition, the hardware for runtime inspection of memory access dependencies guarantees correct parallel execution of parallelized binary code containing data dependencies that cannot be statically determined.
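As a rough illustration of the calculation-tree comparison described above, the following C sketch structurally compares two address expression trees; the node layout, the names (addr_node, classify), and the constant-displacement rule are assumptions made for illustration, not the system's actual representation.

```c
/* Hypothetical sketch of address-tree comparison; not the system's actual
 * intermediate representation.  A variable's target address is modelled as
 * a small expression tree over registers and constants, and two accesses
 * are related by structurally comparing their trees. */
#include <stdbool.h>
#include <stddef.h>

typedef enum { N_REG, N_CONST, N_ADD, N_MUL } node_kind;

typedef struct addr_node {
    node_kind kind;
    int  reg;                    /* valid when kind == N_REG   */
    long value;                  /* valid when kind == N_CONST */
    struct addr_node *lhs, *rhs; /* valid for N_ADD / N_MUL    */
} addr_node;

typedef enum { SAME_VAR, DIFFERENT_VAR, UNKNOWN } addr_relation;

/* Structural equality of two address calculation trees. */
static bool trees_equal(const addr_node *a, const addr_node *b)
{
    if (a == NULL || b == NULL) return a == b;
    if (a->kind != b->kind) return false;
    switch (a->kind) {
    case N_REG:   return a->reg == b->reg;
    case N_CONST: return a->value == b->value;
    default:      return trees_equal(a->lhs, b->lhs) &&
                         trees_equal(a->rhs, b->rhs);
    }
}

/* Conservative classification: identical trees denote the same variable;
 * trees that differ only in a constant displacement denote different
 * variables; everything else is deferred to the runtime dependence check. */
addr_relation classify(const addr_node *a, const addr_node *b)
{
    if (a == NULL || b == NULL) return UNKNOWN;
    if (trees_equal(a, b))
        return SAME_VAR;
    if (a->kind == N_ADD && b->kind == N_ADD &&
        a->rhs && b->rhs &&
        trees_equal(a->lhs, b->lhs) &&
        a->rhs->kind == N_CONST && b->rhs->kind == N_CONST &&
        a->rhs->value != b->rhs->value)
        return DIFFERENT_VAR;
    return UNKNOWN;
}
```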
ISBN (Print): 9783642249570; 9783642249587
Deployment of pattern recognition applications for large-scale data sets is an open issue that needs to be addressed. In this paper, an attempt is made to explore new methods of partitioning and distributing data, that is, resource virtualization in the cloud, by fundamentally re-thinking the way in which future data management models will need to be developed on the Internet. The work presented here incorporates content-addressable memory into cloud data processing to enable a large number of loosely coupled parallel operations, resulting in vastly improved performance. Using a lightweight associative memory algorithm known as distributed Hierarchical Graph Neuron (DHGN), data retrieval/processing can be modeled as a pattern recognition/matching problem conducted across multiple records and data segments within a single cycle, using a parallel approach. The proposed model envisions a distributed data management scheme for large-scale data processing and database updating that is capable of providing scalable real-time recognition and processing with high accuracy, while maintaining low computational cost.
ISBN (Print): 9783642176784
Barrier synchronization is widely used in shared-memory parallel programs to synchronize between phases of data-parallel algorithms. With the proliferation of many-core processors, barrier synchronization has been adapted for higher-level language abstractions in new languages such as X10, wherein the processes participating in barrier synchronization are not known a priori and processes in distinct "places" do not share memory. Thus, the challenge is not only to achieve barrier synchronization in a distributed setting without any centralized controller, but also to deal with the dynamic nature of such synchronization, as processes are free to join and drop out at any synchronization phase. In this paper, we describe a solution for generalized distributed barrier synchronization wherein processes can dynamically join or drop out of barrier synchronization; that is, the participating processes are not known a priori. Using the policy of permitting a process to join only at the beginning of each phase, we arrive at a solution that ensures (i) Progress: a process executing phase k will enter phase k+1 unless it wants to drop out of synchronization (assuming the phase executions of the processes terminate), and (ii) Starvation Freedom: a new process that wants to join a phase synchronization group that has already started does so within a finite number of phases. The protocol is further generalized to multiple (possibly non-disjoint) groups of processes engaged in barrier synchronization.
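To make the join-at-phase-boundary policy concrete, here is a minimal shared-memory sketch in C with pthreads. It is not the paper's distributed, controller-free protocol, and the names (dyn_barrier, barrier_join, barrier_arrive) are hypothetical; it only shows joins and departures taking effect at the next phase boundary.

```c
/* Sketch only: a phase barrier that admits new participants and applies
 * departures at phase boundaries, in a single shared-memory process. */
#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int  members;        /* participants in the current phase           */
    int  arrived;        /* of those, how many reached the barrier      */
    int  pending_joins;  /* joins deferred to the next phase boundary   */
    int  pending_leaves; /* departures effective at the next boundary   */
    long phase;          /* current phase number                        */
} dyn_barrier;

#define DYN_BARRIER_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0, 0, 0 }

/* Request to join; the caller becomes a member at the next phase start. */
void barrier_join(dyn_barrier *b)
{
    pthread_mutex_lock(&b->lock);
    if (b->members == 0) {               /* empty group: start at once */
        b->members = 1;
    } else {
        long req_phase = b->phase;
        b->pending_joins++;
        while (b->phase == req_phase)    /* admitted at the boundary */
            pthread_cond_wait(&b->cv, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}

/* Arrive at the end of a phase; 'leaving' drops the caller afterwards. */
void barrier_arrive(dyn_barrier *b, int leaving)
{
    pthread_mutex_lock(&b->lock);
    long my_phase = b->phase;
    b->arrived++;
    if (leaving) b->pending_leaves++;
    if (b->arrived == b->members) {
        /* Phase boundary: fold in deferred joins and departures. */
        b->members += b->pending_joins - b->pending_leaves;
        b->pending_joins = b->pending_leaves = 0;
        b->arrived = 0;
        b->phase++;
        pthread_cond_broadcast(&b->cv);
    } else if (!leaving) {
        while (b->phase == my_phase)     /* wait for the last arrival */
            pthread_cond_wait(&b->cv, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}
```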
ISBN (Print): 9780769545646
The proceedings contain 73 papers. The topics discussed include: guaranteed scheduling for (m,k)-firm deadline-constrained real-time tasks on multiprocessors; a distributed task migration scheme for mesh-based chip-multiprocessors; jMigBSP: object migration and asynchronous one-sided communication for BSP applications; a social network-based information dissemination scheme; XunleiProbe: a sensitive and accurate probing on a large-scale P2SP system; data flow error recovery with checkpointing and instruction-level fault tolerance; an experimental study on memory allocators in multicore and multithreaded applications; efficient hierarchical agglomerative clustering algorithms on GPU using data partitioning; the multidimensional scaling and barycentric coordinates based distributed localization in wireless sensor networks; fast estimation of Gaussian mixture model parameters on GPU using CUDA; and optimizing web browser on many-core architectures.
ISBN (Print): 9780889868649
Software transactional memory (STM) enhances both ease of use and concurrency, and is considered state of the art for scaling parallel applications on modern multi-core hardware. However, there are certain situations where STM performs even worse than traditional locks. At hotspots, where most threads contend over a few pieces of shared data, going transactional results in excessive conflicts and aborts that severely degrade performance. We present a new design of adaptive thread scheduler that manages concurrency as the system enters and leaves hotspots. The scheduler controls the number of threads spawning new transactions according to the live commit throughput. We implemented two feedback-control policies, called Throttle and Probe, to realize this adaptive scheduling. Performance evaluation with the STAMP benchmarks shows that enabling Throttle and Probe obtains best-case speedups of 87.5% and 108.7%, respectively.
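The commit-throughput feedback loop can be pictured with a small sketch like the one below. It is only a hill-climbing caricature of the idea, not the paper's Throttle or Probe policy, and every name and constant (txn_gate_enter, throttle_controller, the 10 ms sampling interval) is an assumption.

```c
/* Sketch: a concurrency gate for new transactions plus a controller that
 * widens or narrows the gate depending on whether the measured commit
 * rate improved since the last sample. */
#include <stdatomic.h>
#include <sched.h>
#include <stddef.h>
#include <unistd.h>

static atomic_int  active_txn = 0;   /* threads currently in a transaction */
static atomic_int  txn_limit  = 8;   /* current concurrency cap            */
static atomic_long commits    = 0;   /* committed transactions so far      */

/* Called by a worker before it may begin a new transaction. */
void txn_gate_enter(void)
{
    for (;;) {
        int cur = atomic_load(&active_txn);
        if (cur < atomic_load(&txn_limit) &&
            atomic_compare_exchange_weak(&active_txn, &cur, cur + 1))
            return;
        sched_yield();                 /* gate full: back off and retry */
    }
}

/* Called by a worker after its transaction commits. */
void txn_gate_exit_commit(void)
{
    atomic_fetch_add(&commits, 1);
    atomic_fetch_sub(&active_txn, 1);
}

/* Controller thread: hill-climb the cap on observed commit throughput. */
void *throttle_controller(void *max_threads_arg)
{
    int  max_threads = *(int *)max_threads_arg;
    long last = 0, last_rate = 0;
    int  step = 1;
    for (;;) {
        usleep(10 * 1000);             /* sampling interval (assumed) */
        long now  = atomic_load(&commits);
        long rate = now - last;
        last = now;
        if (rate < last_rate)
            step = -step;              /* throughput fell: reverse direction */
        int limit = atomic_load(&txn_limit) + step;
        if (limit < 1) limit = 1;
        if (limit > max_threads) limit = max_threads;
        atomic_store(&txn_limit, limit);
        last_rate = rate;
    }
    return NULL;
}
```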
ISBN (Print): 9783642196553
Given a network, we are interested in ranking sets of nodes that score highest on user-specified criteria. For instance, in graphs built from bibliographic data (e.g. PubMed), we would like to discover sets of authors with expertise in a wide range of disciplines. We present this ranking task as a Top-K problem, utilize fixed-memory heuristic search, and report the performance of both the serial and distributed search algorithms on synthetic and real-world data sets.
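The Top-K side of the problem amounts to keeping only the K best-scoring candidate sets while the search enumerates many more. The C sketch below shows that bounded bookkeeping with a min-heap; it does not implement the fixed-memory heuristic search or the scoring criteria themselves, and the names (topk, topk_offer) are made up for illustration.

```c
/* Sketch: retain the K best-scoring candidates in O(K) memory. */
#include <stdlib.h>

typedef struct { double score; int set_id; } candidate;
typedef struct { candidate *heap; int size, k; } topk;   /* min-heap on score */

topk *topk_new(int k)
{
    topk *t = malloc(sizeof *t);
    t->heap = malloc((size_t)k * sizeof *t->heap);
    t->size = 0;
    t->k = k;
    return t;
}

static void swap_cand(candidate *a, candidate *b)
{
    candidate tmp = *a; *a = *b; *b = tmp;
}

/* Offer a scored candidate; it is kept only if it beats the current worst. */
void topk_offer(topk *t, double score, int set_id)
{
    if (t->size < t->k) {                        /* heap not full: sift up */
        int i = t->size++;
        t->heap[i] = (candidate){score, set_id};
        while (i > 0 && t->heap[(i - 1) / 2].score > t->heap[i].score) {
            swap_cand(&t->heap[i], &t->heap[(i - 1) / 2]);
            i = (i - 1) / 2;
        }
    } else if (score > t->heap[0].score) {       /* replace worst: sift down */
        t->heap[0] = (candidate){score, set_id};
        int i = 0;
        for (;;) {
            int l = 2 * i + 1, r = l + 1, m = i;
            if (l < t->size && t->heap[l].score < t->heap[m].score) m = l;
            if (r < t->size && t->heap[r].score < t->heap[m].score) m = r;
            if (m == i) break;
            swap_cand(&t->heap[i], &t->heap[m]);
            i = m;
        }
    }
}
```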
ISBN (Print): 9781457705137
Modular software, in which strongly separated units of functionality can be independently added to and removed from a node's running software, offers a promising approach to effective dynamic software updating in Wireless Sensor Networks (WSNs). Modular software updating approaches offer high efficiency, in terms of both network costs and update installation at nodes, as well as low disruption, allowing existing software to continue to operate during updates. Existing approaches, however, critically lack safety, relying on weakly typed, event-based programming abstractions for inter-module interaction. This precludes compile-time or composition-time verification of interoperability between dynamically loaded modules and therefore presents major risks for future large-scale, production-class deployments. In this paper we present Lorien: a component-based modular operating environment that employs interface-based inter-component interaction to support completely type-safe software composition, while still supporting high update efficiency and low disruption. Our approach also has very wide scope, allowing almost 90% of the software on common sensor platforms such as the TelosB to be remotely updated. We compare Lorien against existing modular designs, finding that Lorien offers its safety properties with nearly equal efficiency.
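The kind of interface-based, statically checkable inter-component binding argued for here can be illustrated in plain C with a struct of function pointers. This is a generic sketch, not Lorien's actual component model or API, and all identifiers (sensor_if, temp_sensor, log_reading) are hypothetical.

```c
/* Sketch: a component exports a typed interface as a struct of function
 * pointers, and clients are bound to that struct rather than to untyped
 * events, so mismatches are caught when the composition is compiled. */
#include <stdint.h>
#include <stdio.h>

/* The interface a sensing component must provide. */
typedef struct {
    int      (*start_sampling)(uint32_t period_ms);
    int      (*stop_sampling)(void);
    uint16_t (*last_reading)(void);
} sensor_if;

/* One concrete component implementing the interface. */
static uint16_t reading;
static int      temp_start(uint32_t period_ms) { (void)period_ms; return 0; }
static int      temp_stop(void)                { return 0; }
static uint16_t temp_last(void)                { return reading; }

static const sensor_if temp_sensor = {
    .start_sampling = temp_start,
    .stop_sampling  = temp_stop,
    .last_reading   = temp_last,
};

/* A client is wired to the interface, not to a specific component, so the
 * bound component can be swapped for any other sensor_if implementation. */
void log_reading(const sensor_if *s)
{
    printf("reading = %u\n", (unsigned)s->last_reading());
}
```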
Support Vector Machine (SVM) is an efficient data mining approach for data classification. However, the SVM algorithm requires a very large amount of memory and computational time to deal with very large datasets. To reduc...
ISBN (Print): 9780889868649
In recent years, the Service Oriented Architecture (SOA) has evolved into emerging technologies like cloud computing, giving it more relevance. ANU-SOAM, a service-oriented middleware, aims to provide a convenient API, a unique data service extension, and proper load-balancing techniques for high-performance scientific computing. The data service extension offers both a Common Data Service (CDS) and a Local Data Service (LDS). The CDS helps set data common to all service instances and manipulate it using functions such as add, get, put, and sync. The LDS allows consumers to partially replicate data among service instances to improve memory scalability. Comparable paradigms like MPI are mostly agnostic and non-responsive to heterogeneous conditions. The SOA approach enables ANU-SOAM to implement load-balancing techniques with the help of a Resource Manager. Experiments using N-Body Solver and Heat Transfer applications have shown that ANU-SOAM performs as well as most of its MPI counterparts, especially under heterogeneous conditions.
Parallel computing represents a valid solution for reducing execution times in simulations of complex geological processes, such as lava flows, debris flows and, in general, fluid-dynamic processes. In these cases, Cellular Automata (CA) models have proved to be effective when the behavior of the system to be modeled can be described in terms of local interactions among its constituent parts. Cellular Automata are parallel computing models, discrete in space and time; space is generally subdivided into cells of uniform size, and the overall dynamics of the system emerges as the result of the simultaneous application, at discrete time steps, of proper local rules of evolution to each cell. Due to their intrinsic parallelism, CA models are attractive since they can be effectively and naturally implemented on parallel computers, achieving high performance. In the recent past, CA models were efficiently executed on distributed-memory architectures, such as Beowulf clusters and many-node supercomputers, while fewer implementations are found for shared-memory computers, such as multi-core machines. This paper shows performance results of the parallelization of a well-known CA model for simulating lava flows, the SCIARA model, in a shared-memory environment by means of OpenMP, an Application Programming Interface which supports multi-platform shared-memory parallel programming.
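As an indication of the kind of OpenMP structure such a parallelization involves, the sketch below applies a generic local rule to a 2-D grid with a parallel loop over rows. The grid sizes, the averaging rule, and the function names are placeholders, not SCIARA's actual transition function.

```c
/* Sketch: one time step of a generic 2-D cellular automaton, with the
 * independent cell updates split among threads via OpenMP. */
#define NX 1024
#define NY 1024

static double grid[NX][NY], next[NX][NY];

/* Placeholder local rule: new state from the cell and its 4 neighbours. */
static double local_rule(int i, int j)
{
    return 0.2 * (grid[i][j] + grid[i - 1][j] + grid[i + 1][j] +
                  grid[i][j - 1] + grid[i][j + 1]);
}

void step(void)
{
    /* Each cell update reads only the old grid, so rows can be
     * distributed among threads without synchronization. */
    #pragma omp parallel for schedule(static)
    for (int i = 1; i < NX - 1; i++)
        for (int j = 1; j < NY - 1; j++)
            next[i][j] = local_rule(i, j);

    /* Copy back (a real implementation would just swap grid pointers). */
    #pragma omp parallel for schedule(static)
    for (int i = 1; i < NX - 1; i++)
        for (int j = 1; j < NY - 1; j++)
            grid[i][j] = next[i][j];
}
```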