ISBN: 9783642246685 (print)
With the recent deployment of global experimental networking facilities, dozens of computer networks with large numbers of computers have become available for scientific studies. Multiple Replications in Parallel (MRIP) is a distributed scenario of sequential quantitative stochastic simulation that offers significant simulation speedup when it is executed on multiple computers of a local area network. We report results of running MRIP simulations on PlanetLab, a global overlay network that can currently access more than a thousand computers in forty countries around the globe. Our simulations were run using Akaroa2, a universal controller of quantitative discrete-event simulation designed for automatically launching MRIP-based experiments. Our experimental results provide strong evidence that global experimental networks such as PlanetLab can be used efficiently for quantitative simulation without compromising speed and efficiency.
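The MRIP scenario can be illustrated with a minimal, single-process sketch. Several independent replications (each a simulated M/M/1 queue with its own seed) feed batch means of queueing delay to one global analyzer, which pools them and stops as soon as the 95% confidence interval reaches a requested relative precision. The M/M/1 model, the batch-means method, the 5% precision target and all function names are assumptions made for illustration. Akaroa2's own engines, analyzers and network protocol are not reproduced here, and real MRIP engines would run on separate machines rather than being polled in a round-robin loop.

```python
import math
import random
from typing import Iterator, List, Tuple


def engine_batch_means(lam: float, mu: float, seed: int,
                       batch_size: int = 500, warmup: int = 2000) -> Iterator[float]:
    """One simulation engine: an M/M/1 queue yielding batch means of queueing delay.

    Successive delays follow Lindley's recursion W_{n+1} = max(0, W_n + S_n - A_{n+1}),
    with exponential service times S (rate mu) and interarrival times A (rate lam).
    """
    rng = random.Random(seed)  # assumption: distinct seeds give independent streams
    w = 0.0
    for _ in range(warmup):  # discard the initial transient
        w = max(0.0, w + rng.expovariate(mu) - rng.expovariate(lam))
    while True:
        total = 0.0
        for _ in range(batch_size):
            total += w
            w = max(0.0, w + rng.expovariate(mu) - rng.expovariate(lam))
        yield total / batch_size


def mrip_estimate(n_engines: int, lam: float, mu: float,
                  rel_precision: float = 0.05, z: float = 1.96,
                  min_batches: int = 20,
                  max_batches: int = 100_000) -> Tuple[float, float, int]:
    """Global analyzer: pool batch means from all engines until the CI is narrow enough."""
    engines: List[Iterator[float]] = [
        engine_batch_means(lam, mu, seed=1000 + i) for i in range(n_engines)]
    n, s, ss = 0, 0.0, 0.0
    while n < max_batches:
        # Round-robin polling stands in for engines reporting asynchronously over a network.
        for engine in engines:
            x = next(engine)
            n += 1
            s += x
            ss += x * x
        if n >= min_batches:
            mean = s / n
            var = max((ss - n * mean * mean) / (n - 1), 0.0)
            half_width = z * math.sqrt(var / n)
            # Sequential stopping rule: relative half-width of the CI below the target.
            if mean > 0 and half_width / mean <= rel_precision:
                return mean, half_width, n
    raise RuntimeError("requested precision not reached within max_batches")
```

For λ = 0.5 and μ = 1.0, `mrip_estimate(4, 0.5, 1.0)` should return a mean delay close to the analytical M/M/1 value λ/(μ(μ - λ)) = 1.0. With more engines, each one has to contribute fewer batches before the shared stopping rule is met, which is where the MRIP speedup comes from.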
ISBN: 9783642246685 (print)
Many common workloads rely on arrays as a basic data structure on top of which they build more complex behavior. Others use them because they are a natural representation for their problem domains. Software Transactional Memory (STM) has been proposed as a new concurrency control mechanism that simplifies concurrent programming. Yet most STM implementations have no special representation for arrays. On many STMs, this results in inefficient internal representations, where much overhead is added by tracking each array element individually; on other STMs, it results in false-sharing conflicts, because writes to different elements of the same array are treated as a conflict. In this work we propose new designs for array implementations that are integrated with the STM, allowing for improved performance and reduced memory usage in read-dominated workloads, and we present the results of implementing the new designs on top of JVSTM, a Java library STM.
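The false-sharing trade-off described above can be shown with a toy optimistic STM, written in Python rather than Java for consistency with the other sketches here. An array either has a single version number (cheap in memory, but any commit invalidates every reader of the array) or one version per element (more metadata, but disjoint updates do not conflict). The classes and the validation scheme are hypothetical; they are neither JVSTM's API nor the array designs proposed in the paper.

```python
from typing import Dict, List


class VersionedArray:
    """Toy STM array: one version number per element, or one for the whole array."""

    def __init__(self, values: List[int], per_element: bool):
        self.values = list(values)
        self.per_element = per_element
        # Per-element versioning costs one counter per slot; whole-array costs one in total.
        self.versions = [0] * (len(values) if per_element else 1)

    def slot(self, i: int) -> int:
        return i if self.per_element else 0


class Transaction:
    """Optimistic transaction: buffer writes, validate the read set at commit."""

    def __init__(self, arr: VersionedArray):
        self.arr = arr
        self.read_set: Dict[int, int] = {}   # version slot -> version seen at first read
        self.write_set: Dict[int, int] = {}  # element index -> buffered new value

    def read(self, i: int) -> int:
        if i in self.write_set:
            return self.write_set[i]
        s = self.arr.slot(i)
        self.read_set.setdefault(s, self.arr.versions[s])
        return self.arr.values[i]

    def write(self, i: int, value: int) -> None:
        self.write_set[i] = value

    def commit(self) -> bool:
        # Abort if any version slot we read has been bumped by another commit.
        if any(self.arr.versions[s] != v for s, v in self.read_set.items()):
            return False
        for i, value in self.write_set.items():
            self.arr.values[i] = value
            self.arr.versions[self.arr.slot(i)] += 1
        return True


def disjoint_updates_conflict(per_element: bool) -> bool:
    """Two transactions increment different elements; does the first one abort?"""
    arr = VersionedArray([0, 0, 0, 0], per_element)
    t1, t2 = Transaction(arr), Transaction(arr)
    t1.write(0, t1.read(0) + 1)
    t2.write(1, t2.read(1) + 1)
    t2.commit()             # t2 commits first
    return not t1.commit()  # t1 aborts only on a (false) conflict
```

With whole-array versioning, `disjoint_updates_conflict(False)` reports a conflict even though the two transactions touch different elements; with per-element versioning it does not, at the price of one version counter per element.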
ISBN: 9783642246494 (print)
In many businesses, including hydrocarbon industries, reducing cost is a high priority. Although hydrocarbon industries appear able to afford the expensive computing infrastructure and software packages used to process seismic data in the search for hydrocarbon traps, it is always imperative to find ways to minimize cost. Seismic processing costs can be significantly reduced by using inexpensive, open source seismic data processing packages. However, hydrocarbon industries question the processing performance of open source packages, claiming that their seismic functions are less integrated and come with almost no technical guarantees for users. The objective of this paper is to demonstrate, through a comparative analysis, that open source seismic data processing packages are capable of executing the required seismic functions on an actual industrial workload. To achieve this objective, we investigate whether open source seismic data processing packages can be run on the same set of seismic data through data format conversions, and whether they can achieve reasonable performance and speedup when executing parallel seismic functions on an HPC cluster. Among the few open source packages available on the Internet, we study two popular ones: Seismic UNIX (SU) and Madagascar.
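The data format conversion step can be sketched as follows, assuming an SU file is a plain sequence of traces, each a 240-byte SEG-Y trace header followed by `ns` 4-byte floats, with the sample count and the sample interval (in microseconds) at bytes 115-116 and 117-118 of the trace header. SU writes data in the byte order of the machine that produced it; little-endian is assumed by default here. The output is a minimal Madagascar RSF header (`n1`, `d1`, `n2`, `esize`, `data_format`, `in`) plus a raw binary payload. Function names are hypothetical, trace headers are not preserved, and in practice the converters shipped with Madagascar would be used instead.

```python
import struct
from typing import List, Tuple

SU_TRACE_HEADER_BYTES = 240  # each SU trace carries a 240-byte SEG-Y trace header


def parse_su(raw: bytes, byteorder: str = "<") -> Tuple[int, float, List[List[float]]]:
    """Split an SU byte stream into (samples per trace, sample interval in s, traces).

    Assumes all traces have the same length and store 4-byte IEEE floats in `byteorder`.
    """
    pos, ns, dt = 0, 0, 0.0
    traces: List[List[float]] = []
    while pos < len(raw):
        # 0-based offset 114 = bytes 115-116 (ns); the next 2 bytes hold dt in microseconds.
        ns, dt_us = struct.unpack_from(byteorder + "HH", raw, pos + 114)
        dt = dt_us / 1_000_000
        pos += SU_TRACE_HEADER_BYTES
        traces.append(list(struct.unpack_from(f"{byteorder}{ns}f", raw, pos)))
        pos += 4 * ns
    return ns, dt, traces


def rsf_header(ns: int, dt: float, ntraces: int, data_path: str) -> str:
    """Minimal RSF header text describing a 2-D float dataset (time x trace)."""
    return (f"n1={ns} d1={dt:g} o1=0\n"
            f"n2={ntraces} d2=1 o2=0\n"
            'esize=4 data_format="native_float"\n'
            f'in="{data_path}"\n')


def rsf_binary(traces: List[List[float]]) -> bytes:
    """Flatten traces into native-byte-order float32, matching data_format above."""
    flat = [x for trace in traces for x in trace]
    return struct.pack(f"={len(flat)}f", *flat)
```

The header text would be written to a `.rsf` file and the bytes from `rsf_binary` to the path given in `in=`.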
ISBN: 9781618040176 (print)
A reconfigurable hardware architecture for multiframe motion estimation (RMF-ME) is presented in this paper. The proposed architecture can be configured to perform block matching (BM) error computations for 4 reference frames (RF) in parallel. In the single-reference-frame configuration, the RMF-ME can perform concurrent BM computations for 4 macroblocks (MB) to support high motion vector (MV) throughput. The BM engine of the RMF-ME performs error computations on both the luminance and the chrominance components of the pixel data in order to obtain accurate motion trajectories. FPGA implementation results show that the RMF-ME circuit supports 1080p video at 123 frames per second (fps). Simulation and hardware implementation results demonstrate that the proposed architecture is well suited for high-resolution, high-quality, and high-throughput video applications.
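The block matching error computation can be sketched in software as a full search that sums absolute differences (SAD) over the luminance plane and both chrominance planes and keeps the best candidate across several reference frames. SAD as the error metric, 4:4:4 sampling (all planes at full resolution), exhaustive search and the function names are assumptions. The RMF-ME evaluates reference frames (or macroblocks) concurrently in hardware, whereas this sketch loops over them sequentially.

```python
from typing import List, Optional, Sequence, Tuple

Plane = List[List[int]]
Frame = Sequence[Plane]  # (Y, Cb, Cr); assumed 4:4:4, i.e. all planes at full resolution


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, size: int) -> int:
    """Sum of absolute differences over luma and chroma planes for one candidate MV."""
    total = 0
    for cp, rp in zip(cur, ref):
        for y in range(size):
            crow, rrow = cp[by + y], rp[by + dy + y]
            for x in range(size):
                total += abs(crow[bx + x] - rrow[bx + dx + x])
    return total


def multiframe_search(cur: Frame, refs: Sequence[Frame], bx: int, by: int,
                      size: int, search: int) -> Tuple[int, int, Tuple[int, int]]:
    """Full search over every reference frame; returns (cost, ref index, (dx, dy))."""
    height, width = len(cur[0]), len(cur[0][0])
    best: Optional[Tuple[int, int, Tuple[int, int]]] = None
    for r, ref in enumerate(refs):  # hardware would process these frames in parallel
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                if not (0 <= by + dy and by + dy + size <= height
                        and 0 <= bx + dx and bx + dx + size <= width):
                    continue  # candidate block falls outside the reference frame
                cost = sad(cur, ref, bx, by, dx, dy, size)
                if best is None or cost < best[0]:
                    best = (cost, r, (dx, dy))
    assert best is not None, "the block itself must lie inside the frame"
    return best
```

Passing a single reference frame and calling the search for several neighbouring macroblocks would correspond, in software terms, to the single-reference-frame configuration.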
General-purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community, as it promises to offer large computational power at a very low price. GPGPU is...
ISBN: 9783642233975 (print)
Tuning numerical libraries has become more difficult over time, as systems get more sophisticated. In particular, modern multicore machines make the behaviour of algorithms hard to forecast and model. In this paper, we tackle the issue of tuning a dense QR factorization on multicore architectures using a fully empirical approach. We exhibit a few strong empirical properties that enable us to prune the search space efficiently. Our method is automatic, fast and reliable: the tuning process is performed entirely at install time and takes less than one hour and ten minutes on five out of seven platforms. We achieve an average performance of 97% to 100% of the optimum, depending on the platform. This work is a basis for autotuning the PLASMA library and for enabling easy performance portability across hardware systems.
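A fully empirical search with pruning can be sketched as a two-stage harness over two tile-QR tuning parameters, assumed here to be the tile size NB and the inner block size IB. The candidate filter (IB must divide NB) and the pruning rule (rank every candidate by one cheap run at a small matrix size and keep only the top few for the expensive sizes) are hypothetical stand-ins, not the empirical properties exhibited in the paper. `measure` is a caller-supplied benchmark, for example a timed QR factorization converted to Gflop/s.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Params = Tuple[int, int]  # (NB tile size, IB inner block size)


def candidate_space(nbs: Iterable[int], ibs: Iterable[int]) -> List[Params]:
    """Enumerate (NB, IB) pairs; assumption: only IB values that divide NB are valid."""
    ib_list = list(ibs)
    return [(nb, ib) for nb in nbs for ib in ib_list if ib <= nb and nb % ib == 0]


def empirical_tune(candidates: List[Params],
                   measure: Callable[[Params, int], float],
                   probe_size: int, target_sizes: Iterable[int],
                   keep: int = 3) -> Dict[int, Params]:
    """Two-stage empirical search; `measure` returns Gflop/s (higher is better)."""
    # Stage 1: one cheap run per candidate at a small probe size prunes the space.
    ranked = sorted(candidates, key=lambda c: measure(c, probe_size), reverse=True)
    survivors = ranked[:keep]
    # Stage 2: only the survivors are benchmarked at each matrix size of interest.
    return {n: max(survivors, key=lambda c: measure(c, n)) for n in target_sizes}
```

Because the stage-1 ranking is only a proxy, a candidate that would be best at a large size can be pruned early; the `keep` parameter trades tuning time against that risk.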
ISBN: 9783642246494 (print)
In parallel programs, concurrency bugs are often caused by unsynchronized accesses to shared memory locations, which are called data races. To support programmers in writing correct parallel programs, it is therefore highly desirable to have tools that automatically detect such data races. Today, most of these tools only consider unsynchronized read and write operations on a single memory location. Concurrency bugs that involve multiple accesses to a set of correlated variables may be missed completely, and tools may overwhelm programmers with data races on various memory locations without noticing that the locations are correlated. In this paper, we propose a novel approach to data race detection that automatically infers sets of correlated variables and logical operations by analyzing data and control dependencies. For data race detection itself, we combine a modified version of the lockset algorithm with happens-before analysis, providing the first hybrid, dynamic race detector for correlated variables. We implemented our approach on top of Valgrind, a framework for dynamic binary instrumentation. Our evaluation confirmed that we can catch data races missed by existing detectors and provide additional information for correct bug fixing.
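The combination of a lockset test with happens-before analysis, applied to groups of correlated variables, can be sketched over a recorded event trace. A race is reported when two accesses from different threads touch the same location (or the same correlated group), at least one is a write, they hold no lock in common, and vector clocks show that neither is ordered before the other. The class, its event methods, the hand-written `groups` mapping and the pairwise lockset check are illustrative assumptions: the paper infers the groups from data and control dependencies and instruments binaries through Valgrind, neither of which is modelled here.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

VectorClock = Dict[str, int]


class HybridRaceDetector:
    """Toy hybrid detector: lockset test plus vector-clock happens-before.

    `groups` maps a group name to variables treated as one correlated location;
    here it is supplied by hand (hypothetical), not inferred from dependencies.
    """

    def __init__(self, groups: Optional[Dict[str, List[str]]] = None):
        self.group_of = {v: g for g, members in (groups or {}).items() for v in members}
        self.clocks: Dict[str, VectorClock] = {}
        self.held: Dict[str, Set[str]] = {}
        self.signals: Dict[str, VectorClock] = {}
        self.history: Dict[str, List[Tuple[str, VectorClock, bool, FrozenSet[str], str]]] = {}
        self.races: List[Tuple[str, str, str]] = []

    def _clock(self, tid: str) -> VectorClock:
        return self.clocks.setdefault(tid, {tid: 1})

    def acquire(self, tid: str, lock: str) -> None:
        self.held.setdefault(tid, set()).add(lock)

    def release(self, tid: str, lock: str) -> None:
        self.held.setdefault(tid, set()).discard(lock)

    def signal(self, tid: str, obj: str) -> None:
        # Source of a happens-before edge, e.g. a condition-variable signal.
        clock = self._clock(tid)
        self.signals[obj] = dict(clock)
        clock[tid] += 1

    def wait(self, tid: str, obj: str) -> None:
        # Sink of the edge: merge the signaller's clock into this thread's clock.
        clock = self._clock(tid)
        for t, c in self.signals.get(obj, {}).items():
            clock[t] = max(clock.get(t, 0), c)

    def access(self, tid: str, var: str, is_write: bool) -> None:
        loc = self.group_of.get(var, var)  # correlated variables share one location
        now = dict(self._clock(tid))
        locks = frozenset(self.held.get(tid, set()))
        past = self.history.setdefault(loc, [])
        for t, then, was_write, then_locks, then_var in past:
            ordered = all(c <= now.get(u, 0) for u, c in then.items())
            if (t != tid and (was_write or is_write)
                    and not (then_locks & locks) and not ordered):
                self.races.append((loc, then_var, var))
        past.append((tid, now, is_write, locks, var))


def run_scenario(detector: HybridRaceDetector, ordered: bool) -> List[Tuple[str, str, str]]:
    """Hypothetical trace: T1 writes `lo` under lock A, T2 writes `hi` under lock B."""
    detector.acquire("T1", "A")
    detector.access("T1", "lo", is_write=True)
    detector.release("T1", "A")
    if ordered:  # optional happens-before edge from T1 to T2
        detector.signal("T1", "ev")
        detector.wait("T2", "ev")
    detector.acquire("T2", "B")
    detector.access("T2", "hi", is_write=True)
    detector.release("T2", "B")
    return detector.races
```

In `run_scenario`, `lo` and `hi` are each written by only one thread, so a per-variable detector stays silent; declaring them as one correlated group exposes the race, and adding a signal/wait edge between the threads suppresses it again.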
ISBN: 9780769543284 (print)
Branch and Bound (B&B) algorithms are highly parallelizable, but they are irregular, so dynamic load-balancing techniques have been used to avoid idle processors. In previous work, the authors used a dynamic number of threads at run time, which depends on the measured performance of the application when only one interval B&B algorithm is running on the system. In this way, load balancing is achieved through thread-creation decisions. In this work, we extend the study of these models to non-dedicated systems. To obtain a controlled testbed and comparable results, several instances of the interval global optimization algorithm are executed on the system, with the same model and the same problem to solve. A non-dedicated system is thereby simulated, because the execution of one instance affects the execution of the others. This paper discusses different methods and models for deciding when a thread should be created. Experiments show which of the proposed methods performs best in terms of maximum running time per application while using the fewest running threads. Following this parallel programming methodology, which is well suited to other B&B codes, applications can adapt their level of parallelism at run time to their own performance and to the system load. This work is a step towards improving the performance of parallel algorithms running on non-dedicated and heterogeneous systems. The adaptive model discussed in this work reduces the overall execution time for a set of instances of the same application running simultaneously. It also frees the user from specifying the number of threads each application should use.
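A run-time thread-creation decision of the kind discussed above might look like the following hypothetical policy: create another worker only while the system has idle cores and the last thread that was added still raised the application's measured throughput by at least `min_gain`. The throughput metric, how `runnable_on_system` would be obtained (for example from the operating system's run queue), and the thresholds are all assumptions; the paper compares several such methods, and this sketch is not claimed to be any one of them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    threads: int          # threads the application was running during the interval
    boxes_per_sec: float  # hypothetical throughput metric: B&B boxes processed per second


def should_create_thread(history: List[Sample], runnable_on_system: int, cores: int,
                         min_gain: float = 0.05, max_threads: Optional[int] = None) -> bool:
    """Hypothetical run-time policy for spawning one more B&B worker thread."""
    if runnable_on_system >= cores:
        return False  # other work already occupies every core (non-dedicated system)
    if not history:
        return True   # nothing measured yet: start another worker
    if max_threads is not None and history[-1].threads >= max_threads:
        return False
    if len(history) < 2 or history[-1].threads <= history[-2].threads:
        return True   # no recent thread creation whose effect could be judged
    prev, cur = history[-2], history[-1]
    if prev.boxes_per_sec <= 0:
        return True   # avoid dividing by zero on an idle previous interval
    gain = (cur.boxes_per_sec - prev.boxes_per_sec) / prev.boxes_per_sec
    return gain >= min_gain  # stop growing once an extra thread no longer pays off
```

Because each call looks only at the latest measurements and the current system load, the number of threads can keep adapting while other applications start or finish on the same machine.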
Scientific experiments in many domains generate huge amounts of data, ranging in size from hundreds of megabytes to petabytes. These data are stored on geographically distributed and heterogeneous resources. ...
ISBN: 9783642246685 (print)
This paper designs and implements an intelligent ubiquitous sensor network architecture for agricultural and livestock farms, which embrace a variety of sensors and generate a large volume of sensor data records. To detect specific events efficiently and accurately within this large amount of sensor data, which may contain not only erroneous entries but also correlated attributes, the middleware module embeds empirical event patterns and a knowledge description. For the filtered data, the data mining module provides an interface to define the relationship between environmental aspects and facility control equipment, set the trigger conditions for control actions, and integrate new event detection logic. Finally, the remote user interface for monitoring and control is implemented as Microsoft Windows, Web, and mobile device applications.
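The middleware's filtering and rule-based event detection can be sketched as two steps: discard readings outside a plausibility range, then evaluate trigger rules that may combine several correlated attributes and map to control actions. The sensor names, ranges, rules and actions used here are hypothetical examples, not the paper's event patterns or knowledge description.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Reading:
    sensor: str
    value: float


# Hypothetical plausibility ranges used to discard erroneous readings.
VALID_RANGE = {"temp_c": (-40.0, 60.0), "humidity_pct": (0.0, 100.0), "nh3_ppm": (0.0, 500.0)}


@dataclass
class Rule:
    name: str
    condition: Callable[[Dict[str, float]], bool]  # may combine correlated attributes
    action: str                                    # command for a facility control device


def filter_readings(readings: List[Reading]) -> Dict[str, float]:
    """Keep the latest in-range value per sensor; drop implausible ones."""
    latest: Dict[str, float] = {}
    for r in readings:
        lo, hi = VALID_RANGE.get(r.sensor, (float("-inf"), float("inf")))
        if lo <= r.value <= hi:
            latest[r.sensor] = r.value
    return latest


def detect_events(readings: List[Reading], rules: List[Rule]) -> List[Tuple[str, str]]:
    """Evaluate every trigger rule against the filtered state; return fired actions."""
    state = filter_readings(readings)
    fired: List[Tuple[str, str]] = []
    for rule in rules:
        try:
            if rule.condition(state):
                fired.append((rule.name, rule.action))
        except KeyError:
            pass  # a sensor this rule needs has no valid reading yet
    return fired
```

For example, a rule such as `Rule("heat_stress", lambda s: s["temp_c"] > 30 and s["humidity_pct"] > 80, "fan_on")` fires only when both readings are valid and above their thresholds; new detection logic is added by appending rules rather than changing the middleware.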