ISBN (digital): 9783642551956
ISBN (print): 9783642551956
To reduce the overhead of barrier synchronization, our previous study proposed an algorithm that eliminates barrier synchronizations and evaluated its validity experimentally. We found that the algorithm is more effective for load-imbalanced programs than for load-balanced ones. However, the degree of load balance was not discussed quantitatively. In this paper, we model the behavior of parallel programs. In our model, the execution time of each phase of a parallel program is represented as a random variable. To investigate how the degree of load balance influences the performance of our algorithm, we varied the coefficient of variation of the probability distribution that the random variable follows. Using the model, we evaluated the execution time of parallel programs and found that the theoretical results are consistent with the experimental ones.
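The idea of treating phase times as random variables with a tunable coefficient of variation can be illustrated with a small Monte Carlo sketch. This is not the model from the paper. It assumes gamma-distributed phase times with mean 1 (the abstract does not name a distribution), and it idealizes the barrier-free case as each process running its phases back to back with no cross-process dependencies. The function name `simulate` and its parameters are hypothetical.

```python
import random


def simulate(num_procs, num_phases, cv, seed=0):
    """Return (time_with_barriers, time_without_barriers) for one random run.

    Assumptions (not from the paper): phase times are gamma distributed with
    mean 1 and coefficient of variation `cv`; without barriers, processes have
    no dependencies on each other at all.
    """
    rng = random.Random(seed)
    if cv > 0:
        shape = 1.0 / (cv * cv)   # gamma shape giving the requested CV
        scale = cv * cv           # shape * scale == 1, so the mean is 1
        draw = lambda: rng.gammavariate(shape, scale)
    else:
        draw = lambda: 1.0        # cv == 0 means perfectly balanced phases

    # times[p][k] is the time process p spends in phase k.
    times = [[draw() for _ in range(num_phases)] for _ in range(num_procs)]

    # With a barrier after every phase, each phase costs as much as its
    # slowest process.
    with_barriers = sum(
        max(times[p][k] for p in range(num_procs)) for k in range(num_phases)
    )
    # Without barriers (idealized), the program ends when the process with
    # the largest total work finishes.
    without_barriers = max(sum(row) for row in times)
    return with_barriers, without_barriers


if __name__ == "__main__":
    for cv in (0.0, 0.25, 0.5, 1.0):
        barrier, free = simulate(num_procs=8, num_phases=100, cv=cv)
        print(f"cv={cv:.2f}  barriers={barrier:.1f}  no barriers={free:.1f}")
```

Under these assumptions the barrier-free time can never exceed the time with barriers, because a sum of per-phase maxima is at least the maximum of per-process sums. For a perfectly balanced program (`cv = 0`) the two coincide.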
ISBN (print): 9783642552243
Discrete event simulation is a well-known technique for modeling and simulating complex parallel systems. Parallel simulation introduces multiple simulated event queues that are processed in parallel, so proper synchronization between the queues is required. Program global state monitoring is a natural way to organize monitoring and control of the global simulation state. Every queue process reports its progress state, namely the timestamp of its most recently processed event, to a global synchronizer. Reporting is done asynchronously and has no influence on the simulation process. A global simulation state can be defined as the vector containing the timestamp of the most recently processed event in every queue. The paper presents the principles of parallel simulation built on a system infrastructure for global state monitoring. A comparison with existing parallel simulation methods is provided.
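The global state described above, a vector of per-queue timestamps held by a synchronizer, might be sketched as follows. This is a minimal illustration and not the paper's infrastructure: the class `GlobalSynchronizer`, its method names, and the rule of ignoring stale reports are all assumptions.

```python
import threading


class GlobalSynchronizer:
    """Hypothetical collector of per-queue progress reports."""

    def __init__(self, num_queues):
        self._lock = threading.Lock()
        # One entry per queue: timestamp of its most recently processed event.
        self._timestamps = [0.0] * num_queues

    def report(self, queue_id, timestamp):
        """Record a progress report from a queue process."""
        with self._lock:
            # Assumption: reports may arrive out of order, so keep the newest.
            if timestamp > self._timestamps[queue_id]:
                self._timestamps[queue_id] = timestamp

    def global_state(self):
        """Return a snapshot of the timestamp vector."""
        with self._lock:
            return tuple(self._timestamps)

    def slowest_queue_time(self):
        """Return the smallest reported timestamp across all queues."""
        return min(self.global_state())


if __name__ == "__main__":
    sync = GlobalSynchronizer(num_queues=3)
    sync.report(0, 5.0)
    sync.report(1, 2.5)
    sync.report(2, 7.0)
    print(sync.global_state(), sync.slowest_queue_time())
```

A control policy could compare each queue's timestamp with the smallest one to decide whether a queue has run too far ahead. How the actual system uses the state vector is not stated in the abstract.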
ISBN (digital): 9783642551956
ISBN (print): 9783642551956
Our research focuses on simplifying parallel programming for distributed memory systems. Our goal is to build a unifying framework for creating, debugging, profiling, and verifying parallel applications. The result of this effort is the open source tool Kaira. In this paper, we focus on prototyping of parallel applications. We have extended Kaira with the ability to generate parallel libraries. More precisely, we present a framework for fast prototyping of parallel numerical computations. We demonstrate our idea on a combination of parallel libraries generated by Kaira and GNU Octave. Hence, a user can verify an idea in a short time, create a real running program, and verify its performance and scalability.
ISBN (print): 9781849199247
This paper presents a flexible and reconfigurable inverse transform architecture supporting the combined MPEG-2, H.264 and HEVC video decoding standards. The proposed architecture uses an 8-point parallel and pipelined process to implement the 1-D IDCT/IDST algorithms and supports flexible image block sizes, including 2×2, 4×4, 8×8, 16×16 and 32×32. The design has been captured using HDL and synthesised using Xilinx Virtex-7 FPGA technology. Results show that the combined architecture can support the IDCT/IDST of MPEG-2, H.264 and HEVC with approximately a 57% reduction in area compared to the combined area of separate coder designs.
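For readers unfamiliar with the transform itself, the following sketch shows how a 2-D inverse DCT can be built from repeated 1-D inverse DCTs, applied first to rows and then to columns. It uses the textbook floating-point orthonormal IDCT. It is not the paper's hardware datapath, and the video standards named above specify their own (largely integer) transform definitions. The function names are hypothetical.

```python
import math


def idct_1d(coeffs):
    """Textbook orthonormal 1-D inverse DCT (floating point)."""
    n_points = len(coeffs)
    out = []
    for n in range(n_points):
        total = 0.0
        for k, c in enumerate(coeffs):
            # The DC term uses a smaller scale factor than the AC terms.
            scale = math.sqrt((1.0 if k == 0 else 2.0) / n_points)
            total += scale * c * math.cos((2 * n + 1) * k * math.pi / (2 * n_points))
        out.append(total)
    return out


def idct_2d(block):
    """Separable 2-D inverse DCT: 1-D pass over rows, then over columns."""
    rows = [idct_1d(row) for row in block]
    # zip(*rows) iterates over columns; each column is transformed in turn.
    cols = [idct_1d(list(col)) for col in zip(*rows)]
    # Transpose back so that result[i][j] is the sample at row i, column j.
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    block = [[0.0] * 8 for _ in range(8)]
    block[0][0] = 8.0  # DC coefficient only
    pixels = idct_2d(block)
    print(pixels[0][:4])
```

A block holding only a DC coefficient reconstructs to a flat block, which makes a convenient sanity check for any implementation of the row-column scheme.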
ISBN (print): 9781479909599
The MapReduce architecture is considered one of the most promising candidates for efficient and reliable big data mining. While current MapReduce is designed mainly for data center and enterprise networks, in which a number of servers are interconnected with optical fiber cables, prospective MapReduce would be applied in optical-wireless environments such as optical-wireless data center networks, fiber-wireless (FiWi) access networks, and so forth. To adapt MapReduce to optical-wireless hybrid networks, we need to answer a fundamental research question: "How does MapReduce architecture use optical and wireless resources for task allocation?" To answer this question, this paper reveals some challenging issues and proposes a context-aware task allocation scheme designed around the characteristics of both optical and wireless communications. Our proposed task allocation scheme can minimize the completion time of big data processing. Numerical results are presented to demonstrate the effectiveness of the proposed method compared with existing task allocation schemes.
ISBN (print): 9783642552243
The paper presents Comcute, a novel multi-level implementation of the volunteer-based computing paradigm. Comcute was designed to let users donate the computing power of their PCs in a simplified manner, requiring only that they point their web browser at a specific web address and click a mouse. The server side appoints several servers to be in charge of executing particular tasks. As a result, the system can survive failures of individual computers and allows redundancy of a desired order to be defined. On the client side, computations are executed within web browsers using technologies such as Java, JavaScript, Adobe Flash etc., without the need to install additional software. This paper presents the results of scalability experiments carried out on the Comcute system.
ISBN (print): 9781479948857
Image processing and computer vision applications are used intensively in several domains, in particular multimedia and medicine. The main challenge in developing such applications is how to guarantee both high accuracy and low execution time. Accordingly, we observe two research directions: the first focuses on improving the algorithms and the second on designing fast hardware platforms. In this paper, we propose an efficient parallel implementation of an accurate extended Canny edge detection algorithm suitable for medical applications on an embedded many-core platform. The proposed implementation runs at a frame rate of 10 frames/s for an image size of 512×512, with highly accurate and smooth edge lines.
ISBN (print): 9783319135625
The proceedings contain 71 papers. The special focus in this conference is on Simulated Evolution and Learning. The topics include: solving dynamic optimisation problem with variable dimensions; a probabilistic evolutionary optimization approach to compute quasiparticle braids; adaptive system design by a simultaneous evolution of morphology and information processing; generating software test data by particle swarm optimization; a steady-state genetic algorithm for the dominating tree problem; evolution of developmental timing for solving hierarchically dependent deceptive problems; the introduction of asymmetry on traditional 2-parent crossover operators for crowding and its effects; the performance effects of interaction frequency in parallel cooperative coevolution; customized selection in estimation of distribution algorithms; a hybrid GP-tabu approach to QoS-aware data intensive web service composition; a modified screening estimation of distribution algorithm for large-scale continuous optimization; clustering problems for more useful benchmarking of optimization algorithms; fuzzy clustering with fitness predator optimizer for multivariate data problems; effects of mutation and crossover operators in the optimization of traffic signal parameters; a GP approach to QoS-aware web service composition and selection; user preferences for approximation-guided multi-objective evolution; multi-objective optimisation, software effort estimation and linear models; adaptive update range of solutions in MOEA/D for multi and many-objective optimization; classification of lumbar ultrasound images with machine learning; schemata bandits for binary encoded combinatorial optimisation problems; anomaly detection using replicator neural networks trained on examples of one class; and genetic programming for multiclass texture classification using a small number of instances.
ISBN (digital): 9783642551956
ISBN (print): 9783642551956
An effective parallelization, based on the MPI library, is proposed for the numerically exact quantum transfer matrix (QTM) and exact diagonalization (ED) deterministic simulations of chromium-based rings. In the QTM technique we have exploited parallelization of the summation in the partition function. The efficiency of the QTM calculations is above 80% up to about 1000 processes. With our test programs we calculated the low-temperature torque, specific heat and entropy for the chromium ring Cr-8, using a realistic Hamiltonian with single-ion anisotropy and alternation of the nearest-neighbor exchange couplings. Our parallelized ED technique makes use of the self-scheduling scheme and the longest processing time algorithm to distribute separate blocks of a Hamiltonian matrix to slave processes, which diagonalize them. Its parallel processing scales very well, with efficiency above 90%, but only up to about 10 processes. This scheme is improved by processing more input data sets in one job, which leads to very good scalability up to an arbitrary number of processes. The scaling is improved for both techniques when larger systems are considered.
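The longest processing time (LPT) rule mentioned above is a standard list-scheduling heuristic: sort the jobs by decreasing cost and always hand the next job to the worker that becomes free first. The sketch below simulates that rule for a list of block costs. It is not the authors' MPI code, and the function name `lpt_schedule` and the idea of supplying block costs as plain numbers are assumptions for illustration.

```python
import heapq


def lpt_schedule(block_costs, num_workers):
    """Simulate LPT assignment of blocks to workers.

    Returns (makespan, assignment), where assignment[w] lists the costs of
    the blocks given to worker w, in the order they were handed out.
    """
    # Min-heap of (time at which the worker becomes free, worker id).
    free_at = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(free_at)
    assignment = [[] for _ in range(num_workers)]

    # Largest blocks first, each to the worker that is free earliest.
    for cost in sorted(block_costs, reverse=True):
        available, worker = heapq.heappop(free_at)
        assignment[worker].append(cost)
        heapq.heappush(free_at, (available + cost, worker))

    makespan = max(t for t, _ in free_at)
    return makespan, assignment


if __name__ == "__main__":
    makespan, plan = lpt_schedule([5, 4, 3, 3, 2, 1], num_workers=2)
    print(makespan, plan)
```

In a self-scheduling setting the "free earliest" decision happens at run time, when a worker asks for more work, so the heap here only mimics that behaviour for known costs. The sketch also suggests why scaling can stall at a small process count: once there are as many workers as large blocks, the largest block alone bounds the makespan.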
Relevance feedback algorithms improve content-based image retrieval (CBIR) systems by effectively using relevant/non-relevant images labeled by users. The main constraint of these algorithms is the update time for lar...