ISBN: 9783319499567; 9783319499550 (print)
High Energy Physics (HEP) experiments at the LHC collider at CERN were among the first scientific communities with very high computing requirements. Nowadays, researchers in other scientific domains need similar computational power and storage capacity. The solution for the HEP experiments was found in the form of the computational grid, a distributed computing infrastructure integrating a large number of computing centers based on commodity hardware. These infrastructures are very well suited to high-throughput applications used for the analysis of large volumes of data with trivial parallelization into multiple independent execution threads. More advanced applications in HEP and other scientific domains can exploit complex parallelization techniques using multiple interacting execution threads. A growing number of High Performance Computing (HPC) centers, or supercomputers, support this mode of operation. One of the software toolkits developed for building distributed computing systems is the DIRAC Interware. It allows seamless integration of computing and storage resources based on different technologies into a single coherent system. This product was very successful in solving the problems of large HEP experiments and was upgraded to offer a general-purpose solution. The DIRAC Interware can also help to include HPC centers in a common federation to achieve goals similar to those of computational grids. However, the integration of HPC centers imposes certain requirements on their internal organization and external connectivity, presenting a complex co-design problem. A distributed infrastructure including supercomputers is planned for construction. It will be applied to large-scale interdisciplinary problems of modern science and technology.
Transactional memory is a promising abstraction for creating scalable parallel programs for multi-core systems. It is planned for inclusion in C++17. In this work, a method is proposed for optimizing the detection of conflicts that occur during the execution of parallel programs with software transactional memory. The authors have implemented a GCC compiler module for profiling parallel programs with software transactional memory and a tool for adaptive tuning of the runtime library. The efficiency of the method is investigated on the STAMP benchmarks.
ISBN: 9781509020881 (print)
Graph mining is a class of data-mining problems where programs process data modeled as graphs. These applications often exhibit irregular and data-dependent communication patterns, which hamper parallelization on distributed architectures. Many tools and frameworks have been created for the scalable processing of graphs, but comparing them on distributed architectures is non-trivial because there are no efficiency metrics with respect to distributed resource usage. Considering an in-house use case, program trace analysis for parallelization optimizations, we study the benefits and limits of a graph-processing framework for a tangible application. The algorithm was implemented using GraphLab and executed on a modest 7-node commodity cluster with input instances of up to 40 million vertices and 50 million edges. In this paper we propose an in-depth analysis of the GraphLab system to evaluate its performance and scalability in the context of program trace analysis. The analysis is driven by both traditional and domain-specific metrics and contributes to a better understanding of the system's behavior.
ISBN: 9789380544199 (print)
Integration of microgrids into the grid and the use of microgrids to share a significant load have revolutionized the field of inverter control strategies. Modern power electronic inverters are expected to be equipped with parallel operation capabilities along with load-sharing control design. This paper analyzes different control strategies to improve the operation of a hybrid PV-diesel generator-battery power plant. Four control strategies are proposed according to primary and secondary control of frequency. Each control strategy is analyzed according to two criteria: the frequency deviations and the performance of the diesel *** are done in DigSILENT Power Factory.
Parallel programs implementing a stochastic cellular automata (CA) model of electron-hole recombination in an inhomogeneous semiconductor are developed for the two- and three-dimensional cases. The spatio-temporal distributions of particles are investigated by CA simulation. Spatial separation of electrons and holes with cluster formation is found and analyzed. The parallel implementation of the CA model allows us to calculate integral characteristics of the recombination process (particle densities and radiative intensity) in acceptable time. Recombination kinetics in the vicinity of the recombination centers and diffusion in two- and three-dimensional space are investigated using the parallel program.
ISBN: 9781509020881 (print)
Nowadays many job schedulers rely on checkpoint mechanisms to make long-running batch jobs resilient to node failures. At large scale, stopping a job and creating its image consumes a considerable amount of time. The aim of this study is to propose a method that eliminates this overhead. For this purpose we decompose the problem being solved into computational microkernels that have a strict hierarchical dependence on each other. When a kernel abruptly stops its execution due to a node failure, it is the responsibility of its principal to restart the computation on a healthy node. In the course of experiments we successfully applied this method to make a hydrodynamics HPC application run on a constantly changing number of nodes. We believe that this technique can be generalised to other types of scientific applications as well.
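The restart rule described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the `Node`, `NodeFailure` and `run_subordinates` names are hypothetical, node failure is simulated with a flag, and the kernels run sequentially in one process. It only shows the idea that a principal kernel keeps the inputs of its subordinates and reruns a failed subordinate on another node instead of restoring a checkpoint image.

```python
class NodeFailure(Exception):
    """Raised when the (simulated) node running a kernel is down."""


class Node:
    """Hypothetical stand-in for a cluster node."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def run(self, work, arg):
        # A real system would ship the kernel to a remote machine;
        # here a failure is simulated by the `healthy` flag.
        if not self.healthy:
            raise NodeFailure(self.name)
        return work(arg)


def run_subordinates(work, args, nodes):
    """Principal kernel: run one subordinate kernel per argument.

    The principal still holds each subordinate's input, so when a node
    fails it simply reruns that subordinate on the next node in the list.
    """
    results = []
    for i, arg in enumerate(args):
        for attempt in range(len(nodes)):
            node = nodes[(i + attempt) % len(nodes)]
            try:
                results.append(node.run(work, arg))
                break
            except NodeFailure:
                continue  # try the next node
        else:
            raise RuntimeError("no healthy node left")
    return results


nodes = [Node("n0"), Node("n1", healthy=False), Node("n2")]
partial_sums = run_subordinates(sum, [[1, 2], [3, 4], [5, 6]], nodes)
total = sum(partial_sums)
```

In this run the second subordinate is first assigned to the failed node `n1` and is then rerun on `n2`; only that one kernel is repeated, and no job-wide image is written or read.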
ISBN: 9781467388450 (print)
An application that estimates the movement of pedestrians in urban areas, using an advanced person re-identification technique as its video analysis scheme, requires a massive number of simultaneous similarity searches over feature data, which represent a person's characteristics as numerical values. The system should be able to process over 10,000 people per minute if a large-scale urban facility is assumed. However, the computation cost of similarity searches is high, and the feature data extracted from a video become rather large. These properties are obstacles to large-scale estimation using live videos. We propose a novel design for a live video analysis system that executes feature data extraction and similarity searches using parallel computations on distributed server nodes connected via a peer-to-peer network. We implemented the system on a testbed and evaluated its performance using a real dataset from a large-scale facility, applying an existing face recognition technique as the person re-identification scheme, and confirmed that the processes can be completed within a minute.
ISBN: 9789380544199 (print)
The VLSI technology in place today caters to almost all technology-based products, but as the need for speed and space increases, VLSI technology might not be able to keep up with the demand. Many alternatives are being researched, but very few can match VLSI in terms of performance. Technologies such as nanotechnology, photonics, and quantum computing look promising in that ***-dot Cellular Automata (QCA) is one such technology, which can possibly replace VLSI while providing higher processing speed and occupying less space. In this paper we propose a design for Serial Input Parallel Output (SIPO) and Serial Input Serial Output (SISO) registers with the help of QCA technology. Keywords: shift register, quantum dot, QCA technology
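For readers unfamiliar with the two register types named in this abstract, the following is a logic-level sketch of their behavior. It says nothing about the QCA cell layout proposed in the paper; the `ShiftRegister` class and its 4-bit width are assumptions made for illustration only.

```python
class ShiftRegister:
    """Behavioral model of a shift register (illustrative, not a QCA design)."""

    def __init__(self, width):
        self.stages = [0] * width

    def clock(self, serial_in):
        """Shift every stage by one position on a clock tick.

        Returns the bit leaving the last stage, which is the SISO output.
        """
        serial_out = self.stages[-1]
        self.stages = [serial_in] + self.stages[:-1]
        return serial_out

    def parallel_out(self):
        """SIPO output: read all stages at once (most recent bit first)."""
        return list(self.stages)


reg = ShiftRegister(4)
# Shift in the serial word 1, 0, 1, 1.
first_outputs = [reg.clock(bit) for bit in [1, 0, 1, 1]]
sipo_word = reg.parallel_out()
# Four more ticks push the stored word out of the serial output.
delayed = [reg.clock(0) for _ in range(4)]
```

The same stages serve both modes: a SIPO register exposes all stages after the word has been shifted in, while a SISO register only exposes the last stage, so the input word reappears at the output delayed by the register width.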
This paper describes a parallel implementation of the FRiS-Tax algorithm for clustering text documents. The clustering algorithm is based on assessing the similarity between objects in a competitive situation, which leads to the concept of the function of competitive similarity (FRiS-function). Attributes of the bibliographic description of documents are selected as the scales for determining the similarity measures. Parallelization is performed at the step where a genetic algorithm tunes the coefficients in the similarity measure formula, as well as in the clustering step itself. The clustering algorithm is implemented on the high-performance MPJ Express platform. A quantitative evaluation of the execution time is performed, clearly demonstrating the advantages of the parallel implementation of the algorithm.
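The abstract does not spell out the FRiS-function. A minimal sketch of the form in which it is commonly stated is given below: the similarity of an object to its own representative is judged relative to the nearest competitor, F = (r_competitor - r_own) / (r_competitor + r_own). The function name, the one-dimensional points and the absolute-difference distance are assumptions for illustration; the paper's distance is built from bibliographic attributes and is not reproduced here.

```python
def fris(r_own, r_competitor):
    """Competitive similarity from two distances.

    r_own:        distance from the object to its own representative
    r_competitor: distance from the object to the nearest competitor
    The result lies in [-1, 1]; 1 means the object coincides with its own
    representative, 0 means it is equally far from both.
    """
    total = r_own + r_competitor
    if total == 0:
        return 0.0  # object coincides with both; treated as undecided
    return (r_competitor - r_own) / total


# Toy one-dimensional example: representative a = 1, competitor b = 5.
a, b = 1.0, 5.0
values = [fris(abs(z - a), abs(z - b)) for z in (1.0, 2.0, 3.0, 5.0)]
```

Because each evaluation depends only on two distances, the values for different objects can be computed independently, which is what makes the clustering step a natural candidate for parallel execution.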
This paper addresses the problem of increasing the efficiency of the algorithm for forming a disparity map from stereo images. A three-dimensional model of the scene is reconstructed using this disparity map. The most computationally complex stage of the algorithm is determining the relative shifts of points in the stereo images. In a previous paper we proposed an efficient algorithm for disparity map formation in which a pyramid of images was formed to improve efficiency and reliability. This paper is dedicated to studying the efficiency of the corresponding parallel algorithm implemented in the CUDA environment.
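The coarse-to-fine idea behind an image pyramid can be sketched on a single scanline. This is not the authors' algorithm or their CUDA code: the matching cost (sum of absolute differences), the window size, the two-level pyramid and all function names are assumptions chosen to keep the example short. A full search is done only on the half-resolution row, and the full-resolution search is restricted to one pixel around the doubled coarse estimate.

```python
def downsample(row):
    """Halve the resolution by averaging neighbouring pairs of pixels."""
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]


def sad(left, right, x, d, half):
    """Sum of absolute differences between the window around x in the
    left row and the window around x - d in the right row (edges clamped)."""
    cost = 0
    for k in range(-half, half + 1):
        xl = min(max(x + k, 0), len(left) - 1)
        xr = min(max(x + k - d, 0), len(right) - 1)
        cost += abs(left[xl] - right[xr])
    return cost


def match_row(left, right, candidates, half=1):
    """For every pixel pick the candidate disparity with the lowest cost."""
    return [
        min(candidates(x), key=lambda d: sad(left, right, x, d, half))
        for x in range(len(left))
    ]


def pyramid_disparity(left, right, max_d, levels):
    """Estimate disparities coarse-to-fine over `levels` pyramid steps."""
    if levels == 0:
        # Coarsest level: exhaustive search over 0..max_d.
        return match_row(left, right, lambda x: range(0, max_d + 1))
    coarse = pyramid_disparity(
        downsample(left), downsample(right), max_d // 2, levels - 1
    )

    def near(x):
        # Refine within +/- 1 pixel of the doubled coarse estimate.
        guess = 2 * coarse[min(x // 2, len(coarse) - 1)]
        return range(max(guess - 1, 0), guess + 2)

    return match_row(left, right, near)


# Synthetic scanlines: the left row is the right row shifted by 4 pixels.
right = [i * i for i in range(16)]
left = [right[max(x - 4, 0)] for x in range(16)]
disparity = pyramid_disparity(left, right, max_d=8, levels=1)
```

At full resolution each pixel tests three candidate shifts instead of nine, and the search for each pixel is independent of the others, which is the property a GPU implementation can exploit by assigning pixels to separate threads.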