ISBN: 9781479929870 (print)
GPUs are increasingly used as compute accelerators. With a large number of cores executing an even larger number of threads, significant speed-ups can be attained for parallel workloads. Applications that rely on atomic operations, such as histogram and Hough transform, suffer from serialization of threads when they update the same memory location. Previous work shows that reducing this serialization with software techniques can increase performance by an order of magnitude. We observe, however, that some serialization remains and still slows down these applications. Therefore, this paper proposes to use a hash function in the addressing of both the banks and the locks of the scratchpad memory. To measure the effects of these changes, we first implement a detailed model of atomic operations on scratchpad memory in GPGPU-Sim and verify its correctness. Second, we test our proposed hardware changes. They result in speed-ups of up to 4.9x and 1.8x over implementations that use the aforementioned software techniques for histogram and Hough transform applications, respectively, at minimal hardware cost.
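The effect of hashing the bank address can be illustrated with a small sketch. The abstract does not state which hash function is used, so the XOR fold below, the bank count of 32, and the names `linear_bank`, `hashed_bank`, and `max_conflicts` are all assumptions chosen for illustration only.

```python
NUM_BANKS = 32  # assumed bank count; the abstract does not give one


def linear_bank(word_addr: int) -> int:
    """Conventional mapping: the low-order address bits select the bank."""
    return word_addr % NUM_BANKS


def hashed_bank(word_addr: int) -> int:
    """Illustrative XOR hash (an assumption, not the paper's function):
    fold the next-higher address bits into the bank index."""
    return (word_addr ^ (word_addr // NUM_BANKS)) % NUM_BANKS


def max_conflicts(addresses, bank_of) -> int:
    """Largest number of distinct addresses that map to the same bank."""
    counts = {}
    for addr in set(addresses):
        bank = bank_of(addr)
        counts[bank] = counts.get(bank, 0) + 1
    return max(counts.values())


# 32 threads touching words with a stride of 32 words.
strided = [t * NUM_BANKS for t in range(32)]
worst_linear = max_conflicts(strided, linear_bank)
worst_hashed = max_conflicts(strided, hashed_bank)
print(worst_linear, worst_hashed)
```

Under the linear mapping every strided address falls into one bank, while the hashed mapping spreads them over all banks. Threads that update the very same address still serialize under either mapping; a hash can only help with distinct addresses that would otherwise share a bank or lock.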
The intricate properties and relevance of graph data make it difficult to collect graph statistics privately via differential privacy (DP). Traditional centralized or local DP on graph data faces challenges like third...
ISBN: 9781538693780 (print)
Data management requires computing devices that can process data to produce better information. As data volumes have grown, processing that could once be done on a single unit has come to require high-performance computing devices. Parallel computing is a technique for performing computations simultaneously by using several independent computers at once. Parallel computers can be grouped according to the level at which the hardware supports parallelism. This classification is broadly analogous to the distance between basic computing nodes. This research focuses on the widely used classification trends in parallelism that affect computing performance. The study uses a systematic literature review to identify classifications in parallel computing. The literature is drawn from reputable journal databases: ACM Digital Library, IEEE Xplore Digital Library, ScienceDirect, and Emerald Insight. The reviewed studies were mostly conducted in the United States and China, which contribute the most work. The classifications of parallelism most often used in parallel computing include distributed parallel, multi-core processor, massively parallel computing, and graphics processing unit (GPU). The study also illustrates the advantages of applying parallel computing according to its classification. In essence, these advantages are improved performance and greater effectiveness and efficiency of the process being carried out.
ISBN: 9780889867741 (print)
Multiple-disk architectures are an attractive approach to meet high-performance I/O demands in I/O-intensive applications such as search engines, web servers and information retrieval systems. This requires that the issues of dynamic load balancing and access parallelism be addressed, which is the goal of this paper. We address the problem of document declustering in a keyword-based information retrieval system for parallel architectures consisting of a single processor and multiple disks. In the vector-space information retrieval model, the inverse-document-frequency (idf) factor was found to improve query performance. We study whether this is the case for declustering as well. We propose and experimentally evaluate four similarity-based methods for declustering documents, viz., vector, euclidean, vector without idf, and euclidean without idf. Interestingly, our results show that the vector method significantly outperforms the vector method without idf, but the euclidean method is only slightly superior to the euclidean method without idf in one scenario and too close to call in another. The vector method is the best for the so-called simple plan. We also introduce a highest-frequency-first retrieval scenario, compare the methods under it, and find that they are too close to call in this case.
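The four similarity variants named above can be sketched as follows. The abstract does not define them, so this sketch assumes that "vector" means cosine similarity in the vector-space model, that "euclidean" means Euclidean distance between term vectors, and that idf is computed as log(N / df). The toy documents and all function names are hypothetical.

```python
import math


def idf_weights(docs):
    """Inverse document frequency per term, assumed here to be log(N / df)."""
    n = len(docs)
    df = {}
    for doc in docs:
        for term in doc:
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n / count) for term, count in df.items()}


def weighted(doc, idf=None):
    """Term-frequency vector, optionally scaled by idf."""
    return {t: tf * (idf[t] if idf else 1.0) for t, tf in doc.items()}


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def euclidean(u, v):
    """Euclidean distance between two sparse term vectors."""
    terms = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms))


# Hypothetical documents as term -> frequency maps.
docs = [
    {"disk": 2, "query": 1},
    {"disk": 1, "index": 3},
    {"query": 2, "index": 1},
]
idf = idf_weights(docs)
a, b = docs[0], docs[1]
scores = {
    "vector": cosine(weighted(a, idf), weighted(b, idf)),
    "vector without idf": cosine(weighted(a), weighted(b)),
    "euclidean": euclidean(weighted(a, idf), weighted(b, idf)),
    "euclidean without idf": euclidean(weighted(a), weighted(b)),
}
print(scores)
```

A declustering scheme would typically use such scores to place documents that are likely to be retrieved together on different disks, so that a single query can read from several disks in parallel. How the paper turns the scores into a disk assignment is not stated in the abstract.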
Distributed stream processing systems (DSPSs) have enabled us to build scalable, fast, and reactive applications. As faults are common in any distributed system, DSPSs allow for recovering from faults using various te...
080 M40 medium carbon steel has potential applications in axles, shafts, bolts, studs, spindles, automotive components and many more. In high-friction applications these steels possess better wear resistance, and after weld overlay they can exhibit superior properties. Additive manufacturing (AM) is an advanced method in which material in the form of powder or wire is used to form a desired shape. Laser Metal Deposition (LMD) is an AM method used in industrial applications to recreate and repair parts and to produce corrosion-resistant weld overlays on them. In comparison with conventional overlay techniques such as Plasma Transfer Arc (PTA), Metal Inert Gas (MIG) and Tungsten Inert Gas (TIG), laser weld overlays are faster and can produce thin sections with a good metallurgical bond, high productivity and flexibility. They also have the advantages of a small heat-affected zone and minimal metal dilution, and can produce thin layers with good aesthetics. In this paper, 080 M40 steel samples were used as the substrate and WC-NiCr powder was weld overlaid using the LMD method. Such weld overlays have potential applications in the field of mining. The weld overlay was deposited in a parallel layered fashion, and a set of parameters such as feed rate, dilution, bead size and processing speed was used to produce weld overlays of 1 mm thickness. The metallurgical, microstructural and mechanical properties of these weld overlay materials have been investigated. These properties are very important in the processing of mining tools and of other parts produced using the LMD process, as well as for other future applications. Further, the effect of laser processing parameters on the resulting properties of these steel weld overlays is discussed. (c) 2022 Elsevier Ltd. All rights reserved. Selection and peer-review under responsibility of the scientific committee of the 5th International Conference on Advances in Steel, Power and Construction Technology
ISBN: 9783030856656; 9783030856649 (print)
Load imbalances are a major reason for efficiency loss in highly parallel applications. Hence, their identification is of high relevance in performance analysis and tuning. We present a low-overhead approach to automatically identify load-imbalanced regions and filter out irrelevant ones based on new selection heuristics in our PIRA tool for automatic instrumentation refinement for the Score-P measurement system. For the LULESH mini-app as well as the Ice-sheet and Sea-level System Model simulation package, we thus correctly identify existing load imbalances while maintaining a runtime overhead of less than 10% for all but one input. Moreover, the traces generated are suitable for Scalasca's automatic trace analysis.
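As a rough illustration of what identifying a load-imbalanced region can mean, the sketch below applies a commonly used metric, the percentage by which the slowest process exceeds the mean time. This is not PIRA's selection heuristic, which the abstract does not describe; the 20% threshold, the region names, and the functions `imbalance` and `flag_regions` are assumptions.

```python
def imbalance(times):
    """Percent imbalance: how far the slowest process exceeds the mean time."""
    mean = sum(times) / len(times)
    return (max(times) / mean - 1.0) * 100.0 if mean > 0 else 0.0


def flag_regions(region_times, threshold_pct=20.0):
    """Names of regions whose imbalance exceeds an assumed threshold."""
    return sorted(
        name for name, times in region_times.items()
        if imbalance(times) > threshold_pct
    )


# Hypothetical per-process times (seconds) for two code regions.
measurements = {
    "compute_forces": [1.0, 1.0, 1.0, 3.0],
    "update_positions": [2.0, 2.1, 1.9, 2.0],
}
flagged = flag_regions(measurements)
print(flagged)
```

Here one process spends three times as long as the others in the first region, so only that region is flagged. A real tool would also weigh how much total runtime a region contributes before instrumenting it.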
ISBN: 9783319969831; 9783319969824 (print)
In recent years, heterogeneous hardware has become common in almost all supercomputer nodes, requiring a profound shift in the way numerical applications are implemented. This paper illustrates the design and implementation of a seismic wave propagation simulator, based on the finite-difference numerical scheme and specifically tailored for such massively parallel hardware infrastructures. The application data-flow is built on top of PaRSEC, a generic task-based runtime system. The numerical kernels, designed to maximize data reuse, can efficiently leverage the large SIMD units available in modern CPU cores. A strong scalability study on a cluster of Intel KNL processors illustrates the application's performance.
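The finite-difference idea behind such a simulator can be sketched in one dimension. This is a minimal second-order explicit scheme for the wave equation u_tt = c^2 u_xx with fixed boundaries, written in plain Python for clarity. It is not the paper's kernel, which the abstract does not describe, and it ignores PaRSEC tasking and SIMD; the function name `step` and all parameter values are assumptions.

```python
def step(u_prev, u_curr, c, dt, dx):
    """One explicit time step of the 1D wave equation u_tt = c^2 u_xx."""
    r2 = (c * dt / dx) ** 2  # squared Courant number; must be <= 1 for stability
    u_next = [0.0] * len(u_curr)  # end points stay at zero (fixed boundaries)
    for i in range(1, len(u_curr) - 1):
        # Three-point stencil approximating the second spatial derivative.
        laplacian = u_curr[i - 1] - 2.0 * u_curr[i] + u_curr[i + 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + r2 * laplacian
    return u_next


# A unit pulse in the middle of the grid, initially at rest.
n = 11
u0 = [0.0] * n
u0[n // 2] = 1.0
u1 = step(u0, u0, c=1.0, dt=0.5, dx=1.0)
print(u1)
```

Each output point depends only on a few neighbouring points from the two previous time levels. That locality is what lets stencil kernels be tiled for data reuse and split into tasks over subdomains.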
As the medical environment becomes increasingly electronic, clinical databases are continually growing, accruing masses of patient information. This wealth of data is an invaluable source of information to researchers, serving as a testbed for the development of new information technologies and as a repository of real-world data for data mining and population-based studies. However, the true utility of this information is not fulfilled, in part because of issues pertaining to security and patient confidentiality, but also due to the lack of an effective infrastructure to access the data. This paper describes a system, DataServer, that permits researchers to query and retrieve data from multiple clinical data sources, automatically deidentifying patient data so that it can be used for research purposes. DataServer functions as an application framework, enabling extensible markup language (XML)-based querying of existing medical databases. Key aspects of DataServer include ready inclusion of new information resources, minimal processing impact on existing clinical systems via a distributed cache, and flexible output representation via XSL (eXtensible Style Language) transforms.
This paper describes a precision single-board, 8-channel, 24-bit sigma-delta ADC with an on-line remote reprogramming mode. The results of experimental research on the noise immunity of the developed ADC in r...