Stream processing is a parallel paradigm used in many application domains. With the advance of graphics processing units (GPUs), their use in stream processing applications has increased as well. Efficient utilization of GPU accelerators in streaming scenarios requires batching input elements into microbatches, whose computation is offloaded to the GPU to exploit data parallelism within the same batch of data. Since data elements arrive continuously at the input rate, the larger the microbatch, the higher the latency to buffer it completely and to start processing on the device. Unfortunately, stream processing applications often have strict latency requirements, which makes it necessary to find the best microbatch size and to adapt it dynamically based on workload conditions as well as on the characteristics of the underlying device and network. In this work, we aim to implement latency-aware adaptive microbatching techniques and algorithms for streaming compression applications targeting GPUs. The evaluation is conducted using the Lempel-Ziv-Storer-Szymanski compression application with different input workloads. As a general result of our work, we observed that algorithms with elastic adaptation factors respond better to stable workloads, while algorithms with narrower targets respond better to highly unbalanced workloads.
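The abstract does not spell out the adaptation algorithms themselves; the following is a minimal, hypothetical sketch (C++/CUDA host-side logic) of a latency-aware controller that resizes the microbatch to track a latency budget, in the spirit of the elastic adaptation factors mentioned above. The struct, field names, and multiplicative adjustment are illustrative assumptions, not the paper's algorithms.

#include <cstddef>

// Hypothetical latency-aware microbatch controller (illustrative only).
// After each offloaded batch, compare the observed end-to-end latency
// (buffering + GPU processing) against the latency budget and resize
// the next microbatch accordingly.
struct BatchController {
    std::size_t batch_size;        // current microbatch size (elements)
    std::size_t min_size, max_size;
    double target_latency_ms;      // latency budget per microbatch
    double factor;                 // elastic adaptation factor, e.g. 1.25

    void adapt(double observed_latency_ms) {
        if (observed_latency_ms > target_latency_ms) {
            // Too slow: smaller batches buffer (and finish) sooner.
            batch_size = static_cast<std::size_t>(batch_size / factor);
        } else {
            // Headroom left: larger batches improve GPU utilization.
            batch_size = static_cast<std::size_t>(batch_size * factor);
        }
        if (batch_size < min_size) batch_size = min_size;
        if (batch_size > max_size) batch_size = max_size;
    }
};

A "narrower target" variant would only resize when the observed latency leaves a tight band around the budget, rather than after every batch.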
This paper aimed to implement both sequential and parallel (CUDA) versions of matrix multiplication to examine the differences and effects, followed by an analysis of the results. We used the algorithm as ...
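Since the abstract is truncated and the exact kernels are not given, the following shows the standard comparison it alludes to: a sequential triple-loop matrix multiplication next to a naive CUDA kernel in which one thread computes one output element. Square, row-major N x N matrices are assumed.

// Sequential reference: classic triple loop over row-major N x N matrices.
void matmul_seq(const float* A, const float* B, float* C, int N) {
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < N; ++k)
                sum += A[i * N + k] * B[k * N + j];
            C[i * N + j] = sum;
        }
}

// Parallel version: one CUDA thread computes one output element.
__global__ void matmul_kernel(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float sum = 0.0f;
        for (int k = 0; k < N; ++k)
            sum += A[row * N + k] * B[k * N + col];
        C[row * N + col] = sum;
    }
}

// Launch example (device buffers d_A, d_B, d_C already allocated and copied):
//   dim3 block(16, 16);
//   dim3 grid((N + 15) / 16, (N + 15) / 16);
//   matmul_kernel<<<grid, block>>>(d_A, d_B, d_C, N);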
Nowadays, parallel applications are used every day in high performance computing and scientific computing, as well as in everyday tasks, due to the pervasiveness of multi-core architectures. However, several implementation c...
The sustainability of service and manufacturing operations relies heavily on the availability of equipment and assets. High availability of assets can be achieved with effective maintenance strategies. In this direction, we study a multi-skilled workforce planning problem to establish a resilient maintenance service network for high-value assets. We improve the efficiency of the maintenance network by optimising the workforce capacity in repair shops and achieving workforce heterogeneity through cross-training. As a solution strategy, we develop a two-stage iterative heuristic algorithm. In the first stage, the set of all feasible cross-training policies is searched effectively and systematically via a state-of-the-art multi-thread simulated annealing (MTSA) metaheuristic to find a policy (or policies) that achieves the minimum cost. The developed MTSA algorithm is further enhanced with a multi-neighbourhood feature to escape from local optima and is implemented via parallel programming techniques. In the second stage, workforce capacity and spare parts inventory levels are optimised for the cross-training policy found in the first stage by a queuing approximation and a greedy heuristic. MTSA obtains the lowest cost in 91 out of 128 cases compared to genetic algorithm (GA), variable neighbourhood search (VNS), an improved single-thread simulated annealing (SA), and integer programming-based clustering (IPBC) algorithms.
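As an illustration of the first-stage idea only, here is a generic multi-thread simulated annealing skeleton in C++: each thread anneals its own copy of a cross-training policy (encoded here as a hypothetical 0/1 vector) and the globally best policy is kept under a mutex. The cost function is a placeholder; the paper's multi-neighbourhood moves, queuing approximation, and greedy second stage are not reproduced.

#include <atomic>
#include <cmath>
#include <mutex>
#include <random>
#include <thread>
#include <vector>

using Policy = std::vector<int>;   // hypothetical encoding: cross-train technician i or not

// Placeholder cost model; the real evaluation would use the queuing approximation.
double cost(const Policy& p) {
    double c = 0.0;
    for (int x : p) c += x ? 2.0 : 5.0;  // stub: training cost vs. expected downtime cost
    return c;
}

// Generic multi-thread simulated annealing skeleton (illustrative only).
void mtsa(Policy init, int n_threads, Policy& best, double& best_cost) {
    std::mutex m;
    best = init;
    best_cost = cost(init);
    auto worker = [&](unsigned seed) {
        std::mt19937 rng(seed);
        Policy cur = init;
        double cur_cost = cost(cur), T = 1000.0;
        while (T > 1e-3) {
            Policy cand = cur;                // neighbour move: flip one training decision
            std::uniform_int_distribution<std::size_t> pick(0, cand.size() - 1);
            cand[pick(rng)] ^= 1;
            double c = cost(cand);
            std::uniform_real_distribution<double> u(0.0, 1.0);
            if (c < cur_cost || u(rng) < std::exp((cur_cost - c) / T)) {
                cur = cand; cur_cost = c;
                std::lock_guard<std::mutex> lk(m);
                if (cur_cost < best_cost) { best = cur; best_cost = cur_cost; }
            }
            T *= 0.995;                       // geometric cooling schedule
        }
    };
    std::vector<std::thread> pool;
    for (int t = 0; t < n_threads; ++t) pool.emplace_back(worker, 1234u + t);
    for (auto& th : pool) th.join();
}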
Parallel computers are everywhere. Over the last few years, a paradigm shift occurred in the computer industry. Mainly due to power dissipation constraints and memory access time limitations, rather than increasin...
With the widespread use of multicore systems with smaller transistor sizes, soft errors have become an important issue for parallel program execution. Fault injection is a prevalent method to quantify the soft error rates of applications. However, performing detailed fault injection experiments is very time consuming. Therefore, prediction-based techniques have been proposed to evaluate soft error vulnerability in a faster way. In this work, we present a soft error vulnerability prediction approach for parallel applications using machine learning algorithms. We define a set of features including thread communication, data sharing, parallel programming, and performance characteristics, and train our models based on three ML algorithms. This study uses parallel programming features, as well as the combination of all features, for the first time in vulnerability prediction of parallel programs. We propose two models for soft error vulnerability prediction: (1) a regression model with rigorous feature selection analysis that estimates correct execution rates, and (2) a novel classification model that predicts the vulnerability level of the target programs. We obtain a maximum prediction accuracy of 73.2% for the regression-based model and achieve an F-score of 89% for our classification model.
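The abstract does not name the three ML algorithms or any trained parameters; the sketch below only illustrates the two prediction views with a hypothetical feature vector and an untrained linear scorer, thresholded into coarse vulnerability levels. All names, weights, and thresholds are placeholders, not the paper's models.

#include <string>
#include <vector>

// Hypothetical feature vector for one parallel program, following the
// feature groups named in the abstract (illustrative names and scales).
struct ProgramFeatures {
    double thread_comm;     // e.g., inter-thread communication intensity
    double data_sharing;    // e.g., fraction of shared-data accesses
    double parallel_api;    // e.g., synchronization-construct density
    double perf;            // e.g., instructions per cycle
};

// Regression view: estimate the correct-execution rate with a linear model.
// The weights would come from training (the paper uses three ML algorithms);
// here they are simply a caller-supplied placeholder vector of size 5.
double predict_correct_rate(const ProgramFeatures& f, const std::vector<double>& w) {
    double y = w[0] + w[1] * f.thread_comm + w[2] * f.data_sharing
             + w[3] * f.parallel_api + w[4] * f.perf;
    if (y < 0.0) y = 0.0;
    if (y > 1.0) y = 1.0;
    return y;  // estimated probability of correct execution under soft errors
}

// Classification view: bucket the estimate into coarse vulnerability levels.
std::string vulnerability_level(double correct_rate) {
    if (correct_rate > 0.9) return "low";
    if (correct_rate > 0.7) return "medium";
    return "high";
}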
The Consultative Committee for Space Data Systems (CCSDS)-123 is a standard for lossless compression of multispectral and hyperspectral images with applications in on-board power-constrained systems, such as satellites and military drones. This letter explores the low-power heterogeneous architecture of the Nvidia Jetson TX2 by proposing a parallel implementation of the CCSDS-123 compressor on embedded systems, reducing development effort compared with the production of dedicated circuits while maintaining low energy consumption. This solution parallelizes the predictor on a low-power graphics processing unit (GPU), while the encoders exploit the heterogeneous multiple CPU cores and the GPU concurrently. We report more than 16.6 Gb/s for the predictor and 1.4 Gb/s for the whole system, requiring less than 6.3 W and providing an efficiency of 245.6 Mb/s/W.
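The letter's actual predictor and encoders are not reproduced here; the following is a simplified sketch of the host-side orchestration pattern it describes, with a placeholder kernel standing in for the CCSDS-123 predictor on a CUDA stream while a CPU thread encodes the previously predicted block. Function names, buffer layout, and block structure are assumptions.

#include <cstddef>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Placeholder kernel standing in for the CCSDS-123 predictor (not the real one).
__global__ void predict_block(const unsigned short* in, unsigned short* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];   // real code would compute prediction residuals
}

// Stub CPU encoder; a real implementation would entropy-code the residuals.
void encode_block_cpu(const unsigned short* residuals, int n) { (void)residuals; (void)n; }

// Simplified heterogeneous pipeline: while the GPU predicts block b, a CPU
// thread encodes block b-1, overlapping prediction and encoding.
// d_in/d_out are preallocated device buffers; h_residuals are host buffers.
void compress(const std::vector<unsigned short*>& blocks, int block_len,
              unsigned short* d_in, unsigned short* d_out,
              std::vector<unsigned short*>& h_residuals) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        cudaMemcpyAsync(d_in, blocks[b], block_len * sizeof(unsigned short),
                        cudaMemcpyHostToDevice, stream);
        predict_block<<<(block_len + 255) / 256, 256, 0, stream>>>(d_in, d_out, block_len);
        cudaMemcpyAsync(h_residuals[b], d_out, block_len * sizeof(unsigned short),
                        cudaMemcpyDeviceToHost, stream);
        if (b > 0) {  // encode the previous block on the CPU while the GPU works
            std::thread enc(encode_block_cpu, h_residuals[b - 1], block_len);
            enc.join();  // a real pipeline would use a thread pool instead
        }
        cudaStreamSynchronize(stream);
    }
    encode_block_cpu(h_residuals.back(), block_len);
    cudaStreamDestroy(stream);
}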
The RSA algorithm is an asymmetric encryption algorithm used to ensure the confidentiality and integrity of data as it travels across networks. Security has grown in importance over time, resulting in more data requ...
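To ground the description of RSA, here is a textbook-scale worked example using square-and-multiply modular exponentiation with the classic toy parameters p = 61, q = 53 (so n = 3233, e = 17, d = 2753). Real deployments use 2048-bit or larger moduli and padding schemes; the numbers below are purely illustrative.

#include <cstdint>
#include <cstdio>

// Square-and-multiply modular exponentiation: computes (base^exp) mod mod.
std::uint64_t powmod(std::uint64_t base, std::uint64_t exp, std::uint64_t mod) {
    std::uint64_t result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1) result = (result * base) % mod;
        base = (base * base) % mod;
        exp >>= 1;
    }
    return result;
}

int main() {
    // Textbook-sized keys (p = 61, q = 53): n = 3233, e = 17, d = 2753.
    const std::uint64_t n = 3233, e = 17, d = 2753;
    std::uint64_t m = 65;                    // plaintext message (must be < n)
    std::uint64_t c = powmod(m, e, n);       // encrypt: c = m^e mod n
    std::uint64_t r = powmod(c, d, n);       // decrypt: r = c^d mod n
    std::printf("plaintext=%llu cipher=%llu decrypted=%llu\n",
                (unsigned long long)m, (unsigned long long)c, (unsigned long long)r);
    return 0;
}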
This paper describes the GR1 algorithm, which provides feasible execution times for the subgraph isomorphism problem. It is a parallel algorithm that uses a variant of the producer–consumer pattern. It was desig...
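GR1's actual search strategy is not given in the truncated abstract; the skeleton below only illustrates the producer–consumer pattern it mentions, with a shared work queue into which a producer would push candidate partial mappings and from which consumer threads would pop and extend them. The WorkQueue type and the item encoding are illustrative assumptions.

#include <condition_variable>
#include <mutex>
#include <queue>
#include <vector>

// Generic producer-consumer work queue (not GR1 itself).
// A work item here is a partial vertex mapping encoded as a vector of indices.
struct WorkQueue {
    std::queue<std::vector<int>> q;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    void push(std::vector<int> item) {            // producer side
        { std::lock_guard<std::mutex> lk(m); q.push(std::move(item)); }
        cv.notify_one();
    }
    bool pop(std::vector<int>& item) {            // consumer side
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return !q.empty() || done; });
        if (q.empty()) return false;              // finished: no more items
        item = std::move(q.front());
        q.pop();
        return true;
    }
    void finish() {                                // producer signals completion
        { std::lock_guard<std::mutex> lk(m); done = true; }
        cv.notify_all();
    }
};

// Usage sketch: consumer threads loop on `while (wq.pop(item)) { /* extend mapping */ }`
// while the producer enumerates candidates and calls wq.finish() when done.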
Since its first release in 2015, OpenTimer v1 has been used in many industrial and academic projects for analyzing the timing of custom designs. After four years of research and development, we have announced OpenTimer v2, a major release that efficiently supports: 1) a new task-based parallel incremental timing analysis engine to break through the performance bottleneck of existing loop-based methods; 2) a new application programming interface (API) concept to exploit high degrees of parallelism; and 3) enhanced support for industry-standard design formats to improve user experience. Compared with OpenTimer v1, we rearchitected v2 with a modern C++ programming language and advanced parallel computing techniques to largely improve the tool's performance and usability. As a particular example, OpenTimer v2 achieved up to a 5.33x speedup over v1 in incremental timing and scaled higher with increasing core counts. Our contributions include both technical innovations and engineering knowledge that are open and accessible to promote timing research in the community.
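OpenTimer v2's engine is not reproduced here; the sketch below only illustrates the task-based idea, assuming a Taskflow-style task-graph library (tf::Taskflow, tf::Executor) is available: each gate's timing update is a task that precedes the tasks of its fanout gates, so independent logic cones are updated in parallel by worker threads. The Gate struct and the update_timing stub are assumptions, not OpenTimer's data model.

#include <taskflow/taskflow.hpp>   // assumes the open-source Taskflow library
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

struct Gate { std::string name; std::vector<std::string> fanout; };

// Stub: a real update would propagate arrival/required times through the gate.
void update_timing(const Gate& g) { (void)g; }

// Minimal sketch of task-based timing propagation over a circuit graph.
void propagate(const std::vector<Gate>& gates) {
    tf::Taskflow taskflow;
    tf::Executor executor;
    std::unordered_map<std::string, tf::Task> tasks;
    for (std::size_t i = 0; i < gates.size(); ++i)
        tasks.emplace(gates[i].name,
                      taskflow.emplace([&gates, i] { update_timing(gates[i]); }));
    for (const Gate& g : gates)
        for (const std::string& out : g.fanout)
            tasks.at(g.name).precede(tasks.at(out));   // respect circuit topology
    executor.run(taskflow).wait();
}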