Parallel computers are everywhere. Over the last few years, a paradigm shift has occurred in the computer industry. Mainly due to power dissipation constraints and memory access time limitations, rather than increasin...
With the widespread use of multicore systems with smaller transistor sizes, soft errors have become an important issue for parallel program execution. Fault injection is a prevalent method to quantify the soft error rates of applications; however, it is very time consuming to perform detailed fault injection experiments. Therefore, prediction-based techniques have been proposed to evaluate soft error vulnerability in a faster way. In this work, we present a soft error vulnerability prediction approach for parallel applications using machine learning algorithms. We define a set of features covering thread communication, data sharing, parallel programming, and performance characteristics, and train our models based on three ML algorithms. This study uses the parallel programming features, as well as the combination of all features, for the first time in vulnerability prediction of parallel programs. We propose two models for soft error vulnerability prediction: (1) a regression model with rigorous feature selection analysis that estimates correct execution rates, and (2) a novel classification model that predicts the vulnerability level of the target programs. We achieve a maximum prediction accuracy of 73.2% for the regression-based model and an 89% F-score for our classification model.
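As a rough illustration of the feature-based approach only (not the paper's actual features, ML algorithms, or data), a regression model for correct execution rates and a classifier for vulnerability levels can be sketched as follows; the synthetic feature matrix and the random-forest estimators are stand-ins.

    # Sketch only: synthetic features and labels; the paper's feature set,
    # ML algorithms, and data are different.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
    from sklearn.metrics import f1_score

    rng = np.random.default_rng(0)
    X = rng.random((200, 4))                       # 200 synthetic programs, 4 made-up features
    correct_rate = X @ [0.4, 0.3, 0.2, 0.1]        # synthetic correct-execution rate in [0, 1]
    vuln_level = (correct_rate < 0.5).astype(int)  # synthetic binary vulnerability label

    X_tr, X_te, c_tr, c_te, y_tr, y_te = train_test_split(
        X, correct_rate, vuln_level, random_state=0)

    reg = RandomForestRegressor(random_state=0).fit(X_tr, c_tr)   # (1) regression model
    clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)  # (2) classification model
    print("predicted rates:", reg.predict(X_te)[:3])
    print("F-score:", f1_score(y_te, clf.predict(X_te)))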
The Consultative Committee for Space Data Systems (CCSDS)-123 is a standard for lossless compression of multispectral and hyperspectral images with applications in on-board power-constrained systems, such as satellites and military drones. This letter explores the low-power heterogeneous architecture of the Nvidia Jetson TX2 by proposing a parallel implementation of the CCSDS-123 compressor on embedded systems, reducing development effort compared with the production of dedicated circuits while maintaining low energy consumption. This solution parallelizes the predictor on a low-power graphics processing unit (GPU), while the encoders exploit the heterogeneous multiple cores of the CPUs and the GPU concurrently. We report more than 16.6 Gb/s for the predictor and 1.4 Gb/s for the whole system, requiring less than 6.3 W and providing an efficiency of 245.6 Mb/s/W.
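The predictor/encoder split can be pictured as a two-stage pipeline over blocks of the image cube. The sketch below is only a conceptual illustration of that structure: predict_block and encode_block are placeholders rather than the actual CCSDS-123 prediction and entropy-coding steps, and in the letter the prediction stage runs as a GPU kernel rather than a host thread.

    # Conceptual pipeline only: placeholder stages, not the CCSDS-123 algorithms.
    import queue, threading
    import numpy as np

    def predict_block(block):        # stand-in for the (GPU) prediction kernel
        return block - np.roll(block, 1, axis=-1)

    def encode_block(residuals):     # stand-in for a CPU/GPU entropy encoder
        return residuals.astype(np.int16).tobytes()

    blocks = [np.random.randint(0, 4096, (16, 32, 32)) for _ in range(8)]
    q, out = queue.Queue(), []

    def predictor_stage():           # producer: pushes residual blocks
        for b in blocks:
            q.put(predict_block(b))
        q.put(None)                  # sentinel: no more blocks

    def encoder_stage():             # consumer: encodes blocks as they arrive
        while (r := q.get()) is not None:
            out.append(encode_block(r))

    t1 = threading.Thread(target=predictor_stage)
    t2 = threading.Thread(target=encoder_stage)
    t1.start(); t2.start(); t1.join(); t2.join()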
The RSA algorithm is an asymmetric encryption algorithm used to ensure the confidentiality and integrity of data as it travels across networks. Security has grown in importance over time, resulting in more data requ...
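For context only, a textbook RSA round with toy parameters is sketched below; the tiny primes and message are purely illustrative and unrelated to the paper's implementation, and real deployments use 2048-bit or larger moduli, padding schemes such as OAEP, and vetted crypto libraries.

    # Textbook RSA with deliberately tiny numbers (illustration only).
    p, q = 61, 53
    n = p * q                  # 3233, public modulus
    phi = (p - 1) * (q - 1)    # 3120
    e = 17                     # public exponent, coprime with phi
    d = pow(e, -1, phi)        # 2753, private exponent (modular inverse, Python 3.8+)

    m = 65                     # plaintext, must be < n
    c = pow(m, e, n)           # encryption:  c = m^e mod n
    assert pow(c, d, n) == m   # decryption:  m = c^d mod n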
This paper describes the GR1 algorithm, which provides feasible execution times for the subgraph isomorphism problem. It is a parallel algorithm that uses a variant of the producer–consumer pattern. It was desig...
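As an illustration of the general pattern only (not GR1's actual variant, candidate generation, or pruning), a producer-consumer skeleton for checking candidate vertex mappings might look as follows; the brute-force enumeration and the edge check are placeholders.

    # Generic producer-consumer skeleton; the enumeration and check below are
    # placeholders, not GR1's algorithm.
    import itertools, queue, threading

    edges_g = {(0, 1), (1, 2)}                          # small pattern graph (a path)
    edges_h = {("a", "b"), ("b", "c"), ("c", "d")}      # host graph (a longer path)
    nodes_h = sorted({x for e in edges_h for x in e})
    q, results, n_workers = queue.Queue(), [], 4

    def producer():                                     # enumerate candidate mappings
        for perm in itertools.permutations(nodes_h, 3):
            q.put(dict(zip([0, 1, 2], perm)))
        for _ in range(n_workers):
            q.put(None)                                 # one sentinel per consumer

    def consumer():                                     # verify that every edge is preserved
        while (m := q.get()) is not None:
            if all((m[u], m[v]) in edges_h or (m[v], m[u]) in edges_h for u, v in edges_g):
                results.append(m)

    threads = [threading.Thread(target=producer)] + \
              [threading.Thread(target=consumer) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(results), "embeddings found")             # 4 for this toy pair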
Since the first release in 2015, OpenTimer v1 has been used in many industrial and academic projects for analyzing the timing of custom designs. After four years of research and development, we have announced OpenTimer v2, a major release that efficiently supports: 1) a new task-based parallel incremental timing analysis engine to break through the performance bottleneck of existing loop-based methods; 2) a new application programming interface (API) concept to exploit high degrees of parallelism; and 3) enhanced support for industry-standard design formats to improve the user experience. Compared with OpenTimer v1, we rearchitected v2 in modern C++ with advanced parallel computing techniques to largely improve the tool's performance and usability. As one example, OpenTimer v2 achieved up to a 5.33x speedup over v1 in incremental timing and scaled higher with increasing cores. Our contributions include both technical innovations and engineering knowledge that are open and accessible to promote timing research in the community.
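To make the contrast with loop-based propagation concrete, a heavily simplified sketch of level-by-level parallel arrival-time updates on a timing DAG is shown below; OpenTimer v2 itself is a C++ engine built around a task-dependency graph, and the graph, delays, and thread pool here are made up for illustration.

    # Illustration only: levelized parallel arrival-time propagation on a toy DAG.
    from collections import defaultdict
    from concurrent.futures import ThreadPoolExecutor

    arcs = [("in", "u1", 1.0), ("in", "u2", 2.0), ("u1", "out", 3.0), ("u2", "out", 1.5)]
    fanin = defaultdict(list)
    for src, dst, delay in arcs:
        fanin[dst].append((src, delay))

    levels = [["in"], ["u1", "u2"], ["out"]]    # topological levels of the DAG
    arrival = {"in": 0.0}

    def update(node):                           # arrival time = max over fanin arcs
        arrival[node] = max(arrival[s] + d for s, d in fanin[node])

    with ThreadPoolExecutor() as pool:
        for level in levels[1:]:                # nodes within a level are independent
            list(pool.map(update, level))

    print(arrival["out"])                       # 4.0 via in -> u1 -> out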
Graphics Processing Units (GPUs) have evolved from very specialized designs geared towards computer graphics to accommodate general-purpose, highly parallel workloads. Harnessing the performance that these accelerators provide requires the use of specialized native programming interfaces, such as CUDA or OpenCL, or higher-level programming models like OpenMP or OpenACC. However, in managed programming languages, offloading execution onto GPUs is much harder and more error-prone, mainly due to the need to call through a native API (Application Programming Interface) and because of mismatches between value and reference semantics. The Fancier framework provides a unified interface to Java, C/C++, and OpenCL C compute kernels, together with facilities to smooth the transitions between these programming languages. This combination of features makes GPU acceleration in Java much more approachable. In addition, Fancier Java code can be translated directly into equivalent C/C++ or OpenCL C code, which simplifies the implementation of higher-level abstractions targeting GPU or parallel execution in Java. Furthermore, it reduces the programming effort without adding significant overhead on top of the necessary OpenCL and Java Native Interface (JNI) API calls. We validate our approach on several image processing workloads running on different Android devices.
In high-performance computing, picking the right number of threads to gain a good speedup is important, as many OS-level parameters are influenced by even slight adjustments in thread count. These parameters are requi...
Sparse matrix computations are at the heart of many scientific applications and data analytics codes. The performance and memory usage of these codes depend heavily on their use of specialized sparse matrix data structures that only store the nonzero entries. However, such compaction is done using index arrays that result in indirect array accesses such as A[B[i]], where A and B are both arrays. Numerical libraries can provide high-performance code for an individual sparse kernel; however, they must be manually tuned and optimized for different inputs and architectures. Alternatively, compilers are used to optimize codes while providing architecture portability. Due to these indirect array accesses, memory access information is unknown at compile time, and thus it is challenging to vectorize a sparse matrix method or run it on parallel cores. To automate the generation of code for efficient execution of sparse code, several compile-time and runtime techniques are required. Existing techniques are either not efficient or need manual effort to extend to different sparse matrix computations. Consequently, in this dissertation, I address the problem of automating the optimization of sparse matrix code on parallel processors, with a specific focus on sparse linear solvers and numerical optimization. This dissertation presents a set of code transformations and algorithms, all implemented in a novel code generator called Sympiler, that automates the optimization of sparse matrix codes on parallel processors. Sympiler takes a sparse method, arising from a sparse linear system or sparse numerical optimization, and decouples information related to the computation pattern of the method, i.e., symbolic information, and uses this information to transform the code into vectorizable and parallel code. Sympiler also enables the reuse of symbolic information when the computation pattern remains static for a period of time in the simulations or when it changes only modestly. Evaluation result
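The indirect-access problem described above is easy to see in a sparse lower-triangular solve L x = b in compressed sparse column (CSC) form, where the update x[rowidx[p]] is exactly the A[B[i]] pattern that defeats compile-time analysis. The tiny matrix below is made up, and Sympiler's actual symbolic transformations (e.g., extracting level sets from the nonzero structure for parallel execution) are not shown.

    # Column-oriented sparse triangular solve in CSC form (illustrative matrix).
    import numpy as np

    # L = [[2, 0, 0],
    #      [1, 3, 0],
    #      [4, 0, 5]]  stored with the diagonal first in each column
    colptr = np.array([0, 3, 4, 5])
    rowidx = np.array([0, 1, 2, 1, 2])
    val    = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
    b      = np.array([2.0, 4.0, 14.0])

    x = b.copy()
    n = len(colptr) - 1
    for j in range(n):                       # forward substitution, column by column
        x[j] /= val[colptr[j]]               # divide by the diagonal entry
        for p in range(colptr[j] + 1, colptr[j + 1]):
            x[rowidx[p]] -= val[p] * x[j]    # indirect update: the A[B[i]] pattern

    print(x)                                 # [1. 1. 2.]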
One obstacle to application development on multi-FPGA systems with high-level synthesis (HLS) is a lack of support for a programming interface. Implementing and debugging an application on multiple FPGA boards is diff...