This paper describes the GR1 algorithm, which provides feasible execution times for the subgraph isomorphism problem. It is a parallel algorithm that uses a variant of the producer–consumer pattern. It was desig...
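As a hedged illustration of the producer–consumer pattern the abstract mentions, the sketch below shows a generic work queue with one producer and several consumer threads in standard C++. The payload (an integer standing in for, say, a partial candidate mapping) and all names are hypothetical and are not taken from GR1.

```cpp
// Minimal producer-consumer work queue sketch (illustrative only).
#include <atomic>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <vector>

struct WorkQueue {
  std::queue<int> items;            // placeholder work items
  std::mutex m;
  std::condition_variable cv;
  bool done = false;

  void push(int x) {
    { std::lock_guard<std::mutex> lk(m); items.push(x); }
    cv.notify_one();
  }
  void close() {                    // producer signals "no more work"
    { std::lock_guard<std::mutex> lk(m); done = true; }
    cv.notify_all();
  }
  std::optional<int> pop() {        // blocks until work or shutdown
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [&] { return !items.empty() || done; });
    if (items.empty()) return std::nullopt;   // closed and drained
    int x = items.front(); items.pop();
    return x;
  }
};

int main() {
  WorkQueue q;
  std::atomic<int> processed{0};

  // Consumers: pull items and "process" them.
  std::vector<std::thread> consumers;
  for (int c = 0; c < 4; ++c)
    consumers.emplace_back([&] {
      while (auto item = q.pop()) ++processed;   // placeholder work
    });

  // Producer: enumerates work items.
  for (int i = 0; i < 1000; ++i) q.push(i);
  q.close();

  for (auto& t : consumers) t.join();
  std::printf("processed %d items\n", processed.load());
}
```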
Since its first release in 2015, OpenTimer v1 has been used in many industrial and academic projects for analyzing the timing of custom designs. After four years of research and development, we announce OpenTimer v2, a major release that efficiently supports: 1) a new task-based parallel incremental timing analysis engine to break through the performance bottleneck of existing loop-based methods; 2) a new application programming interface (API) concept to exploit high degrees of parallelism; and 3) enhanced support for industry-standard design formats to improve user experience. Compared with OpenTimer v1, we rearchitected v2 with modern C++ and advanced parallel computing techniques to largely improve the tool's performance and usability. As a particular example, OpenTimer v2 achieved up to a 5.33x speedup over v1 in incremental timing and scaled higher with increasing core counts. Our contributions include both technical innovations and engineering knowledge that are open and accessible to promote timing research in the community.
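To make the task-based idea concrete without reproducing OpenTimer's actual engine or API, here is a small sketch of the general pattern: each gate's arrival-time computation becomes a task that waits only on its true fan-in dependencies, so independent parts of the graph propagate in parallel. The `Gate` struct, the DAG, and all names are hypothetical, and plain `std::async` stands in for a real task scheduler.

```cpp
// Illustrative task-dependency sketch, not OpenTimer v2 code.
#include <algorithm>
#include <cstdio>
#include <future>
#include <vector>

struct Gate {
  double delay;              // gate delay
  std::vector<int> fanin;    // indices of predecessor gates
};

int main() {
  // A 4-gate DAG: gates 0 and 1 feed gate 2, which feeds gate 3.
  std::vector<Gate> g = { {1.0, {}}, {2.0, {}}, {1.5, {0, 1}}, {0.5, {2}} };

  // One shared future per gate holding its arrival time.
  std::vector<std::shared_future<double>> arrival(g.size());

  // Gates are assumed topologically ordered, so every fan-in future
  // already exists when a gate's task is created.
  for (std::size_t i = 0; i < g.size(); ++i) {
    arrival[i] = std::async(std::launch::async, [i, &g, &arrival] {
      double at = 0.0;
      for (int p : g[i].fanin)              // wait only on dependencies
        at = std::max(at, arrival[p].get());
      return at + g[i].delay;               // propagate arrival time
    }).share();
  }

  std::printf("arrival at output gate: %g\n", arrival.back().get());
}
```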
In high-performance computing, picking the right number of threads to gain a good speedup is important, as many OS-level parameters are influenced by even slight adjustments in thread count. These parameters are requi...
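As a hedged baseline for the thread-count question raised above, the snippet below clamps a requested thread count to the hardware concurrency reported by the standard library; the function name is made up, and real tuning (as the abstract notes) would also have to account for OS-level parameters and the workload itself.

```cpp
// Simple, illustrative thread-count selection baseline.
#include <algorithm>
#include <cstdio>
#include <thread>

unsigned choose_threads(unsigned requested) {
  unsigned hw = std::thread::hardware_concurrency();  // may return 0
  if (hw == 0) hw = 1;                                // fall back safely
  return std::clamp(requested, 1u, hw);
}

int main() {
  std::printf("using %u threads\n", choose_threads(64));
}
```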
Sparse matrix computations are at the heart of many scientific applications and data analytics codes. The performance and memory usage of these codes depend heavily on their use of specialized sparse matrix data structures that only store the nonzero entries. However, such compaction is done using index arrays that result in indirect array accesses such as A[B[i]], where A and B are both arrays. Numerical libraries can provide high-performance code for an individual sparse kernel; however, they must be manually tuned and optimized for different inputs and architectures. Alternatively, compilers can be used to optimize code while providing architecture portability. Because of these indirect array accesses, memory access information is unknown at compile time, and thus it is challenging to vectorize a sparse matrix method or run it on parallel cores. To automate the generation of code for efficient execution of sparse code, several compile-time and runtime techniques are required. Existing techniques are either not efficient or need manual effort to extend to different sparse matrix computations. Consequently, in this dissertation, I address the problem of automating the optimization of sparse matrix code on parallel processors, with a specific focus on sparse linear solvers and numerical optimization. This dissertation presents a set of code transformations and algorithms, all implemented in a novel code generator called Sympiler, that automates the optimization of sparse matrix codes on parallel processors. Sympiler takes a sparse method, arising from a sparse linear system or sparse numerical optimization, decouples information related to the computation pattern of the method, i.e., symbolic information, and uses this information to transform the code into vectorizable and parallel code. Sympiler also enables the reuse of symbolic information when the computation pattern remains static for a period of time in the simulations or when it changes modestly. Evaluation result
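The indirect accesses the abstract refers to are easiest to see in a textbook compressed sparse row (CSR) matrix-vector product, sketched below; this is generic illustrative code, not output of Sympiler.

```cpp
// Generic CSR SpMV showing the A[B[i]]-style indirect access.
#include <cstdio>
#include <vector>

// y = A * x with A stored in compressed sparse row (CSR) form.
void spmv_csr(const std::vector<int>& rowptr,
              const std::vector<int>& col,
              const std::vector<double>& val,
              const std::vector<double>& x,
              std::vector<double>& y) {
  for (std::size_t i = 0; i + 1 < rowptr.size(); ++i) {
    double sum = 0.0;
    for (int j = rowptr[i]; j < rowptr[i + 1]; ++j)
      sum += val[j] * x[col[j]];   // indirect access: x[col[j]] ~ A[B[i]]
    y[i] = sum;
  }
}

int main() {
  // 2x2 matrix [[2, 0], [3, 4]] in CSR form.
  std::vector<int> rowptr = {0, 1, 3};
  std::vector<int> col    = {0, 0, 1};
  std::vector<double> val = {2, 3, 4};
  std::vector<double> x   = {1, 1}, y(2, 0.0);
  spmv_csr(rowptr, col, val, x, y);
  std::printf("y = [%g, %g]\n", y[0], y[1]);   // expected [2, 7]
}
```

Because `col[j]` is only known at run time, the compiler cannot prove which elements of `x` each iteration touches, which is exactly what makes vectorization and parallelization of such kernels hard without the symbolic information Sympiler extracts.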
One obstacle to application development on multi-FPGA systems with high-level synthesis (HLS) is a lack of support for a programming interface. Implementing and debugging an application on multiple FPGA boards is diff...
In this article, we propose a parallel computing method of 3-D finite-element analysis coupled with circuit equations for the characteristic calculation of rotating machines. In the proposed method, the preconditioning part of the matrix solver is parallelized along with the other parts in order to obtain a stable solution within a short computational time. The proposed method is applied to the loss calculation of an interior permanent magnet synchronous motor fed by an inverter to clarify its advantages.
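The article does not spell out its preconditioner here, so as a hedged, simplified illustration of why the preconditioning step is worth parallelizing at all, the sketch below applies a Jacobi (diagonal) preconditioner with the C++17 parallel algorithms: every entry of z = M^-1 r is independent, so the application parallelizes trivially. All names are illustrative.

```cpp
// Hedged sketch: parallel application of a Jacobi (diagonal) preconditioner.
#include <algorithm>
#include <cstdio>
#include <execution>
#include <vector>

// z = M^{-1} r with M = diag(d); each entry is independent.
void apply_jacobi(const std::vector<double>& d,
                  const std::vector<double>& r,
                  std::vector<double>& z) {
  std::transform(std::execution::par,
                 d.begin(), d.end(), r.begin(), z.begin(),
                 [](double di, double ri) { return ri / di; });
}

int main() {
  std::vector<double> d = {2.0, 4.0, 5.0};
  std::vector<double> r = {2.0, 8.0, 10.0};
  std::vector<double> z(3);
  apply_jacobi(d, r, z);
  std::printf("z = [%g, %g, %g]\n", z[0], z[1], z[2]);   // [1, 2, 2]
}
```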
We present updates to the Cray Graph Engine, a high-performance in-memory semantic graph database, which enable performant execution across multiple architectures as well as deployment in a container to support cloud and as-a-service graph analytics. This paper discusses the changes required to port and optimize CGE to target multiple architectures, including Cray Shasta systems, large shared-memory machines such as SuperDome Flex (SDF), and cluster environments such as Apollo systems. The porting effort focused primarily on removing dependencies on XPMEM and Cray PGAS and replacing them with a simplified PGAS library based upon POSIX shared memory and one-sided MPI, while preserving the existing Coarray-C++ CGE code base. We also discuss the containerization of CGE using Singularity and the techniques required to make container performance match native execution. We present early benchmarking results for running CGE on the SDF, InfiniBand clusters, and Slingshot interconnect-based Shasta systems.
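For readers unfamiliar with the one-sided MPI style such a PGAS layer builds on, here is a generic MPI-3 RMA example: every rank exposes a window of memory and remote ranks write into it with MPI_Put. This is standard MPI, not CGE's actual PGAS library.

```cpp
// Generic one-sided MPI (RMA) example, illustrative of the PGAS style.
#include <cstdio>
#include <mpi.h>

int main(int argc, char** argv) {
  MPI_Init(&argc, &argv);
  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Each rank exposes one integer through an RMA window.
  int* base = nullptr;
  MPI_Win win;
  MPI_Win_allocate(sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &base, &win);
  *base = -1;

  // Every rank writes its own id into the window of the next rank.
  MPI_Win_fence(0, win);
  int target = (rank + 1) % size;
  MPI_Put(&rank, 1, MPI_INT, target, 0, 1, MPI_INT, win);
  MPI_Win_fence(0, win);

  std::printf("rank %d received %d\n", rank, *base);

  MPI_Win_free(&win);
  MPI_Finalize();
}
```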
ISBN (print): 9789897585883
With the growing amount of data, computational power has become highly required in all fields. To satisfy these requirements, the use of GPUs seems to be the appropriate solution. However, one of their major setbacks is their varying architectures, which makes writing efficient parallel code very challenging due to the necessity of mastering the GPU's low-level design. CUDA offers more flexibility for the programmer to exploit the GPU's power with ease. However, tuning the launch parameters of its kernels, such as the block size, remains a daunting task. This parameter requires a deep understanding of the architecture and the execution model to be well tuned. In particular, in the Viola-Jones algorithm, the block size is an important factor that improves the execution time, but this optimization aspect is not well explored. This paper aims to offer the first steps toward automatically tuning the block size for any input without requiring deep knowledge of the hardware architecture, which ensures automatic portability of the performance across different GPU architectures. The main idea is to define techniques for obtaining the optimum block size to achieve the best performance. We point out the impact of using a static block size for all input sizes on the overall performance. In light of the findings, we present two dynamic approaches to select the block size best suited to the input size. The first one is based on an empirical search; this approach provides the optimal performance; however, it is tough for the programmer, and its deployment is time-consuming. To overcome this issue, we propose a second approach, a model that automatically selects a block size. Experimental results show that this model can improve the execution time by up to 2.5x over the static approach.
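The selection model itself is not reproduced here; as a hedged starting point, the host-side sketch below picks a block size from the device properties and the input size instead of hard-coding one value for every input. The heuristic (warp-multiple blocks, capped by maxThreadsPerBlock, shrunk for small inputs so several SMs still get work) is a generic rule of thumb, not the paper's model, and `pick_block_size` is a made-up name.

```cpp
// Host-side heuristic for choosing a CUDA block size (illustrative only).
#include <algorithm>
#include <cstdio>
#include <cuda_runtime.h>

int pick_block_size(int n_work_items) {
  cudaDeviceProp prop{};
  if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
    return 256;                                   // safe generic fallback

  int warp = prop.warpSize;                       // typically 32
  int cap  = prop.maxThreadsPerBlock;             // typically 1024

  // Spread small inputs over the SMs; round up to a warp multiple.
  int want = std::max(warp, n_work_items / (2 * prop.multiProcessorCount));
  want = ((want + warp - 1) / warp) * warp;       // warp-aligned
  return std::min(want, cap);
}

int main() {
  for (int n : {1024, 100000, 10000000})
    std::printf("n = %8d -> block size %d\n", n, pick_block_size(n));
}
```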
Fully distributed intelligent building systems can be used to effectively reduce the complexity of building automation systems and improve the efficiency of operation and maintenance management because of their self-organization, flexibility, and robustness. However, the parallel computing mode, dynamic network topology, and complex node interaction logic make application development complex, time-consuming, and challenging. To address the development difficulties of fully distributed intelligent building system applications, this paper proposes a user-friendly programming language called SwarmL. Concretely, SwarmL (1) establishes a language model, an overall framework, and an abstract syntax that intuitively describe the static physical objects and dynamic execution mechanisms of a fully distributed intelligent building system, (2) proposes a physical field-oriented variable that adapts the programming model to distributed architectures by employing a serial programming style in accordance with human thinking to program parallel applications of fully distributed intelligent building systems, thereby reducing programming difficulty, (3) designs a computational scope-based communication mechanism that separates the computational logic from the node interaction logic, thus adapting to dynamically changing network topologies and supporting the generalized development of fully distributed intelligent building system applications, and (4) implements an integrated development tool that supports program editing and object code generation. To validate SwarmL, an example application from a real scenario and a subject-based experiment are explored. The results demonstrate that SwarmL can effectively reduce the programming difficulty and improve the development efficiency of fully distributed intelligent building system applications. SwarmL enables building users to quickly understand and master the development methods of application tasks in fully distributed intelligent
Effective and safe parallel programming is among the biggest challenges of today's software technology. The C++17 standard introduced the parallel STL: a set of overloaded functions taking an additional 'executio...
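For reference, here is a minimal example of the parallel algorithms the abstract refers to: the familiar std::sort and std::reduce calls with an extra std::execution::par policy argument. The data and sizes are arbitrary, and with GCC/libstdc++ the build typically needs to link against TBB.

```cpp
// Minimal C++17 parallel STL example: std::sort / std::reduce with
// an execution policy as the first argument.
#include <algorithm>
#include <cstdio>
#include <execution>
#include <numeric>
#include <random>
#include <vector>

int main() {
  std::vector<double> v(1 << 20);
  std::mt19937 gen(42);
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  for (auto& x : v) x = dist(gen);

  // Parallel sort: same call as the sequential overload, plus the policy.
  std::sort(std::execution::par, v.begin(), v.end());

  // Parallel reduction over the sorted data.
  double sum = std::reduce(std::execution::par, v.begin(), v.end(), 0.0);
  std::printf("sorted %zu values, sum = %f\n", v.size(), sum);
}
```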