Search results - Inner Mongolia University Library

28th International Conference on Software Engineering and Knowledge Engineering, SEKE 2016

Authors: Song, Chenxi; Wang, Tao; Yin, Gang; Zhang, Xunhui; Yang, Cheng. Affiliations: National Laboratory for Parallel and Distributed Processing, College of Computer, National University of Defense Technology, Changsha, China

ISBN: (print) 189170639X

With the rapid development of open source software (OSS), various elements such as OSS, developers, users and online posts across different communities, together with their interactions, constitute a novel software ecosystem. Most current research on software ecosystems focuses on the connections between software; little of it considers the relationships across communities from an overall perspective, and it fails to cover users and their activities, which should be an indispensable part of the OSS ecosystem. This paper models the OSS ecosystem as a graph that combines different types of OSS communities as a whole. Based on this graph model, we analyze characteristics of the ecosystem such as evolution, competition and symbiosis. In addition, we build a recommendation system, and the experimental results suggest the validity of our approach. Copyright © 2016 by KSI Research Inc. and Knowledge Systems Institute Graduate School.

Keywords: Ecosystems


A case study of SWIM: Optimization of memory intensive application on GPGPU


International Symposium on Parallel Architectures, Algorithms, and Programming

Authors: Yi, Wei; Tang, Yuhua; Wang, Guibin; Fang, Xudong. Affiliations: National Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha, China

ISBN: (print) 9780769543123

Recently, GPGPU has been widely adopted in the High Performance Computing (HPC) field. The limited global memory bandwidth poses a great challenge to many GPGPU programmers trying to exploit parallelism within the CPU-GPU heterogeneous platform. In this paper, we choose SWIM, a typical memory-intensive application from the SPEC OMP 2001 benchmark suite, as a case study. We attempt to optimize the performance and energy consumption of the application using different memory access mechanisms, and present optimization methods including matrix transposition and kernel fusion. The experimental results on the Intel Core(TM) i920 CPU plus GeForce GTX 295 platform show that the proposed optimization methods achieve a speedup of 8.7X over the original OpenMP program and reduce the energy consumption by 83% for a problem size of 2048*2048. © 2010 IEEE.

Keywords: Energy utilization


Property-oriented testing of real-time systems


Proceedings - 11th Asia-Pacific Software Engineering Conference, APSEC 2004

Authors: Li, Shuhao; Wang, Ji; Dong, Wei; Qi, Zhi-Chang. Affiliations: National Laboratory for Parallel and Distributed Processing, Changsha, China; State Key Laboratory for Software Engineering, Wuhan University, China

ISBN: (print) 0769522459

Although Statecharts has gained widespread use as a formalism for modeling reactive real-time systems, testing these systems still confronts some difficulties, of which a major one is the existence of numerous and complex system behaviors. It is extremely difficult to conduct comprehensive and in-depth testing of such real-time systems. This paper presents an approach to property-oriented real-time testing. Necessary real-time extensions are proposed such that the time-enriched Statecharts can describe non-trivial timing constraints. The properties to be tested are characterized by a restricted real-time logic. Then the targeted test sequences are derived from the real-time models according to the user-specified properties. Using this approach, testing efforts can be focused on particular properties of the real-time systems and usually only a small portion of the total behaviors needs to be tested. © 2004 IEEE.

Keywords: Real time systems


Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations


Frontiers of Information Technology & Electronic Engineering, 2015, Vol. 16, No. 11, pp. 899-916

Authors: Mei WEN; Da-fei HUANG; Chang-qing XUN; Dong CHEN. Affiliations: School of Computer, National University of Defense Technology; National Key Laboratory of Parallel and Distributed Processing

OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus has been extensively used. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. Typically, the use of OpenCL's local memory on multi-core/many-core CPUs may degrade rather than improve performance, because local-memory arrays no longer match the hardware well and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from GPU-specific kernels, which can thus be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that transforms GPU-specific OpenCL kernels into a CPU-friendly form, accompanied by a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel's many-integrated-core coprocessor. The resultant performance on both architectures is better than or comparable with the corresponding OpenMP performance.

Keywords: OpenCL; Performance portability; Multi-core/many-core CPU; Analysis-based transformation


Accelerated Selective Algebraic Multigrid Method for Fully-Coupled Incompressible Flow Solver


International Journal of Computational Fluid Dynamics, 2025

Authors: Liang, Yuechao; Guo, Xiao-Wei; Zhang, Qingyang; Li, Chao; Liu, Jie. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, China; Laboratory of Digitizing Software for Frontier Equipment, National University of Defense Technology, Changsha, China; College of Computer Science and Technology, National University of Defense Technology, Changsha, China

The fully coupled pressure-based algorithm is widely recognised for its superior convergence and robustness in solving incompressible flow problems. However, the increased scale of equations and the difficulty in solving linear systems have limited the widespread use of this algorithm in large-scale simulations. This paper presents an optimised block selective algebraic multigrid method significantly reducing computational complexity. Our approach employs a parallel modified independent set algorithm, allowing each process to perform matrix coarsening individually. Furthermore, an aggressive coarsening strategy is introduced to reduce complexity and enable the solution of larger-scale problems. Numerical experiments demonstrate that the solution time is shortened by 14% to 49% compared to the latest existing methods and outperforms the segregated algorithm. By addressing the computational challenges associated with the selective algebraic multigrid solver, this work unleashes the superior convergence properties of the fully coupled method. © 2025 Informa UK Limited, trading as Taylor & Francis Group.
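
The coarsening step mentioned in this abstract can be illustrated with a minimal sketch. It is a serial greedy independent-set selection, not the parallel modified algorithm of the paper, whose details the abstract does not give. The function name, the adjacency format, and the "most neighbours first" ordering are all assumptions made for illustration.

```python
def independent_set_coarsening(adjacency):
    """Split matrix-graph nodes into coarse and fine points.

    adjacency maps each node to the set of nodes it is strongly
    connected to (hypothetical input format, assumed symmetric).
    Returns (coarse, fine), where no two coarse nodes are adjacent.
    """
    coarse, fine = set(), set()
    # Assumed heuristic: visit nodes with the most neighbours first,
    # breaking ties by node id so the result is deterministic.
    order = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    for node in order:
        if node in coarse or node in fine:
            continue  # already classified by an earlier choice
        coarse.add(node)
        # Every unclassified neighbour of a coarse point becomes fine.
        fine.update(n for n in adjacency[node] if n not in coarse)
    return coarse, fine


# Path graph 0-1-2-3-4 as a tiny example.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
coarse_points, fine_points = independent_set_coarsening(path)
```

In a multigrid setting the coarse set would define the next, smaller level; a more aggressive strategy would select fewer coarse points per level.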

Keywords: Convergence of numerical methods


Large-scale parallel exact diagonalization algorithm of the Hubbard model on Tianhe-2 supercomputer


6th International Conference on High Performance Compilation, Computing and Communications, HP3C 2022

Authors: Li, Biao; Liu, Jie. Affiliations: Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, Hunan, China

ISBN: (print) 9781450396295

We propose a parallel exact diagonalization method for solving the large-scale Hubbard model. The core of this algorithm is the parallelization of the Lanczos algorithm, for which we propose a hierarchical communication model and a fast strategy for finding the nonzero elements of a large-scale matrix, starting only from the symmetry of the Hamiltonian matrix. Our parallel algorithm was tested on the Tianhe-2 supercomputer, where the strong scaling efficiency reached 53% for 30,000 cores on a 140-billion-dimensional matrix, and the weak scaling efficiency remained above 40% for 60,000 cores on a 730-billion-dimensional matrix. © 2022 ACM.
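
For readers unfamiliar with the Lanczos algorithm named in this abstract, the following is a minimal serial sketch in pure Python. It is not the paper's parallel implementation. The helper `symmetric_matvec` and its upper-triangle storage are assumptions chosen to show one way symmetry can halve the stored nonzeros.

```python
import math


def symmetric_matvec(upper, n):
    """Matvec for a symmetric n x n matrix stored as {(i, j): value}, i <= j."""
    def matvec(v):
        out = [0.0] * n
        for (i, j), a in upper.items():
            out[i] += a * v[j]
            if i != j:  # mirror the entry instead of storing it twice
                out[j] += a * v[i]
        return out
    return matvec


def lanczos(matvec, v0, steps):
    """Run up to `steps` Lanczos iterations without reorthogonalization.

    Returns (alphas, betas): the diagonal and off-diagonal of the
    tridiagonal matrix whose eigenvalues approximate those of the operator.
    """
    norm = math.sqrt(sum(x * x for x in v0))
    v = [x / norm for x in v0]
    v_prev = [0.0] * len(v0)
    beta = 0.0
    alphas, betas = [], []
    for k in range(steps):
        w = matvec(v)
        alpha = sum(wi * vi for wi, vi in zip(w, v))
        alphas.append(alpha)
        # Three-term recurrence: remove components along v and v_prev.
        w = [wi - alpha * vi - beta * pi for wi, vi, pi in zip(w, v, v_prev)]
        beta = math.sqrt(sum(x * x for x in w))
        if k == steps - 1 or beta < 1e-12:
            break  # last step, or an invariant subspace was found
        betas.append(beta)
        v_prev, v = v, [x / beta for x in w]
    return alphas, betas


# 2 x 2 example matrix [[2, 1], [1, 2]], stored as its upper triangle.
upper = {(0, 0): 2.0, (0, 1): 1.0, (1, 1): 2.0}
alphas, betas = lanczos(symmetric_matvec(upper, 2), [1.0, 0.0], 2)
```

In a distributed setting the matrix-vector product is the step that requires communication, which is presumably where a hierarchical communication model would apply.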

Keywords: parallel algorithms


Survey of DHT topology construction techniques in virtual computing environments


Science China (Information Sciences), 2011, Vol. 54, No. 11, pp. 2221-2235

Authors: ZHANG YiMing (1, 2); LU XiCheng (1, 2); LI DongSheng (1, 2). Affiliations: 1. National Laboratory for Parallel and Distributed Processing (PDL), Changsha 410073, China; 2. School of Computer, National University of Defense Technology, Changsha 410073, China

The Internet-based virtual computing environment (iVCE) is a novel network computing *** characteristics of growth, autonomy, and diversity of Internet resources present great challenges to resource sharing in *** DHT overlay (DHT for short) technique has various advantages such as high scalability, low latency, and desirable availability, and is thus an important approach to realizing efficient resource *** construction is a key technique for structured overlays that realizes basic overlay functions including dynamic maintenance and message *** this paper, we first introduce the traditional techniques of DHT topology construction, focusing mainly on dynamic maintenance and message routing of typical DHTs, DHT indexing techniques for complex queries, and DHT grouping techniques for matching domain *** then present recent advances in DHT topology construction techniques in iVCE taking advantage of the characteristics of Internet ***, we discuss the future of DHT topology construction techniques.

Keywords: virtual computing environments; DHT overlays; topology construction; distributed indexing; flexible routing


Optimizing adaptive synchronization in parallel simulators for large-scale parallel systems and applications


10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, 10th IEEE Int. Conf. Scalable Computing and Communications, ScalCom-2010

Authors: Xu, Chuanfu; Che, Yonggang; Fang, Jianbin; Wang, Zhenghua. Affiliations: National Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha, China

ISBN: (print) 9780769541082

This paper addresses the optimization of parallel simulators for large-scale parallel systems and applications. Such simulators are often based on parallel discrete event simulation with conservative or optimistic protocols to synchronize the simulating processes. The paper considers how available future information about events and application behaviors can be efficiently extracted and further exploited to improve the performance of adaptive optimistic protocols. First, we extract information about future events and their dependencies in application traces to guide adaptive adjustments of time window in trace-driven parallel simulation. Second, we use information about application behaviors, specifically the iterative behavior found in many applications, to avoid the unnecessary adjustments of time window. These techniques are implemented in the BigSim simulator and tested by real-world and standard benchmark applications including Jacobi3D and HPL. The results show that our optimization approaches can reduce the execution times of simulation ranging from 11% up to 32%. Moreover, our methods are easy to implement and don't need to augment compilers or even modify the core codes of parallel simulators. © 2010 IEEE.

Keywords: Simulators


Automatic generation of run-time test oracles for distributed real-time systems


24th IFIP WG 6.1 International Conference on Formal Techniques for Networked and Distributed Systems, FORTE 2004

Authors: Wang, Xin; Wang, Ji; Qi, Zhi-Chang. Affiliations: National Laboratory for Parallel and Distributed Processing, 300 Lichen Rd, Changsha 410073, China

ISBN: (print) 3540232524

Distributed real-time systems are one important type of real-time system. They are usually characterized by both reactive and real-time factors, and it has long been recognized that automatically checking the correctness of such systems at run time is still an unaddressed problem. As one of the main solutions, a test oracle is a method used to check whether the system under test has behaved correctly on a particular execution. The test oracle is not only an indispensable stage of software testing, but also the weak link of software testing research. In this paper, real-time specifications are adopted to describe the properties of distributed real-time systems, and a real-time specification-based method for automatically generating run-time test oracles is proposed. Based on the tableau construction theory of real-time model checking, the proposed method automatically generates timed automata as test oracles, which can automatically check the correctness of system behaviors against real-time specifications written in MITL[0,d]. © IFIP International Federation for Information Processing 2004.

Keywords: Software testing


A prediction-based parallel replication algorithm in distributed storage system


4th International Conference on Grid and Cooperative Computing - GCC 2005

Authors: Wang, Yijie; Zhang, Xiaoming. Affiliations: National Laboratory for Parallel and Distributed Processing, Institute of Computer, National University of Defense Technology, Changsha 410073, China

ISBN: (print) 3540305106

Data replication can be used to reduce bandwidth consumption and access latency in distributed systems where users require remote access to large data objects. In this paper, according to the intrinsic characteristics of distributed storage systems, the prediction-based parallel replication algorithm PPR is proposed. In PPR, according to the characteristics of spatial data, the data that will be accessed is predicted and then prefetched; during replication, according to the network state, the several replicas of a data object that have the least access cost are selected; different parts of the data object are transferred from these replicas and used to make a new replica. The results of performance evaluation show that PPR can utilize the network bandwidth efficiently, provide high data replication efficiency and substantially better access efficiency, and efficiently avoid interference between different replications. © Springer-Verlag Berlin Heidelberg 2005.
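
The replica-selection and parallel-transfer step described in this abstract can be sketched as follows. This is an illustration only, not the PPR algorithm itself: the function name, the cost dictionary, and the choice to size each part in inverse proportion to access cost are assumptions, since the abstract does not say how the parts are divided.

```python
def plan_parallel_transfer(object_size, replica_costs, k):
    """Choose the k cheapest replicas and assign each a byte range.

    replica_costs maps a replica name to a positive access cost
    (hypothetical units). Returns a list of (replica, start, end)
    tuples with half-open ranges that together cover the object.
    """
    # Select the k replicas with the least access cost.
    chosen = sorted(replica_costs, key=replica_costs.get)[:k]
    # Assumption: cheaper replicas serve proportionally larger parts.
    weights = [1.0 / replica_costs[r] for r in chosen]
    total = sum(weights)
    plan, start = [], 0
    for index, (replica, weight) in enumerate(zip(chosen, weights)):
        if index == len(chosen) - 1:
            end = object_size  # last part absorbs rounding leftovers
        else:
            end = start + int(object_size * weight / total)
        plan.append((replica, start, end))
        start = end
    return plan


# Three existing replicas; build a new one from the two cheapest.
plan = plan_parallel_transfer(100, {"a": 1.0, "b": 1.0, "c": 5.0}, 2)
```

Each tuple in the plan could then be fetched concurrently and the parts concatenated to form the new replica.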

Keywords: distributed computer systems

