Search Results - Inner Mongolia University Library

International Conference on Intelligent Human-Machine Systems and Cybernetics

Authors: Yao, Lu; Cao, Wei; Li, Zongzhe; Wang, Yongxian; Wang, Zhenghua (National Key Lab. for Parallel and Distributed Processing, National Univ. of Defense Technology, Changsha, China)

ISBN: (print) 9780769541518

The independent set ordering algorithm is a heuristic based on finding maximal independent sets of vertices in the adjacency graph of a matrix, and it is commonly used for parallel matrix factorization. However, disadvantages appear when it is applied to large-scale sparse linear systems. In this paper, we propose an improved algorithm that finds an independent set of optimal size in each elimination step rather than a maximal independent set. Both theoretical analysis and a parallel implementation show the improvement to be effective. © 2010 IEEE.

Keywords: Lower-upper decomposition
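The independent-set idea in the abstract above can be illustrated with a minimal sketch. It assumes a greedy, lowest-degree-first selection (a common heuristic, not necessarily the authors' method), and the `limit` parameter is a hypothetical stand-in for capping the set size in an elimination step; the abstract does not say how the optimal size is chosen.

```python
def greedy_independent_set(adjacency, limit=None):
    """Pick mutually non-adjacent vertices, stopping at `limit` if given."""
    chosen = []
    blocked = set()
    # Visit low-degree vertices first (assumed heuristic; ties broken by id).
    for v in sorted(adjacency, key=lambda u: (len(adjacency[u]), u)):
        if v in blocked:
            continue
        chosen.append(v)
        # A chosen vertex excludes itself and all of its neighbours.
        blocked.add(v)
        blocked.update(adjacency[v])
        if limit is not None and len(chosen) >= limit:
            break
    return chosen

# Adjacency graph of a tridiagonal 5x5 matrix: the path 0-1-2-3-4.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
maximal = greedy_independent_set(path)          # no vertex can be added
capped = greedy_independent_set(path, limit=2)  # deliberately smaller set
```

Vertices in the returned set share no edges, so the corresponding matrix rows can be eliminated in parallel within one step.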


Reduction transformations for optimization parameter selection


8th International Conference on High-Performance Computing in Asia-Pacific Region, HPC Asia 2005

Authors: Yonggang, Che; Zhenghua, Wang; Xiaomei, Li (National Lab. for Parallel and Distributed Processing, Changsha 410073, China)

ISBN: (print) 0769524869

Program performance optimization often involves choosing the right parameters to minimize the program's runtime. Selecting optimization parameters by means of execution-driven search is guaranteed to find excellent results, because it accurately accounts for all performance components of the target platform. The major drawback of the execution-driven approach is the excessive compilation time caused by thousands of runs of the original program. In this article, we propose a novel technique called program reduction transformations to reduce the cost of execution-driven optimization parameter selection. It is based on our observations of the characteristics of scientific applications and of the optimization parameter selection task. The idea is to transform the program before it is used in the execution-driven parameter selection procedure. The transformed program runs in a much shorter time but preserves the parameter selection quality. This technique greatly reduces the time spent evaluating each candidate parameter and makes execution-driven optimization parameter selection affordable. We formulate the theoretical foundation of program reduction transformations and identify several situations where they can be legally applied. These situations are common in scientific applications. Experiments on two math kernels and three SPEC benchmarks show that our approach is both feasible and effective. © 2005 IEEE.

Keywords: Parameter estimation
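The execution-driven search described in the abstract above can be sketched roughly as follows. All names here (`select_parameter`, `blocked_sum`, `reduced_kernel`) are hypothetical, and the "reduction" shown, which simply shrinks the iteration count, is only an assumed stand-in for the paper's transformations. Whether a reduced program still ranks candidates the same way depends on the kernel.

```python
import time

def time_once(run, param):
    """Wall-clock seconds for one execution of run(param)."""
    start = time.perf_counter()
    run(param)
    return time.perf_counter() - start

def select_parameter(run, candidates, measure=time_once):
    """Execution-driven search: measure every candidate, keep the cheapest."""
    costs = {p: measure(run, p) for p in candidates}
    best = min(costs, key=costs.get)
    return best, costs[best]

def blocked_sum(block, n=1 << 16):
    """Hypothetical kernel: sum 0..n-1 in chunks of `block` elements."""
    total = 0
    for start in range(0, n, block):
        total += sum(range(start, min(start + block, n)))
    return total

def reduced_kernel(block):
    """Hypothetical reduced program: same loop structure, far fewer iterations."""
    return blocked_sum(block, n=1 << 10)

# Search over candidate block sizes using the cheaper, reduced program.
best_block, seconds = select_parameter(reduced_kernel, [16, 64, 256])
```

Passing a custom `measure` function makes the search deterministic for testing, or lets it read hardware counters instead of wall-clock time.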


Two improved GPU acceleration strategies for force-directed graph layout


International Conference on Computer Application and System Modeling

Authors: Wang, Yong-Xian; Li, Zong-Zhe; Yao, Lu; Cao, Wei; Wang, Zheng-Hua (National Key Laboratory for Parallel and Distributed Processing, National University of Defense Technology, Changsha, China)

ISBN: (print) 9781424472369

The force-directed approach is one of the most widely used methods in graph drawing research. However, its running time increases intolerably as the graph grows, which restricts the algorithm's practicality. With the aid of a GPU (graphics processing unit) computing platform, graph layout can be sped up at low cost, but existing GPU implementations mainly employ a "one-by-one" style to update the vertex coordinates in each iteration, which has a lower convergence rate than the "batch" style commonly used in traditional CPU implementations. As a result, the aesthetics of the graph layout suffer if the total running time is restricted. It is hard to achieve both a high speedup factor of GPU over CPU and a high convergence rate in existing GPU implementations. To partially solve this problem, this paper presents two new strategies for implementing large-scale graph layout on a CPU+GPU heterogeneous platform to accelerate force-directed layout for graph drawing. The numerical results show that our GPU implementation can dramatically improve the performance of force-directed layout: for graphs of moderate size (typically 1000 vertices), it is 20 times faster on an NVIDIA GeForce 9800 GT GPU at 1.44 GHz than on a single CPU core of an Intel Pentium 4 PC at 3.0 GHz. © 2010 IEEE.

Keywords: Graphics processing unit
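For orientation, one iteration of a force-directed layout might look like the sketch below. It assumes Fruchterman-Reingold-style forces (repulsion k²/d between every pair, attraction d²/k along edges); the abstract does not name the force model, and the function names, the ideal distance `k`, and the step size `h` are illustrative assumptions. Every force is computed from the old positions before any vertex moves, and the all-pairs loop is the quadratic cost that motivates GPU acceleration.

```python
import math

def net_force(i, pos, edges, k=1.0):
    """Force on vertex i under the assumed Fruchterman-Reingold-style model."""
    fx = fy = 0.0
    xi, yi = pos[i]
    for j, (xj, yj) in enumerate(pos):
        if j == i:
            continue
        dx, dy = xi - xj, yi - yj
        d = math.hypot(dx, dy) or 1e-9      # guard against coincident vertices
        f = k * k / d                       # repulsion between every pair
        if (i, j) in edges or (j, i) in edges:
            f -= d * d / k                  # attraction along an edge
        fx += f * dx / d
        fy += f * dy / d
    return fx, fy

def layout_step(pos, edges, h=0.05):
    """Compute all forces from the old positions, then move every vertex."""
    forces = [net_force(i, pos, edges) for i in range(len(pos))]
    return [(x + h * fx, y + h * fy) for (x, y), (fx, fy) in zip(pos, forces)]

# Two connected vertices placed too far apart are pulled together.
stretched = layout_step([(0.0, 0.0), (2.0, 0.0)], {(0, 1)})
```

With this force model, two connected vertices are at rest when their distance equals `k`.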


A Compact Model for Multi-Island Single Electron Transistors


3rd IEEE International NanoElectronics Conference (INEC)/Symposium on Nanoscience and Nanotechnology in China

Authors: Chi, Yaqing; Zhong, Haiqin; Zhang, Chao; Fang, Liang (National Key Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Hunan 410073, China)

ISBN: (print) 9781424435449

The multi-island single electron transistor (MISET) is an important kind of single electron transistor that makes controllable room-temperature operation convenient to realize. A novel semi-empirical compact model for the multi-island single electron transistor is proposed. The new approach combines the orthodox theory of single electron tunneling through a single Coulomb island with a novel empirical analysis procedure for the chain of multiple Coulomb islands to solve for the current of the whole multi-island single electron transistor. The tunneling rates are calculated based on the orthodox theory of single electron tunneling. The tunneling currents representing the first split peaks in the Coulomb oscillation curves are calculated under the assumption that the currents through all the Coulomb islands are equal to each other in the stable states, while the currents representing the other split peaks are constructed and merged together according to the empirical analysis. The model is verified against the traditional SET simulator SIMON and is much faster than SIMON. Therefore, the novel compact model is suitable for large-scale MISET circuit simulation.

Keywords: Single Electron Transistor; Coulomb Island; Coulomb Oscillation; Splitted Peak


Prediction of the Cyanobacteria Coverage in Time-series Images based on Convolutional Neural Network


4th International Conference on Control and Computer Vision, ICCCV 2021

Authors: Ye, Xiangyu; Lai, Zhiquan; Li, Dongsheng (National Key Laboratory of Parallel and Distributed Processing, Computer College, National University of Defense Technology, China)

ISBN: (print) 9781450390477

In recent years, the problem of lake eutrophication has become increasingly severe, and monitoring and controlling cyanobacteria in lakes is of great significance. The information obtained by existing monitoring methods lags behind events, so a sudden outbreak of cyanobacteria cannot be detected in time. Obtaining cyanobacteria information directly from camera images is a breakthrough. In this paper, after analyzing the characteristics of time-series cyanobacteria images, we propose a block prediction scheme based on a CNN model. Experiments show that this method can quickly calculate the coverage of cyanobacteria in a monitoring image. It can also effectively distinguish cyanobacteria-rich water areas, which significantly facilitates water quality monitoring and cyanobacteria management. By analyzing multi-day time-series images, we can chart the changes in cyanobacteria coverage. The chart supports short-term water quality analysis and a better response to cyanobacteria outbreaks. © 2021 ACM.

Keywords: Lakes


Detailed and clock-driven simulation for HPC interconnection network


Frontiers of Computer Science, 2016, Vol. 10, No. 5, pp. 797-811

Authors: Wenhao ZHOU; Juan CHEN; Chen CUI; Qian WANG; Dezun DONG; Yuhua TANG (State Key Laboratory of High Performance Computing, School of Computer, National University of Defense Technology, Changsha 410073, China; Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073, China)

The performance and energy consumption of high performance computing (HPC) interconnection networks are of great significance to the whole supercomputer, and building an HPC interconnection network simulation platform is very important for research on HPC software and hardware technologies. To effectively evaluate the performance and energy consumption of HPC interconnection networks, this article designs and implements a detailed, clock-driven HPC interconnection network simulation platform called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of a detailed and flexible cycle-accurate network simulator. In addition, it offers a large set of configurable network parameters for topology and routing, and supports router on/off states. We compare the simulated execution time with the real execution time of a Tianhe-2 subsystem, and the mean error is only 2.7%. We also simulate network behavior with different network structures and low-power modes. The results are consistent with the theoretical analyses.

Keywords: high performance computing; clock-driven simulation; interconnection network; BookSim


HyperSpring: Accurate and stable latency estimation in the hyperbolic space


15th International Conference on Parallel and Distributed Systems, ICPADS '09

Authors: Fu, Yongquan; Wang, Yijie (National Key Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, China)

ISBN: (print) 9780769539003

Predicting network latencies between Internet hosts can efficiently support large-scale Internet applications, e.g., file sharing services and overlay construction. Several studies use hyperbolic space to model the Internet's dense-core and many-tendril structure. However, existing hyperbolic-space embedding approaches are not designed for accurate latency estimation in a distributed context. We present HyperSpring, which estimates latency by modelling a mass-spring system in hyperbolic space, similar to Vivaldi. HyperSpring adopts coordinate initialization to speed up the convergence of coordinate computation, uses multiple-round symmetric updates to escape from bad local minima, and stabilizes coordinates by compensating RTT measurements to reduce coordinate drift. Evaluation results based on a network trace of 226 PlanetLab nodes indicate that, compared to Euclidean-space Vivaldi, HyperSpring improves performance for most nodes and incurs slightly higher distortion for a small number of nodes. © 2009 IEEE.

Keywords: Hyperbolic space; Latency estimation; Mass spring field
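As background for the mass-spring idea, the sketch below shows the basic Euclidean Vivaldi update with a constant step, which is the baseline the abstract compares against. It is not HyperSpring's hyperbolic-space update, which the abstract does not spell out; the function name and the step size `delta` are illustrative assumptions.

```python
import math

def vivaldi_update(xi, xj, rtt, delta=0.25):
    """Move xi along the spring to xj so their distance approaches rtt."""
    predicted = math.dist(xi, xj)
    if predicted == 0.0:
        # Coincident coordinates: push apart along an arbitrary axis.
        direction = [1.0] + [0.0] * (len(xi) - 1)
    else:
        # Unit vector pointing from xj towards xi.
        direction = [(a - b) / predicted for a, b in zip(xi, xj)]
    error = rtt - predicted          # > 0: too close, < 0: too far apart
    return [a + delta * error * u for a, u in zip(xi, direction)]

# Predicted distance is 5 but the measured RTT is 10, so xi moves away.
moved = vivaldi_update([0.0, 0.0], [3.0, 4.0], rtt=10.0)
```

Each call closes a fraction `delta` of the gap between predicted and measured latency, so repeated measurements against many neighbours gradually relax the spring system.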


iRank: Supporting proximity ranking for peer-to-peer applications


15th International Conference on Parallel and Distributed Systems, ICPADS '09

Authors: Fu, Yongquan; Wang, Yijie (National Key Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, China)

ISBN: (print) 9780769539003

Proximity ranking according to end-to-end network distances (e.g., Round-Trip Time, RTT) can reveal detailed proximity information, which is important for network management and performance diagnosis in distributed systems. However, to the best of our knowledge, there has been no similar work on this subject in the P2P computing field. We present iRank, a distributed rating method that enables proximity rankings by providing discrete ratings in a distributed manner. It formulates proximity ranking as a rating problem that faithfully captures proximity from noisy distance measurements in a scalable and practical way. The primary challenge in inferring proximity rankings is enforcing distributed ratings under complex rating policies. Our solution reconstructs ratings by decomposing a centralized rating method, Maximum Margin Matrix Factorization (MMMF), into independent sub-problems that can be solved efficiently in a decentralized manner. By relaxing the dependence on infrastructure nodes, which are a single point of failure and limit scalability, iRank can gracefully handle network churn. Using real network latency data sets, we demonstrate that iRank can predict ratings with low distortion, less than 20 percent worse than the centralized method, in the context of synthetic complex rating policies. © 2009 IEEE.

Keywords: Complex networks


Improving performance portability for GPU-specific OpenCL kernels on multi-core/many-core CPUs by analysis-based transformations


Frontiers of Information Technology & Electronic Engineering, 2015, Vol. 16, No. 11, pp. 899-916

Authors: Mei WEN; Da-fei HUANG; Chang-qing XUN; Dong CHEN (School of Computer, National University of Defense Technology; National Key Laboratory of Parallel and Distributed Processing)

OpenCL is an open heterogeneous programming framework. Although OpenCL programs are functionally portable, they do not provide performance portability, so code transformation often plays an irreplaceable role. When adapting GPU-specific OpenCL kernels to run on multi-core/many-core CPUs, coarsening the thread granularity is necessary and thus has been extensively used. However, locality concerns exposed in GPU-specific OpenCL code are usually inherited without analysis, which may have side effects on CPU performance. Typically, the use of OpenCL's local memory on multi-core/many-core CPUs may have the opposite effect on performance, because local-memory arrays no longer match the hardware well and the associated synchronizations are costly. To solve this dilemma, we actively analyze the memory access patterns using array-access descriptors derived from GPU-specific kernels, which can thus be adapted for CPUs by (1) removing all the unwanted local-memory arrays together with the obsolete barrier statements and (2) optimizing the coalesced kernel code with vectorization and locality re-exploitation. Moreover, we have developed an automated tool chain that performs this transformation of GPU-specific OpenCL kernels into a CPU-friendly form, accompanied by a scheduler that forms a new OpenCL runtime. Experiments show that the automated transformation can improve OpenCL kernel performance on a multi-core CPU by an average factor of 3.24. Satisfactory performance improvements are also achieved on Intel's many-integrated-core coprocessor. The resulting performance on both architectures is better than or comparable with the corresponding OpenMP performance.

Keywords: OpenCL; Performance portability; Multi-core/many-core CPU; Analysis-based transformation


Anadem: A hybrid overlay network for content-based data distribution


15th International Conference on Parallel and Distributed Systems, ICPADS '09

Authors: Zheng, Zhong; Wang, Yi-Jie (National Key Laboratory for Parallel and Distributed Processing, School of Computer, National University of Defense Technology, Changsha, China)

ISBN: (print) 9780769539003

As an infrastructure for data distribution, overlay networks have to feature efficient routing and adequate robustness to achieve fast and accurate data distribution in environments with node churn. Considering that existing overlay networks mostly focus on a single optimization objective and fail to ensure routing efficiency and robustness simultaneously, this paper proposes Anadem, a hybrid overlay network for content-based data distribution. Anadem achieves a better compromise between routing efficiency and robustness by combining multiple structured inter-cluster topologies with unstructured intra-cluster topologies. Anadem also provides mechanisms for dynamic concurrent cluster creation, cluster departure, and load balancing to make data distribution more adaptive to a dynamic network environment. Experimental results reveal that, compared with existing overlay networks, Anadem can support fast and accurate content-based data distribution even when a large number of nodes in the system fail. © 2009 IEEE.

Keywords: Overlay networks
