Search results - Inner Mongolia University Library

15th International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2015

Authors: Chen, Jiahao; Deng, Yuhui; Huang, Zhan (Department of Computer Science, Jinan University, Guangzhou 510632, China; State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)

ISBN: (print) 9783319271217

Hot data is very important for optimizing modern computer systems. For example, the identified hot data can be employed to extend the lifespan of flash memory. However, it is very challenging to effectively identify hot data with low memory consumption and low runtime overhead. This paper proposes a Hot Data Catcher (HDCat) which can effectively identify hot data in large-scale I/O streams by leveraging enhanced temporal locality. HDCat only maintains a hot data queue and a candidate hot data queue to record the data access pattern by tracking a limited data set, thus effectively reducing memory consumption. Furthermore, HDCat adopts a D-bit counter and a recency bit to leverage both the frequency and recency contained in the data stream. Additionally, HDCat can significantly reduce the conversion between hot data and cold data. Real traces are used to evaluate the proposed approach. Experimental results demonstrate that HDCat significantly outperforms the state-of-the-art Multi-hash algorithm and the two-level LRU algorithm. © Springer International Publishing Switzerland 2015.

Keywords: Hash functions
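
The HDCat abstract above names four ingredients: a hot data queue, a candidate queue, a D-bit counter, and a recency bit. The following is a minimal sketch of how such pieces could fit together. It is not the authors' algorithm: the class name `HotDataSketch`, the promotion threshold, the queue capacities, and the second-chance eviction rule are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class HotDataSketch:
    """Hypothetical two-queue tracker; each entry is [counter, recency_bit]."""

    def __init__(self, hot_cap=2, cand_cap=4, d_bits=2, promote_at=2):
        self.hot = OrderedDict()    # blocks currently classified as hot
        self.cand = OrderedDict()   # blocks seen recently but not yet hot
        self.hot_cap = hot_cap
        self.cand_cap = cand_cap
        self.max_count = (1 << d_bits) - 1  # saturating D-bit counter
        self.promote_at = promote_at        # assumed promotion threshold

    def _touch(self, queue, key):
        entry = queue[key]
        entry[0] = min(entry[0] + 1, self.max_count)  # frequency
        entry[1] = 1                                  # recency
        queue.move_to_end(key)

    def _evict_from_hot(self):
        # Second-chance scan (an assumption): an entry whose recency bit is
        # set gets the bit cleared and moves to the back; the first entry
        # found with a cleared bit is evicted.
        while True:
            key, entry = next(iter(self.hot.items()))
            if entry[1] == 1:
                entry[1] = 0
                self.hot.move_to_end(key)
            else:
                del self.hot[key]
                return key

    def access(self, key):
        if key in self.hot:
            self._touch(self.hot, key)
        elif key in self.cand:
            self._touch(self.cand, key)
            if self.cand[key][0] >= self.promote_at:
                entry = self.cand.pop(key)
                if len(self.hot) >= self.hot_cap:
                    self._evict_from_hot()
                self.hot[key] = entry
        else:
            if len(self.cand) >= self.cand_cap:
                self.cand.popitem(last=False)  # drop the oldest candidate
            self.cand[key] = [1, 1]

    def is_hot(self, key):
        return key in self.hot


tracker = HotDataSketch()
for block in [1, 1, 2, 3, 1]:
    tracker.access(block)
```

Memory use is bounded by the two capacities regardless of stream length, which is the property the abstract emphasizes. In this run, block 1 is promoted on its second access, while blocks 2 and 3 remain candidates.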

Optimizing Parallel Kinetic Monte Carlo Simulation by Communication Aggregation and Scheduling

2015 Forum on Big Data Technology and Applications

Authors: Baodong Wu; Shigang Li; Yunquan Zhang (State Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences)

The Kinetic Monte Carlo (KMC) algorithm has been widely applied to the simulation of radiation damage, grain growth and chemical reactions. To simulate at a large temporal and spatial scale, domain decomposition is commonly used to parallelize the KMC algorithm. However, through experimental analysis, we find that the communication overhead is the main bottleneck which affects the overall performance and limits the scalability of the parallel KMC algorithm on large-scale clusters. To alleviate these problems, we present a communication aggregation approach to reduce the total number of messages and eliminate the communication redundancy, and further utilize neighborhood collective operations to optimize the communication scheduling. Experimental results show that the optimized KMC algorithm exhibits better performance and scalability compared with the well-known open-source library SPPARKS. On a 32-node Xeon E5-2680 cluster (640 cores in total), the optimized algorithm reduces the total execution time by 16%, reduces the communication time by 50% on average, and achieves a 24-times speedup over single-node (20 cores) execution.

Keywords: Domain decomposition; Communication aggregation; Communication scheduling; Neighborhood collectives

Achieving high throughput and low delay in mobile data networks by accurately predicting queue lengths

12th ACM International Conference on Computing Frontiers, CF 2015

Authors: Liu, Ke; Lee, Jack Y.B. (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, China; Department of Information Engineering, Chinese University of Hong Kong, Hong Kong)

ISBN: (print) 9781450333580

Knowledge of the queue length for a radio link in a mobile data network has a significant effect on the performance of the communication protocol TCP. If the queue length can be accurately estimated and regulated to a target value, then low end-to-end delay and high bandwidth utilization can be achieved. One method for estimating and regulating the queue length is the queue-length-based congestion control (QCC) algorithm. However, this algorithm estimates the queue length over one RTT interval prior to transmission, and the actual queue length after that time can differ significantly because the bandwidth can vary substantially between neighboring propagation delays. This can result in a false positive in the queue length adaptation, thereby affecting the QoS performance. To address this problem, we propose PQ-TCP, a method that predicts the queue length directly by predicting the bandwidth variations over the ensuing period of time equal to the propagation delay and using post-bandwidth analysis to minimize the prediction error. Trace-driven simulations are used to show that the QoS performance of PQ-TCP is superior to that of current QCC algorithms. PQ-TCP achieves the lowest RTT while maintaining nearly 90% bandwidth utilization for a small target queue length of 5 packets. © Copyright 2015 ACM.

Keywords: Quality of service
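
The PQ-TCP abstract above describes predicting the queue length one propagation delay ahead from a bandwidth forecast. The sketch below illustrates only that general idea with basic queue arithmetic. The abstract does not specify the predictor, so the exponentially weighted moving average used here is a stand-in assumption, and the function names `ewma` and `predict_queue` are hypothetical.

```python
def ewma(samples, alpha=0.5):
    """Exponentially weighted moving average (an assumed stand-in predictor)."""
    estimate = samples[0]
    for sample in samples[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def predict_queue(queue_now, send_rate, bw_samples, horizon, alpha=0.5):
    """Forecast the queue length (packets) after `horizon` seconds.

    queue_now  -- current queue length in packets
    send_rate  -- sender rate in packets per second
    bw_samples -- recent link bandwidth samples in packets per second
    horizon    -- look-ahead time in seconds, e.g. the propagation delay
    """
    predicted_bw = ewma(bw_samples, alpha)
    # The queue grows when the sender outpaces the link and drains otherwise.
    backlog = queue_now + (send_rate - predicted_bw) * horizon
    return max(0.0, backlog)  # a queue cannot hold a negative packet count
```

For example, `predict_queue(5, 100, [100, 100], 0.05)` leaves the queue unchanged because the send rate matches the forecast bandwidth, whereas a forecast of 200 packets per second over the same horizon drains the queue to zero.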

Optimizing CPU cache performance for Pregel-like graph computation

International Conference on Data Engineering Workshops

Authors: Songjie Niu; Shimin Chen (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences)

ISBN: (print) 9781479984435

In-memory graph computation systems have been used to support many important applications, such as PageRank on the web graph and social network analysis. In this paper, we study the CPU cache performance of graph computation. We have implemented a graph computation system, called GraphLite, in C/C++ based on the description of Pregel. We analyze the CPU cache behavior of the internal data structures and operations of graph computation. Then we exploit CPU cache prefetching techniques to improve the cache performance. Real machine experimental results show that our solution achieves 1.9-2.2x speedups compared to the baseline implementation.

Keywords: Prefetching; Arrays; Computational modeling; Aggregates; Web pages; Programming

A co-evolutionary particle swarm optimization with dynamic topology for solving multi-objective optimization problems

Advances in Modelling and Analysis A, 2016, Vol. 53, No. 1, pp. 145-159

Authors: Wu, Daqing; Tang, Lixiang; Li, Haiyan; Ouyang, LiJun (Computer Science and Technology Institute, University of South China, Hengyang, Hunan, China; Antai College of Economics and Management, Shanghai Jiao Tong University, Shanghai 200240, China; Zigong 643000, China; Key Laboratory of Guangxi High Schools for Complex System and Computational Intelligence, Guangxi University for Nationalities, Nanning 530006, China; Key Laboratory of Intelligent Computing and Signal Processing, Ministry of Education, Anhui University, Hefei, Anhui Province 230039, China; Department of Business Administration, Hunan University of Finance and Economics, Hunan 410205, China)

This paper proposes a multi-objective particle swarm optimization (PSO) algorithm with dynamic topology, named DTPSO, for solving multi-objective problems. One of the main drawbacks of the classical multi-objective particle swarm optimization algorithm is low diversity. To overcome this disadvantage, DTPSO uses two dynamic local best particles to lead the search particles with multiple populations to deal with multiple objectives, and maintains the diversity of newly found non-dominated solutions by partitioning the search space into a fixed number of cells. The proposed DTPSO is validated through comparisons with two other multi-objective algorithms using established benchmarks and metrics. Simulation results demonstrate that DTPSO shows competitive, if not better, performance compared to the other algorithms. © 2016, AMSE Press. All rights reserved.

Keywords: Multiobjective optimization

An Intra-Server Interconnect Fabric for Heterogeneous Computing

Journal of Computer Science & Technology, 2014, Vol. 29, No. 6, pp. 976-988

Authors: 曹政 (Zheng Cao); 刘小丽; 李强; 刘小兵; 王展; 安学军 (CCF; ACM; State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences)

With the increasing diversity of application needs and computing units, servers with heterogeneous processors are more and more widespread. However, the conventional SMP/ccNUMA server architecture introduces a communication bottleneck between heterogeneous processors and only uses heterogeneous processors as coprocessors, which limits the efficiency and flexibility of using heterogeneous processors. To solve this problem, this paper proposes an intra-server interconnect fabric that supports both intra-server peer-to-peer interconnection and I/O resource sharing among heterogeneous processors. By connecting processors and I/O devices with the proposed fabric, heterogeneous processors can perform direct communication with each other and run in stand-alone mode with shared intra-server resources. We design the proposed fabric by extending the de facto system I/O bus protocol PCIe (Peripheral Component Interconnect Express) and implement it with a single chip, cZodiac. By making full use of PCIe's original advantages, the interconnection and the I/O sharing mechanism are lightweight and efficient. Evaluations that have been carried out on both the FPGA (Field Programmable Gate Array) prototype and the cycle-accurate simulator demonstrate that our design is feasible and scalable. In addition, our design is suitable not only for heterogeneous servers but also for high-density servers.

Keywords: heterogeneous system; interconnection; I/O virtualization; PCI-express

A small-footprint accelerator for large-scale neural networks

Authors: Chen, Tianshi; Zhang, Shijin; Liu, Shaoli; Du, Zidong; Luo, Tao; Gao, Yuan; Liu, Junjie; Wang, Dongsheng; Wu, Chengyong; Sun, Ninghui; Chen, Yunji; Temam, Olivier (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; TNLIST, Tsinghua University, Beijing 100084, China; Inria, Saclay, France; CAS Center for Excellence in Brain Science, China)

Machine-learning tasks are becoming pervasive in a broad range of domains, and in a broad range of systems (from embedded systems to data centers). At the same time, a small set of machine-learning algorithms (especially Convolutional and Deep Neural Networks, i.e., CNNs and DNNs) are proving to be state-of-the-art across many applications. As architectures evolve toward heterogeneous multicores composed of a mix of cores and accelerators, a machine-learning accelerator can achieve the rare combination of efficiency (due to the small number of target algorithms) and broad application scope. Until now, most machine-learning accelerator designs have been focusing on efficiently implementing the computational part of the algorithms. However, recent state-of-the-art CNNs and DNNs are characterized by their large size. In this study, we design an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance, and energy. We show that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s (key NN operations such as synaptic weight multiplications and neurons outputs additions) in a small footprint of 3.02 mm² and 485 mW; compared to a 128-bit 2 GHz SIMD processor, the accelerator is 117.87× faster, and it can reduce the total energy by 21.08×. The accelerator characteristics are obtained after layout at 65 nm. Such a high throughput in a small footprint can open up the usage of state-of-the-art machine-learning algorithms in a broad set of systems and for a broad set of applications. © 2015 ACM.

Keywords: Deep neural networks

A polynomial algorithm to performance analysis of concurrent systems via petri nets and ordinary differential equations

Authors: Ding, Zuohua; Zhou, Yuan; Zhou, Meng Chu (Laboratory of Intelligent Computing and Software Engineering, Zhejiang Sci-Tech University, Hangzhou 310018, China; Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University, Shanghai 200092, China; Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102-1982, United States)

In this paper, a new method is proposed to evaluate the performance of concurrent systems. A concurrent system consisting of multiple processes that communicate via message passing mechanisms is modeled by a Petri net, which is in turn represented by a set of ordinary differential equations (ODEs) of a restricted type. The equations describe the system state changes, and the solutions, also called state measures, can be used for the performance analysis such as estimating response time, throughput and efficiency. This method can avoid a state explosion problem encountered by the conventional methods based on Continuous-Time Markov Chains. Its application to an IBM business system is given as an example. © 2004-2012 IEEE.

Keywords: Ordinary differential equations
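
The abstract above describes representing a Petri net by ordinary differential equations whose solutions serve as state measures. The sketch below shows the general idea on a toy net, and it rests on loud assumptions: the abstract does not define its "restricted type" of ODE, so mass-action firing (flow equals rate times the product of input-place markings) and forward Euler integration are used here purely for illustration. The function name `simulate` and the example net are hypothetical.

```python
def simulate(rates, pre, post, m0, dt=0.001, steps=20000):
    """Euler-integrate the marking of a continuous Petri net.

    rates -- firing rate of each transition
    pre   -- pre[t] lists the input places of transition t
    post  -- post[t] lists the output places of transition t
    m0    -- initial (real-valued) marking of each place
    """
    m = list(m0)
    for _ in range(steps):
        # Assumed mass-action flow: rate times product of input markings.
        flows = []
        for t, rate in enumerate(rates):
            flow = rate
            for p in pre[t]:
                flow *= m[p]
            flows.append(flow)
        # dm/dt: each transition drains its inputs and feeds its outputs.
        dm = [0.0] * len(m)
        for t, flow in enumerate(flows):
            for p in pre[t]:
                dm[p] -= flow
            for p in post[t]:
                dm[p] += flow
        m = [x + dt * d for x, d in zip(m, dm)]
    return m


# Toy net: place 0 = idle, place 1 = busy.
# Transition 0 (rate 2.0) moves idle -> busy; transition 1 (rate 1.0) moves back.
final = simulate(rates=[2.0, 1.0], pre=[[0], [1]], post=[[1], [0]], m0=[1.0, 0.0])
```

The number of equations equals the number of places, so the cost grows with the net size rather than with the number of reachable states. For this toy net the marking settles where the two flows balance, at one third idle and two thirds busy.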

OPUF: Obfuscation logic based physical unclonable function

IEEE Symposium on On-Line Testing (IOLTS)

Authors: Jing Ye; Yu Hu; Xiaowei Li (State Key Laboratory of Computer Architecture, Institute of Computing Technology, CAS, Beijing, P. R. China)

ISBN: (print) 9781467379069

The Physical Unclonable Function (PUF) has broad application prospects in the field of hardware security. The arbiter PUF is a typical kind of strong PUF. However, due to its deterministic logic, attackers can use modeling techniques to break it in a short time. Therefore, this paper proposes an Obfuscation logic based PUF (OPUF) design. A Boolean obfuscation module is proposed to obfuscate the logic which is employed to select the path segments in the arbiter PUF. In this way, the nondeterminacy of the PUF is improved, and the computational complexity of modeling attacks is significantly increased, making the OPUF much safer against modeling attacks. Both the theoretical analysis and the experimental results show that the proposed OPUF design has good stability and randomness.

Keywords: Computational modeling; Stability analysis; Delays; Manufacturing; Complexity theory; SRAM cells; Hardware
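
The OPUF abstract above notes that a plain arbiter PUF can be broken by modeling. The sketch below shows why, using the widely used additive delay model, in which the response is the sign of a linear function of challenge-derived parity features. It simulates only the unprotected arbiter PUF, not the paper's obfuscation module. The function names, the Gaussian delay parameters, and the seed are illustrative assumptions.

```python
import random


def parity_features(challenge):
    """Map challenge bits to the +/-1 features of the additive delay model.

    Feature i is the product of (1 - 2*c_j) for all j >= i; the final
    feature is a constant 1 that carries the arbiter bias.
    """
    n = len(challenge)
    feats = [1.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        feats[i] = feats[i + 1] * (1 - 2 * challenge[i])
    return feats


def make_puf(n_stages, seed=0):
    """Draw one simulated device: per-stage delay differences (assumed Gaussian)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_stages + 1)]


def response(weights, challenge):
    """Arbiter output: 1 if the accumulated delay difference is positive."""
    total = sum(w * f for w, f in zip(weights, parity_features(challenge)))
    return 1 if total > 0 else 0


puf = make_puf(8)
bit = response(puf, [1, 0, 1, 1, 0, 0, 1, 0])
```

Because the response is a linear threshold over `parity_features`, an attacker who collects challenge-response pairs can fit the weights with any linear classifier. Hiding the mapping from the external challenge to the path-select bits, as the abstract proposes, removes the attacker's direct access to these features.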

MIMS: Towards a Message Interface Based Memory System

Journal of Computer Science & Technology, 2014, Vol. 29, No. 2, pp. 255-272

Authors: 陈荔城; 陈明宇; 阮元; 黄永兵; 崔泽汉; 卢天越; 包云岗 (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences)

The decades-old synchronous memory bus interface has restricted many innovations in the memory system, which is facing various challenges (or walls) in the era of multi-core and big data. In this paper, we argue that a message-based interface should be adopted to replace the traditional bus-based interface in the memory system. A novel message interface based memory system called MIMS is proposed. The key innovation of MIMS is that processors communicate with the memory system through a universal and flexible message packet interface. Each message packet is allowed to encapsulate multiple memory requests (or commands) and additional semantic information. The memory system is made more intelligent and active by equipping it with a local buffer scheduler, which is responsible for processing packets, scheduling memory requests, preparing responses, and executing specific commands with the help of semantic information. Under the MIMS framework, many previous innovations on memory architecture as well as new optimization opportunities such as address compression and continuous requests combination can be naturally incorporated. The experimental results on a 16-core cycle-detailed simulation system show that, with accurate granularity message, MIMS can improve system performance by 53.21% and reduce energy delay product (EDP) by 55.90%. Furthermore, it can improve effective bandwidth utilization by 62.42% and reduce memory access latency by 51% on average.

Keywords: message interface; memory system; asynchronous; granularity; semantic information
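
The MIMS abstract above says that one message packet can carry several memory requests plus semantic information, and it lists address compression as one optimization. The sketch below illustrates that idea with a base-plus-offset encoding. The abstract does not give the packet layout, so the class `MessagePacket`, its fields, and the `pack` and `unpack` helpers are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MessagePacket:
    """Hypothetical packet: one base address shared by several requests."""
    base_addr: int
    # Each request is (offset from base_addr, is_write).
    requests: List[Tuple[int, bool]] = field(default_factory=list)
    semantic_hint: str = ""  # free-form hint for the memory-side scheduler


def pack(addresses, is_write=False, hint=""):
    """Combine several requests into one packet with offset-encoded addresses."""
    base = min(addresses)
    return MessagePacket(base, [(a - base, is_write) for a in addresses], hint)


def unpack(packet):
    """Recover the full addresses on the memory side."""
    return [(packet.base_addr + off, w) for off, w in packet.requests]


packet = pack([0x1000, 0x1040, 0x1080], hint="sequential")
```

Here three 64-byte-spaced reads share one base address, so each request only needs a small offset instead of a full address. `unpack(packet)` restores the original addresses for the buffer scheduler.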
