In recent years, many studies on the optimization of energy consumption have focused on heterogeneous processor architectures. The heterogeneous computing model composed of CPUs and GPUs has developed from co-processing between...
Most processors employ hardware data prefetching techniques to hide memory access latencies. However, the prefetching requests from different threads on a multicore processor can cause severe interference with the prefetching and/or demand requests of others. Data prefetching can thus lead to significant performance degradation due to shared resource contention on shared-memory multicore systems. This article proposes a thread-aware data prefetching mechanism based on low-overhead runtime information to tune prefetching modes and aggressiveness, mitigating resource contention in the memory system. Our solution has three new components: (1) a self-tuning prefetcher that uses runtime feedback to dynamically adjust the data prefetching modes and arguments of each thread, (2) a filtering mechanism that informs the hardware about which prefetching requests can cause shared data invalidation and should be discarded, and (3) a limiter thread acceleration mechanism to estimate and accelerate the critical thread, i.e., the one with the longest completion time in the parallel region of execution. On a set of multithreaded parallel benchmarks, our thread-aware data prefetching mechanism improves the overall performance of a 64-core system by 13% over a multimode prefetch baseline system with a two-level cache organization and a conventional modified, exclusive, shared, and invalid (MESI)-based directory coherence protocol. We also compare our approach with the feedback-directed prefetching technique and find that it provides a 9% performance improvement on multicore systems, while reducing memory bandwidth consumption.
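The feedback-driven tuning described in component (1) can be sketched as a simple control loop over per-interval hardware counters. The thresholds, metric names, and levels below are illustrative assumptions, not values taken from the paper:

```python
# Hypothetical sketch of feedback-directed prefetch aggressiveness tuning.
# All thresholds and the 1..5 level scale are assumptions for illustration.

def adjust_aggressiveness(level, accuracy, late_ratio, pollution):
    """Return a new prefetch aggressiveness level (1 = conservative,
    5 = aggressive) from runtime feedback sampled over an interval.

    accuracy   -- fraction of prefetched lines actually used
    late_ratio -- fraction of useful prefetches arriving after the demand miss
    pollution  -- fraction of evictions that displaced useful shared data
    """
    if accuracy > 0.75 and late_ratio > 0.4:
        # Prefetches are useful but arrive too late: fetch further ahead.
        level = min(level + 1, 5)
    elif accuracy < 0.4 or pollution > 0.25:
        # Prefetches mostly waste bandwidth or evict shared data: back off.
        level = max(level - 1, 1)
    return level
```

In hardware, a loop like this would run per thread at interval boundaries, which is what lets each thread's prefetch mode diverge from its neighbors'.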
ISBN: (Print) 9781509036837
Nowadays the digital universe grows larger and larger; the data created every year has reached the zettabyte (ZB) level. How to store this data across many commodity servers is a critical issue. Although a distributed file system alleviates the storage problem, there is still a need to reduce the storage footprint and speed up the transmission of large-scale data. A distributed file system like HDFS already offers compression schemes to meet this need; however, when the workload and data format change, configuring compression with only one algorithm is not always effective. In this paper, we propose a model called PACM (Prediction-based Auto-adaptive Compression Model) to optimize storage and performance by using different algorithms (e.g., quicklz, zlib, and snappy) according to the varying data formats and workloads. We also implemented the model in Hadoop, and our empirical evaluation shows that by using PACM, write throughput improves by 2-5 times.
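The core idea of picking a compressor per data block can be sketched as below. PACM itself predicts from data format and workload; this simplified stand-in just scores candidates on a sample, and uses zlib levels in place of quicklz/snappy (which require third-party bindings):

```python
import zlib

# Simplified sketch of auto-adaptive compressor selection in the spirit of
# PACM. The candidate set and scoring are illustrative assumptions: zlib at
# different levels stands in for distinct codecs such as quicklz and snappy.

def pick_codec(sample: bytes, candidates=None):
    """Return the name of the candidate codec that compresses a sampled
    block of the incoming data the best (smallest output)."""
    if candidates is None:
        candidates = {"fast": 1, "balanced": 6, "best": 9}  # zlib levels
    sizes = {name: len(zlib.compress(sample, lvl))
             for name, lvl in candidates.items()}
    return min(sizes, key=sizes.get)
```

A production system would also weigh compression speed against ratio, since for write-heavy workloads a faster, weaker codec often yields higher end-to-end throughput.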
ISBN: (Print) 9781509028245
As one of the most important deep learning models, convolutional neural networks (CNNs) have achieved great success in a number of applications such as image classification, speech recognition, and natural language understanding. Training CNNs on large data sets is computationally expensive, leading to a flurry of research and development of open-source parallel implementations on GPUs. However, few studies have evaluated the performance characteristics of those implementations. In this paper, we conduct a comprehensive comparison of these implementations over a wide range of parameter configurations, investigate potential performance bottlenecks, and point out a number of opportunities for further optimization.
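Comparisons like this hinge on a fair timing harness: warm up first (to exclude JIT/autotuning and cache effects), repeat, and take the best run. A minimal sketch, with a stand-in workload rather than a real framework kernel:

```python
import time

# Minimal timing harness of the kind used to compare implementations across
# parameter configurations. The measured function here is an assumption; a
# real study would time CNN layer kernels from each framework.

def time_op(fn, *args, warmup=2, reps=5):
    """Run fn(*args) a few times after warmup and return the best
    wall-clock time in seconds (best-of-N reduces scheduler noise)."""
    for _ in range(warmup):
        fn(*args)
    best = float("inf")
    for _ in range(reps):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

For GPU kernels, the harness must also synchronize the device before reading the clock, or the asynchronous launch makes the kernel appear nearly free.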
Based on the attribute-based encryption (ABE) scheme proposed by Brakerski and constructed on the LWE problem, an RLWE-based key-policy ABE scheme is presented. The efficiency and key size of this scheme overtake...
ISBN: (Print) 9781509053827
SURF (Speeded-Up Robust Features) detection is used extensively in object detection, tracking, and matching. However, due to its high complexity, it is usually a challenge to perform such detection in real time on a general-purpose processor. This paper proposes a parallel computing algorithm for the fast computation of SURF, specially designed for FPGAs. By efficiently exploiting the advantages of the FPGA architecture, and by appropriately handling the inherent parallelism of the SURF computation, the proposed algorithm significantly reduces the computation time. Our experimental results show that, for an image with a resolution of 640x480, the SURF processing time is only 0.047 seconds on an FPGA (XC6SLX150T, 66.7 MHz), which is 13 times faster than on a typical i3-3240 CPU (with a 3.4 GHz clock frequency) and 249 times faster than on a traditional ARM system (Cortex-A8, 1 GHz).
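The parallelism exploited here comes largely from SURF's use of box filters over an integral image, where any rectangular sum costs four lookups regardless of size. A minimal pure-Python sketch of that building block (the paper's FPGA pipeline itself is not shown):

```python
# Integral image (summed-area table), the data structure behind SURF's
# constant-time box filters. Pure-Python reference, not the FPGA design.

def integral_image(img):
    """img: 2-D list of numbers. Returns a table ii where ii[y][x] is the
    sum of img over the rectangle from (0, 0) to (x, y) inclusive."""
    h, w = len(img), len(img[0])
    ii = [[0] * w for _ in range(h)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y][x] = row_sum + (ii[y - 1][x] if y > 0 else 0)
    return ii

def box_sum(ii, x0, y0, x1, y1):
    """Sum of the image over the inclusive rectangle [x0..x1] x [y0..y1],
    using only four table lookups."""
    a = ii[y0 - 1][x0 - 1] if x0 > 0 and y0 > 0 else 0
    b = ii[y0 - 1][x1] if y0 > 0 else 0
    c = ii[y1][x0 - 1] if x0 > 0 else 0
    return ii[y1][x1] - b - c + a
```

On an FPGA the row-wise prefix sums and the four-lookup filter evaluations map naturally onto deep pipelines, which is where the reported speedup originates.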
Current intrusion detection systems mostly target external attacks, but internal staff can sometimes cause greater harm to an organization's information security. Traditional insider threat detection met...
ISBN: (Print) 9781450333153
Although FPGAs' power and performance advantages are widely recognized, designing applications on FPGA-based systems has traditionally been a task for hardware experts. It is important to allow application-level programmers, who have less system-level but more algorithmic knowledge, to realize their applications conveniently on FPGAs. In this paper, an embedded FPGA operating system is proposed to help application-level programmers use FPGAs. First, it builds specific I/Os and optimizes the bus interconnection among I/Os, DDR memory, user IPs, etc., within the FPGA for vision computing. Second, it manages FPGA resources such as I/Os, DDR memory, and communication, freeing users from low-level details. Third, it schedules tasks (IPs) executed on the FPGA dynamically at runtime, which allows the FPGA to be multiplexed when necessary. After porting the FPGA operating system to different FPGA platforms and implementing vision algorithms on top of it, we show that it simplifies algorithm development on FPGA platforms and improves the portability of user applications. Furthermore, implementation results for several popular vision algorithms show that the FPGA operating system is efficient and effective for vision computing. Finally, experimental results show that for multiple algorithms requiring more FPGA resources, runtime task scheduling of multiple IPs is more efficient than a fixed IP when an FPGA SoC is considered.
We formalize the security notions of non-malleability under selective opening attacks (NM-SO security) in two approaches: the indistinguishability-based approach and the simulation-based approach. We explore the relat...