Search Results - Inner Mongolia University Library

26th ACM International Conference on Supercomputing, ICS'12

Authors: Li, Jiajia; Li, Xingjian; Tan, Guangming; Chen, Mingyu; Sun, Ninghui. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

ISBN: (print) 9781450313162

In heterogeneous systems that include CPUs and GPUs, data transfers between these components play a critical role in determining application performance. Software pipelining is a common approach to mitigating the overhead of these transfers. In this paper we investigate advanced software-pipelining optimizations for the double-precision general matrix multiplication (DGEMM) algorithm running on a heterogeneous system that includes ATI GPUs. Our approach decomposes the DGEMM workload at a finer granularity and hides the latency of CPU-GPU data transfers to a higher degree than previous approaches in the literature. We implement our approach in a five-stage software-pipelined DGEMM and analyze its performance on a platform including x86 multi-core CPUs and an ATI Radeon™ HD5970 GPU that has two Cypress GPU chips on board. Our implementation delivers 758 GFLOPS (82% floating-point efficiency) when it uses only the GPU, and 844 GFLOPS (80% efficiency) when it distributes the workload across both CPU and GPU. We analyze the performance of our optimized DGEMM as the number of GPU chips employed grows from one to two, and the results show that resource contention on the PCIe bus and in host memory are limiting factors. Copyright 2012 ACM.

Keywords: Graphics processing unit
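
To make the benefit of software pipelining concrete, the following Python sketch compares serial and pipelined execution time for a workload split into tiles. It assumes, hypothetically, that every tile passes through five stages with fixed costs and that each stage runs on its own resource, so a stage can start a tile as soon as the stage is free and the previous stage has finished that tile. The stage costs are made-up numbers, not measurements from the paper, and in practice uploads and downloads can contend for the same PCIe bus, as the abstract notes.

```python
def serial_time(stage_times, num_tiles):
    """Every tile runs all stages back to back; no overlap between tiles."""
    return num_tiles * sum(stage_times)


def pipelined_time(stage_times, num_tiles):
    """Flow-shop schedule: each stage is a separate resource, tiles move in order."""
    finish = [0.0] * len(stage_times)  # finish time of the previous tile at each stage
    for _ in range(num_tiles):
        prev_stage_done = 0.0  # when this tile left the previous stage
        for s, t in enumerate(stage_times):
            # A stage starts only when it is free AND this tile's input is ready.
            start = max(finish[s], prev_stage_done)
            finish[s] = start + t
            prev_stage_done = finish[s]
    return finish[-1]


# Hypothetical costs: pack, host-to-device copy, GPU compute, device-to-host copy, unpack.
stages = [1, 2, 4, 2, 1]
print(serial_time(stages, 10), pipelined_time(stages, 10))
```

With these costs and 10 tiles, the serial schedule takes 100 time units and the pipelined one takes 46, that is, the sum of the stage costs plus nine repetitions of the slowest (bottleneck) stage. Finer decomposition helps because it shrinks each stage's cost and therefore the exposed fill and drain time.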

Joint Optimization of Latency and Energy Consumption for Mobile Edge Computing Based Proximity Detection in Road Networks

China Communications, 2022, Vol. 19, No. 4, pp. 274-290

Authors: Tongyu Zhao, Yaqiong Liu, Guochu Shou, Xinwei Yao. School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China; Beijing Laboratory of Advanced Information Networks, Beijing Key Laboratory of Network System Architecture and Convergence, Beijing 100876, China; School of Computer Science and Technology, Zhejiang University of Technology, China

In recent years, artificial intelligence and the automotive industry have developed rapidly, and autonomous driving has gradually become a focus of the industry. In road networks, the proximity detection problem is to detect in real time whether two moving objects are close to each other. However, the battery life and computing capability of mobile devices are limited in practice, which results in high latency and energy consumption. Determining the proximity relationship between mobile users with low latency and low energy consumption is therefore a difficult problem. In this article, we aim to find a tradeoff between latency and energy consumption. We formalize the computation offloading problem based on mobile edge computing (MEC) as a constrained multiobjective optimization problem (CMOP) and use NSGA-II to solve it. The simulation results demonstrate that NSGA-II can find the Pareto set, which effectively reduces latency and energy consumption. In addition, the large number of solutions in the Pareto set offers more choices of offloading decisions according to the actual situation.

Keywords: proximity detection; mobile edge computing; road networks; constrained multiobjective optimization
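
NSGA-II ranks candidate solutions with fast non-dominated sorting before applying crowding distance, selection, crossover, and mutation. The Python sketch below shows only that sorting step for two objectives to be minimized, (latency, energy). The candidate offloading decisions and their objective values are hypothetical, and the CMOP constraints are ignored here.

```python
def dominates(a, b):
    """a dominates b if it is no worse in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(points):
    """Return fronts as lists of indices; fronts[0] is the Pareto set."""
    dominated_by = [[] for _ in points]  # indices each point dominates
    counts = [0] * len(points)          # how many points dominate each point
    fronts = [[]]
    for p, a in enumerate(points):
        for q, b in enumerate(points):
            if dominates(a, b):
                dominated_by[p].append(q)
            elif dominates(b, a):
                counts[p] += 1
        if counts[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        next_front = []
        for p in fronts[i]:
            for q in dominated_by[p]:
                counts[q] -= 1
                if counts[q] == 0:  # only dominated by points in earlier fronts
                    next_front.append(q)
        i += 1
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front


# Hypothetical (latency in ms, energy in J) for five offloading decisions.
candidates = [(10, 5), (8, 7), (12, 4), (11, 6), (9, 9)]
print(fast_non_dominated_sort(candidates))
```

Here decisions 0, 1, and 2 form the Pareto set: none is better in both latency and energy than another, which is exactly the menu of tradeoffs the abstract refers to.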

A priority-aware NoC to reduce squashes in Thread Level Speculation for Chip Multiprocessors

9th IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2011

Authors: Dai, Wenbo; An, Hong; Li, Qi; Li, Gongming; Deng, Bobin; Wu, Shilei; Li, Xiaomei; Liu, Yu. School of Computer Science and Technology, University of Science and Technology of China, Hefei, China; Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Beijing, China

ISBN: (print) 9780769544281

Thread Level Speculation (TLS) is a technique that aims to boost the performance of sequential programs running on Chip Multiprocessors (CMPs) by automatically parallelizing them. It relieves programmers of the heavy task of parallel programming. However, its performance may suffer from frequent squashes caused by inter-thread data dependency violations. In this paper, we propose a Network-on-Chip (NoC) for CMPs that employs a priority-aware packet arbitration policy. Packet scheduling guided by this policy reduces the occurrence of TLS squashes. Simulation results with five applications show that our policy reduces squashes by 22% in the best case and 15% on average. Moreover, our priority-aware approach could be generalized to similar scenarios in which different threads running on a CMP have different priorities. © 2011 IEEE.

Keywords: Network-on-chip
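
The abstract does not say how packet priorities are assigned. A minimal Python sketch, assuming (hypothetically) that each packet carries the speculative order of the thread that sent it and that the least speculative (oldest) thread wins arbitration, on the reasoning that delaying its data is what tends to expose younger threads to squashes, could look like this. `PriorityArbiter` and `thread_order` are illustrative names, not the paper's router design.

```python
import heapq
from itertools import count


class PriorityArbiter:
    """Hypothetical router arbiter: one grant per cycle, oldest thread first."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # tie-breaker: FIFO among equal priorities

    def enqueue(self, thread_order, packet):
        # Lower thread_order = less speculative = higher priority (assumption).
        heapq.heappush(self._heap, (thread_order, next(self._arrival), packet))

    def grant(self):
        """Return the packet that wins arbitration this cycle, or None if idle."""
        if not self._heap:
            return None
        _, _, packet = heapq.heappop(self._heap)
        return packet


arb = PriorityArbiter()
for order, pkt in [(3, "p3"), (1, "p1a"), (2, "p2"), (1, "p1b")]:
    arb.enqueue(order, pkt)
print([arb.grant() for _ in range(5)])
```

A plain round-robin arbiter would grant these packets in arrival order; the priority-aware version reorders them so the oldest thread's traffic is never stuck behind more speculative traffic.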

Cs 5D5/2-6F 728 nm Laser Spectroscopy with Single Pumping Laser

Chinese Physics Letters, 2017, Vol. 34, No. 3, pp. 53-56

Authors: 周琦 (Qi Zhou), 常鹏媛 (Pengyuan Chang), 刘忠征 (Zhongzheng Liu), 张晓刚 (Xiaogang Zhang), 祝传文 (Chuanwen Zhu), 陈景标 (Jingbiao Chen). School of Optoelectronic Information, University of Electronic Science and Technology of China; State Key Laboratory of Advanced Optical Communication System and Network, Institute of Quantum Electronics, School of Electronics Engineering & Computer Science, Peking University

Sub-Doppler absorption laser spectroscopy of the 728 nm transition from the 5D5/2 state to the 6F state of cesium, with a linewidth near 10 MHz, is experimentally demonstrated for the first time, using indirect pumping from the ground state 6S1/2 to the 7P3/2 state with a 455.5 nm diode laser. With the 455.5 nm diode laser as an indirect pump, several excited states are populated through spontaneous decay from the 7P state. We first implement sub-Doppler absorption laser spectroscopy at 728 nm from the 5D5/2 state to the 6F state when Cs atoms in a thermal glass cell decay to the 5D5/2 state. Due to the velocity transfer effect, the hyperfine structure of 5D5/2 shows a mixed and complicated pattern but a very clear structure when the 455.5 nm pump laser is counter-propagating (or co-propagating) with the 728 nm probe laser.

Keywords: Cs 5D; laser spectroscopy with single pumping laser; FADOF
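
The roughly 10 MHz linewidth counts as "sub-Doppler" because thermal motion in a vapor cell broadens an ordinary single-photon line far more. The Python sketch below evaluates the standard Gaussian Doppler FWHM, Δν = (1/λ)·sqrt(8·k_B·T·ln2 / m), at 728 nm. It assumes a cell temperature of 300 K, which the abstract does not state (the actual cell may be heated), and it does not model the velocity transfer effect itself.

```python
import math

K_B = 1.380649e-23       # Boltzmann constant, J/K
AMU = 1.66053906660e-27  # atomic mass unit, kg
M_CS = 132.905 * AMU     # approximate mass of a 133Cs atom, kg


def doppler_fwhm_hz(wavelength_m, temperature_k, mass_kg):
    """Gaussian Doppler FWHM (Hz) of a single-photon transition in a thermal vapor."""
    return math.sqrt(8 * K_B * temperature_k * math.log(2) / mass_kg) / wavelength_m


# Assumed cell temperature of 300 K (not given in the abstract).
fwhm = doppler_fwhm_hz(728e-9, 300.0, M_CS)
print(f"Doppler FWHM at 728 nm: {fwhm / 1e6:.0f} MHz")
```

Under this assumption the Doppler width is roughly 440 MHz, so a line near 10 MHz is more than 40 times narrower than the thermal profile.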

A Reduced Reachability Tree for a Class of Unbounded Petri Nets

IEEE/CAA Journal of Automatica Sinica, 2015, Vol. 2, No. 4, pp. 345-352

Authors: Shouguang Wang, Mengdi Gan, Mengchu Zhou, Dan You. School of Information and Electronic Engineering, Zhejiang Gongshang University; State Key Laboratory for Manufacturing Systems Engineering, Xi’an Jiaotong University; Ministry of Education (MoE) Key Laboratory of Embedded System and Service Computing, Tongji University; Department of Electrical and Computer Engineering, New Jersey Institute of Technology

As a powerful tool for analyzing Petri nets, reachability trees are fundamental for systematically investigating many properties such as boundedness, liveness, and reversibility. This work proposes a method, based on new modified reachability trees (NMRTs), to generate a reachability tree, called ωRT for short, for a class of unbounded generalized nets called ω-independent nets. ωRT can effectively decrease the number of nodes by removing duplicate and ω-duplicate nodes from the tree, and can verify properties such as reachability, liveness, and deadlocks. Two examples are provided to show its superiority over NMRTs in terms of tree size. © 2014 Chinese Association of Automation.

Keywords: Petri nets
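
For background, the classic way to keep a reachability tree finite for an unbounded net is the Karp-Miller coverability tree: when a new marking strictly covers an ancestor's marking, the growing places are replaced by ω. The Python sketch below implements that classic construction, not ωRT or NMRT, with one simple duplicate-pruning rule: a node whose marking was already expanded anywhere in the tree is kept as a leaf and not expanded. ω is represented by `math.inf`, which conveniently satisfies inf - 1 == inf and inf + 1 == inf. The example net is hypothetical.

```python
from math import inf

OMEGA = inf  # ω: "arbitrarily many tokens"


def enabled(marking, pre):
    return all(m >= a for m, a in zip(marking, pre))


def fire(marking, pre, post):
    return tuple(m - a + b for m, a, b in zip(marking, pre, post))


def coverability_tree(m0, transitions):
    """transitions: {name: (pre, post)}.
    Returns nodes as (marking, parent_index, transition, is_duplicate)."""
    nodes = [(tuple(m0), None, None, False)]
    frontier = [0]
    expanded = set()
    while frontier:
        idx = frontier.pop()
        marking, parent, via, _ = nodes[idx]
        if marking in expanded:
            nodes[idx] = (marking, parent, via, True)  # duplicate: leaf, not expanded
            continue
        expanded.add(marking)
        for name, (pre, post) in transitions.items():
            if not enabled(marking, pre):
                continue
            new = list(fire(marking, pre, post))
            anc = idx  # walk from the parent up to the root
            while anc is not None:
                old = nodes[anc][0]
                if all(o <= n for o, n in zip(old, new)) and old != tuple(new):
                    # Strictly covers an ancestor: growing places become ω.
                    new = [OMEGA if o < n else n for o, n in zip(old, new)]
                anc = nodes[anc][1]
            nodes.append((tuple(new), idx, name, False))
            frontier.append(len(nodes) - 1)
    return nodes


# Hypothetical net: t1 takes a token from p1 and puts one back on p1 and one on p2.
tree = coverability_tree((1, 0), {"t1": ((1, 0), (1, 1))})
print([(m, dup) for m, _, _, dup in tree])
```

For this net, p2 is unbounded, and the tree stays at three nodes: (1, 0), then (1, ω), then a duplicate (1, ω) leaf. Note that a coverability tree alone cannot decide reachability in general, which is one motivation for refined constructions such as the one in this paper.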

A non-blocking programming framework for pipeline application on multi-core platform

9th IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2011

Authors: Li, Xiaoqiang; An, Hong; Liu, Gu; Han, Wenting; Xu, Mu; Zhou, Wei; Li, Qi. School of Computer Science and Technology, University of Science and Technology of China, Hefei, China; Key Laboratory of Computer System and Architecture, Chinese Academy of Sciences, Beijing, China

ISBN: (print) 9780769544281

Many applications follow common programming patterns such as pipeline, fork-join, and do-all. While tools such as OS threads and OpenMP only allow programmers to express task or data parallelism, special support for programming patterns is distinctly lacking. Intel Threading Building Blocks (TBB) was developed to address this problem, but its scheduler is general-purpose and not specifically optimized for any of its parallel algorithms, including pipeline. In this paper, we provide a non-blocking framework for pipeline applications on multi-core platforms. We target linear pipelines in which each filter has one entrance and one exit. We design a novel work-stealing scheduler optimized specifically for pipeline applications. First, it uses priority-based stealing: a priority is calculated for each filter in the pipeline so that a worker can easily find the optimal "victim" when it needs to steal. Second, multiple tasks can be stolen at a time, which substantially reduces stealing time. A non-blocking queue is used to store intermediate results to reduce lock overhead and increase scalability. We apply our framework to four case studies: text filter, twofish, ferret, and dedup. Compared with TBB, our framework reduces execution time by 72% in the best case and 20% on average on an 8-core machine. © 2011 IEEE.

Keywords: Pipelines
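
The two scheduler ideas, priority-based victim selection and stealing several tasks per attempt, can be sketched in Python as follows. The abstract does not give the priority formula, so this sketch assumes, hypothetically, that filters closer to the pipeline exit have higher priority (draining them frees buffered items), and it steals half of the victim's queue from the end opposite to where the owner works. Tasks are `(filter_index, item)` pairs; all names are illustrative, and the sketch is single-threaded, without the paper's non-blocking queue.

```python
from collections import deque


def filter_priority(filter_index):
    """Hypothetical rule: later filters in the pipeline get higher priority."""
    return filter_index


def pick_victim(queues, thief):
    """Choose the worker holding the highest-priority pending task."""
    best, best_prio = None, -1
    for wid, q in queues.items():
        if wid == thief or not q:
            continue
        prio = max(filter_priority(f) for f, _ in q)
        if prio > best_prio:
            best, best_prio = wid, prio
    return best


def steal_batch(queues, thief):
    """Steal half of the victim's tasks (at least one) in a single attempt."""
    victim = pick_victim(queues, thief)
    if victim is None:
        return []
    q = queues[victim]
    n = max(1, len(q) // 2)
    # The owner pops from the right; the thief takes from the left to avoid conflicts.
    stolen = [q.popleft() for _ in range(n)]
    queues[thief].extend(stolen)
    return stolen


queues = {
    0: deque([(0, "a"), (0, "b")]),
    1: deque([(2, "c"), (1, "d"), (2, "e"), (1, "f")]),
    2: deque(),
}
print(steal_batch(queues, thief=2))
```

Worker 1 is chosen because it holds tasks of the latest filter, and the thief takes two tasks at once instead of returning for each one, which is where the saving in stealing time comes from.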

Implementation of Full Spin-state Interferometer

Chinese Physics Letters, 2019, Vol. 36, No. 5, pp. 7-11

Authors: Peng-Ju Tang, Peng Peng, Xiang-Yu Dong, Xu-Zong Chen, Xiao-Ji Zhou. State Key Laboratory of Advanced Optical Communication System and Network, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871; Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan 030006

Matter-wave interferometers with spin quantum states are attractive for quantum manipulation and precision measurements. Here, five spatial interference patterns corresponding to the full set of spin states are observed in each run of the experiment, by combining Majorana transitions, driven by exponentially modulating the decline curve of the magnetic field pulse, with radio-frequency coupling among multiple magnetic *** to the realization of two Majorana transitions, the interference fringe for the magnetic-field-insensitive state also has a higher contrast. After the full magnetic sub-state interference patterns are spatially overlapped dozens of times in consecutive experimental measurements, clear fringes are still observed, indicating the high stability of the relative phases of the different components. This indicates the potential to realize an interferometer with multiple spin clocks.

Keywords: Implementation of Full Spin-state Interferometer

Improved structural modeling based on conserved domain clusters and structure-anchored alignments

Proceedings of the ACM Symposium on Applied Computing, 2007, pp. 128-132

Authors: Zhang, Fa; Xu, Lin; Liu, Zhiyong; Yuan, Bo. Key Laboratory of Computer System and Architecture, Institute of Computing Technology, Chinese Academy of Sciences; Graduate School, Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences; Department of Computer Science and Engineering, Shanghai Jiaotong University

ISBN: (print) 1595934804; 9781595934802

In this paper, we present a method to improve structural modeling based on conserved domain clusters and structure-anchored alignments. We first construct a template library of structural clusters for all conserved sequence domains. Then, for each cluster, we build a profile using the structure and sequence information. Finally, we use the profile and structural alignments as anchors to increase the alignment accuracy between a query and its templates. Our preliminary results show that this method can be used for partial prediction of a majority of known protein sequences, with better quality. Copyright 2007 ACM.

Keywords: Proteins
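
Using alignments "as anchors" generally means forcing certain query/template positions to align and only aligning the segments between them. The Python sketch below illustrates that general idea with plain Needleman-Wunsch global alignment and a toy scoring scheme (+1 match, -1 mismatch, -1 gap); it is not the paper's profile-based method, and the anchor pairs in the example are hypothetical stand-ins for positions a structural alignment would supply.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    out_a, out_b, i, j = [], [], n, m
    while i > 0 or j > 0:  # traceback from the bottom-right cell
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def anchored_alignment(query, template, anchors):
    """anchors: increasing (query_index, template_index) pairs forced to align."""
    qa, ta, prev_q, prev_t = [], [], 0, 0
    for qi, ti in anchors:
        sa, sb = needleman_wunsch(query[prev_q:qi], template[prev_t:ti])
        qa.append(sa + query[qi]); ta.append(sb + template[ti])
        prev_q, prev_t = qi + 1, ti + 1
    sa, sb = needleman_wunsch(query[prev_q:], template[prev_t:])
    qa.append(sa); ta.append(sb)
    return "".join(qa), "".join(ta)


print(anchored_alignment("AXCGT", "ACGT", [(0, 0), (4, 3)]))
```

Anchoring both ends means only the middle segment "XCG" vs "CG" is searched, giving "AXCGT" over "A-CGT"; reliable anchors shrink the search space and stop a weak sequence signal from pulling the alignment off the structurally correct register.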

Network-based humanoid robot remote interaction with the actual situation Fusion Technology

The 2011 International Conference on Advanced Materials and Information Technology Processing (AMITP 2011)

Authors: YU Guochen, WANG Zhiliang, XIE Lun, XU Jiaming. State Key Laboratory of Robotics and System, Harbin Institute of Technology; School of Computer and Communication Engineering, University of Science and Technology Beijing

ISBN: (print) 9783037851579

With the rapid development of network technology, network-based humanoid robot technology is also advancing steadily. This article adopts a client/server (C/S) architecture in which the server is responsible for control, for recording messages, and for relaying information between network clients; through remote interaction, the client performs real-time control of the physical humanoid robot. Before clients can interact, each client first registers with the server. The server returns the information of all registered users to the client, after which the client user can see the online users and select a remote robot to interact with. When a user issues an operation, the client program displays the resulting robot control in real time on a virtual robot in the virtual robot laboratory.

Keywords: humanoid robot, remote interactive, virtual, real-time, network
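
The register-then-interact flow described above can be sketched as a small in-memory Python model of the server's role. It is hypothetical throughout: `RegistryServer`, its methods, and the message format are illustrative, there is no real networking, and the client-side virtual robot display is not modeled.

```python
class RegistryServer:
    """Hypothetical in-memory stand-in for the C/S relay server (no sockets)."""

    def __init__(self):
        self.online = {}  # user name -> id of the robot that user controls
        self.log = []     # record of every relayed message

    def register(self, user, robot_id):
        """A client signs in; the server returns all registered users."""
        self.online[user] = robot_id
        return sorted(self.online)

    def relay(self, sender, target, command):
        """Forward a command from one client to another and record it."""
        if sender not in self.online or target not in self.online:
            raise KeyError("both users must be registered first")
        message = {"from": sender, "to": target,
                   "robot": self.online[target], "command": command}
        self.log.append(message)
        return message


server = RegistryServer()
server.register("alice", "robot-1")
print(server.register("bob", "robot-2"))         # bob sees who is online
print(server.relay("alice", "bob", "wave_arm"))  # alice drives bob's robot
```

Keeping the user list and message log on the server matches the division of labor in the abstract: clients only need the server's address, and the server is the single place where who-is-online and what-was-sent are recorded.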

Indexing Techniques of Distributed Ordered Tables: A Survey and Analysis

Journal of Computer Science and Technology, 2018, Vol. 33, No. 1, pp. 169-189

Authors: Chen Feng, Chun-Dian Li, Rui Li. State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; University of Chinese Academy of Sciences, Beijing 100049, China; Tencent Inc., Beijing 100080, China

Many NoSQL (Not Only SQL) databases have been proposed to store and query huge amounts of data. Some of them, such as BigTable, PNUTS, and HBase, can be modeled as distributed ordered tables (DOTs). Many additional indexing techniques have been presented to support queries on non-key columns of DOTs. However, there has been no comprehensive analysis or comparison of these techniques, which makes it difficult for users to select or propose a proper indexing technique for a given workload. This paper proposes a taxonomy based on six indexing issues to classify indexing techniques on DOTs and provides a comprehensive review of the state-of-the-art techniques. Based on the taxonomy, we propose a performance model named QSModel to estimate the query time and storage cost of these techniques, and we run experiments on a practical workload from Tencent to evaluate this model. The results show that the maximum error rates of the query time and storage cost are 24.2% and 9.8%, respectively. Furthermore, we propose IndexComparator, an open-source project that implements representative indexing techniques. Therefore, users can select the best-fit indexing technique based on both theoretical analysis and practical experiments.

Keywords: database; Not Only SQL (NoSQL); range query; indexing
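
A DOT only supports efficient range scans on its row key, so a query on a non-key column needs extra structure. One commonly discussed approach is a separate index table whose row key is the pair (column value, base row key). The Python sketch below models that idea on a single process: `OrderedTable` is a toy stand-in for a DOT (no partitioning, replication, or consistency handling), and all class and method names are hypothetical rather than taken from the survey or IndexComparator.

```python
import bisect


class OrderedTable:
    """Toy single-node stand-in for a DOT: rows kept sorted by row key."""

    def __init__(self):
        self.keys, self.rows = [], {}

    def put(self, key, row):
        if key not in self.rows:
            bisect.insort(self.keys, key)
        self.rows[key] = row

    def delete(self, key):
        if key in self.rows:
            del self.rows[key]
            self.keys.pop(bisect.bisect_left(self.keys, key))

    def scan(self, start, stop):
        """Yield (key, row) for start <= key < stop, in key order."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        for k in self.keys[lo:hi]:
            yield k, self.rows[k]


class IndexedTable:
    """Base table plus an index table keyed by (column value, row key)."""

    def __init__(self, column):
        self.column = column
        self.base = OrderedTable()
        self.index = OrderedTable()

    def put(self, key, row):
        old = self.base.rows.get(key)
        if old is not None:  # keep the index consistent on updates
            self.index.delete((old[self.column], key))
        self.base.put(key, row)
        self.index.put((row[self.column], key), None)

    def range_query(self, lo, hi):
        """Rows whose indexed column value v satisfies lo <= v < hi."""
        # (lo,) sorts before every (lo, key); (hi,) sorts before every (hi, key).
        return [(k, self.base.rows[k])
                for (_value, k), _unused in self.index.scan((lo,), (hi,))]


users = IndexedTable("age")
for key, age in [("u1", 30), ("u2", 25), ("u3", 40), ("u1", 35)]:
    users.put(key, {"age": age})
print(users.range_query(20, 36))
```

The range query becomes one ordered scan of the index table followed by point lookups in the base table; the cost of that second step and of keeping the index in sync on every write is the kind of query-time versus storage and maintenance tradeoff a model such as QSModel is meant to estimate.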
