ISBN (Digital): 9781728197104
ISBN (Print): 9781728197111
CNN is a popular deep learning structure that can provide intelligent processing in IoT applications. Instead of deploying resource-hungry CNN inference workloads on the cloud, it is promising to utilize local IoT devices for in-situ processing. Since a single IoT device has only limited resources available, distributing the workload over multiple local devices becomes a potential solution, especially for high-accuracy and time-sensitive tasks. However, it is non-trivial to distribute the inference of existing CNN models efficiently, as they are inherently tightly-coupled structures. In this paper, we propose a distributed in-situ CNN inference system for IoT applications built on the loosely-coupled CNN structure (LCS), synchronization-oriented partitioning (SOP), and decentralized asynchronous communication (DAC). LCS is based on two novel design ideas, the homogeneous group and the intermittent shuffle. Experiments on ImageNet classification show that LCS achieves the highest accuracy among competing structures under a given computation budget. SOP and DAC aim to convert the loosely-coupled nature of LCS into practical performance improvement: SOP partitions LCS with fewer synchronization points, and DAC reduces communication overhead by overlapping communications. When the number of IoT devices increases from 1 to 4, our system accelerates inference by up to 3.85× and reduces the memory footprint on each device by 70%, outperforming other approaches.
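The homogeneous-group and intermittent-shuffle ideas above can be sketched minimally: feature channels are partitioned into independent per-device groups, and an occasional shuffle interleaves channels across groups so information can still mix. All function names and the round-robin shuffle policy here are illustrative assumptions, not the paper's actual design.

```python
# Sketch (assumed policy): channels split into equal "homogeneous groups",
# one per device; a shuffle step interleaves them round-robin across groups.

def split_into_groups(channels, num_groups):
    """Evenly partition a channel list into homogeneous groups."""
    size = len(channels) // num_groups
    return [channels[i * size:(i + 1) * size] for i in range(num_groups)]

def intermittent_shuffle(groups):
    """One shuffle step: round-robin interleave channels across groups."""
    flat = [c for g in groups for c in g]
    num_groups = len(groups)
    return [flat[i::num_groups] for i in range(num_groups)]

channels = list(range(8))                  # 8 feature channels
groups = split_into_groups(channels, 2)    # [[0, 1, 2, 3], [4, 5, 6, 7]]
mixed = intermittent_shuffle(groups)       # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

Between shuffle points, each group can run on its own device without synchronization, which is what SOP exploits when placing partition boundaries.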
Monitoring the running state of high-performance computing (HPC) systems is an important task. Small-scale cluster systems exist separately as one whole system or as part of a large-scale HPC system. Such cluster syst...
We adopted K-means clustering to efficiently partition the subcarriers and reduce the complexity of PS-QAM in an FBMC/OQAM system using a KK receiver. A net data rate of 100 Gb/s is achieved after 125 km transmission. ...
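The subcarrier-partitioning step described above can be illustrated with a plain one-dimensional K-means over a per-subcarrier quality metric (SNR is assumed here; the numbers and the two-cluster choice are made up for demonstration and are not from the paper).

```python
# Toy 1-D K-means: group subcarriers by SNR so each cluster can be
# assigned one modulation setting instead of per-subcarrier optimization.

def kmeans_1d(values, k, iters=20):
    # spread the initial centers across the sorted value range
    centers = sorted(values)[::max(1, len(values) // k)][:k]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            idx = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

snr = [22.1, 21.8, 14.9, 15.3, 22.5, 15.0]   # assumed per-subcarrier SNRs (dB)
centers, clusters = kmeans_1d(snr, k=2)      # low-SNR vs high-SNR groups
```

Each resulting cluster would then share one PS-QAM configuration, which is the complexity reduction the abstract refers to.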
In the big data era, users can obtain massive amounts of information from the Internet, but its value density is very low. To help users find the information they need more quickly, this paper presents the mechanism of div...
ISBN (Print): 9783981926323
Byte-addressable, non-volatile memory (NVRAM) combines the benefits of DRAM and flash memory. Its slower speed compared to DRAM, however, makes it hard to replace DRAM entirely with NVRAM. Hybrid NVRAM systems that equip both DRAM and NVRAM on the memory bus are a better solution: frequently accessed, hot pages can be stored in DRAM while other cold pages reside in NVRAM. This way, the system gets the benefits of both: high performance from DRAM, and lower power consumption and better cost/performance from NVRAM. Realizing an efficient hybrid NVRAM system requires careful page migration and accurate data temperature measurement. Existing solutions, however, often cause invalid migrations due to inaccurate data temperature accounting, because hot and cold pages are identified separately in the DRAM and NVRAM regions. Based on this observation, we propose UIMigrate, an adaptive data migration approach for hybrid NVRAM systems. The key idea is to consider data temperature across the whole DRAM-NVRAM space when determining whether a page should be migrated between DRAM and NVRAM. In addition, UIMigrate adapts to workload changes by dynamically adjusting its migration decisions. Our experiments using SPEC 2006 show that UIMigrate reduces the number of migrations and improves performance by up to 90.4% compared to existing state-of-the-art approaches.
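The unified-temperature idea can be sketched as a toy model: one access-count table covers both regions, and a page is promoted to DRAM only if it is hotter than the coldest page currently in DRAM. The class, thresholds, and promotion rule are illustrative assumptions, not UIMigrate's actual mechanism.

```python
# Toy hybrid-memory model: a single heat table spans DRAM and NVRAM,
# so migration decisions compare temperatures across the whole space.

class HybridMemory:
    def __init__(self, dram_capacity):
        self.dram_capacity = dram_capacity
        self.dram, self.nvram = set(), set()
        self.heat = {}                       # one counter table for both regions

    def access(self, page):
        self.heat[page] = self.heat.get(page, 0) + 1
        if page not in self.dram and page not in self.nvram:
            self.nvram.add(page)             # new pages start in NVRAM
        self.maybe_migrate(page)

    def maybe_migrate(self, page):
        if page in self.dram:
            return
        if len(self.dram) < self.dram_capacity:
            self.nvram.discard(page)
            self.dram.add(page)
            return
        coldest = min(self.dram, key=lambda p: self.heat[p])
        # promote only if hotter than the coldest DRAM-resident page
        if self.heat[page] > self.heat[coldest]:
            self.dram.remove(coldest)
            self.nvram.add(coldest)
            self.nvram.discard(page)
            self.dram.add(page)
```

Because both regions share one heat table, a page is never promoted merely for being the hottest within NVRAM, which is the kind of invalid migration the abstract criticizes.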
GPGPU (General-Purpose Computing on Graphics Processing Units) has been widely applied to high-performance computing. However, GPU architectures and programming models differ from those of traditional CPUs, so it is challenging to develop efficient GPU applications. This paper focuses on key techniques of programming models and compiler optimization for many-core GPUs, and addresses a number of key theoretical and technical issues. It proposes a many-threaded programming model, ab-Stream, which abstracts away architectural differences and is easy to parallelize, program, extend, and tune. In addition, it proposes memory optimization and data transfer transformation according to data classification. First, it proposes data layout pruning based on memory classification, and then proposes TaT (Transfer after Transformation) for transferring strided data between CPU and GPU. Experimental results demonstrate that the proposed techniques significantly improve performance for GPGPU applications.
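The transfer-after-transformation idea can be illustrated generically: instead of issuing many small copies of strided elements, pack them into a contiguous buffer on the host and transfer that buffer once. The packing helpers below are a hedged sketch of this general technique, not the paper's implementation.

```python
# Generic strided gather/scatter: pack strided elements contiguously
# before a bulk transfer, then scatter them back on the other side.

def pack_strided(buf, offset, stride, count):
    """Gather `count` elements starting at `offset`, `stride` apart."""
    return [buf[offset + i * stride] for i in range(count)]

def unpack_strided(packed, buf, offset, stride):
    """Scatter a contiguous buffer back into strided positions."""
    for i, v in enumerate(packed):
        buf[offset + i * stride] = v

host = list(range(12))
packed = pack_strided(host, offset=1, stride=3, count=4)   # [1, 4, 7, 10]
# `packed` would now be sent to the device in a single contiguous transfer.
```

A single contiguous copy amortizes per-transfer overhead that would otherwise be paid once per strided element.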
ISBN (Digital): 9781728199986
ISBN (Print): 9781728199993
We present OMPRACER, a static tool that uses flow-sensitive, interprocedural analysis to detect data races in OpenMP programs. OMPRACER is fast, scalable, has high code coverage, and supports the most common OpenMP features by combining state-of-the-art pointer analysis, novel value-flow analysis, happens-before tracking, and generalized modelling of OpenMP APIs. Unlike the dynamic tools that currently dominate data race detection, OMPRACER achieves almost 100% code coverage using static analysis, detecting a broader category of races without running the program or relying on specific inputs or runtime behaviour. OMPRACER has precision competitive with dynamic tools like Archer and ROMP, passing 105/116 cases in DataRaceBench with a total accuracy of 91%. OMPRACER has been used to analyze several Exascale Computing Project proxy applications containing over 2 million lines of code in under 10 minutes, and has revealed previously unknown races in an ECP proxy app and a production simulation for COVID-19.
Reverse engineering of binary executables is a critical problem in the computer security domain. On the one hand, malicious parties may recover interpretable source code from software products to gain commercial advantages. On the other hand, binary decompilation can be leveraged for code vulnerability analysis and malware detection. However, efficient binary decompilation is challenging. Conventional decompilers have the following major limitations: (i) they are only applicable to a specific source-target language pair, incurring undesired development cost for new language tasks; (ii) their output high-level code cannot effectively preserve the correct functionality of the input binary; (iii) their output program does not capture the semantics of the input, and the reversed program is hard to interpret. To address these problems, we propose Coda(1), the first end-to-end neural-based framework for code decompilation. Coda decomposes the decompilation task into two key phases. First, Coda employs an instruction type-aware encoder and a tree decoder to generate an abstract syntax tree (AST), with attention feeding, during the code sketch generation stage. Second, Coda updates the code sketch using an iterative error correction machine guided by an ensembled neural error predictor. By finding a good approximate candidate and then fixing it towards correctness, Coda achieves superior performance compared to baseline approaches. We assess Coda's performance with extensive experiments on various benchmarks. Evaluation results show that Coda achieves an average of 82% program recovery accuracy on unseen binary samples, whereas state-of-the-art decompilers yield 0% accuracy. Furthermore, Coda outperforms a sequence-to-sequence model with attention by a margin of 70% program accuracy. Our work reveals the vulnerability of binary executables and poses a new threat to the protection of Intellectual Property (IP) in software development.
This paper proposes and discusses distributed processor load balancing algorithms based on the nature-inspired approach of multi-objective Extremal Optimization. Extremal Optimization is used for defining task mi...
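A minimal Extremal Optimization sketch in the spirit of the abstract: repeatedly select the worst-loaded processor (the "worst component" in EO terms) and migrate one of its tasks to another processor. The task sizes, single load objective, and uniform-random migration target are assumptions for illustration; the paper's multi-objective formulation is more elaborate.

```python
import random

# Toy EO-style load balancer: always mutate the worst-loaded processor
# by migrating one of its tasks elsewhere.

def eo_balance(task_sizes, num_procs, steps=200, seed=0):
    rng = random.Random(seed)
    assign = [i % num_procs for i in range(len(task_sizes))]  # initial placement

    def loads():
        l = [0] * num_procs
        for task, proc in enumerate(assign):
            l[proc] += task_sizes[task]
        return l

    for _ in range(steps):
        l = loads()
        worst = max(range(num_procs), key=lambda p: l[p])     # worst component
        movable = [t for t, p in enumerate(assign) if p == worst]
        if not movable:
            continue
        task = rng.choice(movable)                            # mutate it
        assign[task] = rng.choice([p for p in range(num_procs) if p != worst])
    return assign, loads()
```

Unlike greedy schemes, EO accepts non-improving moves, which helps the search escape locally balanced but globally poor assignments.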