ISBN (Print): 9780769549798
The growing speed gap between CPU and memory makes I/O the main bottleneck of many industrial applications. Some applications need to perform I/O operations on very large volumes of data frequently, which seriously harms performance. This work is motivated by geophysical applications used for oil and gas exploration. These applications process terabyte-scale datasets in HPC facilities; the datasets represent subsurface models and field-recorded data. In general terms, these applications read huge amounts of data as input and write huge amounts as intermediate/final results, where the underlying algorithms implement seismic imaging techniques. Traditional sequential I/O, even when coupled with advanced storage systems, cannot complete all I/O operations for such large volumes of data in an acceptable time. Parallel I/O is the general strategy for solving such problems. However, because of the dynamic nature of many of these applications, each parallel process does not know the amount of data it needs to write until its computation is done, and it also cannot identify the position in the file to write to. In order to write correctly and efficiently, communication and synchronization are required among all processes to fully exploit the parallel I/O paradigm. To tackle these issues, we use a dynamic load balancing framework that is general enough for most of these applications. To reduce the expensive synchronization and communication overhead, we introduce an I/O node that only handles I/O requests, and we let compute nodes perform I/O operations in parallel. Using both POSIX I/O and memory-mapping interfaces, the experiments indicate that our approach is scalable. For instance, with 16 processes, the bandwidth of parallel reading can reach the theoretical peak performance (2.5 GB/s) of the storage infrastructure. Also, parallel writing can be up to 4.68x (speedup, POSIX I/O) and 7.23x (speedup, memory-mapping) more efficient than the serial I/O implementation. Since, mo...
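Below is a minimal sketch (not the paper's framework) of the offset coordination described above, assuming MPI for the inter-process communication: each process learns its output size only after its computation finishes, an exclusive prefix sum over those sizes yields disjoint file offsets, and every process then writes its own region concurrently through POSIX pwrite. The file name and placeholder sizes are purely illustrative.

#include <mpi.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The output size is known only after the (dynamic) computation. */
    long my_bytes = 1024L * (rank + 1);              /* placeholder result size */
    char *buf = malloc(my_bytes);
    memset(buf, 'A' + rank, my_bytes);

    /* An exclusive prefix sum of the sizes gives every process a disjoint
       file offset, so all processes can write concurrently. */
    long my_offset = 0;
    MPI_Exscan(&my_bytes, &my_offset, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) my_offset = 0;                    /* Exscan leaves rank 0 undefined */

    int fd = open("result.bin", O_WRONLY | O_CREAT, 0644);
    (void)pwrite(fd, buf, (size_t)my_bytes, (off_t)my_offset);  /* positional write, no shared file pointer */
    close(fd);

    free(buf);
    MPI_Finalize();
    return 0;
}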
ISBN (Print): 9781450320795
Although fences are designed for low-overhead concurrency coordination, they can be expensive in current machines. If fences were largely free, faster fine-grained concurrent algorithms could be devised, and compilers could guarantee Sequential Consistency (SC) at little cost. In this paper, we present WeeFence (or WFence for short), a fence that is very cheap because it allows post-fence accesses to skip it. Such accesses can typically complete and retire before the pre-fence writes have drained from the write buffer. Only when an incorrect reordering of accesses is about to happen, does the hardware stall to prevent it. In the paper, we present the WFence design for TSO, and compare it to a conventional fence with speculation for 8-processor multicore simulations. We run parallel kernels that contain explicit fences and parallel applications that do not. For the kernels, WFence eliminates nearly all of the fence stall, reducing the kernels' execution time by an average of 11%. For the applications, a conservative compiler algorithm places fences in the code to guarantee SC. In this case, on average, WFences reduce the resulting fence overhead from 38% of the applications' execution time to 2% (in a centralized WFence design), or from 36% to 5% (in a distributed WFence design). Copyright 2013 ACM.
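For context on why such explicit fences appear in fine-grained kernels, here is a generic C11 sketch (unrelated to the paper's hardware design): on TSO, a later load may bypass an earlier store still sitting in the write buffer, so Dekker-style mutual exclusion needs the seq_cst fences below; these are exactly the ordering points WeeFence aims to make nearly free.

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_int flag0, flag1;      /* intent flags, zero-initialized          */
int in_critical;              /* counts threads that entered the section */

void *thread0(void *arg) {
    atomic_store_explicit(&flag0, 1, memory_order_relaxed);      /* pre-fence write     */
    atomic_thread_fence(memory_order_seq_cst);                   /* the expensive fence */
    if (atomic_load_explicit(&flag1, memory_order_relaxed) == 0) /* post-fence read     */
        in_critical++;
    return NULL;
}

void *thread1(void *arg) {
    atomic_store_explicit(&flag1, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&flag0, memory_order_relaxed) == 0)
        in_critical++;
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    pthread_create(&t0, NULL, thread0, NULL);
    pthread_create(&t1, NULL, thread1, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("threads in critical section: %d\n", in_critical);    /* at most 1 with the fences */
    return 0;
}

Without the two fences, both threads could read 0 and enter together; a conventional fence prevents this by stalling unconditionally, whereas WeeFence stalls only when the incorrect reordering is actually about to happen.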
Since the data rate in wireless communication systems has increased exponentially over the last decades, serious effort must be devoted to developing mobile terminals that provide enough processing capability. For ins...
ISBN (Print): 9780769549798
PDSLin is a general-purpose algebraic parallel hybrid (direct/iterative) linear solver based on the Schur complement method. The most challenging step of the solver is the computation of a preconditioner based on the global Schur complement. Efficient parallel computation of the preconditioner gives rise to partitioning problems with sophisticated constraints and objectives. In this paper, we identify two such problems and propose hypergraph partitioning methods to address them. The first problem is to balance the workloads associated with the different subdomains when computing the preconditioner. We first formulate an objective function and a set of constraints to model the preconditioner computation time. Then, to address these complex constraints, we propose a recursive hypergraph bisection method. The second problem is to improve data locality during the parallel solution of a sparse triangular system with multiple sparse right-hand sides. We carefully analyze the objective function and show that it can be well approximated by a standard hypergraph partitioning method. Moreover, an ordering compatible with a postordering of the subdomain elimination tree is shown to be very effective in preserving locality. To evaluate the two proposed methods in practice, we present experimental results using linear systems arising from applications of interest to us. First, we show that, in comparison to a commonly used nested graph dissection method, the proposed recursive hypergraph partitioning method reduces the preconditioner construction time, especially when the number of subdomains is moderate. This is the desired result, since PDSLin is based on a two-level parallelization that keeps the number of subdomains small by assigning multiple processors to each subdomain. We also show that our second proposed hypergraph method improves data locality during the sparse triangular solution and reduces the solution time. Moreover, we show that the partitioning time can be...
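For background (standard Schur complement notation, not reproduced from the paper): with the unknowns ordered so that subdomain interiors come first and the interface unknowns last, the system and the global Schur complement on which the preconditioner is built are

% A_{11} gathers the (block-diagonal) subdomain interiors, A_{22} the interface unknowns.
\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \end{pmatrix},
\qquad
S = A_{22} - A_{21} A_{11}^{-1} A_{12}.

Balancing the per-subdomain cost of forming the A_{21} A_{11}^{-1} A_{12} contribution is essentially the load-balancing problem that the recursive hypergraph bisection targets.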
Conflict detection and resolution are among the most fundamental issues in transactional memory systems. Hardware transactional memory (HTM) systems such as AMD's Advanced Synchronization Facility (ASF) employ inh...
ISBN (Print): 9781467360760
A fast algorithm for calculating the LRCS of a complex target is presented, and the Compute Unified Device Architecture (CUDA) is used to accelerate the color-flagged Graphical Electromagnetic Computing (GRECO) method. Based on the five-parameter bidirectional reflectance distribution function (BRDF) model, one target's LRCS can be divided into millions of per-pixel computations, which are processed on a great many threads running in parallel. The comparison of CPU calculation and CUDA parallel calculation for one aircraft shows that CUDA parallel calculation is several times faster. With the help of CUDA and the GRECO method, electromagnetic simulation of LRCS can be performed much more efficiently.
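A serial sketch of the per-pixel decomposition described above (hypothetical names and a stand-in BRDF term, not the paper's code): each screen pixel of the rendered target contributes independently to the accumulated LRCS, which is what lets the CUDA version assign one thread per pixel and replace the loop below with a parallel reduction.

#include <math.h>
#include <stdio.h>
#include <stddef.h>

typedef struct {
    double cos_incident;   /* incidence term recovered from the pixel's normal */
    double params[5];      /* the five BRDF parameters attached to this pixel  */
} Pixel;

/* Placeholder for the five-parameter BRDF evaluation used by the paper;
   only a toy diffuse term is shown here. */
static double brdf_contribution(const Pixel *px) {
    return px->params[0] * fabs(px->cos_incident);
}

static double laser_rcs(const Pixel *pixels, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)     /* one CUDA thread per iteration in the GPU version */
        sum += brdf_contribution(&pixels[i]);
    return sum;
}

int main(void) {
    Pixel img[2] = { { 0.8, {0.3, 0, 0, 0, 0} }, { 0.5, {0.6, 0, 0, 0, 0} } };
    printf("toy LRCS accumulation: %f\n", laser_rcs(img, 2));
    return 0;
}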
ISBN (Print): 9781467355919
Searching a Peer-to-Peer (P2P) network without using a central index has been widely investigated but proved to be very difficult. Various strategies have been proposed; however, no practical solution to date also addresses privacy concerns. By clustering peers which have similar interests, a semantic overlay provides a method for achieving scalable search. Traditionally, in order to find similar peers, a peer is required to fully expose its preferences for items or content, thereby disclosing this private information. However, in a hostile environment, such as a P2P system, a peer cannot know the true identity or intentions of fellow peers. In this paper, we propose two protocols for building a semantic overlay in a privacy-preserving manner by modifying existing solutions to the Private Set Intersection (PSI) problem. Peers in our overlay compute their similarity to other peers in the encrypted domain, allowing them to find similar peers. Using homomorphic encryption, peers can carry out computations on encrypted values without needing to decrypt them first. We propose two protocols, one based on the inner product of vectors, the other on multivariate polynomial evaluation, which are able to compute a similarity value between two peers. Both protocols are implemented on top of an existing P2P platform and are designed for actual deployment. Using a supercomputer and a dataset extracted from a real-world instance of a semantic overlay, we emulate our protocols in a network consisting of a thousand peers. Finally, we show the actual computational and bandwidth usage of the protocols as recorded during those experiments.
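A toy sketch of the additively homomorphic primitive behind the inner-product protocol (textbook Paillier with tiny, insecure parameters; all names and numbers are illustrative, not the paper's implementation): peer A encrypts its preference vector, peer B combines the ciphertexts using only public operations, and A decrypts the result to obtain the inner product, i.e. the similarity value, without B ever seeing A's vector.

#include <stdio.h>
#include <stdint.h>

typedef unsigned __int128 u128;                       /* assumes a GCC/Clang __int128 type */

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m) { return (u128)a * b % m; }

static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m) {
    uint64_t r = 1;
    for (b %= m; e; e >>= 1) {
        if (e & 1) r = mulmod(r, b, m);
        b = mulmod(b, b, m);
    }
    return r;
}

/* modular inverse via the extended Euclidean algorithm */
static int64_t invmod(int64_t a, int64_t m) {
    int64_t t = 0, newt = 1, r = m, newr = a % m;
    while (newr) {
        int64_t q = r / newr, tmp;
        tmp = t - q * newt; t = newt; newt = tmp;
        tmp = r - q * newr; r = newr; newr = tmp;
    }
    return t < 0 ? t + m : t;
}

/* Toy key: p = 61, q = 53 -- far too small for privacy, enough to show the math. */
static const uint64_t n      = 61 * 53;               /* 3233         */
static const uint64_t n2     = 3233ULL * 3233;        /* n^2          */
static const uint64_t g      = 3234;                  /* n + 1        */
static const uint64_t lambda = 780;                   /* lcm(60, 52)  */

static uint64_t L(uint64_t x) { return (x - 1) / n; }

static uint64_t enc(uint64_t m, uint64_t r) {         /* r must be coprime to n */
    return mulmod(powmod(g, m, n2), powmod(r, n, n2), n2);
}

static uint64_t dec(uint64_t c) {
    uint64_t mu = (uint64_t)invmod((int64_t)L(powmod(g, lambda, n2)), (int64_t)n);
    return mulmod(L(powmod(c, lambda, n2)), mu, n);
}

int main(void) {
    uint64_t a[3] = {3, 1, 4}, b[3] = {2, 7, 1};      /* the two peers' rating vectors */
    uint64_t r[3] = {17, 23, 29};                     /* peer A's random nonces        */

    /* Peer B sees only ciphertexts: E(a_i)^{b_i} multiplied together equals
       E(sum a_i * b_i) by the additive homomorphism. */
    uint64_t acc = enc(0, 31);
    for (int i = 0; i < 3; i++)
        acc = mulmod(acc, powmod(enc(a[i], r[i]), b[i], n2), n2);

    printf("decrypted inner product: %llu (expected 17)\n",
           (unsigned long long)dec(acc));
    return 0;
}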