ISBN (print): 9781450312448
For high-throughput applications, efficient parallel architectures must avoid access collisions, i.e., concurrent read/write accesses to the same memory bank. This consideration applies, for example, to the two main classes of turbo-like codes: Low-Density Parity-Check (LDPC) codes and Turbo Codes. These error-correcting codes, which scramble data according to an interleaving law, are used in most recent communication standards and storage systems, such as wireless access, digital video broadcasting, and magnetic storage in hard disk drives. To optimize the architectural cost and reduce the control complexity of such integrated circuits, designers usually place standard interconnection networks with low-complexity topologies between the processing elements and the memory banks. However, the design constraints, i.e., the interleaving law, the parallelism, and the interconnection network, often prevent mapping the data into the memory banks without any conflict. In this paper we propose a methodology that always finds a collision-free memory mapping for a given set of design constraints. The approach uses additional registers whenever the design constraints forbid conflict-free use of the memory banks. Our approach is compared to state-of-the-art methods, and its interest is shown through the design of parallel interleavers for industrial applications: Multi-Band Orthogonal Frequency-Division Multiplexing Ultra-WideBand (MB-OFDM UWB) and non-binary LDPC decoders. Copyright 2012 ACM.
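The conflict-detection step that such a mapping methodology must perform can be sketched as follows. This is a minimal illustration only, not the authors' algorithm: the toy interleaver, the bank maps, and the function name are all hypothetical. With parallelism P, the P data items consumed at each time step must lie in P distinct memory banks; any step where they do not is a collision that would force either a different mapping or an extra register.

```python
def conflict_steps(pi, P, bank):
    """Return the time steps where >= 2 of the P parallel accesses
    hit the same memory bank.
    pi   : interleaving law, given as a permutation (list of data indices)
    P    : parallelism (number of processing elements)
    bank : mapping from data index to memory bank
    """
    conflicts = []
    for t in range(len(pi) // P):
        accessed = [pi[t * P + p] for p in range(P)]  # data read in parallel at step t
        banks = [bank[d] for d in accessed]
        if len(set(banks)) < P:  # two accesses collide on one bank
            conflicts.append(t)
    return conflicts

# Toy example: 8 data items, P = 2, and a toy interleaver.
pi = [0, 4, 1, 5, 2, 6, 3, 7]

# A naive modulo banking collides at every step (both accessed
# items always share the same parity):
print(conflict_steps(pi, 2, [d % 2 for d in range(8)]))   # [0, 1, 2, 3]

# A hand-crafted bank map satisfying this interleaver is conflict-free:
print(conflict_steps(pi, 2, [0, 1, 0, 1, 1, 0, 1, 0]))    # []
```

The second bank map shows why the problem is non-trivial: the mapping must be chosen jointly with the interleaving law, and when no such map exists for the given network, the paper's methodology falls back on registers.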
ISBN (print): 9781509060238
Sub-round implementations of AES have been explored as an area- and energy-efficient solution to encrypt data in resource-constrained applications such as the Internet of Things. Symmetry in AES operations across bytes and words allows the datapath to be scaled down to 8 bits, resulting in very compact designs. However, such designs incur an area penalty to store intermediate results, or an energy penalty to shift data through registers without performing useful computation. We propose a smart clocking scheme and rename registers to minimize data movement and clock loading, and also avoid storing a duplicate copy of the system state. In comparison to the most efficient 8-bit implementation from the literature, we save 45% energy per encryption and reduce clock energy by 70% at a reasonable area cost.
ISBN (print): 9781450344937
In the many-core era, the performance of MPI collectives depends increasingly on the intra-node communication component. However, the communication algorithms are generally inherited from the inter-node versions and ignore cache complexity. We propose cache-oblivious algorithms for MPI all-to-all operations, in which data blocks are copied into the receive buffers in Morton order to exploit data locality. Experimental results on different many-core architectures show that our cache-oblivious implementations significantly outperform both naive implementations based on a shared heap and highly optimized MPI libraries.
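The core idea can be sketched as follows. This is a hedged illustration under my own assumptions, not the paper's implementation: the function names are hypothetical, and real message blocks are replaced by labels. In an intra-node all-to-all among P ranks, block j of rank i must land in slot i of rank j; visiting the (i, j) pairs along the Morton (Z-order) curve instead of row-major keeps nearby source and destination blocks cache-resident.

```python
def morton_decode(z):
    """De-interleave the bits of z into coordinates (i, j):
    even bit positions of z form j, odd positions form i."""
    i = j = 0
    bit = 0
    while z:
        j |= (z & 1) << bit         # even bits -> column j
        i |= ((z >> 1) & 1) << bit  # odd bits  -> row i
        z >>= 2
        bit += 1
    return i, j

def alltoall_morton(send, P):
    """send[i][j] is the block rank i sends to rank j.
    Copies all P*P blocks by walking the copy grid in Morton order."""
    recv = [[None] * P for _ in range(P)]
    for z in range(P * P):          # Z-curve traversal of the P x P grid
        i, j = morton_decode(z)
        recv[j][i] = send[i][j]     # rank j receives rank i's block in slot i
    return recv

P = 4
send = [[f"blk{i}->{j}" for j in range(P)] for i in range(P)]
recv = alltoall_morton(send, P)
print(recv[2][3])  # the block rank 3 sent to rank 2: 'blk3->2'
```

Since the Morton code is a bijection between [0, P*P) and the (i, j) grid (for P a power of two), every block is copied exactly once; only the traversal order, and hence the cache behavior, differs from the naive double loop.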
Communication and synchronization are the two main latency issues in computing the FFT on parallel architectures. Both latencies have to be either hidden or tolerated to achieve high performance. One approach is multithreading. Another approach to tolerating latency is to map data efficiently onto the processors' local memories and exploit data locality. Indirect swap networks, an idea proposed for VLSI circuits, can be used efficiently to compute the butterfly operations in the FFT. Data mapping in the swap-network topology halves the communication overhead at each iteration. The Cell Broadband Engine (Cell/B.E.) processor is a heterogeneous multicore processor for stream-data applications and high-performance computing. Its eight SIMD processing elements, the synergistic processor elements (SPEs), provide multi-fold parallelism. In this paper, we investigate an improved Cooley-Tukey FFT algorithm based on an indirect swap network and design the parallel algorithm taking into consideration all the features of the Cell/B.E. architecture. The performance results show that, at the processor level, the new algorithm on the Cell/B.E. is 3.7 times faster than the cluster for a 4K input size and 6.4 times faster for a 16K input size.
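The butterfly structure being distributed across the SPEs is that of the standard radix-2 Cooley-Tukey FFT, sketched below. This is a plain textbook reference version under my own assumptions, not the paper's swap-network variant: it only shows that at each stage every element is paired with a partner at a distance that doubles per iteration, which is the exchange pattern a swap-network mapping reduces.

```python
import cmath

def bit_reverse(x):
    """Permute x into bit-reversed index order (n must be a power of two)."""
    n = len(x)
    y = list(x)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:       # add 1 to j in reversed-bit arithmetic
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            y[i], y[j] = y[j], y[i]
    return y

def fft_iterative(x):
    """In-place radix-2 Cooley-Tukey FFT of a power-of-two-length sequence."""
    n = len(x)
    a = bit_reverse(x)
    m = 2
    while m <= n:                      # stages: partner distance m//2 doubles each pass
        w_m = cmath.exp(-2j * cmath.pi / m)
        for k in range(0, n, m):       # each length-m butterfly group
            w = 1
            for t in range(m // 2):
                u = a[k + t]
                v = a[k + t + m // 2] * w   # partner element, twiddled
                a[k + t] = u + v            # butterfly: sum ...
                a[k + t + m // 2] = u - v   # ... and difference
                w *= w_m
        m *= 2
    return a

print(fft_iterative([1, 0, 0, 0]))  # impulse -> flat spectrum (all ones)
```

In each stage the two halves of a butterfly group must exchange data; mapping those partner pairs onto a swap network is what lets the paper cut the per-iteration communication in half.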
Traditionally, Deep Learning (DL) frameworks like Caffe, TensorFlow, and Cognitive Toolkit exploited GPUs to accelerate the training process. This has been primarily achieved by aggressive improvements in parallel har...