The proceedings contain 39 papers. The topics discussed include: fast greedy algorithms in MapReduce and streaming; reduced hardware transactions: a new approach to hybrid transactional memory; recursive design of hardware priority queues; drop the anchor: lightweight memory management for non-blocking data structures; scalable statistics counters; storage and search in dynamic peer-to-peer networks; expected sum and maximum of displacement of random sensors for coverage of a domain; on dynamics in selfish network creation; brief announcement: truly parallel Burrows-Wheeler compression and decompression; brief announcement: locality in wireless scheduling; brief announcement: universally truthful secondary spectrum auctions; and brief announcement: online batch scheduling for flow objectives.
ISBN (Print): 9781479941162
The increasing use of runtime-compiled applications provides an opportunity for coarse-grained reconfigurable architecture (CGRA) accelerators to be used in a user-transparent way. The challenge is to provide efficient runtime translation for CGRAs. Despite the apparent difficulties stemming from the heterogeneous nature of CGRAs, this paper demonstrates that it is possible to speed up runtime-compiled applications using CGRAs in a user-transparent way. In particular, the paper presents a runtime translation framework for CGRA accelerators, called RBTVM, based on the LLVM Just-In-Time (JIT) compiler, together with two optimizations for the framework. Experimental results show that RBTVM improves the performance of runtime-compiled applications by 1.44x on average over the baseline JIT compiler alone, which does not take advantage of the accelerator, demonstrating the efficacy of the proposed approach.
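The abstract does not detail RBTVM's internals; the sketch below is only a hedged illustration of the dispatch idea, and the helper names jit_compile, cgra_map_loop, and cgra_run are hypothetical, not the paper's API. The general pattern is to attempt a CGRA mapping for a JIT-compiled function at runtime and fall back to the plain JIT path when the mapping fails.

```python
def runtime_translate(fn, jit_compile, cgra_map_loop, cgra_run):
    """Hypothetical sketch: route a runtime-compiled function to a CGRA if possible."""
    compiled = jit_compile(fn)        # baseline path: JIT-compiled CPU code
    config = cgra_map_loop(fn)        # attempt to map the hot loop onto the CGRA
    if config is None:
        return compiled               # mapping failed: stay on the JIT-only path

    def dispatch(*args):
        return cgra_run(config, *args)  # accelerated path, transparent to the caller

    return dispatch
```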
ISBN (Print): 9781479941162
The Peripheral Component Interconnect Express (PCIe) is the predominant interconnect enabling the CPU to communicate with attached input/output and storage devices. Given its high performance and its ability to connect different address domains via so-called Non-Transparent Bridging (NTB) technology, it is becoming an alternative or an addition to traditional interconnects. PCIe enables devices to communicate in a peer-to-peer manner, allowing for new implementation possibilities in tomorrow's high-performance systems. Components attached to the same computer rack are connected by means of PCIe, and the racks themselves by traditional network technologies. This leads to a heterogeneous landscape of compute nodes and high-performance interconnects. The Socket Wheeled Intelligent Fabric Transport (SWIFT) takes up the challenge of programming these systems. The presented implementation is highly portable thanks to a hardware abstraction layer that allows the implemented concepts to be brought to new interconnects with minimal effort. It is evaluated on a test system comprising different compute nodes equipped with coprocessors that take part in a PCIe non-transparent bridging architecture. Besides low-level benchmarks investigating the principal performance characteristics of the communication layer, MPI benchmark results are presented that illustrate how scientific applications may be ported to heterogeneous environments.
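A minimal sketch of the hardware-abstraction-layer idea the portability claim rests on; the class and method names (Transport, PcieNtbTransport, TcpTransport) are illustrative, not SWIFT's actual API. One transport interface is programmed against, and interconnect back-ends are selected at runtime.

```python
from abc import ABC, abstractmethod

class Transport(ABC):
    """Interconnect-agnostic interface the upper communication layers program against."""
    @abstractmethod
    def send(self, peer: int, payload: bytes) -> None: ...
    @abstractmethod
    def recv(self, peer: int) -> bytes: ...

class PcieNtbTransport(Transport):
    """Back-end that would copy into the peer's memory window mapped through the NTB."""
    def send(self, peer, payload): raise NotImplementedError("memcpy into the mapped window")
    def recv(self, peer): raise NotImplementedError("poll the mapped window")

class TcpTransport(Transport):
    """Back-end for peers reachable only over the rack-level network."""
    def send(self, peer, payload): raise NotImplementedError("socket send")
    def recv(self, peer): raise NotImplementedError("socket recv")

def make_transport(kind: str) -> Transport:
    # Swapping the interconnect is a one-line change for the application code.
    return {"ntb": PcieNtbTransport, "tcp": TcpTransport}[kind]()
```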
ISBN (Print): 9781479941162
The realistic simulation of ultrasound wave propagation is computationally intensive. The large size of the grid and the low degree of data reuse mean that it places a great demand on memory bandwidth. Graphics Processing Units (GPUs) have attracted attention for scientific calculations due to their potential for efficiently performing large numbers of floating-point computations. However, many applications are limited by memory bandwidth, especially for data sets larger than the memory of the GPU platform. This problem is only partially mitigated by the standard technique of breaking the grid into regions and overlapping the computation of one region with the host-device memory transfer of another. In this paper, we implement a memory-bound GPU-based ultrasound simulation and evaluate a technique for improving performance by compressing the data into a fixed-point representation, reducing the time required for host-device transfers. We demonstrate a speedup of 1.5x on a simulation in which the data is broken into regions that must be copied back and forth between the CPU and GPU. We also develop a model that can be used to determine the amount of temporal blocking required to achieve near-optimal performance without extensive experimentation. This technique may also be applied to GPU-based scientific simulations in other domains such as computational fluid dynamics and electromagnetic wave simulation.
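A minimal sketch of the fixed-point compression idea, assuming the field has been normalized to [-1, 1) and that 16 bits suffice; the scheme and bit width are illustrative, not the paper's exact format. Quantizing float32 samples to int16 halves the bytes that must cross the PCIe link, at the cost of a bounded quantization error.

```python
import numpy as np

def to_fixed_point(x, frac_bits=12):
    """Quantize a normalized float32 field to int16 fixed point (illustrative scheme)."""
    scale = 1 << frac_bits
    return np.clip(np.rint(x * scale), -32768, 32767).astype(np.int16)

def from_fixed_point(q, frac_bits=12):
    """Recover an approximate float32 field after the transfer."""
    return q.astype(np.float32) / (1 << frac_bits)

# A pressure-like field normalized to [-1, 1).
field = np.random.uniform(-1.0, 1.0, size=1_000_000).astype(np.float32)
packed = to_fixed_point(field)
restored = from_fixed_point(packed)

print(packed.nbytes / field.nbytes)       # 0.5 -> half the host-device transfer volume
print(np.max(np.abs(restored - field)))   # bounded quantization error (~2**-13 here)
```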
ISBN (Print): 9781479941162
An optimized parallel algorithm is proposed to address the complicated backward substitution that arises in cyclic reduction when solving tridiagonal linear systems. Adopting a hybrid parallel model, the algorithm combines the cyclic reduction method with the partition method. Compared with cyclic reduction, the hybrid algorithm has a simpler backward substitution on parallel computers. In this paper, operation counts and execution times are obtained to evaluate and compare these methods. Based on these measurements, the multi-threaded hybrid algorithm achieves better efficiency than the other parallel methods, i.e., cyclic reduction and the partition method, of which cyclic reduction had previously been regarded as the fastest in many respects. In particular, the proposed approach has the lowest scalar operation count and the shortest execution time on a multi-core computer when the system is large enough. The hybrid parallel algorithm improves on the performance of the cyclic reduction and partition methods by 30% and 20%, respectively.
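The abstract does not spell out the hybrid algorithm; as a point of reference, the sketch below shows a minimal Thomas-algorithm sweep, the kind of serial solve each partition performs in a partition-based scheme, while the reduced interface system would be handled separately (for example by cyclic reduction). The array conventions are an assumption for illustration.

```python
import numpy as np

def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b, super-diagonal c
    and right-hand side d (a[0] and c[-1] are unused)."""
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                       # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):              # backward substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Example: [[2,1,0],[1,2,1],[0,1,2]] x = [3,4,3] has solution [1,1,1].
print(thomas_solve(np.array([0., 1., 1.]), np.array([2., 2., 2.]),
                   np.array([1., 1., 0.]), np.array([3., 4., 3.])))
```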
ISBN (Print): 9781450328210
Asynchronous variational integrators (AVIs) are used in computational mechanics and graphics to solve complex contact mechanics problems. Parallelizing AVI is difficult because it is not possible to build a dependence graph for AVI either at compile time or at runtime. However, we show that if the dependence graph is instead updated incrementally as the computation is performed, AVI can be parallelized in a systematic way. Using this approach, we obtain speedups of up to 20 on 24 cores for relatively small AVI problems.
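The scheduling details are not given in the abstract; the toy sketch below (elements, node sets, and times are made up) only illustrates the invariant an incrementally maintained dependence graph has to enforce: an element may advance only when it holds the earliest pending update time among the elements sharing one of its nodes, and each advance changes the dependence information seen by its neighbours.

```python
import heapq, random

random.seed(0)
# Toy mesh: each element touches three shared nodes and carries a next-update time.
elems = {e: {"nodes": frozenset(random.sample(range(10), 3)), "t": random.random()}
         for e in range(8)}

def neighbours(e):
    return [f for f in elems if f != e and elems[f]["nodes"] & elems[e]["nodes"]]

heap = [(elems[e]["t"], e) for e in elems]
heapq.heapify(heap)
t_end, steps = 2.0, 0

while heap:
    t, e = heapq.heappop(heap)
    if t >= t_end:
        break
    # Dependence condition a parallel scheduler would check per neighbourhood:
    # e is safe to run because no neighbour has an earlier pending update.
    assert all(elems[f]["t"] >= t for f in neighbours(e))
    elems[e]["t"] = t + random.uniform(0.05, 0.2)   # advance e; neighbours' view changes
    heapq.heappush(heap, (elems[e]["t"], e))
    steps += 1

print("element updates executed:", steps)
```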
ISBN (Print): 9781479938018
Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases start to attach an increasing importance to the analytic capabilities. In recent years, SVM was adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. Advanced multi- and many-core architectures offer massive parallelism with complex memory hierarchies which can make runtime training possible, but form a barrier to efficient parallel SVM design. To address the challenges above, we designed and implemented MIC-SVM, a highly efficient parallel SVM for x86 based multi-core and many-core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi co-processor (MIC). We propose various novel analysis methods and optimization techniques to fully utilize the multilevel parallelism provided by these architectures and serve as general optimization methods for other machine learning tools. MIC-SVM achieves 4.4-84x and 18-47x speedups against the popular LIBSVM, on MIC and Ivy Bridge CPUs respectively, for several real-world data-mining datasets. Even compared with GPUSVM, run on a top of the line NVIDIA k20x GPU, the performance of our MIC-SVM is competitive. We also conduct a cross-platform performance comparison analysis, focusing on Ivy Bridge CPUs, MIC and GPUs, and provide insights on how to select the most suitable advanced architectures for specific algorithms and input data patterns.
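The paper's specific optimizations are not reproduced here; as a hedged illustration of the data-level parallelism such architectures exploit, the hot spot in SMO-style SVM training, evaluating one kernel row against all samples, can be written in scalar or data-parallel form. The gamma value and data shapes below are illustrative.

```python
import numpy as np

gamma = 0.5
X = np.random.rand(10_000, 64).astype(np.float32)   # n_samples x n_features
xi = X[0]

# Scalar reference: one training sample at a time.
row_scalar = np.array([np.exp(-gamma * np.sum((xi - xj) ** 2)) for xj in X])

# Data-parallel form: the whole RBF kernel row in one shot, the shape of work
# that maps onto wide SIMD units and many cores.
row_vec = np.exp(-gamma * np.sum((X - xi) ** 2, axis=1))

assert np.allclose(row_scalar, row_vec, atol=1e-5)
```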
ISBN (Print): 9781450316569
The smart cities concept arises from the need to manage, automate, optimize, and explore all aspects of a city that could be improved. For this purpose it is necessary to build a robust architecture that satisfies a minimal set of requirements such as distributed sensing, integrated management, and flexibility. Several architectures have been proposed with different goals, but none of them satisfactorily meets the needs that permeate smart cities. In this work various architectures are discussed, highlighting the main requirements they aim to fulfill. Furthermore, based on different architectures with the most varied purposes, a set of requirements for the implementation of a smart city is presented and discussed. Copyright 2013 ACM.
ISBN (Print): 9781467355254; 9781467355247
Problem domains are commonly decomposed hierarchically to fully utilize the parallel resources in modern microprocessors. Such decompositions can be provided as library routines, written by experienced experts, for general algorithmic patterns. But such APIs tend to be constrained to certain architectures or data sizes, and integrating them with application code is often an unnecessarily daunting task, especially when these routines need to be closely coupled with user code to achieve better performance. This paper contributes HiDP, a high-level hierarchical data-parallel language. The purpose of HiDP is to improve the coding productivity of integrating hierarchical data parallelism without significant loss of performance. HiDP is a source-to-source compiler that converts a very concise data-parallel language into CUDA C++ source code. Internally, it performs the analysis necessary to compose user code with efficient and architecture-aware code snippets. This paper discusses the various aspects of HiDP systematically: the language, the compiler, and the run-time system with built-in tuning capabilities. Together they enable HiDP users to express algorithms in less code than low-level SDKs require for native platforms, while exposing the abundant computing resources of modern parallel architectures. Improved coding productivity tends to come with a sacrifice in performance; yet experimental results show that the generated code delivers performance very close to handcrafted native GPU code.
ISBN (Print): 9781467355254; 9781467355247
Modern throughput processors such as GPUs achieve high performance and efficiency by exploiting data parallelism in application kernels expressed as threaded code. One drawback of this approach compared to conventional vector architectures is the redundant execution of instructions that are common across multiple threads, resulting in energy inefficiency due to excess instruction dispatch, register file accesses, and memory operations. This paper proposes to alleviate these overheads, while retaining the threaded programming model, by automatically detecting scalar operations and factoring them out of the parallel code. We have developed a scalarizing compiler that employs convergence and variance analyses to statically identify values and instructions that are invariant across multiple threads. Our compiler algorithms are effective at identifying convergent execution even in programs with arbitrary control flow, identifying two-thirds of the opportunity captured by a dynamic oracle. The compile-time analysis reduces instructions dispatched by 29%, register file reads and writes by 31%, memory address counts by 47%, and data access counts by 38%.
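A toy sketch of what a variance analysis does, on a hypothetical six-instruction IR (the instruction set and names are made up, not the paper's compiler IR): values derived, transitively, from the thread index are marked varying, and everything else is uniform and a candidate for scalarization.

```python
# Each instruction is (destination, opcode, source operands).
program = [
    ("base",  "load",  ["ptr"]),          # same address for every thread
    ("scale", "const", ["4"]),
    ("off",   "mul",   ["tid", "scale"]), # depends on the thread index
    ("addr",  "add",   ["base", "off"]),
    ("val",   "load",  ["addr"]),
    ("out",   "mul",   ["val", "scale"]),
]

varying = {"tid"}                          # seed: the thread index is varying
changed = True
while changed:                             # propagate to a fixed point over def-use chains
    changed = False
    for dest, _op, srcs in program:
        if dest not in varying and any(s in varying for s in srcs):
            varying.add(dest)
            changed = True

for dest, op, srcs in program:
    kind = "varying" if dest in varying else "uniform"
    print(f"{dest:5s} = {op}({', '.join(srcs)})  -> {kind}")
# "base" and "scale" come out uniform: they could be dispatched once per
# warp/vector instead of once per thread.
```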