ISBN (print): 9781509036820
The domains of parallel and distributed computing have been converging continuously, to the degree that state-of-the-art server systems incorporate characteristics from both domains: they comprise a hierarchy of enclosures, where each enclosure houses multiple processor sockets and each socket in turn contains multiple memory controllers. A global address space and cache coherency are provided by multiple layers of fast interconnection technologies, even across enclosures. The growing popularity of such systems creates an urgent need for efficient mappings of fundamental algorithms onto such hierarchical architectures. However, the growing complexity of these systems and the inconsistencies between the implementation strategies of different hardware vendors make it increasingly hard to find mapping strategies that are universally valid. In this paper, we present scalable optimization and mapping strategies in a case study of the popular Scale-Invariant Feature Transform (SIFT) computer vision algorithm. Our approaches are evaluated on a state-of-the-art hierarchical Non-Uniform Memory Access (NUMA) system with 240 physical cores and 12 terabytes of memory, apportioned across 16 NUMA nodes (sockets). SIFT is particularly interesting because the algorithm exercises a variety of common data access patterns, allowing us to discuss the scaling properties of optimization strategies from the distributed and parallel computing domains and their applicability to emerging server systems.
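As a rough illustration of the kind of NUMA-aware mapping such a system calls for (not the authors' code), the following sketch uses OpenMP first-touch placement so that each socket ends up owning the image pages its own threads later process; the buffer name, size, and thread-pinning setup are assumptions.

```cpp
// Minimal sketch: first-touch placement of an image buffer so that each NUMA
// node owns the pages its threads later process. Thread pinning
// (e.g., OMP_PROC_BIND=close, OMP_PLACES=cores) is assumed.
#include <cstddef>
#include <vector>

int main() {
    const std::size_t n = 1 << 26;           // hypothetical pixel count
    std::vector<float> image(n);

    // Parallel first touch: pages are mapped on the NUMA node of the
    // initializing thread, matching the later processing loop below.
    #pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < n; ++i)
        image[i] = 0.0f;

    // Processing loop uses the same static schedule, so each thread mostly
    // touches memory local to its socket.
    #pragma omp parallel for schedule(static)
    for (std::size_t i = 0; i < n; ++i)
        image[i] = image[i] * 0.5f + 1.0f;    // stand-in for a SIFT stage
    return 0;
}
```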
ISBN (print): 9781509021406
Many high-performance distributed memory applications rely on point-to-point messaging using the Message Passing Interface (MPI). Due to the latency of the network, among other costs, this communication can limit the scalability of an application when run on high node counts of distributed memory supercomputers. Communication costs are further increased on modern multi- and many-core architectures when using more than one MPI process per node, as each process sends and receives messages independently, inducing multiple latencies and contention for resources. In this paper, we use shared memory constructs available in the MPI 3.0 standard to implement an aggregated communication method that minimizes the number of inter-node messages and thereby reduces these costs. We compare the performance of this Minimal Aggregated SHared Memory (MASHM) messaging to the standard point-to-point implementation on large-scale supercomputers, where we see that MASHM leads to enhanced strong scalability of a weighted Jacobi relaxation. For this application, we also see that the use of shared memory parallelism through MASHM and MPI 3.0 can be more efficient than using Open Multi-Processing (OpenMP). We then present a model for the communication costs of MASHM which shows that this method achieves its goal of reducing latency costs while also reducing bandwidth costs. Finally, we present MASHM as an open source library to facilitate the integration of this efficient communication method into existing distributed memory applications.
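A minimal sketch of the MPI 3.0 shared-memory aggregation idea (not the MASHM library's API): ranks on a node write their contributions directly into a node-shared window, and only a designated leader rank would then exchange the aggregated buffer between nodes. The one-double-per-rank layout is an illustrative assumption.

```cpp
// Sketch: aggregate on-node contributions into one shared buffer via MPI-3
// shared memory so that a single "node leader" performs inter-node traffic.
#include <mpi.h>
#include <cstdio>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    MPI_Comm node;                        // ranks that can share memory
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);
    int nrank, nsize;
    MPI_Comm_rank(node, &nrank);
    MPI_Comm_size(node, &nsize);

    // Rank 0 allocates one double per on-node rank; others map its segment.
    MPI_Win win;
    double *buf;
    MPI_Aint bytes = (nrank == 0) ? nsize * sizeof(double) : 0;
    MPI_Win_allocate_shared(bytes, sizeof(double), MPI_INFO_NULL, node,
                            &buf, &win);
    if (nrank != 0) {
        MPI_Aint sz; int disp;
        MPI_Win_shared_query(win, 0, &sz, &disp, &buf);
    }

    buf[nrank] = 1.0 + nrank;             // write local contribution directly
    MPI_Win_fence(0, win);                // make all on-node writes visible

    if (nrank == 0) {
        // Only the leader would now exchange the aggregated buffer with the
        // leaders on other nodes: one message per node pair instead of one
        // per process pair.
        std::printf("node-aggregated buffer [0..%d] ready\n", nsize - 1);
    }
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
```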
ISBN (print): 9781467390026
Sequential Consistency (SC) is the most intuitive memory model for parallel programs. However, modern architectures aggressively reorder and overlap memory accesses, causing SC violations. An SC violation is virtually always a bug. Most prior schemes either search the entire state space of a program or use a constraint solver to find SC violations. A promising recent scheme uses an active testing technique but fails to be effective for SC violations involving a larger number of threads and variables, and larger codebases. We propose Orion, the first active testing technique that can detect, expose, and classify arbitrary SC violations in any program. Orion works in two phases. In the first phase, it finds potential SC violation cycles by focusing on racing accesses. In the second phase, it exposes each SC violation cycle by enforcing the exact scheduling order. We present a detailed design of Orion in the paper. We tested different concurrent algorithms, bug kernels, SPLASH2 and PARSEC applications, and an open source program, Apache. We experimented with the TSO and PSO memory models. We detected and exposed 60 SC violations, of which 15 involve more than two processors and variables. Orion exposes SC violations quickly and with high probability. Compared to a state-of-the-art active testing technique, it has a much better SC violation detection ability.
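For context (this is not Orion's code), the classic store-buffering litmus test below shows the simplest SC violation cycle: under sequential consistency the outcome r1 == 0 && r2 == 0 is impossible, yet relaxed atomics, or hardware store buffers under TSO, can produce it.

```cpp
// Store-buffering litmus test: the forbidden-under-SC outcome r1==0 && r2==0
// can appear with relaxed atomics -- the kind of cycle an SC-violation
// detector looks for.
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1 = 0, r2 = 0;

int main() {
    for (int trial = 0; trial < 100000; ++trial) {
        x.store(0); y.store(0);
        std::thread t1([] {
            x.store(1, std::memory_order_relaxed);
            r1 = y.load(std::memory_order_relaxed);
        });
        std::thread t2([] {
            y.store(1, std::memory_order_relaxed);
            r2 = x.load(std::memory_order_relaxed);
        });
        t1.join(); t2.join();
        if (r1 == 0 && r2 == 0) {          // impossible under SC
            std::printf("SC violation witnessed at trial %d\n", trial);
            break;
        }
    }
    return 0;
}
```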
ISBN (print): 9781509036837
All many-core systems require fine-grained shared memory parallelism; however, the most efficient way to extract such parallelism is far from trivial. Fine-grained parallel algorithms face various performance trade-offs related to tasking, accesses to global data structures, and use of shared cache. While programming models provide high-level abstractions, such as data and task parallelism, algorithmic choices still remain open on how best to implement irregular algorithms, such as sparse factorizations, while taking into account the trade-offs mentioned above. In this paper, we compare these performance trade-offs for task and data parallelism on different hardware architectures such as Intel Sandy Bridge, Intel Xeon Phi, and IBM Power8. We do this by comparing the scaling of a new task-parallel incomplete sparse Cholesky factorization called Tacho and a new data-parallel incomplete sparse LU factorization called Basker. Both solvers utilize the Kokkos programming model and were developed within the ShyLU package of Trilinos. Using these two codes, we demonstrate how high-level programming changes affect performance and overhead costs on multiple multi/many-core systems. We find that Kokkos is able to provide comparable performance with both parallel_for and task/futures on traditional x86 multicores. However, the choice of which high-level abstraction to use on many-core systems depends on both the architecture and the input matrices.
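A minimal sketch of the data-parallel Kokkos pattern the two solvers build on (unrelated to the internals of Tacho or Basker): the same parallel_for/parallel_reduce source can target x86, Xeon Phi, or Power8 by selecting the execution space at configure time; the problem size is an arbitrary placeholder.

```cpp
// Kokkos data-parallel sketch: fill a vector and reduce it. Task parallelism
// (Kokkos::TaskScheduler) would express the same work as a DAG of dependent
// tasks instead of flat index ranges.
#include <Kokkos_Core.hpp>

int main(int argc, char *argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int n = 1000000;                  // hypothetical problem size
        Kokkos::View<double*> v("v", n);

        Kokkos::parallel_for("fill", n, KOKKOS_LAMBDA(const int i) {
            v(i) = 1.0 / (i + 1);
        });

        double sum = 0.0;
        Kokkos::parallel_reduce("sum", n, KOKKOS_LAMBDA(const int i, double &acc) {
            acc += v(i);
        }, sum);
        (void)sum;
    }
    Kokkos::finalize();
    return 0;
}
```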
ISBN (print): 9781467391160
In Big Data computing, improving performance through in-memory computing is a hot topic. In in-memory computing, data deployment directly affects load balance and task efficiency. In the context of in-memory computing on electric power data, two unsolved problems are: (1) only memory space, not CPU frequency and core count, is considered for load balancing and performance improvement; (2) the many manual operations involved make it difficult to complete data deployment automatically. This paper presents an electric power data deployment solution for distributed in-memory computing that addresses these challenges. In the solution, a data deployment strategy is established according to the business logic and the hardware configuration of the cluster nodes. The deployment scheme is then implemented through an interface operation, and finally the cluster nodes load data according to the deployment scheme. The solution has been applied to Objectification Parallel Computing (OPC). The application results show that OPC achieves performance that meets the system's efficiency requirements, and that the data deployment operation is simple.
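The paper does not spell out its weighting formula, so the following is only an illustrative sketch of the idea: derive per-node deployment weights from core count and CPU frequency (the factors the authors note are usually ignored) and split the records proportionally; the node specifications and record count are made up.

```cpp
// Illustrative sketch: proportional data deployment by compute capacity.
#include <cstddef>
#include <cstdio>
#include <vector>

struct Node { int cores; double ghz; };

int main() {
    // Hypothetical cluster description and record count.
    std::vector<Node> cluster = {{24, 2.4}, {16, 3.0}, {32, 2.1}};
    const long rows = 10000000;

    // Assumed weighting: raw compute capacity = cores * frequency. A real
    // deployment plan would additionally cap each share by the node's free
    // memory, the other balancing factor the paper mentions.
    std::vector<double> w;
    double total = 0.0;
    for (const Node &n : cluster) {
        w.push_back(n.cores * n.ghz);
        total += w.back();
    }
    for (std::size_t i = 0; i < cluster.size(); ++i)
        std::printf("node %zu loads about %ld records\n", i,
                    static_cast<long>(rows * w[i] / total));
    return 0;
}
```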
ISBN (print): 9781467391160
In this paper, we examine how to improve workload balancing on a computing cluster with a parallel loop self-scheduling scheme. We use hybrid MPI and OpenMP parallel programming in C. Loop iterations are block-partitioned according to the performance weighting of the compute nodes. This study implements parallel loop self-scheduling using the Xeon Phi, exploiting its characteristics to improve workload balancing between heterogeneous nodes. The parallel loop self-scheduling is composed of a static and a dynamic allocation phase: a weighting algorithm is adopted in the static part, while the well-known loop self-scheduling scheme is adopted in the dynamic part. In recent years, Intel has promoted the Xeon Phi coprocessor, an x86-compatible coprocessor with about 60 cores that can be regarded as a single compute node with non-negligible computing power. In our experiments, we use multiple compute nodes and four applications: matrix multiplication, sparse matrix multiplication, Mandelbrot set computation, and the circuit satisfiability problem. Our results show how to perform the weight allocation and how to choose a scheduling scheme to achieve the best performance with parallel loop self-scheduling.
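A minimal sketch of the two-phase partitioning described above (not the paper's implementation): a weighted static block assignment for a fraction alpha of the iterations, followed by guided self-scheduling of the remainder; the node weights, alpha, and chunking policy are assumptions.

```cpp
// Sketch: static weighted partition plus guided self-scheduling remainder.
#include <cstddef>
#include <cstdio>
#include <vector>

int main() {
    const int N = 100000;                          // total loop iterations
    const double alpha = 0.7;                      // fraction scheduled statically
    std::vector<double> weight = {1.0, 1.0, 4.0};  // e.g. two CPU nodes + one Xeon Phi

    double wsum = 0.0;
    for (double w : weight) wsum += w;

    // Static phase: each node receives a block proportional to its weight.
    int start = 0;
    for (std::size_t i = 0; i < weight.size(); ++i) {
        int len = static_cast<int>(alpha * N * weight[i] / wsum);
        std::printf("node %zu: static iterations [%d, %d)\n", i, start, start + len);
        start += len;
    }

    // Dynamic phase: the remainder is handed out in shrinking chunks
    // (guided self-scheduling: chunk = ceil(remaining / nodes)) to whichever
    // node requests work next.
    int remaining = N - start;
    const int nodes = static_cast<int>(weight.size());
    while (remaining > 0) {
        int chunk = (remaining + nodes - 1) / nodes;
        std::printf("dynamic chunk of %d iterations starting at %d\n",
                    chunk, N - remaining);
        remaining -= chunk;
    }
    return 0;
}
```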
Modern FPGA-based Multiprocessor Systems-on-Chip (MPSoCs) support dynamic reconfiguration of processing elements (PEs) such as processors and accelerators. The reconfiguration improves the flexibility of the system through the dynamic and partial exchange of PEs. However, the design of reconfigurable MPSoCs also leads to higher programming complexity due to the huge design space. To bridge this gap, this paper presents a runtime system consisting of a software layer called LinROS, which dynamically schedules and reconfigures PEs. LinROS uses a novel Linux device driver that automatically manages the software and hardware of the reconfigurable MPSoC at runtime. In addition, an IP core developed for LinROS facilitates easy hardware integration of PEs using High-Level Synthesis tools such as VivadoHLS. Data exchange between PEs is managed by the IP core. The entire system is evaluated on a Xilinx Zynq SoC performing image processing algorithms. The results show a negligible scheduling overhead, while the programming of reconfigurable MPSoCs is significantly simplified by the device driver. Furthermore, the hardware design of PEs is also simplified due to the High-Level Synthesis.
ISBN (print): 9781479986484
The divergence in the computer architecture landscape has resulted in different architectures being considered mainstream at the same time. For application and algorithm developers, a dilemma arises when one must focus on using underlying architectural features to extract the best performance on each of these architectures, while writing portable code at the same time. We focus on this problem with graph analytics as our target application domain. In this paper, we present an abstraction-based methodology for performance-portable graph algorithm design on manycore architectures. We demonstrate our approach by systematically optimizing algorithms for the problems of breadth-first search, color propagation, and strongly connected components. We use Kokkos, a manycore library and programming model, for prototyping our algorithms. Our portable implementation of the strongly connected components algorithm on the NVIDIA Tesla K40M is up to 3.25x faster than a state-of-the-art parallel CPU implementation on a dual-socket Sandy Bridge compute node.
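As a hedged illustration of the color-propagation kernel mentioned above (not the authors' implementation), the sketch below runs repeated Kokkos sweeps on an implicit ring graph, where each vertex takes the maximum label among itself and its two neighbors until no label changes; real inputs would use CSR views instead of the ring, and the vertex count is arbitrary.

```cpp
// Jacobi-style color propagation on a ring graph with Kokkos.
#include <Kokkos_Core.hpp>
#include <cstdio>
#include <utility>

int main(int argc, char *argv[]) {
    Kokkos::initialize(argc, argv);
    {
        const int nv = 4096;                          // hypothetical vertex count
        Kokkos::View<int*> cur("cur", nv), next("next", nv);
        Kokkos::parallel_for("init", nv, KOKKOS_LAMBDA(const int v) {
            cur(v) = v;                               // initial label = vertex id
        });

        int changed = 1, sweeps = 0;
        while (changed > 0) {
            changed = 0;
            // Read labels from cur, write updates to next; count changes.
            Kokkos::parallel_reduce("sweep", nv, KOKKOS_LAMBDA(const int v, int &upd) {
                int best = cur(v);
                int left = cur((v + nv - 1) % nv), right = cur((v + 1) % nv);
                if (left > best) best = left;
                if (right > best) best = right;
                next(v) = best;
                if (best != cur(v)) ++upd;
            }, changed);
            std::swap(cur, next);                     // views swap cheaply by handle
            ++sweeps;
        }
        std::printf("color propagation converged after %d sweeps\n", sweeps);
    }
    Kokkos::finalize();
    return 0;
}
```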
ISBN (print): 9781509042456
A broad class of applications involve indirect or data-dependent memory accesses and are referred to as irregular applications. Recent developments in SIMD architectures - specifically, the emergence of wider SIMD lanes, the combination of SIMD parallelism with many-core MIMD parallelism, and more flexible programming APIs - are providing new opportunities as well as challenges for this class of applications. In this paper, we propose a general optimization methodology to effectively optimize different subclasses of irregular applications. Based on the observation that all applications with indirect memory accesses can be viewed as sparse matrix computations, we design an optimization methodology that includes three sub-steps: 1) locality enhancement through tiling, 2) data access pattern identification, and 3) write conflict removal at both the SIMD and MIMD levels. This method has been applied to unstructured grids, molecular dynamics, and graph applications, in addition to sparse matrix computations. The speedup achieved by our single-threaded vectorized code over serial code is up to 9.05x, whereas the overall speedup when utilizing both SIMD and MIMD (61 cores on Intel Xeon Phi) with our approach is up to 467.1x. Further optimization using matrix reordering on irregular reductions and graph algorithms achieves an incremental speedup of up to 1.69x, though at a relatively high preprocessing cost. Moreover, SpMM using our approach outperforms routines from a highly optimized commercial library by up to 2.81x.
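A minimal sketch of the SIMD-plus-MIMD treatment of one such irregular kernel (not the paper's framework): CSR sparse matrix-vector multiply with OpenMP threads across rows and a simd reduction over the gather-heavy inner loop; the tiling and conflict-removal steps are omitted and the array names are assumptions.

```cpp
// CSR SpMV: MIMD across rows (OpenMP threads), SIMD over the indexed inner loop.
#include <cstdio>
#include <vector>

void spmv_csr(int nrows, const std::vector<int> &rowptr,
              const std::vector<int> &colidx, const std::vector<double> &val,
              const std::vector<double> &x, std::vector<double> &y) {
    #pragma omp parallel for schedule(dynamic, 256)   // MIMD across rows
    for (int r = 0; r < nrows; ++r) {
        double acc = 0.0;
        #pragma omp simd reduction(+:acc)             // SIMD gather + multiply
        for (int k = rowptr[r]; k < rowptr[r + 1]; ++k)
            acc += val[k] * x[colidx[k]];
        y[r] = acc;                                    // one writer per row: no conflicts
    }
}

int main() {
    // 2x2 identity in CSR, just to exercise the kernel.
    std::vector<int> rowptr = {0, 1, 2}, colidx = {0, 1};
    std::vector<double> val = {1.0, 1.0}, x = {3.0, 4.0}, y(2);
    spmv_csr(2, rowptr, colidx, val, x, y);
    std::printf("y = [%g, %g]\n", y[0], y[1]);
    return 0;
}
```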
These keynotes discuss the following: Parallel and Interactive Computing of Big Data; Approximation Algorithms: Methodologies, Applications and Empirical Evaluation.