ISBN (Print): 9781424497621
Although many existing distributed computing platforms have simplified the development of parallel programs to various degrees, none of them have good notions of software reuse, which is vital to reducing development cost and decreasing the bug rate. On those platforms, code reuse is strongly tied to the experience and skills of developers, effectively leading to low productivity. This paper describes Pomelo, a distributed computing platform designed to aid code reuse in parallel programming. Pomelo provides support for reusing software at different granularities with its task-oriented architecture. Equipped with infrastructural mechanisms for tasks to communicate with each other, it especially facilitates component-based programming with several built-in types of tasks. Preliminary experiments suggest that Pomelo has promising performance and good scalability. Our experience also shows that it is helpful for writing parallel programs productively.
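Pomelo's API is not described in the abstract, so the sketch below is purely illustrative of the reuse pattern a task-oriented architecture encourages: small tasks with a uniform interface, wired together through explicit communication channels. Every class and method name here (Task, connect, Square, Printer) is hypothetical.

```python
# Illustrative sketch only: not Pomelo's actual API. The point is the reuse pattern --
# small tasks with a uniform interface, connected through explicit channels.
import queue
import threading

class Task:
    """A reusable unit of work that reads from an input channel and writes to an output channel."""
    def __init__(self):
        self.inbox = queue.Queue()
        self.outbox = None          # set when the task is connected to a successor

    def connect(self, successor):
        self.outbox = successor.inbox
        return successor

    def process(self, item):
        raise NotImplementedError   # concrete tasks override this

    def run(self):
        while True:
            item = self.inbox.get()
            if item is None:        # sentinel: propagate shutdown downstream
                if self.outbox is not None:
                    self.outbox.put(None)
                break
            result = self.process(item)
            if self.outbox is not None:
                self.outbox.put(result)

class Square(Task):
    def process(self, item):
        return item * item

class Printer(Task):
    def process(self, item):
        print(item)

if __name__ == "__main__":
    square, printer = Square(), Printer()
    square.connect(printer)
    workers = [threading.Thread(target=t.run) for t in (square, printer)]
    for w in workers:
        w.start()
    for x in range(5):
        square.inbox.put(x)
    square.inbox.put(None)          # shut the pipeline down
    for w in workers:
        w.join()
```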
ISBN (Print): 9781467390026
Sequential Consistency (SC) is the most intuitive memory model for parallel programs. However, modern architectures aggressively reorder and overlap memory accesses, causing SC violations. An SC violation is virtually always a bug. Most prior schemes either search the entire state space of a program, or use a constraint solver to find SC violations. A promising recent scheme uses active testing but fails to be effective for SC violations involving a larger number of threads and variables, and for larger codebases. We propose Orion, the first active testing technique that can detect, expose, and classify arbitrary SC violations in any program. Orion works in two phases. In the first phase, it finds potential SC violation cycles by focusing on racing accesses. In the second phase, it exposes each SC violation cycle by enforcing the exact scheduling order. We present a detailed design of Orion in the paper. We tested different concurrent algorithms, bug kernels, SPLASH2, PARSEC applications, and an open source program, Apache. We experimented with the TSO and PSO memory models. We detected and exposed 60 SC violations, of which 15 involve more than two processors and variables. Orion exposes SC violations quickly and with high probability. Compared to a state-of-the-art active testing technique, it has a much better SC violation detection ability.
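The abstract does not spell out how the first phase builds its cycles, so the sketch below is only a schematic reading of the idea: accesses from a trace become graph nodes, program order and racing conflicts become edges, and any cycle containing a program-order edge that TSO may reorder (a store followed by a load to a different location) is reported as a potential SC violation. The trace format and all names are hypothetical, not Orion's data structures.

```python
# Schematic sketch of cycle finding over racing accesses; not Orion's actual code.
from collections import namedtuple

Access = namedtuple("Access", "tid op var")    # op is 'R' or 'W'

def potential_sc_cycles(trace):
    n = len(trace)
    edges = {i: set() for i in range(n)}
    relaxable = set()                          # program-order edges TSO may reorder
    last_in_thread = {}
    for i, a in enumerate(trace):
        if a.tid in last_in_thread:
            j = last_in_thread[a.tid]
            edges[j].add(i)
            b = trace[j]
            if b.op == "W" and a.op == "R" and b.var != a.var:
                relaxable.add((j, i))          # store followed by a load elsewhere
        last_in_thread[a.tid] = i
    for i, a in enumerate(trace):              # conflict edges between racing accesses
        for j, b in enumerate(trace):
            if i != j and a.tid != b.tid and a.var == b.var and "W" in (a.op, b.op):
                edges[i].add(j)

    def has_relaxable(cycle):
        return any((cycle[k], cycle[k + 1]) in relaxable for k in range(len(cycle) - 1))

    cycles = []
    def dfs(start, node, path):
        for nxt in edges[node]:
            if nxt == start:
                cyc = path + [start]
                if has_relaxable(cyc):
                    cycles.append(cyc)
            elif nxt not in path and len(path) < 6:   # bound the search for the sketch
                dfs(start, nxt, path + [nxt])
    for i in range(n):
        dfs(i, i, [i])
    return cycles

# Store-buffering litmus test -- T0: x=1; load y    T1: y=1; load x
trace = [Access(0, "W", "x"), Access(0, "R", "y"),
         Access(1, "W", "y"), Access(1, "R", "x")]
print(potential_sc_cycles(trace))
```

On this litmus test the only cycles reported are the ones passing through a reorderable store/load pair, which is exactly the pattern that becomes visible under TSO but not under SC.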
ISBN (Print): 9783319499567; 9783319499550
OpenACC has been in development for a few years now. The OpenACC 2.5 specification was recently made public, and there are some initiatives for developing full implementations of the standard to make use of accelerator capabilities. There is much to be done yet, but currently OpenACC for GPUs is reaching a good maturity level in various implementations of the standard, using CUDA and OpenCL as backends. Nvidia is investing in this project and has released an OpenACC Toolkit, including the PGI Compiler. There are, however, more developments out there. In this work, we analyze different available OpenACC compilers that have been developed by companies or universities during the last few years. We check their performance and maturity, keeping in mind that OpenACC is designed to be used without extensive knowledge of parallel programming. Our results show that the compilers are on their way to a reasonable level of maturity, presenting different strengths and weaknesses.
ISBN (Print): 9781467388153
Increasingly data-intensive scientific and commercial applications require frequent movement of large datasets from one site to another. Despite the growing capacity of the underlying networks, these data movements rarely achieve the promised data transfer rates of the physical network due to poorly tuned data transfer protocols. Accurately and efficiently tuning data transfer protocol parameters in a dynamically changing network environment is a big challenge and still an open research problem. In this paper, we present predictive end-to-end data transfer optimization algorithms based on historical data analysis and real-time background traffic probing, dubbed HARP. Most of the existing work in this area is based solely on real-time network probing, which either causes too much sampling overhead or fails to accurately predict the correct transfer parameters. Combining historical data analysis with real-time sampling enables our algorithms to tune the application-level data transfer parameters accurately and efficiently to achieve close-to-optimal end-to-end data transfer throughput with very low overhead. Our experimental analysis over a variety of network settings shows that HARP outperforms existing solutions by up to 50% in terms of achieved throughput.
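As a rough illustration of the historical-plus-probing idea (not HARP's actual model or parameter set), the sketch below fits a simple saturating throughput curve to historical transfer logs, rescales it with a single live bandwidth probe, and picks a stream count. The parameter being tuned (number of parallel streams), the curve shape, and all numbers are assumptions.

```python
# Hedged sketch: combine historical logs with one real-time probe to pick a transfer
# parameter. Everything here (model, data, tuned parameter) is assumed for illustration.

# Historical samples: (parallel_streams, achieved_throughput_Mbps) from past transfers
# on a similar source/destination pair.
history = [(1, 120), (2, 230), (4, 410), (8, 640), (16, 780), (32, 820)]

def fit_saturating_model(samples):
    """Fit T(n) = Tmax * n / (n + k) by a coarse grid search (least squares)."""
    best = None
    for tmax in range(100, 2001, 10):
        for k in [x / 2 for x in range(1, 41)]:
            err = sum((t - tmax * n / (n + k)) ** 2 for n, t in samples)
            if best is None or err < best[0]:
                best = (err, tmax, k)
    return best[1], best[2]

def tune_streams(history, probed_bw_mbps, max_streams=64, tolerance=0.95):
    """Pick the smallest stream count predicted to reach ~95% of achievable throughput,
    after rescaling the historical model by the currently probed available bandwidth."""
    tmax, k = fit_saturating_model(history)
    hist_peak = max(t for _, t in history)
    scale = probed_bw_mbps / hist_peak          # crude correction for current load
    predict = lambda n: scale * tmax * n / (n + k)
    target = tolerance * predict(max_streams)
    for n in range(1, max_streams + 1):
        if predict(n) >= target:
            return n, predict(n)
    return max_streams, predict(max_streams)

streams, expected = tune_streams(history, probed_bw_mbps=600)
print(f"use {streams} parallel streams, expected ~{expected:.0f} Mbps")
```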
ISBN (Print): 9781467388153
Adaptive Mesh Refinement (AMR) methods reduce the computational requirements of problems by increasing resolution only in areas of interest. However, in practice, efficient AMR implementations are difficult to achieve, since the mesh hierarchy management must be optimized for the underlying hardware. The architectural complexity of GPUs can render efficient AMR particularly challenging in GPU-accelerated supercomputers. This paper presents a compiler-based high-level framework that can automatically transform serial uniform mesh code annotated by the user into parallel adaptive mesh code optimized for GPU-accelerated supercomputers. We also present a method for empirical analysis of a uniform mesh to project an upper bound on the achievable speedup of a GPU-optimized AMR code. We show experimental results on three production applications. The speedups of code generated by our framework are comparable to hand-written AMR code, while achieving good weak scaling up to 1000 GPUs.
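The abstract does not give the projection formula, so the following is only a simplified illustration of how such an upper bound might be estimated from a uniform-mesh run: measure the fraction of cells whose local gradient would trigger refinement and charge only those cells at fine-level cost. The refinement criterion, the 2:1 refinement ratio, the gradient threshold, and the cost model are all assumptions.

```python
# Simplified, hedged illustration of projecting an AMR speedup upper bound from a
# uniform-mesh field; not the paper's actual projection method.
import numpy as np

def projected_amr_speedup(field, threshold, refine_ratio=2, dim=2):
    """field: a 2-D array sampled on the uniform fine mesh.
    A cell 'needs refinement' if its local gradient magnitude exceeds `threshold`."""
    gy, gx = np.gradient(field)
    needs_fine = np.hypot(gx, gy) > threshold
    f = needs_fine.mean()                        # fraction of cells kept at fine level
    fine_cells = field.size
    coarse_cells = fine_cells / refine_ratio**dim
    uniform_work = fine_cells                    # cost model: work ~ number of cells
    amr_work = f * fine_cells + coarse_cells     # fine patches + coarse background
    return uniform_work / amr_work

# Example: a field with one sharp circular front -- only a thin ring needs refinement.
# The threshold value 0.05 is arbitrary and chosen for this synthetic field.
x, y = np.meshgrid(np.linspace(-1, 1, 512), np.linspace(-1, 1, 512))
field = np.tanh((np.hypot(x, y) - 0.5) / 0.02)
print(f"projected upper-bound speedup: {projected_amr_speedup(field, 0.05):.1f}x")
```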
ISBN (Print): 9781467390057
Key foundational components of Big Data frameworks include efficient large-scale storage and high-performance linear algebra. This paper discusses efficient implementations that utilize compression techniques inspired by columnar relational databases to improve the space and time profiles of vector and matrix operations. In addition, linear algebra operations are integrated with columnar relational algebra operations in both dense and compressed forms. For several of the operations, substantial speedups are obtained by operating directly on the compressed relations, vectors, and matrices. Advantages of mixing and matching relational and linear algebra operations are also pointed out. Both serial and parallel implementations are provided in the ScalaTion Big Data Analytics Framework.
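As a language-agnostic illustration of operating directly on compressed data (written in Python rather than Scala, and not reflecting ScalaTion's actual classes or storage format), the sketch below run-length encodes two vectors and computes their dot product on the runs, without decompressing either operand.

```python
# Hedged sketch of one columnar-compression idea: a dot product computed directly on
# run-length-encoded vectors. Illustrative only; not ScalaTion code.

def rle_encode(values):
    """Compress a vector into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_dot(a_runs, b_runs):
    """Dot product of two RLE-compressed vectors without decompressing either one."""
    total, i, j = 0.0, 0, 0
    a_left, b_left = a_runs[0][1], b_runs[0][1]
    while i < len(a_runs) and j < len(b_runs):
        overlap = min(a_left, b_left)            # overlapping stretch of the two runs
        total += a_runs[i][0] * b_runs[j][0] * overlap
        a_left -= overlap
        b_left -= overlap
        if a_left == 0:
            i += 1
            a_left = a_runs[i][1] if i < len(a_runs) else 0
        if b_left == 0:
            j += 1
            b_left = b_runs[j][1] if j < len(b_runs) else 0
    return total

a = [0.0] * 600 + [2.0] * 900
b = [3.0] * 1000 + [5.0] * 500
assert rle_dot(rle_encode(a), rle_encode(b)) == sum(x * y for x, y in zip(a, b))
print(rle_dot(rle_encode(a), rle_encode(b)))
```

The work done is proportional to the number of runs rather than the vector length, which is where the speedup on long-run (low-cardinality or sparse) columns comes from.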
ISBN (Print): 9783319509952; 9783319509945
Partitioned Global Address Space (PGAS) programming models combine shared and distributed memory features, and provide a foundation for high-productivity parallel programming using lightweight one-sided communications. The OpenSHMEM programming interface has recently begun gaining popularity as a lightweight, library-based approach for developing PGAS applications, in part through its use of a symmetric heap to realize more efficient implementations of global pointers than in other PGAS systems. However, current approaches to hybrid inter-node and intra-node parallel programming in OpenSHMEM rely on the use of multithreaded programming models (e.g., pthreads, OpenMP) that harness intra-node parallelism but are opaque to the OpenSHMEM runtime. This OpenSHMEM+X approach can encounter performance challenges such as bottlenecks on shared resources, long pause times due to load imbalances, and poor data locality. Furthermore, OpenSHMEM+X requires the expertise of hero-level programmers, compared to the use of just OpenSHMEM. All of these are hard challenges to mitigate with incremental changes. This situation will worsen as computing nodes increase their use of accelerators and heterogeneous memories. In this paper, we introduce the AsyncSHMEM PGAS library, which supports a tighter integration of shared and distributed memory parallelism than past OpenSHMEM implementations. AsyncSHMEM integrates the existing OpenSHMEM reference implementation with a thread-pool-based, intra-node, work-stealing runtime. It aims to prepare OpenSHMEM for future generations of HPC systems by enabling the use of asynchronous computation to hide data transfer latencies, supporting tight interoperability of OpenSHMEM with task parallel programming, improving load balance (of both communication and computation), and enhancing locality. In this paper we present the design of AsyncSHMEM, and demonstrate the performance of our initial AsyncSHMEM implementation by performing a scalability analysis of […]
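The sketch below illustrates only the generic intra-node work-stealing idea referred to above; it is not AsyncSHMEM code, issues no OpenSHMEM calls, and all class and method names are hypothetical. Each worker pops tasks from its own deque and, when that runs dry, steals from the opposite end of a victim's deque, which is how load imbalance between threads gets smoothed out.

```python
# Generic, hedged sketch of a work-stealing thread pool; not the AsyncSHMEM runtime.
import random
import threading
from collections import deque

class WorkStealingPool:
    def __init__(self, n_workers=4):
        self.deques = [deque() for _ in range(n_workers)]
        self.n = n_workers

    def submit(self, worker_id, task):
        self.deques[worker_id].append(task)          # push to the owner's tail

    def _worker(self, wid, results):
        while True:
            try:
                task = self.deques[wid].pop()        # owner pops from its own tail
            except IndexError:
                victims = [v for v in range(self.n) if v != wid and self.deques[v]]
                if not victims:
                    return                           # nothing left anywhere: done
                try:
                    task = self.deques[random.choice(victims)].popleft()  # steal head
                except IndexError:
                    continue                         # lost the race, try again
            results.append(task())

    def run(self):
        results = []
        threads = [threading.Thread(target=self._worker, args=(w, results))
                   for w in range(self.n)]
        for t in threads: t.start()
        for t in threads: t.join()
        return results

pool = WorkStealingPool()
for i in range(40):                                  # deliberately imbalanced:
    pool.submit(0, lambda i=i: i * i)                # all work starts on worker 0
print(sorted(pool.run())[:5], "...")
```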
ISBN (Print): 9783319460796; 9783319460789
There is significant interest in the computational physics community in performing lattice quantum chromodynamics (LQCD) simulations, which can run into the trillions of operations. LQCD computations solve a sparse linear system using a Wilson Dslash kernel, which has an arithmetic intensity of 0.88-2.29. This makes Dslash memory-bandwidth-bound on most architectures, including the Intel Xeon Phi Knights Corner (KNC). Most research on optimizing the Dslash operator has focused on single right-hand side (SRHS) linear solvers. There is a class of LQCD computations which aims to solve systems with multiple right-hand sides (MRHS), presenting additional opportunities for data reuse and vectorization. We present two approaches to MRHS Dslash: a vector register blocking approach and one using the software package QPhiX with a custom code generator for low-level intrinsics. We observed significant speedups using our approaches, with sustained performance of over 700 GFLOPS (single precision) in one instance. We achieved up to 29% of theoretical peak performance, compared to a maximum of 13% obtained by the previous SRHS method using QPhiX.
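As a hedged illustration of why MRHS helps (not the QPhiX code generator or a real Dslash kernel), the sketch below applies a CSR sparse operator to a block of right-hand sides in one pass, so each matrix entry read from memory is reused across the whole block. The flop/byte estimate uses a crude generic CSR model, not the Wilson Dslash figures quoted in the abstract.

```python
# Hedged sketch of blocked multiple-right-hand-side application of a sparse operator.
import numpy as np

def csr_matmul_block(indptr, indices, data, rhs_block):
    """y = A @ X for CSR A and X of shape (n, nrhs); one pass over A's nonzeros."""
    n, nrhs = rhs_block.shape
    out = np.zeros((n, nrhs))
    for row in range(n):
        for k in range(indptr[row], indptr[row + 1]):
            # data[k] is loaded once and used for all nrhs right-hand sides
            out[row] += data[k] * rhs_block[indices[k]]
    return out

def flops_per_byte(nnz, n, nrhs, word=8):
    """Very crude CSR cost model: values + indices streamed once, X and Y once each."""
    flops = 2 * nnz * nrhs                               # one multiply-add per nonzero per RHS
    bytes_moved = word * (nnz + nnz // 2 + n * nrhs * 2)
    return flops / bytes_moved

# Tiny 1-D nearest-neighbour operator as a stand-in for a stencil, with 8 RHS vectors.
n, nrhs = 64, 8
rows, cols, vals = [], [], []
for i in range(n):
    for j in (i - 1, i, i + 1):
        if 0 <= j < n:
            rows.append(i); cols.append(j); vals.append(1.0 if i == j else -0.5)
indptr = np.searchsorted(rows, range(n + 1))
X = np.random.rand(n, nrhs)
Y = csr_matmul_block(indptr, np.array(cols), np.array(vals), X)

A = np.zeros((n, n)); A[rows, cols] = vals               # dense reference check
assert np.allclose(Y, A @ X)
print("arithmetic intensity, 1 RHS vs 8 RHS:",
      round(flops_per_byte(len(vals), n, 1), 2), round(flops_per_byte(len(vals), n, nrhs), 2))
```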
Dry eye syndrome is a public health problem and one of the most common conditions seen by eye care specialists. Among the clinical tests for its diagnosis, the evaluation of the interference patterns observed in the tear film lipid layer is often employed. In this context, tear film maps illustrate the spatial distribution of the patterns over the whole tear film and provide useful information to practitioners. However, the creation of a single map usually takes tens of minutes. Medical experts currently demand applications with lower response times in order to provide a faster diagnosis for their patients. In this work, we explore different parallel approaches to accelerate the definition of the tear film map by exploiting the power of today's ubiquitous multicore systems. They can be executed on any multicore system without special software or hardware requirements. The experimental evaluation determines the best approach (on-demand with dynamic seed distribution) and proves that it can significantly decrease the runtime. For instance, the average runtime of our experiments with 50 real-world images on a system with AMD Opteron processors is reduced from more than 20 minutes to one minute and 12 seconds.
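The paper's image-analysis code is not reproduced here; the sketch below only illustrates the on-demand (dynamic) distribution pattern the evaluation favours, using the standard library and a placeholder classify_region function with deliberately uneven per-seed cost.

```python
# Hedged sketch of dynamic ("on-demand") seed distribution on a multicore system.
# classify_region is a stand-in that just burns CPU, not the tear film analysis itself.
import multiprocessing as mp

def classify_region(seed):
    """Stand-in for classifying the interference pattern around one seed point."""
    acc = 0.0
    for k in range(200_000 + seed % 7 * 100_000):   # deliberately uneven cost per seed
        acc += (k % 97) * 1e-6
    return seed, acc

if __name__ == "__main__":
    seeds = list(range(64))                          # seed points covering the tear film
    with mp.Pool() as pool:
        # chunksize=1 hands out one seed at a time, so idle workers immediately pick up
        # the next pending seed instead of waiting on a static, pre-assigned block.
        results = dict(pool.imap_unordered(classify_region, seeds, chunksize=1))
    print(len(results), "regions classified")
```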
Parallelizing industrial simulation codes, such as the EUROPLEXUS software dedicated to the analysis of fast transient phenomena, is challenging. In this paper we focus on efficient parallelization on a multi-core shared memory node. We propose to have each thread gather the data it needs for processing a given iteration range before actually advancing the computation by one time step on this range. This lazy, cache-aware layout construction makes it possible to keep the original data structure and leads to very localised code modifications. We show that this approach can improve the execution time by up to 40% when the task size is set so that the data fit in the L2 cache.
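The following is a schematic illustration of the gather-then-compute pattern described above, not EUROPLEXUS code: each chunk of elements first gathers the nodal values it touches through indirection into a small contiguous buffer, then advances that range by one step. The unstructured-mesh kernel, the chunk size, and all array names are made up; the buffer stands in for the data that is meant to stay resident in a core's L2 cache.

```python
# Schematic, hedged sketch of per-chunk gather-then-compute over an unstructured mesh.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_elems, chunk = 10_000, 20_000, 512
connectivity = rng.integers(0, n_nodes, size=(n_elems, 4))   # 4 nodes per element
node_values = rng.random(n_nodes)
elem_result = np.empty(n_elems)

for start in range(0, n_elems, chunk):
    elems = connectivity[start:start + chunk]
    # Lazy, per-chunk gather: copy just the nodal data this iteration range needs.
    local_nodes, local_idx = np.unique(elems, return_inverse=True)
    gathered = node_values[local_nodes]                        # contiguous buffer
    local_idx = local_idx.reshape(elems.shape)
    # Advance this range by one "time step" using only the gathered buffer.
    elem_result[start:start + chunk] = gathered[local_idx].mean(axis=1)

# Same computation without the per-chunk gather, as a correctness check.
assert np.allclose(elem_result, node_values[connectivity].mean(axis=1))
print("chunked gather-then-compute matches the direct computation")
```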