Search Results - Inner Mongolia University Library

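Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
(subject term "library science" or "information science", author Fan Bingsi, years 1982 to 2016)
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016
(publication "Computer Applications and Software", any field containing C++ or Basic, excluding subject term Visual, years 2011 to 2016)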
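
The field codes and operators in the help text can also be composed programmatically. The following is a minimal sketch under stated assumptions: the helper names `term`, `any_of`, and `all_of` are hypothetical and are not part of the library's system, and the code only builds a query string; it does not contact the catalogue.

```python
# Hypothetical helpers (not part of the catalogue) for composing a query string
# from the field codes and Boolean operators described in the help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}


def term(field: str, value: str) -> str:
    """Return a single FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field}")
    return f"{field}={value}"


def any_of(*terms: str) -> str:
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Join terms with AND; operators are uppercase with one space on each side."""
    return " AND ".join(terms)


# Rebuild the first search example from its parts.
query = all_of(
    any_of(term("K", "图书馆学"), term("K", "情报学")),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(query)
```

Grouping the OR terms in parentheses before joining with AND mirrors the way the first search example is written.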

Refine search results

Document type

3,823 conference papers
185 journal articles
83 books

Collection scope

4,091 electronic documents
1 print holding

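Discipline classification

2,097 records: Engineering
- 1,912 records: Computer Science and Technology...
- 1,024 records: Software Engineering
- 369 records: Electrical Engineering
- 153 records: Information and Communication Engineering
- 137 records: Electronic Science and Technology (may...
- 76 records: Control Science and Engineering
- 30 records: Mechanical Engineering
- 30 records: Bioengineering
- 24 records: Materials Science and Engineering (may...
- 24 records: Biomedical Engineering (may be awarded...
- 22 records: Instrument Science and Technology
- 20 records: Optical Engineering
- 19 records: Architecture
- 17 records: Surveying and Mapping Science and Technology
- 16 records: Civil Engineering
- 13 records: Power Engineering and Engineering Thermo...
- 12 records: Agricultural Engineering
526 records: Science
- 417 records: Mathematics
- 50 records: Physics
- 39 records: Systems Science
- 33 records: Biology
- 30 records: Statistics (may be awarded in science, ...
- 16 records: Chemistry
- 16 records: Geophysics
207 records: Management
- 154 records: Management Science and Engineering (may...
- 61 records: Business Administration
- 54 records: Library, Information and Archives Manage...
19 records: Agriculture
- 14 records: Crop Science
18 records: Law
- 18 records: Sociology
15 records: Economics
- 15 records: Applied Economics
13 records: Medicine
3 records: Literature
3 records: Military Science
2 records: Education
2 records: Art Studies
1 record: Philosophy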

Subject

653 records: parallel process...
545 records: parallel program...
530 records: computer archite...
463 records: parallel archite...
448 records: concurrent compu...
360 records: parallel algorit...
322 records: programming
316 records: hardware
285 records: computer science
276 records: algorithm design...
263 records: computational mo...
214 records: programming prof...
166 records: parallel process...
165 records: dynamic programm...
154 records: application soft...
138 records: program processo...
138 records: costs
137 records: distributed comp...
136 records: libraries
133 records: runtime

Institution

9 records: stanford univ st...
9 records: intel corporatio...
8 records: barcelona superc...
8 records: oak ridge natl l...
8 records: univ calif berke...
7 records: school of comput...
7 records: oak ridge nation...
7 records: carnegie mellon ...
7 records: college of compu...
7 records: oak ridge nation...
7 records: univ texas austi...
6 records: school of comput...
6 records: sandia national ...
6 records: department of co...
6 records: department of co...
6 records: school of comput...
6 records: department of co...
5 records: department of co...
5 records: nvidia corporati...
5 records: pacific northwes...

Author

15 records: jack dongarra
12 records: dongarra jack
11 records: hong shen
10 records: hoefler torsten
9 records: zhong cheng
9 records: olukotun kunle
9 records: gu yan
8 records: chapman barbara
7 records: garcia i.
7 records: forsell martti
7 records: sun yihan
7 records: jigang wu
7 records: nakano koji
7 records: danelutto marco
6 records: cheng zhong
6 records: v.k. prasanna
6 records: blelloch guy e.
6 records: h.j. siegel
6 records: lumsdaine andrew
6 records: tsigas philippas

Language

4,038 records: English
45 records: Other
13 records: Chinese

Search condition: "Any field = International Symposium on Parallel Architectures, Algorithms, and Programming"

4,091 records in total; showing records 191-200

CUDAMicroBench: Microbenchmarks to Assist CUDA Performance Programming

35th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Yi, Xinyao; Stokes, David; Yan, Yonghong; Liao, Chunhua
Affiliations: Univ North Carolina Charlotte, Dept Comp Sci, Charlotte, NC 28223, USA; Lawrence Livermore Natl Lab, Ctr Appl Sci Comp, Livermore, CA 94550, USA

ISBN: (print) 9781665435772

Programming to achieve high performance for NVIDIA GPUs using CUDA has been known to be challenging. A GPU has hundreds or thousands of cores, and a program must exhibit sufficient parallelism to achieve maximum GPU utilization. A system with GPU accelerators has a heterogeneous and deep memory system that programmers must use effectively and correctly to take full advantage of the GPU's parallelism. In this paper, we present CUDAMicroBench, a collection of fourteen microbenchmarks that demonstrate performance challenges in CUDA programming and techniques to optimize CUDA programs to address these challenges. It also includes examples and techniques for using advanced CUDA features, such as data shuffling between threads and dynamic parallelism, that can help users optimize CUDA programs for performance. The microbenchmarks can be used for evaluating the performance of GPU architectures, the memory systems of the GPU itself and of the whole system architecture, and the effectiveness of compilers and performance tools for performance analysis. They can also help users understand the complexity of heterogeneous GPU-accelerator systems through examples and guide users in performance optimization. It is released as BSD-licensed open source at https://***/passlab/***.

Keywords: GPU; CUDA; Performance Optimization; Parallelism; Memory Hierarchy

Evaluation of Programming Models and Performance for Stencil Computation on GPGPUs

IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW)

Authors: Baodi Shan; Mauricio Araya-Polo
Affiliation: TotalEnergies EP Research & Technology US LLC, Houston, Texas, USA

ISBN: (digital) 9798350364606

ISBN: (print) 9798350364613

GPGPUs are widely used in high-performance computing. Therefore, it is crucial to experiment and discover how to better utilize their latest generations for relevant applications. In this paper, we introduce highly tuned stencil-based kernels for NVIDIA A100 and H100 (of a GH200) GPGPUs. Performance results yield useful insights into the behavior of this type of computation on these new accelerators. This knowledge can be leveraged by many scientific applications that involve stencil computations. Further, an evaluation of three different programming models (CUDA, OpenACC, and OpenMP target offloading) is conducted on the aforementioned accelerators. We extensively study the performance and portability of various kernels under each programming model and provide corresponding optimization recommendations. Furthermore, we compare the performance of different programming models on the mentioned architectures. Up to 58% performance improvement was achieved against the previous GPGPU generation for a highly optimized kernel of the same class, and up to 42% for all classes. In terms of programming models, and keeping portability in mind, the optimized OpenACC implementation outperforms the OpenMP implementation by 33%. If portability is not a factor, the best CUDA implementation outperforms the optimized OpenACC one by 2.1×.

Keywords: Solid modeling; Distributed processing; Three-dimensional displays; Computational modeling; High performance computing; Conferences; Graphics processing units

Scalable Parallel Algorithm for Fast Computation of Transitive Closure of Graphs on Shared Memory Architectures

IEEE/ACM 6th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2)

Authors: Patel, Sarthak; Dave, Bhrugu; Kumbhani, Smit; Desai, Mihir; Kumar, Sidharth; Chaudhury, Bhaskar
Affiliations: DA IICT, Grp Computat Sci & HPC, Gandhinagar, India; Univ Alabama Birmingham, Dept Comp Sci, Birmingham, AL 35294, USA

ISBN: (print) 9781665411400

We present a scalable algorithm that computes the transitive closure of a graph on shared memory architectures using the OpenMP API in C++. Two different parallelization strategies have been presented and the performance of the two algorithms has been compared for several data-sets of varying sizes. We demonstrate the scalability of the best parallel implementation up to 176 threads on a shared memory architecture, by producing a graph with more than 3.82 trillion edges. To the best of our knowledge, this is the first implementation that has computed the transitive closure of such a large graph on a shared memory system. Optimization strategies for better cache utilization for large data-sets have been discussed. The important issue of load balancing has been analyzed and its mitigation using the optimal OpenMP scheduling clause has been discussed in detail.

Keywords: Graph algorithms; OpenMP; Transitive Closure; Scalability; Shared memory
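
For readers unfamiliar with the problem named in this abstract, the following is a minimal serial sketch of transitive closure using Warshall's algorithm on a Boolean adjacency matrix. It is not the paper's algorithm, which the abstract describes only as an OpenMP implementation in C++; the function name `transitive_closure` and the small example graph are assumptions made here for illustration.

```python
def transitive_closure(adj):
    """Return the reachability matrix of a directed graph (Warshall's algorithm).

    adj[i][j] is True when there is an edge i -> j. The input is not modified.
    """
    n = len(adj)
    reach = [row[:] for row in adj]  # work on a copy of the adjacency matrix
    for k in range(n):
        row_k = reach[k]
        # For a fixed k, each row i is updated independently of the others;
        # this is the loop a shared-memory version would typically parallelize.
        for i in range(n):
            if reach[i][k]:
                row_i = reach[i]
                for j in range(n):
                    if row_k[j]:
                        row_i[j] = True
    return reach


# Example: a chain 0 -> 1 -> 2 plus an isolated vertex 3.
adj = [
    [False, True, False, False],
    [False, False, True, False],
    [False, False, False, False],
    [False, False, False, False],
]
closure = transitive_closure(adj)
print(closure[0][2])  # vertex 2 is reachable from vertex 0 through vertex 1
```

The triple loop costs O(n^3) time and O(n^2) memory, which is why cache behaviour and load balancing, both mentioned in the abstract, matter for large graphs.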

An Evolvable Swarm-Parallel Algorithm Framework for Satellite-Ground Networking Problem

International Symposium on Aerospace Engineering and Systems (ISAES)

Authors: Cheng Chen; Yonghao Du; Feng Yao
Affiliation: College of Systems Engineering, National University of Defense Technology, Changsha, China

ISBN: (digital) 9798350350418

ISBN: (print) 9798350350425

Satellite Internet has been deployed rapidly, leading to an explosive growth in satellite scale, further strengthening the contradiction between the growing demand for satellite-ground communication and the limited ground station resources. To address this challenge, this paper introduces an evolvable swarm-parallel algorithm framework (ESPAF), which comprises a swarm-parallel solving and deconflicting framework, a learning-assisted metaheuristic (LM) fast-solving algorithm that combines reinforcement learning (RL) and Tabu simulated annealing (TSA), and a linear programming (LP) exact-solving algorithm. The ESPAF breaks through the traditional task-driven satellite scheduling approach found in existing research and presents a novel and appropriate method for optimizing satellite-ground networking under a new macro paradigm. A series of experimental instances involving up to 1,000 satellites and 100 ground stations demonstrate the efficient performance of ESPAF. Comparative experiments with alternative metaheuristic algorithms and the CPLEX solver further emphasized ESPAF's ability to generate high-quality solutions more rapidly.

Keywords: Satellites; Metaheuristics; Simulated annealing; Reinforcement learning; Integer linear programming; Linear programming; Mathematical models; Explosives; Internet; Mathematical programming

CaSMap: Agile Mapper for Reconfigurable Spatial Architectures by Automatically Clustering Intermediate Representations and Scattering Mapping Process

49th IEEE/ACM Annual International Symposium on Computer Architecture (ISCA)

Authors: Man, Xingchen; Zhu, Jianfeng; Song, Guihuan; Yin, Shouyi; Wei, Shaojun; Liu, Leibo
Affiliation: Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol BNRis, Sch Integrated Circuits, Beijing, Peoples R China

ISBN: (print) 9781450386104

Today, reconfigurable spatial architectures (RSAs) have sprung up as accelerators for compute- and data-intensive domains because they deliver energy and area efficiency close to ASICs and still retain sufficient programmability to keep the development cost low. The mapper, which is responsible for mapping algorithms onto RSAs, favors a systematic backtracking methodology because of its high portability across evolving RSA designs. However, exponentially scaling compilation time has become the major obstacle. The key observation of this paper is that the limiting factor of systematic backtracking mappers is the waterfall mapping model, which resolves all mapping variables and constraints at the same time using single-level intermediate representations (IRs). This work proposes CaSMap, an agile mapper framework independent of the software and hardware of RSAs. By clustering the lowest-level software and hardware IRs into multi-level IRs, the original mapping process can be scattered into multi-stage decomposed ones, and the mapping problem with exponential complexity is therefore mitigated. This paper introduces (a) strategies for clustering low-level hardware and software IRs with static connectivity and critical path analysis, and (b) a multi-level scattered mapping model in which the higher-level model carries out the heuristics from IR clustering, endeavors to promote mapping success rate, and reduces the scale of the lower-level model. Our evaluation shows that CaSMap is able to reduce the problem scale (nonzeros) by 80.5% (23.1%-94.9%) and achieve a mapping time speedup of 83x over the state-of-the-art waterfall mapper across four different RSA topologies: MorphoSys, HReA, HyCUBE, and REVEL.

Keywords: Reconfigurable Spatial Architecture; Coarse-Grained Reconfigurable Architecture; Compiler; Integer Linear Programming

Parla: A Python Orchestration System for Heterogeneous Architectures

International Conference for High Performance Computing, Networking, Storage and Analysis (HPC)

Authors: Lee, Hochan; Ruys, William; Henriksen, Ian; Peters, Arthur; Yan, Yineng; Stephens, Sean; You, Bozhi; Fingler, Henrique; Burtscher, Martin; Gligoric, Milos; Schulz, Karl; Pingali, Keshav; Rossbach, Christopher J.; Erez, Mattan; Biros, George
Affiliations: Univ Texas Austin, Austin, TX 78712, USA; Texas State Univ, San Marcos, TX, USA

ISBN: (print) 9781665454445

Python's ease of use and rich collection of numeric libraries make it an excellent choice for rapidly developing scientific applications. However, composing these libraries to take advantage of complex heterogeneous nodes is still difficult. To simplify writing multi-device code, we created Parla, a heterogeneous task-based programming framework that fully supports Python's scientific programming stack. Parla's API is based on Python decorators and allows users to wrap code in Parla tasks for parallel execution. Parla arrays enable automatic movement of data between devices. The Parla runtime handles resource-aware mapping, scheduling, and execution of tasks. Compared to other Python tasking systems, Parla is unique in its parallelization of tasks within a single process, its GPU context and resource-aware runtime, and its design around gradual adoption to provide easy migration of and integration into existing Python applications. We show that Parla can achieve performance competitive with hand-optimized code while improving ease of development.

Keywords: parallel application frameworks; task based parallelism; heterogeneous computing; load balancing and scheduling algorithms

Traffic Speed Prediction of Road Cluster with Heterogeneous Sampling Frequency

International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)

Authors: Guiyuan Jiang; Peilan He; Jigang Wu; Yidan Sun; Thambipillai Srikanthan
Affiliations: Ocean University of China, Qingdao, China; Guangdong University of Technology, Guangzhou, China; Nanyang Technological University, Singapore

ISBN: (print) 9781665452199

Accurate short-term road traffic prediction is essential for intelligent transportation systems, supporting tasks such as traffic management, travel route planning, and navigation. Existing works typically provide the prediction for one individual road segment at a time. Even though some models aim to simultaneously predict the traffic of a cluster of road segments, they usually assume that the road cluster has a regular network topology (e.g., a ring network or grid network). These methods cannot be easily extended to road networks of arbitrary graph topology. This paper addresses the problem of traffic speed prediction for a cluster of road segments with arbitrary topology and heterogeneous sampling frequency of traffic states. We propose a novel prediction framework consisting of three modules: network partitioning, feature extraction, and traffic prediction. The first module divides the entire traffic network into several disjoint clusters with high intra-cluster similarity and low inter-cluster similarity, based on our proposed metrics for measuring the similarity of time series with heterogeneous sampling frequency. The second module extracts features that capture temporal correlations of speed series and contextual factors (e.g., road network characteristics and extrinsic factors) while considering the heterogeneity in data frequency. The third module relies on the obtained features to simultaneously predict the traffic states of all road segments in a cluster, where the spatial correlations among roadways are captured via an attention mechanism. The performance is evaluated using large-scale real-world traffic data involving 42 bus services.

Keywords: Time-frequency analysis; Correlation; Network topology; Roads; Time series analysis; Programming; Feature extraction

Parallel Programming Models and Paradigms: OpenMP Analysis

5th International Conference on Computing Methodologies and Communication, ICCMC 2021

Author: Alrawais, Arwa
Affiliation: Prince Sattam Bin Abdulaziz University, College of Computer Engineering and Sciences, Al-Kharj 11942, Saudi Arabia

ISBN: (print) 9781665403603

The demand for processing power has grown over the years, and this demand has led to the parallel approach, which means linking a number of computers together to jointly increase both speed and efficiency. The parallel approach plays a significant role in new-generation applications by moving the technology from expensive, specialized parallel supercomputers to sets of linked computers. Over the years, the parallel approach has led to parallel programming models, which exist above hardware and memory architectures. A parallel programming model is a collection of software technologies that present parallel algorithms and match applications with the underlying system. This paper describes the essential concepts of parallel programming and gives a brief overview of different areas of parallel programming models and paradigms. Furthermore, it implements and evaluates OpenMP parallel programming and illustrates its effectiveness. © 2021 IEEE.

Keywords: Parallel programming

ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels

IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW)

Authors: Ali TehraniJamsaz; Alok Mishra; Akash Dutta; Abid M. Malik; Barbara Chapman; Ali Jannesari
Affiliations: Iowa State University, Ames, Iowa, USA; Hewlett Packard Enterprise, Milpitas, California, USA; Stony Brook University, Stony Brook, New York, USA

ISBN: (digital) 9798350364606

ISBN: (print) 9798350364613

GPU-based HPC clusters are attracting more scientific application developers due to their extensive parallelism and energy efficiency. In order to achieve portability among a variety of multi/many core architectures, a popular choice for an application developer is to utilize directive-based parallel programming models, such as OpenMP. However, even with OpenMP, the developer must choose from among many strategies for exploiting a GPU or a CPU. This paper introduces a new graph-based program representation for optimization of OpenMP applications. The originality of this work lies in the augmentations of Abstract Syntax Trees (ASTs) and the introduction of edge weights to account for loop and condition information. We evaluate our proposed representation by training a Graph Neural Network (GNN) to predict the runtime of OpenMP code regions across CPUs and GPUs. Various transformations utilizing collapse and data transfer between the CPU and GPU are used to construct the dataset. The trained model is used to determine which transformation provides the best performance. Results indicate that our approach is effective and has normalized RMSE as low as $4\times 10^{-3}$ to at most $1\times 10^{-2}$ in its runtime predictions.

Keywords: Training; Runtime; Parallel programming; Graphics processing units; Syntactics; Parallel processing; Graph neural networks

A Novel DA-Based Parallel Architecture for Inner-Product of Variable Vectors

IEEE International Symposium on Circuits and Systems (ISCAS)

Authors: Anil Kali; Samrat L. Sabat; Pramod K. Meher
Affiliations: CASEST, University of Hyderabad, Hyderabad, India; Department of Computer Science & Engineering, C. V. Raman Global University, Bhubaneswar, India

ISBN: (digital) 9798350330991

ISBN: (print) 9798350331004

Computation of the inner products is frequently used in machine learning (ML) algorithms apart from signal processing and communication applications. Distributed arithmetic (DA) has been frequently employed for area-time efficient inner-product implementations. In conventional DA-based architectures, one of the vectors is constant and known a priori. Hence, the traditional DA architectures are not suitable when both vectors are variable. However, computing the inner product of a pair of variable vectors is frequently used for matrix multiplication of various forms and convolutional neural networks. In this paper, we present a novel DA-based architecture for computing the inner product of variable vectors. To derive the proposed architecture, the inner product of any given length is decomposed into a set of short-length inner products, such that the inner product could be computed by successive accumulation of the results of short-length inner products. We have designed a DA-based architecture for the computation of the short-length inner-product of variable vectors and used that in successive clock cycles to compute the whole inner-product by successive accumulation. The post-layout synthesis results using Cadence Innovus with a GPDK 90nm technology library show that the proposed DA-based parallel architecture offers significant advantages in area-delay product and energy consumption over the bit-serial DA architecture.

Keywords: Energy consumption; Machine learning algorithms; Architecture; Signal processing algorithms; Machine learning; Signal processing; Vectors
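
The decomposition step mentioned in this abstract, splitting a long inner product into short-length inner products whose results are accumulated one after another, can be sketched in software as follows. This illustrates only the arithmetic decomposition, not the distributed-arithmetic hardware itself; the function name `chunked_inner_product` and the default chunk length of 4 are assumptions made here, not values from the paper.

```python
def chunked_inner_product(a, b, chunk=4):
    """Compute sum(a[i] * b[i]) by accumulating short-length inner products.

    Each pass of the loop plays the role of one step in which a short inner
    product of at most `chunk` terms is computed and added to the accumulator.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if chunk < 1:
        raise ValueError("chunk length must be at least 1")
    acc = 0
    for start in range(0, len(a), chunk):
        short_a = a[start:start + chunk]
        short_b = b[start:start + chunk]
        acc += sum(x * y for x, y in zip(short_a, short_b))
    return acc


# Example: two length-10 vectors processed in chunks of 4, 4, and 2 terms.
a = list(range(1, 11))      # 1, 2, ..., 10
b = list(range(10, 0, -1))  # 10, 9, ..., 1
result = chunked_inner_product(a, b, chunk=4)
print(result)
```

Because addition is associative, the chunked result equals the direct inner product for any chunk length, so the chunk length can be chosen to suit the hardware.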

410 pages in total

Address: No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region; postal code 010021
