ISBN (print): 9781665440660
Lexical analysis, which converts input text into a list of tokens, plays an important role in many applications, including compilation and data extraction from texts. To recognize token patterns, a lexer incorporates a sequential computation model, the automaton, as its basic building component. As such, it is considered difficult to parallelize due to the inherent data dependency. Much work has been done to accelerate lexical analysis through parallel techniques. Unfortunately, existing attempts mainly rely on language-specific remedies for input segmentation, which makes language extension tricky and automatic lexer generation challenging. This paper presents Plex, an automated tool for generating parallel lexers from user-defined grammars. To overcome the inherent sequentiality, Plex applies a fast prescanning phase to collect context information prior to scanning. To reduce the overhead introduced by prescanning, Plex adopts a special automaton, derived from that of the scanner, to avoid backtracking behavior, and exploits data-parallel techniques. The evaluation on several languages shows that the prescanning overhead is small; consequently, Plex is scalable and achieves 9.8-11.5x speedups using 18 threads.
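The abstract does not detail Plex's prescanning automaton, but the general idea behind data-parallel scanning can be illustrated with the classic state-transition-map approach: each input chunk is independently summarized as a map from every possible DFA start state to the resulting end state, and those maps are then composed, which removes the sequential dependency between chunks. The toy two-state "inside/outside string literal" DFA below is purely illustrative, not Plex's actual automaton.

```python
# Minimal sketch of data-parallel scanning (illustrative, not Plex's algorithm):
# each chunk is summarized as a state -> state transition map; the maps can be
# built independently (in parallel) and composed left to right as a prefix scan.

# Toy DFA: state 0 = outside a string literal, state 1 = inside one.
def step(state, ch):
    if ch == '"':
        return 1 - state  # a quote toggles between outside and inside
    return state

def chunk_map(chunk):
    # Summarize a chunk as its effect on every possible start state.
    out = {}
    for s in (0, 1):
        cur = s
        for ch in chunk:
            cur = step(cur, ch)
        out[s] = cur
    return out

def compose(m1, m2):
    # Apply m1's transition first, then m2's.
    return {s: m2[m1[s]] for s in m1}

text = 'abc"def"ghi"jk'
chunks = [text[i:i + 4] for i in range(0, len(text), 4)]
maps = [chunk_map(c) for c in chunks]  # independent work, parallelizable
total = maps[0]
for m in maps[1:]:
    total = compose(total, m)
assert total[0] == 1  # three quotes in total: scanning ends inside a string
```

The composition step is associative, which is what makes a parallel prefix scan over the chunk maps possible; the number of start states to track is what a prescanning phase can help reduce.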
ISBN (print): 9781665435772
The increasing use of real-time data-intensive applications and the growing interest in heterogeneous architectures have led to the need for increasingly complex embedded computing systems. An example of this is the research carried out by both the scientific community and companies toward embedded multi-FPGA systems for implementing the inference phase of Convolutional Neural Networks. In this paper, we focus on optimizing the management system of these embedded FPGA-based distributed systems. We extend the state-of-the-art FARD framework to data-intensive applications in an embedded scenario. Our orchestration and management infrastructure benefits from a compiled language and is accessible to end users by means of Python APIs, which provide a simple way to interact with the cluster and design apps to run on the embedded nodes. The proposed prototype system consists of a PYNQ-based cluster of multiple FPGAs and has been evaluated by running an FPGA-based You Only Look Once (YOLO) image classification algorithm.
ISBN (print): 9781665435772
This paper revisits distributed termination detection algorithms in the context of High-Performance Computing (HPC) applications. We introduce an efficient variant of the Credit Distribution Algorithm (CDA) and compare it to the original algorithm (HCDA) as well as to its two primary competitors: the Four Counters algorithm (4C) and the Efficient Delay-Optimal Distributed algorithm (EDOD). We analyze the behavior of each algorithm for some simplified task-based kernels and show the superiority of CDA in terms of the number of control messages.
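The core invariant of credit-based termination detection can be sketched compactly: a controller starts with total credit 1, every spawned task carries a share of its parent's credit, and finished tasks return their remaining credit. Termination is detected exactly when the controller has recovered credit 1. The sequential simulation below is a simplified illustration of this invariant, not the paper's HCDA or its optimized CDA variant.

```python
# Simplified sketch of credit-based termination detection. Exact rational
# arithmetic (fractions) avoids the floating-point drift that would break
# the "sum of all credits == 1" invariant.
from fractions import Fraction
import collections

def run(tasks_spawned_by):
    # tasks_spawned_by: task name -> list of child tasks it spawns.
    recovered = Fraction(0)
    queue = collections.deque([("root", Fraction(1))])
    while queue:
        task, credit = queue.popleft()
        children = tasks_spawned_by.get(task, [])
        if children:
            # Split this task's credit evenly among itself and its children.
            share = credit / (len(children) + 1)
            for c in children:
                queue.append((c, share))
            credit -= share * len(children)
        recovered += credit  # task finishes and returns its remaining credit
    return recovered

# root spawns a and b; a spawns c. All credit must come back, exactly.
assert run({"root": ["a", "b"], "a": ["c"]}) == 1
```

The attraction of this family of algorithms is that detection requires no extra probing rounds: the controller only has to compare the recovered credit against 1.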
ISBN (print): 9781665440660
This paper presents a workflow for synthesizing near-optimal FPGA implementations of structured-mesh-based stencil applications for explicit solvers. It leverages key characteristics of the application class, its computation-communication pattern, and the architectural capabilities of the FPGA to accelerate solvers for high-performance computing applications. Key new features of the workflow are (1) the unification of standard state-of-the-art techniques with a number of high-gain optimizations, such as batching and spatial blocking/tiling, motivated by increasing throughput for real-world workloads, and (2) the development and use of a predictive analytical model to explore the design space and obtain resource and performance estimates. Three representative applications are implemented using the design workflow on a Xilinx Alveo U280 FPGA, demonstrating near-optimal performance and over 85% predictive model accuracy. These are compared with equivalent highly optimized implementations of the same applications on modern HPC-grade GPUs (Nvidia V100), analyzing time to solution, bandwidth, and energy consumption. Performance results indicate runtimes comparable with the V100 GPU, with over 2x energy savings for the largest non-trivial application on the FPGA. Our investigation shows the challenges of achieving high performance on current-generation FPGAs compared to traditional architectures. We discuss determinants for a given stencil code to be amenable to FPGA implementation, providing insights into the feasibility and profitability of a design and its resulting performance.
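Spatial blocking/tiling, one of the optimizations the workflow unifies, can be shown in its simplest software form: sweeping the grid tile by tile so each tile's working set fits in fast local memory (on an FPGA, BRAM). The 1D three-point stencil below is a generic illustration of the transformation, not the paper's pipelined FPGA datapath.

```python
# Sketch of spatial blocking/tiling for a 1D three-point averaging stencil.
# The tiled version processes the interior in fixed-size tiles; since it reads
# from the input array u and writes to out, it is numerically identical to the
# untiled reference sweep.

def stencil_tiled(u, tile=4):
    n = len(u)
    out = u[:]  # boundary values are kept unchanged
    for start in range(1, n - 1, tile):
        end = min(start + tile, n - 1)
        for i in range(start, end):  # one tile's worth of work
            out[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return out

def stencil_ref(u):
    # Untiled reference: one pass over the whole interior.
    return ([u[0]]
            + [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
            + [u[-1]])

u = [float(i % 5) for i in range(16)]
assert stencil_tiled(u) == stencil_ref(u)
```

On real hardware the payoff comes from reuse: each tile's neighbors are loaded once into local memory instead of being re-fetched from external memory for every output point.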
ISBN (print): 9781665440660
One-way Wave Equation Migration (OWEM) is a classic seismic imaging method offering a good trade-off between quality and compute cost in most geological cases. In recent years, GPU-based heterogeneous architectures have gained popularity for seismic imaging. In this paper, we present a generic design for asynchronous processing and data management. By applying this design, we present an efficient GPU implementation of OWEM combining OpenACC and CUDA. Our approach improves upon classic designs by exploiting asynchronous compute and data transfer between CPU and GPU over high-speed NVLink, completely masking the cost of MPI communications and I/O. Using 3,018 GPUs, our fine-tuned OWEM can process 11,172 seismic shots in less than 75 minutes. By tuning CPU and GPU clock frequencies, we achieve around 30% energy savings with only a 4% loss of performance on the PANGEA III supercomputer. We believe our design combined with the energy-aware tuning will benefit many GPU applications.
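The overlap pattern behind "completely masking" transfer and I/O costs is double buffering: while the device computes on one staged input, the next input is transferred into a second buffer. The sequential sketch below only shows the scheduling structure (in a real CUDA/OpenACC implementation the `transfer` call on the next item would run concurrently with `compute` on the current one, e.g. in a separate stream); the function names are illustrative, not from the paper.

```python
# Sketch of a double-buffered pipeline: stage item i+1 while computing on
# item i, so transfer time hides behind compute time. Here the two steps run
# sequentially; the structure is what matters.

def pipeline(inputs, transfer, compute):
    results = []
    staged = transfer(inputs[0])          # fill the first buffer
    for nxt in inputs[1:]:
        next_staged = transfer(nxt)       # would overlap with compute below
        results.append(compute(staged))
        staged = next_staged              # swap buffers
    results.append(compute(staged))       # drain the last buffer
    return results

assert pipeline([1, 2, 3], lambda x: x * 10, lambda x: x + 1) == [11, 21, 31]
```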
ISBN (print): 9781665440660
Performance of applications in production environments can be sensitive to network congestion. Cray Aries supports adaptively routing each network packet independently based on the load or congestion encountered as a packet traverses the network. Software can dictate different routing policies, adjusting between minimal and non-minimal bias, for each posted message. We have extensively evaluated the sensitivity of application performance, as well as whole-system performance, to the routing bias selection in both production and controlled conditions. We show that the default routing bias used in Aries-based systems is often sub-optimal and that using a higher bias toward minimal routes not only reduces the congestion effects on the application but also decreases the overall congestion on the network. This routing scheme results not only in improved mean performance (by up to 12%) for most production applications but also in reduced run-to-run variability. Our study prompted two supercomputing facilities (ALCF and NERSC) to change the default routing mode on their Aries-based systems. We present the substantial improvement measured in overall congestion management and interconnect performance in production after making this change.
ISBN (print): 9781665440660
Conventional High-Level Synthesis (HLS) tools exploit parallelism mostly at the instruction level (ILP). They statically schedule the input specifications and build centralized Finite State Machine (FSM) controllers. However, aggressive exploitation of ILP has diminishing returns in many applications, and centralized approaches usually do not efficiently exploit coarser parallelism, because FSMs are inherently serial. In this paper we present an HLS framework able to synthesize applications that, besides ILP, also expose Task-Level Parallelism (TLP). An application can expose TLP through annotations that identify the parallel functions (i.e., tasks). To generate accelerators that efficiently execute concurrent tasks, we need to solve several issues: devise a mechanism to support concurrent execution flows, exploit memory parallelism, and manage synchronization. To support concurrent execution flows, we introduce a novel adaptive controller. The adaptive controller is composed of a set of interacting control elements that independently manage the execution of a task. These control elements check dependencies and resource constraints at runtime, enabling execution as soon as possible. To support parallel access to shared memories and synchronization, we integrate a novel Hierarchical Memory Interface (HMI). With respect to previous solutions, the proposed interface supports multi-ported memories and atomic memory operations, which commonly occur in parallel programming. Our framework can generate the hardware implementation of C functions through two different approaches, depending on the function's characteristics. If a function exposes TLP, the framework generates hardware implementations based on the adaptive controller. Otherwise, the framework implements the function through the FSM approach, which is optimized for ILP exploitation. We evaluate our framework on a set of parallel applications and show substantial performance improvements (average speedup of 4.
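The contrast between a fixed FSM schedule and the adaptive controller's runtime dependency checking can be illustrated in software: instead of firing tasks in a precomputed order, each task starts as soon as all of its predecessors have completed. The scheduler below is a loose software analogue of that idea (the hardware control elements are distributed, not a central loop), with illustrative task names.

```python
# Sketch of runtime dependency checking: a task becomes "ready" the moment
# all of its predecessors are done, rather than at a statically scheduled
# FSM state. Assumes the dependency graph is acyclic.

def schedule(deps):
    # deps: task -> set of tasks it depends on.
    done, order = set(), []
    pending = set(deps)
    while pending:
        ready = sorted(t for t in pending if deps[t] <= done)
        for t in ready:            # in hardware, these run concurrently
            order.append(t)
            done.add(t)
        pending -= set(ready)
    return order

deps = {"a": set(), "b": set(), "c": {"a", "b"}, "d": {"a"}}
order = schedule(deps)
# "c" may only start after both "a" and "b" have finished.
assert order.index("c") > order.index("a")
assert order.index("c") > order.index("b")
```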
ISBN (print): 9781665435741
Timely and efficient air traffic flow statistics play a significant role in improving the accuracy and intelligence of air traffic flow management (ATFM). The enormous spatio-temporal data collected by location-based services (LBS) greatly aggravate the burden of the statistical tasks. Traditional approaches to these tasks show weakness in two respects: 1) they fail to capture the features of complicated three-dimensional, time-dependent airspace, and 2) they are not optimized to deal with large-volume spatio-temporal data covering high-dimensional features. Spatio-temporal range queries have advantages in computing the eligible flow records. Therefore, exploring the efficiency of distributed range query processing methods helps improve the performance of air traffic flow statistics and gain insights into the rationality of the air traffic. To analyze large-scale spatio-temporal aviation data efficiently, we propose two spatio-temporal range query MapReduce algorithms: 1) a spatio-temporal polygon range query, which aims to find all records from a polygonal location in a time interval, and 2) a spatio-temporal k-nearest-neighbors query, which directly searches the k closest neighbors of the query point. Moreover, we design an air traffic flow statistics strategy to accurately calculate traffic flow in arbitrary airspace based on real-world aviation trajectory datasets. The experimental results demonstrate that our algorithms answer spatio-temporal range queries better than counterpart algorithms, reducing the average response time by 81%. The evaluation also proves the effectiveness of our algorithms for air traffic flow statistics.
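The per-record predicate at the heart of the polygon range query can be sketched directly: a record qualifies if its timestamp falls in the query interval and its position lies inside the query polygon (here via standard ray casting). In the paper's setting this filter would run as the map phase over partitioned trajectory records; the record layout below is an assumption for illustration.

```python
# Sketch of the filter step of a spatio-temporal polygon range query:
# time-interval check plus ray-casting point-in-polygon test.

def in_polygon(pt, poly):
    # Ray casting: count crossings of a horizontal ray from pt to the right.
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y-level
            if x < (x2 - x1) * (y - y1) / (y2 - y1) + x1:
                inside = not inside
    return inside

def range_query(records, poly, t0, t1):
    # Records are assumed to carry position (x, y) and timestamp t.
    return [r for r in records if t0 <= r["t"] <= t1
            and in_polygon((r["x"], r["y"]), poly)]

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
recs = [{"x": 5, "y": 5, "t": 3},    # inside polygon, inside interval
        {"x": 15, "y": 5, "t": 3},   # outside polygon
        {"x": 5, "y": 5, "t": 99}]   # outside time interval
assert range_query(recs, square, 0, 10) == [{"x": 5, "y": 5, "t": 3}]
```

The distributed versions mainly add spatial/temporal partitioning so that only the partitions overlapping the query region need to evaluate this predicate.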
ISBN (print): 9781665435772
Many science and industry IoT applications necessitate data processing across the edge-to-cloud continuum to meet performance, security, cost, and privacy requirements. However, this requires diverse abstractions and infrastructures for managing resources and tasks across the continuum. We propose Pilot-Edge as a common abstraction for resource management across the edge-to-cloud continuum. Pilot-Edge is based on the pilot abstraction, which decouples resource and workload management, and provides a Function-as-a-Service (FaaS) interface for application-level tasks. The abstraction allows applications to encapsulate common functions in high-level tasks that can then be configured and deployed across the continuum. We characterize Pilot-Edge on geographically distributed infrastructures using machine learning workloads (e.g., k-means and auto-encoders). Our experiments demonstrate how Pilot-Edge manages distributed resources and allows applications to evaluate task placement based on multiple factors (e.g., model complexities, throughput, and latency).
ISBN (print): 9781665440660
Empirical performance modeling is a proven instrument for analyzing the scaling behavior of HPC applications. Using a set of smaller-scale experiments, it can provide important insights into application behavior at larger scales. Extra-P is an empirical modeling tool that applies linear regression to automatically generate human-readable performance models. Similar to other regression-based modeling techniques, the accuracy of the models created by Extra-P decreases as the amount of noise in the underlying data increases. This is why the performance variability observed in many contemporary systems can become a serious challenge. In this paper, we introduce a novel adaptive modeling approach that makes Extra-P more noise-resilient, exploiting the ability of deep neural networks to discover the effects of numerical parameters, such as the number of processes or the problem size, on performance when dealing with noisy measurements. Using synthetic analysis and data from three different case studies, we demonstrate that our solution improves model accuracy at high noise levels by up to 25% while increasing predictive power by about 15%.
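The regression at the core of this style of empirical modeling can be illustrated with its simplest instance: fitting a single power-law term, runtime ≈ c · pᵃ, by least squares in log-log space. This tiny sketch captures only one term of the model families tools like Extra-P search over, and the synthetic measurements are noise-free for clarity.

```python
# Minimal sketch of regression-based empirical performance modeling:
# fit runtime = c * p^a by ordinary least squares on log-transformed data,
# since log(t) = log(c) + a * log(p) is linear in log(p).
import math

def fit_power_law(ps, times):
    xs = [math.log(p) for p in ps]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))   # slope = exponent
    c = math.exp(my - a * mx)                # intercept = log of the constant
    return c, a

ps = [2, 4, 8, 16, 32]                       # process counts
times = [3.0 * p ** 1.5 for p in ps]         # synthetic, noise-free runtimes
c, a = fit_power_law(ps, times)
assert abs(a - 1.5) < 1e-9 and abs(c - 3.0) < 1e-6
```

With noisy measurements the recovered exponent degrades, which is exactly the failure mode the paper's noise-resilient extension targets.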