ISBN (print): 9781665481069
I/O-intensive applications are important workloads of public clouds. Multiple cloud applications co-run on the same physical machine in different virtual machines (VMs), and the shared resources (e.g., disk bandwidth) are often isolated for fairness. Our investigation shows that the performance of an I/O-intensive application is impacted by both the disk bandwidth allocation and the page cache settings in the guest operating system. However, no prior work considers adjusting the page cache settings for better performance when the disk bandwidth allocation is adjusted. We therefore propose CSC, a system that collaboratively identifies the appropriate disk bandwidth allocation and page cache settings in the guest operating system of each VM. CSC aims to improve the system-wide I/O throughput of the physical machine, while also improving the I/O throughput of each individual I/O-intensive application in the VMs. CSC comprises an online disk bandwidth allocator and an adaptive dirty page setting optimizer. The bandwidth allocator monitors disk bandwidth utilization and periodically re-allocates bandwidth from idle VMs to busy VMs. After the re-allocation, the optimizer identifies the appropriate dirty page settings in the guest operating system of the VMs using Bayesian Optimization. The experimental results show that CSC improves the performance of I/O-intensive applications by 9.5% on average (up to 17.29%) when 5 VMs are co-located, while fairness is guaranteed.
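To make the dirty-page tuning step concrete, the following minimal sketch searches over the standard Linux sysctls vm.dirty_background_ratio and vm.dirty_ratio with off-the-shelf Bayesian optimization (scikit-optimize). The choice of knobs, the search ranges, and the measure_throughput() helper are illustrative assumptions; this is not CSC's implementation, which tunes the settings inside each VM's guest OS.

    # Illustrative sketch only: Bayesian optimization over two standard
    # Linux dirty-page sysctls. The knob choice, ranges, and benchmark
    # hook are assumptions, not CSC's actual optimizer.
    import subprocess
    from skopt import gp_minimize
    from skopt.space import Integer

    def set_dirty_settings(background_ratio, dirty_ratio):
        subprocess.run(["sysctl", "-w", f"vm.dirty_background_ratio={background_ratio}"], check=True)
        subprocess.run(["sysctl", "-w", f"vm.dirty_ratio={dirty_ratio}"], check=True)

    def measure_throughput():
        # Placeholder: run the I/O workload of interest and return MB/s.
        raise NotImplementedError

    def objective(params):
        background_ratio, dirty_ratio = params
        if background_ratio >= dirty_ratio:   # keep the settings self-consistent
            return 1e9
        set_dirty_settings(background_ratio, dirty_ratio)
        return -measure_throughput()          # gp_minimize minimizes, so negate

    result = gp_minimize(objective, dimensions=[Integer(1, 40), Integer(5, 80)], n_calls=20)
    print("best settings:", result.x, "best throughput:", -result.fun)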
ISBN (print): 9781665497473
Linear algebra operations, which are ubiquitous in machine learning, form major performance bottlenecks. The High-Performance Computing community invests significant effort in the development of architecture-specific optimized kernels, such as those provided by the BLAS and LAPACK libraries, to speed up linear algebra operations. However, end users are progressively less likely to go through the error-prone and time-consuming process of directly using said kernels; instead, frameworks such as TensorFlow (TF) and PyTorch (PyT), which facilitate the development of machine learning applications, are becoming more and more popular. Although such frameworks link to BLAS and LAPACK, it is not clear whether or not they make use of linear algebra knowledge to speed up computations. For this reason, in this paper we develop benchmarks to investigate the linear algebra optimization capabilities of TF and PyT. Our analyses reveal that a number of linear algebra optimizations are still missing; for instance, reducing the number of scalar operations by applying the distributive law, and automatically identifying the optimal parenthesization of a matrix chain. In this work, we focus on linear algebra computations in TF and PyT; we both expose opportunities for performance enhancement to the benefit of the developers of the frameworks and provide end users with guidelines on how to achieve performance gains.
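The two missed optimizations named above are easy to see with a toy measurement. The sketch below uses NumPy for brevity (TF and PyT dispatch to the same BLAS kernels, so the operation-count argument is the same); the sizes are illustrative and this is not one of the paper's benchmarks.

    # Distributive law: A@B + A@C needs two GEMMs; A@(B + C) needs one
    # GEMM plus a cheap addition. Chain ordering: (A@B)@v costs O(n^3),
    # while A@(B@v) costs only O(n^2).
    import time
    import numpy as np

    rng = np.random.default_rng(0)
    n = 2000
    A, B, C = (rng.standard_normal((n, n)) for _ in range(3))
    v = rng.standard_normal((n, 1))

    def bench(fn):
        t0 = time.perf_counter()
        fn()
        return time.perf_counter() - t0

    print("A@B + A@C:", bench(lambda: A @ B + A @ C))
    print("A@(B + C):", bench(lambda: A @ (B + C)))
    print("(A@B)@v  :", bench(lambda: (A @ B) @ v))
    print("A@(B@v)  :", bench(lambda: A @ (B @ v)))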
An optimized mathematical model has been developed to streamline gas turbine production by targeting the minimization of the overall completion time, known as makespan. The production process is broken down into a ser...
ISBN (print): 9781665497473
Coarse-Grained Reconfigurable Architecture (CGRA) is a promising platform for HPC systems in the post-Moore's era. A single-source programming model is essential for practical heterogeneous computing. However, there is no canonical programming model or frontend compiler for CGRAs. The diversity of existing CGRAs, with respect to their execution model, computational capability, and system structure, magnifies the difficulty of orchestrating compiler techniques. This forces CGRA designers to develop compilers from scratch that work only for their own architectures, an outdated approach compared with other successful accelerators such as GPUs and FPGAs. This paper presents a new CGRA compiler framework that reduces the development effort of CGRA applications. OpenMP-annotated code is fed into the proposed compiler, as recent versions of OpenMP support device offloading to accelerators; this improves the reusability of existing source code for HPC workloads. The design of the compiler is inspired by LLVM, the most widely used compiler framework, and the frontend is built to be architecture-independent. In this work, we demonstrate that the proposed compiler can handle different types of CGRAs without changing the source code. In addition, we discuss the effect of architecture-independent optimization algorithms. We also provide an open-source implementation of the compiler framework at https://***/hal-lab-u-tokyo/CGRAOmp.
ISBN (print): 9781665481069
Finding patterns in large highly connected datasets is critical for value discovery in business development and scientific research. This work focuses on the problem of subgraph matching on streaming graphs, which provides utility in a myriad of real-world applications ranging from social network analysis to cybersecurity. Each application poses a different set of control parameters, including the restrictions for a match, the type of data stream, and the search granularity. The problem-driven design of existing subgraph matching systems makes them challenging to apply to different problem domains. This paper presents Mnemonic, a programmable system that provides a high-level API and democratizes the development of a wide variety of subgraph matching solutions. Importantly, Mnemonic also delivers key data management capabilities and optimizations to support real-time processing on long-running, high-velocity multi-relational graph streams. The experiments demonstrate the versatility of Mnemonic, as it outperforms several state-of-the-art systems by up to two orders of magnitude.
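For readers unfamiliar with the base problem, the snippet below shows a one-shot (static) subgraph match with NetworkX; it is purely illustrative and is not Mnemonic's API. Mnemonic targets the harder streaming setting, where edges arrive continuously and matches must be reported incrementally under application-specific match restrictions.

    # Static subgraph matching for context (not Mnemonic's API):
    # find all occurrences of a triangle pattern inside the data graph.
    import networkx as nx
    from networkx.algorithms import isomorphism

    data_graph = nx.Graph([(1, 2), (2, 3), (3, 1), (3, 4), (4, 5)])
    pattern = nx.Graph([("a", "b"), ("b", "c"), ("c", "a")])   # a triangle

    matcher = isomorphism.GraphMatcher(data_graph, pattern)
    for mapping in matcher.subgraph_isomorphisms_iter():
        print(mapping)   # e.g. {1: 'a', 2: 'b', 3: 'c'} and its symmetries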
ISBN (print): 9781665481069
Finding the biconnected components of a graph has a large number of applications in many other graph problems, including planarity testing, computing centrality metrics, finding the (weighted) vertex cover, coloring, and the like. Recent years have seen the design of efficient algorithms for this problem across sequential and parallel computational models. However, current algorithms do not work in the setting where the underlying graph changes over time in a dynamic manner via the insertion or deletion of edges. Dynamic algorithms in the sequential setting that obtain the biconnected components of a graph upon insertion or deletion of a single edge have been known for over two decades. Parallel algorithms for this problem, however, are not heavily studied. In this paper, we design shared-memory parallel algorithms that obtain the biconnected components of a graph subsequent to the insertion or deletion of a batch of edges. Our algorithms can hence exploit the parallelism afforded by a batch of updates. We implement our algorithms on an AMD EPYC 7742 CPU with 128 cores. Our experiments on a collection of 10 real-world graphs from multiple classes indicate that our algorithms outperform parallel state-of-the-art static algorithms.
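A small static example (NetworkX, not the paper's parallel code) shows why edge updates are disruptive: a single inserted edge can merge every biconnected component of a path into one, which is exactly the kind of change the proposed batch-dynamic algorithms maintain in parallel.

    # Static illustration only; the paper maintains this information
    # dynamically and in parallel for batches of edge updates.
    import networkx as nx

    G = nx.path_graph(5)   # 0-1-2-3-4: each edge is its own biconnected component
    print(list(nx.biconnected_components(G)))

    G.add_edge(4, 0)       # closing the cycle merges all components into one
    print(list(nx.biconnected_components(G)))   # [{0, 1, 2, 3, 4}]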
ISBN (print): 9781665481069
Distributed data analysis frameworks are widely used for processing large datasets generated by instruments in scientific fields such as astronomy, genomics, and particle physics. Such frameworks partition petabyte-size datasets into chunks and execute many parallel tasks to search for common patterns, locate unusual signals, or compute aggregate properties. When well-configured, such frameworks make it easy to churn through large quantities of data on large clusters. However, configuring frameworks presents a challenge for end users, who must select a variety of parameters such as the blocking of the input data, the number of tasks, the resources allocated to each task, and the size of the nodes on which they run. If poorly configured, the result may perform many orders of magnitude worse than optimal, or the application may even fail to make progress at all. Even if a good configuration is found through painstaking observations, the performance may change drastically when the input data or analysis kernel changes. This paper considers the problem of automatically configuring a data analysis application for high energy physics (TopEFT) built upon standard frameworks for physics analysis (Coffea) and distributed tasking (Work Queue). We observe the inherent variability within the application, demonstrate the problems of poor configuration, and then develop several techniques for automatically sizing tasks to meet goals for resource consumption and overall application completion.
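The core idea behind automatic task sizing can be sketched in a few lines: choose the events per task so the estimated footprint fits a per-task memory budget, and split any task that still exhausts memory. The function names and numbers below are illustrative assumptions, not the paper's implementation or the Work Queue API.

    # Illustrative sketch of memory-driven task sizing (assumed names and values).
    def plan_tasks(total_events, bytes_per_event, mem_budget_bytes):
        events_per_task = max(1, mem_budget_bytes // bytes_per_event)
        n_tasks = -(-total_events // events_per_task)   # ceiling division
        return events_per_task, n_tasks

    def run_with_resizing(run_task, chunk):
        # Retry a failing (out-of-memory) task on two smaller halves.
        try:
            return [run_task(chunk)]
        except MemoryError:
            mid = len(chunk) // 2
            return (run_with_resizing(run_task, chunk[:mid]) +
                    run_with_resizing(run_task, chunk[mid:]))

    print(plan_tasks(total_events=10_000_000, bytes_per_event=2_000,
                     mem_budget_bytes=4 * 2**30))   # events per task, task count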
ISBN (print): 9781665481069
We present and evaluate TTG, a novel programming model and its C++ implementation that, by marrying the ideas of control- and data-flowgraph programming, supports compact specification and efficient distributed execution of dynamic and irregular applications. Programming interfaces that support task-based execution often only support shared-memory parallel environments; a few support distributed-memory environments, either by discovering the entire DAG of tasks on all processes or by introducing explicit communications. The first approach limits scalability, while the second increases the complexity of programming. We demonstrate how TTG can address these issues without sacrificing scalability or programmability, by providing higher-level abstractions than conventionally provided by task-centric programming systems without impeding the ability of these runtimes to manage task creation and execution as well as data and resource management efficiently. TTG supports distributed-memory execution over two different task runtimes, PaRSEC and MADNESS. The performance of four paradigmatic applications (in graph analytics, dense and block-sparse linear algebra, and numerical integro-differential calculus) with various degrees of irregularity implemented in TTG is illustrated on large distributed-memory platforms and compared to state-of-the-art implementations.
ISBN (print): 9781665476522
Applications with low data reuse and frequent irregular memory accesses, such as graph or sparse linear algebra workloads, fail to scale well due to memory bottlenecks and poor core utilization. While prior work with prefetching, decoupling, or pipelining can mitigate memory latency and improve core utilization, memory bottlenecks persist due to limited off-chip bandwidth. Approaches that do processing in memory (PIM) with the Hybrid Memory Cube (HMC) overcome bandwidth limitations but fail to achieve high core utilization due to poor task scheduling and synchronization overheads. Moreover, the high memory-per-core ratio available with HMC limits strong scaling. We introduce Dalorex, a hardware-software co-design that achieves high parallelism and energy efficiency, demonstrating strong scaling with more than 16,000 cores when processing graph and sparse linear algebra workloads. Compared with prior PIM work, both using 256 cores, Dalorex improves performance and reduces energy consumption by two orders of magnitude through (1) a tile-based distributed-memory architecture where each processing tile holds an equal amount of data and all memory operations are local; (2) a task-based parallel programming model where tasks are executed by the processing unit that is co-located with the target data; (3) a network design optimized for irregular traffic, where all communication is one-way and messages do not contain routing metadata; (4) novel traffic-aware task scheduling hardware that maintains high core utilization; and (5) a data-placement strategy that improves work balance. This work proposes architectural and software innovations to provide the greatest scalability to date for running graph algorithms while still being programmable for other domains.
ISBN (print): 9781665481069
The GPU programming model is primarily aimed at the development of applications that run on one GPU. However, this limits the scalability of GPU code to the capabilities of a single GPU in terms of compute power and memory capacity. To scale GPU applications further, a great engineering effort is typically required: work and data must be divided over multiple GPUs by hand, possibly in multiple nodes, and data must be manually spilled from GPU memory to higher-level memories. We present Lightning: a framework that follows the common GPU programming paradigm but enables scaling to large problems with ease. Lightning supports multi-GPU execution of GPU kernels, even across multiple nodes, and seamlessly spills data to higher-level memories (main memory and disk). Existing CUDA kernels can easily be adapted for use in Lightning, with data access annotations on these kernels allowing Lightning to infer their data requirements and the dependencies between subsequent kernel launches. Lightning efficiently distributes the work and data across GPUs and maximizes efficiency by overlapping scheduling, data movement, and kernel execution when possible. We present the design and implementation of Lightning, as well as experimental results on up to 32 GPUs for eight benchmarks and one real-world application. The evaluation shows excellent performance and scalability, such as a speedup of 57.2x over the CPU when using Lightning with 16 GPUs across 4 nodes and 80 GB of data, far beyond the memory capacity of one GPU.
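To illustrate the manual effort that Lightning removes, the following plain multi-GPU sketch (CuPy, not Lightning's API) partitions one array by hand and launches the same elementwise computation on each device; every placement decision, transfer, and gather step is the programmer's responsibility, and nothing overlaps or spills automatically.

    # Hand-rolled multi-GPU split with CuPy, shown only as the baseline
    # that Lightning automates; this is not Lightning code.
    import numpy as np
    import cupy as cp

    n_gpus = cp.cuda.runtime.getDeviceCount()
    x = np.arange(1_000_000, dtype=np.float32)
    chunks = np.array_split(x, n_gpus)

    parts = []
    for dev, chunk in enumerate(chunks):
        with cp.cuda.Device(dev):          # manual placement per GPU
            d = cp.asarray(chunk)          # manual host-to-device copy
            parts.append(cp.asnumpy(d * 2.0 + 1.0))   # manual copy back

    y = np.concatenate(parts)              # manual gather on the host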