ISBN (Print): 9798350311990
As we enter the exascale era, the energy efficiency and performance of High-Performance Computing (HPC) systems, especially those running Machine Learning (ML) applications, are becoming increasingly important. Nvidia recently released its 9th-generation HPC-grade Graphics Processing Unit (GPU) microarchitecture, Ampere, claiming significant improvements over the previous generation's Volta architecture. In this paper, we perform fine-grained power collection and assess the performance of these two HPC architectures by profiling ML benchmarks. In addition, we analyze various hyperparameters, primarily the batch size and the number of GPUs, to determine their impact on these systems' performance and power efficiency. While Ampere is 3.16x more energy-efficient than Volta in isolation, this advantage is counteracted by the PCIe interconnects of the A100s as the ML tasks are parallelized across more GPUs.
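The energy-efficiency comparison in this abstract boils down to work per joule: sample board power at a fixed interval while the benchmark runs, integrate the samples into energy, and divide throughput by average power. A minimal sketch of that arithmetic (the throughput and power figures below are hypothetical placeholders, not the paper's measurements):

```python
def energy_joules(power_samples_w, interval_s):
    """Integrate evenly spaced power samples (watts) into energy (joules)."""
    return sum(power_samples_w) * interval_s

def energy_efficiency(throughput_items_per_s, avg_power_w):
    """Work done per joule: throughput divided by average power draw."""
    return throughput_items_per_s / avg_power_w

# Hypothetical readings for one training step (not the paper's data):
a100 = energy_efficiency(3000.0, 250.0)   # images/s at 250 W
v100 = energy_efficiency(1200.0, 300.0)   # images/s at 300 W
print(f"A100 is {a100 / v100:.2f}x more energy-efficient")
```

In a real measurement pipeline, the power samples would come from a fine-grained collector (e.g. polling the GPU's on-board sensor) rather than being supplied by hand.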
ISBN (Print): 9798350330991; 9798350331004
Computation of inner products is frequently used in machine learning (ML) algorithms, apart from signal processing and communication applications. Distributed arithmetic (DA) has been frequently employed for area-time-efficient inner-product implementations. In conventional DA-based architectures, one of the vectors is constant and known a priori. Hence, the traditional DA architectures are not suitable when both vectors are variable. However, computing the inner product of a pair of variable vectors is frequently required for matrix multiplication of various forms and for convolutional neural networks. In this paper, we present a novel DA-based architecture for computing the inner product of variable vectors. To derive the proposed architecture, the inner product of any given length is decomposed into a set of short-length inner products, such that the inner product can be computed by successive accumulation of the results of the short-length inner products. We have designed a DA-based architecture for the computation of the short-length inner product of variable vectors and used it in successive clock cycles to compute the whole inner product by successive accumulation. The post-layout synthesis results using Cadence Innovus with a GPDK 90 nm technology library show that the proposed DA-based parallel architecture offers significant advantages in area-delay product and energy consumption over the bit-serial DA architecture.
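The decomposition described here can be illustrated in software: a long inner product is split into fixed-size short-length blocks whose partial results are accumulated, mirroring one DA-unit evaluation per clock cycle in the hardware. A minimal sketch (the block size and vectors are illustrative, not taken from the paper):

```python
def blocked_inner_product(a, b, block=4):
    """Compute <a, b> by successive accumulation of short-length partial
    inner products, one block at a time (one clock cycle in hardware)."""
    assert len(a) == len(b)
    acc = 0
    for i in range(0, len(a), block):
        # Short-length inner product handled by the DA unit in one cycle.
        acc += sum(x * y for x, y in zip(a[i:i + block], b[i:i + block]))
    return acc

a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
print(blocked_inner_product(a, b))  # 120
```

The result is identical to a direct inner product; the point of the decomposition is that the hardware only needs a short-length DA unit plus an accumulator, regardless of the vector length.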
ISBN (Print): 9798350387117; 9798350387124
The interconnection network in HPC is becoming a bottleneck due to increasing traffic load. We model adaptive routing mechanisms and prove that even with advanced adaptive routing, static networks like Dragonfly cannot handle non-uniform traffic efficiently, let alone frequently changing non-uniform traffic. Network-wide improvement therefore requires architectural changes, e.g., reconfigurable networks. Existing reconfigurable networks hardly support agile reaction to traffic changes with little impact on the network. Therefore, we propose MUSE, a Dragonfly-based, runtime incrementally reconfigurable network that uses an optical circuit switch (OCS) to make a small number of link adjustments per reconfiguration, providing agility with little impact on in-flight flows. Simulations with both synthetic traffic and real-world workloads show that MUSE can prevent saturation under typical traffic patterns that cause congestion in static Dragonfly. MUSE is 30-55% better than static Dragonfly and Flexfly w.r.t. commonly used performance metrics such as flow completion time (FCT). We also build a MUSE prototype and demonstrate that MUSE reduces application finish time (AFT) by 20-30%.
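The core idea of incremental reconfiguration — adjust only a few links at a time, chosen from observed traffic — can be sketched as a toy selection policy (this is an illustrative stand-in, not MUSE's actual reconfiguration algorithm; the demand matrix is hypothetical):

```python
def pick_reconfigurations(traffic, k=2):
    """Pick the k group pairs with the heaviest measured traffic as
    candidates for a direct optical circuit. Limiting the count to k keeps
    each reconfiguration small, so few in-flight flows are disturbed."""
    pairs = sorted(traffic, key=traffic.get, reverse=True)
    return pairs[:k]

# Hypothetical inter-group demand (arbitrary units):
demand = {(0, 1): 9.0, (0, 2): 1.5, (1, 3): 7.0, (2, 3): 0.5}
print(pick_reconfigurations(demand))  # [(0, 1), (1, 3)]
```

A runtime system would re-run such a selection whenever the traffic matrix drifts, steering the OCS toward the currently hot pairs.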
ISBN (Print): 9798350387117; 9798350387124
Simultaneous multithreading (SMT) processors improve throughput over single-threaded processors by sharing internal core resources among instructions from distinct threads. However, resource sharing introduces inter-thread interference within the core, which has a negative impact on individual application performance and can significantly increase the turnaround time of multi-program workloads. The severity of the interference effects depends on the competing co-runners sharing the core. Thus, it can be mitigated by applying a thread-to-core allocation policy that smartly selects applications to be run on the same core to minimize their interference. This paper presents SYNPA, a simple approach that dynamically allocates threads to cores in an SMT processor based on their run-time dynamic behavior. The approach uses a regression model to select synergistic pairs to mitigate intra-core interference. The main novelty of SYNPA is that it uses just three variables, collected at the dispatch stage from the performance counters available in current ARM processors. Experimental results show that SYNPA outperforms the default Linux scheduler by around 36%, on average, in terms of turnaround time on 8-application workloads combining frontend-bound and backend-bound benchmarks.
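The pairing step can be sketched as a greedy matcher driven by a predicted-interference model. Everything below is a hypothetical stand-in — a dot-product "model" and made-up counter rates — for SYNPA's actual regression over three dispatch-stage counters:

```python
from itertools import combinations

def predicted_interference(t1, t2):
    """Toy model: interference is assumed high when two threads stress the
    same resources, approximated as the dot product of their counter-rate
    vectors (a placeholder for the paper's regression model)."""
    return sum(a * b for a, b in zip(t1, t2))

def pair_threads(threads):
    """Greedily pick, among the still-unassigned threads, the pair with the
    lowest predicted interference, until all threads are paired."""
    remaining = dict(threads)  # name -> counter-rate tuple
    pairs = []
    while len(remaining) > 1:
        best = min(combinations(remaining, 2),
                   key=lambda p: predicted_interference(remaining[p[0]],
                                                        remaining[p[1]]))
        pairs.append(best)
        for name in best:
            del remaining[name]
    return pairs

# Hypothetical counter rates: "fe" = frontend-bound, "be" = backend-bound.
threads = {"fe1": (0.9, 0.1, 0.2), "fe2": (0.8, 0.2, 0.1),
           "be1": (0.1, 0.9, 0.7), "be2": (0.2, 0.8, 0.9)}
print(pair_threads(threads))  # [('fe1', 'be1'), ('fe2', 'be2')]
```

Under this toy model the matcher mixes frontend-bound with backend-bound threads on each core, which is the synergy the paper's workloads are designed around.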
ISBN (Print): 9798350337662
Finding the maximum cut of a graph (MAXCUT) is a classic optimization problem that has motivated parallel algorithm development. While approximation algorithms for MAXCUT offer attractive theoretical guarantees and demonstrate compelling empirical performance, such approaches can shift the dominant computational cost to the stochastic sampling operations. Neuromorphic computing, which uses the organizing principles of the nervous system to inspire new parallel-computing architectures, offers a possible solution. One ubiquitous feature of natural brains is stochasticity: the individual elements of biological neural networks possess an intrinsic randomness that serves as a resource enabling their unique computational capacities. By designing circuits and algorithms that make use of randomness similarly to natural brains, we hypothesize that the intrinsic randomness in microelectronic devices could be turned into a valuable component of a neuromorphic architecture enabling more efficient computations. Here, we present neuromorphic circuits that transform the stochastic behavior of a pool of random devices into useful correlations that drive stochastic solutions to MAXCUT. We show that these circuits perform favorably in comparison to software solvers and argue that this neuromorphic hardware implementation provides a path to scaling advantages. This work demonstrates the utility of combining neuromorphic principles with intrinsic randomness as a computational resource for new computational architectures.
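A software analogue of a stochastic MAXCUT solver makes the role of randomness concrete: random initial partitions (here from a pseudorandom generator; in the paper's circuits, from device physics) feed a cheap local search, and the best cut across restarts is kept. A simplified heuristic, not the paper's circuit:

```python
import random

def local_search_cut(edges, side):
    """Greedy single-flip improvement until a local optimum is reached;
    returns the resulting cut size."""
    improved = True
    while improved:
        improved = False
        for v in range(len(side)):
            # Cut change if v flips: same-side incident edges become cut,
            # currently cut incident edges are lost.
            delta = sum((1 if side[a] == side[b] else -1)
                        for a, b in edges if v in (a, b))
            if delta > 0:
                side[v] ^= 1
                improved = True
    return sum(1 for a, b in edges if side[a] != side[b])

def maxcut_random_restarts(edges, n, restarts=50, seed=0):
    """Stochastic heuristic: random restarts supply the randomness that
    escapes poor local optima of the greedy search."""
    rng = random.Random(seed)
    return max(local_search_cut(edges, [rng.randint(0, 1) for _ in range(n)])
               for _ in range(restarts))

# 4-cycle: an alternating partition cuts all 4 edges.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(maxcut_random_restarts(cycle, 4))
```

Pure greedy search from a single start can get stuck (on the 4-cycle, the partition {0,1}|{2,3} is a local optimum cutting only 2 edges); the stochastic restarts are what recover the full cut.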
NVIDIA's H100 Confidential Computing (CC) counters the security hazards inherent in cloud AI workloads. It enforces data encryption to achieve data confidentiality, which leads to substantial throughput reductions...
ISBN (Print): 9798350337662
High performance is needed in many computing systems, from batch-managed supercomputers to general-purpose cloud platforms. However, scientific clusters lack elastic parallelism, while clouds cannot offer competitive costs for high-performance applications. In this work, we investigate how modern cloud programming paradigms can bring the elasticity needed to allocate idle resources, decreasing computation costs and improving overall data center efficiency. Function-as-a-Service (FaaS) brings the pay-as-you-go execution of stateless functions, but its performance characteristics cannot match coarse-grained cloud and cluster allocations. To make serverless computing viable for high-performance and latency-sensitive applications, we present rFaaS, an RDMA-accelerated FaaS platform. We identify critical limitations of serverless computing (centralized scheduling and inefficient network transport) and improve the FaaS architecture with allocation leases and microsecond invocations. We show that our remote functions add only negligible overhead on top of the fastest available networks, and we decrease execution latency by orders of magnitude compared to contemporary FaaS systems. Furthermore, we demonstrate the performance of rFaaS by evaluating real-world FaaS benchmarks and parallel applications. Overall, our results show that new allocation policies and remote memory access help FaaS applications achieve high performance and bring serverless computing to HPC.
ISBN (Print): 9798350337662
As burst buffers are widely deployed in HPC (High-Performance Computing) systems, the distributed file system layer is taking on the role of campaign storage, where scalability and cost-effectiveness are of paramount importance. However, centralized metadata management in the distributed file system layer poses a scalability challenge. The object storage system has emerged as an alternative thanks to its simplified interface and scale-out architecture. Despite this, the HPC communities are used to working with the POSIX interface to organize their files into a global directory hierarchy and control access through access control lists. In this paper, we present ArkFS, a near-POSIX-compliant and scalable distributed file system implemented on top of an object storage system. ArkFS achieves high scalability without any centralized metadata servers. Instead, ArkFS lets each client manage a portion of the file system metadata on a per-directory basis. ArkFS supports any distributed object storage system, such as Ceph RADOS or an S3-compatible system, with an appropriate API translation module. Our experimental results indicate that ArkFS shows significant performance improvement under metadata-intensive workloads while exhibiting near-linear scalability. We also demonstrate that ArkFS is suitable for handling the bursty I/O traffic coming from the burst buffer layer to archive cold data.
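The per-directory metadata idea can be sketched in a few lines: each directory's entry table lives in its own object in a flat store, so any client can read or update it without consulting a central metadata server. This is a simplified illustration with an in-memory stand-in for the object store, not ArkFS's actual on-object format or consistency protocol:

```python
class ObjectStore:
    """In-memory stand-in for a flat object store (e.g. RADOS or S3)."""
    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = value

    def get(self, key):
        return self._objects.get(key)

class DirMetadata:
    """Client-side, per-directory metadata management: one metadata object
    per directory, no centralized metadata server involved."""
    def __init__(self, store):
        self.store = store

    def mkdir(self, path):
        self.store.put("meta:" + path, {})   # empty entry table

    def create(self, dirpath, name):
        entries = self.store.get("meta:" + dirpath)
        entries[name] = {"type": "file"}
        self.store.put("meta:" + dirpath, entries)

    def listdir(self, dirpath):
        return sorted(self.store.get("meta:" + dirpath))

store = ObjectStore()
fs = DirMetadata(store)
fs.mkdir("/data")
fs.create("/data", "a.txt")
fs.create("/data", "b.txt")
print(fs.listdir("/data"))  # ['a.txt', 'b.txt']
```

Because metadata is sharded by directory, clients working in different directories touch disjoint objects, which is where the near-linear metadata scalability comes from.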
ISBN (Print): 9798350311990
As part of a larger effort, this work-in-progress reports the possible advantages of modifying conventional workflows used to generate labelled training samples and train machine learning (ML) models on them. We compare results from three different workflows, using neutron scattering data analysis as the motivating application, and report roughly a 20% speedup, with no appreciable loss of model accuracy, over a baseline workflow.
ISBN (Print): 9798400704130
The growing development of HPC systems used in a plethora of domains (healthcare, financial services, government and defense, energy) triggers an urgent demand for simulation frameworks that can simulate, in an integrated manner, both the processing and network components of an HPC system-under-design (SuD). The main problem, however, is that there is currently a shortage of simulation frameworks that can handle the simulation of actual HPC systems, including the hardware, the complete software stack, and network dynamics in an integrated manner. In this work we start from the first known open-source, fully-distributed Cloud simulation framework, COSSIM, and, as part of the RED-SEA and Vitamin-V European projects, we extend it so as to accurately simulate HPC systems. The extended simulator has been evaluated executing the widely used HPCG & LAMMPS benchmarks on both ARM & RISC-V architectures; the results demonstrate that the presented approach has up to 95% accuracy in the reported SuD aspects.