ISBN (print): 9798350311990
Applications that fuse machine learning and simulation can benefit from the use of multiple computing resources, with, for example, simulation codes running on highly parallel supercomputers and AI training and inference tasks on specialized accelerators. Here, we present our experiences deploying two AI-guided simulation workflows across such heterogeneous systems. A unique aspect of our approach is our use of cloud-hosted management services to manage challenging aspects of cross-resource authentication and authorization, function-as-a-service (FaaS) function invocation, and data transfer. We show that these methods can achieve performance parity with systems that rely on direct connection between resources. We achieve parity by integrating the FaaS system and data transfer capabilities with a system that passes data by reference among managers and workers, and by using a user-configurable steering algorithm to hide data transfer latencies. We anticipate that this ease of use can enable routine use of heterogeneous resources in computational science.
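The pass-by-reference idea can be illustrated with a minimal sketch: managers and workers exchange a lightweight handle (endpoint plus path) instead of the payload, and a worker resolves the handle only when it needs the bytes, so transfers can overlap with other queued work. The names below (DataRef, resolve, train_step) are hypothetical illustrations, not the paper's actual API, and the local file copy stands in for a real cross-site transfer.

```python
# Hypothetical sketch of passing data by reference between a manager and
# FaaS-style workers; all names are illustrative, not the paper's API.
from dataclasses import dataclass
from concurrent.futures import ThreadPoolExecutor
import pathlib, shutil, tempfile

@dataclass(frozen=True)
class DataRef:
    """Lightweight handle passed through task queues instead of the payload itself."""
    endpoint: str   # identifier of the storage system holding the data
    path: str       # location of the payload on that endpoint

def resolve(ref: DataRef, staging_dir: pathlib.Path) -> pathlib.Path:
    """Fetch the bytes behind a reference only when a worker actually needs them."""
    dst = staging_dir / pathlib.Path(ref.path).name
    shutil.copy(ref.path, dst)              # stand-in for a real cross-site transfer
    return dst

def train_step(ref: DataRef, staging_dir: pathlib.Path) -> str:
    local = resolve(ref, staging_dir)       # transfer overlaps with other queued work
    return f"trained on {local.name}"

if __name__ == "__main__":
    src_dir = pathlib.Path(tempfile.mkdtemp())
    staging = pathlib.Path(tempfile.mkdtemp())
    refs = []
    for i in range(4):
        p = src_dir / f"sim_{i}.npz"
        p.write_bytes(b"fake simulation output")
        refs.append(DataRef("cluster-a", str(p)))
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(lambda r: train_step(r, staging), refs)))
```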
ISBN (print): 9798350337662
Multi-tenancy in public clouds may lead to colocation interference on shared resources, which can result in performance degradation of cloud applications. Cloud providers want to know when such events happen and how serious the degradation is, so that they can perform interference-aware migrations and alleviate the problem. However, virtual machines (VMs) in Infrastructure-as-a-Service public clouds are black boxes to providers, and application-level performance information cannot be acquired. This makes performance monitoring intensely challenging, as cloud providers can only rely on low-level metrics such as CPU usage and hardware counters. We propose a novel machine learning framework, Alioth, to monitor the performance degradation of cloud applications. To feed the data-hungry models, we first develop interference generators and conduct comprehensive co-location experiments on a testbed to build the Alioth dataset, which reflects the complexity and dynamicity of real-world scenarios. We then construct Alioth by (1) augmenting features via recovering low-level metrics under no interference using denoising auto-encoders, (2) devising a transfer learning model based on a domain adaptation neural network so that models generalize to test cases unseen in offline training, and (3) developing a SHAP explainer to automate feature selection and enhance model interpretability. Experiments show that Alioth achieves an average mean absolute error of 5.29% offline and 10.8% when testing on applications unseen in the training stage, outperforming the baseline methods. Alioth is also robust in signaling quality-of-service violations under dynamicity. Finally, we demonstrate a possible application of Alioth's interpretability, providing insights that benefit the decision-making of cloud operators. The dataset and code of Alioth have been released on GitHub.
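Step (1), feature augmentation with a denoising auto-encoder, can be sketched as follows. This is a minimal PyTorch illustration under assumed layer sizes and training settings, not Alioth's published configuration: the model learns to map metrics observed under interference back to their interference-free values, and the reconstruction is concatenated with the raw metrics as augmented features for the downstream regressor.

```python
# Illustrative denoising auto-encoder for low-level metrics (assumptions:
# 32 input features, one hidden layer); not Alioth's actual architecture.
import torch
import torch.nn as nn

class MetricDAE(nn.Module):
    def __init__(self, n_features: int = 32, hidden: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, n_features)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_dae(clean: torch.Tensor, interfered: torch.Tensor,
              epochs: int = 200, lr: float = 1e-3) -> MetricDAE:
    """Learn to reconstruct interference-free metrics from interfered ones."""
    model = MetricDAE(clean.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(interfered), clean)
        loss.backward()
        opt.step()
    return model

if __name__ == "__main__":
    clean = torch.rand(1024, 32)
    noisy = clean + 0.1 * torch.randn_like(clean)      # synthetic colocation noise
    dae = train_dae(clean, noisy)
    augmented = torch.cat([noisy, dae(noisy)], dim=1)  # features for the predictor
```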
ISBN (print): 9798350337662
Transformers have become keystone models in natural language processing over the past decade. They have achieved great popularity in deep learning applications, but the increasing sizes of the parameter spaces required by transformer models generate a commensurate need to accelerate performance. Natural language processing problems are also routinely faced with variable-length sequences, as word counts commonly vary among sentences. Existing deep learning frameworks pad variable-length sequences to a maximal length, which adds significant memory and computational overhead. In this paper, we present ByteTransformer, a high-performance transformer optimized for variable-length inputs. We propose a padding-free algorithm that liberates the entire transformer from redundant computations on zero-padded tokens. In addition to algorithmic-level optimization, we provide architecture-aware optimizations for transformer functional modules, especially the performance-critical Multi-Head Attention (MHA) algorithm. Experimental results on an NVIDIA A100 GPU with variable-length sequence inputs validate that our fused MHA outperforms PyTorch by 6.13x. The end-to-end performance of ByteTransformer for a forward BERT transformer surpasses state-of-the-art transformer frameworks, such as PyTorch JIT, TensorFlow XLA, Tencent TurboTransformer, Microsoft DeepSpeed-Inference and NVIDIA FasterTransformer, by 87%, 131%, 138%, 74% and 55%, respectively. We also demonstrate the general applicability of our optimization methods to other BERT-like models, including ALBERT, DistilBERT, and DeBERTa.
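The bookkeeping behind padding removal can be shown with a small sketch: valid tokens are gathered into a packed buffer using the sequence lengths, so subsequent layers never touch padded positions, and a prefix sum of lengths records where each sequence starts. This NumPy version only illustrates the idea; ByteTransformer itself implements it with fused CUDA kernels.

```python
# Simplified illustration of padding removal for variable-length batches;
# not ByteTransformer's CUDA implementation, only the index bookkeeping.
import numpy as np

def pack_sequences(padded: np.ndarray, lengths: np.ndarray):
    """padded: [batch, max_len, hidden]; lengths: [batch].
    Returns packed tokens [sum(lengths), hidden] and per-sequence offsets,
    so downstream layers skip computation on zero-padded positions."""
    offsets = np.concatenate([[0], np.cumsum(lengths)])
    mask = np.arange(padded.shape[1])[None, :] < lengths[:, None]
    packed = padded[mask]                 # gathers only the valid tokens
    return packed, offsets

def unpack_sequences(packed, offsets, max_len, hidden):
    """Scatter packed tokens back to the padded layout for the final output."""
    batch = len(offsets) - 1
    out = np.zeros((batch, max_len, hidden), dtype=packed.dtype)
    for b in range(batch):
        seq = packed[offsets[b]:offsets[b + 1]]
        out[b, :len(seq)] = seq
    return out

if __name__ == "__main__":
    lengths = np.array([3, 5, 2])
    padded = np.random.rand(3, 5, 8)
    packed, offsets = pack_sequences(padded, lengths)
    restored = unpack_sequences(packed, offsets, 5, 8)
```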
ISBN (print): 9798350348750; 9798350348743
This paper experimentally investigates the performance of a proof-of-concept self-powered Li-Fi system based on solar cells for future Internet of Things (IoT) applications. The proposed system, which consists of a multiple-input multiple-output (MIMO) Li-Fi transceiver, can simultaneously provide low-bandwidth connectivity and wireless energy harvesting. Different MIMO Li-Fi configurations with solar-cell receivers connected in series and in parallel are tested experimentally to evaluate their bandwidth and harvested power. Experimental results show that the 4x4 series combination of solar cells achieves the highest bandwidth, B = 71.97 kHz, due to better accumulation of signal-to-noise ratio (SNR). The larger 4x4 configuration connected in series also harvests more electrical power (80 mW) than the parallel combination (65 mW). This harvested power could be stepped up and stored. Furthermore, for the communication performance, on-off keying (OOK) non-return-to-zero (NRZ) modulation is implemented and tested. The results show that a SISO system achieves a data rate of 50 kb/s at BER = 5x10^-3, while a 4x4 MIMO system in series doubles the data rate to 100 kb/s at BER = 2.8x10^-3 thanks to higher SNR and improved bandwidth. The results are further supported by the received-signal eye diagrams and histograms.
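The way such BER figures are obtained can be illustrated with a toy OOK-NRZ link: random bits are mapped to on/off levels, additive Gaussian noise models the receiver, and a mid-level hard decision recovers the bits. The noise level below is arbitrary and not calibrated to the paper's solar-cell receiver.

```python
# Toy OOK-NRZ link with additive Gaussian noise and threshold detection;
# parameters are illustrative, not the paper's measured channel.
import numpy as np

def ook_nrz_ber(n_bits: int = 100_000, noise_std: float = 0.25, seed: int = 0) -> float:
    rng = np.random.default_rng(seed)
    bits = rng.integers(0, 2, n_bits)
    tx = bits.astype(float)                  # NRZ: '1' -> 1.0, '0' -> 0.0 for a full bit period
    rx = tx + rng.normal(0.0, noise_std, n_bits)
    detected = (rx > 0.5).astype(int)        # mid-level hard decision
    return float(np.mean(detected != bits))

if __name__ == "__main__":
    print(f"Estimated BER: {ook_nrz_ber():.2e}")
```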
ISBN (print): 9781665497473
Coarse-Grained Reconfigurable Architectures (CGRAs) emerged about 30 years ago. The very first CGRAs were programmed manually; fortunately, compilation approaches rapidly appeared to automate the mapping process. Numerous surveys on these architectures exist, and others also gather the associated tools and methods, but none of them focuses on the mapping process alone. This paper focuses solely on automated methods and techniques for mapping applications onto CGRAs and covers the last two decades of research. It aims to provide the terminology, the problem formulation, and a classification of existing methods. The paper ends with research challenges and trends for the future.
ISBN (print): 9781665497473
Cumulative performance profiling is a fast and lightweight method for gaining summary information about where and how communication time in parallel MPI applications is spent. MPI provides mechanisms for implementing such profilers so that they can be used transparently with applications. Existing profilers typically profile on a per-process basis and record the frequency, total time, and volume of MPI operations per process. This can lead to grossly misleading cumulative information for applications that use MPI features to partition processes into different communicators. We present a novel MPI profiler, mpisee, for communicator-centric profiling that separates and records collective and point-to-point communication information per communicator in the application. We discuss the implementation of mpisee, which makes significant use of the MPI attribute mechanism. We evaluate our tool by measuring its overhead and profiling a number of standard applications. Our measurements with thirteen MPI applications show that the overhead of mpisee is less than 3%. Moreover, using mpisee, we investigate in detail two particular MPI applications, SPLATT and GROMACS, to obtain information on the various MPI operations for the different communicators of these applications. Such information is not available from other state-of-the-art profilers. We use the communicator-centric information to improve the performance of SPLATT, resulting in a significant runtime decrease when run with 1024 processes.
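The idea of keeping separate statistics per communicator can be sketched in mpi4py, even though mpisee itself is a C library built on the PMPI interface and MPI attributes. In the illustrative wrapper below (CommProfile and its methods are hypothetical names), each wrapped communicator accumulates its own call counts and times, so collectives on a sub-communicator are not lumped together with MPI_COMM_WORLD traffic.

```python
# Communicator-centric accounting sketch in mpi4py; an analog of the concept,
# not mpisee's actual PMPI-based implementation.
from collections import defaultdict
from mpi4py import MPI
import time

class CommProfile:
    """Wraps a communicator and accumulates per-communicator statistics."""
    def __init__(self, comm: MPI.Comm, name: str):
        self.comm, self.name = comm, name
        self.stats = defaultdict(lambda: [0, 0.0])   # op -> [calls, seconds]

    def allreduce(self, sendobj, op=MPI.SUM):
        t0 = time.perf_counter()
        result = self.comm.allreduce(sendobj, op=op)
        rec = self.stats["allreduce"]
        rec[0] += 1
        rec[1] += time.perf_counter() - t0
        return result

    def report(self):
        for op, (calls, secs) in self.stats.items():
            print(f"[{self.name}] {op}: {calls} calls, {secs:.6f} s")

if __name__ == "__main__":
    world = CommProfile(MPI.COMM_WORLD, "world")
    row = CommProfile(MPI.COMM_WORLD.Split(color=MPI.COMM_WORLD.rank % 2), "row")
    world.allreduce(1)      # accounted to MPI_COMM_WORLD
    row.allreduce(1)        # accounted to the split sub-communicator
    if MPI.COMM_WORLD.rank == 0:
        world.report()
        row.report()
```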
ISBN (print): 9781665497473
Python has become a dominant programming language for emerging areas like Machine Learning (ML), Deep Learning (DL), and Data Science (DS). An attractive feature of Python is that it provides an easy-to-use programming interface while allowing library developers to enhance the performance of their applications by harnessing the computing power offered by High Performance Computing (HPC) platforms. Efficient communication is key to scaling applications on parallel systems, and it is typically enabled by the Message Passing Interface (MPI) standard and compliant libraries on HPC hardware. mpi4py is a Python-based communication library that provides an MPI-like interface for Python applications, allowing application developers to utilize parallel processing elements including GPUs. However, there is currently no benchmark suite to evaluate the communication performance of mpi4py, and of Python MPI codes in general, on modern HPC systems. To bridge this gap, we propose OMB-Py, Python extensions to the open-source OSU Micro-Benchmark (OMB) suite, aimed at evaluating the communication performance of MPI-based parallel applications in Python. To the best of our knowledge, OMB-Py is the first communication benchmark suite for parallel Python applications. OMB-Py consists of a variety of point-to-point and collective communication benchmark tests that are implemented for a range of popular Python libraries including NumPy, CuPy, Numba, and PyCUDA. Our evaluation reveals that mpi4py introduces a small overhead compared to native MPI libraries. We plan to publicly release OMB-Py to benefit the Python HPC community.
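A minimal point-to-point latency test in the spirit of such a suite looks like the sketch below; the message sizes and iteration counts are illustrative and this is not the actual OMB-Py code. Two ranks ping-pong a NumPy buffer and report half the round-trip time.

```python
# Minimal mpi4py ping-pong latency test with NumPy buffers (run with
# "mpirun -n 2 python latency.py"); sizes and iteration counts are illustrative.
from mpi4py import MPI
import numpy as np

def latency(comm: MPI.Comm, size: int, iters: int = 1000) -> float:
    rank = comm.rank
    buf = np.zeros(size, dtype=np.uint8)
    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(iters):
        if rank == 0:
            comm.Send([buf, MPI.BYTE], dest=1, tag=0)
            comm.Recv([buf, MPI.BYTE], source=1, tag=0)
        elif rank == 1:
            comm.Recv([buf, MPI.BYTE], source=0, tag=0)
            comm.Send([buf, MPI.BYTE], dest=0, tag=0)
    # half the round-trip time, reported in microseconds
    return (MPI.Wtime() - t0) / (2 * iters) * 1e6

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    for size in (1, 1024, 65536, 1 << 20):
        t = latency(comm, size)
        if comm.rank == 0:
            print(f"{size:>8} bytes  {t:8.2f} us")
```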
ISBN (print): 9781665481069
Many data-intensive applications, such as distributed deep learning and data analytics, require moving vast amounts of data between compute servers in a distributed system. To meet the demands of these applications, datacenters are adopting Remote Direct Memory Access (RDMA), which has higher bandwidth and lower latency than traditional kernel-based networking. To ensure high performance of RDMA networks, congestion control manages queue depth on switches, and it has historically focused on moderating queue depth to ensure that small flows complete quickly. Unfortunately, one side effect of many common design decisions is that large flows are starved of bandwidth. This negatively impacts the flow completion time (FCT) of large, bandwidth-bound flows, which are integral to the performance of data-intensive applications. The FCT is particularly impacted at the tail, which is increasingly critical for predictable application performance. We identify the root causes of the poor performance for long flows and measure their impact. We then design mechanisms that improve long-flow FCT without compromising small-flow performance. Our evaluations show that these improvements reduce the 99.9th-percentile tail FCT of long flows by over 2x.
ISBN (print): 9798350311990
While quantum computers enable significant performance improvements for certain classes of applications, building a well-defined programming model has been a pressing issue. In this paper, we address some of the key limitations to realizing a generic heterogeneous parallel programming model for quantum-classical heterogeneous platforms. We discuss our experience in enabling user-level multi-threading in QCOR [1] as well as challenges that need to be addressed for programming future quantum-classical systems. Specifically, we discuss our design and implementation of C++-based parallel constructs to enable 1) parallel execution of a quantum kernel with std::thread and 2) asynchronous execution with std::async. To do so, we provide a detailed overview of the current implementation of the QCOR programming model and runtime, and discuss how we 1) add thread-safety to some of its user-facing API routines, and 2) increase parallelism in QCOR by removing data races that inhibit multi-threading, so as to better utilize available computing resources. We also present preliminary performance results with the Quantum++ [2] back end on a single-node Ryzen 9 3900X machine that has 12 physical cores (24 hardware threads) and 128 GB of RAM. The results show that running two Bell kernels with 12 threads per kernel in parallel outperforms running the kernels one after the other, each with 24 threads (a 1.63x improvement). In addition, we observe the same trend when running two Shor's algorithm kernels in parallel (1.22x faster than executing the kernels one after the other). Furthermore, the parallel version is better in terms of strong scalability. We believe that our design, implementation, and results will open up an opportunity not only for 1) enabling quicker prototyping of parallel-aware quantum-classical algorithms on quantum circuit simulators in the short term, but also for 2) realizing a generic parallel programming model for quantum-classical heterogeneous platforms in the longer term.
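The pattern being measured, launching two independent kernels asynchronously instead of back to back, can be sketched in Python as an analog of the C++ std::async construct described above; simulate_kernel is a placeholder workload, not a QCOR or Quantum++ call.

```python
# Python analog of the std::async pattern added to QCOR: two independent
# "kernels" run concurrently instead of sequentially. Placeholder workload only.
from concurrent.futures import ThreadPoolExecutor
import time

def simulate_kernel(name: str, seconds: float) -> str:
    time.sleep(seconds)              # stands in for a circuit-simulation workload
    return f"{name} done"

def sequential():
    return [simulate_kernel("bell_0", 0.5), simulate_kernel("bell_1", 0.5)]

def parallel():
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(simulate_kernel, f"bell_{i}", 0.5) for i in range(2)]
        return [f.result() for f in futures]

if __name__ == "__main__":
    for fn in (sequential, parallel):
        t0 = time.perf_counter()
        fn()
        print(f"{fn.__name__}: {time.perf_counter() - t0:.2f} s")
```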
Stealth addresses protect recipient identity privacy in blockchain systems by allowing a sender to derive a stealth address using the recipient's public key, with the receiver deriving a corresponding one-time pri...
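The sender/receiver roles described here can be illustrated with a toy dual-key derivation in a plain discrete-log group; the group parameters, hash-to-exponent step, and key names below are illustrative assumptions and not the construction analyzed in the paper. The sender publishes an ephemeral value and sends funds to a one-time address; only the recipient can derive the matching one-time private key.

```python
# Toy stealth-address derivation in a discrete-log group (demo parameters only;
# a real scheme would use a standardized elliptic-curve group).
import hashlib
import secrets

P = 2**127 - 1          # small Mersenne prime, demo only
G = 3                   # base element for the demo

def h(x: int) -> int:
    """Hash a group element to an exponent."""
    return int.from_bytes(hashlib.sha256(str(x).encode()).digest(), "big") % (P - 1)

# Recipient key pair: 'a' acts as a scanning key, 'b' as a spending key.
a, b = secrets.randbelow(P - 1), secrets.randbelow(P - 1)
A, B = pow(G, a, P), pow(G, b, P)

# Sender: ephemeral key r, published value R, and the one-time (stealth) address.
r = secrets.randbelow(P - 1)
R = pow(G, r, P)
s_sender = h(pow(A, r, P))
stealth_addr = (pow(G, s_sender, P) * B) % P

# Receiver: recovers the shared secret from R and derives the one-time private key.
s_recv = h(pow(R, a, P))
one_time_priv = (s_recv + b) % (P - 1)

assert pow(G, one_time_priv, P) == stealth_addr   # receiver controls the stealth address
```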