Search Results - Inner Mongolia University Library

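Help

Field codes:

T=Title (book title, article title), A=Author (responsible party), K=Subject term, P=Publication name, PU=Publisher name, O=Organization (author affiliation, degree-granting institution, patent applicant), L=Chinese Library Classification number, C=Subject classification number, U=All fields, Y=Year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016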
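
The rules above can be illustrated with a small Python sketch that assembles a query string from field codes and uppercase operators. The helper names `term` and `combine` are hypothetical and not part of the library system; only the field codes and the operator spelling and spacing are taken from the help text.

```python
# Hypothetical helpers: only the field codes and operator rules come from the help text.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str, group: bool = False) -> str:
    """Join parts with an uppercase operator padded by one space on each side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    joined = f" {operator} ".join(parts)
    return f"({joined})" if group else joined


# Rebuild the first search example from the help text.
subjects = combine("OR", term("K", "图书馆学"), term("K", "情报学"), group=True)
query = combine("AND", subjects, term("A", "范并思"), term("Y", "1982-2016"))
print(query)
```

Rejecting lowercase operators in `combine` mirrors the rule that operators must be uppercase; whether the real search box rejects or silently reinterprets them is not stated in the help text.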



Search query: "Any field=2024 International Symposium on Computer Architecture and High Performance Computing Workshops"

5,226 records in total; showing records 451-460


PMBS 2024: 15th IEEE International Workshop on Performance Modeling, Benchmarking, and Simulation of High Performance Computer Systems

SC-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis

ETTE: Efficient Tensor-Train-based Computing Engine for Deep Neural Networks

50th Annual International Symposium on Computer Architecture (ISCA)

Authors: Gong, Yu; Yin, Miao; Huang, Lingyi; Xiao, Jinqi; Sui, Yang; Deng, Chunhua; Yuan, Bo. Affiliations: Rutgers State Univ, New Brunswick, NJ 08901, USA; ScaleFlux Inc, Milpitas, CA, USA

ISBN: (print) 9798400700958

Tensor-train (TT) decomposition enables ultra-high compression ratio, making the deep neural network (DNN) accelerators based on this method very attractive. TIE, the state-of-the-art TT based DNN accelerator, achieved high performance by leveraging a compact inference scheme to remove unnecessary computations and memory access. However, TIE increases memory costs for stage-wise intermediate results and additional intra-layer data transfer, leading to limited speedups even when the models are highly compressed. To unleash the full potential of TT decomposition, this paper proposes ETTE, an algorithm and hardware co-optimization framework for Efficient Tensor-Train Engine. At the algorithm level, ETTE proposes a new tensor core construction and computation ordering mechanism to reduce stage-wise computation and storage cost at the same time. At the hardware level, ETTE proposes a lookahead-style across-stage processing scheme to eliminate the unnecessary stage-wise data movement. By fully leveraging the decoupled input and output dimension factors, ETTE develops an efficient low-cost memory partition-free access scheme to efficiently support the desired matrix transformation. We demonstrate the effectiveness of ETTE by implementing a 16PE hardware prototype with CMOS 28nm technology. Compared with GPU on various workloads, ETTE achieves 6.5x - 253.1x higher throughput and 189.2x - 9750.5x higher energy efficiency. Compared with the state-of-the-art DNN accelerators, ETTE brings 1.1x - 58.3x, 2.6x - 1170.4x and 1.8x - 2098.2x improvement on throughput, energy efficiency and area efficiency, respectively.

Keywords: tensor decomposition neural networks low rank accelerator

Combining Lossy Compression with Multi-level Caching for Data Staging over Network

1st International Conference on Smart Energy Systems and Artificial Intelligence (SESAI)

Authors: Aoyagi, Rei; Takahashi, Keichi; Shimomura, Yoichi; Takizawa, Hiroyuki. Affiliations: Tohoku Univ, Grad Sch Informat Sci, Sendai, Miyagi, Japan; Tohoku Univ, Cybersci Ctr, Sendai, Miyagi, Japan

ISBN: (print) 9798350364613; 9798350364606

Researchers conduct post-processing on the simulation results by running an interactive data analysis tool on a high-performance computing (HPC) system installed at an HPC center and retrieving the post-processed results. Certain data analysis scenarios require transferring the simulation results directly from the center. In such scenarios, a portion of the data would usually be streamed over the network to achieve interactivity. However, there still exist two challenges in maintaining interactivity: (1) limited network bandwidth and (2) long network latency. To tackle these challenges, we propose a system to enable interactive array analysis over the network. We employ error-bounded lossy compression to increase the effective network bandwidth. Furthermore, we employ multi-level caching to hide the network latency and combine prefetching to improve the cache hit ratio. The cache replacement and prefetching policies are designed considering the data access pattern of interactive analysis. We compared our proposed system with TileDB, one of the state-of-the-art array databases, by measuring the average latency for various access patterns. Compared to TileDB, the proposed system reduces the average latency by up to 91.6% when allowing 10% error: the cache hit ratio improved by more than 40% thanks to the cache replacement and prefetching policies, and network transfer time was reduced by more than 75% thanks to lossy compression.

Keywords: HPC Distributed computing Data Staging Lossy Compression Interactive Analysis Bandwidth Software Caching

Sparse Ternary Matrix Multiplication with Tensor Core for Transformer

International Symposium on Computing and Networking Workshops (CANDARW)

Authors: Yushi Ogiwara; Hideyuki Kawashima. Affiliation: Keio University, Fujisawa

ISBN: (digital) 9798331505349

ISBN: (print) 9798331505356

The Transformer architecture, despite its scaling law, faces expensive computational cost challenges as the number of parameters increases. Quantization methods like Ternary-BERT and BitNet address this issue using ternary matrices for weight parameters. While most research focuses on accelerating ternary matrix multiplication (TMM) on specific hardware such as FPGAs, our work aims to accelerate TMM on GPUs by co-designing around the characteristics of sparse ternary matrices and GPU architecture. In this paper, we propose two TMM methods that leverage the performance of CUDA Cores and Tensor Cores, respectively. We demonstrate that the proposed methods outperform dense matrix multiplication at sparsity levels of about 88% and above.

Keywords: Tensors Quantization (signal) Graphics processing units Computer architecture Transformer cores Transformers Hardware Sparse matrices Field programmable gate arrays Faces
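
To make the idea of a sparse ternary product concrete, here is a minimal CPU sketch in Python. It is not the paper's CUDA Core or Tensor Core method, which the abstract does not detail. It only assumes that weights take values in {-1, 0, +1}, so a matrix-vector product (the simplest case of TMM) can be computed with additions and subtractions over stored nonzero positions. All function names are hypothetical.

```python
from typing import List, Tuple

Ternary = List[List[int]]  # entries restricted to -1, 0, +1


def compress_rows(w: Ternary) -> List[Tuple[List[int], List[int]]]:
    """For each row, keep only the column indices holding +1 and those holding -1."""
    rows = []
    for row in w:
        if any(v not in (-1, 0, 1) for v in row):
            raise ValueError("matrix is not ternary")
        plus = [j for j, v in enumerate(row) if v == 1]
        minus = [j for j, v in enumerate(row) if v == -1]
        rows.append((plus, minus))
    return rows


def ternary_matvec(rows: List[Tuple[List[int], List[int]]], x: List[float]) -> List[float]:
    """Compute W @ x using only additions and subtractions over the nonzeros."""
    return [sum(x[j] for j in plus) - sum(x[j] for j in minus) for plus, minus in rows]


def dense_matvec(w: Ternary, x: List[float]) -> List[float]:
    """Reference dense product, used to check the sparse version."""
    return [sum(v * xj for v, xj in zip(row, x)) for row in w]


w = [[0, 1, 0, -1],
     [0, 0, 0, 0],
     [1, 0, 0, 1]]
x = [2.0, 3.0, 5.0, 7.0]

result = ternary_matvec(compress_rows(w), x)
assert result == dense_matvec(w, x)
print(result)
```

The work per row is proportional to the number of nonzeros, which is why this style of kernel only pays off at high sparsity.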


ADTopk: All-Dimension Top-k Compression for High-Performance Data-Parallel DNN Training

33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC)

Authors: Ming, Zhangqiang; Hu, Yuchong; Zhou, Wenxiang; Zheng, Xinjue; Yao, Chenxuan; Feng, Dan. Affiliations: Huazhong Univ Sci & Technol, Wuhan, Hubei, Peoples R China; Huazhong Univ Sci & Technol, Shenzhen Res Inst, Shenzhen, Guangdong, Peoples R China

ISBN: (print) 9798400704130

Data-parallel deep neural network (DNN) training systems deployed across nodes have been widely used in various domains, while the system performance is often bottlenecked by the communication overhead among workers for synchronizing gradients. Top-k sparsification compression is the de facto approach to alleviate the communication bottleneck, which truncates the gradient to its largest k elements before sending it to other nodes. However, we observe that the traditional Top-k still has performance issues: i) the gradient at each layer of a DNN is typically represented as a tensor of multiple dimensions, and the largest k elements selected by the traditional Top-k are centered in only some of all dimensions and hence the training may miss many dimensions (which we call dimension missing), which leads to low convergence performance; ii) the traditional Top-k performs the selection by globally sorting the gradient elements in each layer (which we call single global sorting), which leads to a low GPU core parallelism and hence a low training throughput. In this paper, we propose an all-dimension Top-k sparsification scheme, called ADTopk, which selects the largest k elements from all dimensions of the gradient tensor in each layer, meaning that each dimension must provide some elements, so as to avoid the dimension missing. Further, ADTopk enables each dimension to perform sorting locally within the elements of the dimension, and thus all dimensions can perform multiple local sorts independently and in parallel, instead of a single global sorting for the entire gradient tensor in each layer. On top of ADTopk, we further propose an interleaving compression scheme and an efficient threshold estimation algorithm so as to enhance the performance of ADTopk. We build a sparsification compression data-parallel DNN training framework and implement a compression library containing state-of-the-art sparsification algorithms. Experiments on a local cluster and Alibaba Cloud show that compa...

Keywords: high performance Data-Parallel DNN Training Gradient Sparsification Compression
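
The contrast between a single global sort and independent per-dimension sorts can be sketched in a few lines of Python. This is an illustration only, not the ADTopk implementation. It assumes selection by absolute value, treats each row of a 2-D gradient as one "dimension", and gives every row the same quota (`k_per_row`); the abstract specifies none of these choices, and the function names are hypothetical.

```python
import heapq
from typing import List, Tuple

Index = Tuple[int, int]


def global_topk(grad: List[List[float]], k: int) -> List[Index]:
    """Traditional Top-k: rank every element of the layer together."""
    entries = [(abs(v), (i, j)) for i, row in enumerate(grad) for j, v in enumerate(row)]
    return sorted(idx for _, idx in heapq.nlargest(k, entries))


def per_dimension_topk(grad: List[List[float]], k_per_row: int) -> List[Index]:
    """All-dimension variant: each row is ranked on its own and contributes elements."""
    kept = []
    for i, row in enumerate(grad):
        best = heapq.nlargest(k_per_row, range(len(row)), key=lambda j: abs(row[j]))
        kept.extend((i, j) for j in best)
    return sorted(kept)


grad = [[9.0, -8.0, 7.0],
        [0.3, -0.2, 0.1],
        [0.05, 0.6, -0.4]]

print(global_topk(grad, 3))         # every pick comes from row 0
print(per_dimension_topk(grad, 1))  # one pick from each row
```

In the global version rows 1 and 2 contribute nothing, which corresponds to the "dimension missing" problem the abstract describes. The per-row loop has no cross-row dependency, so each iteration could run in parallel.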


Enabling FPGA and AI Engine Tasks in the HPX Programming Framework for Heterogeneous High-Performance Computing

20th International Symposium on Applied Reconfigurable Computing (ARC)

Authors: Kalkhof, Torben; Heinz, Carsten; Koch, Andreas. Affiliation: Tech Univ Darmstadt, Embedded Syst & Applicat Grp, Darmstadt, Germany

ISBN: (print) 9783031556722; 9783031556739

The increasing complexity of modern exascale computers, with a growing number of cores per node, poses a challenge to traditional programming models. To address this challenge, Asynchronous Many-Task (AMT) runtimes, such as the C++-based HPX, divide computational problems into smaller tasks that are executed asynchronously by the runtime. By unifying the syntax and semantics of local and remote task execution, the scalability for distributed execution is enhanced. The asynchronous execution model conceals communication latency in distributed systems and eliminates global synchronization barriers, which improves the overall utilization of computation resources. While HPX and other AMT runtimes often support GPUs, there is still a lack of support for other accelerators, such as FPGAs, or more coarse-grained AI processing elements such as AMD's AI Engines (AIE). In this work, we extend the TaPaSCo framework so that TaPaSCo FPGA and AIE tasks can be transparently integrated into HPX applications. We show results for both microbenchmarks as well as the complete LULESH proxy HPC application to demonstrate this concept and evaluate the overheads. Both applications show that the combination of TaPaSCo and HPX can be efficiently used for cooperative computing between CPU software and FPGA/AIE hardware. Compared to CPU-only execution, we achieve a speedup of up to 2.4x in our stencil microbenchmark and a wall-clock speedup of 1.37x for the entire LULESH application, with 2.12x in the accelerated kernels themselves. Our TaPaSCo/HPX integration is released as open source.

Keywords: FPGA task-based programming HPC AI engines

CK-index: A Distribution-Aware Learned Index for Composite Keys

22nd IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2024

Authors: Wei, Zhengyang; Ye, Baoliu; Cai, Miao. Affiliations: Hohai University, College of Computer and Software, China; Nanjing University, Department of Computer Science and Technology, China; Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology, China

ISBN: (print) 9798331509712

The learned index is a high-performance index structure that uses machine learning methods to predict key positions in a large key space efficiently. Existing learned indexes suffer from underfitting of key-to-position mapping, leading to poor lookup performance. This paper finds that a data distribution property in the widely-used composite key schema addresses this issue effectively. Specifically, the composite key consists of an agglomerate of attributes. Keys with the same attribute value have a regular data distribution, which leads to a higher fitness of key-to-position mapping. Applying the property, we introduce CK-index, a distribution-aware learned index for composite keys. CK-index divides the key space according to attribute values and trains each learned model separately for an attribute to achieve high fitness of key-to-position mapping. Furthermore, it achieves low data storage consumption by storing the composite key's attributes instead of the whole keys. We evaluate the CK-index using real-world datasets. Evaluation results demonstrate that CK-index performs much better in lookup performance, bulk loading time and space consumption compared to B+Tree, RMI, PGM-index and ALEX. © 2024 IEEE.

Keywords: Composite key Data distribution Learned index
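
The general idea of partitioning a key space by attribute value and fitting one position model per partition can be sketched as follows. This is a toy illustration under stated assumptions, not the CK-index design. The composite key layout `(attribute, numeric suffix)`, the two-point linear model, and the names `Segment`, `build_index` and `lookup` are all hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Key = Tuple[str, int]  # (attribute value, numeric suffix): an assumed composite key layout


class Segment:
    """Sorted suffixes for one attribute value plus a linear position model."""

    def __init__(self, suffixes: List[int]):
        self.suffixes = sorted(suffixes)
        n = len(self.suffixes)
        lo, hi = self.suffixes[0], self.suffixes[-1]
        # Two-point linear fit: position is roughly slope * (suffix - lo).
        self.lo = lo
        self.slope = (n - 1) / (hi - lo) if hi > lo else 0.0
        # Record the worst prediction error so lookups know how far to search.
        self.max_err = max(abs(self._predict(s) - i) for i, s in enumerate(self.suffixes))

    def _predict(self, suffix: int) -> int:
        pos = round(self.slope * (suffix - self.lo))
        return min(max(pos, 0), len(self.suffixes) - 1)

    def find(self, suffix: int) -> int:
        """Return the position of suffix within this segment, or -1 if absent."""
        guess = self._predict(suffix)
        start = max(guess - self.max_err, 0)
        stop = min(guess + self.max_err + 1, len(self.suffixes))
        i = bisect_left(self.suffixes, suffix, start, stop)
        return i if i < stop and self.suffixes[i] == suffix else -1


def build_index(keys: List[Key]) -> Dict[str, Segment]:
    """Group keys by attribute value and train one model per group."""
    groups: Dict[str, List[int]] = {}
    for attr, suffix in keys:
        groups.setdefault(attr, []).append(suffix)
    return {attr: Segment(sfx) for attr, sfx in groups.items()}


def lookup(index: Dict[str, Segment], key: Key) -> int:
    attr, suffix = key
    seg = index.get(attr)
    return seg.find(suffix) if seg is not None else -1


keys = [("sensor-a", 10 * i) for i in range(100)] + [("sensor-b", i * i) for i in range(50)]
index = build_index(keys)
print(lookup(index, ("sensor-a", 420)), lookup(index, ("sensor-b", 49)))
```

The evenly spaced `sensor-a` keys are fitted exactly, so the search window collapses to one slot, while the quadratic `sensor-b` keys need a wider window. This shows on a small scale why a regular per-attribute distribution helps the key-to-position fit.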


Modeling and Analyzing the Shared Receive Queue of RDMA

22nd IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2024

Authors: Tian, Zhuang; Wang, Kai; Jiang, Wanchun. Affiliation: Central South University, School of Computer Science and Engineering, Changsha, China

ISBN: (print) 9798331509712

Nowadays, the RDMA (Remote Direct Memory Access) technology has been broadly employed in data centers. The Shared Receive Queue (SRQ) is an embedded mechanism in the RDMA protocol, which reduces the memory cost of queue pairs sharing the same receiver. However, the configurations of SRQ are often heuristic and empirical. Consequently, the Receiver Not Ready (RNR) signal would be easily triggered, leading to utilization loss in the face of dynamic traffic. In other words, configuring SRQ reasonably is the key to the performance of RDMA and remains a challenge due to the variable traffic and environment. To address this issue, we propose a theoretical model for SRQ to guide its configuration. Simulations demonstrate that the system utilization is significantly improved and the triggering of RNR signals is reduced with the proper SRQ configuration guided by the theoretical model. © 2024 IEEE.

Keywords: Receiver Not Ready Remote Direct Memory Access Shared Receive Queue

An Innovative Optimization Framework for Aerodynamic Shape Optimization Using Deep Neural Network

3rd International Symposium on Aerospace Engineering and Systems, ISAES 2024

Authors: Wu, Pin; Zhou, Zhu; Liu, Zhitao; Song, Chao. Affiliations: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China; China Aerodynamics Research and Development Center, State Key Laboratory of Aerodynamics, Sichuan, Mianyang 621000, China

ISBN: (print) 9798350350418

It is necessary to optimize the design method of the airfoil aerodynamic shape for better performance while meeting the design requirements. However, current mainstream design methods for aerodynamic shape are based on CFD model simulations and rely heavily on manual experience design, which are very time-consuming. Therefore, we propose an innovative optimization framework based on multi-task learning for Aerodynamic Shape Optimization. First, Bezier-ACGAN is used to intelligently parametrize the airfoil shape. Second, the MMOE-MF surrogate model is used to predict aerodynamic data of different fidelities. Finally, a GA optimizer selects the optimal shape that satisfies the optimization objective by adjusting the latent variables of Bezier-ACGAN. The performance of the proposed optimization framework is verified by two optimization design cases, and results show that the predicted results of the optimal airfoils agree well with the results of high-fidelity simulation. © 2024 IEEE.

Keywords: Deep neural networks

A Task Dependency-based Deduplicated Task Offloading Mechanism in Vehicular Edge Computing

22nd IEEE International Symposium on Parallel and Distributed Processing with Applications, ISPA 2024

Authors: Shao, Zhenyi; Liao, Zhuofan; Tang, Xiaoyong. Affiliation: Changsha University of Science and Technology, School of Computer and Communication Engineering, Changsha 410114, China

ISBN: (print) 9798331509712

The increasing demand for in-vehicle applications has raised the complexity and computational load, while the in-vehicle tasks exhibit a sensitivity to latency. Previous research has proposed utilizing the idle computational resources of roadside vehicles to alleviate this contradiction. However, the high mobility of vehicles leads to communication interruptions, and the time-varying nature of vehicle density makes resource allocation challenging. In this work, we leverage the dependencies between vehicular computing tasks and design a deduplication offloading mechanism for stable reduction of latency. This mechanism, named Multi-hop Clustering Deduplication Offloading (MCDO), consists of two stages. First, a Multi-hop Two Layer Clustering (MTLC) algorithm is designed to divide vehicles based on task dependencies, speed, and position information. Then, a Deduplication Layered Offloading (DLO) algorithm is proposed to identify and remove duplicated tasks within each cluster while maintaining their inter-dependencies. Simulation results demonstrate that MCDO effectively divides vehicle clusters and offloads tasks efficiently under various road conditions. Compared to existing approaches, MCDO significantly enhances system performance, achieving a minimum improvement of 15.1% in terms of latency. © 2024 IEEE.

Keywords: Computation offloading
