ISBN (print): 9798350393132; 9798350393149
IoT devices commonly use flash memory for both data and code storage. Flash memory consumes a significant portion of the overall energy of such devices. This is problematic because IoT devices are energy constrained due to their reliance on batteries or energy harvesting. To save energy, we leverage a unique property of flash memory: write operations take unequal amounts of energy depending on whether a bit is flipped from 1 to 0 or from 0 to 1. We exploit this asymmetry to reduce energy consumption with FLIPBIT, a hardware-software approximation approach that limits costly 0-to-1 transitions in flash. Instead of performing an exact write, we write an approximated value that avoids any costly 0-to-1 bit flips. Using FLIPBIT, we reduce the mean energy used by flash by 68% on video streaming applications while maintaining 42 dB PSNR. On machine learning models, we reduce energy by an average of 39% and up to 71% with only a 1% accuracy loss. Additionally, by reducing the number of program-erase cycles, we increase the flash lifetime by 68%.
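To make the write constraint concrete, the sketch below brute-forces, for a single 8-bit cell, the value closest to a requested write that needs no 0-to-1 flips relative to the currently stored value (i.e., any written value must be a bit-submask of the stored one). This is only an illustration of the constraint FLIPBIT exploits, not the paper's approximation algorithm.

```python
# Toy illustration of the write constraint FLIPBIT exploits (not the paper's
# algorithm): the value actually written may only clear bits (1 -> 0) relative
# to the stored value, so it must be a bit-submask of what the cell holds.
def approx_write(stored: int, requested: int, bits: int = 8) -> int:
    """Return the value closest to `requested` writable without 0->1 flips."""
    mask = (1 << bits) - 1
    candidates = [v for v in range(1 << bits) if v & ~stored & mask == 0]
    return min(candidates, key=lambda v: abs(v - requested))

# Example: the cell holds 0b0111 (7); an exact write of 8 would need a 0->1
# flip, so the nearest allowed approximation is 7.
print(approx_write(0b0111, 8))   # -> 7
```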
ISBN (print): 9798350393132; 9798350393149
Processing-using-DRAM (PUD) is a processing-in-memory (PIM) approach that uses a DRAM array's massive internal parallelism to execute very-wide (e.g., 16,384-262,144-bit-wide) data-parallel operations, in a single-instruction multiple-data (SIMD) fashion. However, DRAM rows' large and rigid granularity limits the effectiveness and applicability of PUD in three ways. First, since applications have varying degrees of SIMD parallelism (which is often smaller than the DRAM row granularity), PUD execution often leads to underutilization, throughput loss, and energy waste. Second, due to the high area cost of implementing interconnects that connect columns in a wide DRAM row, most PUD architectures are limited to the execution of parallel map operations, where a single operation is performed over equally sized input and output arrays. Third, the need to feed the wide DRAM row with tens of thousands of data elements, combined with the lack of adequate compiler support for PUD systems, creates a programmability barrier, since programmers need to manually extract SIMD parallelism from an application and map computation to the PUD hardware. Our goal is to design a flexible PUD system that overcomes the limitations caused by the large and rigid granularity of PUD. To this end, we propose MIMDRAM, a hardware/software co-designed PUD system that introduces new mechanisms to allocate and control only the necessary resources for a given PUD operation. The key idea of MIMDRAM is to leverage fine-grained DRAM (i.e., the ability to independently access smaller segments of a large DRAM row) for PUD computation. MIMDRAM exploits this key idea to enable a multiple-instruction multiple-data (MIMD) execution model in each DRAM subarray (and SIMD execution within each DRAM row segment). We evaluate MIMDRAM using twelve real-world applications and 495 multi-programmed application mixes. Our evaluation shows that MIMDRAM provides 34x the performance, 14.3x the energy efficiency, 1.7x the throughput...
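As a rough software analogy for MIMDRAM's allocation of only the necessary resources, the sketch below greedily assigns each data-parallel operation only as many independently accessible row segments ("mats") as its SIMD width requires, so several independent operations can occupy one subarray at once. The segment width, segment count, and greedy policy are illustrative assumptions, not the paper's hardware mechanism.

```python
# Toy allocator illustrating the MIMD-in-a-subarray idea: each operation gets
# only the DRAM row segments its SIMD width needs, leaving the rest for other
# operations. Constants and policy are assumptions for illustration only.
SEGMENT_BITS = 8 * 1024          # assumed independently accessible segment width
SEGMENTS_PER_SUBARRAY = 8        # assumed number of segments per DRAM row

def allocate(ops):
    """ops: list of (name, simd_bits). Returns {name: [segment indices]}."""
    free = list(range(SEGMENTS_PER_SUBARRAY))
    placement = {}
    for name, simd_bits in ops:
        needed = -(-simd_bits // SEGMENT_BITS)   # ceiling division
        if needed > len(free):
            raise RuntimeError(f"{name} does not fit in the remaining segments")
        placement[name] = [free.pop(0) for _ in range(needed)]
    return placement

# Three independent operations with different SIMD widths share one subarray.
print(allocate([("vec_add", 12_000), ("reduce", 6_000), ("vec_mul", 20_000)]))
```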
ISBN (print): 9781665476522
Serverless computing has emerged as a popular cloud computing paradigm. Serverless environments are convenient to users and efficient for cloud providers. However, they can induce substantial application execution overheads, especially in applications with many functions. In this paper, we propose to accelerate serverless applications with a novel approach based on software-supported speculative execution of functions. Our proposal is termed Speculative Function-as-a-Service (SpecFaaS). It is inspired by out-of-order execution in modern processors, and is grounded in a characterization analysis of FaaS applications. In SpecFaaS, functions in an application are executed early, speculatively, before their control and data dependences are resolved. Control dependences are predicted as in pipeline branch prediction, and data dependences are speculatively satisfied with memoization. With this support, the execution of downstream functions is overlapped with that of upstream functions, substantially reducing the end-to-end execution time of applications. We prototype SpecFaaS on Apache OpenWhisk, an open-source serverless computing platform. For a set of applications in a warmed-up environment, SpecFaaS attains an average speedup of 4.6x. Further, on average, the application throughput increases by 3.9x and the tail latency decreases by 58.7%.
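The core mechanism can be sketched in a few lines: a downstream function is launched early with a memoized prediction of its upstream input, and its speculative result is committed only if the prediction validates against the actual upstream output. The memo table, thread-pool execution, and validation-by-equality below are simplifying assumptions, not the OpenWhisk-based SpecFaaS prototype.

```python
# Minimal sketch of speculative function chaining with memoized predictions.
from concurrent.futures import ThreadPoolExecutor

memo = {}   # function name -> last observed output, used as a prediction

def run_chain(f_upstream, f_downstream, request):
    with ThreadPoolExecutor() as pool:
        actual_up = pool.submit(f_upstream, request)
        predicted = memo.get(f_upstream.__name__)
        spec_down = pool.submit(f_downstream, predicted) if predicted is not None else None

        up_out = actual_up.result()
        memo[f_upstream.__name__] = up_out
        if spec_down is not None and predicted == up_out:
            return spec_down.result()           # speculation validated: commit
        return f_downstream(up_out)             # misspeculation: re-execute

# Example usage with two toy "functions".
resize = lambda req: {"size": 128}
classify = lambda meta: f"classified at {meta['size']}px"
print(run_chain(resize, classify, {"img": "a.png"}))   # first call: no prediction
print(run_chain(resize, classify, {"img": "b.png"}))   # second call: speculates
```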
In this paper, we discuss an IEEE 754 compliant normalized floating-point divide and square root unit that utilizes iterative approximation. We provide a robust architecture that allows multiple formats and all IEEE 7...
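The abstract above only names iterative approximation; one widely used scheme of that kind is Newton-Raphson refinement of a reciprocal (and reciprocal square root) seed. The sketch below is a software illustration of that generic technique, not a description of the paper's hardware unit, and it ignores IEEE 754 rounding and exception handling.

```python
# Generic Newton-Raphson divide and square root via iterative approximation.
import math

def nr_reciprocal(b: float, iters: int = 4) -> float:
    """Refine x ~ 1/b with x_{k+1} = x_k * (2 - b * x_k)."""
    m, e = math.frexp(b)                       # b = m * 2**e with 0.5 <= m < 1
    x = (48 / 17 - 32 / 17 * m) * 2.0 ** -e    # classic linear seed for 1/m
    for _ in range(iters):
        x = x * (2.0 - b * x)
    return x

def nr_divide(a: float, b: float) -> float:
    return a * nr_reciprocal(b)

def nr_sqrt(a: float, iters: int = 6) -> float:
    """sqrt(a) = a * rsqrt(a); refine rsqrt with x_{k+1} = x_k*(1.5 - 0.5*a*x_k^2)."""
    if a == 0.0:
        return 0.0
    m, e = math.frexp(a)
    x = (1.5 - 0.5 * m) * 2.0 ** (-e / 2)      # crude seed for 1/sqrt(a)
    for _ in range(iters):
        x = x * (1.5 - 0.5 * a * x * x)
    return a * x

print(nr_divide(10.0, 3.0), 10.0 / 3.0)
print(nr_sqrt(2.0), math.sqrt(2.0))
```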
ISBN (print): 9798350393132; 9798350393149
Modern cloud applications are prone to high tail latencies since their requests typically follow highly-dispersive distributions. Prior work has proposed both OS- and system-level solutions to reduce tail latencies for microsecond-scale workloads through better scheduling. Unfortunately, existing approaches, such as customized dataplane OSes, require significant OS changes, experience scalability limitations, or do not reach the full performance capabilities the hardware offers. We propose LibPreemptible, a preemptive user-level threading library that is flexible, lightweight, and scalable. LibPreemptible is based on three key techniques: 1) a fast and lightweight hardware mechanism for delivery of timed interrupts, 2) a general-purpose user-level scheduling interface, and 3) an API for users to express adaptive scheduling policies tailored to the needs of their applications. Compared to the prior state-of-the-art scheduling system Shinjuku, our system achieves significant tail latency and throughput improvements for various workloads without the need to modify the kernel. We also demonstrate the flexibility of LibPreemptible across scheduling policies for real applications experiencing varying load levels and characteristics.
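As a rough software analogy for combining timed interrupts with user-level scheduling, the sketch below uses a POSIX interval timer to deliver a periodic "interrupt" that asks the running task to yield back to a round-robin user-level scheduler. Real LibPreemptible relies on a lightweight hardware mechanism for timed interrupt delivery and supports adaptive policies; the quantum, generator-based tasks, and cooperative yield points here are illustrative assumptions.

```python
# Software-only analogy: a POSIX timer sets a preemption flag, and tasks yield
# at preemption points so a user-level scheduler can switch between them.
import signal, collections

QUANTUM = 0.005                     # assumed 5 ms scheduling quantum
preempt = False

def on_timer(signum, frame):
    global preempt
    preempt = True                  # the "timed interrupt" arrives in user space

def task(name, iters):
    done = 0
    while done < iters:
        done += 1                   # one unit of work
        if preempt:                 # preemption point
            yield
    print(name, "finished")

def scheduler(tasks):
    global preempt
    ready = collections.deque(tasks)
    signal.signal(signal.SIGALRM, on_timer)
    signal.setitimer(signal.ITIMER_REAL, QUANTUM, QUANTUM)
    while ready:
        preempt = False
        t = ready.popleft()
        try:
            next(t)                 # run until the task yields at a preemption point
            ready.append(t)         # requeue the preempted task
        except StopIteration:
            pass                    # task completed
    signal.setitimer(signal.ITIMER_REAL, 0)

scheduler([task("A", 1_000_000), task("B", 1_000_000)])
```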
ISBN (print): 9798350305487
Data-centric applications are increasingly common, causing the issues brought on by the discrepancy between processor and memory technologies to become increasingly apparent. Near-Data Processing (NDP) is an approach to mitigate this issue. It proposes moving some of the computation close to the memory, thus allowing for reduced data movement and aiding data-intensive workloads. Analytical database queries are commonly used in NDP research due to their intrinsic use of very large volumes of data. In this paper, we investigate the migration of the most time-consuming database operators to VIMA, a novel 3D-stacked memory-based NDP architecture. We consider the selection, projection, and bloom join database query operators, commonly used by data analytics applications, comparing the Vector-In-Memory Architecture (VIMA) to a high-performance x86 baseline. We pit VIMA against both a single-thread baseline and a modern 16-thread x86 system to evaluate its performance. Against a single-thread baseline, our experiments show that VIMA is able to speed up execution by up to 5x for selection, 2.5x for projection, and 16x for join while consuming up to 99% less energy. When considering a multi-thread baseline, VIMA matches the execution-time performance even at the largest dataset sizes considered. In comparison to existing state-of-the-art NDP platforms, we find that our approach achieves superior performance for these operators.
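One of the operators evaluated above, bloom join, can be summarized in a few lines: build a Bloom filter over the join keys of the build-side relation and use it to discard probe-side rows before the exact join. The sketch below is a generic host-side illustration with arbitrary hash and size choices, unrelated to how VIMA executes the operator.

```python
# Generic bloom join: a Bloom filter over the build side prunes probe rows
# cheaply before the exact hash-join lookup.
import hashlib

class BloomFilter:
    def __init__(self, m_bits=1 << 16, k=3):
        self.m, self.k, self.bits = m_bits, k, bytearray(m_bits // 8)

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.blake2b(f"{i}:{key}".encode(), digest_size=8).digest()
            yield int.from_bytes(h, "little") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

def bloom_join(build_rows, probe_rows, key):
    bf, build_index = BloomFilter(), {}
    for row in build_rows:
        bf.add(row[key])
        build_index.setdefault(row[key], []).append(row)
    for row in probe_rows:
        if bf.might_contain(row[key]):              # cheap filtering step
            for match in build_index.get(row[key], []):
                yield {**match, **row}

orders = [{"cust": 1, "item": "disk"}, {"cust": 3, "item": "ram"}]
custs  = [{"cust": 1, "name": "Ada"}, {"cust": 2, "name": "Bob"}]
print(list(bloom_join(custs, orders, "cust")))
```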
ISBN (print): 9798350393132
The proceedings contain 78 papers. The topics discussed include: exploitation of security vulnerability on retirement; GadgetSpinner: a new transient execution primitive using the loop stream detector; uncovering and exploiting AMD speculative memory access predictors for fun and profit; Revet: a language and compiler for dataflow threads; an optimizing framework on MLIR for efficient FPGA-based accelerator generation; Celeritas: out-of-core based unsupervised graph neural network via cross-layer computing; MEGA: a memory-efficient GNN accelerator exploiting degree-aware mixed-precision quantization; Gemini: mapping and architecture co-exploration for large-scale DNN Chiplet accelerators; STELLAR: energy-efficient and low-latency SNN algorithm and hardware co-design with spatiotemporal computation; and MIMDRAM: an end-to-end processing-using-DRAM system for high-throughput, energy-efficient and programmer-transparent multiple-instruction multiple-data computing.
ISBN (print): 9798350393132; 9798350393149
Graph neural networks (GNNs), one of the most popular neural network models, are extensively applied in graph-related fields, including drug discovery, recommendation systems, etc. Unsupervised graph learning, one type of GNN workload, plays a crucial role in various graph-related missions like node classification and edge prediction. However, with the increasing size of real-world graph datasets, processing such massive graphs in host memory becomes impractical, and GNN training demands a substantial storage volume to accommodate the vast amount of graph data. Consequently, GNN training results in significant I/O migration between the host and storage. Although state-of-the-art frameworks have made strides in mitigating I/O overhead by considering embedding locality, their GNN frameworks still suffer from long training times. In this paper, we propose a fully out-of-core framework, called Celeritas, which speeds up unsupervised GNN training on a single machine by co-designing the GNN algorithm and storage systems. First, based on theoretical analysis, we propose a new partial combination operation to enable embedding updates across GNN layers. This cross-layer computing performs future computation for embeddings stored in memory to save data migration. Second, due to the dependency between embeddings and edges, we consider their data locality together. Based on the cross-layer computing property, we propose a new loading order to fully utilize the data stored in main memory to save I/O. Finally, a new sampling scheme called two-level sampling is proposed, together with a new partition algorithm, to further reduce data migration and computation overhead while maintaining similar training accuracy. Real-system experiments indicate that the proposed Celeritas can reduce the total training time of different GNN models by 44.76% to 73.85% compared to state-of-the-art schemes for different graph datasets.
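A toy reading of the partial-combination idea is sketched below: while a node's embedding is resident in memory, its contribution to the next layer's aggregation for other in-memory neighbors is accumulated immediately, so that embedding need not be reloaded from storage later. The sum aggregation and the fixed in-memory set are assumptions for illustration, not Celeritas's actual mechanism.

```python
# Accumulate next-layer contributions for edges whose endpoints are already
# in memory; edges touching out-of-memory nodes are deferred to a later pass.
import numpy as np

def partial_combine(embeddings, edges, in_memory):
    """embeddings: {node: np.ndarray}; edges: iterable of (src, dst) pairs."""
    dim_template = next(iter(embeddings.values()))
    partial_sum = {v: np.zeros_like(dim_template) for v in embeddings}
    deferred = []                                # edges needing a future I/O pass
    for src, dst in edges:
        if src in in_memory and dst in in_memory:
            partial_sum[dst] += embeddings[src]  # contribution paid for now
        else:
            deferred.append((src, dst))          # must touch storage later
    return partial_sum, deferred

emb = {0: np.ones(4), 1: 2 * np.ones(4), 2: 3 * np.ones(4)}
sums, later = partial_combine(emb, [(0, 1), (1, 2), (2, 0)], in_memory={0, 1})
print(sums[1], later)    # node 1 already holds node 0's contribution
```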
ISBN (print): 9798350393132; 9798350393149
Sparse Matrix Dense Matrix Multiplication (SpMM) is an important kernel with applications across a wide range of domains, including machine learning and linear algebra solvers. In many sparse matrices, the pattern of nonzeros is nonuniform: nonzeros form dense and sparse regions, rather than being uniformly distributed across the whole matrix. We refer to this property as Intra-Matrix Heterogeneity (IMH). Currently, SpMM accelerator designs do not leverage this heterogeneity. They employ the same processing elements (PEs) for all the regions of a sparse matrix, resulting in suboptimal acceleration. To address this limitation, we utilize heterogeneous SpMM accelerator architectures, which include different types of PEs to exploit IMH. We develop an analytical modeling framework to predict the performance of different types of accelerator PEs, taking into account IMH. Furthermore, we present a heuristic for partitioning sparse matrices among heterogeneous PEs. We call our matrix modeling and partitioning method HotTiles. To evaluate HotTiles, we simulate three different heterogeneous architectures. Each one consists of two types of workers (i.e., PEs): one suited for compute-bound denser regions (Hot Worker) and one for memory-bound sparser regions (Cold Worker). Our results show that exploiting IMH with HotTiles is very effective. Depending on the architecture, heterogeneous execution with HotTiles outperforms homogeneous execution using only hot or only cold workers by 9.2-16.8x and 1.4-3.7x, respectively. In addition, HotTiles outperforms the best worker type used on a per-matrix basis by 1.3-2.5x. Finally, HotTiles outperforms an IMH-unaware heterogeneous execution strategy by 1.4-2.2x.
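The partitioning idea lends itself to a short sketch: tile the matrix, measure each tile's nonzero density, and route dense tiles to the hot worker and sparse tiles to the cold worker. The fixed density threshold below is an arbitrary stand-in for HotTiles' analytical model and heuristic.

```python
# Simplified intra-matrix heterogeneity partitioning: per-tile density decides
# whether a tile goes to the compute-bound ("hot") or memory-bound ("cold") PE.
import numpy as np

def partition_tiles(mat, tile=64, density_threshold=0.05):
    hot, cold = [], []
    for r in range(0, mat.shape[0], tile):
        for c in range(0, mat.shape[1], tile):
            block = mat[r:r + tile, c:c + tile]
            density = np.count_nonzero(block) / block.size
            (hot if density >= density_threshold else cold).append((r, c))
    return hot, cold

rng = np.random.default_rng(0)
A = np.where(rng.random((512, 512)) < 0.01, rng.random((512, 512)), 0.0)
A[:64, 128:192] = rng.random((64, 64))          # one artificially dense region (IMH)
hot, cold = partition_tiles(A)
print(f"{len(hot)} hot tiles, {len(cold)} cold tiles")   # expect 1 hot, 63 cold
```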
ISBN (print): 9798400700958
With the rapid growth of classification scale in deep learning systems, the final classification layer becomes extreme classification, with a memory footprint exceeding the main memory capacity of the CPU or GPU. The emerging in-storage-computing technique offers an opportunity because SSDs have enough storage capacity for the parameters of extreme classification. However, the limited performance of naive in-storage-computing schemes is insufficient to support the heavy workload of extreme classification. To this end, we propose ECSSD, the first hardware/data-layout co-designed in-storage-computing architecture for extreme classification, based on the approximate screening algorithm. We propose an alignment-free floating-point MAC circuit technique to improve computational ability under the limited area budget of in-storage-computing schemes, so that compute throughput can match the SSD's high internal bandwidth. We present a heterogeneous data layout design for the 4/32-bit weight data in the approximate screening algorithm to avoid data transfer interference and further utilize the internal DRAM bandwidth of the SSD. Moreover, we propose a learning-based adaptive interleaving framework to balance the access workload in each flash channel and improve channel-level bandwidth utilization. Putting these together, ECSSD achieves 3.24-49.87x performance improvements compared with state-of-the-art baselines.
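The screening idea can be illustrated independently of the hardware: score all classes with low-precision weights to shortlist a small candidate set, then rescore only the shortlist in full precision. The int8 quantization below is a stand-in for ECSSD's 4/32-bit split, and nothing here models the in-SSD execution or data layout.

```python
# Approximate screening for extreme classification: a cheap low-precision pass
# over all classes, followed by an exact pass over only the top candidates.
import numpy as np

def quantize_int8(W):
    scale = np.abs(W).max() / 127.0
    return np.round(W / scale).astype(np.int8), scale

def screened_classify(x, W, top_k=32):
    """x: (d,) feature vector; W: (num_classes, d) classifier weights."""
    Wq, scale = quantize_int8(W)
    approx = (Wq.astype(np.float32) @ x) * scale       # cheap pass over all classes
    candidates = np.argpartition(-approx, top_k)[:top_k]
    exact = W[candidates] @ x                          # precise pass over few classes
    return int(candidates[np.argmax(exact)])

rng = np.random.default_rng(0)
W = rng.standard_normal((50_000, 256)).astype(np.float32)   # "extreme" class count
x = rng.standard_normal(256).astype(np.float32)
print(screened_classify(x, W), int(np.argmax(W @ x)))       # the two usually agree
```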