Search Results - Inner Mongolia University Library

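Help

Field codes:

- T = title (book title, article title)
- A = author (responsible party)
- K = subject term
- P = publication name
- PU = publisher name
- O = organization (author affiliation, degree-granting institution, patent applicant)
- L = Chinese Library Classification number
- C = subject classification number
- U = all fields
- Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". Operators must be uppercase, with one space on each side.

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016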
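
The search rules above can be sketched as a small query-string helper. This Python sketch is illustrative only and is not part of the library's site: the function names `term`, `combine`, and `group` are hypothetical. The only facts taken from the help text are the field codes, the three uppercase operators, and the single space on each side of an operator.

```python
# Field codes and operators copied from the help text; everything else
# (function names, error messages) is a hypothetical illustration.
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}
OPERATORS = {"AND", "OR", "NOT"}


def term(field: str, value: str) -> str:
    """Return one FIELD=value term, rejecting unknown field codes."""
    if field not in FIELD_CODES:
        raise ValueError(f"unknown field code: {field!r}")
    return f"{field}={value}"


def combine(operator: str, *parts: str) -> str:
    """Join parts with an uppercase operator padded by one space per side."""
    if operator not in OPERATORS:
        raise ValueError(f"operator must be one of {sorted(OPERATORS)}")
    return f" {operator} ".join(parts)


def group(expression: str) -> str:
    """Wrap a sub-expression in parentheses."""
    return f"({expression})"


# Rebuild the first search example from the help text.
example_one = combine(
    "AND",
    group(combine("OR", term("K", "图书馆学"), term("K", "情报学"))),
    term("A", "范并思"),
    term("Y", "1982-2016"),
)
print(example_one)
```

Checking the operator against an uppercase set mirrors the rule that lowercase `and`, `or`, and `not` are not accepted. How the real search engine reacts to malformed input is not documented in the help text.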

Search condition: "Any field = IEEE International Symposium on Computer Architecture and High Performance Computing"

16,699 records in total; showing records 261-270.

Memory Traffic and Complete Application Profiling with PAPI Multi-Component Measurements

37th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Barry, Daniel; Jagode, Heike; Danalis, Anthony; Dongarra, Jack
Affiliation: Univ Tennessee Innovat Comp Lab Knoxville TN 37996 USA

ISBN: (print) 9798350311990

Some of the most important categories of performance events count the data traffic between the processing cores and the main memory. However, since these counters are not core-private, applications require elevated privileges to access them. PAPI offers a component that can access this information on IBM systems through the Performance Co-Pilot (PCP); however, doing so adds an indirection layer that involves querying the PCP daemon. This paper performs a quantitative study of the accuracy of the measurements obtained through this component on the Summit supercomputer. We use two linear algebra kernels (a generalized matrix multiply and a modified matrix-vector multiply) as benchmarks and a distributed, GPU-accelerated 3D-FFT mini-app (using cuFFT) to compare the measurements obtained through the PAPI PCP component against the expected values across different problem sizes. We also compare our measurements against an in-house machine with a very similar architecture to Summit, where elevated privileges allow PAPI to access the hardware counters directly (without using PCP) to show that measurements taken via PCP are as accurate as those taken directly. Finally, using both QMCPACK and the 3D-FFT, we demonstrate the diverse hardware activities that can be monitored simultaneously via PAPI hardware components.

Keywords: GPU power; high performance computing; memory bandwidth; network traffic; PAPI; performance analysis; performance counters

MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from μWatts to MWatts for Sustainable AI

31st IEEE International Symposium on High Performance Computer Architecture, HPCA 2025

Authors: Tschand, Arya; Rajan, Arun Tejusve Raghunath; Idgunji, Sachin; Ghosh, Anirban; Holleman, Jeremy; Kiraly, Csaba; Ambalkar, Pawan; Borkar, Ritika; Chukka, Ramesh; Cockrell, Trevor; Curtis, Oliver; Fursin, Grigori; Hodak, Miro; Kassa, Hiwot; Lokhmotov, Anton; Miskovic, Dejan; Pan, Yuechao; Manmathan, Manu Prasad; Raymond, Liz; John, Tom St.; Suresh, Arjun; Taubitz, Rowan; Zhan, Sean; Wasson, Scott; Kanter, David; Reddi, Vijay Janapa
Affiliations: Meta United States; Harvard University United States; NVIDIA United States; UNC Charlotte / Syntiant United States; Codex; Dell United States; Intel United States; SMC Japan; FlexAI / cTuning; AMD United States; KRAI; Google United States; Decompute; GATE Overflow India; MLCommons

ISBN: (print) 9798331506476

Rapid adoption of machine learning (ML) technologies has led to a surge in power consumption across diverse systems, from tiny IoT devices to massive datacenter clusters. Benchmarking the energy efficiency of these systems is crucial for optimization, but presents novel challenges due to the variety of hardware platforms, workload characteristics, and system-level interactions. This paper introduces MLPerf® Power, a comprehensive benchmarking methodology with capabilities to evaluate the energy efficiency of ML systems at power levels ranging from microwatts to megawatts. Developed by a consortium of industry professionals from more than 20 organizations, coupled with insights from academia, MLPerf Power establishes rules and best practices to ensure comparability across diverse architectures. We use representative workloads from the MLPerf benchmark suite to collect 1,841 reproducible measurements from 60 systems across the entire range of ML deployment scales. Our analysis reveals trade-offs between performance, complexity, and energy efficiency across this wide range of systems, providing actionable insights for designing optimized ML solutions from the smallest edge devices to the largest cloud infrastructures. This work emphasizes the importance of energy efficiency as a key metric in the evaluation and comparison of the ML system, laying the foundation for future research in this critical area. We discuss the implications for developing sustainable AI solutions and standardizing energy efficiency benchmarking for ML systems. © 2025 IEEE.

Keywords: computer architecture; energy efficiency; machine learning; MLPerf; sustainable AI

QULATIS: A Quantum Error Correction Methodology toward Lattice Surgery

28th Annual IEEE International Symposium on High-Performance Computer Architecture (HPCA)

Authors: Ueno, Yosuke; Kondo, Masaaki; Tanaka, Masamitsu; Suzuki, Yasunari; Tabuchi, Yutaka
Affiliations: Univ Tokyo Grad Sch Informat Sci & Technol Tokyo Japan; Keio Univ Fac Sci & Technol Tokyo Japan; RIKEN Ctr Quantum Comp Tokyo Japan; Nagoya Univ Grad Sch Engn Nagoya Aichi Japan; NTT Comp & Data Sci Labs Tokyo Japan; JST PRESTO Tokyo Japan

ISBN: (print) 9781665420273

Due to the high error rate of a qubit, detecting and correcting errors on it is essential for fault-tolerant quantum computing (FTQC). Surface code (SC) associated with its decoding algorithm is one of the most promising quantum error correction (QEC) methods because it has high fidelity and requires only nearest neighbor qubits connectivity. To realize FTQC, we need a decoder circuit capable of not only QEC in a 3-D lattice to deal with errors in measurement on ancillary qubits but also quantum operations on logically constructed qubits. Whereas several methods to perform logical operations on SC, such as lattice surgery (LS), are known, no practical decoders supporting them have been proposed yet. One of the most promising QC implementations today is made up of superconducting qubits that are located in a cryogenic environment. To reduce the hardware complexity of QC and latency of QEC, we are supposed to perform QEC in a cryogenic environment. Hence a power-efficient decoder is required due to the limited power budget inside a dilution refrigerator. In this paper, we propose an online-QEC algorithm that supports LS with a practical decoder circuit, as well as a new FTQC architecture. We design a key building block of the proposed architecture with a hybrid of SFQ- and Cryo-CMOS-based digital circuits and evaluate it with a SPICE-level simulation. Each logic element includes about 2400 Josephson junctions, and power consumption is estimated to be 2.07 μW when operating with a 2 GHz clock frequency. We evaluate the decoder performance by a quantum error simulator for an essential operation of LS with code distances up to 11, and it achieves a 0.6% accuracy threshold. In an LS-based architecture further supporting a magic-state distillation protocol, which is expected to run for near-term universal quantum computing, we evaluate the QEC performance and power consumption of the architecture and show that it is practical to be operated in 4-K temperature region of a

Keywords: Quantum computing; Quantum Error Correction; Single flux quantum (SFQ)

Characterizing In-Kernel Observability of Latency-Sensitive Request-Level Metrics with eBPF

2024 IEEE International Symposium on Performance Analysis of Systems and Software, ISPASS 2024

Authors: Rezvani, Mohammadreza; Jahanshahi, Ali; Wong, Daniel
Affiliations: University of California Riverside Department of Computer Science and Engineering Riverside, CA United States; University of California Riverside Department of Electrical and Computer Engineering Riverside, CA United States

ISBN: (print) 9798350376388

This paper explores a novel server observability approach using eBPF (extended Berkeley Packet Filter) for detailed request-level performance metrics of data center latency-sensitive applications. Utilizing eBPF system call tracing, we evaluate if syscall activity can reconstruct high-level application behaviors and bypass the need for direct userspace reporting of performance metrics. Through careful selection of eBPF events, we demonstrate that certain syscall statistics can provide robust insight into request-level metrics. In addition, we demonstrate that these metrics can also be robust to networking effects, such as packet loss. By demonstrating the ability for eBPF to provide request-level observability, we can potentially enable many non-intrusive, low-overhead use cases for feedback in system management runtime frameworks, such as resource allocation, scheduling, and power management. © 2024 IEEE.

Keywords: Observability

A High-Performance Hardware Architecture for ECC Point Multiplication over Curve25519

IEEE 30th International Symposium on Field-Programmable Custom Computing Machines (FCCM)

Authors: Wu, Guiming; He, Qianwen; Jiang, Jiali; Zhang, Zhenxiang; Long, Xin; Zhao, Yuan; Zou, Yinchao
Affiliations: Alibaba Grp Hangzhou Peoples R China; Ant Grp Hangzhou Peoples R China

ISBN: (print) 9781665483322

As one of the most secure ECC curves, Curve25519 is employed by some secure protocols, such as TLS 1.3, IRTF's RFC7748, and the Diffie-Hellman Private Set Intersection (DH-PSI) protocol. A high-performance implementation of ECC is required, especially for the DH-PSI protocol. Point multiplication, the chief cryptographic primitive in ECC, is computationally expensive. To improve the performance of the DH-PSI protocol, we propose a novel and high-performance hardware architecture for point multiplication over Curve25519. The proposed architecture features a pipelined Finite-field Arithmetic Unit (FAU) and a simple and highly efficient instruction set architecture (ISA). Compared to the best existing work on Xilinx Zynq 7000 series FPGA, our implementation with one Processing Element (PE) can achieve 3.14x speedup on the same device. To the best of our knowledge, our implementation appears to be the fastest among the state-of-the-art works. We also have implemented our proposed architecture consisting of 4 Compute Groups (CGs), each with 16 PEs, on an Intel Agilex AGF027 FPGA. The experimental results show that a peak performance of 4.52 Mops/s (million point multiplication operations per second) can be achieved. Moreover, a measured performance of 4.48 Mops/s is achieved, with a PE utilization of 99% and at the cost of 86 Watts of power, which is the record-setting performance for point multiplication over Curve25519 on FPGAs.

Keywords: Performance evaluation; Protocols; Power measurement; Costs; Instruction sets; Computer architecture; Elliptic curve cryptography

Economy-based Greedy Bidding for Resources for CAE Workflows in Hybrid Cloud Infrastructure

20th IEEE International Conference on E-Science (E-Science)

Authors: Dasgupta, Srishti; Uustalu, Tahvend; Gerndt, Michael; Gholami, Babak
Affiliations: Tech Univ Munich Chair Comp Architecture & Parallel Syst Garching Germany; BMW Grp Munich Germany

ISBN: (print) 9798350365627; 9798350365610

The advent of generative design in the automotive sector, characterised by the automatic and iterative exploration of expansive solution spaces to discover optimal design configurations, has significantly increased the demand for computational resources to run intensive computer-aided engineering (CAE) simulations within constrained time frames. The inherent limitations of static high-performance computing (HPC) clusters have necessitated the adoption of cloud resources due to their flexible and elastic nature, thereby enhancing the capacity to accommodate the computational demands of these iterative workflows. These workflows, represented as Directed Acyclic Graphs (DAGs), involve the serial and parallel execution of tasks, which can dynamically share resources with other workflows during idle periods. In this paper, we propose an economy-based approach to exploit the gaps generated by these idle periods through a bidding system, thereby enabling more efficient resource utilisation and reducing the average wait time, makespan, cost and deadline misses by more than 40%, 6%, 13% and 45%, respectively, against certain infrastructures and baselines. Furthermore, we explore the potential for generating revenue by renting out idle resources in a hybrid cloud setup. This approach not only aims to optimise the use of computational resources but also seeks to provide cost-effective solutions to meet the escalating demands of generative design in the automotive sector.

Keywords: Cloud computing; High performance computing; Hybrid Infrastructures; CAE Workflows

2024 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2024

ISBN: (print) 9798350364606

The proceedings contain 165 papers. The topics discussed include: understanding multi-dimensional efficiency of fine-tuning large language models using SpeedUp, MemoryUp, and EnergyUp; shared-memory parallel Edmonds blossom algorithm for maximum cardinality matching in general graphs; a reconfigurable architecture of a scalable, ultrafast, ultrasound, delay-and-sum beamformer; scheduling and allocation of disaggregated memory resources in HPC systems; GIM (ghost in the machine): a coarse-grained reconfigurable compute-in-memory platform for exploring machine-learning architectures; further optimizations and analysis of Smith-Waterman with vector extensions; measurement-based quantum approximate optimization; optimizing forward wavefield storage leveraging high-speed storage media; teaching performance metrics in parallel computing courses; and compiler-driven SWAR parallelism for high-performance bitboard algorithms.

Predicting Protein Folding on Intel's Data Center GPU Max Series Architecture (PVC)

2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024

Authors: Ruhela, Dhani; Prasanna, Madhavan; Saxena, Aaditya
Affiliations: Westwood High School Austin, TX United States; Purdue University College of Science West Lafayette, IN United States; Bob Jones High School Madison, AL United States

ISBN: (print) 9798350355543

Predicting the structure and interactions of proteins and other life molecules has been a grand challenge for over 60 years. Google's DeepMind AI team leveraged artificial intelligence (AI) in 2020 to develop AlphaFold and achieved an accuracy above 90 for two-thirds of the proteins in CASP's global distance test (GDT). AlphaFold has been very successful in biology and medicine. However, a lack of training code and expansive computational requirements led to an open-source implementation named OpenFold. OpenFold is fast, memory-efficient, and provides an OpenProtein dataset with five million MSAs. MLCommons added OpenFold to their HPC benchmarks suite in 2023, and it was evaluated by four institutions on NVIDIA GPU architectures. This work presents our endeavours to port, run and tune OpenFold on Intel's GPU Max Series Ponte Vecchio (PVC). To the best of our knowledge, this is the first large-scale study of the distributed implementation of the OpenFold application with Intel PVC GPU, presenting the challenges, opportunities and performance of the application on Intel's Max series architecture. © 2024 IEEE.

Keywords: HPC Intel OneAPI OpenFold Petascale PyTorch

Hyperdimensional Computing vs. Neural Networks: Comparing Architecture and Learning Process

25th International Symposium on Quality Electronic Design (ISQED)

Authors: Ma, Dongning; Hao, Cong; Jiao, Xun
Affiliations: Villanova Univ Villanova PA 19085 USA; Georgia Inst Technol Atlanta GA 30332 USA

ISBN: (print) 9798350309270; 9798350309287

Hyperdimensional computing (HDC) has attracted considerable attention as an emerging non-von Neumann computing paradigm. Inspired by the way the human brain functions, HDC leverages high-dimensional patterns to perform learning tasks. Compared to neural networks, HDC has shown advantages such as energy efficiency and smaller model size, but sub-par learning capabilities in sophisticated applications. Recently, researchers have observed that, when combined with neural network components, HDC can achieve better performance than conventional HDC models. This motivates us to explore the deeper insights behind the theoretical foundations of HDC, particularly its connections to and differences from neural networks. In this paper, we make a comparative study between HDC and neural networks to provide a different angle where HDC can be derived from an extremely compact neural network trained upfront. Experimental results show that such a neural-network-derived HDC model can achieve up to 21% and 5% accuracy increases over conventional and learning-based HDC models, respectively. This paper aims to provide more insights and shed light on future directions for research on this popular emerging learning scheme.

Keywords: Energy efficiency

A Data-Driven, Congestion-Aware and Open-Source Timing-Driven FPGA Placer Accelerated by GPUs

32nd IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, FCCM 2024

Authors: Xiong, Zhili; Rajarathnam, Rachel Selina; Pan, David Z.
Affiliation: The University of Texas at Austin Department of Electrical & Computer Engineering TX United States

ISBN: (print) 9798350372434

Placement plays a pivotal role in the modern FPGA physical design flow to determine the locations of the design instances among the available FPGA device resources, impacting routability and performance. Due to the lack of open-source accurate timing models for high-performance FPGAs, academic placement research has focused primarily on wirelength optimization rather than timing optimizations. This work presents an open-source timing-driven FPGA placer accelerated on GPU that employs a congestion-aware and data-driven timing model with timing optimizations at global placement and legalization. The placement objective incorporates an additional term to optimize the timing arcs in the Lagrangian formulation. While packing and legalizing look-up tables (LUTs) and flip-flops (FFs), we emphasize timing-critical nets to remain within the Slice, minimizing overall path delay. On the ISPD'2016 contest benchmarks employing an AMD-Xilinx UltraScale architecture, our placer is 3× faster than the commercial AMD Vivado with similar critical path delay (×1.02) and 40% faster routing runtime. © 2024 IEEE.

Keywords: Graphics processing unit

235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
