Search Results - Inner Mongolia University Library

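Help

Field codes:

T = title (book title, article title), A = author (responsible party), K = subject term, P = publication name, PU = publisher name, O = organization (author affiliation, degree-granting institution, patent applicant), L = Chinese Library Classification number, C = discipline classification number, U = all fields, Y = year (year of publication, degree year, standard release year)

Search rules:

AND means "and"; OR means "or"; NOT means "does not contain". (Operators must be uppercase, with one space on each side.)

Search examples:

Example 1: (K=图书馆学 OR K=情报学) AND A=范并思 AND Y=1982-2016
Example 2: P=计算机应用与软件 AND (U=C++ OR U=Basic) NOT K=Visual AND Y=2011-2016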
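
The rules above can be sketched as a small query builder. This is only an illustration of the documented syntax (field code, `=`, value, uppercase operators with one space on each side). The helper names `term`, `any_of`, `all_of`, and `year_range` are hypothetical and are not part of the library's search system.

```python
# Field codes listed in the help text (T, A, K, P, PU, O, L, C, U, Y).
FIELD_CODES = {"T", "A", "K", "P", "PU", "O", "L", "C", "U", "Y"}


def term(code: str, value: str) -> str:
    """Build one CODE=value term; reject codes the help text does not list."""
    if code not in FIELD_CODES:
        raise ValueError(f"unknown field code: {code!r}")
    return f"{code}={value}"


def any_of(*terms: str) -> str:
    """Join terms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def all_of(*terms: str) -> str:
    """Join terms with AND (uppercase, one space on each side)."""
    return " AND ".join(terms)


def year_range(start: int, end: int) -> str:
    """Build a Y=start-end term, the range form used in both examples."""
    return term("Y", f"{start}-{end}")


# Rebuild the first search example from the help text.
example_1 = all_of(
    any_of(term("K", "图书馆学"), term("K", "情报学")),
    term("A", "范并思"),
    year_range(1982, 2016),
)
print(example_1)
```

Building the string from validated parts keeps the operators uppercase and correctly spaced, which the search rules require.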

Refine search results

Document type

3,814 conference papers
176 journal articles
83 books

Holdings

4,073 electronic documents
1 print title

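Discipline classification

2,086 papers: Engineering
- 1,904 papers: Computer Science and Technology...
- 1,023 papers: Software Engineering
- 367 papers: Electrical Engineering
- 150 papers: Information and Communication Engineering
- 137 papers: Electronic Science and Technology (...
- 75 papers: Control Science and Engineering
- 30 papers: Mechanical Engineering
- 30 papers: Bioengineering
- 24 papers: Materials Science and Engineering (...
- 24 papers: Biomedical Engineering (...
- 22 papers: Instrument Science and Technology
- 20 papers: Optical Engineering
- 19 papers: Architecture
- 17 papers: Surveying and Mapping Science and Technology
- 16 papers: Civil Engineering
- 13 papers: Power Engineering and Engineering Thermo...
- 12 papers: Agricultural Engineering
524 papers: Science
- 417 papers: Mathematics
- 50 papers: Physics
- 39 papers: Systems Science
- 33 papers: Biology
- 30 papers: Statistics (...
- 16 papers: Chemistry
- 16 papers: Geophysics
207 papers: Management
- 154 papers: Management Science and Engineering (...
- 61 papers: Business Administration
- 54 papers: Library, Information and Archives Manage...
19 papers: Agronomy
- 14 papers: Crop Science
18 papers: Law
- 18 papers: Sociology
15 papers: Economics
- 15 papers: Applied Economics
13 papers: Medicine
3 papers: Literature
3 papers: Military Science
2 papers: Education
2 papers: Arts
1 paper: Philosophy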

Subject

647 papers: parallel process...
544 papers: parallel program...
527 papers: computer archite...
462 papers: parallel archite...
448 papers: concurrent compu...
358 papers: parallel algorit...
320 papers: programming
313 papers: hardware
282 papers: computer science
276 papers: algorithm design...
263 papers: computational mo...
214 papers: programming prof...
166 papers: parallel process...
164 papers: dynamic programm...
154 papers: application soft...
139 papers: program processo...
138 papers: costs
136 papers: libraries
136 papers: distributed comp...
133 papers: runtime

Institution

9 papers: stanford univ st...
9 papers: intel corporatio...
8 papers: barcelona superc...
8 papers: oak ridge natl l...
8 papers: univ calif berke...
7 papers: school of comput...
7 papers: oak ridge nation...
7 papers: carnegie mellon ...
7 papers: college of compu...
7 papers: oak ridge nation...
7 papers: univ texas austi...
6 papers: school of comput...
6 papers: sandia national ...
6 papers: department of co...
6 papers: department of co...
6 papers: department of co...
5 papers: department of co...
5 papers: nvidia corporati...
5 papers: pacific northwes...
5 papers: georgia institut...

Author

15 papers: jack dongarra
12 papers: dongarra jack
11 papers: hong shen
10 papers: hoefler torsten
9 papers: zhong cheng
9 papers: olukotun kunle
9 papers: gu yan
8 papers: chapman barbara
7 papers: garcia i.
7 papers: forsell martti
7 papers: sun yihan
7 papers: jigang wu
7 papers: nakano koji
7 papers: danelutto marco
6 papers: cheng zhong
6 papers: v.k. prasanna
6 papers: blelloch guy e.
6 papers: h.j. siegel
6 papers: lumsdaine andrew
6 papers: tsigas philippas

Language

4,030 papers: English
35 papers: Other
13 papers: Chinese

Search condition: "Any field = International Symposium on Parallel Architectures, Algorithms, and Programming"

4,073 records in total; showing records 161-170

Exploring the Efficiency of Data-Oblivious Programs

IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)

Authors: Biernacki, Lauren; Tiruye, Biniyam Mengist; Demissie, Meron Zerihun; Andargie, Fitsum Assamnew; Reagen, Brandon; Austin, Todd
Affiliations: Univ Michigan, Ann Arbor, MI 48109, USA; Addis Ababa Univ, Addis Ababa, Ethiopia; NYU, New York, NY 10003, USA

ISBN: 9798350397390 (print)

Data-oblivious programs have gained popularity due to their application in security, but are often dismissed because of anticipated performance loss. In order to better understand these performance concerns, this paper details the first performance characterization of data-oblivious programs. We study mechanical data-oblivious transformations applied to twenty workloads from the VIP-Bench benchmark suite and find that, overall, performance overheads vary widely, with a geomean slowdown of 7.4x. This variance can be attributed to whether or not the data-oblivious transformations affect the workload's asymptotic complexity. Performance overheads are much lower for the fourteen workloads whose complexity is unaffected, at 1.9x geomean. Further, by reducing control hazards, we find that data-oblivious transformations often result in improved per-instruction performance (e.g., better branch and memory performance) and increase the number of instructions the processor can execute in parallel (e.g., IPC). Leveraging lessons from analyzing these overheads, we study four notably slow data-oblivious workloads and show how algorithmic changes can significantly improve performance, achieving an average 86.4x speedup over the mechanically produced baseline programs. While data-oblivious program execution often incurs overheads, the contributions of this paper show that these overheads can be overcome by compiler and algorithmic optimizations, bringing us closer to achieving efficient and widely-used data-oblivious programs.

Keywords: Benchmark testing; Data oblivious programming; If conversion

PyPIM: Integrating Digital Processing-in-Memory from Microarchitectural Design to Python Tensors

IEEE/ACM International Symposium on Microarchitecture (MICRO)

Authors: Orian Leitersdorf; Ronny Ronen; Shahar Kvatinsky
Affiliation: Viterbi Faculty of Electrical and Computer Engineering, Technion – Israel Institute of Technology, Haifa, Israel

ISBN: 9798350350579 (digital)

ISBN: 9798350350586 (print)

Digital processing-in-memory (PIM) architectures mitigate the memory wall problem by facilitating parallel bitwise operations directly within the memory. Recent works have demonstrated their algorithmic potential for accelerating data-intensive applications; however, there remains a significant gap in the programming model and microarchitectural design. This is further exacerbated by aspects unique to memristive PIM such as partitions and operations across both directions of the memory array. To address this gap, this paper provides an end-to-end architectural integration of digital memristive PIM from a high-level Python library for tensor operations (similar to NumPy and PyTorch) to the low-level microarchitectural design. We begin by proposing an efficient microarchitecture and instruction set architecture (ISA) that bridge the gap between the low-level control periphery and an abstraction of PIM parallelism. We subsequently propose a PIM development library that converts high-level Python to ISA instructions and a PIM driver that translates ISA instructions into PIM micro-operations. We evaluate PyPIM via a cycle-accurate simulator on a wide variety of benchmarks that both demonstrate the versatility of the Python library and the performance compared to theoretical PIM bounds. Overall, PyPIM drastically simplifies the development of PIM applications and enables the conversion of existing tensor-oriented Python programs to PIM with ease.

Keywords: Microarchitecture; Tensors; Instruction sets; Computational modeling; Memory architecture; Programming; Parallel processing; Libraries; Partitioning algorithms; Python

A Dense Tensor Accelerator with Data Exchange Mesh for DNN and Vision Workloads

IEEE International Symposium on Circuits and Systems (IEEE ISCAS)

Authors: Lin, Yu-Sheng; Chen, Wei-Chao; Yang, Chia-Lin; Chien, Shao-Yi
Affiliations: Inventec Corp, Taipei, Taiwan; Natl Taiwan Univ, Taipei, Taiwan

ISBN: 9781728192017 (print)

We propose a dense tensor accelerator called VectorMesh, a scalable, memory-efficient architecture that can support a wide variety of DNN and computer vision workloads. Its building block is a tile execution unit (TEU), which includes dozens of processing elements (PEs) and SRAM buffers connected through a butterfly network. A mesh of FIFOs between the TEUs facilitates data exchange between tiles and promotes local data to global visibility. Our design performs better according to the roofline model for CNN, GEMM, and spatial matching algorithms compared to state-of-the-art architectures. It can reduce global buffer and DRAM fetches by 2-22 times and up to 5 times, respectively.

Keywords: Neural network hardware; Vector processors; Parallel programming

Parallel Approximations of the Tukey g-and-h Likelihoods and Predictions for Non-Gaussian Geostatistics

36th IEEE International Parallel and Distributed Processing Symposium (IEEE IPDPS)

Authors: Mondal, Sagnik; Abdulah, Sameh; Ltaief, Hatem; Sun, Ying; Genton, Marc G.; Keyes, David E.
Affiliations: King Abdullah Univ Sci & Technol, Comp Elect & Math Sci & Engn Div, Thuwal 239556900, Saudi Arabia; King Abdullah Univ Sci & Technol, Stat Program, Thuwal, Saudi Arabia; King Abdullah Univ Sci & Technol, Extreme Comp Res Ctr, Thuwal, Saudi Arabia

ISBN: 9781665481069 (print)

Maximum likelihood estimation is an essential tool in the procedure to impute missing data in climate/weather applications. By defining a particular statistical model, the maximum likelihood estimation can be used to understand the underlying structure of given geospatial data. The Gaussian random field has been widely used to describe geospatial data, as one of the most popular models under the hood of maximum likelihood estimation. Computation of Gaussian log-likelihood demands operations on a dense symmetric positive definite matrix, often parameterized by the Matérn correlation function. This computation of the log-likelihood requires O(n^2) storage and O(n^3) operations, which can be a huge task considering that the number of geographical locations, n, now commonly reaches into the millions. However, despite its appealing theoretical properties, the assumptions of Gaussianity may be unrealistic since real data often show signs of skewness or have some extreme values. Herein, we consider the Tukey g-and-h (TGH) random field as an example of a non-Gaussian random field that shows more robustness in modeling geospatial data by including two more parameters to incorporate skewness and heavy tail features in the model. This work provides the first HPC implementation of the TGH random field's inference on parallel hardware architectures. Using task-based programming models associated with dynamic runtime systems, our implementation leverages the high concurrency of current parallel systems. This permits to run the exact log-likelihood evaluation of the Tukey g-and-h (TGH) random fields for a decent number of geospatial locations. To tackle large-scale problems, we provide additionally an implementation of the given model using two different low-rank approximations. We compress the aforementioned positive-definite symmetric matrix for computing the log-likelihood and rely on the Tile Low-Rank (TLR) and the Hierarchical Off-Diagonal Low-Rank (HODLR) matrix approximatio...

Keywords: Maximum likelihood estimation; Symmetric matrices; Runtime; Computational modeling; Linear algebra; Tail; Predictive models

Quantum Circuit Mapping Using Binary Integer Nonlinear Programming

IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW)

Authors: Aaron Orenstein; Vipin Chaudhary
Affiliation: Dept. of Computer and Data Sciences, Case Western Reserve University, Cleveland, Ohio

ISBN: 9798350364606 (digital)

ISBN: 9798350364613 (print)

As Quantum Computers continue to increase in size, throughput has not increased proportionally [1]. Errors in qubit measurement and gate operations continue to increase with circuit width and depth. In order to maximize machine throughput, researchers have begun exploring ways of parallelizing circuit execution [2]–[9]. Due to the noisiness of quantum computers, this requires new algorithms for efficient resource allocation, qubit mapping, and scheduling. We improve on existing greedy algorithms by formulating the mapping search as a Binary Integer Non-Linear Programming (BINLP) problem. We model practical constraints and propose new heuristics for determining the goodness of a mapping. We evaluate our mappings on several sets of benchmark circuits and investigate the usefulness of parallel execution for circuit cutting, QAOAs, and deep circuits. We observe similar fidelity compared to Qiskit's transpiler for circuit cutting and throughput benchmarks. We observe greater fidelity over Qiskit's transpiler for dense QAOA ansatzes. Finally, we find that parallel circuit cutting provides greater fidelity than full-circuit execution.

Keywords: Computers; Program processors; Qubit; Programming; Logic gates; Benchmark testing; Throughput

Meta-Programming Design-Flow Patterns for Automating Reusable Optimisations

12th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART)

Authors: Vandebon, Jessica; Coutinho, Jose G. F.; Luk, Wayne
Affiliation: Imperial Coll London, London, England

ISBN: 9781450396608 (print)

Continuing advances in heterogeneous and parallel computing enable massive performance gains in domains such as AI and HPC. Such gains often involve using hardware accelerators, such as FPGAs and GPUs, to speed up specific workloads. However, to make effective use of emerging heterogeneous architectures, optimisation is typically done manually by highly-skilled developers with in-depth understanding of the target hardware. The process is tedious, error-prone, and must be repeated for each new application. This paper introduces Design-Flow Patterns, which capture modular, recurring application-agnostic elements involved in mapping and optimising application descriptions onto efficient CPU and GPU targets. Our approach is the first to codify and programmatically coordinate these elements into fully automated, customisable, and reusable end-to-end design-flows. We implement key design-flow patterns using the meta-programming tool Artisan, and evaluate automated design-flows applied to three sequential C++ applications. Compared to single-threaded implementations, our approach generates multi-threaded OpenMP CPU designs achieving up to 18 times speedup on a CPU platform with 32-threads, as well as HIP GPU designs achieving up to 1184 times speedup on an NVIDIA GeForce RTX 2080 Ti GPU.

Keywords: Heterogeneous Computing; Parallel Computing; Meta-programming; GPU; Multi-Core; Patterns

Teaching Complex Scheduling Algorithms

35th IEEE International Parallel and Distributed Processing Symposium (IPDPS)

Authors: Hunold, Sascha; Przybylski, Bartlomiej
Affiliations: TU Wien, Fac Informat, Vienna, Austria; Adam Mickiewicz Univ, Fac Math & Comp Sci, Poznan, Poland

ISBN: 9781665435772 (print)

We introduce *** and show how it can be used for teaching the basics of scheduling theory to Computer Science students. In particular, our course focuses on scheduling algorithms for parallel, identical machines. For these problems, approximation algorithms and approximation schemes exist. However, we believe that students better understand advantages as well as disadvantages of these approximation algorithms when they investigate their implementations and examine how the algorithms work in practice. For that purpose, we have implemented a set of heuristics and approximation algorithms on top of ***. In the present article, we go through some of the implemented algorithms and explain why we believe these algorithms are particularly helpful for students to understand the basic concepts of approximation algorithms. In our experience, students remember algorithmic details much better if we show them examples using ***.

Keywords: Scheduling; Julia; Approximation algorithms; Education; Gantt Charts; PTAS; FPTAS; Dynamic programming

Deep Just-In-Time Consistent Comment Update via Source Code Changes

International Symposium on Parallel Architectures, Algorithms and Programming (PAAP)

Authors: Shikai Guo; Xihui Xu; Hui Li; Rong Chen
Affiliation: The College of Information Science and Technology, Dalian Maritime University, Dalian, China

ISBN: 9781665452199 (print)

During software development and maintenance, code comments are often missing, inadequate, or they do not match the actual code content. In response to this problem, the research community has proposed a method for updating natural language comments based on code changes. However, there are two major limitations of this method that must be addressed: the long-term and non-temporal dependencies in the source code. To address these limitations, we propose a new model called code semantic learning–comment update (CSL2CU). The code semantic learning component of CSL2CU uses a self-attention mechanism and a positional encoding mechanism. It also uses a relative positional representation to model pairwise relationships between source code tags, thereby improving its ability to capture long-term dependencies and non-temporal dependencies of source code tagging ability. The comment-update component of CSL2CU is used to generate new comments based on old comments and code editing. The experimental results show that the CSL2CU model outperforms the three baselines used in exact match, BLEU, METEOR, and SARI.

Keywords: Codes; Source coding; Semantics; Natural languages; Tagging; Programming; Maintenance engineering

UniFaaS: Programming across Distributed Cyberinfrastructure with Federated Function Serving

International Symposium on Parallel and Distributed Processing (IPDPS)

Authors: Yifei Li; Ryan Chard; Yadu Babuji; Kyle Chard; Ian Foster; Zhuozhao Li
Affiliations: Dept. of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China; Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA; Dept. of Computer Science, University of Chicago, Chicago, IL, USA

ISBN: 9798350387117 (digital)

ISBN: 9798350387124 (print)

Modern scientific applications are increasingly decomposable into individual functions that may be deployed across distributed and diverse cyberinfrastructure such as supercomputers, clouds, and accelerators. Such applications call for new approaches to programming, distributed execution, and function-level management. We present UniFaaS, a parallel programming framework that relies on a federated function-as-a-service (FaaS) model to enable composition of distributed, scalable, and high-performance scientific workflows, and to support fine-grained function-level management. UniFaaS provides a unified programming interface to compose dynamic task graphs with transparent wide-area data management. UniFaaS exploits an observe-predict-decide approach to efficiently map workflow tasks to target heterogeneous and dynamic resources. We propose a dynamic heterogeneity-aware scheduling algorithm that employs a delay mechanism and a re-scheduling mechanism to accommodate dynamic resource capacity. Our experiments show that UniFaaS can efficiently execute workflows across computing resources with minimal scheduling overhead. We show that UniFaaS can improve the performance of a real-world drug screening workflow by as much as 22.99% when employing an additional 19.48% of resources and a montage workflow by 54.41% when employing an additional 47.83% of resources across multiple distributed clusters, in contrast to using a single cluster.

Keywords: Scheduling algorithms; Parallel programming; Heuristic algorithms; Parallel processing; Elasticity; Dynamic scheduling; Supercomputers

Reuse-Aware Compilation for Zoned Quantum Architectures Based on Neutral Atoms

IEEE Symposium on High-Performance Computer Architecture

Authors: Wan-Hsuan Lin; Daniel Bochen Tan; Jason Cong
Affiliations: University of California, Los Angeles; Department of Physics, Harvard University

ISBN: 9798331506476 (digital)

ISBN: 9798331506483 (print)

Quantum computing architectures based on neutral atoms offer large scales and high-fidelity operations. They can be heterogeneous, with different zones for storage, entangling operations, and readout. Zoned architectures improve computation fidelity by shielding idling qubits in storage from side-effect noise, unlike monolithic architectures where all operations occur in a single zone. However, supporting these flexible architectures with efficient compilation remains challenging. In this paper, we propose ZAC, a scalable compiler for zoned architectures. ZAC minimizes data movement overhead between zones with qubit reuse, i.e., keeping them in the entanglement zone if an immediate entangling operation is pending. Other innovations include novel data placement and instruction scheduling strategies in ZAC, a flexible specification of zoned architectures, and an intermediate representation for zoned architectures, ZAIR. Our evaluation shows that zoned architectures equipped with ZAC achieve a 22x improvement in fidelity compared to monolithic architectures. Moreover, ZAC is shown to have a 10% fidelity gap on average compared to the ideal solution. This significant performance enhancement enables more efficient and reliable quantum circuit execution, enabling advancements in quantum algorithms and applications. ZAC is open source at https://***/UCLAVAST/ZAC

Keywords: Technological innovation; Quantum algorithm; Program processors; Scheduling algorithms; Qubit; Computer architecture; Atoms; Parallel processing; Integrated circuit reliability; Quantum circuit

408 pages in total

No. 235 Daxue West Street, Saihan District, Hohhot, Inner Mongolia Autonomous Region. Postal code: 010021
