ISBN (Digital): 9798350309270
ISBN (Print): 9798350309287
The utilization of hardware-designed approximate computing in Convolutional Neural Networks (CNNs) offers notable advantages, including accelerated performance, enhanced power efficiency, and a compact design footprint. Systolic Array (SA) architectures, optimized for matrix multiplication and convolution operations, have been extensively studied in the context of stand-alone image processing applications. However, their potential for CNN workloads has not been thoroughly assessed. SAs consist of an array of Processing Elements (PEs) structured to perform product operations and accumulations. Incorporating inexact computing units in the SA introduces deviations from precise results, posing a challenge for sustaining hardware accelerator designs in CNN workloads. This paper presents a strategy for the optimal placement of both positive and negative error-distributed multipliers as PE elements to create an error-diluted SA structure. The proposed strategy for structuring the SA is evaluated on the Prewitt filter and three other filters extracted from the first layer of ***. The paper also introduces an optimization framework for selecting the most suitable PEs from a pool of positive and negative error-distributed multipliers, aiming to strike a balance between hardware efficiency and image quality metrics. Furthermore, the framework and hardware design files are made available to the designer and researcher community for further use.
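For intuition, here is a minimal C++ sketch of the error-dilution idea (all names and error models are hypothetical, not taken from the paper): multipliers whose error distributions lean positive and negative alternate along a PE row, so individual deviations tend to cancel inside the accumulation.

```cpp
#include <cstdint>
#include <vector>
#include <cstdio>

// Hypothetical behavioural models of two approximate multipliers: one biased
// toward positive error, one toward negative error. A real design would plug
// in models of the chosen inexact multiplier circuits.
static int32_t mul_pos_err(int16_t a, int16_t b) { return int32_t(a) * b + 1; }
static int32_t mul_neg_err(int16_t a, int16_t b) { return int32_t(a) * b - 1; }

// One output pixel of a convolution mapped onto a row of PEs. Alternating
// positive- and negative-error multipliers "dilutes" the error: adjacent
// deviations tend to cancel in the running accumulation.
int32_t error_diluted_dot(const std::vector<int16_t>& w,
                          const std::vector<int16_t>& x) {
    int32_t acc = 0;
    for (size_t pe = 0; pe < w.size(); ++pe) {
        acc += (pe % 2 == 0) ? mul_pos_err(w[pe], x[pe])
                             : mul_neg_err(w[pe], x[pe]);
    }
    return acc;
}

int main() {
    // 3x3 Prewitt horizontal kernel flattened onto a PE row.
    std::vector<int16_t> w = {-1, 0, 1, -1, 0, 1, -1, 0, 1};
    std::vector<int16_t> x = {10, 20, 30, 12, 22, 32, 14, 24, 34};
    std::printf("approximate result = %d\n", error_diluted_dot(w, x));
}
```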
ISBN (Print): 9798400716447
Tracing garbage collectors are widely deployed in modern programming languages. But tracing an arbitrary heap shape incurs poor locality and may hinder scalability. In this paper, we explore an avenue for mitigating these inefficiencies at the expense of conservative, less accurate identification of live objects. We do this by proposing and studying an alternative to the Mark-Sweep tracing algorithm, called Linear-Mark. It turns out that although Linear-Mark improves locality and scalability, the accuracy of Mark-Sweep outweighs the achieved enhancements. We present the Linear-Mark garbage-collecting algorithm and provide an evaluation that highlights the trade-offs between the Linear-Mark and the Mark-Sweep approaches. Our hope is that this research will inspire further algorithmic improvements, ultimately leading to better garbage collection algorithms.
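As a point of reference, below is a minimal C++ sketch of the classic Mark-Sweep baseline (illustrative only; it is not the paper's Linear-Mark implementation). The comments mark where the locality problem arises and which phase already behaves like a cache-friendly linear scan.

```cpp
#include <vector>
#include <stack>

// Toy heap object for illustration only.
struct Obj {
    bool marked = false;
    std::vector<Obj*> refs;  // outgoing pointers
};

// Mark phase: a depth-first trace over an arbitrary heap graph. The pointer
// chasing here is exactly what causes poor locality -- consecutive pops may
// touch objects that are far apart in memory.
void mark(const std::vector<Obj*>& roots) {
    std::stack<Obj*> work;
    for (Obj* r : roots) work.push(r);
    while (!work.empty()) {
        Obj* o = work.top(); work.pop();
        if (o == nullptr || o->marked) continue;
        o->marked = true;
        for (Obj* child : o->refs) work.push(child);
    }
}

// Sweep phase: a linear pass over the allocation list reclaiming unmarked
// objects. A Linear-Mark-style collector trades marking accuracy for making
// more of the collection resemble this sequential, cache-friendly scan.
void sweep(std::vector<Obj*>& heap) {
    for (Obj*& o : heap) {
        if (o->marked) { o->marked = false; }
        else           { delete o; o = nullptr; }
    }
}
```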
Summary form only given. The vast majority of computer architects believe the future of the microprocessor is hundreds to thousands of processors ("cores") on a chip. Given such widespread agreement, it is surprising how much research remains to be done in algorithms, computer architecture, networks, operating systems, file systems, compilers, programming languages, applications, and so on to realize this vision. Fortunately, Moore's law has not only enabled dense multi-core chips, it has also enabled extremely dense FPGAs. Today, one to two dozen soft cores can be programmed into a single FPGA. With multiple FPGAs on a board and multiple boards in a system, 1000-processor designs can be economically and rapidly explored. To make this happen, however, requires a significant amount of infrastructure in hardware, software, and what we call "gateware", the register-transfer level models that fill the FPGAs. By using the Berkeley Emulation Engine boards that were created for other purposes, the hardware is already done. A group of architects plan to design the gateware, create this infrastructure, and share the results in an open-source fashion so that every institution can have its own. Such a system would not just invigorate multiprocessor research in the architecture community. Since processor cores can run at 100 to 200 MHz, a large-scale multiprocessor would be fast enough to run operating systems and large programs at speeds sufficient to support software research. Moreover, there is a new generation of FPGAs every 18 months with the capacity for twice as many cores running even faster, so future multiboard FPGA systems are even more attractive. Hence, we believe such a system would accelerate research across all the fields that touch multiple processors; thus the acronym RAMP, for Research Accelerator for Multiple Processors. RAMP has the potential to transform the parallel computing community in computer science from a simulation-driven to a prototype-driven discipline.
Virtual machine (VM) migration is a widely used technique in cloud computing systems to increase reliability. There are also many other reasons a VM may be migrated during its lifetime, such as reducing energy consumption, improving performance, and maintenance. During a live VM migration, the underlying VM continues running until all or part of its data has been transmitted from source to destination. The remaining data are transmitted in an off-line manner by suspending the corresponding VM. The longer the off-line transmission time, the worse the performance of the respective VM, because the VM service is down during the off-line data transmission. Because a running VM's memory is subject to changes, already transmitted data pages may get dirtied and thus require re-transmission. Deciding when to suspend the VM is therefore not a trivial task: suspending the VM too early may force a significant amount of data to be transmitted off-line, degrading the VM's performance, while waiting too long to suspend it may cause a huge amount of dirty data to be re-transmitted, wasting resources. In this paper, we tackle the joint problem of minimizing both the total VM migration time (reflecting the resources spent during a migration) and the VM downtime (reflecting the performance degradation). These objective functions are weighted according to the needs of the underlying cloud provider/user. To tackle the problem, we propose an online deterministic algorithm with a strong competitive ratio, as well as a randomized online algorithm that achieves significantly better results than the deterministic algorithm.
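The trade-off can be illustrated with a toy pre-copy simulation, sketched below in C++ under hypothetical parameters (bandwidth, dirty rate, weights); it is not the paper's competitive algorithm.

```cpp
#include <cstdio>

// Illustrative pre-copy loop, NOT the paper's online algorithm. The provider
// weights total migration time vs. downtime with alpha and beta, mirroring
// the weighted objective described in the abstract. All parameters are
// hypothetical.
double simulate_precopy(double vm_memory_mb, double bandwidth_mbps,
                        double dirty_rate_mbps, double alpha, double beta,
                        int max_rounds) {
    double to_send = vm_memory_mb;   // data still to transmit while live
    double total_time = 0.0;
    for (int round = 0; round < max_rounds; ++round) {
        double t = to_send / bandwidth_mbps;     // duration of this live round
        total_time += t;
        double dirtied = dirty_rate_mbps * t;    // pages re-dirtied meanwhile
        // Stop iterating once another round would not shrink the residual.
        if (dirtied >= to_send) { to_send = dirtied; break; }
        to_send = dirtied;
    }
    double downtime = to_send / bandwidth_mbps;  // final off-line transfer
    total_time += downtime;
    return alpha * total_time + beta * downtime; // weighted cost
}

int main() {
    std::printf("weighted cost = %.2f s\n",
                simulate_precopy(4096, 1000, 400, 1.0, 5.0, 30));
}
```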
ISBN (Print): 9781450323055
The desire to build a computer that operates in the same manner as our brains is as old as the computer itself. Although computer engineering has made great strides in hardware performance as a result of Dennard scaling, and even great advances in 'brain like' computation, the field still struggles to move beyond sequential, analytical computing architectures. Neuromorphic systems are being developed to transcend the barriers imposed by silicon power consumption, develop new algorithms that help machines achieve cognitive behaviors, and both exploit and enable further research in neuroscience. In this talk I will discuss a system implementing spiking neural networks. These systems hold the promise of an architecture that is event based, broad and shallow, and thus more power efficient than conventional computing solutions. This new approach to computation, based on modeling the brain and its simple but highly connected units, presents a host of new challenges. Hardware faces trade-offs such as density or lower power at the cost of high interconnection overhead. Consequently, software systems must face choices about new language design. Highly distributed hardware systems require complex place-and-route algorithms to distribute the execution of the neural network across a large number of highly interconnected processing units. Finally, the overall design, simulation, and testing process has to be entirely reimagined. We discuss these issues in the context of the Zeroth processor and how this approach compares to other neuromorphic systems that are becoming available.
ISBN (Digital): 9781728173832
ISBN (Print): 9781728173849
Deterministic execution for GPUs is a desirable property as it helps with debuggability and reproducibility. It is also important for safety regulations, as safety-critical workloads are starting to be deployed onto GPUs. Prior deterministic architectures, such as GPUDet, attempt to provide strong determinism for all types of workloads, incurring significant performance overheads due to the many restrictions that are required to satisfy determinism. We observe that a class of reduction workloads, such as graph applications and neural architecture search for machine learning, do not require such severe restrictions to preserve determinism. This motivates the design of our system, Deterministic Atomic Buffering (DAB), which provides deterministic execution with low area and performance overheads by focusing solely on ordering atomic instructions instead of all memory instructions. By scheduling atomic instructions deterministically with atomic buffering, the results of atomic operations are isolated initially and made visible later in a deterministic order. This allows the GPU to execute deterministically in parallel without having to serialize its threads for atomic operations, unlike GPUDet. Our simulation results show that, for atomic-intensive applications, DAB performs 4× better than GPUDet and incurs only a 23% slowdown on average compared to a non-deterministic GPU architecture. We also characterize the bottlenecks and provide insights for future optimizations.
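A rough software analogue of the buffering idea is sketched below in C++ (the actual mechanism is implemented in GPU hardware; all names here are hypothetical): per-warp buffers are drained in a fixed order, so floating-point reductions produce the same result on every run even though warps finish in different orders.

```cpp
#include <map>
#include <vector>
#include <cstdio>

// One buffered atomic update: a target location and the value to add.
struct AtomicUpdate { int address; double value; };

// Drain the per-warp buffers in a fixed order (warp 0, then warp 1, ...),
// independent of the order in which the warps actually completed. Because
// floating-point addition is not associative, fixing this order is what
// makes the reduction result reproducible.
void drain_in_order(const std::vector<std::vector<AtomicUpdate>>& per_warp,
                    std::map<int, double>& memory) {
    for (const auto& buffer : per_warp)
        for (const auto& u : buffer)
            memory[u.address] += u.value;
}

int main() {
    std::vector<std::vector<AtomicUpdate>> buffers = {
        {{0, 3.0e16}, {1, 1.0}},   // updates issued by warp 0
        {{0, 1.0}},                // updates issued by warp 1
    };
    std::map<int, double> memory;
    drain_in_order(buffers, memory);
    std::printf("memory[0] = %.1f, memory[1] = %.1f\n", memory[0], memory[1]);
}
```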
ISBN (Digital): 9781728160955
ISBN (Print): 9781728196497
As the scale of high-performance computing (HPC) systems continues to grow, increasing levels of parallelism must be exploited to achieve optimal performance. As recent processors support wide vector extensions, vectorization has become much more important for exploiting the potential peak performance of the target architecture. Novel processor architectures, such as the Armv8-A architecture, introduce the Scalable Vector Extension (SVE), an optional separate architectural extension with a new set of A64 instruction encodings, which enables even greater parallelism. In this paper, we analyze the usage and performance of the SVE instructions in the Arm SVE vector Instruction Set Architecture (ISA), and utilize those instructions to improve memcpy and various local reduction operations. Furthermore, we propose new strategies to improve the performance of MPI operations, including datatype packing/unpacking and MPI reduction. With these optimizations, we not only provide higher parallelism on a single node, but also achieve a more efficient communication scheme for message exchanging. The resulting efforts have been implemented in the context of Open MPI, providing efficient and scalable SVE usage and extending the possible implementations of SVE to a more extensive range of programming and execution paradigms. The evaluation of the resulting software stack under different scenarios, with both a simulator and Fujitsu's A64FX processor, demonstrates that the solution is at the same time generic and efficient.
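As an illustration of the vector-length-agnostic style the abstract refers to, here is a small predicated memcpy written with ACLE SVE intrinsics (a sketch assuming an SVE-capable toolchain; it is not the Open MPI implementation).

```cpp
#include <arm_sve.h>   // Arm C Language Extensions for SVE
#include <cstdint>
#include <cstddef>

// Vector-length-agnostic memcpy using SVE predication. svwhilelt builds a
// predicate covering only the remaining bytes, so there is no scalar tail
// loop and the same binary runs on any SVE vector length.
void sve_memcpy(uint8_t* dst, const uint8_t* src, size_t n) {
    size_t i = 0;
    svbool_t pg = svwhilelt_b8_u64((uint64_t)i, (uint64_t)n);
    while (svptest_any(svptrue_b8(), pg)) {
        svuint8_t v = svld1_u8(pg, src + i);   // predicated load
        svst1_u8(pg, dst + i, v);              // predicated store
        i += svcntb();                         // advance by the vector length
        pg = svwhilelt_b8_u64((uint64_t)i, (uint64_t)n);
    }
}
// Build with e.g. -march=armv8-a+sve (assumes an SVE-capable compiler).
```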
ISBN (Print): 9781450371896
The success of Deep Learning (DL) algorithms in computer vision tasks has created an ongoing demand for dedicated hardware architectures that can keep up with their required computation and memory complexities. This task is particularly challenging when embedded smart camera platforms have constrained resources such as power consumption, Processing Elements (PEs), and communication. This article describes a heterogeneous system embedding an FPGA and a GPU for executing CNN inference for computer vision applications. The built system addresses some challenges of embedded CNNs such as task and data partitioning and workload balancing. The selected heterogeneous platform embeds an Nvidia® Jetson TX2 for the CPU-GPU side and an Intel Altera® Cyclone10GX for the FPGA side, interconnected by PCIe Gen2, with a MIPI-CSI camera for prototyping. This test environment will be used as a support for future work on a methodology for optimized model partitioning.
With the increasing number of scientific applications manipulating huge amounts of data, effective data management is an increasingly important problem. Unfortunately, so far the solutions to this data management problem either require a deep understanding of specific storage architectures and file layouts (as in high-performance file systems) or produce unsatisfactory I/O performance in exchange for ease of use and portability (as in relational DBMSs). In this paper we present a new environment which is built around an active meta-data management system (MDMS). The key components of our three-tiered architecture are the user application, the MDMS, and a hierarchical storage system (HSS). Our environment overcomes the performance problems of pure database-oriented solutions, while maintaining their advantages in terms of ease of use and portability. The high levels of performance are achieved by the MDMS, with the aid of user-specified directives. Our environment supports a simple, easy-to-use yet powerful user interface, leaving the task of choosing appropriate I/O techniques to the MDMS. We discuss the importance of an active MDMS and show how the three components, namely the application, the MDMS, and the HSS, fit together. We also report performance numbers from our initial implementation and illustrate that significant improvements are made possible without undue programming effort.
ISBN (Print): 9781510801011
Molecular Dynamics (MD) is a computational technique with applicability in fields as diverse as material science, biomolecules, and chemical physics. Assisted Model Building with Energy Refinement (AMBER) is an MD package that uses the Message Passing Interface (MPI) to scale in multi-core and cluster environments. In our earlier work [1], we modified one of AMBER's algorithms, the Generalized Born (GB) algorithm, to run optimally on the Xeon Phi co-processor. This improved performance by 277% on the co-processor; the same changes improved performance on the host server by 80%. In this paper, we extend our earlier work and implement a symmetric solution using both the host server and the co-processor. Since the calculations in the GB algorithm involve interactions between all possible atom combinations, it has been very difficult to scale the GB algorithm in distributed memory. We evaluate various alternative techniques using a combination of MPI and Open Multi-Processing (OpenMP) to obtain a scalable solution that utilizes the computing power of both the host server and the co-processor.
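A minimal hybrid MPI+OpenMP sketch of an all-pairs accumulation is given below (illustrative only, with hypothetical sizes; it is not AMBER's GB kernel): ranks, which may be placed symmetrically on the host and the co-processor, split the outer atom loop, threads split the work within a rank, and MPI_Allreduce combines the partial sums.

```cpp
#include <mpi.h>
#include <omp.h>
#include <vector>
#include <algorithm>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_atoms = 1024;                  // hypothetical problem size
    std::vector<double> x(n_atoms, 1.0);       // placeholder coordinates

    // Static block decomposition of the outer atom loop across MPI ranks;
    // ranks may run on the host or the co-processor in symmetric mode.
    int chunk = (n_atoms + size - 1) / size;
    int begin = rank * chunk;
    int end   = std::min(n_atoms, begin + chunk);

    double local = 0.0;
    // Threads split the pair loop inside each rank (toy pairwise term).
    #pragma omp parallel for reduction(+:local) schedule(dynamic)
    for (int i = begin; i < end; ++i)
        for (int j = 0; j < n_atoms; ++j)
            if (i != j) local += 1.0 / (1.0 + (x[i] - x[j]) * (x[i] - x[j]));

    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    if (rank == 0) std::printf("global sum = %f\n", global);
    MPI_Finalize();
}
```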