ISBN:
(Print) 9781450361842
The proceedings contain 45 papers. The topics discussed include: the price of clustering in bin-packing with applications to bin-packing with delays; faster matrix multiplication via sparse decomposition; NC algorithms for computing a perfect matching, the number of perfect matchings, and a maximum flow in one-crossing-minor-free graphs; improved MPC algorithms for edit distance and Ulam distance; brief announcement: scalable diversity maximization via small-size composable core-sets; brief announcement: eccentricities via parallel set cover; dynamic algorithms for the massively parallel computation model; massively parallel computation via remote memory access; and brief announcement: ultra-fast asynchronous randomized rumor spreading.
The rising trend and advancements in machine learning have resulted in numerous applications, from computer vision and pattern recognition to providing security for hardware devices. Even though the proven a...
ISBN:
(Print) 9781538674314
In complex System-on-a-Chip (SoC) projects, the conclusion of the project depends on the functional verification phase, which takes a long time. Synchronizing distributed and heterogeneous components in a functional verification environment might not be a simple task. This work aims to present a distributed verification environment that allows the integration of heterogeneous components. In this environment, it is possible to perform the functional verification of multiple components on heterogeneous architectures in a parallel and distributed fashion. To this end, an intercommunication framework previously developed by the authors is used, based on the High Level Architecture (IEEE 1516) standard. Thus, this article also demonstrates how the proposed architecture abstracts communication and synchronization details to make the functional verification process across distributed components as straightforward as possible. As a demonstration of the developed solution, an experiment is presented with the functional verification of parallel algorithms on a GPU and on an FPGA, besides verification using a CPU.
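The coordination idea behind an HLA (IEEE 1516)-style environment can be sketched in a few lines. The following is a minimal illustrative model, not the authors' framework: each federate (component model) may only advance to a requested time once every other federate has been granted at least that time, so heterogeneous components stay synchronized.

```python
# Minimal sketch of HLA-style conservative time synchronization.
# Federate names and step sizes below are illustrative assumptions.

class Federate:
    def __init__(self, name, step):
        self.name = name
        self.step = step          # local time step of this component model
        self.time = 0.0

    def next_request(self):
        return self.time + self.step

def advance_all(federates, horizon):
    """Advance federates in lockstep: the grant is the minimum of all
    pending time-advance requests, so no federate overtakes the others."""
    trace = []
    while True:
        grant = min(f.next_request() for f in federates)
        if grant > horizon:
            break
        for f in federates:
            if f.next_request() == grant:
                f.time = grant            # this federate's request is granted
                trace.append((f.name, grant))
    return trace

# e.g. a GPU model stepping at 2.0 and an FPGA model stepping at 3.0
trace = advance_all([Federate("gpu", 2.0), Federate("fpga", 3.0)], horizon=6.0)
```

The trace interleaves the two models in non-decreasing time order, which is the property a distributed verification environment relies on when comparing outputs against a reference model.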
ISBN:
(Print) 9781538639146
Finding regions of local similarity between biological sequences is a fundamental task in computational biology. BLAST is the most widely used tool for this purpose, but it suffers from irregularities due to its heuristic nature. To achieve fast search, recent approaches construct the index from the database instead of the input query. However, database indexing introduces more challenges in the design of index structures and algorithms, especially for data access through the memory hierarchy on modern multicore processors. In this paper, based on existing heuristic algorithms, we design and develop a database-indexed BLAST with the same sensitivity as query-indexed BLAST (i.e., NCBI-BLAST). We then identify that the existing heuristic algorithms of BLAST can result in serious irregularities in database-indexed search. To eliminate these irregularities, we propose muBLASTP, which uses multiple optimizations to improve data locality and parallel efficiency for multicore architectures and multi-node systems. Experiments on a single node demonstrate up to a 5.1-fold speedup over multi-threaded NCBI BLAST. For inter-node parallelism, we achieve nearly linear scaling on up to 128 nodes and a speedup of up to 8.9-fold over mpiBLAST.
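The core idea of database indexing for seeded search can be illustrated with a toy example. This is a hedged sketch of the generic seed-lookup pattern, not muBLASTP's actual index structure; the word length and sequences are illustrative: every fixed-length word of the database is indexed once, and query words are then looked up to find candidate hits, instead of re-indexing each query.

```python
# Toy database index for seeded sequence search (illustrative only).
from collections import defaultdict

def build_index(database, w=3):
    """Map every length-w word to the (sequence id, offset) positions
    where it occurs in the database."""
    index = defaultdict(list)
    for sid, seq in enumerate(database):
        for i in range(len(seq) - w + 1):
            index[seq[i:i + w]].append((sid, i))
    return index

def seed_hits(index, query, w=3):
    """Exact-match seeds as (query offset, sequence id, database offset)."""
    hits = []
    for q in range(len(query) - w + 1):
        for sid, d in index.get(query[q:q + w], []):
            hits.append((q, sid, d))
    return hits

db = ["MKVLAT", "GGMKVC"]   # two tiny made-up protein fragments
idx = build_index(db)
hits = seed_hits(idx, "AMKVL")
```

The irregularity the paper targets is visible even here: hits for one query word scatter across unrelated database positions, which is what hurts locality when the index is large.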
ISBN:
(Print) 9780769561493
Some newer processor architectures are no longer based on registers, in order to increase their potential for instruction-level parallelism. Instead, they expose their data paths to the compiler so that the program can directly move data values between function units using suitable instructions. Some of these architectures require a synchronous transfer of data values, while others use asynchronous transfers by buffering values. In this paper, we discuss the out-of-order execution of function units in exposed-data-path architectures with asynchronous data transfers. The execution of these function units may locally deviate from program order, analogous to the dynamic scheduling used by processors with out-of-order execution. Since our out-of-order execution only has effects inside the function units, it requires no modifications to the compiler or instruction set. We have implemented different variants on FPGAs and evaluated them for a set of application scenarios, showing that the out-of-order extension can considerably increase the performance of these architectures.
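Why buffered (asynchronous) transfers enable a local deviation from program order can be shown with a toy timing model. This is an illustration of the general principle, not the paper's hardware: an operation whose operand has arrived may fire before an earlier operation that is still waiting for its input.

```python
# Toy single-issue function unit: operands arrive at given cycles;
# compare in-order vs. out-of-order issue. Latencies are illustrative.

def completion_times(ops, latency=1, out_of_order=True):
    """ops: operand-arrival cycles, in program order.
    Returns the completion cycle of each op."""
    pending = list(enumerate(ops))        # (program index, arrival cycle)
    done, t = {}, 0
    while pending:
        ready = [p for p in pending if p[1] <= t]
        if not out_of_order and ready and ready[0] != pending[0]:
            ready = []                    # in-order: only the head may issue
        if ready:
            i, _ = ready[0]
            pending.remove(ready[0])
            t += latency
            done[i] = t
        else:
            t += 1                        # stall waiting for an operand
    return [done[i] for i in range(len(ops))]
```

With arrivals `[5, 0, 1]`, in-order issue serializes everything behind the slow first operand, while out-of-order issue lets the later operations complete early, which is exactly the effect the paper exploits inside each function unit.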
ISBN:
(Print) 9780769561493
We investigate several parallel algorithmic variants of LU factorization with partial pivoting (LUpp) that trade off the exploitation of increasing levels of task-parallelism in exchange for a more cache-oblivious execution. In particular, our first variant corresponds to the classical implementation of LUpp in the legacy version of LAPACK, which constrains the concurrency exploited to that intrinsic to the basic linear algebra kernels that appear during the factorization, but exerts strict control over the cache memory and a static mapping of kernels to cores. A second variant relaxes this task-constrained scenario by introducing a look-ahead of depth one to increase task-parallelism, raising the pressure on the cache system in terms of cache misses. Finally, the third variant orchestrates an execution where the degree of concurrency is limited only by the actual data dependencies in LUpp, potentially yielding a higher volume of conflicts due to competition for cache memory resources. The target platform for our implementations and experiments is a specific asymmetric multicore processor (AMP) from ARM, which introduces the additional scheduling complexity of having to deal with two distinct types of cores, and a shared L2 cache per cluster of the AMP, which results in more contention in the access to this key cache level.
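For reference, the sequential kernel that all three variants reorganize is the textbook LU factorization with partial pivoting. The sketch below is the plain unblocked algorithm on Python lists, with no blocking, look-ahead, or task scheduling:

```python
# Textbook LU factorization with partial pivoting (unblocked reference).

def lu_pp(A):
    """In-place LUpp: returns (A, piv) with unit-lower L and U packed in A."""
    n = len(A)
    piv = list(range(n))
    for k in range(n):
        # partial pivoting: bring the largest |A[i][k]|, i >= k, to row k
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        if p != k:
            A[k], A[p] = A[p], A[k]
            piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                   # multiplier, stored in L
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]     # trailing-matrix update
    return A, piv
```

The pivot search at step k depends on the full trailing update of column k, which is the data dependency that limits concurrency in the third variant and that look-ahead partially hides in the second.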
ISBN:
(Print) 9780769561493
The Convolutional Neural Network (CNN) is a deep learning algorithm extended from the Artificial Neural Network (ANN) and widely used for image classification and recognition, thanks to its invariance to distortions. The recent rapid growth of applications based on deep learning algorithms, especially in the context of Big Data analytics, has dramatically intensified both industrial and academic research into optimized implementations of CNNs on accelerators such as GPUs, FPGAs and ASICs, as general-purpose processors can hardly meet the ever-increasing performance and energy-efficiency requirements. FPGAs in particular are one of the most attractive alternatives, as they allow the exploitation of the implicit parallelism of the algorithm and the acceleration of the different layers of a CNN with custom optimizations, while retaining extreme flexibility thanks to their reconfigurability. In this work, we propose a methodology to implement CNNs on FPGAs in a modular, scalable way. This is done by exploiting the dataflow pattern of convolutions, using an approach derived from previous work on the acceleration of Iterative Stencil Loops (ISLs), a computational pattern that shares some characteristics with convolutions. Furthermore, this approach allows the implementation of a high-level pipeline between the different network layers, resulting in an increase in overall performance when the CNN is employed to process batches of multiple images, as would happen in real-life scenarios.
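The computational pattern being mapped to FPGA dataflow is the sliding-window convolution. The following plain Python version (an illustration, not the paper's accelerator; the image and kernel values are made up) makes the stencil-like data reuse visible: adjacent output pixels share most of their input window, which is what an ISL-style pipeline exploits.

```python
# Direct valid-mode 2D convolution (as in CNNs, i.e. cross-correlation).

def conv2d(image, kernel):
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            acc = 0
            for di in range(kh):            # slide the kernel window
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
edge = [[1, 0],
        [0, -1]]   # illustrative 2x2 kernel
```

Chaining `conv2d` calls layer after layer is the software analogue of the high-level inter-layer pipeline the methodology builds in hardware.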
ISBN:
(Print) 9781538639146
Increasingly complex memory systems and on-chip interconnects are developed to mitigate the data movement bottlenecks in manycore processors. One example of such a complex system is the Xeon Phi KNL CPU, with three different types of memory, fifteen memory configuration options, and a complex on-chip mesh network connecting up to 72 cores. Users require a detailed understanding of the performance characteristics of the different options to utilize the system efficiently. Unfortunately, peak performance is rarely achievable, and achievable performance is hardly documented. We address this with capability models of the memory subsystem, derived by systematic measurements, to guide users in navigating the complex optimization space. As a case study, we provide an extensive model of all memory configuration options for the Xeon Phi KNL. We demonstrate how our capability model can be used to automatically derive new close-to-optimal algorithms for various communication functions, yielding improvements of 5x and 24x over Intel's tuned OpenMP and MPI implementations, respectively. Furthermore, we demonstrate how to use the models to assess how efficiently a bitonic sort application utilizes the memory resources. Interestingly, our capability models predict and explain that the high-bandwidth MCDRAM does not improve bitonic sort performance over DRAM.
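A capability model records sustained, measured rates rather than datasheet peaks. The sketch below is a deliberately crude STREAM-style copy probe, far simpler than the paper's systematic measurement methodology, but it shows the basic primitive: time a bulk copy, take the best of several repetitions, and report achieved rather than peak bandwidth.

```python
# Crude sustained-bandwidth probe (illustrative; sizes are assumptions).
import time
from array import array

def copy_bandwidth(n_bytes=8 * 1_000_000, reps=3):
    """Time a flat array copy and report the best sustained rate in GB/s."""
    n = n_bytes // 8
    src = array("d", range(n))
    dst = array("d", [0.0]) * n
    best = float("inf")
    for _ in range(reps):                   # best-of-reps filters warm-up noise
        t0 = time.perf_counter()
        dst[:] = src                        # bulk copy through the memory system
        best = min(best, time.perf_counter() - t0)
    return (2 * n_bytes) / best / 1e9       # count read + write traffic

gbs = copy_bandwidth()
```

Running this at several buffer sizes (fitting in L2, in MCDRAM, in DRAM) would reproduce, in miniature, the per-level capability curves the paper builds, which is how one would see that a streaming kernel like bitonic sort can be latency- rather than bandwidth-bound.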
ISBN:
(Print) 9780769561493
Determining key characteristics of High Performance Computing machines that allow users to predict their performance is an old and recurrent dream. This was, for example, the rationale behind the design of the LogP model, which later evolved into many variants (LogGP, LogGPS, LoGPS, ...) to cope with the evolution and complexity of network technology. Although the network has received a lot of attention, predicting the performance of computation kernels can be very challenging as well. In particular, the tremendous increase in internal parallelism and deep memory hierarchies in modern multi-core architectures often leaves applications limited by the memory access rate. In this context, determining the key characteristics of a machine, such as the peak bandwidth of each cache level, as well as how an application uses the memory hierarchy, can be the key to predicting or extrapolating application performance. Based on such performance models, most high-level simulation-based frameworks characterize a machine and an application separately, later convolving both signatures to predict overall performance. We evaluate the suitability of such approaches to modern architectures and applications by trying to reproduce the work of others. When trying to build our own framework, we realized that, regardless of the quality of the underlying models or software, most of these frameworks rely on "opaque" benchmarks to characterize the platform. In this article, we report the many pitfalls we encountered when trying to characterize both the network and the memory performance of modern machines. We claim that opaque benchmarks that do not clearly separate experiment design, measurements, and analysis should be avoided as much as possible in a modeling context. Likewise, an a priori identification of experimental factors should be done to make sure the experimental conditions are adequate.
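The measurement discipline argued for here, separating design, measurement, and analysis, can be made concrete with a small harness. This is an illustrative sketch (the workload and factor names are assumptions, not the authors' tooling): the design is an explicit list of factor levels, the harness records every raw sample, and summary statistics are computed in a separate step so nothing is hidden inside the benchmark.

```python
# Non-opaque benchmark harness: design, raw measurement, separate analysis.
import statistics
import time

def measure(fn, design):
    """Run fn at each factor setting in `design`, keeping all raw samples."""
    samples = []
    for setting in design:                  # a priori experimental design
        for rep in range(setting["reps"]):
            t0 = time.perf_counter()
            fn(setting["n"])
            samples.append({"n": setting["n"], "rep": rep,
                            "seconds": time.perf_counter() - t0})
    return samples                          # analysis happens elsewhere

def summarize(samples):
    """Separate analysis step: robust per-setting summary of raw samples."""
    by_n = {}
    for s in samples:
        by_n.setdefault(s["n"], []).append(s["seconds"])
    return {n: statistics.median(ts) for n, ts in by_n.items()}

raw = measure(lambda n: sum(range(n)), [{"n": 10_000, "reps": 5}])
```

Because `raw` survives, outliers and suspicious experimental conditions remain inspectable after the fact, which is exactly what an opaque benchmark that reports only a final number throws away.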
ISBN:
(Print) 9780769561493
The increased use of application-specific computational devices turns even low-power chips into high-performance computers. Not only additional accelerators (e.g., GPU, DSP, or even FPGA), but also heterogeneous CPU clusters form modern computer systems. Programming these chips is challenging, however, due to management overhead, data transfer delays, and the lack of a unified programming flow. Moreover, most accelerators require device-specific optimizations. Thus, for application developers, fulfilling software's initial intention of high portability is one of the most ambitious objectives. In this work, we present a software abstraction layer unifying the programming flow for parallel and heterogeneous platforms. To this end, we offer a generic C++ API for parallelizing on heterogeneous CPU clusters and offloading to accelerators, specifically addressing applications with strict real-time constraints. Alongside a freely configurable choice of parallelization and offloading frameworks (e.g., TBB, OpenCL) that does not affect portability, we also include automatic profiling methods. While offering high configurability of the architecture mapping, these methods ease the development of optimal scheduling strategies, e.g., in terms of power, throughput, or latency. To demonstrate the use of the proposed methods, we present heterogeneous implementations of the Semi-Global Matching and Histograms of Oriented Gradients algorithms as exemplary advanced driver-assistance algorithms. We provide an in-depth discussion of scheduling strategies for execution on a Samsung Exynos 5422 MPSoC, an Intel Xeon Phi manycore, and a general-purpose processor equipped with a Nallatech PCIe-385N FPGA accelerator card.
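The shape of such an abstraction layer can be sketched independently of the authors' C++ API. The toy below (in Python, purely illustrative; the backend names are assumptions) shows the key design choice: work is expressed once against a single entry point, and a configurable backend registry decides where it runs, so swapping frameworks never touches application code.

```python
# Toy unified-offloading layer: one task expression, pluggable backends.

BACKENDS = {}

def backend(name):
    """Register a backend under a configurable name."""
    def register(fn):
        BACKENDS[name] = fn
        return fn
    return register

@backend("serial")
def run_serial(task, chunks):
    return [task(c) for c in chunks]

@backend("threads")
def run_threads(task, chunks):
    from concurrent.futures import ThreadPoolExecutor
    with ThreadPoolExecutor() as pool:
        return list(pool.map(task, chunks))

def offload(task, chunks, where="serial"):
    """Single entry point; `where` plays the role of the framework choice
    (TBB, OpenCL, ...) in the abstraction layer described above."""
    return BACKENDS[where](task, chunks)

out = offload(sum, [[1, 2], [3, 4]], where="threads")
```

A profiling method in this scheme would simply time `offload` under each registered backend and pick the mapping that best meets the power, throughput, or latency constraint.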