ISBN:
(Digital) 9798350330991
(Print) 9798350331004
Computation of inner products is frequently used in machine learning (ML) algorithms, apart from signal processing and communication applications. Distributed arithmetic (DA) has been widely employed for area-time-efficient inner-product implementations. In conventional DA-based architectures, one of the vectors is constant and known a priori; hence, traditional DA architectures are not suitable when both vectors are variable. However, the inner product of a pair of variable vectors is a frequent operation in matrix multiplications of various forms and in convolutional neural networks. In this paper, we present a novel DA-based architecture for computing the inner product of variable vectors. To derive the proposed architecture, an inner product of any given length is decomposed into a set of short-length inner products, such that the full inner product can be computed by successive accumulation of the short-length results. We have designed a DA-based architecture for the computation of the short-length inner product of variable vectors and used it in successive clock cycles to compute the whole inner product by successive accumulation. Post-layout synthesis results using Cadence Innovus with a GPDK 90 nm technology library show that the proposed DA-based parallel architecture offers significant advantages in area-delay product and energy consumption over the bit-serial DA architecture.
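For context, the decomposition can be written in the textbook DA form (a sketch of the standard formulation, not equations quoted from the paper; unsigned B-bit operands are assumed for simplicity, with the MSB term negated in the two's-complement case). A length-N inner product is split into blocks of length P, and each short block is evaluated bit-plane by bit-plane:

\[
y=\sum_{i=0}^{N-1} a_i x_i=\sum_{k=0}^{N/P-1}\sum_{j=0}^{P-1} a_{kP+j}\,x_{kP+j},
\qquad
\sum_{j=0}^{P-1} a_j x_j=\sum_{b=0}^{B-1} 2^{b}\Bigl(\sum_{j=0}^{P-1} a_j\,x_{j,b}\Bigr),
\]

where x_{j,b} denotes bit b of x_j. Conventional DA precomputes the 2^P possible values of the inner bit-plane sum in a lookup table, which is only possible when the vector a is fixed; with both vectors variable, those partial sums must be formed on the fly, which is why restricting the DA unit to a small block length P and accumulating the block results over successive clock cycles is the key enabler.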
ISBN:
(Print) 9798350393613
In decentralized IoT ecosystems, four cryptographic algorithms (SHA256, BLAKE256, BLAKE2s, and ChaCha20) are principal to ensuring data integrity and confidentiality. However, existing cryptographic hardware is often limited to supporting a single algorithm and suffers from low performance, which falls short of the diverse requirements of these systems. To address these limitations, we introduce a reconfigurable crypto accelerator (RCA) that offers high flexibility, superior performance, and high hardware efficiency. The RCA includes three novel optimizations: a homogeneous multi-core architecture, a register-adder sharing approach, and a multi-level pipeline scheduler. The RCA was verified and implemented at the system-on-chip level on a ZCU102 FPGA. Real-time performance evaluation of the RCA while executing the various cryptographic algorithms demonstrates an energy efficiency of 94.3-160.4 Mbps/W, which is 3.1-10.5 times higher than that of modern CPUs. Experiments on several FPGAs show that the RCA offers higher flexibility while still outperforming previous works by 1.63-31.65 times in throughput and 1.04-2.76 times in area efficiency. Furthermore, in ASIC synthesis, the RCA exhibits exceptional throughput (48.79-92.16 Gbps), area efficiency (66.2-102.31 Gbps/mm²), and energy efficiency (186.22-287.8 Gbps/W), surpassing other related ASIC-based works.
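As background on why one homogeneous datapath can serve all four algorithms (our illustration, not the paper's RTL): BLAKE's G function was derived from ChaCha's quarter-round, and SHA256 likewise builds on 32-bit additions, rotations, and XORs, so registers and adders can naturally be shared across cores. A minimal C-style sketch of the two round primitives, following RFC 8439 and RFC 7693:

#include <stdint.h>

/* 32-bit rotations: the primitive shared by SHA256, BLAKE2s, and ChaCha20. */
static inline uint32_t rotl32(uint32_t v, int c) { return (v << c) | (v >> (32 - c)); }
static inline uint32_t rotr32(uint32_t v, int c) { return (v >> c) | (v << (32 - c)); }

/* ChaCha20 quarter-round (RFC 8439): add, xor, rotate only. */
static void chacha_qr(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d) {
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* BLAKE2s G function (RFC 7693): the same structure with message words
   x and y mixed in, which is what makes register-adder sharing natural. */
static void blake2s_g(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d,
                      uint32_t x, uint32_t y) {
    *a = *a + *b + x; *d = rotr32(*d ^ *a, 16);
    *c = *c + *d;     *b = rotr32(*b ^ *c, 12);
    *a = *a + *b + y; *d = rotr32(*d ^ *a, 8);
    *c = *c + *d;     *b = rotr32(*b ^ *c, 7);
}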
ISBN:
(Print) 9781665497473
The proceedings contain 148 papers. The topics discussed include: heterogeneous architecture for sparse data processing; combined application of approximate computing techniques in DNN hardware accelerators; highly efficient ALLTOALL and ALLTOALLV communication algorithms for GPU systems; implementing spatio-temporal graph convolutional networks on Graphcore IPUs; the best of many worlds: scheduling machine learning inference on CPU-GPU integrated architectures; online learning RTL synthesis for automated design space exploration; machine learning aided hardware resource estimation for FPGA DNN implementations; optimal schedules for high-level programming environments on FPGAs with constraint programming; on how to push efficient medical semantic segmentation to the edge: the SENECA approach; and exploiting high-bandwidth memory for FPGA-acceleration of inference on sum-product networks.
ISBN:
(Print) 9781728146713
The synthesis of thinned planar arrays of real radiating elements for 5G communication systems is addressed. A nature-inspired optimization strategy based on the Genetic Algorithm (GA) is employed to define the simplified array architecture, in order to reduce the number of transmit/receive modules and radio-frequency (RF) chains with respect to a fully populated array architecture. At each iteration of the GA-based optimization process, the array pattern is efficiently calculated by considering a limited set of representative embedded element patterns, which accounts for the mutual coupling phenomena and yields an accurate prediction of the real radiation performance. A representative numerical example is reported to validate the proposed approach.
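For concreteness, the pattern model typically evaluated inside such a GA loop (the standard thinned-array formulation, assumed here rather than quoted from the paper) combines a binary thinning vector with the embedded element patterns:

\[
F(\theta,\varphi)=\sum_{n=1}^{N} t_n\, e_n(\theta,\varphi)\,
e^{\,jk\left(x_n \sin\theta\cos\varphi + y_n \sin\theta\sin\varphi\right)},
\qquad t_n\in\{0,1\},
\]

where e_n is the embedded element pattern of the n-th element (approximated by a limited set of representative patterns in the proposed approach), (x_n, y_n) its position, and k = 2\pi/\lambda the wavenumber. The GA evolves the binary vector (t_1, ..., t_N) to minimize a cost defined on this pattern; the exact cost terms (e.g., sidelobe level and gain constraints) are an assumption here.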
ISBN:
(Print) 9781728114361
Since the advent of GPU computing, GPU hardware has evolved at a fast pace. Because application performance heavily depends on the latest hardware improvements, performance portability is extremely challenging for GPU application library developers. Portability becomes even more difficult when new low-level instructions are added to the ISA (e.g., warp shuffle instructions) or when the microarchitectural support for existing instructions is improved (e.g., atomic instructions). Library developers, besides re-tuning the code for new hardware features, deal with the performance portability issue by hand-writing multiple algorithm versions that leverage different instruction sets and microarchitectures. High-level programming frameworks and Domain Specific Languages (DSLs) do not typically support low-level instructions (e.g., warp shuffle and atomic instructions), so it is painful or even impossible for these programming systems to take advantage of the latest architectural improvements. In this work, we design a new set of high-level APIs and qualifiers, as well as specialized Abstract Syntax Tree (AST) transformations for high-level programming languages and DSLs. Our transformations enable warp shuffle instructions and atomic instructions (on global and shared memories) to be easily generated. We show a practical implementation of these transformations by building on Tangram, a high-level kernel synthesis framework. Using our new language and compiler extensions, we implement parallel reduction, a fundamental building block used in a wide range of algorithms. Parallel reduction is representative of the performance portability challenge, as its performance heavily depends on the latest hardware improvements. We compare our synthesized parallel reduction to another high-level programming framework and a hand-written high-performance library across three generations of GPU architectures, and show up to 7.8x speedup (2x on average) over hand-written code.
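To make the warp-shuffle case concrete, the following minimal CUDA sketch (our own illustration of the target code style, not Tangram's actual output) shows the kind of kernel such transformations aim to generate, using both warp shuffle and global-memory atomic instructions:

#include <cuda_runtime.h>

// Sum-reduce the values held by one warp using shuffle instructions,
// avoiding shared memory for the intra-warp phase. Lane 0 gets the total.
__inline__ __device__ float warpReduceSum(float val) {
    for (int offset = 16; offset > 0; offset >>= 1)
        val += __shfl_down_sync(0xffffffff, val, offset);
    return val;
}

// Grid-stride reduction; *out must be zero-initialized by the caller.
__global__ void reduceSum(const float *in, float *out, int n) {
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        sum += in[i];
    sum = warpReduceSum(sum);  // every lane executes this, so the full mask is safe
    if ((threadIdx.x & 31) == 0)
        atomicAdd(out, sum);   // one atomic per warp combines partial sums
}

On architectures without __shfl_down_sync, the intra-warp phase falls back to a shared-memory tree: exactly the per-generation code divergence these AST transformations are meant to hide from the library developer.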
ISBN:
(Print) 9783030172268
The proceedings contain 28 papers. The special focus in this conference is on Applied Reconfigurable Computing. The topics include: Proof-Carrying Hardware Versus the Stealthy Malicious LUT Hardware Trojan; Secure Local Configuration of Intellectual Property Without a Trusted Third Party; HiFlipVX: An Open Source High-Level Synthesis FPGA Library for Image Processing; Real-Time FPGA Implementation of Connected Component Labelling for a 4K Video Stream; A Scalable FPGA-Based Architecture for Depth Estimation in SLAM; Evaluating LULESH Kernels on OpenCL FPGA; The TaPaSCo Open-Source Toolflow for the Automated Composition of Task-Based Parallel Reconfigurable Computing Systems; Graph-Based Code Restructuring Targeting HLS for FPGAs; UltraSynth: Integration of a CGRA into a Control Engineering Environment; Exploiting Reconfigurable Vector Processing for Energy-Efficient Computation in 3D-Stacked Memories; Optimizing CNN-Based Hyperspectral Image Classification on FPGAs; Automatic Toolflow for VCGRA Generation to Enable CGRA Evaluation for Arithmetic Algorithms; ReM: A Reconfigurable Multipotent Cell for New Distributed Reconfigurable Architectures; Update or Invalidate: Influence of Coherence Protocols on Configurable HW Accelerators; Hybrid Prototyping for Manycore Design and Validation; Evaluation of FPGA Partitioning Schemes for Time and Space Sharing of Heterogeneous Tasks; Third Party CAD Tools for FPGA Design—A Survey of the Current Landscape; Filter-Wise Pruning Approach to FPGA Implementation of Fully Convolutional Network for Semantic Segmentation; Exploring Data Size to Run Convolutional Neural Networks in Low Density FPGAs; Faster Convolutional Neural Networks in Low Density FPGAs Using Block Pruning; Supporting Columnar In-memory Formats on FPGA: The Hardware Design of Fletcher for Apache Arrow; A Novel Encoder for TDCs.
ISBN:
(Print) 9783319788890
The proceedings contain 59 papers. The special focus in this conference is on Applied Reconfigurable Computing. The topics include: FPGA-based memory efficient shift-and algorithm for regular expression matching; Towards an optimized multi FPGA architecture with STDM network: a preliminary study; An FPGA/HMC-based accelerator for resolution proof checking; An efficient FPGA implementation of the big bang-big crunch optimization algorithm; ReneGENE-GI: empowering precision genomics with FPGAs on HPCs; FPGA-based parallel pattern matching; Embedded vision systems: a review of the literature; A survey of low power design techniques for last level caches; ISA-DTMR: selective protection in configurable heterogeneous multicores; Redundancy-reduced MobileNet acceleration on reconfigurable logic for ImageNet classification; Analyzing AXI streaming interface for hardware acceleration in AP-SoC under soft errors; High performance UDP/IP 40Gb Ethernet stack for FPGAs; Tackling wireless sensor network heterogeneity through novel reconfigurable gateway approach; A low-power FPGA-based architecture for microphone arrays in wireless sensor networks; A hybrid FPGA trojan detection technique based on combinatorial testing and on-chip sensing; HoneyWiN: novel honeycomb-based wireless NoC architecture in many-core era; Fast partial reconfiguration on SRAM-based FPGAs: a frame-driven routing approach; A dynamic partial reconfigurable overlay framework for Python; Runtime adaptive cache for the LEON3 processor; Exploiting partial reconfiguration on a dynamic coarse grained reconfigurable architecture; Accuracy to throughput trade-offs for reduced precision neural networks on reconfigurable logic; DIM-VEX: exploiting design time configurability and runtime reconfigurability; The use of HACP+SBT lossless compression in optimizing memory bandwidth requirement for hardware implementation of background modelling algorithms; A reconfigurable PID controller; High-level synthesis of software-defined MPSoCs.
Today, machine learning based on neural networks has become mainstream in many application domains. A small subset of machine learning algorithms, called Convolutional Neural Networks (CNNs), is considered state-of-the-art for many applications (e.g., video/audio classification). The main challenge in implementing CNNs in embedded systems is their large computation, memory, and bandwidth requirements. To meet these demands, dedicated hardware accelerators have been proposed. Since memory is the major cost in CNNs, recent accelerators focus on reducing memory accesses. In particular, they exploit data locality using tiling, layer merging, or intra/inter feature-map parallelism to reduce the memory footprint. However, they lack the flexibility to interleave or cascade these optimizations. Moreover, most of the existing accelerators do not exploit compression, which can simultaneously reduce memory requirements, increase throughput, and enhance energy efficiency. To tackle these limitations, we present a flexible accelerator called MOCHA. MOCHA has three features that differentiate it from the state-of-the-art: (i) the ability to compress inputs/kernels, (ii) the flexibility to interleave various optimizations, and (iii) the intelligence to automatically interleave and cascade the optimizations, depending on the dimensions of a specific CNN layer and the available resources. Post-layout synthesis results reveal that MOCHA provides up to 63% higher energy efficiency, up to 42% higher throughput, and up to 30% less storage compared to the next best accelerator, at the cost of 26-35% additional area.
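To ground the tiling point, here is a generic sketch (C-style; the function, data layout, and tile parameter T are ours, not MOCHA's actual dataflow) of how tiling the output loops of a direct convolution bounds the live input working set to one (T+K-1)x(T+K-1) patch per input channel instead of the whole feature map:

// Tiled direct convolution, stride 1, 'valid' padding.
// in: [C_in][H][W], w: [C_out][C_in][K][K], out: [C_out][H-K+1][W-K+1]
void conv_tiled(const float *in, const float *w, float *out,
                int C_in, int C_out, int H, int W, int K, int T) {
    int Ho = H - K + 1, Wo = W - K + 1;
    for (int ty = 0; ty < Ho; ty += T)
    for (int tx = 0; tx < Wo; tx += T)          // one output tile at a time
    for (int co = 0; co < C_out; ++co)
    for (int y = ty; y < ty + T && y < Ho; ++y)
    for (int x = tx; x < tx + T && x < Wo; ++x) {
        float acc = 0.0f;
        for (int ci = 0; ci < C_in; ++ci)       // accumulate over input channels
        for (int ky = 0; ky < K; ++ky)
        for (int kx = 0; kx < K; ++kx)
            acc += in[(ci * H + y + ky) * W + (x + kx)]
                 * w[((co * C_in + ci) * K + ky) * K + kx];
        out[(co * Ho + y) * Wo + x] = acc;
    }
}

Interleaving then amounts to choosing, per layer, how this tiling composes with layer merging and feature-map parallelism; automating that choice from the layer dimensions and available resources is the role of MOCHA's third feature.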