Search Results - Inner Mongolia University Library

IEEE Conference on High Performance Extreme Computing (HPEC)

Authors: Arnab A Purkayastha, Sai Raghavendran, Jhanani Thiagarajan, Hamed Tabkhi (Department of Electrical and Computer Engineering, University of North Carolina Charlotte (UNC Charlotte), USA)

OpenCL programming ability combined with OpenCL High-Level Synthesis (OpenCL-HLS) tools has made tremendous improvements in the reconfigurable computing field. FPGAs' inherent pipelined parallelism provides not only faster execution times but also power-efficient solutions when executing massively parallel applications. A major execution bottleneck affecting FPGA performance is the high number of memory stalls exposed to the pipelined data-path, which hinders the benefits of data-path *** paper explores the efficiency of “OpenCL Pipe” to hide memory access latency on cloud FPGAs by decoupling memory access from computation. The Pipe semantic is leveraged to split OpenCL kernels into “read”, “compute” and “write back” sub-kernels, which work concurrently to overlap the computation of current threads with the memory access of future threads. For evaluation, we use a mix of seven massively parallel high-performance applications from the Rodinia suite, version 3.1. All our tests are conducted on the Xilinx VU9P FPGA platform of the Amazon cloud-based AWS EC2 F1 instance. On average, we observe a 5.2x speedup with a 2.2x increase in memory bandwidth utilization and about a 2.5x increase in FPGA resource utilization over the baseline synthesis (Xilinx OpenCL-HLS). This work has been funded and supported by the Xilinx University Program (XUP).

Keywords: Field programmable gate arrays; Kernel; Instruction sets; Parallel processing; Semantics; Bandwidth; Runtime
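The read / compute / write-back decoupling described in the abstract above can be illustrated with a small host-side analogy. This is only a sketch under stated assumptions: it uses Python threads and bounded `queue.Queue` objects as stand-ins for OpenCL pipes, it is not OpenCL or FPGA code, and the names `reader`, `computer`, `writer`, `run_pipeline`, the squaring operation, and the `depth` parameter are all hypothetical.

```python
import queue
import threading

_DONE = object()  # hypothetical end-of-stream marker


def reader(src, to_compute):
    # "read" stage: stands in for the sub-kernel that fetches from memory
    for item in src:
        to_compute.put(item)
    to_compute.put(_DONE)


def computer(to_compute, to_write):
    # "compute" stage: works on items as soon as they arrive
    while True:
        item = to_compute.get()
        if item is _DONE:
            to_write.put(_DONE)
            return
        to_write.put(item * item)  # placeholder computation (assumption)


def writer(to_write, dst):
    # "write back" stage: stands in for the sub-kernel that stores results
    while True:
        item = to_write.get()
        if item is _DONE:
            return
        dst.append(item)


def run_pipeline(src, depth=4):
    # Bounded queues play the role of fixed-depth pipes between stages.
    to_compute = queue.Queue(maxsize=depth)
    to_write = queue.Queue(maxsize=depth)
    dst = []
    stages = [
        threading.Thread(target=reader, args=(src, to_compute)),
        threading.Thread(target=computer, args=(to_compute, to_write)),
        threading.Thread(target=writer, args=(to_write, dst)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return dst
```

Because each stage blocks only on its own queue, the reader can already be fetching later items while the compute stage is still working on earlier ones, which is the overlap the abstract attributes to the Pipe-based split.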

Design of a Reconfigurable 3D Pixel-Parallel Neuromorphic Architecture for Smart Image Sensor

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Bhowmik, Pankaj; Pantho, Jubaer Hossain; Asadinia, Marjan; Bobda, Christophe (Univ Arkansas, Fayetteville, AR 72701, USA)

ISBN (digital): 9781538661000

ISBN (print): 9781538661000

Power reduction and speed-up of image processing algorithms remain of high interest as image resolutions continue to increase. Neuromorphic circuits, inspired by the nervous system, aim to reduce power consumption and increase speed. This paper presents a neuromorphic smart image sensor designed with a pixel-parallel 3D hierarchical architecture and an on-chip attention module. The module dynamically detects regions with relevant information and produces a feedback path to sample those regions at high speed. By sampling non-relevant regions at low speed, the sensor can reduce redundancy and enable high-performance computing while ensuring low-power operation. The image sensor is composed of several hierarchical planes, and each plane has small, independent reconfigurable computational units (XPUs). In each plane, all XPUs operate in parallel at different operating speeds, which gives a pixel-parallel architecture. When the raw image passes through the hierarchical planes, the necessary image processing algorithms are performed in parallel on different planes at a variable clock rate to save power and reduce redundancy. The goal of this work is to prototype a focal-plane image sensor that emulates features of the brain. The results show that the prototype achieves remarkable power saving and speed-up at different stages.

Keywords: Computer architecture; Image sensors; Visualization; Program processors; Clocks; Three-dimensional displays; Image processing

Grading Prenatal Hydronephrosis from Ultrasound Imaging using Deep Convolutional Neural Networks

15th Conference on Computer and Robot Vision (CRV)

Authors: Dhindsa, Kiret; Smail, Lauren C.; McGrath, Melissa; Braga, Luis H.; Becker, Suzanna; Sonnadara, Ranil R. (McMaster Univ Res & High Performance Comp Hamilton ON Canada; Vector Inst Toronto ON Canada; McMaster Univ Dept Psychol Neurosci & Behav Hamilton ON Canada; McMaster Univ Dept Surg McMaster Pediat Surg Res Collaborat Hamilton ON Canada; McMaster Univ Dept Clin Epidemiol & Biostat McMaster Pediat Surg Res Collaborat Dept Surg Div Urol Hamilton ON Canada; McMaster Univ Dept Surg Hamilton ON Canada; Univ Toronto Surg Skills Ctr Mt Sinai Hosp Toronto ON Canada)

ISBN (print): 9781538664810

We evaluate the performance of a Deep Convolutional Neural Network in grading the severity of prenatal hydronephrosis (PHN), one of the most common congenital urological anomalies, from renal ultrasound images. We present results on a variety of classification tasks based on clinically defined grades of severity, including predictions of whether or not an ultrasound image represents a case that is at high risk for further complications requiring surgical intervention with approximately 80% accuracy. The prediction rates obtained by the model are well beyond the rates of agreement among trained clinicians, suggesting that this work can lead to a useful diagnostic aid.

Keywords: Deep learning; machine learning; ultrasound; convolutional neural networks; medical imaging; diagnostic imaging; hydronephrosis; computer vision

GPU-based Polynomial Finite Element Matrix Assembly for Simplex Meshes

Computer Graphics Forum, 2018, Vol. 37, No. 7, pp. 443-454

Authors: Mueller-Roemer, J. S.; Stork, A. (TU Darmstadt & Fraunhofer IGD, Darmstadt, Germany)

In this paper, we present a matrix assembly technique for arbitrary polynomial order finite element simulations on simplex meshes for graphics processing units (GPU). Compared to the current state of the art in GPU-based matrix assembly, we avoid the need for an intermediate sparse matrix and perform assembly directly into the final, GPU-optimized data structure. Thereby, we avoid the resulting 180% to 600% memory overhead, depending on polynomial order, and associated allocation time, while simplifying the assembly code and using a more compact mesh representation. We compare our method with existing algorithms and demonstrate significant speedups.

Keywords: CCS Concepts: • Computing methodologies → Massively parallel and high-performance simulations; Massively parallel algorithms; Physical simulation; Graphics processors • Mathematics of computing → Combinatoric problems

Financial Quantitative Big Data Platform Based on High Performance Computing

IEEE International Conference on Computational Science and Engineering (CSE)

Authors: Yongze Sun, Zhonghua Lu (University of Chinese Academy of Sciences Chinese Academy of Sciences Beijing China Computer Network Information Center Chinese Academy of Sciences)

A big data platform for financial quantitative data is designed and implemented on an HPC system. Key technologies, including the data storage mechanism and the distributed computing framework, are resolved. Based on the platform, several important features for financial quantitative strategy research are developed: indicator computing, large-scale backtesting, and distributed hyperparameter tuning. Tests show that the platform achieves much higher performance than a single-PC program and can be used to design strategies based on large-scale financial data.

Keywords: Sparks; Task analysis; Big Data; Tuning; Supercomputers; Distributed databases

Mobiliti: Scalable Transportation Simulation Using High-Performance Parallel Computing

21st IEEE International Conference on Intelligent Transportation Systems (ITSC)

Authors: Chan, Cy; Wang, Bin; Bachan, John; Macfarlane, Jane (Lawrence Berkeley Natl Lab, Berkeley, CA 94720, USA; Univ Calif Berkeley, Berkeley, CA 94720, USA)

ISBN (print): 9781728103235

Transportation systems are becoming increasingly complex with the evolution of emerging technologies, including deeper connectivity and automation, which will require more advanced control mechanisms for efficient operation (in terms of energy, mobility, and productivity). Stakeholders, including government agencies, industry, and local populations, all have an interest in efficient outcomes, yet there are few tools for developing a holistic understanding of urban dynamics. Simulating large-scale, high-fidelity transportation systems can help, but remains a challenging task, due to the computational demand of processing massive numbers of events and the nonlinear interactions between system components and traveling agents. In this paper, we introduce Mobiliti, a proof-of-concept, scalable transportation system simulator that implements parallel discrete event simulation on high-performance computers. We instantiated millions of nodes, links, and agents to simulate the movement of the population through the San Francisco Bay Area road network and provide estimates of the associated congestion, energy usage, and productivity loss. Our preliminary results show excellent scalability on multiple compute nodes for statically-routed agents, simulating 9.5 million trip legs over a road network with 1.1 million nodes and 2.2 million links, processing 2.4 billion events in less than 30 seconds using 1,024 cores on NERSC's Cori computer.

Keywords: large-scale transportation simulation; agent-based modeling; high-performance computing; parallel discrete event simulation
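The headline figures in the abstract above (2.4 billion events, under 30 seconds, 1,024 cores) imply a minimum event rate that is easy to work out. The sketch below only does that arithmetic; the function name `throughput` is hypothetical, and because the abstract says "less than 30 seconds", the results are lower bounds rather than measured rates.

```python
def throughput(events, seconds, cores):
    """Return (aggregate events/s, events/s per core) for the given totals."""
    aggregate = events / seconds
    per_core = aggregate / cores
    return aggregate, per_core


# Figures quoted in the abstract; 30 s is an upper bound on the runtime,
# so both rates below are lower bounds.
aggregate, per_core = throughput(2_400_000_000, 30, 1024)
print(f"at least {aggregate:,.0f} events/s in total")
print(f"at least {per_core:,.0f} events/s per core")
```

Under these figures the simulator sustains at least 80 million events per second overall, or roughly 78 thousand events per second on each core.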

Visually Analyzing A Billion Tweets: An Application for Collaborative Visual Analytics on Large High-Resolution Display

IEEE International Conference on Big Data (Big Data)

Authors: Su, Simon; An, Michael; Perry, Vincent; Jia, Jianfeng; Kim, Taewoo; Chen, Te-Yu; Li, Chen (US Army Res Lab, Adelphi, MD 20783, USA; Parsons Corp, Centreville, VA 20120, USA; Univ Calif Irvine, Irvine, CA, USA)

ISBN (print): 9781538650356

We present a ParaViewWeb-based visual analytics application running on a large high-resolution display and supporting standard mouse and keyboard interaction. The application relies on SAGE2 for user interaction and multi-display visualization. We also employ a scalable middleware system called "Cloudberry" that allows users to interactively query and analyze large amounts of temporal and spatial data stored in a back-end Apache AsterixDB store, enabling big data analytics and interactive visualization. Our Visual Analyzing Billion Tweets application shows interactive query and visualization of results from over a billion Twitter feeds streamed in real time to the back-end Apache AsterixDB. In our setup, we ran the visual analytics application on a large high-resolution display with a 24-tiled display in a 6 x 4 configuration. We also ran a comparative study of the application on a single 24-inch display and on the 24-tiled display, with some very interesting findings supporting the benefit of using a large high-resolution display for visual analytics.

Keywords: H.1.m [Information Systems]: MODELS AND PRINCIPLES-Miscellaneous

Eca-Router: On Achieving Endpoint Congestion Aware Switch Allocation in the On-Chip Network

36th IEEE International Conference on Computer Design (ICCD)

Authors: Li, Cunlu; Dong, Dezun; Liao, Xiangke (Natl Univ Def Technol, Natl Lab Parallel & Distributed Proc, Collaborat Innovat Ctr High Performance Comp, Coll Comp, Changsha 410073, Hunan, Peoples R China)

ISBN (print): 9781538684771

As the critical pipeline stage in on-chip routers, switch allocation assigns output ports to input ports and allows flits to transit through the switch without conflicts. Previous works strive to design efficient switch allocation strategies by maximizing the matching at each cycle, with information from the current cycle or from multiple cycles in a time series. However, those works have not taken endpoint congestion into consideration. Tree saturation, caused by endpoint congestion, can degrade NoC performance because the congestion fans out from the original point to upstream routers. In this paper, a novel router design, Eca-Router, is proposed to relieve the impact of endpoint congestion through switch allocation optimization. Eca-Router detects endpoint congestion by recording the destinations of packets in switch allocation. Endpoint congestion is declared in switch allocation once multiple input ports compete for the same output port and the packets in these input ports have the same destination. During switch allocation, requests that contribute to endpoint congestion are given lower priority, and starvation control is introduced to ensure allocation fairness. Evaluation results show that Eca-Router is efficient in reducing packet latency.

Keywords: Switches; Resource management; Routing; System-on-chip; Pipelines; Adaptive systems; Microarchitecture
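The detection and priority rule described in the abstract above can be sketched in a few lines. This is a simplified software model under stated assumptions, not the authors' router microarchitecture: each input port is assumed to submit at most one request per cycle, ties are broken by lowest port number, and the names `allocate_switch`, `starve`, and `starve_limit` are hypothetical.

```python
from collections import defaultdict


def allocate_switch(requests, starve, starve_limit=4):
    """One allocation cycle.

    requests: list of (input_port, output_port, destination) tuples,
              at most one per input port (assumption).
    starve:   dict mapping input_port -> consecutive cycles without a grant;
              updated in place.
    Returns a dict mapping output_port -> granted input_port.
    """
    by_output = defaultdict(list)
    for inp, out, dst in requests:
        by_output[out].append((inp, dst))

    grants = {}
    for out, reqs in by_output.items():
        # Count how many competing inputs target each destination.
        dst_counts = defaultdict(int)
        for _, dst in reqs:
            dst_counts[dst] += 1

        def priority(req):
            inp, dst = req
            # Endpoint congestion: several inputs, same output, same destination.
            congested = dst_counts[dst] > 1
            starving = starve.get(inp, 0) >= starve_limit
            # Lower tuple wins: starving first, then non-congested, then port id.
            return (not starving, congested, inp)

        grants[out] = min(reqs, key=priority)[0]

    # Starvation control: track how long each requester has gone ungranted.
    winners = set(grants.values())
    for inp, _, _ in requests:
        starve[inp] = 0 if inp in winners else starve.get(inp, 0) + 1
    return grants
```

For example, with requests `[(0, 2, "A"), (1, 2, "A"), (3, 2, "B")]` and an empty `starve` dict, ports 0 and 1 are flagged as contributing to endpoint congestion at destination "A", so port 3 is granted output 2; once a flagged port has waited `starve_limit` cycles, it is served first regardless.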

Performance Analysis of DC-SSK Scheme and Its Power Allocation in VLC System

International Conference on Computing, Networking and Communications (ICNC)

Authors: Zhang, Qi; Bai, Zhiquan; Zhang, Na; Sun, Shangqian; Kwak, Kyung Sup (Shandong Univ, Sch Informat Sci & Engn, Jinan, Shandong, Peoples R China; Shandong Univ, Sch Phys, Jinan, Shandong, Peoples R China; INHA Univ, Grad Sch Informat Technol & Telecommun, Incheon, South Korea)

ISBN (print): 9781538636527

In this paper, direct-code generalized space shift keying (DC-SSK) is studied for indoor visible light communication (VLC) systems with the aim of improving the system transmission rate. The symbol error rate (SER) of the DC-SSK scheme in a VLC system is derived based on maximal likelihood (ML) detection, and a low-complexity power allocation scheme is further presented to enhance system performance over highly correlated optical channels. Simulation results and theoretical analysis show that the proposed DC-SSK scheme with low-complexity power allocation achieves better spectral efficiency and SER performance in the VLC system. The effect of different LED semiangles at half-power on system performance is also analyzed.

Keywords: direct-code generalized space shift keying (DC-SSK); spatial modulation (SM); maximal likelihood (ML); symbol error rate (SER)

Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-grained Image Recognition

IEEE/CVF Conference on Computer Vision and Pattern Recognition

Authors: Heliang Zheng, Jianlong Fu, Zheng-Jun Zha, Jiebo Luo (University of Science and Technology of China; Microsoft Research; University of Rochester)

ISBN (print): 9781728132945

Learning subtle yet discriminative features (e.g., beak and eyes for a bird) plays a significant role in fine-grained image recognition. Existing attention-based approaches localize and amplify significant parts to learn fine-grained details, which often suffer from a limited number of parts and heavy computational cost. In this paper, we propose to learn such fine-grained features from hundreds of part proposals by Trilinear Attention Sampling Network (TASN) in an efficient teacher-student manner. Specifically, TASN consists of 1) a trilinear attention module, which generates attention maps by modeling the inter-channel relationships, 2) an attention-based sampler which highlights attended-parts with high resolution, and 3) a feature distiller, which distills part features into an object-level feature by weight sharing and feature preserving strategies. Extensive experiments verify that TASN yields the best performance under the same settings with the most competitive approaches, in iNaturalist-2017, CUB-Bird, and Stanford-Cars datasets.

Keywords: Image recognition sampling network background program Beak optimum performance high definition Learning Avifauna
