ISBN (Print): 9781665440660
The Fast Fourier Transform (FFT) is a fundamental algorithm in signal processing; significant efforts have been made to improve its performance using software optimizations and specialized hardware accelerators. Computational imaging modalities, such as MRI, often rely on the Non-uniform Fast Fourier Transform (NuFFT), a variant of the FFT for processing data acquired from non-uniform sampling patterns. The most time-consuming step of the NuFFT algorithm is "gridding," wherein non-uniform samples are interpolated to allow a uniform FFT to be computed over the data. Each non-uniform sample affects a window of non-contiguous memory locations, resulting in poor cache and memory bandwidth utilization. As a result, gridding can account for more than 99.6% of the NuFFT computation time, while the FFT requires less than 0.4%. We present Slice-and-Dice, a novel approach to the NuFFT's gridding step that eliminates the presorting operations required by prior methods and maps more efficiently to hardware. Our GPU implementation achieves gridding speedups of over 250x and 16x vs. prior state-of-the-art CPU and GPU implementations, respectively. We achieve further speedup and energy efficiency gains by implementing Slice-and-Dice in hardware with JIGSAW, a streaming hardware accelerator for non-uniform data gridding. JIGSAW uses stall-free fixed-point pipelines to process M non-uniform samples in approximately M cycles, irrespective of sampling pattern, yielding speedups of over 1500x the CPU baseline and 36x the state-of-the-art GPU implementation, consuming ~200 mW power and ~12 mm² area in 16 nm technology. Slice-and-Dice GPU and JIGSAW ASIC implementations achieve unprecedented end-to-end NuFFT speedups of 8x and 36x compared to the state-of-the-art GPU implementation, respectively.
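For readers unfamiliar with the gridding step, the sketch below shows a conventional, naive gridding loop in Python/NumPy: each non-uniform sample is spread over a small window of uniform grid points before a standard FFT is applied. This is only a baseline illustration of the operation being accelerated, not the Slice-and-Dice algorithm or the JIGSAW pipeline; the Gaussian kernel, window width, and grid size are assumptions made for the example.

```python
import numpy as np

def grid_nonuniform_1d(coords, values, grid_size, width=4, sigma=1.0):
    """Naive gridding: spread each non-uniform sample onto a window of
    uniform grid points using a truncated Gaussian interpolation kernel.
    Each sample touches 2*width+1 non-contiguous grid locations, which is
    why this step is memory-bandwidth bound."""
    grid = np.zeros(grid_size, dtype=np.complex128)
    for x, v in zip(coords, values):            # coords lie in [0, grid_size)
        center = int(np.round(x))
        for k in range(center - width, center + width + 1):
            w = np.exp(-0.5 * ((k - x) / sigma) ** 2)   # kernel weight
            grid[k % grid_size] += w * v                # wrap at the edges
    return grid

# Toy usage: 1000 samples at random (non-uniform) positions.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 256, size=1000)
values = rng.standard_normal(1000) + 1j * rng.standard_normal(1000)
grid = grid_nonuniform_1d(coords, values, grid_size=256)
spectrum = np.fft.fft(grid)   # the cheap, uniform-FFT part of the NuFFT
```

The scattered writes in the inner loop are the source of the poor cache behavior the abstract describes.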
Deep convolutional neural networks have shown great potential in image recognition tasks. However, the fact that the mechanism of deep learning is difficult to explain hinders its development. It involves a large amount of parameter learning, which results in high computational complexity. Moreover, deep convolutional neural networks are often limited by overfitting in regimes in which the number of training samples is limited. Conversely, kernel learning methods have a clear mathematical theory, fewer parameters, and can contend with small sample sizes; however, they are not able to handle high-dimensional data, e.g., images. It is important to achieve a performance and complexity trade-off in complicated tasks. In this paper, we propose a novel scalable deep convolutional random kernel learning in Gaussian process architecture called SDCRKL-GP, which is characterized by excellent performance and low complexity. First, we incorporated the deep convolutional architecture into kernel learning by implementing the random Fourier feature transform for Gaussian processes, which can effectively capture hierarchical and local image-level features. This approach enables the kernel method to effectively handle image-processing problems. Second, we optimized the parameters of the deep convolutional filters and Gaussian kernels by stochastic variational inference. Then, we derived the variational lower bound of the marginal likelihood. Finally, we explored the model architecture design space selection method to determine the appropriate network architecture for different datasets. The design space consists of the number of layers, the channels per layer, and so on. Different design space selections improved the scalability of the SDCRKL-GP architecture. We evaluated SDCRKL-GP on the MNIST, FMNIST, CIFAR10, and CALTECH4 benchmark datasets. Taking MNIST as an example, the classification error rate is 0.60%, and the number of parameters, number of computations, and memo
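For context, the sketch below shows the standard random Fourier feature (RFF) approximation of an RBF kernel, the building block the abstract refers to. It is a generic illustration only; the convolutional feature extractor, stochastic variational inference, and design-space search of SDCRKL-GP are not reproduced, and the dimensions and bandwidth parameter are assumptions.

```python
import numpy as np

def random_fourier_features(X, n_features=4096, gamma=1.0, seed=0):
    """Map inputs X of shape (n, d) into a feature space whose linear kernel
    approximates the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# The Monte Carlo estimate z(x)·z(y) converges to the exact RBF kernel.
X = np.random.default_rng(1).standard_normal((5, 10))
Z = random_fourier_features(X)
approx = Z @ Z.T                                  # approximate Gram matrix
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
exact = np.exp(-1.0 * sq_dists)                   # exact RBF Gram matrix
print(np.max(np.abs(approx - exact)))             # small approximation error
```

In the paper's setting, the inputs to such a feature map would be the outputs of convolutional filters rather than raw vectors.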
Image data is expanding rapidly along with technology development, so efficient solutions must be considered to achieve high, real-time performance when processing large image datasets. Parallel processing is increasingly used as an attractive alternative to improve performance, both on existing distributed architectures and on sequential commodity computers. It can provide speedup, efficiency, reliability, incremental growth, and flexibility. We present such an alternative and stress the effectiveness of the methods in accelerating computations on a small cluster of PCs compared to a single CPU. Our paper focuses on applying edge detection to large image datasets, a fundamental and challenging task in image processing and computer vision. Five different techniques, namely Sobel, Prewitt, LoG, Canny, and Roberts, are compared in a simple experimental setup that uses the OpenCV library functions for image pixel manipulation. A Gaussian blur is used to reduce high-frequency components and manage the noise that edge detection is sensitive to. Overall, this work is part of a more extensive investigation of image segmentation methods on large image datasets, but the results presented are relevant and show the effectiveness of our approach.
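A minimal single-image sketch of the five detectors compared in the study, using the OpenCV Python bindings, is shown below. The file name, blur kernel, and thresholds are illustrative assumptions; the study's cluster-level parallelization over many images is not reproduced here.

```python
import cv2
import numpy as np

def detect_edges(path):
    """Run the five compared edge detectors on one grayscale image."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    blurred = cv2.GaussianBlur(img, (5, 5), 1.4)   # suppress high-frequency noise

    sobel = cv2.magnitude(cv2.Sobel(blurred, cv2.CV_64F, 1, 0),
                          cv2.Sobel(blurred, cv2.CV_64F, 0, 1))
    canny = cv2.Canny(blurred, 50, 150)
    log = cv2.Laplacian(blurred, cv2.CV_64F, ksize=5)   # Laplacian of the blurred image

    # Prewitt and Roberts have no dedicated OpenCV call; apply their kernels via filter2D.
    px = cv2.filter2D(blurred, cv2.CV_64F, np.array([[-1, 0, 1]] * 3, dtype=np.float64))
    py = cv2.filter2D(blurred, cv2.CV_64F, np.array([[-1, 0, 1]] * 3, dtype=np.float64).T)
    prewitt = cv2.magnitude(px, py)
    rx = cv2.filter2D(blurred, cv2.CV_64F, np.array([[1, 0], [0, -1]], dtype=np.float64))
    ry = cv2.filter2D(blurred, cv2.CV_64F, np.array([[0, 1], [-1, 0]], dtype=np.float64))
    roberts = cv2.magnitude(rx, ry)

    return {"sobel": sobel, "prewitt": prewitt, "log": log,
            "canny": canny, "roberts": roberts}
```

In a cluster setting, each worker would run this routine on its own shard of the image dataset.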
ISBN (Print): 9798400709036
Artificial intelligence has shown great potential in a variety of applications, from natural language models to audio-visual recognition, classification, and manipulation. AI researchers have to work with massive amounts of collected data for use in machine learning, raising challenges in effectively managing and utilizing the collected data during the training phase to develop and iterate on more accurate and more generalized models. In this paper, we conduct a review of parallel and distributed machine learning methods and challenges. We also propose a distributed and scalable deep learning model architecture that can span multiple processing nodes. We tested the model on the MIT Indoor dataset to evaluate its performance and scalability across multiple hardware nodes, and we show the scaling characteristics for different model sizes. We find that distributed training is 80% faster using two GPUs than one GPU. We also find that the model keeps the benefits of distributed training, such as speed and accuracy, regardless of its size or training batch size.
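As a rough illustration of data-parallel training across multiple GPUs, the sketch below uses PyTorch DistributedDataParallel. It is a generic template, not the paper's architecture: the toy model, batch size, and the 67-class output (matching MIT Indoor's scene categories) are assumptions, and the random tensors stand in for a real dataset loader.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    # One process per GPU, launched e.g. with `torchrun --nproc_per_node=2 train.py`.
    dist.init_process_group("nccl")
    device = dist.get_rank() % torch.cuda.device_count()

    # Placeholder data; a real run would load the image dataset instead.
    data = TensorDataset(torch.randn(4096, 3, 224, 224), torch.randint(0, 67, (4096,)))
    sampler = DistributedSampler(data)          # shards the data across workers
    loader = DataLoader(data, batch_size=32, sampler=sampler)

    model = nn.Sequential(nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(16, 67)).to(device)
    model = DDP(model, device_ids=[device])     # gradients are all-reduced automatically
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)                # reshuffle shards each epoch
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x.to(device)), y.to(device))
            loss.backward()
            opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```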
We propose parallel implementations on GPU (graphics processing unit) systems of some generic algorithms applied to the superpixel image segmentation problem. The aim is to provide standard algorithms, based on generic decentralized data structures, that can easily be improved and customized for many optimization problems on parallel platforms. Note that superpixel segmentation methods are clustering algorithms applied to image processing. Two types of algorithms are presented and implemented on the GPU, based on common parallel data structures. First, we present a parallel implementation of the well-known k-means algorithm with application to 3D data. It is based on a cellular grid subdivision of space that allows closest-point finding in constant optimal time for bounded distributions. Second, we present an application of the parallel Boruvka minimum spanning forest algorithm to compute watershed segmentation. Both techniques are fully executed on the GPU and share the same data structures, which embed disjoint-set trees and distributed linked lists. We evaluate our k-means approach against state-of-the-art methods, namely the well-known SLIC algorithm and the adaptive segmentation approach SPASM. We argue that our implementation has the shortest execution time among the tested methods, with near real-time performance and a quasi-linear acceleration factor, while it provides more regular-shaped superpixel segmentation based on hexagonal tessellation. The watershed minimum spanning forest method is presented and evaluated within the same experimental framework.
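To make the k-means-based superpixel idea concrete, here is a plain NumPy sketch of SLIC-style clustering in a joint (color, position) feature space, with centers seeded on a regular grid. This is a CPU toy for clarity only, not the paper's GPU implementation with its cellular-grid data structure; the compactness weight and segment count are assumptions.

```python
import numpy as np

def kmeans_superpixels(image, n_segments=64, iters=10, compactness=10.0):
    """SLIC-style k-means: cluster pixels by color and position."""
    h, w, _ = image.shape
    step = max(1, int(np.sqrt(h * w / n_segments)))       # grid spacing of the seeds
    yy, xx = np.mgrid[0:h, 0:w]
    spatial = (compactness / step) * np.stack([yy, xx], axis=-1)
    feats = np.concatenate([image.astype(np.float64), spatial], axis=-1)   # (h, w, 5)
    sy, sx = np.mgrid[step // 2:h:step, step // 2:w:step]
    centers = feats[sy.ravel(), sx.ravel()].copy()                         # (K, 5)
    flat = feats.reshape(-1, 5)

    for _ in range(iters):
        # Assignment: nearest center per pixel (brute force, for clarity only).
        d = ((flat ** 2).sum(1, keepdims=True) - 2.0 * flat @ centers.T
             + (centers ** 2).sum(1))
        labels = d.argmin(1)
        # Update: recompute each center as the mean of its assigned pixels.
        for k in range(len(centers)):
            members = flat[labels == k]
            if len(members):
                centers[k] = members.mean(0)
    return labels.reshape(h, w)

# Toy usage on a random "image"; real superpixels would typically use CIELAB colors.
labels = kmeans_superpixels(np.random.default_rng(0).random((120, 160, 3)) * 255)
```

On the GPU, the assignment step is restricted to nearby cells of the cellular grid instead of the brute-force search used here.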
Generative Adversarial Networks (GAN) are approaches that are utilized for data augmentation, which facilitates the development of more accurate detection models for unusual or unbalanced datasets. Computer-assisted d...
ISBN (Digital): 9798350303582
ISBN (Print): 9798350303599
As Deep Neural Networks (DNNs) are evolving in complexity to meet the demands of novel applications, a single device becomes insufficient for training, leading to the emergence of distributed DNN training. However, this evolution exposes a gap in research surrounding security vulnerabilities to model poisoning attacks, especially in model-parallel setups, an area that has been scarcely studied. To bridge this gap, we introduce Patronus, an approach that counters model poisoning attacks in distributed DNN training, accommodating both data and model parallelism. Using Loss-aware Credit Evaluation, Patronus scores each participating client. Based on the continuously updated credit, malicious clients are isolated and detected after multiple epochs by the Shuffling-based Isolation Mechanism. Additionally, the training system is reinforced by Byzantine Fault-tolerant Aggregation to minimize the impact of malicious clients. Comprehensive experiments confirm Patronus's superior reliability and efficiency over existing methods under attack scenarios.
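For intuition about the aggregation component, the sketch below shows one generic Byzantine fault-tolerant aggregator, a coordinate-wise trimmed mean over client updates. It is not Patronus's credit-based scheme, only an illustration of how a robust aggregator limits a poisoned update; the client count and trim level are assumptions.

```python
import torch

def trimmed_mean_aggregate(grads, trim_k=1):
    """Coordinate-wise trimmed mean of per-client updates: sort every
    coordinate across clients and drop the trim_k largest and smallest
    values before averaging."""
    stacked = torch.stack(grads)                            # (n_clients, ...)
    sorted_vals, _ = stacked.sort(dim=0)
    kept = sorted_vals[trim_k: stacked.shape[0] - trim_k]   # drop extremes on each side
    return kept.mean(dim=0)

# Toy usage: 6 benign clients plus 1 client sending a poisoned (huge) update.
honest = [torch.randn(10) for _ in range(6)]
poisoned = [torch.full((10,), 1e6)]
agg = trimmed_mean_aggregate(honest + poisoned, trim_k=1)
print(agg)   # the poisoned update is trimmed away instead of skewing the mean
```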
ISBN (Digital): 9798331509712
ISBN (Print): 9798331509729
Failure recovery is one of the most essential problems in Internet of Things (IoT) systems, and the conventional snapshot method is an effective way to solve it. However, snapshot methods lack specialized designs for heterogeneous IoT devices, and when implemented on edge devices, they cause serious system interruptions and degrade performance. To address these problems, a dynamic checkpointing strategy is proposed for IoT systems that consist of heterogeneous devices. First, an anomaly detection network for snapshots (ADSnet) that combines long short-term memory networks with multilayer convolutional networks is used to learn the multidimensional features of system resource usage. Second, ADSnet is tuned during deployment to learn the behaviors of target devices, so that ADSnet can report anomalies of the target devices in the near future. Finally, a dynamic checkpointing strategy is proposed to dynamically create snapshots on the basis of the anomaly detection results. The experimental results show that the proposed ADSnet achieves 97.73% accuracy in detecting anomalies on the target device; furthermore, our proposed dynamic checkpointing strategy creates 25.4% fewer snapshots than the recently proposed ResCheck.
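A minimal sketch of the decision loop such a strategy implies is shown below: snapshots are taken on a long baseline interval, but immediately when the anomaly detector flags imminent trouble. The detector (ADSnet in the paper) is passed in as a callable; its internals, the interval, and the file path are assumptions, not the paper's implementation.

```python
import pickle
import time

def checkpoint_loop(read_metrics, predict_anomaly, get_state,
                    base_interval=300.0, path="snapshot.pkl"):
    """Dynamic checkpointing sketch driven by an anomaly predictor."""
    last = time.monotonic()
    while True:
        metrics = read_metrics()                 # CPU, memory, I/O usage, etc.
        risky = predict_anomaly(metrics)         # True if an anomaly is expected soon
        due = time.monotonic() - last >= base_interval
        if risky or due:
            with open(path, "wb") as f:
                pickle.dump(get_state(), f)      # persist application state
            last = time.monotonic()
        time.sleep(1.0)
```

Compared with fixed-interval checkpointing, snapshots are skipped while the device looks healthy, which is where the reduction in snapshot count comes from.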
ISBN (Print): 9798350320107
NASA has committed to open-source science that enables Earth observation data transparency, inclusivity, accessibility, and reproducibility - all fundamental to the pace and quality of scientific progress. We have embraced this vision by producing standard InSAR science products that are freely available to the public through NASA Distributed Active Archive Centers (DAACs) and are generated using state-of-the-art open-source and openly developed methods. The Advanced Rapid Imaging and Analysis (ARIA) project's Sentinel-1 Geocoded Unwrapped Phase product (ARIA-S1-GUNW) is a 90-meter InSAR product that spans major land-based fault systems, the US coasts, and active volcanic regions through the complete Sentinel-1 record. The products enable the measurement of centimeter-scale surface displacement, with applications across the solid earth, hydrology, and sea-level disciplines. The ARIA-S1-GUNW also enables rapid-response mapping of surface motion after earthquakes, landslides, and subsidence. The ARIA-S1-GUNW products are freely available through the Alaska Satellite Facility (ASF) DAAC. In the last year, we have successfully grown the archive to over 1.1 million products, a six-fold increase, through NASA ACCESS by improving our processing workflow and leveraging HyP3, an AWS-based cloud processing environment. We are continuing to partner with researchers to generate more products over relevant areas of scientific interest. All the processing software and cloud infrastructure are open source to ensure reproducibility and enable other scientists to modify, improve upon, and scale their own cloud workflows for related InSAR analyses. In parallel, we have developed and supported open-source, well-documented tools to further streamline time-series analysis from the ARIA-S1-GUNW into deformation analysis workflows.
This article presents a GPU-accelerated software design of the recently proposed Slanted Stixels model, which represents the geometric and semantic information of a scene in a compact and accurate way. We reformulate the depth measurement model to reduce the computational complexity of the algorithm, relying on the confidence of the depth estimation and the identification of invalid values to handle outliers. The proposed massively parallel scheme and data layout for the irregular computation pattern, which corresponds to a Dynamic Programming paradigm, are described and carefully analyzed in performance terms. Performance is shown to scale gracefully on current-generation embedded GPUs. We assess the proposed methods in terms of semantic and geometric accuracy as well as run-time performance on three publicly available benchmark datasets. Our approach achieves real-time performance with high accuracy for 2048 x 1024 image sizes and 4 x 4 Stixel resolution on the low-power embedded GPU of an NVIDIA Tegra Xavier.
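To give a feel for the per-column dynamic programming involved, the sketch below splits one column of disparities into vertical segments, each fitted by its mean value, trading data cost against a per-segment penalty. It is a simplified CPU toy of the kind of column-wise DP that gets parallelized on the GPU; the constant-depth segment model and the penalty value are simplifications of (not equivalent to) the Slanted Stixel cost model.

```python
import numpy as np

def column_stixel_dp(disparity_col, seg_penalty=5.0):
    """Optimal 1-D segmentation of a column into constant-disparity segments."""
    n = len(disparity_col)
    prefix = np.concatenate([[0.0], np.cumsum(disparity_col)])
    prefix_sq = np.concatenate([[0.0], np.cumsum(disparity_col ** 2)])

    def seg_cost(i, j):            # sum of squared residuals of rows i..j-1 around their mean
        s, s2, m = prefix[j] - prefix[i], prefix_sq[j] - prefix_sq[i], j - i
        return s2 - s * s / m

    best = np.full(n + 1, np.inf)
    best[0] = 0.0
    cut = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):
        for i in range(j):         # last segment covers rows i..j-1
            c = best[i] + seg_cost(i, j) + seg_penalty
            if c < best[j]:
                best[j], cut[j] = c, i
    # Backtrack the optimal cut positions.
    cuts, j = [], n
    while j > 0:
        cuts.append((cut[j], j))
        j = cut[j]
    return cuts[::-1]

# Three flat regions in the column are recovered as three segments.
print(column_stixel_dp(np.array([10.0] * 20 + [30.0] * 15 + [5.0] * 25)))
```

On the GPU, many such columns are processed independently, which is what makes the irregular DP amenable to a massively parallel layout.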