Deep convolutional neural networks have shown great potential in image recognition tasks. However, the difficulty of explaining the mechanism of deep learning hinders its development. Deep learning involves a large amount of parameter learning, which results in high computational complexity, and deep convolutional neural networks are often limited by overfitting in regimes where the number of training samples is small. Conversely, kernel learning methods have a clear mathematical theory, fewer parameters, and can contend with small sample sizes; however, they cannot handle high-dimensional data such as images. It is therefore important to achieve a performance and complexity trade-off in complicated tasks. In this paper, we propose a novel scalable deep convolutional random kernel learning in Gaussian process architecture, called SDCRKL-GP, which is characterized by excellent performance and low complexity. First, we incorporated the deep convolutional architecture into kernel learning by implementing the random Fourier feature transform for Gaussian processes, which effectively captures hierarchical and local image-level features and enables the kernel method to handle image-processing problems. Second, we optimized the parameters of the deep convolutional filters and Gaussian kernels by stochastic variational inference, and derived the variational lower bound of the marginal likelihood. Finally, we explored a design-space selection method to determine the appropriate network architecture for different datasets; the design space consists of the number of layers, the channels per layer, and so on, and different design-space selections improve the scalability of the SDCRKL-GP architecture. We evaluated SDCRKL-GP on the MNIST, FMNIST, CIFAR10, and CALTECH4 benchmark datasets. Taking MNIST as an example, the classification error rate is 0.60%, and the number of parameters, number of computations, and memory…
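The random Fourier feature (RFF) transform the paper builds on can be sketched directly. The snippet below is a minimal NumPy illustration of the standard RFF approximation of an RBF kernel, not the paper's SDCRKL-GP implementation; all sizes and the lengthscale are illustrative.

```python
# Minimal sketch: random Fourier features approximating an RBF kernel,
# the building block that lets a Gaussian process scale linearly in the
# number of samples instead of cubically.
import numpy as np

def rff_transform(X, n_features=512, lengthscale=1.0, seed=0):
    """Map X of shape (n, d) to z(X) of shape (n, n_features), such that
    z(X) @ z(Y).T approximates the RBF kernel matrix K(X, Y)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the spectral density of the RBF kernel.
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Sanity check: feature inner products track the exact RBF kernel.
X = np.random.default_rng(1).normal(size=(5, 3))
Z = rff_transform(X, n_features=4096)
K_approx = Z @ Z.T
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
print(np.abs(K_approx - K_exact).max())  # shrinks as n_features grows
```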
Image data is expanding rapidly along with technological development, so efficient solutions must be considered to achieve high, real-time performance when processing large image datasets. Parallel processing is increasingly used as an attractive way to improve performance, both on existing distributed architectures and on sequential commodity computers. It can provide speedup, efficiency, reliability, incremental growth, and flexibility. We present such an alternative and demonstrate the effectiveness of the methods in accelerating computations on a small cluster of PCs compared to a single CPU. Our paper focuses on applying edge detection to large image datasets, a fundamental and challenging task in image processing and computer vision. Five techniques, namely Sobel, Prewitt, LoG, Canny, and Roberts, are compared in a simple experimental setup that uses OpenCV library functions for image-pixel manipulation. Gaussian blur is used to reduce high-frequency components and thus suppress the noise that impacts edge detection. Overall, this work is part of a more extensive investigation of image segmentation methods on large image datasets, but the results presented are relevant and show the effectiveness of our approach.
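For reference, a single-image version of the compared pipeline can be sketched with OpenCV: Gaussian blur followed by the five detectors. The file name, kernel sizes, and thresholds below are illustrative assumptions; Prewitt and Roberts have no built-in OpenCV function, so their kernels are applied with filter2D.

```python
# Single-image sketch of the compared pipeline: Gaussian blur, then five
# classic edge detectors. "sample.png" and all parameters are illustrative.
import cv2
import numpy as np

img = cv2.imread("sample.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
blur = cv2.GaussianBlur(img, (5, 5), sigmaX=1.4)      # suppress noise first

# Sobel: built-in first-derivative kernels, combined as gradient magnitude.
sobel = cv2.magnitude(cv2.Sobel(blur, cv2.CV_64F, 1, 0),
                      cv2.Sobel(blur, cv2.CV_64F, 0, 1))
# Prewitt: apply its kernels with filter2D.
kx = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=np.float64)
prewitt = cv2.magnitude(cv2.filter2D(blur, cv2.CV_64F, kx),
                        cv2.filter2D(blur, cv2.CV_64F, kx.T))
# LoG: Laplacian applied to the already-Gaussian-blurred image.
log = cv2.Laplacian(blur, cv2.CV_64F, ksize=5)
# Canny: hysteresis thresholds are dataset-dependent choices.
canny = cv2.Canny(blur, 50, 150)
# Roberts: 2x2 cross kernels, also via filter2D.
r1 = np.array([[1, 0], [0, -1]], dtype=np.float64)
r2 = np.array([[0, 1], [-1, 0]], dtype=np.float64)
roberts = cv2.magnitude(cv2.filter2D(blur, cv2.CV_64F, r1),
                        cv2.filter2D(blur, cv2.CV_64F, r2))
```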
ISBN (print): 9783030898205; 9783030898199
Shared-memory programming and distributed-memory programming are the most prominent ways of parallelizing applications that require long processing times and large amounts of storage in High Performance Computing (HPC) systems. Parallel applications can be represented as Parallel Task Graphs (PTGs) using Directed Acyclic Graphs (DAGs). The scheduling of PTGs in HPC systems is an NP-complete combinatorial problem that requires large amounts of storage and long processing times, and heuristic methods implemented in sequential programming languages have been proposed to address it. In the open-access paper "Scheduling in Heterogeneous Distributed Computing Systems Based on Internal Structure of Parallel Tasks Graphs with Meta-Heuristics," the Array Method is presented; this method optimizes the use of Processing Elements (PEs) in an HPC system and improves response times in scheduling and resource mapping by using the Univariate Marginal Distribution Algorithm (UMDA). The Array Method exploits the internal characteristics of PTGs to schedule tasks; it was programmed sequentially in the C language and was analyzed and tested using algorithms for generating synthetic workloads and DAGs of real applications. Considering the great benefits of parallel software, this research work presents the Array Method implemented with OpenMP parallel programming. The experimental results show that the parallel version accelerates response times compared with the sequential one when evaluated on three metrics: waiting time, makespan, and quality of assignments.
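As a rough illustration of what such a scheduler evaluates (not the Array Method itself), the sketch below computes the makespan of a given task-to-PE mapping for a DAG of task costs; communication costs and the UMDA search are omitted.

```python
# Illustrative fitness function for a DAG scheduler: the makespan of a
# given task-to-PE mapping. Tasks are assumed listed in topological order.
def makespan(dag, cost, mapping, n_pes):
    """dag: {task: [predecessor tasks]}, cost: {task: runtime},
    mapping: {task: PE index}. Returns the schedule's finish time."""
    finish = {}
    pe_free = [0.0] * n_pes          # earliest free time of each PE
    for task in dag:                 # topological order assumed
        ready = max((finish[p] for p in dag[task]), default=0.0)
        start = max(ready, pe_free[mapping[task]])
        finish[task] = start + cost[task]
        pe_free[mapping[task]] = finish[task]
    return max(finish.values())

# Diamond DAG t0 -> {t1, t2} -> t3 scheduled on two PEs.
dag = {"t0": [], "t1": ["t0"], "t2": ["t0"], "t3": ["t1", "t2"]}
cost = {"t0": 2.0, "t1": 3.0, "t2": 1.0, "t3": 2.0}
print(makespan(dag, cost, {"t0": 0, "t1": 0, "t2": 1, "t3": 0}, n_pes=2))  # 7.0
```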
ISBN (print): 9781665441155
Image Super-Resolution (SR) has come a long way since the early days of image processing. Deep learning methods now give outstanding results, yet very few are actually used in digital illustration and photo-retouching software, owing to large memory and GPU computational requirements, but also to the lack of control offered to the user over the final result. This paper introduces a two-step framework for stylized SR using a multi-scale network built with independent parallel branches. The approach aims at: i. designing a shallow network based on image-processing techniques, making it usable on light hardware architectures (low memory cost, no GPU); ii. providing a versatile, controllable, and customizable network to stylize SR results in a plug-and-play manner. We show that the proposed method offers significant advantages over state-of-the-art reference-based approaches in these respects.
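The controllable, GPU-free flavor of the approach can be illustrated with classical image-processing branches; the sketch below is our own illustration of the plug-and-play blending idea, not the paper's network.

```python
# Our own illustration of controllable, CPU-only stylized upscaling:
# a cheap base upscaler plus independent parallel "style" branches whose
# contributions the user weights.
import cv2
import numpy as np

def stylized_sr(img, scale=2, w_sharp=0.5, w_smooth=0.0):
    base = cv2.resize(img, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_CUBIC)          # base branch
    blur = cv2.GaussianBlur(base, (0, 0), sigmaX=1.0)
    sharp = cv2.addWeighted(base, 1.5, blur, -0.5, 0)         # unsharp masking
    smooth = cv2.bilateralFilter(base, 9, 75, 75)             # edge-aware smoothing
    base_f = base.astype(np.float32)
    out = (base_f
           + w_sharp * (sharp.astype(np.float32) - base_f)    # user-weighted
           + w_smooth * (smooth.astype(np.float32) - base_f)) # branch blending
    return np.clip(out, 0, 255).astype(np.uint8)

# Usage (hypothetical file): out = stylized_sr(cv2.imread("photo.png"), w_sharp=0.8)
```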
ISBN (print): 9798400709036
Artificial intelligence has shown great potential in a variety of applications, from natural language models to audio-visual recognition, classification, and manipulation. AI researchers have to work with massive amounts of collected data for machine learning, which raises challenges in effectively managing and utilizing those data in the training phase to develop and iterate on more accurate and more generalized models. In this paper, we review parallel and distributed machine learning methods and their challenges. We also propose a distributed and scalable deep learning model architecture that can span multiple processing nodes. We tested the model on the MIT Indoor dataset to evaluate its performance and scalability across multiple hardware nodes, and showed its scaling characteristics at different model sizes. We find that distributed training is 80% faster using 2 GPUs than 1 GPU, and that the model keeps the benefits of distributed training, such as speed and accuracy, regardless of its size or training batch size.
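A minimal sketch of the kind of multi-GPU data-parallel training evaluated here, using PyTorch DistributedDataParallel (assumed launched via torchrun with one process per GPU); the model, dataset, and hyperparameters are placeholders, not the paper's setup.

```python
# Sketch of multi-GPU data-parallel training with PyTorch DDP.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def train(local_rank, model, dataset, epochs=10):
    dist.init_process_group("nccl")                 # join the process group
    torch.cuda.set_device(local_rank)
    model = DDP(model.cuda(local_rank), device_ids=[local_rank])
    sampler = DistributedSampler(dataset)           # shard data across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(epochs):
        sampler.set_epoch(epoch)                    # reshuffle shards each epoch
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x.cuda(local_rank)), y.cuda(local_rank))
            loss.backward()                         # gradients all-reduced here
            opt.step()
    dist.destroy_process_group()
```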
Trajectory similarity queries, including similarity search and similarity join, offer a foundation for many geo-spatial applications. With the rapid increase of streaming trajectory data volumes, e.g., data from mobile phones, vessel monitoring, or traffic systems, many location-based services benefit from online similarity analytics over trajectory data streams, where moving objects continually emit real-time position data. However, most existing studies focus on offline settings, and several major challenges thus remain unanswered in an online setting. To this end, we describe Ghost, a distributed stream-processing framework that enables generic, efficient, and scalable online trajectory similarity search and join. We propose a novel incremental online similarity computation (IOSC) mechanism to accelerate pair-wise streaming trajectory distance calculation, which supports a broad range of trajectory distance metrics. Compared with previous studies, IOSC reduces the complexity from quadratic to linear in the trajectory length. Building on this foundation, we propose histogram-based algorithms that exploit histogram indexes and a series of pruning bounds to enable streaming trajectory similarity search and join. Finally, we extend our methods to the distributed platform Flink for scalability, where a CostPartitioner is developed to ensure parallel processing and workload balancing. An experimental study using two real-life datasets and one synthetic dataset shows that Ghost (i) achieves 6–20× efficiency/throughput gains and one order of magnitude memory savings over state-of-the-art baselines, (ii) achieves 3–8× workload-balancing gains on Flink, and (iii) exhibits low parameter sensitivity and high robustness.
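The quadratic-to-linear idea behind incremental online computation can be illustrated with DTW: keeping only the last row of the dynamic-programming table makes appending one streamed point cost O(n) in the other trajectory's length. This is an illustration in the spirit of IOSC, not Ghost's actual mechanism, which supports a broader range of metrics.

```python
# Keeping only the last row of the DTW table makes appending one streamed
# point O(len(other)) instead of recomputing the full quadratic table.
import math

class IncrementalDTW:
    def __init__(self, other):
        self.other = other          # fixed reference trajectory
        self.row = None             # last row of the DTW table

    def append(self, p):
        """Extend the streaming trajectory by p; return the current DTW."""
        d = [math.dist(p, q) for q in self.other]
        if self.row is None:        # first streamed point: cumulative costs
            new = [d[0]]
            for j in range(1, len(d)):
                new.append(new[-1] + d[j])
        else:                       # standard DTW recurrence on one row
            new = [self.row[0] + d[0]]
            for j in range(1, len(d)):
                new.append(d[j] + min(new[-1], self.row[j], self.row[j - 1]))
        self.row = new
        return self.row[-1]

dtw = IncrementalDTW([(0, 0), (1, 0), (2, 0)])
for pt in [(0, 1), (1, 1), (2, 1)]:  # position updates arriving as a stream
    print(dtw.append(pt))
```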
There are great challenges in performing graph coloring on GPUs. First, the long-tail problem exists in recursion-based algorithms because conflicts (i.e., different threads assigning adjacent nodes the same color) become more likely as the number of iterations increases. Second, the sequential-spread algorithm is hard to parallelize because each color allocation depends on the preceding iteration. Third, atomic operations are widely used on GPUs to maintain the color list, which can greatly reduce the efficiency of GPU threads. In this article, we propose a two-stage high-performance graph coloring algorithm, called Feluca, to address these challenges. Feluca combines the recursion-based method with the sequential-spread-based method. In the first stage, Feluca uses a recursive routine to color the majority of the vertices in the graph. It then switches to the sequential-spread method to color the remaining vertices, avoiding the conflicts of the recursive algorithm. Moreover, the following techniques are proposed to further improve graph coloring performance: i) a new method to eliminate cycles in the graph; ii) a top-down scheme to avoid the atomic operations originally required for color selection; and iii) a novel color-centric coloring paradigm to improve the degree of parallelism of the sequential-spread part. These newly developed techniques, together with further GPU-specific optimizations such as coalesced memory access, comprise an efficient parallel graph coloring solution in Feluca. Extensive experiments on NVIDIA GPUs show that Feluca achieves a 1.19x–8.39x speedup over state-of-the-art algorithms.
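A CPU-side Python sketch of the two-stage idea is given below (Feluca itself is a GPU kernel): speculative rounds on a color snapshot mimic simultaneous thread updates and can livelock on the long tail, which the sequential-spread pass then resolves.

```python
# Stage 1 mimics simultaneous thread updates by reading a color snapshot,
# so adjacent vertices can re-conflict; stage 2 resolves the remainder.
def two_stage_coloring(adj, rounds=3):
    """adj: {vertex: set of neighbors}. Returns {vertex: color}."""
    color = {v: 0 for v in adj}

    def conflicted():
        return [v for v in adj if any(color[u] == color[v] for u in adj[v])]

    for _ in range(rounds):                      # stage 1: speculative rounds
        bad = conflicted()
        if not bad:
            break
        snapshot = dict(color)                   # simulate concurrent reads
        for v in bad:
            used = {snapshot[u] for u in adj[v]}
            color[v] = min(c for c in range(len(used) + 1) if c not in used)
    for v in conflicted():                       # stage 2: sequential spread
        used = {color[u] for u in adj[v]}        # live colors: conflict-free
        color[v] = min(c for c in range(len(used) + 1) if c not in used)
    return color

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}          # triangle needs 3 colors
print(two_stage_coloring(adj))                   # proper coloring: {0: 0, 1: 2, 2: 1}
```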
Generative Adversarial Networks (GANs) are approaches utilized for data augmentation, which facilitates the development of more accurate detection models for unusual or imbalanced datasets. Computer-assisted d…
ISBN (digital): 9798350303582
ISBN (print): 9798350303599
As Deep Neural Networks (DNNs) grow in complexity to meet the demands of novel applications, a single device becomes insufficient for training, leading to the emergence of distributed DNN training. However, this evolution exposes a research gap around vulnerability to model poisoning attacks, especially in model-parallel setups, an area that has scarcely been studied. To bridge this gap, we introduce Patronus, an approach that counters model poisoning attacks in distributed DNN training, accommodating both data and model parallelism. Using Loss-aware Credit Evaluation, Patronus scores each participating client. Based on the continuously updated credit, malicious clients are isolated and detected after multiple epochs by a Shuffling-based Isolation Mechanism. Additionally, the training system is reinforced by Byzantine Fault-tolerant Aggregation to minimize the impact of malicious clients. Comprehensive experiments confirm Patronus's superior reliability and efficiency over existing methods under attack scenarios.
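One standard Byzantine fault-tolerant aggregation rule, the coordinate-wise median, illustrates the kind of aggregation layer involved; Patronus's exact rule and credit weighting are not reproduced here.

```python
# The coordinate-wise median bounds the influence of a minority of
# poisoned updates, unlike the plain mean.
import numpy as np

def median_aggregate(client_updates):
    """client_updates: list of 1-D gradient vectors, one per client."""
    return np.median(np.stack(client_updates), axis=0)

honest = [np.array([1.0, -2.0, 0.5]) + np.random.default_rng(i).normal(0, 0.1, 3)
          for i in range(4)]
poisoned = [np.full(3, 100.0)]                       # one malicious client
print(median_aggregate(honest + poisoned))           # stays near honest values
print(np.mean(np.stack(honest + poisoned), axis=0))  # mean is dragged away
```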
ISBN (digital): 9798331509712
ISBN (print): 9798331509729
Failure recovery is one of the most essential problems in Internet of Things (IoT) systems, and the conventional snapshot method is an effective way to solve it. However, snapshot methods lack specialized designs for heterogeneous IoT devices, and when implemented on edge devices they cause serious system interruptions and degrade performance. To address these problems, a dynamic checkpointing strategy is proposed for IoT systems consisting of heterogeneous devices. First, an anomaly detection network for snapshots (ADSnet), which combines long short-term memory networks with multilayer convolutional networks, learns the multidimensional features of system resource usage. Second, ADSnet is tuned during deployment to learn the behavior of target devices, so that it can report anomalies of the target device in the near future. Finally, a dynamic checkpointing strategy creates snapshots dynamically on the basis of the anomaly detection results. The experimental results show that the proposed ADSnet achieves 97.73% accuracy in detecting anomalies on the target device; furthermore, the proposed dynamic checkpointing strategy creates 25.4% fewer snapshots than the recently proposed ResCheck.
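The control loop implied by the strategy can be sketched as follows; `read_metrics`, `predict_anomaly_score` (standing in for ADSnet), and `save_snapshot` are hypothetical placeholders.

```python
# Sketch of the dynamic-checkpointing control loop: snapshot only when the
# detector flags elevated risk, not on a fixed timer. All three callables
# are hypothetical placeholders for the paper's components.
import time

def checkpoint_loop(read_metrics, predict_anomaly_score, save_snapshot,
                    threshold=0.8, period_s=60):
    window = []
    while True:
        window.append(read_metrics())        # sample CPU/memory/IO usage
        window = window[-32:]                # keep a bounded history
        if predict_anomaly_score(window) >= threshold:
            save_snapshot()                  # checkpoint ahead of predicted failure
        time.sleep(period_s)                 # re-evaluate each monitoring period
```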