Deep convolutional neural networks have shown great potential in image recognition tasks. However, the difficulty of explaining the mechanism of deep learning hinders its development. Deep learning involves a large amount of parameter learning, which results in high computational complexity, and deep convolutional neural networks are often limited by overfitting in regimes where the number of training samples is small. Conversely, kernel learning methods have a clear mathematical theory, fewer parameters, and can contend with small sample sizes; however, they cannot handle high-dimensional data such as images. It is therefore important to achieve a performance and complexity trade-off in complicated tasks. In this paper, we propose a novel scalable deep convolutional random kernel learning in Gaussian process architecture, called SDCRKL-GP, which is characterized by excellent performance and low complexity. First, we incorporated the deep convolutional architecture into kernel learning by implementing the random Fourier feature transform for Gaussian processes, which effectively captures hierarchical and local image-level features and enables the kernel method to handle image-processing problems. Second, we optimized the parameters of the deep convolutional filters and Gaussian kernels by stochastic variational inference, and derived the variational lower bound of the marginal likelihood. Finally, we explored a design-space selection method to determine the appropriate network architecture for different datasets; the design space consists of the number of layers, the channels per layer, and so on, and different design-space selections improve the scalability of the SDCRKL-GP architecture. We evaluated SDCRKL-GP on the MNIST, FMNIST, CIFAR10, and CALTECH4 benchmark datasets. Taking MNIST as an example, the classification error rate is 0.60%, and the number of parameters, number of computations, and memory…
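The random Fourier feature (RFF) transform the paper builds on can be sketched directly. The snippet below is a minimal NumPy illustration of the standard RFF approximation of an RBF kernel, not the paper's SDCRKL-GP implementation; all sizes and the lengthscale are illustrative.

```python
# Minimal sketch: random Fourier features approximating an RBF kernel,
# the building block that lets a Gaussian process scale linearly in the
# number of samples instead of cubically.
import numpy as np

def rff_transform(X, n_features=512, lengthscale=1.0, seed=0):
    """Map X of shape (n, d) to z(X) of shape (n, n_features), such that
    z(X) @ z(Y).T approximates the RBF kernel matrix K(X, Y)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies drawn from the spectral density of the RBF kernel.
    W = rng.normal(0.0, 1.0 / lengthscale, size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Sanity check: feature inner products track the exact RBF kernel.
X = np.random.default_rng(1).normal(size=(5, 3))
Z = rff_transform(X, n_features=4096)
K_approx = Z @ Z.T
K_exact = np.exp(-0.5 * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
print(np.abs(K_approx - K_exact).max())  # shrinks as n_features grows
```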
Image data is expanding rapidly along with technological development, so efficient solutions must be considered to achieve high, real-time performance when processing large image datasets. Parallel processing is increasingly used as an attractive way to improve performance, both on existing distributed architectures and on sequential commodity computers. It can provide speedup, efficiency, reliability, incremental growth, and flexibility. We present such an alternative and demonstrate the effectiveness of the methods in accelerating computations on a small cluster of PCs compared to a single CPU. Our paper focuses on applying edge detection to large image datasets, a fundamental and challenging task in image processing and computer vision. Five techniques, namely Sobel, Prewitt, LoG, Canny, and Roberts, are compared in a simple experimental setup that uses OpenCV library functions for image-pixel manipulation. Gaussian blur is used to reduce high-frequency components and thus suppress the noise that impacts edge detection. Overall, this work is part of a more extensive investigation of image segmentation methods on large image datasets, but the results presented are relevant and show the effectiveness of our approach.
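For reference, a single-image version of the compared pipeline can be sketched with OpenCV: Gaussian blur followed by the five detectors. The file name, kernel sizes, and thresholds below are illustrative assumptions; Prewitt and Roberts have no built-in OpenCV function, so their kernels are applied with filter2D.

```python
# Single-image sketch of the compared pipeline: Gaussian blur, then five
# classic edge detectors. "sample.png" and all parameters are illustrative.
import cv2
import numpy as np

img = cv2.imread("sample.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
blur = cv2.GaussianBlur(img, (5, 5), sigmaX=1.4)      # suppress noise first

# Sobel: built-in first-derivative kernels, combined as gradient magnitude.
sobel = cv2.magnitude(cv2.Sobel(blur, cv2.CV_64F, 1, 0),
                      cv2.Sobel(blur, cv2.CV_64F, 0, 1))
# Prewitt: apply its kernels with filter2D.
kx = np.array([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=np.float64)
prewitt = cv2.magnitude(cv2.filter2D(blur, cv2.CV_64F, kx),
                        cv2.filter2D(blur, cv2.CV_64F, kx.T))
# LoG: Laplacian applied to the already-Gaussian-blurred image.
log = cv2.Laplacian(blur, cv2.CV_64F, ksize=5)
# Canny: hysteresis thresholds are dataset-dependent choices.
canny = cv2.Canny(blur, 50, 150)
# Roberts: 2x2 cross kernels, also via filter2D.
r1 = np.array([[1, 0], [0, -1]], dtype=np.float64)
r2 = np.array([[0, 1], [-1, 0]], dtype=np.float64)
roberts = cv2.magnitude(cv2.filter2D(blur, cv2.CV_64F, r1),
                        cv2.filter2D(blur, cv2.CV_64F, r2))
```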
ISBN (print): 9783030898205; 9783030898199
Shared-memory programming and distributed-memory programming are the most prominent ways of parallelizing applications that require long processing times and large amounts of storage in High Performance Computing (HPC) systems. Parallel applications can be represented as Parallel Task Graphs (PTGs) using Directed Acyclic Graphs (DAGs). The scheduling of PTGs in HPC systems is an NP-complete combinatorial problem that requires large amounts of storage and long processing times, and heuristic methods implemented in sequential programming languages have been proposed to address it. In the open-access paper "Scheduling in Heterogeneous Distributed Computing Systems Based on Internal Structure of Parallel Tasks Graphs with Meta-Heuristics," the Array Method is presented; this method optimizes the use of Processing Elements (PEs) in an HPC system and improves response times in scheduling and resource mapping by using the Univariate Marginal Distribution Algorithm (UMDA). The Array Method exploits the internal characteristics of PTGs to schedule tasks; it was programmed sequentially in the C language and was analyzed and tested using algorithms for generating synthetic workloads and DAGs of real applications. Considering the great benefits of parallel software, this research work presents the Array Method implemented with OpenMP parallel programming. The experimental results show that the parallel version accelerates response times compared with the sequential one when evaluated on three metrics: waiting time, makespan, and quality of assignments.
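As a rough illustration of what such a scheduler evaluates (not the Array Method itself), the sketch below computes the makespan of a given task-to-PE mapping for a DAG of task costs; communication costs and the UMDA search are omitted.

```python
# Illustrative fitness function for a DAG scheduler: the makespan of a
# given task-to-PE mapping. Tasks are assumed listed in topological order.
def makespan(dag, cost, mapping, n_pes):
    """dag: {task: [predecessor tasks]}, cost: {task: runtime},
    mapping: {task: PE index}. Returns the schedule's finish time."""
    finish = {}
    pe_free = [0.0] * n_pes          # earliest free time of each PE
    for task in dag:                 # topological order assumed
        ready = max((finish[p] for p in dag[task]), default=0.0)
        start = max(ready, pe_free[mapping[task]])
        finish[task] = start + cost[task]
        pe_free[mapping[task]] = finish[task]
    return max(finish.values())

# Diamond DAG t0 -> {t1, t2} -> t3 scheduled on two PEs.
dag = {"t0": [], "t1": ["t0"], "t2": ["t0"], "t3": ["t1", "t2"]}
cost = {"t0": 2.0, "t1": 3.0, "t2": 1.0, "t3": 2.0}
print(makespan(dag, cost, {"t0": 0, "t1": 0, "t2": 1, "t3": 0}, n_pes=2))  # 7.0
```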
ISBN (print): 9781665441155
Image Super-Resolution (SR) has come a long way since the early days of image processing. Deep learning methods now give outstanding results, yet very few are actually used in digital illustration and photo-retouching software, owing to large memory and GPU computational requirements, but also to the lack of control offered to the user over the final result. This paper introduces a two-step framework for stylized SR using a multi-scale network built with independent parallel branches. The approach aims at: i. designing a shallow network based on image-processing techniques, making it usable on light hardware architectures (low memory cost, no GPU); ii. providing a versatile, controllable, and customizable network to stylize SR results in a plug-and-play manner. We show that the proposed method offers significant advantages over state-of-the-art reference-based approaches in these respects.
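The controllable, GPU-free flavor of the approach can be illustrated with classical image-processing branches; the sketch below is our own illustration of the plug-and-play blending idea, not the paper's network.

```python
# Our own illustration of controllable, CPU-only stylized upscaling:
# a cheap base upscaler plus independent parallel "style" branches whose
# contributions the user weights.
import cv2
import numpy as np

def stylized_sr(img, scale=2, w_sharp=0.5, w_smooth=0.0):
    base = cv2.resize(img, None, fx=scale, fy=scale,
                      interpolation=cv2.INTER_CUBIC)          # base branch
    blur = cv2.GaussianBlur(base, (0, 0), sigmaX=1.0)
    sharp = cv2.addWeighted(base, 1.5, blur, -0.5, 0)         # unsharp masking
    smooth = cv2.bilateralFilter(base, 9, 75, 75)             # edge-aware smoothing
    base_f = base.astype(np.float32)
    out = (base_f
           + w_sharp * (sharp.astype(np.float32) - base_f)    # user-weighted
           + w_smooth * (smooth.astype(np.float32) - base_f)) # branch blending
    return np.clip(out, 0, 255).astype(np.uint8)

# Usage (hypothetical file): out = stylized_sr(cv2.imread("photo.png"), w_sharp=0.8)
```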
ISBN (print): 9798400709036
Artificial intelligence has shown great potential in a variety of applications, from natural language models to audio-visual recognition, classification, and manipulation. AI researchers have to work with massive amounts of collected data for machine learning, which raises challenges in effectively managing and utilizing those data in the training phase to develop and iterate on more accurate and more generalized models. In this paper, we review parallel and distributed machine learning methods and their challenges. We also propose a distributed and scalable deep learning model architecture that can span multiple processing nodes. We tested the model on the MIT Indoor dataset to evaluate its performance and scalability across multiple hardware nodes, and showed its scaling characteristics at different model sizes. We find that distributed training is 80% faster using 2 GPUs than 1 GPU, and that the model keeps the benefits of distributed training, such as speed and accuracy, regardless of its size or training batch size.
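A minimal sketch of the kind of multi-GPU data-parallel training evaluated here, using PyTorch DistributedDataParallel (assumed launched via torchrun with one process per GPU); the model, dataset, and hyperparameters are placeholders, not the paper's setup.

```python
# Sketch of multi-GPU data-parallel training with PyTorch DDP.
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def train(local_rank, model, dataset, epochs=10):
    dist.init_process_group("nccl")                 # join the process group
    torch.cuda.set_device(local_rank)
    model = DDP(model.cuda(local_rank), device_ids=[local_rank])
    sampler = DistributedSampler(dataset)           # shard data across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for epoch in range(epochs):
        sampler.set_epoch(epoch)                    # reshuffle shards each epoch
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x.cuda(local_rank)), y.cuda(local_rank))
            loss.backward()                         # gradients all-reduced here
            opt.step()
    dist.destroy_process_group()
```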
Trajectory similarity queries, including similarity search and similarity join, offer a foundation for many geo-spatial applications. With the rapid increase of streaming trajectory data volumes, e.g., data from mobile phones, vessel monitoring, or traffic systems, many location-based services benefit from online similarity analytics over trajectory data streams, where moving objects continually emit real-time position data. However, most existing studies focus on offline settings, and several major challenges thus remain unanswered in an online setting. To this end, we describe Ghost, a distributed stream-processing framework that enables generic, efficient, and scalable online trajectory similarity search and join. We propose a novel incremental online similarity computation (IOSC) mechanism to accelerate pair-wise streaming trajectory distance calculation, which supports a broad range of trajectory distance metrics. Compared with previous studies, IOSC reduces the complexity from quadratic to linear in the trajectory length. Building on this foundation, we propose histogram-based algorithms that exploit histogram indexes and a series of pruning bounds to enable streaming trajectory similarity search and join. Finally, we extend our methods to the distributed platform Flink for scalability, where a CostPartitioner is developed to ensure parallel processing and workload balancing. An experimental study using two real-life datasets and one synthetic dataset shows that Ghost (i) achieves 6–20× efficiency/throughput gains and one order of magnitude memory savings over state-of-the-art baselines, (ii) achieves 3–8× workload-balancing gains on Flink, and (iii) exhibits low parameter sensitivity and high robustness.
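The quadratic-to-linear idea behind incremental online computation can be illustrated with DTW: keeping only the last row of the dynamic-programming table makes appending one streamed point cost O(n) in the other trajectory's length. This is an illustration in the spirit of IOSC, not Ghost's actual mechanism, which supports a broader range of metrics.

```python
# Keeping only the last row of the DTW table makes appending one streamed
# point O(len(other)) instead of recomputing the full quadratic table.
import math

class IncrementalDTW:
    def __init__(self, other):
        self.other = other          # fixed reference trajectory
        self.row = None             # last row of the DTW table

    def append(self, p):
        """Extend the streaming trajectory by p; return the current DTW."""
        d = [math.dist(p, q) for q in self.other]
        if self.row is None:        # first streamed point: cumulative costs
            new = [d[0]]
            for j in range(1, len(d)):
                new.append(new[-1] + d[j])
        else:                       # standard DTW recurrence on one row
            new = [self.row[0] + d[0]]
            for j in range(1, len(d)):
                new.append(d[j] + min(new[-1], self.row[j], self.row[j - 1]))
        self.row = new
        return self.row[-1]

dtw = IncrementalDTW([(0, 0), (1, 0), (2, 0)])
for pt in [(0, 1), (1, 1), (2, 1)]:  # position updates arriving as a stream
    print(dtw.append(pt))
```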
There are great challenges in performing graph coloring on GPUs. First, the long-tail problem exists in recursion-based algorithms because conflicts (i.e., different threads assigning adjacent nodes the same color) become more likely as the number of iterations increases. Second, the sequential-spread algorithm is hard to parallelize because each color allocation depends on the preceding iteration. Third, atomic operations are widely used on GPUs to maintain the color list, which can greatly reduce the efficiency of GPU threads. In this article, we propose a two-stage high-performance graph coloring algorithm, called Feluca, to address these challenges. Feluca combines the recursion-based method with the sequential-spread-based method. In the first stage, Feluca uses a recursive routine to color the majority of the vertices in the graph. It then switches to the sequential-spread method to color the remaining vertices, avoiding the conflicts of the recursive algorithm. Moreover, the following techniques are proposed to further improve graph coloring performance: i) a new method to eliminate cycles in the graph; ii) a top-down scheme to avoid the atomic operations originally required for color selection; and iii) a novel color-centric coloring paradigm to improve the degree of parallelism of the sequential-spread part. These newly developed techniques, together with further GPU-specific optimizations such as coalesced memory access, comprise an efficient parallel graph coloring solution in Feluca. Extensive experiments on NVIDIA GPUs show that Feluca achieves a 1.19x–8.39x speedup over state-of-the-art algorithms.
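A CPU-side Python sketch of the two-stage idea is given below (Feluca itself is a GPU kernel): speculative rounds on a color snapshot mimic simultaneous thread updates and can livelock on the long tail, which the sequential-spread pass then resolves.

```python
# Stage 1 mimics simultaneous thread updates by reading a color snapshot,
# so adjacent vertices can re-conflict; stage 2 resolves the remainder.
def two_stage_coloring(adj, rounds=3):
    """adj: {vertex: set of neighbors}. Returns {vertex: color}."""
    color = {v: 0 for v in adj}

    def conflicted():
        return [v for v in adj if any(color[u] == color[v] for u in adj[v])]

    for _ in range(rounds):                      # stage 1: speculative rounds
        bad = conflicted()
        if not bad:
            break
        snapshot = dict(color)                   # simulate concurrent reads
        for v in bad:
            used = {snapshot[u] for u in adj[v]}
            color[v] = min(c for c in range(len(used) + 1) if c not in used)
    for v in conflicted():                       # stage 2: sequential spread
        used = {color[u] for u in adj[v]}        # live colors: conflict-free
        color[v] = min(c for c in range(len(used) + 1) if c not in used)
    return color

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}          # triangle needs 3 colors
print(two_stage_coloring(adj))                   # proper coloring: {0: 0, 1: 2, 2: 1}
```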
Generative Adversarial Networks (GANs) are approaches utilized for data augmentation, which facilitates the development of more accurate detection models for unusual or imbalanced datasets. Computer-assisted d…
ISBN (digital): 9798350303582
ISBN (print): 9798350303599
As Deep Neural Networks (DNNs) grow in complexity to meet the demands of novel applications, a single device becomes insufficient for training, leading to the emergence of distributed DNN training. However, this evolution exposes a research gap around vulnerability to model poisoning attacks, especially in model-parallel setups, an area that has scarcely been studied. To bridge this gap, we introduce Patronus, an approach that counters model poisoning attacks in distributed DNN training, accommodating both data and model parallelism. Using Loss-aware Credit Evaluation, Patronus scores each participating client. Based on the continuously updated credit, malicious clients are isolated and detected after multiple epochs by a Shuffling-based Isolation Mechanism. Additionally, the training system is reinforced by Byzantine Fault-tolerant Aggregation to minimize the impact of malicious clients. Comprehensive experiments confirm Patronus's superior reliability and efficiency over existing methods under attack scenarios.
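One standard Byzantine fault-tolerant aggregation rule, the coordinate-wise median, illustrates the kind of aggregation layer involved; Patronus's exact rule and credit weighting are not reproduced here.

```python
# The coordinate-wise median bounds the influence of a minority of
# poisoned updates, unlike the plain mean.
import numpy as np

def median_aggregate(client_updates):
    """client_updates: list of 1-D gradient vectors, one per client."""
    return np.median(np.stack(client_updates), axis=0)

honest = [np.array([1.0, -2.0, 0.5]) + np.random.default_rng(i).normal(0, 0.1, 3)
          for i in range(4)]
poisoned = [np.full(3, 100.0)]                       # one malicious client
print(median_aggregate(honest + poisoned))           # stays near honest values
print(np.mean(np.stack(honest + poisoned), axis=0))  # mean is dragged away
```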
ISBN (digital): 9798331509712
ISBN (print): 9798331509729
Failure recovery is one of the most essential problems in Internet of Things (IoT) systems, and the conventional snapshot method is an effective way to solve it. However, snapshot methods lack specialized designs for heterogeneous IoT devices, and when implemented on edge devices they cause serious system interruptions and degrade performance. To address these problems, a dynamic checkpointing strategy is proposed for IoT systems consisting of heterogeneous devices. First, an anomaly detection network for snapshots (ADSnet), which combines long short-term memory networks with multilayer convolutional networks, learns the multidimensional features of system resource usage. Second, ADSnet is tuned during deployment to learn the behavior of target devices, so that it can report anomalies of the target device in the near future. Finally, a dynamic checkpointing strategy creates snapshots dynamically on the basis of the anomaly detection results. The experimental results show that the proposed ADSnet achieves 97.73% accuracy in detecting anomalies on the target device; furthermore, the proposed dynamic checkpointing strategy creates 25.4% fewer snapshots than the recently proposed ResCheck.
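The control loop implied by the strategy can be sketched as follows; `read_metrics`, `predict_anomaly_score` (standing in for ADSnet), and `save_snapshot` are hypothetical placeholders.

```python
# Sketch of the dynamic-checkpointing control loop: snapshot only when the
# detector flags elevated risk, not on a fixed timer. All three callables
# are hypothetical placeholders for the paper's components.
import time

def checkpoint_loop(read_metrics, predict_anomaly_score, save_snapshot,
                    threshold=0.8, period_s=60):
    window = []
    while True:
        window.append(read_metrics())        # sample CPU/memory/IO usage
        window = window[-32:]                # keep a bounded history
        if predict_anomaly_score(window) >= threshold:
            save_snapshot()                  # checkpoint ahead of predicted failure
        time.sleep(period_s)                 # re-evaluate each monitoring period
```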