ISBN: (print) 9798400709036
Artificial intelligence has shown great potential in a variety of applications, from natural language models to audiovisual recognition, classification, and manipulation. AI researchers must work with massive amounts of collected data for machine learning, which raises challenges in effectively managing and using that data during training to develop and iterate on more accurate and more generalized models. In this paper, we review parallel and distributed machine learning methods and their challenges. We also propose a distributed, scalable deep learning model architecture that can span multiple processing nodes. We tested the model on the MIT Indoor dataset to evaluate its performance and scalability across multiple hardware nodes, and we show its scaling characteristics at different model sizes. We find that distributed training is 80% faster using 2 GPUs than using 1 GPU. We also find that the model retains the benefits of distributed training, such as speed and accuracy, regardless of its size or training batch size.
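The abstract does not say how the distributed architecture keeps its workers in sync. As a minimal sketch, assuming standard synchronous data parallelism (each worker holds a full model replica, computes gradients on its own shard of the batch, and the gradients are averaged as an all-reduce would do), the pure-Python example below fits a one-parameter linear model y = w·x. The helper names `shard`, `local_gradient`, `allreduce_mean`, and `train` are hypothetical, not taken from the paper.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def shard(batch: List[Sample], num_workers: int) -> List[List[Sample]]:
    """Split a batch into equal, contiguous shards, one per worker."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    size = len(batch) // num_workers
    return [batch[i * size:(i + 1) * size] for i in range(num_workers)]


def local_gradient(w: float, samples: List[Sample]) -> float:
    """Gradient of the mean squared error of y_hat = w * x on one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in samples) / len(samples)


def allreduce_mean(grads: List[float]) -> float:
    """Stand-in for an all-reduce: every worker receives the mean gradient."""
    return sum(grads) / len(grads)


def train(batch: List[Sample], num_workers: int, lr: float = 0.01, steps: int = 200) -> float:
    w = 0.0
    shards = shard(batch, num_workers)
    for _ in range(steps):
        # Each replica computes its gradient independently (concurrently on real hardware).
        grads = [local_gradient(w, s) for s in shards]
        # Synchronous update: all replicas apply the same averaged gradient.
        w -= lr * allreduce_mean(grads)
    return w


if __name__ == "__main__":
    data = [(float(x), 3.0 * x) for x in range(1, 9)]  # true weight is 3
    print(train(data, num_workers=1), train(data, num_workers=2))
```

Because the shards are equal-sized, the mean of the per-shard gradients equals the full-batch gradient, so 1-worker and 2-worker training follow the same trajectory; any speedup comes from computing the shard gradients concurrently, minus the communication cost of the all-reduce.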
ISBN: (print) 9783031396977; 9783031396984
A significant part of computational fluid dynamics (CFD) simulations is solving the large sparse systems of linear equations that result from implicit time integration of the Reynolds-averaged Navier-Stokes (RANS) equations. The sparse linear system solver Spliss aims to provide a linear solver library that is, on the one hand, tailored to these requirements of CFD applications but, on the other hand, independent of any particular CFD solver. Spliss allows leveraging a range of available HPC technologies, such as hybrid CPU parallelization and the option to offload the computationally intensive linear solver to GPU accelerators, while hiding this complexity from the CFD solver. This work describes the steps taken to establish multi-GPU capabilities for Spliss, enabling efficient and scalable use of large GPU systems. In addition, it evaluates performance and scalability on CPU and GPU systems using a representative CODA test case. CODA is the CFD software being developed as part of a collaboration between the French Aerospace Lab ONERA, the German Aerospace Center (DLR), Airbus, and their European research partners; it is jointly owned by ONERA, DLR, and Airbus. The evaluation examines and compares performance and scalability in a strong-scaling approach on Nvidia A100 GPUs and the AMD Rome architecture.
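The abstract does not describe the solver algorithms inside Spliss. As a generic illustration only, not Spliss's API or method, the sketch below stores a sparse matrix in compressed sparse row (CSR) format and solves Ax = b by Jacobi iteration, a simple stationary method whose per-row updates depend only on the previous iterate and therefore parallelize naturally across CPU or GPU threads. The names `CSRMatrix` and `jacobi` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CSRMatrix:
    """Sparse matrix in compressed sparse row (CSR) format."""
    values: List[float]  # non-zero entries, stored row by row
    col_idx: List[int]   # column index of each non-zero entry
    row_ptr: List[int]   # row i occupies values[row_ptr[i]:row_ptr[i + 1]]

    @property
    def n(self) -> int:
        return len(self.row_ptr) - 1


def jacobi(a: CSRMatrix, b: List[float], tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Solve a @ x = b by Jacobi iteration (converges for strictly diagonally dominant a)."""
    x = [0.0] * a.n
    for _ in range(max_iter):
        x_new = [0.0] * a.n
        # Each row update reads only the previous iterate x, so rows are independent
        # and could be distributed across threads, processes, or GPU blocks.
        for i in range(a.n):
            diag = 0.0
            off_diag_sum = 0.0
            for k in range(a.row_ptr[i], a.row_ptr[i + 1]):
                j = a.col_idx[k]
                if j == i:
                    diag = a.values[k]
                else:
                    off_diag_sum += a.values[k] * x[j]
            if diag == 0.0:
                raise ZeroDivisionError(f"zero diagonal in row {i}")
            x_new[i] = (b[i] - off_diag_sum) / diag
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("Jacobi iteration did not converge")


if __name__ == "__main__":
    # Strictly diagonally dominant tridiagonal system [[4,-1,0],[-1,4,-1],[0,-1,4]] x = [3,2,3]
    a = CSRMatrix(values=[4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0],
                  col_idx=[0, 1, 0, 1, 2, 1, 2],
                  row_ptr=[0, 2, 5, 7])
    print(jacobi(a, [3.0, 2.0, 3.0]))
```

Production CFD solvers typically use stronger preconditioned Krylov or Gauss-Seidel-type methods; Jacobi is chosen here only because its data-parallel structure is easy to see.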
ISBN: (electronic) 9781665490627
ISBN: (print) 9781665490627
Scene text images vary in shape and are subject to various distortions, e.g. perspective distortion. To handle these challenges, state-of-the-art methods rely on a rectification network connected to the text recognition network. Together they form a linear pipeline that applies text rectification to every input image, even images that could be recognized without it. The rectification network does improve overall text recognition performance. In some cases, however, it introduces unnecessary distortions, producing incorrect predictions for images that would otherwise have been recognized correctly. To alleviate these unnecessary distortions, feature portmanteauing is proposed. The portmanteau feature, inspired by the portmanteau word, is a feature that contains information from both the original text image and the rectified image. To generate it, a non-linear input pipeline with block matrix initialization is presented. The transformer is chosen as the recognition network because its attention mechanism and inherent parallelism let it handle the portmanteau feature effectively. The proposed method is evaluated on 6 benchmarks and compared with 13 state-of-the-art methods. The experimental results show that it outperforms the state-of-the-art methods on several of the benchmarks.
ISBN: (electronic) 9798350352719
ISBN: (print) 9798350352726
Time series forecasting is one of the most common and important time series tasks. With the rise of deep learning models, time series forecasting has developed rapidly. Recurrent neural networks (RNNs) were first proposed in 1990 to address time series forecasting with deep learning. Subsequently, LSTM, GRU, and other models were developed. In recent years, with the emergence of attention-based models, a series of Transformer-based methods for time series forecasting has been proposed. Complete and accurate sampling of time series signals is the ideal laboratory condition, but sampling in real signal processing tasks is mostly incomplete, and existing methods give unsatisfactory forecasting results under incomplete sampling conditions. This paper proposes a modified RNN method to improve time series forecasting under incomplete sampling conditions.
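The abstract does not say how the modified RNN treats missing samples. As one common strategy, offered as a hypothetical illustration and not necessarily the paper's method, the sketch below runs a scalar Elman RNN over a sequence in which missing samples are `None`: each gap is forward-filled with the last observed value, and a 0/1 observation mask is fed to the cell as an extra input so the network can learn to discount imputed steps. The weights are fixed illustrative numbers, not trained parameters, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple


def impute_with_mask(series: List[Optional[float]]) -> Tuple[List[float], List[float]]:
    """Forward-fill missing samples; return (filled values, observation mask)."""
    filled: List[float] = []
    mask: List[float] = []
    last = 0.0  # assumption: value used before the first observation
    for v in series:
        if v is None:
            filled.append(last)
            mask.append(0.0)
        else:
            last = v
            filled.append(v)
            mask.append(1.0)
    return filled, mask


def elman_forward(series: List[Optional[float]],
                  w_x: float = 0.8, w_m: float = 0.3, w_h: float = 0.5,
                  b: float = 0.0, w_out: float = 1.2) -> List[float]:
    """h_t = tanh(w_x*x_t + w_m*m_t + w_h*h_{t-1} + b); returns readouts y_t = w_out*h_t."""
    xs, ms = impute_with_mask(series)
    h = 0.0
    readouts = []
    for x, m in zip(xs, ms):
        h = math.tanh(w_x * x + w_m * m + w_h * h + b)
        readouts.append(w_out * h)  # interpreted as the forecast of the next sample
    return readouts


if __name__ == "__main__":
    print(elman_forward([0.1, None, 0.3, None, None, 0.2]))
```

Feeding the mask matters: without it, an imputed 0.1 and an observed 0.1 would be indistinguishable to the network.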
Data parallel processing is a key concept for increasing scalability and elasticity in event streaming systems. Data parallelism is often accomplished in a splitter-merger architecture where the splitter divides inco...
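The preview above is truncated, so the paper's specific design is unknown. As a minimal sketch of a generic splitter-merger, assuming round-robin splitting and an order-preserving merger (both assumptions), the example below tags each event with a sequence number, distributes events across worker threads, and uses a min-heap in the merger to re-emit results in their original order even when workers finish out of order. `split_process_merge` is a hypothetical name; error handling in workers is omitted.

```python
import heapq
import queue
import threading
from typing import Any, Callable, Iterable, List, Tuple


def split_process_merge(events: Iterable[Any], fn: Callable[[Any], Any],
                        num_workers: int = 3) -> List[Any]:
    """Round-robin splitter -> parallel workers -> order-preserving merger."""
    inboxes: List[queue.Queue] = [queue.Queue() for _ in range(num_workers)]
    outbox: queue.Queue = queue.Queue()
    stop = object()  # sentinel telling a worker to exit

    def worker(inbox: queue.Queue) -> None:
        while True:
            item = inbox.get()
            if item is stop:
                return
            seq, event = item
            outbox.put((seq, fn(event)))

    threads = [threading.Thread(target=worker, args=(q,)) for q in inboxes]
    for t in threads:
        t.start()

    # Splitter: tag each event with a sequence number and assign it round-robin.
    count = 0
    for seq, event in enumerate(events):
        inboxes[seq % num_workers].put((seq, event))
        count += 1
    for q in inboxes:
        q.put(stop)

    # Merger: buffer out-of-order results in a min-heap keyed by sequence number
    # and emit them strictly in sequence.
    heap: List[Tuple[int, Any]] = []
    merged: List[Any] = []
    next_seq = 0
    while next_seq < count:
        heapq.heappush(heap, outbox.get())
        while heap and heap[0][0] == next_seq:
            merged.append(heapq.heappop(heap)[1])
            next_seq += 1

    for t in threads:
        t.join()
    return merged


if __name__ == "__main__":
    print(split_process_merge(range(10), lambda x: x * x))
```

Reordering in the merger trades some latency and buffer memory for deterministic output order; a merger that emits results as soon as they arrive would avoid the buffer but lose ordering.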
Accurate and rapid classification of large-scale lychee images is crucial for collecting germplasm resources and studying the characteristics of different lychee varieties; it requires both accurate classification models and rapid classification algorithms. However, current deep learning-based classification methods for lychee images cannot simultaneously meet the accuracy and timeliness requirements of large-scale lychee image classification. To address this problem, this paper proposes a large-scale parallel classification algorithm for lychee images based on Spark and deep learning. First, the T_ECBAM_ResNetS-34 model architecture was designed and trained with the PyTorch deep learning framework on a self-built dataset covering ten types of lychee images, improving classification accuracy. Second, the inference algorithm for the PyTorch-trained model was restructured, using Apache Spark RDDs, broadcast variables, and data structures to implement data partitioning and parallel model computation across nodes. Experimental results show that the proposed method surpasses existing techniques in both classification accuracy and the speed of large-scale lychee image classification.
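As a hedged, framework-free sketch of the pattern described above (share a trained model with every worker once, then run inference partition by partition), the example below mimics Spark's per-partition processing with a thread pool: a read-only model is shared by all tasks, each partition is handled by one task, and results are concatenated in partition order. `ThresholdModel` and its one-feature decision rule are hypothetical placeholders for T_ECBAM_ResNetS-34; in PySpark the corresponding calls would be `SparkContext.broadcast` and `RDD.mapPartitions`.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class ThresholdModel:
    """Hypothetical stand-in for a trained classifier: labels by mean pixel intensity."""
    threshold: float
    labels: Tuple[str, str] = ("variety_a", "variety_b")

    def predict(self, features: Sequence[float]) -> str:
        mean = sum(features) / len(features)
        return self.labels[1] if mean >= self.threshold else self.labels[0]


def partition(items: List[Sequence[float]], num_partitions: int) -> List[List[Sequence[float]]]:
    """Split the dataset into contiguous partitions, like an RDD's partitions."""
    size = -(-len(items) // num_partitions)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def infer_partition(model: ThresholdModel, part: Iterable[Sequence[float]]) -> Iterator[str]:
    """Per-partition inference: one model instance serves every record in the partition."""
    for features in part:
        yield model.predict(features)


def parallel_classify(images: List[Sequence[float]], model: ThresholdModel,
                      num_partitions: int = 4) -> List[str]:
    if not images:
        return []
    parts = partition(images, num_partitions)
    # The frozen model is shared read-only by all tasks, analogous to a broadcast variable.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        per_partition = list(pool.map(lambda p: list(infer_partition(model, p)), parts))
    # pool.map preserves partition order, so predictions line up with the input images.
    return [label for labels in per_partition for label in labels]


if __name__ == "__main__":
    imgs = [[0.1, 0.2], [0.9, 0.8], [0.5, 0.5], [0.0, 0.4], [0.7, 0.6]]
    print(parallel_classify(imgs, ThresholdModel(threshold=0.5), num_partitions=2))
```

Processing per partition rather than per record amortizes model setup: in Spark, the broadcast model is shipped to each executor once instead of being serialized with every task.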
Future deep HI surveys will be essential for understanding the nature of galaxies and the content of the Universe. However, the large volume of these data will require distributed and automated processing techniques. We introduce LiSA, a set of Python modules for the denoising, detection, and characterization of HI sources in 3D spectral data. LiSA was developed and tested on the Square Kilometer Array Science Data Challenge 2 (SDC2) dataset, and contains modules and pipelines for easy domain decomposition and parallel execution. LiSA includes algorithms for 2D-1D wavelet denoising using the starlet transform and for flexible source finding using null-hypothesis testing. These algorithms are lightweight and portable, needing only a few user-defined parameters that reflect the resolution of the data. LiSA also includes two convolutional neural networks, developed to analyze data cubes, that separate HI sources from artifacts and predict HI source properties. All components are designed to be as modular as possible, so users can mix and match them to build their ideal pipeline. We demonstrate the performance of LiSA's components on the SDC2 dataset: LiSA finds 95% of HI sources with SNR > 3 and accurately predicts their properties. © 2022 The Author(s). Published by Elsevier B.V.
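LiSA's 2D-1D denoiser is not reproduced here. As a simplified 1D illustration of the starlet transform named above (assumptions: mirror boundaries, a known noise level σ, and hard thresholding at k times each scale's noise level), the sketch below decomposes a signal into wavelet scales with the à trous B3-spline algorithm, zeroes coefficients consistent with noise, and reconstructs by summing the scales. Per-scale noise levels are computed from the transform's impulse response. All function names are hypothetical.

```python
from typing import List

# B3-spline scaling kernel of the starlet (isotropic undecimated wavelet) transform.
B3 = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def _smooth(c: List[float], step: int) -> List[float]:
    """Convolve c with the B3 kernel dilated by `step` (a trous), using mirror boundaries."""
    n = len(c)
    period = 2 * (n - 1)

    def at(i: int) -> float:
        i %= period  # mirror-reflect indices outside [0, n-1]
        return c[period - i if i >= n else i]

    return [sum(h * at(i + (k - 2) * step) for k, h in enumerate(B3)) for i in range(n)]


def starlet(signal: List[float], n_scales: int) -> List[List[float]]:
    """Return [w_1, ..., w_J, c_J]: J wavelet scales followed by the coarse residual."""
    if len(signal) < 2:
        raise ValueError("signal needs at least 2 samples")
    c = list(signal)
    bands = []
    for j in range(n_scales):
        c_next = _smooth(c, 2 ** j)
        bands.append([a - b for a, b in zip(c, c_next)])  # w_{j+1} = c_j - c_{j+1}
        c = c_next
    bands.append(c)
    return bands


def reconstruct(bands: List[List[float]]) -> List[float]:
    """Inverse starlet: the sum of the coarse residual and all wavelet scales."""
    return [sum(vals) for vals in zip(*bands)]


def noise_levels(n_scales: int) -> List[float]:
    """Std of each wavelet scale under unit-variance white noise (impulse-response norm)."""
    size = 4 * 2 ** n_scales + 1  # wide enough that the impulse never reaches a boundary
    impulse = [0.0] * size
    impulse[size // 2] = 1.0
    return [sum(v * v for v in w) ** 0.5 for w in starlet(impulse, n_scales)[:-1]]


def denoise(signal: List[float], sigma: float, n_scales: int = 3, k: float = 3.0) -> List[float]:
    """Hard-threshold each wavelet scale at k * sigma * noise_level_j, then reconstruct."""
    bands = starlet(signal, n_scales)
    # zip stops after the wavelet scales, so the coarse residual is left untouched.
    for w, level in zip(bands, noise_levels(n_scales)):
        threshold = k * sigma * level
        for i, v in enumerate(w):
            if abs(v) < threshold:
                w[i] = 0.0  # coefficient is consistent with noise: discard it
    return reconstruct(bands)
```

Because each scale is a difference of successive smoothings, the scales telescope and reconstruction is a plain sum, which is what makes the starlet convenient for thresholding-based denoising.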
The development of modern medicine has increased both the quantity of medical data and image resolution requirements, which necessitates powerful computing resources and a huge amount of computer memory durin...
ISBN: (electronic) 9798350360240
ISBN: (print) 9798350384161
With the rapid development of information technology, big data has become an important strategic resource for enterprises. As data volumes grow, however, data quality issues are becoming increasingly prominent. Low data quality can prevent enterprises from making accurate decisions and can even cause huge economic losses. This article first discusses the definition of data quality and the dimensions along which it is measured, and then focuses on data quality measurement methods and architecture. It summarizes data quality comparison modes and their technical implementation and deployment, and proposes a distributed data quality audit architecture. Finally, it presents the steps and common software implementations for data quality management within an enterprise. In summary, this article explores the significance, solutions, and implementation methods of data quality management, providing an effective theoretical basis for improving enterprise data quality.
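The abstract refers to data quality dimensions without defining them. As a minimal sketch, assuming three commonly used dimensions (completeness, uniqueness, and validity), each scored as a ratio in [0, 1], the example below audits a list of records. The dimension definitions, the email rule, and the function name `quality_report` are illustrative assumptions, not the article's architecture.

```python
import re
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


def quality_report(records: List[Record], required: List[str], key: str,
                   rules: Optional[Dict[str, Callable[[Any], bool]]] = None) -> Dict[str, float]:
    """Score completeness, uniqueness, and validity as ratios in [0, 1]."""
    rules = rules or {}
    n = len(records)
    if n == 0:
        return {"completeness": 1.0, "uniqueness": 1.0, "validity": 1.0}

    # Completeness: share of required fields that are present and non-empty.
    filled = sum(1 for r in records for f in required if r.get(f) not in (None, ""))
    completeness = filled / (n * len(required)) if required else 1.0

    # Uniqueness: share of records whose key does not repeat an earlier record's key.
    seen = set()
    unique = 0
    for r in records:
        k = r.get(key)
        if k not in seen:
            unique += 1
            seen.add(k)
    uniqueness = unique / n

    # Validity: share of rule checks passed, counted only over fields that are present.
    checks = passed = 0
    for r in records:
        for field, rule in rules.items():
            if r.get(field) not in (None, ""):
                checks += 1
                passed += bool(rule(r[field]))
    validity = passed / checks if checks else 1.0

    return {"completeness": completeness, "uniqueness": uniqueness, "validity": validity}


EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

if __name__ == "__main__":
    rows = [
        {"id": 1, "name": "Alice", "email": "alice@example.com"},
        {"id": 2, "name": "", "email": "bob(at)example.com"},
        {"id": 2, "name": "Bob", "email": None},
    ]
    print(quality_report(rows, required=["id", "name", "email"], key="id",
                         rules={"email": lambda v: EMAIL.fullmatch(v) is not None}))
```

Keeping each dimension as an independent ratio makes the report easy to compute per partition and aggregate, which is one way such checks could be distributed in an audit architecture.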
ISBN: (print) 9783031189067; 9783031189074
In this paper, we present OpenMedIA, an open-source toolbox library containing a rich set of deep learning methods for medical image analysis on heterogeneous Artificial Intelligence (AI) computing platforms. Various medical image analysis methods, including 2D/3D medical image classification, segmentation, localisation, and detection, are included in the toolbox, with PyTorch and/or MindSpore implementations for heterogeneous NVIDIA and Huawei Ascend computing systems. To the best of our knowledge, OpenMedIA is the first open-source algorithm library to provide comparative PyTorch and MindSpore implementations and results on several benchmark datasets. The source code and models are available at https://***/OpenMedIA.