Details
ISBN (Digital): 9798331531836
ISBN (Print): 9798331531843
The article explores character recognition using convolutional neural networks (CNNs) optimized with the CUDA platform to enhance computational efficiency. It outlines the CNN architecture, methods for leveraging GPU-based parallel data processing, and presents experimental results derived from the MNIST dataset. The study highlights that implementing CUDA drastically reduces processing time while maintaining a high level of predictive accuracy. The findings emphasize the potential of GPU acceleration in handling intensive computational tasks, making it a promising approach for real-time applications in image recognition and machine learning.
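The paper's CUDA code is not reproduced here; the sketch below (a plain-Python assumption, not the authors' implementation) only illustrates why convolution parallelizes so well on a GPU: every output pixel is an independent computation, so CUDA can assign one thread per `(oy, ox)` position.

```python
# Hedged sketch: illustrates the per-output-pixel independence that CUDA
# exploits in CNN layers; each output value depends only on a local input
# window, so a GPU can compute all of them concurrently.

def conv2d_valid(image, kernel):
    """2D 'valid' convolution; every output pixel is independent work."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw - kw + 1) for _ in range(ih - kh + 1)]
    for oy in range(ih - kh + 1):      # on a GPU: one thread per (oy, ox)
        for ox in range(iw - kw + 1):
            acc = 0.0
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[oy + ky][ox + kx] * kernel[ky][kx]
            out[oy][ox] = acc
    return out
```

Because the two outer loops carry no dependencies, a CUDA kernel replaces them with the thread grid, which is the source of the speedups the abstract reports.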
Details
ISBN (Digital): 9798350365887
ISBN (Print): 9798350365894
The graph coloring problem, a fundamental NP-hard challenge, has numerous applications in scheduling, register allocation, and network optimization. Traditional sequential algorithms for graph coloring are computationally expensive, particularly for large-scale graphs. In this paper, we propose the Parallel BitColor Algorithm (PBitCo), an extension of the BitColor framework, designed to exploit the parallel processing capabilities of modern CPU and GPU architectures. The PBitCo algorithm utilizes bitwise operations to reduce computation time and employs parallel execution on widely accessible platforms to further enhance performance. We implemented and tested the algorithm on various graph instances, comparing its performance against conventional graph coloring methods. Our results demonstrate that PBitCo achieves significant speedups, with the GPU implementation delivering up to a 10x improvement over baseline methods.
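PBitCo itself is not public in this listing; the sketch below shows only the core bitwise idea the BitColor family relies on (an assumption about the general technique, not the paper's algorithm): keep each vertex's forbidden colors as bits of one integer, then extract the lowest zero bit in O(1) word operations instead of scanning a color array.

```python
# Hedged sketch: sequential greedy coloring with a bitmask of used colors.
# The bit trick (~used & (used + 1)) isolates the lowest unset bit, i.e.
# the smallest color not taken by any already-colored neighbor.

def greedy_bitcolor(adj):
    """adj: list of neighbor lists. Returns a color index per vertex."""
    color = [-1] * len(adj)
    for v in range(len(adj)):
        used = 0                       # bitmask of colors taken by neighbors
        for u in adj[v]:
            if color[u] >= 0:
                used |= 1 << color[u]
        free = ~used & (used + 1)      # lowest zero bit = smallest free color
        color[v] = free.bit_length() - 1
    return color
```

The parallel variants in the paper would color independent vertex sets concurrently; the bitmask representation is what keeps each per-vertex step cheap.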
Details
ISBN (Digital): 9798350370249
ISBN (Print): 9798350370270
Histogram equalization is a method of contrast adjustment in image processing using the image's histogram. However, as modern imaging systems become more complex, these traditional algorithms for histogram equalization are no longer efficient. In response to this problem, researchers have studied several strategies for improving the performance of histogram equalization in digital images. One option is to use parallel processing and multi-threading approaches to distribute the computational burden, thereby speeding up the execution of histogram equalization. Another methodology uses machine learning algorithms to adapt histogram equalization parameters according to the input image. Furthermore, using advanced hardware architectures such as Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), or Application-Specific Integrated Circuits (ASICs) can significantly enhance the speed and efficiency of histogram equalization. These performance optimization techniques have provided encouraging results, significantly refining image processing time and visual perception. Modern imaging systems may benefit tremendously from their use.
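For reference, the serial baseline that the parallel, ML-adaptive, and hardware variants above accelerate can be sketched as follows (a minimal textbook version for flat 8-bit grayscale data, not any specific system from the abstract):

```python
# Hedged sketch: classic histogram equalization. Build the histogram,
# accumulate it into a CDF, then remap each pixel through a lookup table
# that stretches the occupied intensity range across [0, levels - 1].

def equalize(pixels, levels=256):
    """pixels: flat list of ints in [0, levels). Returns equalized list."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    cdf, total = [], 0                 # cumulative distribution function
    for h in hist:
        total += h
        cdf.append(total)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    scale = (levels - 1) / max(n - cdf_min, 1)
    lut = [round((c - cdf_min) * scale) for c in cdf]
    return [lut[p] for p in pixels]
```

The histogram build and the final remap are both embarrassingly parallel, which is exactly where the multi-threaded and FPGA/GPU strategies gain their speedups.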
Details
ISBN (Digital): 9781665483063
ISBN (Print): 9781665483063
As a research tool, autonomous underwater vehicles (AUVs) play an important role in various fields. However, due to the limited number of field tests, poor information is the basic characteristic and background of most AUV test data processing. The application of simulation models and simulation data is an important approach in existing poor-information testing. How to use field test data to identify and correct the parameters of the simulation model is a key point for integrating simulation test data with field data. To characterize the AUV motion model and the nonlinear polynomial autoregressive model with exogenous inputs (NARX), this paper proposes using the steady-state response method to realize parameter identification based on field data. Then, prior information is extracted by combining the existing data. Under these a priori constraints, three methods are used for grey-box identification: the recursive least squares method with a forgetting factor, the parallel recursive prediction error algorithm (RPEA), and the steady-state response method based on the NARX model. The accumulated error after identification proves that the steady-state response method based on the NARX model has better fitting accuracy.
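Of the three grey-box methods named, the recursive least squares update with a forgetting factor is compact enough to sketch. The scalar single-parameter form below is an assumption for illustration only, not the paper's multi-parameter NARX identification:

```python
# Hedged sketch: scalar recursive least squares with forgetting factor lam.
# Estimates theta in y = theta * x from streaming (x, y) pairs; lam < 1
# discounts old samples so the estimate can track slow parameter drift.

def rls_forgetting(xs, ys, lam=0.98):
    theta, P = 0.0, 1e6                    # large P = weak prior on theta
    for x, y in zip(xs, ys):
        k = P * x / (lam + x * P * x)      # gain
        theta += k * (y - theta * x)       # innovation update
        P = (P - k * x * P) / lam          # covariance update
    return theta
```

In the paper's setting the same recursion runs over the NARX regressor vector instead of a single x, with the prior constraints bounding the parameter estimates.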
A widely used computationally intensive scientific kernel, the matrix multiplication algorithm is at the heart of many scientific routines. Resurging fields, such as artificial intelligence (AI), strongly benefit from...
Details
Details
ISBN (Print): 9781665414555
Machine Learning (ML) rises as a highly useful tool to analyze the vast amount of data generated in every field of science nowadays. Simultaneously, data movement inside computer systems gains more focus due to its high impact on time and energy consumption. In this context, Near-Data Processing (NDP) architectures emerged as a prominent solution to growing data volumes by drastically reducing the required amount of data movement. For NDP, we see three main approaches: Application-Specific Integrated Circuits (ASICs), full Central Processing Units (CPUs) and Graphics Processing Units (GPUs), or vector unit integration. However, previous work considered only ASICs, CPUs, and GPUs when executing ML algorithms inside the memory. In this paper, we present an approach to execute ML algorithms near-data, using a general-purpose vector architecture and applying near-data parallelism to kernels from KNN, MLP, and CNN algorithms. To facilitate this process, we also present an NDP intrinsics library to ease the evaluation and debugging tasks. Our results show speedups up to 10x for KNN, 11x for MLP, and 3x for convolution when processing near-data compared to a high-performance x86 baseline.
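The NDP intrinsics library is not shown in this listing; the sketch below (plain Python, an assumption for illustration) shows only the shape of the KNN kernel being offloaded: per-sample distance computations are independent, which is what makes them a natural fit for vector units placed near the memory.

```python
# Hedged sketch: brute-force KNN classification. Each squared-distance
# computation touches one training row and is independent of the others,
# so a near-data vector architecture can evaluate them without moving
# the training set to the CPU.

def knn_predict(train, labels, query, k=3):
    """train: list of feature lists; labels: one class per row."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, query)), lab)
        for row, lab in zip(train, labels)
    )
    votes = {}
    for _, lab in dists[:k]:
        votes[lab] = votes.get(lab, 0) + 1
    return max(votes, key=votes.get)
```

Only the k nearest (distance, label) pairs need to leave the memory side, which is the data-movement saving the abstract quantifies.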
This paper introduces a new laser processing method for triangular mesh surfaces. To solve the efficiency problems of the hierarchical bounding box method in curved surface layering intersection, the method uses the direction of intersecting lines and the adjacency relation of surfaces to achieve faster layered detection of triangle-to-triangle intersection. To solve the deformation problem of parameter mapping in tool path generation, the method proposes a circular unfolding method to obtain a plane, and the contour-parallel tool path planning method is used to fill the unfolded plane. Finally, the curved surface processing path is generated by inverse mapping. The effectiveness of the laser processing method is verified by a faster intersection comparison experiment and two triangular mesh surface case studies.
Details
ISBN (Digital): 9798331510886
ISBN (Print): 9798331510893
In the era of Artificial Intelligence, and specifically in the sub-category of Generative Artificial Intelligence (GenIA), the trend in Data Centers is to integrate traditional network architectures that are still managed as Ethernet networks with new GenIA-based network structures that use low-latency protocols such as InfiniBand and allow parallel processing. In this work, a novel network architecture is presented that integrates the relevant aspects of Ethernet networks and the new GenIA back-end networks. We also deal with aspects related to protocols, bandwidth, the design of a GenIA network architecture, components of GenIA networks, and cooling systems. The main objective of this article is to analyze, through the Radial Basis Function (RBF) and Multilayer Perceptron (MLP) algorithms, which belong to the scope of Neural Networks (NNs), how the dependent variable or metric GPU Performance behaves, since it is one of the critical aspects of modern data center architectures. To achieve this, a series of independent variables are defined, such as GPU Memory Access, Memory Bandwidth, Throughput, Power Consumption, Temperature, and Clock Speed. With the RBF algorithm, these sub-metrics are analyzed considering the number of units in each of the hidden layers and the importance level of the variable, while, with the MLP, the number of units established in hidden layers 1 and 2, respectively, and the importance level and normalized importance are used to analyze the behavior of these sub-metrics. The latter is presented through case studies and helps to identify which metrics are more relevant than others in terms of operation, performance, and efficiency of GPU performance metrics.
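The article's trained models and their data are not reproduced here; the sketch below (centers, widths, and weights are illustrative assumptions, not fitted values from the paper) shows only the forward pass of the kind of RBF network it applies: a feature vector of GPU sub-metrics passes through Gaussian hidden units and a linear output.

```python
import math

# Hedged sketch: one Gaussian RBF hidden layer plus a linear output unit.
# x would be a vector of sub-metrics such as bandwidth, temperature, and
# clock speed; centers/widths/weights come from training in practice.

def rbf_forward(x, centers, widths, weights, bias=0.0):
    hidden = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2 * s ** 2))
        for c, s in zip(centers, widths)
    ]
    return bias + sum(w * h for w, h in zip(weights, hidden))
```

Each hidden unit responds most strongly to inputs near its center, which is what lets the article read off how important each input variable is per hidden unit.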
Details
ISBN (Digital): 9798350384437
ISBN (Print): 9798350384444
This study aims to achieve efficient parallel processing of WELL19937(1024,32) on Field-Programmable Gate Array (FPGA) platforms by applying a hybrid parallelization strategy. The goal is to enhance system clock frequency and data processing throughput while optimizing hardware resource utilization and reducing circuit area requirements. The improved Pseudo-Random Number Generator (PRNG) can provide security assurance for dynamic encryption scenarios. Furthermore, this paper rigorously evaluates the statistical performance of both serial and parallel random number sequences generated by the PRNG, verifying the randomness of the system. Additionally, the parallel system outperforms related works on other platforms in terms of data output and operating frequency.
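WELL19937 has a 19937-bit state and a recurrence too long to reproduce faithfully here; as a deliberately simplified stand-in (not the paper's generator), the xorshift32 sketch below shows the same flavour of shift/XOR state transitions that map directly onto FPGA shift registers and XOR gates:

```python
# Hedged sketch: xorshift32, a minimal shift/XOR PRNG used here only to
# illustrate the hardware-friendly bitwise recurrences that WELL-family
# generators are built from. Not cryptographically secure.

def xorshift32(seed, n):
    """Yield n pseudo-random 32-bit words from one nonzero 32-bit seed."""
    x = seed & 0xFFFFFFFF
    out = []
    for _ in range(n):
        x ^= (x << 13) & 0xFFFFFFFF    # masked shift keeps the word 32-bit
        x ^= x >> 17
        x ^= (x << 5) & 0xFFFFFFFF
        out.append(x)
    return out
```

On an FPGA each of these lines is pure combinational logic, which is why such recurrences reach the high clock frequencies and throughput the abstract targets.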
Details
ISBN (Digital): 9798350354973
ISBN (Print): 9798350354980
In recent years, the BP neural network has become a powerful tool for time series forecasting due to its excellent nonlinear fitting ability and wide applicability. In this paper, we propose a parallel processing scheme for a BP neural network optimized by combining the MapReduce framework and a Genetic Algorithm (GA), which aims to solve the problem of fast prediction on large-scale datasets. The scheme is a MapReduce-based dataset parallelization method for a GA-optimized BP neural network with two main stages: parallelization of the GA and parallelization of the BP network. The GA is parallelized with a multi-population method; by assigning different initial populations to different nodes, the diversity of populations and individuals can be increased and the probability of obtaining optimal individuals improved. In the map stage of BP network parallelization, the change of the weight of each connection in the neural network is calculated and stored locally; finally, the Reduce side calculates the average of all weight changes and uses this averaged change to update the weights, which can speed up the convergence of the BP network. The results show that in a cluster environment the convergence speed is significantly accelerated, giving an obvious speed advantage over traditional single-node training.
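The map/reduce weight-update flow described above can be sketched in miniature. This is not the paper's Hadoop job; a linear model stands in for the BP network (an assumption for brevity), but the structure is the same: each map task computes weight deltas on its data shard, and the reduce step averages them into one update.

```python
# Hedged sketch of the described MapReduce flow for weight updates.
# Map: per-shard gradient of squared error for y = w0 + w1 * x.
# Reduce: average all shard deltas, apply one update to the weights.

def map_deltas(shard, weights):
    g0 = g1 = 0.0
    for x, y in shard:
        err = (weights[0] + weights[1] * x) - y
        g0 += err
        g1 += err * x
    n = len(shard)
    return [-g0 / n, -g1 / n]          # delta = negative mean gradient

def reduce_update(weights, all_deltas, lr=0.1):
    m = len(all_deltas)
    return [w + lr * sum(d[i] for d in all_deltas) / m
            for i, w in enumerate(weights)]
```

Because each shard's delta is computed where the data lives and only small delta vectors reach the reducer, the cluster scales with dataset size, which is the speed advantage the abstract reports.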