Machine Learning (ML) has risen as a highly useful tool for analyzing the vast amount of data generated in every field of science nowadays. Simultaneously, data movement inside computer systems is gaining more attention due to its high impact on time and energy consumption. In this context, Near-Data Processing (NDP) architectures emerged as a prominent solution to increasing data volumes by drastically reducing the required amount of data movement. For NDP, we see three main approaches: Application-Specific Integrated Circuits (ASICs), full Central Processing Units (CPUs) and Graphics Processing Units (GPUs), or vector units integration. However, previous work considered only ASICs, CPUs, and GPUs when executing ML algorithms inside the memory. In this paper, we present an approach to execute ML algorithms near-data, using a general-purpose vector architecture and applying near-data parallelism to kernels from KNN, MLP, and CNN algorithms. To facilitate this process, we also present an NDP intrinsics library to ease the evaluation and debugging tasks. Our results show speedups of up to 10× for KNN, 11× for MLP, and 3× for convolution when processing near-data compared to a high-performance x86 baseline.
ISBN (print): 9781728175652
The particle filter is a sequential Monte Carlo estimation method. It is suitable for applications whose system or measurement model is highly non-linear and whose uncertainties are large. The standard particle filter suffers from the degeneracy problem, and one important solution is resampling. The most common resampling method is Systematic resampling. It requires collective operations over the weights, which causes numerical instability. Furthermore, the interaction between all weights makes Systematic resampling harder to parallelize on a graphics processing unit (GPU). To overcome these problems, Metropolis resampling was proposed. Since it uses only the ratio of two weights rather than collective operations, it does not suffer from numerical instability and is better suited to parallel implementation on a GPU. Although it is fast in theory, it suffers from non-coalesced global memory access patterns when implemented on a GPU. Metropolis-C1 and Metropolis-C2 were proposed to overcome this problem. In this study, we investigate the factors that affect the execution times of Metropolis, Metropolis-C1, and Metropolis-C2. There are three main factors: the physical resources and limitations of the GPU, non-coalesced global memory access patterns, and the number of particles. We discuss in detail how they affect the execution times of these resampling algorithms.
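The ratio-only idea behind Metropolis resampling can be illustrated with a minimal serial sketch. It is not the paper's GPU implementation and does not cover the Metropolis-C1 or Metropolis-C2 variants. The function name, the `num_steps` parameter, and the `seed` parameter are assumptions made for illustration. Each particle runs a short chain that compares only two weights at a time, so no sum or prefix sum over all weights is needed.

```python
import random


def metropolis_resample(weights, num_steps, seed=None):
    """Pick one ancestor index per particle using only pairwise weight ratios.

    `weights` may be unnormalized; `num_steps` is the chain length per
    particle (a hypothetical tuning parameter in this sketch).
    """
    rng = random.Random(seed)
    n = len(weights)
    ancestors = []
    for i in range(n):
        k = i  # each chain starts at the particle's own index
        for _ in range(num_steps):
            j = rng.randrange(n)  # uniformly chosen candidate
            u = rng.random()
            # Accept with probability min(1, w[j] / w[k]); written as a
            # product so that a zero weight never causes a division by zero.
            if u * weights[k] <= weights[j]:
                k = j
        ancestors.append(k)
    return ancestors


if __name__ == "__main__":
    print(metropolis_resample([0.1, 0.4, 0.2, 0.3], num_steps=32, seed=0))
```

In this sketch every outer-loop iteration is independent of the others, which is what makes the method attractive for one-thread-per-particle execution. The random reads of `weights[j]` are also the source of the non-coalesced access pattern the abstract mentions.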
ISBN (print): 9781450399067
In recent years, in addition to the growth in the number of investors in the stock market, there has been a growing interest in predicting stock prices. Accurate stock price predictions can effectively improve investment returns while reducing investment risks for stock investors. Therefore, this study presents a hybrid Bo-LSTM-SVR model to predict the next day's stock closing price. First, the hyper-parameters of LSTM and SVR, as well as the lengths of their respective sliding windows, are optimized by the Bayesian optimization method to obtain more accurate predicted values from the single models. The genetic algorithm is then applied every day to decide the weights of the two single models, and finally the combined predicted values are obtained. To verify the accuracy of the proposed model, this model and six other models are applied to predict the closing prices of the Shanghai Composite Index on the next trading day. The results reveal that the model proposed in this study is the most accurate, with the smallest MAE and RMSE as well as the largest R^2. Compared with the other models, the proposed model is more suitable for stock price prediction and provides a dependable tool for investors making stock investment decisions.
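The combination step and the three reported metrics can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the two prediction lists stand in for LSTM and SVR outputs, the function names are hypothetical, and the weight is passed in directly, whereas the abstract says it is chosen each day by a genetic algorithm (not shown here).

```python
import math


def combine(pred_a, pred_b, weight_a):
    """Weighted average of two models' predictions; weight_a lies in [0, 1]."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up numbers purely for demonstration, not data from the study.
    actual = [10.0, 11.0, 12.0]
    blended = combine([9.0, 11.5, 12.5], [11.0, 10.5, 11.5], weight_a=0.5)
    print(mae(actual, blended), rmse(actual, blended), r_squared(actual, blended))
```

Lower MAE and RMSE and a higher R^2 indicate a better fit, which is the sense in which the abstract ranks the proposed model first.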
ISBN (print): 9781450379830
The proceedings contain 40 papers. The topics discussed include: a scalable framework for solving fractional diffusion equations; fast distributed bandits for online recommendation systems; wavefront parallelization of recurrent neural networks on multi-core architectures; NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems; a coordinate-oblivious index for high-dimensional distance similarity searches on the GPU; V-combiner: speeding-up iterative graph processing on a shared-memory platform with vertex merging; efficient parallel algorithms for betweenness- and closeness-centrality in dynamic graphs; parallelizing pruned landmark labeling: dealing with dependencies in graph algorithms; Graptor: efficient pull and push style vectorized graph processing; and SB-fetch: synchronization aware hardware prefetching for chip multiprocessors.
Over the last couple of decades, image denoising has been one of the challenging areas in the image processing and computer vision domains; it adds clarity to images by removing noise and makes them suitable for furt...
ISBN (print): 9783030602451; 9783030602444
In recent years, there has been an increased interest in denoising techniques that are applicable in various medical imaging fields. The extraordinary development of the denoising area is no doubt due to ever-expanding and successful computing technology, but also to the emergence of multi-resolution analysis (MRA) on both the mathematical and algorithmic levels. However, many denoising techniques remain ineffective in dealing with certain types of noise. Other methods can be too expensive, given their nested and complicated structure. Therefore, in this paper, a new multi-scale parallel denoising paradigm is defined and tested. A comparative study is conducted between the two best-known MRA-based decomposition techniques: the Empirical Mode Decomposition (EMD) and the Discrete Wavelet Transform (DWT). The comparison is carried out in this framework of multi-scale parallel denoising, where a Non-Local Means (NLM) filter is implemented and adjusted scale-by-scale to a sample of X-ray benchmark images. Some state-of-the-art denoising methods were also used in the evaluation. The numerical results demonstrated the effectiveness of multi-scale parallel denoising in terms of accuracy and speed of convergence, especially when the NLM filtering is coupled with the EMD. This shows a bright future for their medical use in the next few years.
ISBN (print): 9783030602390; 9783030602383
Stochastic Gradient Descent (SGD) is widely used to train machine learning models over large datasets, yet its slow convergence rate can be a bottleneck. As a remarkable family of variance reduction techniques, memory algorithms such as SAG and SAGA have been proposed to accelerate the convergence rate of SGD. However, these algorithms need to store a correction for every training data point in memory. This unbounded space usage is impractical for modern large-scale applications, especially over data points that arrive over time (referred to as streaming data in this paper). To overcome this weakness, this paper investigates methods that bound the space usage in the state-of-the-art family of variance-reduced stochastic gradient descent over streaming data, and presents CHEAPS2AGA. At each step of updating the model, the key idea of CHEAPS2AGA is to always reserve N random data points as samples, while re-using information about past stochastic gradients across all the observed data points with limited space usage. In addition, training an accurate model over streaming data requires the algorithm to be time-efficient. To accelerate the model training phase, CHEAPS2AGA embraces a lock-free data structure to insert new data points and remove unused data points in parallel, and updates the model parameters without using any locking. We conduct comprehensive experiments to compare CHEAPS2AGA to prior related algorithms suited for streaming data. The experimental results demonstrate the practical competitiveness of CHEAPS2AGA in terms of scalability and accuracy.
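To make the memory cost of these algorithms concrete, here is a minimal sketch of the textbook SAGA update on a one-parameter least-squares problem. It is not CHEAPS2AGA: there is no bounded sample set, no streaming input, and no lock-free structure. The function name, the scalar model, and the `seed` parameter are assumptions chosen to keep the example short.

```python
import random


def saga_least_squares(xs, ys, step, num_iters, seed=0):
    """Fit y ~ w * x by minimizing the mean of 0.5 * (w * x_i - y_i)**2 with SAGA."""
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    # One stored gradient per training point: this table is the O(n) memory
    # that becomes unbounded when data points keep arriving.
    table = [(w * x - y) * x for x, y in zip(xs, ys)]
    table_mean = sum(table) / n
    for _ in range(num_iters):
        j = rng.randrange(n)
        grad_j = (w * xs[j] - ys[j]) * xs[j]
        # Variance-reduced step: fresh gradient, minus the stale stored one,
        # plus the mean of all stored gradients.
        w -= step * (grad_j - table[j] + table_mean)
        # Keep the running mean consistent with the updated table entry.
        table_mean += (grad_j - table[j]) / n
        table[j] = grad_j
    return w


if __name__ == "__main__":
    print(saga_least_squares([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], step=0.02, num_iters=2000))
```

The `table` list grows with the number of observed points, which is the space usage the abstract describes as impractical for streaming data and that a bounded-sample approach would need to cap.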
In the course of patrol and exploration on extraterrestrial celestial bodies, it is hard to monitor the state of the planetary rover in real time due to the large communication delay and tight link between the target ...
Direction of arrival (DOA) estimation is widely used in communication, biomedicine, and other fields. The stochastic maximum likelihood (SML) algorithm is an excellent DOA estimation algorithm. However, the ex...
ISBN (print): 9781728185897
The proceedings contain 60 papers. The topics discussed include: an application recommendation method based on IUF; design of AES/EBU audio transceiver system based on FPGA; research on test platform of train control vehicle-mounted subsystem for high-speed railway; classification of the mask Augsburg speech corpus (MASC) using the consistency learning method; research on energy-saving collaborative optimization method for multiple trains considering renewable energy utilization; design of online open channel flow monitoring system based on LSPIV; a refined prior-box generator for anchor-based object detector; research on parallel system for motion states monitoring of the planetary rover; and surface damage detection method for blade of wind turbine based on image segmentation.