We investigated the performance impact of IEEE-754 double-precision floating-point subnormal numbers, focusing on vector arithmetic and transcendental functions across Intel, AMD, and HiSilicon CPUs. We developed a be...
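As a rough illustration of the effect this abstract refers to (not the authors' benchmark suite), the following NumPy sketch times the same vector multiply on normal and on subnormal double-precision operands; whether a slowdown appears depends on the CPU and on whether flush-to-zero/denormals-are-zero modes are active.

```python
# Minimal sketch (not the paper's benchmark): time elementwise multiplication
# on normal vs. subnormal double-precision operands with NumPy.
import time
import numpy as np

N = 10_000_000
normal = np.full(N, 1e-300, dtype=np.float64)      # normal doubles
subnormal = np.full(N, 1e-310, dtype=np.float64)   # below ~2.2e-308, so subnormal
scale = 0.5                                        # keeps subnormal results subnormal

def bench(x, reps=10):
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        _ = x * scale                              # vectorized multiply
        best = min(best, time.perf_counter() - t0)
    return best

t_norm = bench(normal)
t_sub = bench(subnormal)
print(f"normal:    {t_norm*1e3:.1f} ms")
print(f"subnormal: {t_sub*1e3:.1f} ms  (slowdown x{t_sub/t_norm:.1f})")
```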
This paper explores whether reinforcement learning can enhance metaheuristics for quadratic unconstrained binary optimization (QUBO), which has recently attracted attention as a solver for a wide rang...
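For context, QUBO asks for a binary vector x minimizing x^T Q x. The sketch below is a plain single-bit-flip local search on a random instance, purely to pin down the objective; it is not the reinforcement-learning-enhanced metaheuristic studied in the paper.

```python
# Minimal QUBO sketch (illustration only, not the paper's method):
# minimize x^T Q x over binary vectors x via greedy single-bit-flip descent.
import numpy as np

rng = np.random.default_rng(0)
n = 20
Q = rng.normal(size=(n, n))
Q = (Q + Q.T) / 2                      # symmetric QUBO matrix

def energy(x, Q):
    return float(x @ Q @ x)

x = rng.integers(0, 2, size=n)
improved = True
while improved:
    improved = False
    for i in range(n):
        x_new = x.copy()
        x_new[i] ^= 1                  # flip one bit
        if energy(x_new, Q) < energy(x, Q):
            x, improved = x_new, True
print("local optimum energy:", energy(x, Q))
```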
ISBN (digital): 9789819708598
ISBN (print): 9789819708581; 9789819708598
Financial time series are among the most important data in economics and finance, and forecasting and simulating them effectively from historical patterns and trends is essential. Existing forecasting models mainly forecast one step ahead and cannot retain complex characteristics of financial time series such as serial correlation and long-term temporal dependence. At the same time, the large scale of the data makes training deep learning models time-consuming. Forecasting financial time series multiple steps ahead efficiently has therefore become key to improving asset management capability. Constructing fuzzy portfolio optimization models for different distributions is also an important direction for improving the robustness of a portfolio model. This paper proposes AssetGANs, a distributed GAN-based model that simulates financial time series multiple steps ahead, and applies GANs as a parameter simulation method within fuzzy portfolio optimization to provide users with better strategy choices. The paper carries out numerical experiments on real market stock data, compares the results with LSTM, and achieves a training speedup of over 573x with 8 GPUs compared to the CPU version.
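As a hedged illustration of the underlying idea (generating whole multi-step windows with a GAN rather than forecasting one step at a time), the PyTorch sketch below trains a toy generator/discriminator pair on placeholder return windows; AssetGANs' actual architecture, losses, distributed training, and fuzzy-portfolio integration are not reproduced here, and HORIZON and NOISE_DIM are assumed values.

```python
# Minimal sketch of a GAN that generates multi-step return windows
# (illustration only; not AssetGANs itself).
import torch
import torch.nn as nn

HORIZON, NOISE_DIM = 16, 32            # assumed window length and latent size

G = nn.Sequential(nn.Linear(NOISE_DIM, 64), nn.ReLU(), nn.Linear(64, HORIZON))
D = nn.Sequential(nn.Linear(HORIZON, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(512, HORIZON) * 0.01     # placeholder for real return windows

for step in range(200):
    batch = real[torch.randint(0, len(real), (64,))]
    fake = G(torch.randn(64, NOISE_DIM))
    # discriminator update: real windows -> 1, generated windows -> 0
    loss_d = bce(D(batch), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # generator update: make D label generated windows as real
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

paths = G(torch.randn(1000, NOISE_DIM))     # 1000 simulated multi-step scenarios
```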
ISBN (digital): 9781665471770
ISBN (print): 9781665471770
There is a growing interest in training deep neural networks (DNNs) in a GPU cloud environment. This is typically achieved by running parallel training workers on multiple GPUs across computing nodes. Under such a setup, the communication overhead is often responsible for long training time and poor scalability. This paper presents AIACC-Training, a unified communication framework designed for the distributed training of DNNs in a GPU cloud environment. AIACC-Training permits a training worker to participate in multiple gradient communication operations simultaneously to improve network bandwidth utilization and reduce communication latency. It employs auto-tuning techniques to dynamically determine the right communication parameters based on the input DNN workloads and the underlying network infrastructure. AIACC-Training has been deployed to production at Alibaba GPU Cloud with 3000+ GPUs executing AIACC-Training optimized code at any time. Experiments performed on representative DNN workloads show that AIACC-Training outperforms existing solutions, improving the training throughput and scalability by a large margin.
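The core idea of letting one worker keep several gradient communications in flight can be sketched as below; this is only an illustration with a placeholder allreduce and an assumed fusion threshold BUCKET_BYTES, not AIACC-Training's optimized collectives or its auto-tuning logic. Overlapping several buckets keeps the network busy while earlier reductions complete, which is the bandwidth-utilization argument made in the abstract.

```python
# Minimal sketch of overlapping several gradient-communication operations per
# worker (illustration only; AIACC-Training's real implementation differs).
from concurrent.futures import ThreadPoolExecutor
import numpy as np

BUCKET_BYTES = 4 * 1024 * 1024          # assumed fusion threshold (auto-tuned in the paper)

def bucketize(grads, limit=BUCKET_BYTES):
    """Group gradient tensors into buckets of roughly `limit` bytes each."""
    buckets, cur, size = [], [], 0
    for g in grads:
        cur.append(g); size += g.nbytes
        if size >= limit:
            buckets.append(cur); cur, size = [], 0
    if cur:
        buckets.append(cur)
    return buckets

def allreduce(bucket):
    """Placeholder for a real collective (e.g. ring all-reduce over NCCL/MPI)."""
    return [g.copy() for g in bucket]    # no-op stand-in for the averaged gradients

grads = [np.random.randn(n).astype(np.float32) for n in (1 << 20, 1 << 18, 1 << 22, 1 << 16)]
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(allreduce, b) for b in bucketize(grads)]
    reduced = [f.result() for f in futures]  # several communications in flight at once
```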
ISBN (print): 9783031396977; 9783031396984
Center-based clustering is a pivotal primitive for unsupervised learning and data analysis. A popular variant is the k-means problem, which, given a set P of points from a metric space and a parameter k < |P|, requires finding a subset S ⊆ P of k points, dubbed centers, which minimizes the sum of all squared distances of points in P from their closest center. A more general formulation, introduced to deal with noisy datasets, features a further parameter z and allows up to z points of P (outliers) to be disregarded when computing the aforementioned sum. We present a distributed coreset-based 3-round approximation algorithm for k-means with z outliers for general metric spaces, using MapReduce as a computational model. Our distributed algorithm requires sublinear local memory per reducer, and yields a solution whose approximation ratio is an additive term O(γ) away from the one achievable by the best known polynomial-time sequential (possibly bicriteria) approximation algorithm, where γ can be made arbitrarily small. An important feature of our algorithm is that it obliviously adapts to the intrinsic complexity of the dataset, captured by its doubling dimension D. To the best of our knowledge, no previous distributed approaches were able to attain similar quality-performance tradeoffs for general metrics.
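A hedged sketch of the general coreset pattern is given below: each partition summarizes its points with weighted representatives, a sequential solver runs on the small union, and the z farthest points are discarded as outliers. It does not reproduce the paper's 3-round MapReduce algorithm or its approximation guarantees, and the per-partition coreset size of 4*k is an arbitrary choice. Because only the weighted representatives leave each partition, the memory used by the final solve stays far below |P|, which is the property the abstract highlights.

```python
# Minimal sketch of the coreset idea for k-means with outliers
# (illustration only; not the paper's algorithm).
import numpy as np
from sklearn.cluster import KMeans

k, z, n_partitions = 5, 10, 4
rng = np.random.default_rng(0)
P = rng.normal(size=(4000, 2))
partitions = np.array_split(P, n_partitions)        # "map" phase: one split per reducer

# Each partition summarizes its points with weighted representatives.
reps, weights = [], []
for part in partitions:
    local = KMeans(n_clusters=4 * k, n_init=3, random_state=0).fit(part)
    reps.append(local.cluster_centers_)
    weights.append(np.bincount(local.labels_, minlength=4 * k))
coreset = np.vstack(reps)
w = np.concatenate(weights).astype(float)

# Solve weighted k-means on the small coreset, then drop the z points of P
# farthest from their closest center as outliers.
final = KMeans(n_clusters=k, n_init=10, random_state=0).fit(coreset, sample_weight=w)
d = np.min(np.linalg.norm(P[:, None] - final.cluster_centers_[None], axis=2), axis=1)
cost = np.sum(np.sort(d)[:-z] ** 2)
print(f"k-means cost ignoring {z} outliers: {cost:.1f}")
```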
ISBN (print): 9798350364613; 9798350364606
Despite the widespread adoption of energy-efficient microcontroller units (MCUs) in the Tiny Machine Learning (TinyML) domain, they face significant limitations in terms of performance and memory (RAM, Flash), especially when considering deep networks for complex classification tasks. In this work, we combine significance-aware computation skipping and software kernel design to accelerate the inference of approximate CNN models on MCUs. Our evaluation on an STM32-Nucleo board and 2 popular CNNs trained on the CIFAR-10 dataset shows that, compared to state-of-the-art exact inference, our Pareto optimal solutions can feature on average 21% latency reduction with no degradation in Top-1 classification accuracy, while for lower accuracy requirements, the corresponding reduction becomes even more pronounced.
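A minimal NumPy sketch of the skipping idea follows: filters judged insignificant (here by an assumed L1-norm proxy and an arbitrary 25% threshold) are simply not computed. The paper's actual significance criterion and its MCU kernel design are not reproduced.

```python
# Minimal sketch of significance-aware computation skipping in a conv layer
# (illustration only; the paper's MCU kernels and criterion differ).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))                  # input feature map H x W x C_in
W = rng.normal(size=(3, 3, 16, 32)) * 0.1        # 3x3 kernels, 32 output channels

# Assumed significance proxy: L1 norm of each output filter; filters below the
# threshold are skipped and their output channel is left at zero.
significance = np.abs(W).sum(axis=(0, 1, 2))
keep = significance >= np.quantile(significance, 0.25)   # skip the least significant 25%

H, Wd, _ = x.shape
out = np.zeros((H - 2, Wd - 2, W.shape[3]))
for c_out in np.flatnonzero(keep):               # work for skipped filters is avoided
    for i in range(H - 2):
        for j in range(Wd - 2):
            out[i, j, c_out] = np.sum(x[i:i+3, j:j+3, :] * W[:, :, :, c_out])
print(f"computed {keep.sum()} of {W.shape[3]} output channels")
```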
To improve the accuracy of long-term landslide displacement prediction, this paper applies the Frequency Enhanced Decomposed Transformer (FEDformer) model to landslide displacement prediction, and pro...
This paper presents Timing-Go, a distributed timed task scheduling system based on a loosely coupled architecture, which addresses the problems of poor manageability, poor parallel processing capability, poor task sche...
ISBN (print): 9781665469586
Over the last few years, the number of IoT devices in daily use has increased, and they come in many sizes and types. In addition, these devices have become cheaper, so many more people are able to use them. These devices are capable of both creating and processing information, thus reducing network overload. However, in Cloud or Edge computing environments, it is useful to know where these devices are located, in order to better distribute the information among the servers and further reduce the network load, allowing users to get the data faster. There are simulators capable of analyzing Cloud infrastructures, but most of them do not offer the possibility of modeling sensor mobility. For these reasons, in this paper we detail an API extension developed on the SimGrid toolkit that adds mobility to IoT sensors and integrates with an API called Folium to visualize the mobility of these elements.
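Folium itself is a standard Python mapping library, so the visualization side can be sketched directly; the waypoints below are hypothetical stand-ins for a trace that the SimGrid extension described in the paper would produce.

```python
# Minimal sketch of visualizing one sensor's mobility trace with Folium
# (illustration only; the waypoints are made up, not SimGrid output).
import folium

trace = [                              # hypothetical time-ordered GPS waypoints of one sensor
    (39.4699, -0.3763),
    (39.4712, -0.3741),
    (39.4730, -0.3722),
]

m = folium.Map(location=trace[0], zoom_start=15)
folium.PolyLine(trace, weight=3).add_to(m)          # path followed by the mobile sensor
folium.Marker(trace[0], tooltip="start").add_to(m)
folium.Marker(trace[-1], tooltip="end").add_to(m)
m.save("sensor_mobility.html")                      # open in a browser to inspect the route
```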
Evolutionary algorithms emerged due to their flexibility and effectiveness in solving a wide variety of problems. Optimization-based techniques are used to find solutions that involve multiple conflictin...