This research investigates distributed machine learning methods for revenue forecasting in online advertising systems. As the domain of digital advertising continues to expand rapidly, t...
In the past, data processing was completed on a single computer or server because the resources available at the time were limited. However, with the progress of technology, it has become possible to use multiple compute...
ISBN:
(Print) 9798400707254
Large language models have become an important research direction in the field of deep learning and have received extensive attention from academia and industry. These models excel in natural language processing tasks, significantly improving the performance of downstream tasks. However, due to the large scale of the models, high-performance computing resources are required for deployment, and latency during inference also limits their practical use in some industrial applications. How to optimize these models to improve their practical effectiveness therefore remains an urgent problem. In this work, by optimizing the model architecture, introducing sparsity techniques, using quantization methods, and adopting distributed training strategies, we achieve a substantial reduction in the computational overhead and memory requirements of large-scale language models while simultaneously improving inference speed and training efficiency. First, the model architecture is optimized to reduce redundant computation and enhance parameter efficiency, particularly in the design of the self-attention mechanism and the feedforward network layer. Second, the incorporation of sparsity techniques reduces the number of parameters and the amount of computation without significantly impacting the model's performance; this is achieved through sparse matrix multiplication and pruning, which eliminate unnecessary computation. Furthermore, the quantization method markedly reduces the memory footprint and bandwidth requirements by converting the model weights and activations from high-precision floating-point numbers to low-precision representations, thereby improving computational efficiency. In the training process, a distributed training strategy is employed, utilizing a combination of data parallelism and model parallelism.
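To illustrate the quantization step this abstract describes, here is a minimal sketch of symmetric per-tensor post-training quantization from float32 to int8 (a generic technique, not the paper's specific implementation; the weight matrix and scale scheme are hypothetical):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> int8 plus one float scale."""
    scale = np.abs(weights).max() / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from the int8 representation."""
    return q.astype(np.float32) * scale

# Toy weight matrix standing in for one layer's parameters.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(w - dequantize(q, s)).max()  # worst-case rounding error
```

The int8 tensor uses a quarter of the memory of float32, and the maximum reconstruction error is bounded by half the scale, which is the trade-off the abstract refers to when it mentions reduced memory footprint and bandwidth.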
ISBN:
(Print) 9781450397964
Over the past two decades, deep learning techniques have emerged as an immensely powerful technology, with break-through in computer vision, speech to text technologies, natural language processing and many more such fields. As neural networks grow in size and capacity, they require higher compute power and larger datasets to converge to a model with higher accuracy. In this poster, we present a data-centric approach to studying the system level requirements (GPU utilization, CPU utilization, I/O) of deep learning training workloads, and uncover a few insights that help us understand the nature of deep learning training workloads. We analyse three datasets found in industry, academia and national laboratories to understand the requirements and properties of deep learning training workloads.
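Characterizing a workload from a utilization trace, as this poster does at the system level, reduces to summary statistics over sampled values. A minimal sketch (the trace values and the chosen statistics are hypothetical, not taken from the poster's datasets):

```python
import statistics

def summarize_utilization(samples):
    """Summarize a utilization trace (percent-busy values sampled over time)."""
    ordered = sorted(samples)
    p95 = ordered[min(len(ordered) - 1, int(0.95 * len(ordered)))]
    return {"mean": statistics.mean(samples), "p95": p95, "max": max(samples)}

# Hypothetical GPU busy-% samples collected during one training run.
gpu_trace = [35, 90, 88, 12, 95, 91, 40, 93]
summary = summarize_utilization(gpu_trace)
```

A large gap between the mean and the p95/max is one simple signal of a bursty, I/O-stalled training job rather than a compute-bound one.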
ISBN:
(Digital) 9789819708116
ISBN:
(Print) 9789819708109;9789819708116
The swift progress of the Internet of Vehicles (IoV) and autonomous driving technology has facilitated the emergence of the Internet of Autonomous Vehicles (IoAV). If delay-sensitive vehicle tasks are not completed on time, the consequences for the IoAV can be severe. Task offloading can address cases where a vehicle cannot meet its task requirements on its own; however, highly dynamic vehicular networks and diverse vehicle applications require more intelligent offloading strategies. Therefore, this paper addresses the distributed task offloading problem in the IoAV to meet diverse vehicle task demands. First, we model vehicle task offloading as a decision problem and apply a deep reinforcement learning (DRL) algorithm named DDP-DQN (double-dueling-prioritize-DQN) to complete vehicle tasks more efficiently. Then, we design a reward function that completes each task within its maximum acceptable delay while reducing resource consumption. Simulations demonstrate that DDP-DQN outperforms three other reinforcement learning algorithms.
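The abstract's reward design — meet the deadline first, then economize on resources — can be sketched as a simple shaped reward. The exact functional form, penalty value, and `energy_weight` parameter here are hypothetical illustrations, not the paper's actual reward:

```python
def reward(task_delay: float, max_delay: float, energy_used: float,
           energy_weight: float = 0.1) -> float:
    """Hypothetical offloading reward: a fixed penalty for a missed deadline,
    otherwise remaining slack minus a weighted resource-consumption term."""
    if task_delay > max_delay:
        return -1.0  # deadline miss dominates everything else
    slack = (max_delay - task_delay) / max_delay  # in [0, 1]
    return slack - energy_weight * energy_used
```

The hard penalty makes deadline satisfaction the primary objective for the DQN agent, while the slack/energy trade-off shapes behavior among the feasible offloading choices.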
Over the past two years, the COVID-19 pandemic has been one of the most frequently and hotly debated social topics. Lockdowns and restrictions radically changed the way of working and socializing due to social distanci...
ISBN:
(Print) 9783031646287;9783031646294
Today, the amount of data generated each year is growing exponentially, directly affecting the time required for its analysis. The problem worsens with high-dimensional datasets, such as those used in electroencephalography, so good feature selection methods and techniques that improve algorithmic efficiency are increasingly relevant: they reduce computing time and energy consumption, which can instead be spent exploring more solutions to the problem. However, applications must also be adapted to take advantage of the hardware offered by high-performance computing systems. Therefore, in this work, a parallel and distributed binary particle swarm optimization algorithm has been implemented, used as a feature selection method, and applied to two real electroencephalography datasets: the University of Essex dataset and the well-known BCI Competition IV 2a dataset. The proposed method has been analyzed on a multi-node computing cluster, not only in terms of classification accuracy but also from an energy-time point of view, to study its impact under different experimental conditions and datasets.
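For readers unfamiliar with the core algorithm, here is a minimal single-process sketch of binary PSO with a sigmoid transfer function, driven by a toy fitness instead of an EEG classifier (the swarm parameters, toy target mask, and fitness are hypothetical; the paper's parallel, distributed, cluster-level version is far more elaborate):

```python
import numpy as np

def binary_pso(fitness, n_features, n_particles=20, iters=50, seed=0):
    """Minimal binary PSO: maximize `fitness` over 0/1 feature masks.
    Velocities are real-valued; bits are resampled via a sigmoid transfer."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=(n_particles, n_features))
    v = rng.normal(0.0, 1.0, size=(n_particles, n_features))
    pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[pbest_f.argmax()].copy()
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = (rng.random(x.shape) < 1.0 / (1.0 + np.exp(-v))).astype(int)
        f = np.array([fitness(p) for p in x])
        improved = f > pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[pbest_f.argmax()].copy()
    return gbest, pbest_f.max()

# Toy fitness: reward masks close to a hidden "informative features" pattern.
target = np.array([1, 0, 1, 1, 0, 0, 1, 0])
mask, score = binary_pso(lambda m: -np.abs(m - target).sum(), len(target))
```

In the feature-selection setting, the fitness would instead train and score a classifier on the features selected by each mask, which is exactly the expensive inner loop that motivates the paper's parallel and distributed evaluation.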
Android malware detection has become a research hotspot in mobile security. When security service providers obtain feature information from target samples, they may involve user privacy information such as identity and ...
ISBN:
(Print) 9798350386783;9798350386776
The popularity of multicore processors and the rise of High Performance Computing as a Service (HPCaaS) have made parallel programming essential to fully utilize the performance of multicore systems. OpenMP, a widely adopted shared-memory parallel programming model, is favored for its ease of use, yet assisting and accelerating the automation of its parallelization remains challenging. Although existing automation tools such as Cetus and DiscoPoP simplify parallelization, they still have limitations when dealing with complex data dependencies and control flows. Inspired by the success of deep learning in natural language processing (NLP), this study adopts a Transformer-based model to tackle the automatic parallelization of OpenMP directives. We propose a novel Transformer-based multimodal model, ParaMP, to improve the accuracy of OpenMP directive classification. ParaMP not only takes into account the sequential features of the code text but also incorporates structural features, enriching the model's input by representing the Abstract Syntax Trees (ASTs) corresponding to the code as binary trees. In addition, we built the BTCode dataset, which contains a large number of C/C++ code snippets and their corresponding simplified AST representations, to provide a basis for model training. Experimental evaluation shows that our model outperforms other existing automated tools and models on key metrics such as F1 score and recall. This study demonstrates a significant improvement in the accuracy of OpenMP directive classification by combining the sequential and structural features of code, offering valuable insight into applying deep learning techniques to programming tasks.
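The abstract's "AST as a binary tree" encoding can be illustrated with the standard left-child/right-sibling transformation, which turns any ordered tree into a binary one. This sketch uses Python's `ast` module as a stand-in for the paper's C/C++ ASTs, and the nested-tuple encoding is a hypothetical simplification, not ParaMP's actual representation:

```python
import ast

def to_binary(node: ast.AST):
    """Left-child/right-sibling encoding: each node becomes a tuple
    (label, first-child subtree, next-sibling subtree), i.e. a binary tree."""
    def encode(siblings):
        if not siblings:
            return None
        head, *rest = siblings
        children = list(ast.iter_child_nodes(head))
        return (type(head).__name__, encode(children), encode(rest))
    return encode([node])

# A loop of the kind one might consider parallelizing with OpenMP.
tree = ast.parse("for i in range(n): total += a[i]")
btree = to_binary(tree)
```

The point of such an encoding is that a fixed-arity (binary) structure is easy to serialize and feed to a sequence or tree model alongside the raw token stream of the source code.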
ISBN:
(Digital) 9781665482509
ISBN:
(Print) 9781665482509
An apodized fiber Bragg grating (FBG) is designed for quasi-distributed sensing of temperature and strain due to its various advantages, particularly in hazardous environments. The main purpose of apodization is to attain maximum reflectivity, narrow bandwidth, and low side-lobe levels, which are crucial for quasi-distributed sensing applications. The relationship between FBG properties and grating length has been explored to enhance and optimize the FBG. A K-Nearest Neighbors (KNN) algorithm is introduced for predictive analysis of FBG properties with different K values to assess the reliability of the apodized FBG, particularly for sensing applications. The optimal value of K has been identified using statistical measures such as Mean Squared Error and Mean Absolute Error. Strong linearity has been obtained for both the temperature and strain sensitivity of the designed apodized FBG. The optimized apodized FBG is employed in a wavelength division multiplexing (WDM) based quasi-distributed sensing system of four FBGs, signifying high reliability. High temperature and strain sensitivity ranges have been achieved in quasi-distributed sensing, and the obtained ranges can be applied in FBG-based sensing for monitoring civil structures in hazardous environments.
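Selecting K for a KNN regressor by minimizing Mean Squared Error, as described above, can be sketched in a few lines. The data here are entirely hypothetical (a synthetic reflectivity-vs-grating-length curve standing in for the paper's simulated FBG properties):

```python
import numpy as np

def knn_predict(x_train, y_train, x_query, k):
    """Plain KNN regression: average the targets of the k nearest points."""
    d = np.abs(x_train[None, :] - x_query[:, None])  # pairwise distances
    idx = np.argsort(d, axis=1)[:, :k]               # k nearest neighbors
    return y_train[idx].mean(axis=1)

# Hypothetical data: reflectivity vs. grating length (arbitrary units).
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 40)
y = 1.0 - np.exp(-0.4 * x) + rng.normal(0.0, 0.01, 40)
test = np.arange(40) % 5 == 0   # hold out every fifth point
train = ~test
errors = {k: float(np.mean((knn_predict(x[train], y[train], x[test], k)
                            - y[test]) ** 2))
          for k in (1, 3, 5, 7)}
best_k = min(errors, key=errors.get)  # K with the lowest MSE
```

Mean Absolute Error can be swapped in by replacing the squared difference with `np.abs(...)`; comparing both metrics across candidate K values is the selection procedure the abstract describes.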