ISBN: 9781665435741 (print)
From the very beginning of CUDA technology, applying state-of-the-art optimization techniques was essential; only then was it possible to fully utilize the enormous computational power of graphics processing units. However, with the development of the CUDA architecture, the impact of typical optimization techniques on software performance has changed significantly. This article shows how the impact of several optimization techniques on the performance of an image filtering algorithm has changed across successive generations of the CUDA architecture. Then, based on the results obtained, it attempts to answer whether tedious and time-consuming optimization of CUDA software is still necessary.
ISBN: 9781665464970 (print)
In many applications involving composite Web services, one or more component services may become unavailable or no longer satisfy the quality requirements. This leads to the problem of identifying other components that can substitute for the faulty component while maintaining the overall functionality of the composite service. Given a candidate service that offers the desired functionality, it is often necessary to select the most preferred substitution based on the correctness of the global behavior of the composite service. Such a candidate is selected from a repository of public representations of possible components, in which internal and private behavior is hidden. In this paper, we present and implement an approach to determine whether a candidate can substitute for a part of a composite service while preserving deadlock freeness. We also propose a behavioral abstraction of Web services, the symbolic observation graph, that preserves their privacy while allowing modular verification of their correctness. The approach is illustrated on a realistic use case involving three services (a provider, a distributor, and a client).
ISBN: 9781665435741 (print)
With the rapid development of edge computing technology, edge computing is being applied more and more widely in smart grids. However, it has not yet been applied to the operation control of distributed power generation microgrid systems. This article proposes a microgrid-oriented edge computing architecture. First, we introduce the main functions of edge-cloud collaboration. Then we explain how the architecture is constructed, including the realization of data processing, network communication, and security mechanisms. Finally, we describe the application of the architecture in practice in a rural community in Central China.
ISBN: 9781665435741 (print)
Real-time ship detection from remote sensing imagery is a great challenge due to complex scenes, the changeable characteristics of ship targets, and uncontrollable interference factors. In this letter, an infrared ship detection algorithm based on multi-feature fusion is proposed. Based on a full analysis of the target features, the proposed algorithm combines a cascade rejection mechanism with multiple features through a cascade of linear classifiers ordered from simple to complex, and uses fine features to accurately distinguish the target from the complex background. A large number of experimental results show that the proposed method achieves better results in both detection performance and real-time processing.
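The cascade rejection idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the `Stage` type, the two features (mean brightness, then contrast), and the thresholds are hypothetical choices made only to show how cheap linear stages can reject candidates before costlier ones run.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Stage:
    """One linear classifier in the cascade (hypothetical structure)."""
    extract: Callable[[Sequence[float]], Sequence[float]]  # feature extractor
    weights: Sequence[float]
    bias: float


def cascade_accepts(candidate: Sequence[float], stages: Sequence[Stage]) -> bool:
    """Return True only if every stage, from simple to complex, accepts.

    A candidate is rejected at the first stage whose linear score is
    negative, so later (more expensive) stages are never evaluated for it.
    """
    for stage in stages:
        features = stage.extract(candidate)
        score = sum(w * x for w, x in zip(stage.weights, features)) + stage.bias
        if score < 0:
            return False  # early rejection
    return True


# Assumed example stages: a cheap brightness test, then a contrast test.
STAGES = [
    Stage(lambda p: [sum(p) / len(p)], [1.0], -0.5),  # mean intensity >= 0.5
    Stage(lambda p: [max(p) - min(p)], [1.0], -0.3),  # contrast >= 0.3
]
```

With these assumed stages, a dark patch is discarded by the first test alone, while a bright but uniform patch passes the first test and is rejected by the second.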
ISBN: 9781665451550 (digital)
ISBN: 9781665451550 (print)
High-level programming models can help application developers access and use resources without having to manage low-level architectural entities, since a parallel programming model defines a set of programming abstractions that simplify the way a programmer structures and expresses an algorithm. Early proposals for Exascale programming tools are based on adapting traditional parallel programming languages and hybrid solutions. This incremental approach is too conservative and often results in very complex code. This paper describes the design features, the programming constructs, and the runtime mechanisms of the Data Centric programming model for Exascale systems (DCEx). DCEx is based on structuring applications into data-parallel blocks. Blocks are units of shared- and distributed-memory parallel computation, communication, and migration in the memory/storage hierarchy. Blocks and their message queues are mapped onto processes and placed in memory/storage by the DCEx runtime. These data-parallel blocks are orchestrated using distributed parallel patterns that lower the development cost. DCEx aims at the convergence of traditional HPC programming models, mainly based on MPI, with emerging technologies based on data-intensive paradigms. To demonstrate the potential of DCEx, we carried out an experimental evaluation by developing a real-world diffusion-weighted magnetic resonance imaging data processing application in a neuroimaging research context.
ISBN: 9798350364606 (digital)
ISBN: 9798350364613 (print)
iWAPT (International Workshop on Automatic Performance Tuning) is a series of workshops that focus on research and techniques related to performance sustainability issues. The series provides an opportunity for researchers and users of automatic performance tuning (AT) technologies to exchange ideas and experiences acquired when applying such technologies to improve the performance of algorithms, libraries, and applications, in particular on cutting-edge computing platforms. Topics of interest include performance modeling; adaptive algorithms; autotuned numerical algorithms; libraries and scientific applications; empirical compilation; automated code generation; frameworks and theories of AT and software optimization; autonomic computing; and context-aware computing.
ISBN: 9781665451550 (digital)
ISBN: 9781665451550 (print)
The profound impact of recent developments in artificial intelligence is unquestionable. Applications of deep learning models are everywhere, from advanced natural language processing to highly accurate prediction of extreme weather. These models have been continuously increasing in complexity, becoming much more powerful than their original versions. In addition, data to train the models is becoming more available as technological infrastructures sense and collect more readings. Consequently, distributed deep learning training is often necessary to handle intricate models and massive datasets. Running a distributed training strategy on a supercomputer exposes the models to all the considerations of a large-scale machine; reliability is one of them. As supercomputers integrate a colossal number of components, each fabricated at an ever-decreasing feature size, faults are common during the execution of programs. A particular type of fault, silent data corruption (SDC), is troublesome because the system does not crash and does not immediately give an evident sign of an error. We set out to explore the effects of this type of fault by inspecting how distributed deep learning training strategies cope with bit-flips that affect their internal data structures. We used checkpoint alteration, a technique that permits the study of this phenomenon on different distributed training platforms and with different deep learning frameworks. We evaluated two distributed learning libraries (Distributed Data Parallel and Horovod) and found that Horovod is slightly more resilient to SDCs. However, fault propagation is similar in both cases, and the model is more sensitive to SDCs than the optimizer.
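To make the notion of a bit-flip in checkpointed data concrete, the following is a minimal sketch, assuming weights are stored as IEEE 754 single-precision values in a flat list. The function names `flip_bit` and `corrupt_checkpoint` are hypothetical, and this is not the authors' tooling; real checkpoints from deep learning frameworks have their own file formats.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) of the float32 encoding of value."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range [0, 32)")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (encoded,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR with a one-bit mask, then reinterpret the result as float32.
    (corrupted,) = struct.unpack("<f", struct.pack("<I", encoded ^ (1 << bit)))
    return corrupted


def corrupt_checkpoint(weights: list[float], index: int, bit: int) -> list[float]:
    """Return a copy of a flat weight list with one bit flipped in one entry."""
    altered = list(weights)
    altered[index] = flip_bit(altered[index], bit)
    return altered
```

The effect depends strongly on which bit is hit: flipping the sign bit (bit 31) of 1.0 yields -1.0, whereas flipping the top exponent bit (bit 30) of 1.0 yields infinity, which illustrates why a single silent flip can range from harmless to catastrophic.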
ISBN: 9781665435772 (print)
We investigate cryptanalytic applications composed of many independent tasks that exhibit a stochastic runtime distribution. We compare four algorithms for executing such applications on GPUs. We demonstrate that the best strategy varies with the distribution, the problem size, and the platform. We support our analytic results with extensive experiments on two GPUs from opposite ends of the performance spectrum: a high-performance GPU (Nvidia Volta) and an energy-saving system on chip (Jetson Nano).
ISBN: 9781665435741 (print)
Approximate nearest neighbor search (ANNS) is one of the most basic and important algorithms in databases, machine learning, and other applications. With the expansion of cloud computing, academia has focused on how to optimize distributed frameworks for approximate nearest neighbor search, such as those based on MapReduce and Memcached. We implement a new distributed ANNS framework (NetANNS). The main contributions of NetANNS are accelerating data preprocessing with a programmable switch and integrating a variety of efficient ANNS algorithms so that it can choose the most suitable algorithm for each dataset. Experiments show that the search efficiency of NetANNS is about 2x that of common distributed ANNS frameworks implemented on top of MapReduce.
ISBN: 9781665435741 (print)
The correctness and robustness of a neural network model are usually proportional to its depth and width. Neural network models are currently becoming deeper and wider to cope with complex applications, which leads to high memory capacity and computing capacity requirements for the training process. Multi-accelerator parallelism, which deploys multiple accelerators in parallel to train neural networks, is a promising answer to these two challenges. Among the parallel schemes, the pipeline parallel scheme has a great advantage in training speed, but its memory capacity requirements are higher than those of other parallel schemes. To address this challenge of the pipeline parallel scheme, we propose a data transfer mechanism that effectively reduces the peak memory usage of the training process by transferring data in real time. In the experiment, we implement our design and apply it to PipeDream, a mature pipeline parallel scheme. The memory requirement of the training process is reduced by up to 48.5%, and the speed loss is kept within a reasonable range.