This paper aims at high and portable performance for tensor computations across spatial (e.g., FPGAs) and vector architectures (e.g., GPUs). State-of-the-art approaches usually address performance portability across vector a...
It is common practice to use large computational resources to train neural networks, known from many examples, such as reinforcement learning applications. However, while massively parallel computing is often used for...
Federated learning (FL), as a safe distributed training mode, provides strong support for the edge intelligence of the Internet of Vehicles (IoV) to realize efficient collaborative control and safe data sharing. Howev...
As a new stage in the development of the cloud computing paradigm, serverless computing offers a high-level abstraction that hides the underlying details. This makes it extremely challenging for users to choose a suitable serverless platform. To address this, targeting the jointcloud computing scenario of heterogeneous serverless platforms across multiple clouds, this paper presents FCloudless, a jointcloud collaborative mechanism with cross-cloud detection of the full-lifecycle performance of serverless platforms. Building on a set of benchmark metrics that probe the performance-critical stages of the full lifecycle, the paper proposes a performance optimization algorithm driven by the detected performance data. The algorithm takes into account all key stages that affect performance during the lifecycle of a function and predicts the overall performance by combining the scores of local stages with dynamic weights. We evaluate FCloudless on AWS, AliYun, and Azure. The experimental results show that FCloudless can detect the underlying performance of serverless platforms hidden in the black box, and that its optimization algorithm can select the optimal scheduling strategy for various applications in a jointcloud environment. Under cost constraints, FCloudless reduces the runtime by 23.3% for cold invocations and 24.7% for warm invocations.
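The idea of predicting overall performance from local stage scores and dynamic weights can be illustrated with a minimal sketch. This is not the FCloudless algorithm itself: the stage names, the score scale (0 to 1, higher is better), the weight values, and the platform names below are all hypothetical, and a plain weighted average stands in for whatever combination rule the paper uses.

```python
from typing import Dict

def overall_score(stage_scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-stage scores (assumed rule, not the paper's)."""
    total_weight = sum(weights[stage] for stage in stage_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(stage_scores[stage] * weights[stage] for stage in stage_scores)
    return weighted / total_weight

def pick_platform(candidates: Dict[str, Dict[str, float]], weights: Dict[str, float]) -> str:
    """Return the candidate platform with the highest predicted overall score."""
    return max(candidates, key=lambda name: overall_score(candidates[name], weights))

# Hypothetical per-stage scores for two made-up platforms.
candidates = {
    "platform_a": {"cold_start": 0.9, "execution": 0.5, "data_transfer": 0.5},
    "platform_b": {"cold_start": 0.4, "execution": 0.9, "data_transfer": 0.8},
}

# "Dynamic" weights here simply mean the weights change with the invocation type.
cold_weights = {"cold_start": 0.5, "execution": 0.3, "data_transfer": 0.2}
warm_weights = {"cold_start": 0.0, "execution": 0.6, "data_transfer": 0.4}

cold_choice = pick_platform(candidates, cold_weights)
warm_choice = pick_platform(candidates, warm_weights)
```

With these made-up numbers, a cold invocation favors the platform with the better cold-start score, while a warm invocation, where the cold-start weight drops to zero, favors the other one.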
ISBN: 9781939133175 (print)
Data flow analysis (e.g., dynamic taint analysis) has proven to be useful for guiding fuzzers to explore hard-to-reach code and find vulnerabilities. However, traditional taint analysis is labor-intensive, inaccurate and slow, which hurts fuzzing efficiency. Apart from taint, few data flow features are utilized. In this paper, we propose a data flow sensitive fuzzing solution, GREYONE. We first utilize the classic feature, taint, to guide fuzzing. A lightweight and sound fuzzing-driven taint inference (FTI) is adopted to infer the taint of variables by monitoring their value changes while mutating input bytes during fuzzing. With the taint, we propose a novel input prioritization model to determine which branch to explore, which bytes to mutate and how to mutate them. Further, we use another data flow feature, constraint conformance, i.e., the distance of tainted variables to the values expected in untouched branches, to tune the evolution direction of fuzzing. We implemented a prototype of GREYONE and evaluated it on the LAVA data set and 19 real-world programs. The results showed that it outperforms various state-of-the-art fuzzers in terms of both code coverage and vulnerability discovery. In the LAVA data set, GREYONE found all listed bugs and 336 additional unlisted bugs. In real-world programs, GREYONE on average found 2.12X as many unique program paths and 3.09X as many unique bugs as state-of-the-art evolutionary fuzzers, including AFL, VUzzer, CollAFL, Angora and Honggfuzz. Moreover, GREYONE on average found 1.2X as many unique program paths and 1.52X as many unique bugs as QSYM, a state-of-the-art symbolic execution assisted fuzzer. In total, it found 105 new security bugs, of which 41 are confirmed by CVE.
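The core idea of fuzzing-driven taint inference, as described in the abstract above, is that a variable depends on an input byte if mutating that byte changes the variable's value. The following is a minimal sketch of that idea only, not GREYONE's implementation: `toy_target` is a hypothetical stand-in for an instrumented program that reports the values of its monitored variables, and a single bit-flip per byte is a simplification that may miss some dependencies.

```python
from typing import Callable, Dict, Set

def infer_taint(run: Callable[[bytes], Dict[str, int]], seed: bytes) -> Dict[str, Set[int]]:
    """Map each monitored variable to the input offsets whose mutation changed its value."""
    baseline = run(seed)
    taint: Dict[str, Set[int]] = {name: set() for name in baseline}
    for offset in range(len(seed)):
        mutated = bytearray(seed)
        mutated[offset] ^= 0xFF  # flip every bit so the byte is guaranteed to change
        observed = run(bytes(mutated))
        for name, value in baseline.items():
            if observed.get(name) != value:
                # The value changed, so this variable depends on this input byte.
                taint[name].add(offset)
    return taint

def toy_target(data: bytes) -> Dict[str, int]:
    """Hypothetical instrumented program: returns the values of two branch variables."""
    return {"magic": (data[0] << 8) | data[1], "length": data[3]}

taint_map = infer_taint(toy_target, b"\x01\x02\x03\x04")
```

In this toy run, `magic` is inferred to depend on offsets 0 and 1, `length` on offset 3, and offset 2 influences no monitored variable, so a fuzzer could deprioritize mutating it when targeting branches that use these variables.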
ISBN: 9781538678800 (print)
In general, one of the complexities of large simulations is related to the usage of the heterogeneous computational resources that are needed to execute them. The definition of workflows, usually linked to concrete orchestration solutions, has reduced most of that complexity. These solutions are either oriented to High Performance Computing (HPC) or deal only with remotely managed services. This paper presents a novel solution for running simulations in a hybrid HPC and Cloud infrastructure, exploiting the performance and power of HPC systems and benefiting from the fast and flexible provisioning of Cloud resources. We provide our vision of the typical simulation workflows and the kind of computational resources that fit best in each phase. In line with this vision, we describe the research done to enable the definition of workflows by extending the TOSCA standard (originally focused on Cloud solutions), which is used by our orchestrator and other solutions. We propose several extensions (types, relationships, compute properties and job properties), compatible with the standard definition, so that Cloud and HPC tasks can be processed as expected. The paper also shows a use case implemented with the proposed approach, highlighting some of the benefits found so far.
This article's main contributions are twofold: 1) to demonstrate how to apply the general European Union's High-Level Expert Group's (EU HLEG) guidelines for trustworthy AI in practice for the domain of he...
Even the most sophisticated artificial neural networks are built by aggregating substantially identical units called neurons. A neuron receives multiple signals, internally combines them, and applies a non-linear func...
A one-sided programming model separates communication from synchronization, and is the driving principle behind partitioned global address space (PGAS) libraries such as Global Arrays (GA) and SHMEM. PGAS models expose a rich set of functionality that a developer needs in order to implement mathematical algorithms that require frequent multidimensional array accesses. However, using existing PGAS libraries in application codes often requires significant development effort in order to fully exploit these programming models. On the other hand, a vast majority of scientific codes use MPI either directly or indirectly via third-party scientific computation libraries, and need features to support application-specific communication requirements (e.g., asynchronous update of distributed sparse matrices, commonly arising in machine learning workloads). For such codes it is often impractical to completely shift programming models in favor of special one-sided communication middleware. Instead, an elegant and productive solution is to exploit the one-sided functionality already offered by MPI-3 RMA (Remote Memory Access). We designed a general one-sided interface, which we call RMAInterface, using the MPI-3 passive RMA model for remote matrix operations in the linear algebra library Elemental. Elemental is an open source library for distributed-memory dense and sparse linear algebra and optimization. We employ RMAInterface to construct a Global Arrays-like API and demonstrate its performance scalability and its competitiveness with the existing GA (with ARMCI-MPI) for a quantum chemistry application.