We propose a novel, adaptive feature space distillation method (AFSD) to reduce the communication overhead among distributed computers. The proposed method improves the Codistillation process by supporting longer update intervals. AFSD performs knowledge distillation across the models infrequently, which gives the models the flexibility to explore diverse variations during training. We perform knowledge distillation by sharing the feature space instead of only the outputs, so we also propose a new loss function for the Codistillation technique in AFSD. Using the feature space leads to more efficient knowledge transfer between models at longer update intervals. With our method, the models achieve the same accuracy as Allreduce and Codistillation in fewer epochs.
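To illustrate what sharing the feature space, rather than only outputs, can mean in a loss function, here is a minimal sketch. It adds an output-distillation term and a feature-space term to the usual task loss. The abstract does not give the AFSD loss, so the KL/MSE form, the weights `alpha` and `beta`, and the function names are all hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def feature_mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def codistillation_loss(logits, features, label, peer_logits, peer_features,
                        alpha: float = 0.5, beta: float = 0.5) -> float:
    """Hypothetical AFSD-style loss: task cross-entropy + output term + feature term.

    peer_logits / peer_features are the (possibly stale) values last received
    from another replica. In codistillation they are refreshed only once per
    update interval, which is where the communication savings come from.
    """
    probs = softmax(logits)
    task_loss = -math.log(probs[label])
    output_term = kl_divergence(softmax(peer_logits), probs)
    feature_term = feature_mse(features, peer_features)
    return task_loss + alpha * output_term + beta * feature_term
```

If the local model agrees with its peer on both outputs and features, both distillation terms vanish and the loss reduces to plain cross-entropy. Any disagreement in either space increases the loss.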
Advances in power technology and rising living standards have driven the expansion of power grids and a sharp rise in electricity consumption. Because power systems use a variety of sensors, a large amount of power data can be collected (e.g., the spatial-temporal information of electric vehicle charging). Such spatial-temporal data is usually generated as a data stream. Analyzing and mining this data has wide applications in power equipment condition monitoring and maintenance, user equipment anomaly warning, urban power grid analysis, and other scenarios. Among these tasks, pattern detection plays a key role in power data analysis. Power data such as the spatial-temporal information of electric vehicle charging is time-sensitive, so monitoring systems must perform pattern mining in real time. However, state-of-the-art pattern detection methods operate in batch mode. Extending them directly to an online environment tends to result in (1) high network cost, (2) high processing latency, and (3) low accuracy. In this paper, we propose a framework for detecting frequent motion patterns in power data in a real-time distributed environment. The power data is filtered through a softmax differentiation function to reduce the workload and improve the performance of the framework. We also propose the concept of a historical state matrix to address the problem that nodes in different physical partitions of a distributed environment cannot perceive each other. Extensive experiments on a real dataset show that our pattern detection is about 70% faster than the baseline methods, demonstrating a substantial advantage over existing solutions in the literature.
We introduce logical synchrony, a framework that allows distributed computing to be coordinated as tightly as in synchronous systems without distributing a global clock or referring to universal time. We develop a model of events called a logical synchrony network, in which nodes correspond to processors and each node has an associated local clock that generates the events. We construct a measure of logical latency and develop its properties. A further model, called a multiclock network, is then analyzed and shown to be a refinement of the logical synchrony network. We present the bittide mechanism as an instantiation of multiclock networks and discuss the clock control mechanism that ensures that buffers do not overflow or underflow. Finally, we give conditions under which a logical synchrony network has an equivalent synchronous realization.
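The role of clock control can be illustrated with a toy simulation of two nodes. Each node's elastic buffer fills at the sender's clock rate and drains at the receiver's rate, and each node nudges its own frequency in proportion to how far its buffer occupancy is from a target. This is a sketch under assumptions only. The proportional controller, the fluid (continuous-occupancy) buffer model, the gain, the frequencies, and all names are illustrative and are not taken from the bittide design.

```python
from dataclasses import dataclass


@dataclass
class Node:
    nominal_freq: float  # uncorrected oscillator frequency (arbitrary units)
    occupancy: float     # frames waiting in this node's elastic buffer


def simulate_pair(u1: float, u2: float, gain: float = 0.1, target: float = 32.0,
                  capacity: float = 64.0, dt: float = 0.1, steps: int = 2000):
    """Two nodes exchanging frames. Each buffer fills at the sender's rate and
    drains at the receiver's rate. Returns final frequencies and occupancies."""
    a, b = Node(u1, target), Node(u2, target)

    def freq(n: Node) -> float:
        # Proportional control: tick faster when the buffer is fuller than target.
        return n.nominal_freq + gain * (n.occupancy - target)

    for _ in range(steps):
        fa, fb = freq(a), freq(b)
        a.occupancy += (fb - fa) * dt  # a receives from b, consumes at its own rate
        b.occupancy += (fa - fb) * dt  # b receives from a, consumes at its own rate
        for n in (a, b):
            if not 0.0 <= n.occupancy <= capacity:
                raise OverflowError("elastic buffer overflowed or underflowed")
    return freq(a), freq(b), a.occupancy, b.occupancy
```

With `simulate_pair(100.0, 100.5)`, the two corrected frequencies converge to a common value (the mean, 100.25). The occupancies settle at a bounded offset from the target (34.5 and 29.5), so neither buffer overflows or underflows even though the oscillators never agree on their own.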
Task graphs are a popular method for defining complex scientific simulations and experiments that run on distributed and HPC (high-performance computing) clusters. They allow their authors to focus on the problem domain instead of low-level communication between nodes, and they also enable quick prototyping. However, executing task graphs on HPC clusters can be problematic in the presence of allocation managers such as PBS or Slurm, which are not designed to execute a large number of potentially short-lived tasks with dependencies. To make task graph execution on HPC clusters more efficient and ergonomic, we have created HYPERQUEUE, an open-source task graph execution runtime tailored for HPC use cases. It enables the execution of large task graphs on top of an allocation manager by aggregating tasks into a smaller number of PBS/Slurm allocations and dynamically load-balancing tasks among all available nodes. It can also automatically submit allocations on behalf of the user and supports arbitrary task resource requirements and heterogeneous HPC clusters. It is trivial to deploy and does not require elevated privileges.
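The benefit of running many short tasks inside one allocation with dynamic load balancing can be sketched as a small event-driven simulation. A fixed pool of workers (the nodes of a single allocation) pulls whichever tasks have all their dependencies satisfied. This is a hypothetical model for illustration. It is not HyperQueue's scheduler or API, it covers only one allocation, and the function and parameter names are invented.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


def run_task_graph(durations: Dict[str, float], deps: Dict[str, List[str]],
                   workers: int) -> Tuple[float, Dict[str, float]]:
    """Greedily run a task graph on `workers` nodes of one allocation.

    durations: task -> runtime; deps: task -> prerequisite tasks.
    Returns (makespan, start time of each task).
    """
    indegree = {t: len(deps.get(t, [])) for t in durations}
    children = defaultdict(list)
    for task, parents in deps.items():
        for p in parents:
            children[p].append(task)

    ready = sorted(t for t, d in indegree.items() if d == 0)
    running: List[Tuple[float, str]] = []  # min-heap of (finish_time, task)
    free, now, done = workers, 0.0, 0
    start: Dict[str, float] = {}

    while done < len(durations):
        # Dynamic load balancing: every idle worker takes the next ready task.
        while free > 0 and ready:
            t = ready.pop(0)
            start[t] = now
            heapq.heappush(running, (now + durations[t], t))
            free -= 1
        if not running:
            raise ValueError("task graph contains a cycle")
        # Advance to the next task completion and release its dependents.
        now, t = heapq.heappop(running)
        free += 1
        done += 1
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return now, start
```

For a diamond-shaped graph (`a` before `b` and `c`, both before `d`), two workers let `b` and `c` overlap. This shortens the makespan compared with one worker, and no per-task allocation request is made to the batch system.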
The Internet of Things promotes a view of large-scale deployments of devices able to compute, communicate, and interact with their surrounding environment. In this context, one significant challenge revolves around designing and programming collective processes, i.e., durable activities involving the collaboration of large groups of devices. Examples of collective processes include distributed sensing, collective decision-making, collective movement/transport, and adaptive maintenance of system-level structures. To address the issues involved in developing such system-wide behaviours, research has proposed multiple approaches, abstractions, and algorithmic solutions. In particular, the approach of aggregate processes has emerged as a promising formal technique for programming collective processes from a macro-level perspective while supporting decentralisation, abstraction, and resilience. To (i) characterise previous work on aggregate processes, (ii) identify the usages and applications that this technique may foster, and (iii) draw general design insights in the realm of collective computing, this article provides a characterisation of common problems and solutions based on aggregate processes. The result is a catalogue of design patterns for decentralised collective processes. Specifically, we provide a taxonomy of patterns, describe each pattern in a schematic form, and discuss the implications for the design of collective processes for the Internet of Things and related scenarios.
Cyber-physical systems include widely disseminated devices that can be untrustworthy or compromised. Nevertheless, the privacy and integrity of computation and data can be guaranteed through cryptographic protocols. We address the computational burden posed by cryptography and argue for a synergistic approach: first designing programmable hardware accelerators for cryptography, then tailoring cryptographic protocols to this hardware.
In this paper, we report the implementation and testing of algorithmic changes made to MGAC, a crystal structure prediction system, to make it scalable and able to take advantage of significant distributed resources such as the Open Science Grid (OSG). The changes include three elements. The first is the adoption of a steady-state Genetic Algorithm (GA). The second is a more general definition of the GA genome that eliminates the need to search each of the 230 possible space groups individually. The third is the use of Density Functional Theory with dispersion correction (DFT-D), as implemented in Quantum Espresso (QE), to calculate crystal energies. The performance of this implementation of MGAC, which we label MGAC-QE-OSG in the following, is demonstrated on two test cases, methanol and ethanol. In both cases, MGAC-QE-OSG finds the experimental structures of these compounds.
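The defining feature of a steady-state GA is that it replaces one individual at a time rather than a whole generation. Each offspring can therefore be evaluated and inserted independently, which suits loosely coupled grid resources. The sketch below shows that replacement scheme under explicit assumptions. The genome is a plain list of floats, and the `energy` function is a toy quadratic standing in for the DFT-D energy that MGAC-QE-OSG obtains from Quantum Espresso. Tournament selection, uniform crossover, and Gaussian mutation are illustrative choices, not necessarily MGAC's operators.

```python
import random


def energy(genome):
    # Toy stand-in objective (hypothetical); MGAC-QE-OSG uses DFT-D energies from QE.
    return sum((g - 1.0) ** 2 for g in genome)


def steady_state_ga(pop_size=20, genome_len=6, iterations=3000, seed=0):
    """Minimize `energy` with a steady-state GA: one offspring per iteration."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(genome_len)] for _ in range(pop_size)]
    fit = [energy(g) for g in pop]

    def tournament():
        # Binary tournament: the lower-energy of two random individuals wins.
        i, j = rng.sample(range(pop_size), 2)
        return pop[i] if fit[i] < fit[j] else pop[j]

    for _ in range(iterations):
        p1, p2 = tournament(), tournament()
        # Uniform crossover followed by occasional Gaussian mutation.
        child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
        child = [g + rng.gauss(0, 0.1) if rng.random() < 0.2 else g for g in child]
        e = energy(child)
        worst = max(range(pop_size), key=fit.__getitem__)
        # Steady-state replacement: the child replaces the worst member only if better.
        if e < fit[worst]:
            pop[worst], fit[worst] = child, e

    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]
```

Because a child can only displace the current worst individual, the best energy in the population never increases from one iteration to the next.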
Task graphs provide a simple way to describe scientific workflows (sets of tasks with dependencies) that can be executed both on HPC clusters and in the cloud. An important aspect of executing such graphs is the scheduling algorithm used. Many scheduling heuristics have been proposed in existing works; nevertheless, they are often tested in oversimplified environments. We provide an extensible simulation environment designed for prototyping and benchmarking task schedulers. It contains implementations of various scheduling algorithms and is open source, so that results are fully reproducible. We use this environment to perform a comprehensive analysis of workflow scheduling algorithms, focusing on quantifying the effect of scheduling challenges that have so far been mostly neglected, such as delays between scheduler invocations or partially unknown task durations. Our results indicate that the network models used by many previous works can produce results that are off by an order of magnitude compared with a more realistic model. Additionally, we show that certain often-neglected implementation details of scheduling algorithms can have a large effect on the scheduler's performance. Such details should therefore be described thoroughly to enable proper evaluation.
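The toy below contrasts two network models for transfers that share one link. In the naive model, every transfer gets the full link bandwidth regardless of contention. In the fair-sharing (processor-sharing) model, concurrent transfers split the bandwidth equally. Both models and the function names are illustrative assumptions, not the models used in the paper. The toy only shows how quickly contention-free assumptions diverge from a contended link.

```python
from typing import List


def finish_times_naive(sizes: List[float], bandwidth: float) -> List[float]:
    """Each transfer gets the whole link, ignoring contention."""
    return [s / bandwidth for s in sizes]


def finish_times_shared(sizes: List[float], bandwidth: float) -> List[float]:
    """All transfers start at t=0 and share one link fairly (processor sharing)."""
    order = sorted((s, i) for i, s in enumerate(sizes))
    result = [0.0] * len(sizes)
    now = 0.0
    sent = 0.0            # data delivered so far to each still-active transfer
    active = len(order)
    for size, idx in order:
        # The smallest remaining transfer drains while `active` flows share the link.
        now += (size - sent) * active / bandwidth
        sent = size
        result[idx] = now
        active -= 1
    return result
```

With ten equal transfers of size 1 on a link of bandwidth 1, the naive model predicts a finish time of 1 for all of them, while fair sharing finishes them at 10. In this toy, a tenfold gap appears from contention alone.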
This work proposes a sub-optimal method based on a two-layer structured meta-deep reinforcement learning (MDRL) approach to address the hardware impairment (HWI) optimization problem in large intelligent surface (LIS) systems. Designed for distributed LIS systems with reflection matrices, the method effectively enhances system capacity and performance despite HWIs. It builds on existing techniques that divide large-area LIS systems into multiple small-area subsystems. Simulation results demonstrate that it achieves sub-optimal LIS performance with fewer samples in diverse dynamic wireless environments. The approach enhances the adaptability of distributed LIS systems and offers an effective HWI management strategy, paving the way for future LIS system optimization.
The distributed matrix multiplication problem with an unknown number of stragglers is considered, where the goal is to efficiently and flexibly obtain the product of two massive matrices by distributing the computation across N servers. There are up to N - R stragglers, but their exact number is not known a priori. To reduce the computation load of each server, a flexible solution is proposed that fully utilizes the computation capability of the available servers. The computing task of each server is divided into several subtasks, constructed based on the Entangled Polynomial codes of Yu et al. The final result can be obtained either from a larger number of servers, each completing a smaller amount of computation, or from a smaller number of servers, each completing a larger amount of computation. The finite field size required by the proposed solution is less than 2N. Moreover, optimal design parameters, such as the partitioning of the input matrices, are discussed. Our constructions can also be generalized to other settings, such as batch distributed matrix multiplication and secure distributed matrix multiplication.
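For readers unfamiliar with polynomial-coded computation, the sketch below shows the basic idea behind the codes this work builds on. It uses the simpler polynomial code rather than Entangled Polynomial codes or the flexible construction proposed here. A is split into two row blocks and B into two column blocks. Server x computes (A0 + A1·x)(B0 + B1·x²), a degree-3 matrix polynomial whose coefficients are the four blocks of AB. The results from any 4 servers therefore suffice, and the rest may straggle. The prime modulus, the 2-by-2 partitioning, and all function names are assumptions for illustration.

```python
P = 2**31 - 1  # prime field modulus (assumed); must exceed the number of servers


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % P for col in zip(*B)] for row in A]


def add_scaled(X, Y, c):
    """Return X + c*Y over GF(P)."""
    return [[(x + c * y) % P for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split_rows(A):
    h = len(A) // 2
    return A[:h], A[h:]


def split_cols(B):
    h = len(B[0]) // 2
    return [r[:h] for r in B], [r[h:] for r in B]


def worker(A0, A1, B0, B1, x):
    """Server at evaluation point x multiplies its two encoded blocks."""
    return matmul(add_scaled(A0, A1, x), add_scaled(B0, B1, x * x))


def inv_mod_matrix(V):
    """Gauss-Jordan inversion over GF(P)."""
    n = len(V)
    M = [row[:] + [int(i == j) for j in range(n)] for i, row in enumerate(V)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] % P)
        M[col], M[piv] = M[piv], M[col]
        inv = pow(M[col][col], P - 2, P)  # Fermat inverse, valid since P is prime
        M[col] = [v * inv % P for v in M[col]]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col]
                M[r] = [(a - f * b) % P for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def decode(results):
    """results: point -> product, from any 4 non-straggling servers."""
    xs = list(results)[:4]
    Vinv = inv_mod_matrix([[pow(x, j, P) for j in range(4)] for x in xs])
    rows, cols = len(results[xs[0]]), len(results[xs[0]][0])
    coeffs = []
    for j in range(4):  # interpolate the j-th matrix coefficient entrywise
        C = [[0] * cols for _ in range(rows)]
        for i, x in enumerate(xs):
            C = add_scaled(C, results[x], Vinv[j][i])
        coeffs.append(C)
    C00, C10, C01, C11 = coeffs  # A0B0, A1B0, A0B1, A1B1
    top = [r0 + r1 for r0, r1 in zip(C00, C01)]
    bottom = [r0 + r1 for r0, r1 in zip(C10, C11)]
    return top + bottom
```

In this fixed scheme every server does the same amount of work, and exactly 4 results are needed. The flexible construction described above instead lets the number of results needed trade off against per-server work.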