A large family of paradigmatic models arising in image/signal processing, machine learning, and statistical regression can be boiled down to consensus optimization. This paper is devoted to a class of consensus optimization problems, which we reformulate as a monotone-plus-skew-symmetric inclusion. We propose a distributed optimization method built on the algorithmic framework of the generalized alternating direction implicit method. Under mild conditions, the proposed method converges globally. Furthermore, a preconditioner is exploited to improve the method's efficiency. Numerical simulations on sparse logistic regression are carried out in various distributed fashions. Compared to some state-of-the-art methods, the proposed method exhibits appealing numerical performance, especially when the relaxation factor approaches zero.
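The consensus reformulation can be illustrated with a minimal sketch, assuming a toy quadratic objective and plain consensus ADMM rather than the paper's generalized alternating direction implicit method (all names here are hypothetical):

```python
# Hedged sketch: consensus ADMM on a toy problem, NOT the paper's method.
# Each worker i holds f_i(x) = 0.5*(x - a_i)^2; the consensus minimizer of
# sum_i f_i(x) is the mean of the a_i.

def consensus_admm(a, rho=1.0, iters=100):
    n = len(a)
    x = [0.0] * n          # local copies of the decision variable
    u = [0.0] * n          # scaled dual variables
    z = 0.0                # global consensus variable
    for _ in range(iters):
        # local proximal steps (closed form for quadratic f_i)
        x = [(a[i] + rho * (z - u[i])) / (1.0 + rho) for i in range(n)]
        # consensus step: average the shifted local variables
        z = sum(x[i] + u[i] for i in range(n)) / n
        # dual updates penalize disagreement x_i != z
        u = [u[i] + x[i] - z for i in range(n)]
    return z

print(round(consensus_admm([1.0, 2.0, 6.0]), 4))  # -> 3.0, the mean
```

The local steps need only each worker's private data, while the consensus step is a single averaging round, which is what makes this family of methods naturally distributed.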
Radiomics is a technology that extracts a large number of quantitative features from high-throughput medical images and has become a focus of research. It can help in disease diagnosis, therapy planning, and prognosis evaluation through Big Data analysis algorithms. Radiomics can extract hundreds or even tens of thousands of quantifiable features from medical images, which can no longer fit into the memory of a single machine. Therefore, we propose a distributed correlation analysis algorithm (DFCA) based on the MapReduce distributed computing framework for breast ultrasound radiomics feature datasets. Each compute node produces massive intermediate data while DFCA calculates the Pearson correlation coefficients of radiomics features, and as the feature data and dimensionality grow, the data transmission cost grows quadratically. To reduce this cost, we propose a distributed correlation estimation algorithm (DFCEA) for radiomics features based on DFCA. DFCEA estimates the Pearson correlation coefficient using an iterative method, which further reduces the I/O cost. Experiments show that our algorithms are more effective than the algorithms in the literature.
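The map/reduce pattern the abstract describes can be sketched as follows: a minimal illustration of distributed Pearson correlation via mergeable sufficient statistics (`map_partition` and `reduce_stats` are hypothetical names, not the authors' DFCA implementation):

```python
# Hedged sketch: each node emits sufficient statistics for a feature pair
# over its data shard, and a reduce step merges them into one Pearson
# coefficient without moving the raw data.
import math

def map_partition(xs, ys):
    # per-node partial sums over a shard
    return (len(xs), sum(xs), sum(ys),
            sum(x * x for x in xs), sum(y * y for y in ys),
            sum(x * y for x, y in zip(xs, ys)))

def reduce_stats(a, b):
    # merging partials is just element-wise addition
    return tuple(u + v for u, v in zip(a, b))

def pearson(stats):
    n, sx, sy, sxx, syy, sxy = stats
    num = n * sxy - sx * sy
    den = math.sqrt(n * sxx - sx * sx) * math.sqrt(n * syy - sy * sy)
    return num / den

shards = [([1, 2], [2, 4]), ([3, 4], [6, 8])]   # two "nodes", y = 2x
merged = reduce_stats(*(map_partition(x, y) for x, y in shards))
print(pearson(merged))  # -> 1.0, since y is exactly linear in x
```

Because only six numbers per feature pair cross the network, this shows why intermediate-data volume, rather than raw-data volume, dominates the transmission cost the abstract mentions.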
Distributed continuum systems (DCSs), as currently envisioned, will have a massive impact on our future society. Hence, it is of utmost importance to ensure that their impact is socially responsible. Equipping these systems with causal models brings features such as explainability, accountability, and auditability, which are needed to provide the right level of trust. Furthermore, by combining causality with graph-based service-level objectives, we can cope with dynamic and complex system requirements while achieving sustainable development of DCSs' capacities and applications.
Matrix computations are a fundamental building block of edge computing systems, with a major recent uptick in demand due to their use in AI/ML training and inference procedures. Existing approaches for distributing matrix computations allocate coded combinations of submatrices to worker nodes to build resilience to slower nodes, called stragglers. In the edge learning context, however, these approaches compromise the sparsity properties that are often present in the original matrices found at the edge server. In this study, we consider the challenge of augmenting such approaches to preserve input sparsity when distributing the task across edge devices, thereby retaining the associated computational efficiency gains. First, we derive a lower bound on the weight of the coding, i.e., the number of submatrices combined to obtain each coded submatrix, needed to provide resilience to the maximum possible number of straggler devices (for a given number of devices and their storage constraints). Next, we propose distributed matrix computation schemes that meet this lower bound exactly. Numerical experiments conducted on Amazon Web Services (AWS) validate our assertions regarding straggler mitigation and computation speed for sparse matrices.
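The basic idea of coded submatrix allocation can be sketched with a toy (3, 2) scheme that tolerates one straggler; note this plain example has coding weight 2 and does not preserve sparsity the way the paper's schemes aim to:

```python
# Hedged toy illustration of coded matrix-vector multiplication: A is split
# row-wise into A1, A2; three workers compute A1x, A2x, and (A1+A2)x, so the
# result of any one straggling worker can be reconstructed from the others.

def matvec(m, v):
    return [sum(r * x for r, x in zip(row, v)) for row in m]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def sub(u, v):
    return [a - b for a, b in zip(u, v)]

A1 = [[1, 0], [2, 1]]
A2 = [[0, 3], [1, 1]]
x = [1, 2]

r1 = matvec(A1, x)                                   # worker 1
r2 = matvec(A2, x)                                   # worker 2
r3 = matvec([add(a, b) for a, b in zip(A1, A2)], x)  # worker 3 (coded)

# suppose worker 2 straggles: recover A2·x from the coded result
recovered = sub(r3, r1)
print(recovered == r2)  # -> True
```

The coded combination `A1 + A2` is exactly where sparsity is lost, since adding two sparse submatrices generally yields a denser one, which motivates the lower bound on coding weight studied in the paper.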
Over the years, Field Programmable Gate Arrays (FPGAs) have been gaining popularity in the High Performance Computing (HPC) field, because their reconfigurability enables very fine-grained optimizations at low energy cost. However, the differing characteristics, architectures, and network topologies of clusters have hindered the use of FPGAs at large scale. In this work, we present an evolution of OmpSs@FPGA, a high-level task-based programming model and extension to OmpSs-2, that aims to unify all FPGA clusters through a message-passing interface compatible with FPGA accelerators. These accelerators are programmed with C/C++ pragmas and synthesized with High-Level Synthesis tools. The new framework includes a custom protocol to exchange messages between FPGAs, agnostic of the architecture and network type. On top of that, we present a new communication paradigm called Implicit Message Passing (IMP), in which the user does not need to call any message-passing API; instead, the runtime automatically infers data movement between nodes. We test classic message passing and IMP with three benchmarks on two different FPGA clusters. One is cloudFPGA, a disaggregated platform with AMD FPGAs connected to the network only through UDP/TCP/IP. The other is ESSPER, composed of CPU-attached Intel FPGAs with a private network at the Ethernet level. In both cases, we demonstrate that IMP with OmpSs@FPGA can increase the productivity of FPGA programmers at large scale by simplifying communication between nodes, without limiting the scalability of applications. We implement the N-body, Heat simulation, and Cholesky decomposition benchmarks, and show that FPGA clusters achieve 2.6x and 2.4x better performance per watt than a CPU-only supercomputer for N-body and Heat, respectively.
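The implicit message passing idea can be mirrored in a toy runtime; the actual OmpSs@FPGA framework works with C/C++ pragmas and FPGA hardware, so this Python sketch with hypothetical names only conveys the concept of the runtime inferring transfers from task dependencies:

```python
# Hedged concept sketch: tasks declare their inputs and outputs, and the
# "runtime" moves data between simulated nodes; the user never calls a
# send/recv API, which is the essence of Implicit Message Passing (IMP).

class Runtime:
    def __init__(self, nodes):
        self.mem = {n: {} for n in nodes}   # per-node local memories

    def submit(self, node, fn, ins, out):
        # fetch any input living on another node (the implicit "message")
        for name in ins:
            if name not in self.mem[node]:
                owner = next(n for n, m in self.mem.items() if name in m)
                self.mem[node][name] = self.mem[owner][name]  # transfer
        # run the task with all inputs now local, store its output
        self.mem[node][out] = fn(*(self.mem[node][n] for n in ins))

rt = Runtime(["fpga0", "fpga1"])
rt.mem["fpga0"]["a"] = 3
rt.mem["fpga1"]["b"] = 4
# the runtime pulls "a" over to fpga1 because the task declares it as input
rt.submit("fpga1", lambda x, y: x + y, ["a", "b"], "c")
print(rt.mem["fpga1"]["c"])  # -> 7
```

The design point is that communication is derived from declared task dependencies, so the same program runs unchanged on clusters with different interconnects.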
The large amount of data generated every day makes it necessary to develop new methods capable of handling massive data efficiently. This is the case for Association Rules, an unsupervised data mining tool capable of extracting information in the form of IF-THEN patterns. Although several methods have been proposed for the extraction of frequent itemsets (the phase preceding the mining of association rules) in very large databases, the high computational cost and lack of memory remain major problems when processing large data. Therefore, the aim of this paper is threefold: (1) to review existing algorithms for frequent itemset and association rule mining; (2) to develop new, efficient frequent itemset Big Data algorithms using distributed computation, as well as a new association rule mining algorithm in Spark; and (3) to compare the proposed algorithms with existing proposals, varying the number of transactions and the number of items. To this purpose, we have used the Spark platform, which has been demonstrated to outperform existing distributed algorithmic implementations.
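The distributed counting phase behind frequent-itemset mining can be sketched as follows: a minimal Apriori-style pass in plain Python, whereas the paper's algorithms run on Spark; all names and the support threshold here are illustrative:

```python
# Hedged sketch: each partition counts candidate itemsets locally, then the
# counts are summed across partitions and filtered by a support threshold.
from collections import Counter
from itertools import combinations

def count_partition(transactions, size):
    c = Counter()
    for t in transactions:
        # emit every candidate itemset of the given size in this transaction
        for itemset in combinations(sorted(t), size):
            c[itemset] += 1
    return c

partitions = [
    [{"a", "b", "c"}, {"a", "b"}],   # node 1's transactions
    [{"a", "c"}, {"b", "c"}],        # node 2's transactions
]
# the "reduce" step: Counter addition merges per-node counts
total = sum((count_partition(p, 2) for p in partitions), Counter())
min_support = 2
frequent = {s for s, n in total.items() if n >= min_support}
print(sorted(frequent))  # -> [('a', 'b'), ('a', 'c'), ('b', 'c')]
```

Local counting followed by a global merge is exactly the shape that maps onto Spark's `map`/`reduceByKey` primitives, which is why the itemset phase parallelizes well.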
This study focuses on optimizing resource monitoring and management (RMM) of artificial intelligence (AI) systems. The study begins by discussing the fundamental concepts of Digital Twins and cloud collaboration platforms. Subsequently, a collaborative platform called the distributed Digital Twins cloud is designed, incorporating blockchain technology (BCT) to enhance its security; the cryptographic algorithms used in the BCT ensure data security. The study then presents a comprehensive evaluation of the designed BCT-based distributed Digital Twins cloud. The evaluation results show that the proposed platform achieves job scheduling times ranging from approximately 35 ms (shortest) to 40 ms (longest) during training, with data consistency ranging from 90% to 98%. On the test set, the proposed model achieves job allocation times ranging from approximately 30 ms (shortest) to 35 ms (longest), with data consistency again ranging from approximately 90% (lowest) to 98% (highest). The network resource management model developed in this study exhibits higher efficiency and intelligence, and its performance is comprehensively evaluated. The research findings indicate that the designed model enhances the data processing capability of the cloud platform and effectively improves the security of platform data management, providing substantial support for the advancement of Digital Twins platforms. Moreover, this study offers technical support for the application, design, and optimization of AI technology in network RMM, providing insights into the efficient application and innovation of AI and network technology for future societal development.
Offloading techniques are considered one of the key enablers of deep neural network (DNN)-based artificial intelligence (AI) services on end devices with limited computing resources. However, offloading DNN layers involves hard combinatorial problems. To this end, we develop a deep reinforcement learning (DRL)-based offloading algorithm for computing DNN layers with minimum end-to-end inference latency. We combine long short-term memory (LSTM) and a graph neural network for state embedding, which exploits spatial correlation over the network to accelerate training and temporal correlation over time to reduce the overhead of state monitoring. With this embedding, our DRL algorithm can draw multiple actions from a single state observation and adapt, without retraining, to new environments unseen in the training phase. We show through extensive simulations that our algorithm outperforms existing ones in terms of both latency and robustness to feedback delay, which is inevitable in practice, achieving a performance enhancement of up to 29.6% in some scenarios.
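For a single device-server pair, the underlying combinatorial problem reduces to choosing a split layer; a toy exhaustive search with hypothetical timings (not the paper's DRL method, which handles the multi-node case) illustrates the latency trade-off:

```python
# Hedged sketch: split k puts layers [0, k) on the device and [k, L) on the
# server; tx[k] is the cost of shipping the boundary tensor for split k
# (tx[0] models sending the raw input, tx[L] keeping everything on-device).

def best_split(dev, srv, tx):
    L = len(dev)
    # enumerate all L+1 split points and keep the minimum end-to-end latency
    lat, k = min(
        (sum(dev[:i]) + tx[i] + sum(srv[i:]), i) for i in range(L + 1)
    )
    return k, lat

dev = [4, 4, 4]      # per-layer compute time on the slow device
srv = [1, 1, 1]      # per-layer compute time on the fast server
tx  = [10, 2, 2, 0]  # raw input is large; later activations are small
print(best_split(dev, srv, tx))  # -> (1, 8): run one layer, then offload
```

Even this one-dimensional case shows the tension the DRL agent must learn: offloading early saves device compute but pays a large transmission cost for bulky early-layer tensors.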
Developing large-scale distributed methods that are robust to the presence of adversarial or corrupted workers is an important part of making such methods practical for real-world problems. In this paper, we propose an iterative approach that is adversary-tolerant for convex optimization problems. By leveraging simple statistics, our method ensures convergence and is capable of adapting to adversarial distributions. Through simulations, we demonstrate the efficiency of our approach in the presence of adversaries, as well as its ability to identify adversarial workers with high accuracy and to tolerate varying levels of adversary rates.
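The abstract does not specify which simple statistics are used; a coordinate-wise median is one standard choice, sketched here purely as an illustration of how a robust statistic tolerates corrupted workers:

```python
# Hedged illustration (not the paper's method): aggregating worker gradients
# with a coordinate-wise median instead of a mean, so a minority of
# arbitrarily corrupted updates cannot drag the aggregate away.
import statistics

def robust_aggregate(gradients):
    # median per coordinate across all workers' gradient vectors
    return [statistics.median(coord) for coord in zip(*gradients)]

honest = [[1.0, 2.0], [1.1, 1.9], [0.9, 2.1]]
adversarial = [[100.0, -100.0]]            # one corrupted worker
print(robust_aggregate(honest + adversarial))  # stays near [1.0, 2.0]
```

A plain average of the same four vectors would be pulled to roughly [25.75, -23.5], which is why mean aggregation fails under even a single adversary while order statistics survive.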
In this article, we explore two types of distributed quantum machine learning (DQML) methodologies: quantum federated learning and quantum model-parallel learning. We discuss the challenges encountered in DQML, propose potential solutions, and highlight future research directions in this rapidly evolving field. Additionally, we implement two solutions tailored to the two types of DQML, aiming to enhance the reliability of the computing process. Our results show the potential of DQML in the current Noisy Intermediate-Scale Quantum era.