ISBN: 9798400704437 (print)
Modern data-driven applications arising in domains such as smart manufacturing, healthcare, and the Internet of Things pose new challenges to data processing systems. Traditional stream processing systems, such as Flink, Spark, and Kafka Streams, are ill-suited to cope with the massive scale of distribution, the heterogeneous computing landscape, and requirements such as timely processing and actuation. Classical approaches like managed runtimes, interpretation-based query processing, and the optimization of single queries that neglects interactions greatly limit throughput, latency, energy efficiency, and the general usability of these systems for emerging applications involving distributed data processing at scale in a sensor-edge-cloud environment. To overcome these limitations, we are researching and building NebulaStream, a novel data stream processing system for massively distributed, heterogeneous environments. NebulaStream supports (potentially resource-constrained) heterogeneous devices, a hierarchical topology (with the distribution of computation and data flow in a cloud-edge continuum), and the sharing of computations and data across multiple concurrent queries.
ISBN: 9783031516429; 9783031516436 (print)
This paper summarizes state-of-the-art results on data series processing with the emphasis on parallel and distributed data series indexes that exploit the computational power of modern computing platforms. The paper comprises a summary of the tutorial the author delivered at the 15th International Conference on Management of Digital EcoSystems (MEDES'23).
The Large Intelligent Surface (LIS) is a promising technology in the areas of wireless communication, remote sensing, and positioning. It consists of a continuous radiating surface located in the proximity of the users, with the capability to communicate by transmission and reception (replacing base stations). Despite its potential, there are numerous challenges from an implementation point of view, of which the interconnection data rate, computational complexity, and storage are the most relevant. To address these challenges while ensuring scalability, hierarchical architectures with distributed processing techniques are envisioned. In this work, we perform algorithm-architecture codesign to propose two distributed interference cancellation algorithms and a tree-based interconnection topology for uplink processing. We also analyze the performance, hardware requirements, and architecture trade-offs for a discrete LIS, in order to provide concrete case studies and guidelines for efficient implementation of LIS systems.
NASA Technical Reports Server (NTRS) 20010068374: Performance Monitoring of Distributed Data Processing Systems by NASA Technical Reports Server (NTRS); NASA Technical Reports Server (NTRS); published by
Batch and stream processing are each applied efficiently, but separately, in many applications. However, some newer data-driven applications, such as those in the Internet of Things and cloud computing, call for hybrid processing approaches in order to achieve the speed and accuracy required for processing such complex data. In this paper, we propose a Hybrid Distributed Batch-Stream (HDBS) architecture for anomaly detection in real-time data. The hybrid architecture benefits from the accuracy provided by batch processing while retaining the speed and real-time features of stream processing. In the proposed architecture, our focus is on the algorithmic aspects of hybrid processing, including the interaction models between batch and stream processing units, the characteristics of batch and stream machine learning algorithms, and the principles of merging the results of different processing units. The driving idea of this combination is that the results of the batch and stream processing units are complementary: one constructs accurate models based on previous data, and the other is capable of processing new stream data in real time. Furthermore, we propose a generalized version of HDBS with respect to its algorithms and communication policy levels. In the generalized HDBS architecture, we address the various aspects of the interaction between the batch and stream processing units, and the merging operations that produce the final results. Evaluations of the proposed architecture using various criteria (accuracy, space complexity, and time complexity) demonstrate that its accuracy is higher than that of batch processing methods, and that its time complexity is similar to that of stream processing methods and much lower than that of batch processing methods. This makes the proposed architecture an efficient and practical solution for real-time anomaly detection. (C) 2020 Elsevier Inc. All rights reserved.
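The idea of merging complementary batch and stream results can be illustrated with a minimal sketch. It is not the HDBS architecture itself: the z-score models, the weighted-average merge rule, and the names `BatchModel`, `StreamModel`, `detect`, `weight`, and `threshold` are all assumptions made here for illustration.

```python
import math
import statistics


class BatchModel:
    """Hypothetical batch unit: a fixed model fitted once on historical data."""

    def __init__(self, history):
        self.mean = statistics.fmean(history)
        self.std = statistics.pstdev(history)

    def score(self, x):
        # Absolute z-score against the historical distribution.
        return abs(x - self.mean) / self.std if self.std > 0 else 0.0


class StreamModel:
    """Hypothetical stream unit: running mean/variance (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def score(self, x):
        if self.n < 2:
            return 0.0  # not enough stream data to judge yet
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) / std if std > 0 else 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def detect(history, stream, weight=0.5, threshold=3.0):
    """Flag each stream value using a weighted average of both scores.

    The merge rule is an assumption; the paper studies several policies.
    """
    batch, online, flags = BatchModel(history), StreamModel(), []
    for x in stream:
        merged = weight * batch.score(x) + (1.0 - weight) * online.score(x)
        flags.append(merged > threshold)
        online.update(x)  # note: anomalies also update the stream model here
    return flags


history = [10.0, 10.2, 9.8, 10.1, 9.9]
stream = [10.0, 10.1, 9.9, 25.0, 10.0]
flags = detect(history, stream)
print(flags)
```

In this toy setup the batch unit contributes a stable reference built from past data, while the stream unit adapts to each new value as it arrives.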
ISBN: 9784907764456 (print)
A laser transmission system for free space optical communication is described in this paper. Distributed processing techniques are investigated for application to all-optical transmission systems. A pair of transmission nodes is equipped with an E/O and an O/E converter, and each node is controlled independently. The distributed processing system transmits the positioning data of the laser beam and the control commands between two PCs using TCP/IP. A prototype of an active free space optics system is constructed and used in experiments on tracking mobile terminals. Results confirm that the proposed techniques enable the free space optics system to track unstable transmission nodes and maintain high-quality broadband communication.
Distributed processing of large-scale graph data has many practical applications and has been widely studied. In recent years, many distributed graph processing frameworks and algorithms have been proposed. While much effort has been devoted to analyzing them, mostly on the basis of their programming models, less research has focused on understanding the challenges they face in distributed environments. Our analysis shows that applying graph tasks to distributed environments is not easy and often faces numerous challenges, including parallelism, load balancing, communication overhead, and bandwidth. In this article, we provide an extensive overview of the current state of the art in this field by outlining the challenges and solutions of distributed graph algorithms. We first conduct a systematic analysis of the inherent challenges in distributed graph processing, followed by an overview of existing general solutions. Subsequently, we survey the challenges highlighted in recent distributed graph processing papers and the strategies adopted to address them. Finally, we discuss current research trends and identify potential future opportunities.
In this article, we deal with a network of agents that want to cooperatively minimize the sum of local cost functions depending on a common decision variable. We consider the challenging scenario in which objective functions are unknown and agents have only access to local measurements of their local functions. We propose a novel distributed algorithm that combines a recent gradient tracking policy with an extremum seeking technique to estimate the global descent direction. The joint use of these two techniques results in a distributed optimization scheme that provides arbitrarily accurate solution estimates through the combination of Lyapunov and averaging analysis approaches with consensus theory. We perform numerical simulations in a personalized optimization framework to corroborate the theoretical results.
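The gradient tracking part of such a scheme can be sketched as follows. This is a minimal illustration under stated assumptions, not the article's algorithm: a central finite difference of measured cost values stands in for the extremum-seeking estimator, the local costs are hypothetical scalar quadratics, and the names `gradient_tracking`, `weights`, `step`, and `probe` are chosen here.

```python
def gradient_tracking(costs, weights, x0, step=0.1, probe=1e-3, iters=300):
    """Gradient tracking over a network using only cost measurements.

    costs   : list of local cost functions f_i (only evaluated, never differentiated)
    weights : doubly stochastic mixing matrix (assumption of this sketch)
    x0      : initial local estimates, one scalar per agent
    """
    n = len(costs)

    def grad_est(i, x):
        # Central finite difference: a simple stand-in for extremum seeking.
        return (costs[i](x + probe) - costs[i](x - probe)) / (2.0 * probe)

    x = list(x0)
    g = [grad_est(i, x[i]) for i in range(n)]
    s = list(g)  # trackers start at the local gradient estimates
    for _ in range(iters):
        # Consensus step on the estimates, then descent along the tracker.
        x_new = [
            sum(weights[i][j] * x[j] for j in range(n)) - step * s[i]
            for i in range(n)
        ]
        g_new = [grad_est(i, x_new[i]) for i in range(n)]
        # The tracker mixes neighbors' trackers and adds the local gradient change.
        s = [
            sum(weights[i][j] * s[j] for j in range(n)) + g_new[i] - g[i]
            for i in range(n)
        ]
        x, g = x_new, g_new
    return x


# Hypothetical local costs f_i(x) = 0.5 * (x - c_i)^2; the sum is minimized at mean(c_i).
targets = [1.0, 2.0, 6.0]
costs = [lambda x, c=c: 0.5 * (x - c) ** 2 for c in targets]
W = [
    [0.50, 0.25, 0.25],
    [0.25, 0.50, 0.25],
    [0.25, 0.25, 0.50],
]
estimates = gradient_tracking(costs, W, [0.0, 0.0, 0.0])
print(estimates)
```

Each agent only evaluates its own cost and exchanges values with its neighbors, yet every local estimate approaches the minimizer of the sum, here the mean of the `targets`.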
This article tackles spectrum estimation of a linear time-invariant system by a multiagent network using data. We consider a group of agents that communicate over a strongly connected, aperiodic graph and do not have any knowledge of the system dynamics. Each agent only measures some signals that are linear functions of the system states or inputs, and does not know the functional form of this dependence. The proposed distributed algorithm consists of two steps that rely on the collected data: first, the identification of an unforced trajectory of the system, and second, the estimation of the coefficients of the characteristic polynomial of the system matrix using this unforced trajectory. We show that each step can be formulated as a problem of finding a common solution to a set of linear algebraic equations, which are amenable to distributed algorithmic solutions. We prove that under mild assumptions on the collected data, when the initial condition of the system is random, the proposed distributed algorithm accurately estimates the spectrum with probability 1.
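The second step rests on the Cayley-Hamilton theorem: along an unforced trajectory of a system of order n, any linear measurement y satisfies y(k+n) + a_{n-1} y(k+n-1) + ... + a_0 y(k) = 0, where the a_i are the coefficients of the characteristic polynomial. The sketch below illustrates this for a hypothetical second-order system. It assumes the unforced trajectory is already available and solves the linear equations in one place rather than in a distributed fashion; the matrix `A`, the measurement vector `c`, and all function names are assumptions made here.

```python
import cmath


def simulate_output(A, c, x0, steps):
    """Measurements y(k) = c^T x(k) along the unforced trajectory x(k+1) = A x(k)."""
    x = list(x0)
    ys = []
    for _ in range(steps):
        ys.append(c[0] * x[0] + c[1] * x[1])
        x = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    return ys


def char_poly_coeffs(y):
    """Solve y[k+2] + a1*y[k+1] + a0*y[k] = 0 for k = 0, 1 (Cramer's rule)."""
    det = y[1] * y[1] - y[0] * y[2]
    if abs(det) < 1e-12:
        raise ValueError("data are not informative enough to identify the coefficients")
    a1 = (-y[2] * y[1] + y[0] * y[3]) / det
    a0 = (-y[1] * y[3] + y[2] * y[2]) / det
    return a1, a0


def spectrum(a1, a0):
    """Roots of z^2 + a1*z + a0, i.e. the estimated eigenvalues."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a0)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0


# Hypothetical system with eigenvalues 0.5 and 0.2; the agent never sees A directly.
A = [[0.5, 1.0], [0.0, 0.2]]
y = simulate_output(A, c=[1.0, 1.0], x0=[1.0, 1.0], steps=4)
a1, a0 = char_poly_coeffs(y)
eigs = spectrum(a1, a0)
print(a1, a0, eigs)
```

Only the measured sequence `y` enters the estimation, which mirrors the data-driven setting in which agents do not know the system dynamics.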
Authors: Xu, Zhen; Chen, Zushou
Wenzhou Univ, Coll Comp Sci & Artificial Intelligence, Wenzhou 325006, Peoples R China
Wenzhou Univ, Metaverse & Artificial Intelligence Inst, Wenzhou 325006, Peoples R China
Distributed learning (DL), in which multiple nodes in an interconnected network collaboratively induce a predictive model using their local data and some information communicated across neighboring nodes, has received significant research interest in recent years. Yet it is challenging to achieve excellent performance when training data instances have incomplete features and ambiguous labels. In such cases, it is essential to develop an efficient method that jointly performs missing feature imputation and credible label recovery. To this end, this article proposes a distributed partial label missing data classification (dPMDC) algorithm. The algorithm formulates an integrated framework that takes the ideas of both generative and discriminative learning into account. First, by exploiting the weakly supervised information of ambiguous labels, a distributed probabilistic information-theoretic imputation method is designed to fill in the missing features in a distributed manner. Second, based on the imputed feature vectors, the classifier, modeled by the random feature map of the chi-squared (χ²) kernel function, is learned. These two iterative steps constitute the dPMDC algorithm, which can handle dispersed, distributed data with partially missing features and ambiguous labels. Experiments on several datasets show the superiority of the proposed algorithm from many viewpoints.