Due to rising cyber threats, the security vulnerabilities of IoT devices are expanding. However, these devices cannot run complicated security algorithms locally due to hardware restrictions. Data must be transferred to cloud nodes for processing, giving attackers an entry point. This research investigates distributed computing at the network edge, using AI-enabled IoT devices and container orchestration tools to process data in real time. The purpose is to identify and mitigate DDoS attacks while minimizing CPU usage to improve security. It compares typical IoT devices with and without AI-enabled chips and container orchestration, and assesses their performance in running machine learning models under different cluster settings. The proposed architecture aims to empower IoT devices to process data locally, minimizing reliance on cloud transmission and bolstering security in IoT environments. The results reflect the architectural change: with the addition of AI-enabled IoT devices and container orchestration, there is a 60% difference between the new architecture and the traditional architecture, in which only Raspberry Pi devices were used.
Modern-day proteomics generates ever more complex data, causing the requirements for storing and processing such data to outgrow the capacity of most desktop computers. To cope with the increased computational demands, distributed architectures have gained substantial popularity in recent years. In this review, we provide an overview of current techniques for distributed computing, along with examples of how these techniques are currently employed in the field of proteomics. We thus underline the benefits of distributed computing in proteomics, while also pointing out the potential issues and pitfalls involved.
We provide a comprehensive characterisation of the theoretical properties of the divide-and-conquer sequential Monte Carlo (DaC-SMC) algorithm. We firmly establish it as a well-founded method by showing that it possesses the same basic properties as conventional sequential Monte Carlo (SMC) algorithms do. In particular, we derive pertinent laws of large numbers, L^p inequalities, and central limit theorems; and we characterize the bias in the normalized estimates produced by the algorithm and argue the absence thereof in the unnormalized ones. We further consider its practical implementation and several interesting variants; obtain expressions for its globally and locally optimal intermediate targets, auxiliary measures, and proposal kernels; and show that, in comparable conditions, DaC-SMC proves more statistically efficient than its direct SMC analogue. We close the paper with a discussion of our results, open questions, and future research directions.
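The conventional SMC machinery this abstract compares against centers on importance weighting followed by resampling. As a hedged illustration only (this is not the paper's DaC-SMC algorithm, and the function name is ours), a minimal multinomial resampling step looks like:

```python
import random

def multinomial_resample(particles, weights, rng):
    """Draw len(particles) new particles with probability proportional to weights."""
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    cdf[-1] = 1.0  # guard against floating-point drift
    resampled = []
    for _ in range(len(particles)):
        u = rng.random()
        # pick the first particle whose cumulative weight reaches u
        for particle, c in zip(particles, cdf):
            if u <= c:
                resampled.append(particle)
                break
    return resampled
```

After resampling, weights are reset to uniform; DaC-SMC applies analogous reweight/resample steps when merging particle populations up its tree.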
This article introduces a refined algorithm designed for distributed nonparametric quantile regression in a reproducing kernel Hilbert space (RKHS). Unlike existing nonparametric approaches that primarily address homogeneous data, our approach uses kernel-based quantile regression to effectively model heterogeneous data. Moreover, we integrate the concepts of random features (RF) and communication-efficient surrogate likelihood (CSL) to ensure accurate estimation and enhance computational efficiency in distributed settings. Specifically, we employ an embedding technique to map the original data into RF spaces, enabling us to construct an extended surrogate loss function. This function can be locally optimized using an iterative alternating direction method of multipliers (ADMM) algorithm, minimizing the need for extensive computation and communication within the distributed system. The article thoroughly investigates the asymptotic properties of the distributed estimator and provides convergence rates of the excess risk. These properties are established under mild technical conditions and are comparable to state-of-the-art results in the literature. Additionally, we validate the effectiveness of the proposed algorithm through a comprehensive set of synthetic examples and a real data study, effectively highlighting its advantages and practical utility. Supplementary materials for this article are available online.
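Two of the ingredients named above can be sketched concretely (a toy illustration, not the article's estimator; all names and the scalar-input restriction are ours): the check (pinball) loss that defines quantile regression at level tau, and a random Fourier feature map approximating an RBF kernel.

```python
import math
import random

def pinball_loss(residual, tau):
    """Check loss: penalizes under- and over-prediction asymmetrically at quantile tau."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def rff_map(x, omegas, phases):
    """Random Fourier features for scalar x, approximating an RBF kernel."""
    d = len(omegas)
    return [math.sqrt(2.0 / d) * math.cos(w * x + b) for w, b in zip(omegas, phases)]

rng = random.Random(0)
omegas = [rng.gauss(0.0, 1.0) for _ in range(8)]          # frequencies from the kernel's spectral density
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(8)]
features = rff_map(0.5, omegas, phases)
```

Minimizing the average pinball loss of a linear model over such features recovers an approximate kernel quantile regression; the article's contribution is doing this distributively via a surrogate loss and ADMM.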
Echosounders are high-frequency sonar systems used to sense fish and zooplankton underwater. Their deployment on a variety of ocean observing platforms is generating vast amounts of data at an unprecedented speed from the oceans. Efficient and integrative analysis of these data, whether across different echosounder instruments or in combination with other oceanographic datasets, is crucial for understanding marine ecosystem response to the rapidly changing climate. Here we present Echopype, an open-source Python software library designed to address this need. By standardizing data as labeled, multi-dimensional arrays encoded in the widely embraced netCDF data model following a community convention, Echopype enhances the interoperability of echosounder data, making it easier to explore and use. By leveraging scientific Python libraries optimized for distributed computing, Echopype achieves computational scalability, enabling efficient processing in both local and cloud computing environments. Echopype's modularized package structure further provides a unified framework for expanding support for additional instrument raw data formats and incorporating new analysis functionalities. We plan to continue developing Echopype by supporting and collaborating with the echosounder user community, and envision that the growth of this package will catalyze the integration of echosounder data into broader regional and global ocean observation strategies.
Analysis of Big data to gain better insights has been the focus of researchers in the recent past. Traditional desktop computers or database management systems may not be suitable for efficient and timely analysis, due to the requirement of massive parallel processing. Distributed computing frameworks are being explored as a viable solution. For example, Google proposed MapReduce, which is becoming a de facto computing architecture for Big data solutions. However, scheduling in MapReduce is coarse-grained and remains a challenge for improvement. Regarding the MapReduce scheduler when configured over distributed clusters, we identify two issues: data locality disruption and random assignment of non-local map tasks. We propose a network-aware scheduler to extend the existing rack awareness. Tasks are scheduled in the order of node, rack, and any other rack within the same cluster to achieve cluster-level data locality. The issue of random assignment of non-local map tasks is handled by enhancing the scheduler to consider network parameters such as delay, bandwidth, and packet loss between remote clusters. As part of Big data analysis in computational biology, we consider two major data-intensive applications: indexing genome sequences and de novo assembly. Both of these applications deal with the massive amount of data generated from DNA sequencers. We developed a scalable algorithm to construct sub-trees of a suffix tree in parallel to address the huge memory requirements of indexing the human genome. For de novo assembly, we propose the Parallel Giraph-based Assembler (PGA) to address the challenges associated with assembling large genomes on commodity hardware. PGA uses the de Bruijn graph to represent the data generated from sequencers. Huge memory demands and performance expectations are addressed by developing parallel algorithms based on the distributed graph-processing framework Apache Giraph.
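The de Bruijn graph representation that PGA builds from sequencer reads can be sketched in a few lines (a toy in-memory construction, not PGA's distributed Giraph implementation):

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph
```

Assembly then amounts to walking Eulerian paths through this graph; a framework like Giraph partitions the (k-1)-mer nodes across workers so the graph need not fit in one machine's memory.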
In many areas, including precision medicine and financial investment, analysis of heterogeneous treatment effects has become important. In this paper, we focus on identifying subgroups by combining data in a distributed storage system. We propose a distributed algorithm based on the alternating direction method of multipliers, which preserves the privacy of subjects well. This method can deal with large-scale data and performs well in identifying subgroups as long as sufficient samples exist in the distributed storage system as a whole, though not necessarily in every computing node. Our numerical study indicates that the proposed method is promising in many interesting cases.
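The privacy-preserving structure of consensus ADMM can be illustrated on a toy quadratic loss (this is a minimal sketch under our own simplifications, not the paper's subgroup estimator): each node k holds private data a_k, only the summary x_k + u_k leaves the node, and all nodes agree on a consensus value z.

```python
def consensus_admm(local_data, rho=1.0, iters=100):
    """Consensus ADMM minimizing sum_k (x - a_k)^2, where node k privately holds a_k."""
    K = len(local_data)
    z = 0.0
    u = [0.0] * K
    for _ in range(iters):
        # local primal updates (closed form for this quadratic loss); raw a_k never leaves the node
        x = [(2.0 * a + rho * (z - uk)) / (2.0 + rho) for a, uk in zip(local_data, u)]
        # global averaging step: only the summaries x_k + u_k are communicated
        z = sum(xk + uk for xk, uk in zip(x, u)) / K
        # dual updates enforce agreement with the consensus variable
        u = [uk + xk - z for uk, xk in zip(u, x)]
    return z
```

For this loss the consensus value converges to the global mean of the distributed data, mimicking a pooled analysis without pooling the raw samples.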
In the era of artificial intelligence and big data, the demand for data processing has surged, leading to larger datasets and greater requirements on computation capability. Distributed machine learning (DML) has been introduced to address this challenge by distributing tasks among multiple workers, reducing the resources required for each worker. However, in distributed systems, the presence of slow machines, commonly known as stragglers, or of failed links can lead to prolonged runtimes and diminished performance. This survey explores the application of coding techniques in DML and coded edge computing in distributed systems to enhance system speed, robustness, privacy, and more. Notably, the study delves into coding in Federated Learning (FL), a specialized distributed learning system. Coding involves introducing redundancy into the system and identifying multicast opportunities, and there exists a tradeoff between computation and communication costs. The survey establishes that coding is a promising approach for building robust and secure distributed systems with low latency.
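The redundancy idea can be made concrete with the simplest possible code (our own illustration, not a scheme from the survey): append one parity task that is the elementwise sum of the k data tasks. If any single worker straggles, its partial result is recovered from the parity instead of waiting.

```python
def encode_tasks(parts):
    """(k+1, k) parity code: append the elementwise sum of the k data tasks."""
    parity = [sum(vals) for vals in zip(*parts)]
    return parts + [parity]

def recover_straggler(parity_result, finished_results):
    """Subtract the finished partial results from the parity to recover the missing one."""
    missing = list(parity_result)
    for result in finished_results:
        missing = [m - r for m, r in zip(missing, result)]
    return missing

# three data tasks plus one parity task dispatched to four workers
parts = [[1, 2], [3, 4], [5, 6]]
tasks = encode_tasks(parts)
```

The cost of tolerating one straggler is one extra worker's worth of computation, a small instance of the computation-communication tradeoff the survey discusses.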
Remote sensing data, whose dimensions increase exponentially and which turn into big data with new technologies, cause significant difficulties in transferring, storing, and processing because they consist of gigantic coarse-grained files. This article proposes a novel two-phase big data management system on a geo-distributed private cloud that takes advantage of network topology and resource utilization in a distributed manner. The system optimizes resource allocation to facilitate efficient and extensive data analysis for remote sensing applications by minimizing file fragmentation, resulting in faster analysis. To simulate the proposed system, different network topologies are created using virtual machines. Moreover, the proposed method, named performance-aware assignment, is compared with well-known methods such as random assignment, the Hungarian algorithm, and the Hadoop Distributed File System, also famous in the big data era. The experimental results indicate that performance-aware assignment outperforms random assignment, the Hungarian algorithm, and the Hadoop Distributed File System, storing 36%, 26%, and 71% more data, respectively, within the same time while also exhibiting lower IOPS values. In addition, it optimizes resource usage in data centers, which is particularly important for preventing resource exhaustion.
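The Hungarian-algorithm baseline mentioned above solves the minimum-cost assignment problem (e.g., assigning n files to n storage nodes given a cost matrix). As a hedged sketch of the problem itself, a brute-force solver for tiny instances (O(n!), purely illustrative; the Hungarian algorithm finds the same optimum in O(n^3)):

```python
from itertools import permutations

def min_cost_assignment(cost):
    """Exhaustively find the assignment of item i -> slot perm[i] minimizing total cost."""
    n = len(cost)
    best, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best is None or total < best:
            best, best_perm = total, perm
    return best, best_perm
```

A performance-aware variant would, per the article's description, fold node load and topology metrics into the cost matrix rather than change the solver.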
The concurrence of state-of-the-art Industrial 5G, Cyber-Physical Systems, Smart Systems, the Industrial Internet of Things, and Additive Manufacturing paves the way for next-level digital remodeling. However, this transformation unwittingly places an operational burden on smart-environment operators. The multiplicity and classes of IoT devices operating in an intelligent environment are myriad. Characterizing ingress network traffic and accurately classifying devices are necessary to manage the devices efficiently and to offer cutting-edge security solutions and Quality of Service (QoS). The paper addresses these challenges by offering a novel intelligent framework for traffic classification that leverages behavioral attributes of IoT traffic. The paper's contributions to the research community are fourfold. First, the paper proposes a novel IoT classification framework based on a Stack-Ensemble for real-time, high-volume IoT traffic. The experimental results indicate that the proposed Stack-Ensemble model can extract the best out of its base models, demonstrating an accuracy of 99.94%. The intelligent models are evaluated over multiple dimensions to project an isometric view of model performance, and performance metrics that researchers most often miss have been elucidated. Second, the paper characterizes the flow-level statistical properties of IoT devices. Third, the paper offers a distributed, scalable, and portable framework architecture. The architecture is horizontally scalable, distributing the computational load. The framework offers an end-to-end industry-grade machine-learning pipeline and overcomes the vulnerabilities of existing solutions. Finally, the paper discusses the statistical insights into the intelligent model and the results of the experimentation study. The proposed work paves the opportunities for researchers, smart-environment operators, and developers to unfold th
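The core idea of a stacking ensemble is to train a meta-learner on the base models' predictions rather than on the raw features. As a hedged toy sketch (not the paper's model; the rule-based base classifiers, the single flow-size feature, and the perceptron meta-learner are all invented for illustration):

```python
def train_meta(base_models, X, y, epochs=20, lr=0.1):
    """Fit a perceptron meta-learner on the base models' outputs (level-1 features)."""
    w = [0.0] * len(base_models)
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            z = [m(x) for m in base_models]                # stacked features
            pred = 1 if sum(wi * zi for wi, zi in zip(w, z)) + b >= 0 else 0
            err = label - pred                             # perceptron update rule
            w = [wi + lr * err * zi for wi, zi in zip(w, z)]
            b += lr * err
    return w, b

def stack_predict(base_models, w, b, x):
    z = [m(x) for m in base_models]
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b >= 0 else 0

# hypothetical rule-based base classifiers over a single flow-size feature
base = [lambda size: 1 if size >= 5 else 0,   # informative rule
        lambda size: 1]                       # uninformative rule
X, y = [1, 2, 8, 9], [0, 0, 1, 1]
w, b = train_meta(base, X, y)
```

The meta-learner learns to up-weight informative base models and suppress uninformative ones, which is the "extract the best out of base models" behavior the abstract describes.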