Due to rising cyber threats, the security vulnerabilities of IoT devices are expanding. However, these devices cannot run complicated security algorithms locally due to hardware restrictions. Data must be transferred to cloud nodes for processing, giving attackers an entry point. This research investigates distributed computing at the network edge, using AI-enabled IoT devices and container orchestration tools to process data in real time. The purpose is to identify and mitigate DDoS attacks while minimizing CPU usage to improve security. It compares typical IoT devices with and without AI-enabled chips and container orchestration, and assesses their performance in running machine learning models under different cluster settings. The proposed architecture aims to empower IoT devices to process data locally, minimizing reliance on cloud transmission and bolstering security in IoT environments. The results reflect the architectural change: with the addition of AI-enabled IoT devices and container orchestration, there is a 60% difference between the new architecture and the traditional architecture, in which only Raspberry Pi devices were used.
Modern-day proteomics generates ever more complex data, causing the requirements for storing and processing such data to outgrow the capacity of most desktop computers. To cope with the increased computational demands, distributed architectures have gained substantial popularity in recent years. In this review, we provide an overview of current techniques for distributed computing, along with examples of how these techniques are currently employed in the field of proteomics. We thus underline the benefits of distributed computing in proteomics, while also pointing out the potential issues and pitfalls involved.
We provide a comprehensive characterisation of the theoretical properties of the divide-and-conquer sequential Monte Carlo (DaC-SMC) algorithm. We firmly establish it as a well-founded method by showing that it possesses the same basic properties as conventional sequential Monte Carlo (SMC) algorithms do. In particular, we derive pertinent laws of large numbers, L^p inequalities, and central limit theorems; and we characterize the bias in the normalized estimates produced by the algorithm and argue the absence thereof in the unnormalized ones. We further consider its practical implementation and several interesting variants; obtain expressions for its globally and locally optimal intermediate targets, auxiliary measures, and proposal kernels; and show that, in comparable conditions, DaC-SMC proves more statistically efficient than its direct SMC analogue. We close the paper with a discussion of our results, open questions, and future research directions.
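The conventional SMC machinery this abstract compares against centers on importance weighting followed by resampling. As a hedged illustration only (this is not the paper's DaC-SMC algorithm, and the function name is ours), a minimal multinomial resampling step looks like:

```python
import random

def multinomial_resample(particles, weights, rng):
    """Draw len(particles) new particles with probability proportional to weights."""
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    cdf[-1] = 1.0  # guard against floating-point drift
    resampled = []
    for _ in range(len(particles)):
        u = rng.random()
        # pick the first particle whose cumulative weight reaches u
        for particle, c in zip(particles, cdf):
            if u <= c:
                resampled.append(particle)
                break
    return resampled
```

After resampling, weights are reset to uniform; DaC-SMC applies analogous reweight/resample steps when merging particle populations up its tree.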
This article introduces a refined algorithm designed for distributed nonparametric quantile regression in a reproducing kernel Hilbert space (RKHS). Unlike existing nonparametric approaches that primarily address homogeneous data, our approach uses kernel-based quantile regression to effectively model heterogeneous data. Moreover, we integrate the concepts of random features (RF) and communication-efficient surrogate likelihood (CSL) to ensure accurate estimation and enhance computational efficiency in distributed settings. Specifically, we employ an embedding technique to map the original data into RF spaces, enabling us to construct an extended surrogate loss function. This function can be locally optimized using an iterative alternating direction method of multipliers (ADMM) algorithm, minimizing the need for extensive computation and communication within the distributed system. The article thoroughly investigates the asymptotic properties of the distributed estimator and provides convergence rates of the excess risk. These properties are established under mild technical conditions and are comparable to state-of-the-art results in the literature. Additionally, we validate the effectiveness of the proposed algorithm through a comprehensive set of synthetic examples and a real data study, effectively highlighting its advantages and practical utility. Supplementary materials for this article are available online.
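Two of the ingredients named above can be sketched concretely (a toy illustration, not the article's estimator; all names and the scalar-input restriction are ours): the check (pinball) loss that defines quantile regression at level tau, and a random Fourier feature map approximating an RBF kernel.

```python
import math
import random

def pinball_loss(residual, tau):
    """Check loss: penalizes under- and over-prediction asymmetrically at quantile tau."""
    return tau * residual if residual >= 0 else (tau - 1.0) * residual

def rff_map(x, omegas, phases):
    """Random Fourier features for scalar x, approximating an RBF kernel."""
    d = len(omegas)
    return [math.sqrt(2.0 / d) * math.cos(w * x + b) for w, b in zip(omegas, phases)]

rng = random.Random(0)
omegas = [rng.gauss(0.0, 1.0) for _ in range(8)]          # frequencies from the kernel's spectral density
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(8)]
features = rff_map(0.5, omegas, phases)
```

Minimizing the average pinball loss of a linear model over such features recovers an approximate kernel quantile regression; the article's contribution is doing this distributively via a surrogate loss and ADMM.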
Echosounders are high-frequency sonar systems used to sense fish and zooplankton underwater. Their deployment on a variety of ocean observing platforms is generating vast amounts of data at an unprecedented speed from the oceans. Efficient and integrative analysis of these data, whether across different echosounder instruments or in combination with other oceanographic datasets, is crucial for understanding marine ecosystem response to the rapidly changing climate. Here we present Echopype, an open-source Python software library designed to address this need. By standardizing data as labeled, multi-dimensional arrays encoded in the widely embraced netCDF data model following a community convention, Echopype enhances the interoperability of echosounder data, making it easier to explore and use. By leveraging scientific Python libraries optimized for distributed computing, Echopype achieves computational scalability, enabling efficient processing in both local and cloud computing environments. Echopype's modularized package structure further provides a unified framework for expanding support for additional instrument raw data formats and incorporating new analysis functionalities. We plan to continue developing Echopype by supporting and collaborating with the echosounder user community, and envision that the growth of this package will catalyze the integration of echosounder data into broader regional and global ocean observation strategies.
Analysis of Big data to gain better insights has been the focus of researchers in the recent past. Traditional desktop computers or database management systems may not be suitable for efficient and timely analysis, due to the requirement of massive parallel processing. Distributed computing frameworks are being explored as a viable solution. For example, Google proposed MapReduce, which is becoming a de facto computing architecture for Big data solutions. However, scheduling in MapReduce is coarse-grained and remains a challenge for improvement. Regarding the MapReduce scheduler when configured over distributed clusters, we identify two issues: data locality disruption and random assignment of non-local map tasks. We propose a network-aware scheduler to extend the existing rack awareness. Tasks are scheduled in the order of node, rack, and any other rack within the same cluster to achieve cluster-level data locality. The issue of random assignment of non-local map tasks is handled by enhancing the scheduler to consider network parameters such as delay, bandwidth, and packet loss between remote clusters. As part of Big data analysis in computational biology, we consider two major data-intensive applications: indexing genome sequences and de novo assembly. Both of these applications deal with the massive amount of data generated from DNA sequencers. We developed a scalable algorithm to construct sub-trees of a suffix tree in parallel to address the huge memory requirements of indexing the human genome. For de novo assembly, we propose the Parallel Giraph-based Assembler (PGA) to address the challenges associated with assembling large genomes on commodity hardware. PGA uses the de Bruijn graph to represent the data generated from sequencers. Huge memory demands and performance expectations are addressed by developing parallel algorithms based on the distributed graph-processing framework Apache Giraph.
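The de Bruijn graph representation that PGA builds from sequencer reads can be sketched in a few lines (a toy in-memory construction, not PGA's distributed Giraph implementation):

```python
from collections import defaultdict

def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each k-mer in a read adds an edge prefix -> suffix."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])
    return graph
```

Assembly then amounts to walking Eulerian paths through this graph; a framework like Giraph partitions the (k-1)-mer nodes across workers so the graph need not fit in one machine's memory.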
In many areas, including precision medicine and financial investment, analysis of heterogeneous treatment effects has become important. In this paper, we focus on identifying subgroups by combining data in a distributed storage system. We propose a distributed algorithm based on the alternating direction method of multipliers, which preserves the privacy of subjects well. This method can deal with large-scale data and performs well in identifying subgroups as long as sufficient samples exist in the distributed storage system as a whole, though not necessarily in every computing node. Our numerical study indicates that the proposed method is promising in many interesting cases.
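The privacy-preserving structure of consensus ADMM can be illustrated on a toy quadratic loss (this is a minimal sketch under our own simplifications, not the paper's subgroup estimator): each node k holds private data a_k, only the summary x_k + u_k leaves the node, and all nodes agree on a consensus value z.

```python
def consensus_admm(local_data, rho=1.0, iters=100):
    """Consensus ADMM minimizing sum_k (x - a_k)^2, where node k privately holds a_k."""
    K = len(local_data)
    z = 0.0
    u = [0.0] * K
    for _ in range(iters):
        # local primal updates (closed form for this quadratic loss); raw a_k never leaves the node
        x = [(2.0 * a + rho * (z - uk)) / (2.0 + rho) for a, uk in zip(local_data, u)]
        # global averaging step: only the summaries x_k + u_k are communicated
        z = sum(xk + uk for xk, uk in zip(x, u)) / K
        # dual updates enforce agreement with the consensus variable
        u = [uk + xk - z for uk, xk in zip(u, x)]
    return z
```

For this loss the consensus value converges to the global mean of the distributed data, mimicking a pooled analysis without pooling the raw samples.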
In the era of artificial intelligence and big data, the demand for data processing has surged, leading to larger datasets and greater requirements on computation capability. Distributed machine learning (DML) has been introduced to address this challenge by distributing tasks among multiple workers, reducing the resources required for each worker. However, in distributed systems, the presence of slow machines, commonly known as stragglers, or of failed links can lead to prolonged runtimes and diminished performance. This survey explores the application of coding techniques in DML and coded edge computing in distributed systems to enhance system speed, robustness, privacy, and more. Notably, the study delves into coding in Federated Learning (FL), a specialized distributed learning system. Coding involves introducing redundancy into the system and identifying multicast opportunities, and there exists a tradeoff between computation and communication costs. The survey establishes that coding is a promising approach for building robust and secure distributed systems with low latency.
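The redundancy idea can be made concrete with the simplest possible code (our own illustration, not a scheme from the survey): append one parity task that is the elementwise sum of the k data tasks. If any single worker straggles, its partial result is recovered from the parity instead of waiting.

```python
def encode_tasks(parts):
    """(k+1, k) parity code: append the elementwise sum of the k data tasks."""
    parity = [sum(vals) for vals in zip(*parts)]
    return parts + [parity]

def recover_straggler(parity_result, finished_results):
    """Subtract the finished partial results from the parity to recover the missing one."""
    missing = list(parity_result)
    for result in finished_results:
        missing = [m - r for m, r in zip(missing, result)]
    return missing

# three data tasks plus one parity task dispatched to four workers
parts = [[1, 2], [3, 4], [5, 6]]
tasks = encode_tasks(parts)
```

The cost of tolerating one straggler is one extra worker's worth of computation, a small instance of the computation-communication tradeoff the survey discusses.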
Remote sensing data, whose dimensions increase exponentially and which turn into big data with new technologies, cause significant difficulties in transferring, storing, and processing because they consist of gigantic coarse-grained files. This article proposes a novel two-phase big data management system on a geo-distributed private cloud that takes advantage of network topology and resource utilization in a distributed manner. The system optimizes resource allocation to facilitate efficient and extensive data analysis for remote sensing applications by minimizing file fragmentation, resulting in faster analysis. To simulate the proposed system, different network topologies are created using virtual machines. Moreover, the proposed method, named performance-aware assignment, is compared with well-known methods such as random assignment, the Hungarian algorithm, and the Hadoop Distributed File System, also famous in the big data era. The experimental results indicate that performance-aware assignment outperforms random assignment, the Hungarian algorithm, and the Hadoop Distributed File System, storing 36%, 26%, and 71% more data, respectively, within the same time while also exhibiting lower IOPS values. In addition, it optimizes resource usage in data centers, which is particularly important for preventing resource exhaustion.
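The Hungarian-algorithm baseline mentioned above solves the minimum-cost assignment problem (e.g., assigning n files to n storage nodes given a cost matrix). As a hedged sketch of the problem itself, a brute-force solver for tiny instances (O(n!), purely illustrative; the Hungarian algorithm finds the same optimum in O(n^3)):

```python
from itertools import permutations

def min_cost_assignment(cost):
    """Exhaustively find the assignment of item i -> slot perm[i] minimizing total cost."""
    n = len(cost)
    best, best_perm = None, None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if best is None or total < best:
            best, best_perm = total, perm
    return best, best_perm
```

A performance-aware variant would, per the article's description, fold node load and topology metrics into the cost matrix rather than change the solver.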
The concurrence of state-of-the-art Industrial 5G, Cyber-Physical Systems, Smart Systems, the Industrial Internet of Things, and Additive Manufacturing paves the way for next-level digital remodeling. However, this transformation unwittingly places an operational burden on smart-environment operators. The multiplicity and classes of IoT devices operating in an intelligent environment are myriad. Characterizing ingress network traffic and accurately classifying devices are necessary to manage the devices efficiently and to offer cutting-edge security solutions and Quality of Service (QoS). The paper addresses these challenges by offering a novel intelligent framework for traffic classification that leverages behavioral attributes of IoT traffic. The paper's contributions to the research community are fourfold. First, the paper proposes a novel IoT classification framework based on a Stack-Ensemble for real-time, high-volume IoT traffic. The experimental results indicate that the proposed Stack-Ensemble model can extract the best out of its base models, demonstrating an accuracy of 99.94%. The intelligent models are evaluated over multiple dimensions to project an isometric view of model performance, and performance metrics that researchers most often miss have been elucidated. Second, the paper characterizes the flow-level statistical properties of IoT devices. Third, the paper offers a distributed, scalable, and portable framework architecture. The architecture is horizontally scalable, distributing the computational load. The framework offers an end-to-end industry-grade machine-learning pipeline and overcomes the vulnerabilities of existing solutions. Finally, the paper discusses the statistical insights into the intelligent model and the results of the experimentation study. The proposed work paves the opportunities for researchers, smart-environment operators, and developers to unfold th
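The core idea of a stacking ensemble is to train a meta-learner on the base models' predictions rather than on the raw features. As a hedged toy sketch (not the paper's model; the rule-based base classifiers, the single flow-size feature, and the perceptron meta-learner are all invented for illustration):

```python
def train_meta(base_models, X, y, epochs=20, lr=0.1):
    """Fit a perceptron meta-learner on the base models' outputs (level-1 features)."""
    w = [0.0] * len(base_models)
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            z = [m(x) for m in base_models]                # stacked features
            pred = 1 if sum(wi * zi for wi, zi in zip(w, z)) + b >= 0 else 0
            err = label - pred                             # perceptron update rule
            w = [wi + lr * err * zi for wi, zi in zip(w, z)]
            b += lr * err
    return w, b

def stack_predict(base_models, w, b, x):
    z = [m(x) for m in base_models]
    return 1 if sum(wi * zi for wi, zi in zip(w, z)) + b >= 0 else 0

# hypothetical rule-based base classifiers over a single flow-size feature
base = [lambda size: 1 if size >= 5 else 0,   # informative rule
        lambda size: 1]                       # uninformative rule
X, y = [1, 2, 8, 9], [0, 0, 1, 1]
w, b = train_meta(base, X, y)
```

The meta-learner learns to up-weight informative base models and suppress uninformative ones, which is the "extract the best out of base models" behavior the abstract describes.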