We have proposed a Web-based sensor network constructed of Web-based sensor nodes and a remote management system. The Web-based sensor nodes consist of communication units and measurement devices with Web servers. The management system has intelligent processing and rule-based functions to manage the nodes flexibly via the Internet, and performs various image analyses easily through Web application services. By distributing the image analyses across Web application services, our proposed system provides versatile and scalable data processing. We demonstrated that it can realize the desired image analyses effectively and perform complicated management by changing its operations depending on the analysis results. (C) 2011 Elsevier B.V. All rights reserved.
ISBN (Print): 9781538621950
Over the past decade, much focus in the area of technology has shifted towards two relatively new areas: "The Internet of Things" and "Machine Learning". Although completely separate technologies, they have one major factor in common: data. The IoT paradigm relies on sensor devices to ingest data and gain valuable insight into their surrounding environment. Data is often considered the newest natural resource, and analysing it instantaneously can give companies a leading edge in their market. Machine learning algorithms are helping companies achieve this feat in the most efficient way possible. In this paper, we propose a governance architecture for dynamic distributed data mining, utilizing a model inspired by flow-based programming. We illustrate a collaborative protocol between edge devices and central controllers in which computation and distribution may be driven by factors including hardware limitations, latency, or energy consumption. Our proposed architecture is evaluated in a connected-vehicle use case. To demonstrate the feasibility of our work, we present two scenarios: local real-time prediction of driver alertness, and task/computation offloading based on the CPU usage of the edge device.
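The offloading decision described above can be sketched as a small placement policy. This is a hypothetical illustration (the function name, thresholds, and factors are my assumptions, not the paper's protocol): the edge device runs the computation locally unless its CPU usage crosses a threshold, and it only offloads when the latency budget still allows a round trip to the central controller.

```python
# Hypothetical sketch of a CPU/latency-driven placement policy.
# Names and default thresholds are illustrative assumptions.
def place_task(cpu_usage, latency_budget_ms,
               cpu_threshold=0.8, network_rtt_ms=40):
    if cpu_usage < cpu_threshold:
        return "edge"            # enough headroom: compute locally
    if network_rtt_ms <= latency_budget_ms:
        return "controller"      # overloaded: offload the computation
    return "edge"                # no latency budget to offload: stay local

# Real-time driver-alertness prediction tolerates little delay,
# so even an overloaded device may have to keep the task local.
print(place_task(0.95, 10))    # tight budget -> "edge"
print(place_task(0.95, 500))   # relaxed budget -> "controller"
```

In a real deployment the decision would also weigh energy consumption and hardware limits, as the abstract notes; the sketch shows only the shape of the protocol.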
ISBN (Print): 9781450360142
A growing number of domains (finance, seismology, internet-of-things, etc.) collect massive time series. When the number of series grows to the hundreds of millions or even billions, similarity queries become intractable on a single machine, and naive (quadratic) parallelization will not work well. We therefore need both efficient indexing and parallelization. We propose a demonstration of Spark-parSketch, a complete solution based on sketches/random projections to efficiently perform both the parallel indexing of large sets of time series and similarity search over them. Because our method is approximate, we explore the tradeoff between time and precision. A video showing the dynamics of the demonstration can be found at http://***/video/parSketchdemo_***.
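The core idea behind sketch/random-projection indexing can be shown in a few lines. This is a minimal single-machine sketch, not Spark-parSketch itself: each long series is projected onto a handful of random ±1 vectors, and distances between these short sketches approximate distances between the originals, which is what makes similarity search tractable at scale.

```python
import random
import math

def sketch(series, projections):
    """Project a time series onto each random +/-1 vector: a short sketch."""
    return [sum(x * r for x, r in zip(series, proj)) / math.sqrt(len(proj))
            for proj in projections]

def sketch_distance(a, b):
    """Distance between sketches approximates distance between series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

rng = random.Random(42)
n, k = 256, 8                      # series length, sketch size (k << n)
projections = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(k)]

s1 = [math.sin(i / 10) for i in range(n)]
s2 = [math.sin(i / 10) + 0.01 for i in range(n)]   # near-duplicate of s1
s3 = [rng.uniform(-1, 1) for _ in range(n)]        # unrelated noise

d12 = sketch_distance(sketch(s1, projections), sketch(s2, projections))
d13 = sketch_distance(sketch(s1, projections), sketch(s3, projections))
print(d12 < d13)  # similar series stay closer in sketch space
```

Because the method is approximate, the sketch size k controls the time/precision tradeoff the abstract mentions: larger k means better distance estimates but more work per comparison.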
ISBN (Print): 9781450317436
Distributed dataflow systems such as Apache Spark and Apache Flink are used to derive new insights from large datasets. While they efficiently execute concrete data processing workflows, expressed as dataflow graphs, they lack generic support for exploratory workflows: if a user is uncertain about the correct processing pipeline, e.g. in terms of data cleaning strategy or choice of model parameters, they must repeatedly submit modified jobs to the system. This, however, misses out on optimisation opportunities for exploratory workflows, both in terms of scheduling and memory allocation. We describe meta-dataflows (MDFs), a new model to effectively express exploratory workflows and efficiently execute them on compute clusters. With MDFs, users specify a family of dataflows using two primitives: (a) an explore operator automatically considers choices in a dataflow; and (b) a choose operator assesses the result quality of explored dataflow branches and selects a subset of the results. We propose optimisations to execute MDFs: a system can (i) avoid redundant computation when exploring branches by reusing intermediate results and discarding results from underperforming branches; and (ii) consider future data access patterns in the MDF when allocating cluster memory. Our evaluation shows that MDFs improve the runtime of exploratory workflows by up to 90% compared to sequential execution.
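The explore/choose pattern can be sketched in plain Python. These are hypothetical stand-ins for the MDF primitives, not the paper's API: `explore` runs one dataflow branch per combination of uncertain choices, and `choose` scores the branch results and keeps a subset, instead of the user resubmitting modified jobs by hand.

```python
from itertools import product

def explore(pipeline, choices):
    """Run one dataflow branch per combination of choice values."""
    keys = list(choices)
    return [(dict(zip(keys, combo)), pipeline(**dict(zip(keys, combo))))
            for combo in product(*(choices[k] for k in keys))]

def choose(branches, score, keep=1):
    """Rank explored branches by result quality; keep the best subset."""
    return sorted(branches, key=lambda b: score(b[1]), reverse=True)[:keep]

# Toy pipeline where the cleaning strategy and a model parameter
# are the uncertain choices.
data = [1.0, 2.0, None, 4.0, 100.0]

def pipeline(clean, scale):
    cleaned = [x for x in data if x is not None]
    if clean == "drop_outliers":
        cleaned = [x for x in cleaned if x < 50]
    return sum(x * scale for x in cleaned) / len(cleaned)

branches = explore(pipeline, {"clean": ["keep_all", "drop_outliers"],
                              "scale": [1.0, 0.5]})
best = choose(branches, score=lambda mean: -abs(mean - 2.0))
print(best[0][0])
```

A real MDF system gains over this sequential sketch exactly where the abstract says: shared intermediate results across branches and memory allocation informed by the whole family of dataflows.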
ISBN (Print): 9781538650356
The suffix array is the key to efficient solutions for myriad string processing problems in different application domains, such as data compression, data mining, and bioinformatics. With the rapid growth of available data, suffix array construction algorithms have to be adapted to advanced computational models such as external memory and distributed computing. In this article, we present five suffix array construction algorithms utilizing the new algorithmic big data batch processing framework Thrill, which allows scalable processing on distributed systems of input sizes that are orders of magnitude larger than have been considered before.
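For readers unfamiliar with the data structure, a suffix array is just the lexicographically sorted list of a string's suffix start positions. The naive construction below is for illustration only; scalable construction, the subject of the paper, needs algorithms such as prefix doubling or difference cover that can be expressed as batch operations over a distributed framework like Thrill.

```python
def suffix_array(s):
    """Naive O(n^2 log n) suffix array: sort suffix start positions
    by comparing the suffixes themselves. Illustration only."""
    return sorted(range(len(s)), key=lambda i: s[i:])

sa = suffix_array("banana")
# Suffixes in sorted order: a, ana, anana, banana, na, nana
print(sa)  # [5, 3, 1, 0, 4, 2]
```

Once built, the array supports binary search for any pattern in O(m log n) character comparisons, which is what makes it central to compression, mining, and bioinformatics applications.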
The rapid increase in the volume of Earth satellite observation data over recent years makes it ever more necessary to develop new technologies for effective data search, selection, and processing within very large, constantly updated distributed archives. The paper describes the features of such technologies developed at the Space Research Institute, Russian Academy of Sciences (IKI RAS). These techniques support the design of various data processing tools for satellite data analysis using the distributed computing resources of remote sensing data processing and archiving centers. Advantages and capabilities of the suggested approaches are described, as well as examples of implemented tools for distributed processing of data from various satellite remote sensing systems. The examples given show the capabilities of using the tools for the analysis of various atmospheric and ocean surface phenomena.
Large-scale graph and machine learning analytics widely employ distributed iterative processing. Typically, these analytics are part of a comprehensive workflow, which includes data preparation, model building, and model evaluation. General-purpose distributed dataflow frameworks execute all steps of such workflows holistically. This holistic view enables these systems to reason about and automatically optimize the entire pipeline. Graph and machine learning analytics are known to incur long runtimes, since they require multiple passes over the data until convergence is reached. Thus, fault tolerance and fast recovery from any intermittent failure are critical for efficient analysis. In this paper, we propose novel fault-tolerant mechanisms for graph and machine learning analytics that run on distributed dataflow systems. We seek to reduce checkpointing costs and shorten failure recovery times. For graph processing, rather than writing checkpoints that block downstream operators, our mechanism writes checkpoints in an unblocking manner that does not break pipelined tasks. In contrast to the conventional approach to unblocking checkpointing (which manages checkpoints independently for immutable datasets), we inject the checkpoints of mutable datasets into the iterative dataflow itself. Hence, our mechanism is iteration-aware by design. This simplifies the system architecture and facilitates coordinating checkpoint creation during iterative graph processing. Moreover, we are able to rebound rapidly, via confined recovery, by exploiting the fact that log files exist locally on healthy nodes, avoiding a complete recomputation from scratch. In addition, we propose replica recovery for machine learning algorithms, whereby we employ a broadcast variable that enables us to quickly recover without having to introduce any checkpoints. In order to evaluate our fault tolerance strategies, we conduct both a theoretical study and experimental analyses.
ISBN (Print): 9783319646350; 9783319646343
Manufacturing faces increasing requirements from customers, which creates the need to exploit emerging technologies and trends in order to preserve competitive advantages. The widely announced fourth industrial revolution (also known as Industry 4.0) is represented mainly by the employment of Internet technologies in industry. An essential requirement is a proper understanding of the data models of a given CPS (one of the key components of Industry 4.0), together with the utilization of knowledge coming from various systems across a factory as well as from external data sources. A suitable solution to the data integration problem is the employment of Semantic Web technologies and model descriptions in ontologies. However, one of the obstacles to wider use of Semantic Web technologies, including in the industrial automation domain, is the insufficient performance of available triplestores. Thus, for the so-called Semantic Big Data Historian use case, we propose the use of state-of-the-art distributed data storage. We discuss the approach to data storing and describe our proposed hybrid data model, which is suitable for representing time series (sensor measurements) with added semantics. Our results demonstrate a possible way to enable higher-performance distributed analysis of data from the industrial domain.
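One way to picture a hybrid model of this kind is shown below. This is my own minimal illustration, not the paper's schema: bulk sensor measurements are kept as compact per-sensor rows (fast range scans, no triplestore round trips), while the semantics linking sensors to ontology terms are kept as a small set of RDF-style triples (the `sosa:`/`qudt:` prefixes stand for the common sensor and unit ontologies).

```python
# Semantics as RDF-style (subject, predicate, object) triples.
semantics = [
    ("sensor:42", "rdf:type", "sosa:Sensor"),
    ("sensor:42", "sosa:observes", "prop:temperature"),
    ("prop:temperature", "qudt:unit", "unit:DEG_C"),
]

# Bulk measurements as compact per-sensor rows: (timestamp, value).
timeseries = {"sensor:42": [(1000, 21.5), (1060, 21.7), (1120, 21.6)]}

def observed_property(sensor):
    """Resolve a sensor's observed property from the triple set."""
    return next(o for s, p, o in semantics
                if s == sensor and p == "sosa:observes")

def values(sensor, t_from, t_to):
    """Range scan over the compact row; no triple lookups needed."""
    return [v for t, v in timeseries[sensor] if t_from <= t <= t_to]

print(observed_property("sensor:42"), values("sensor:42", 1000, 1100))
```

The point of the split is that the high-volume path (time series scans) never touches the triplestore, which addresses the performance obstacle the abstract raises, while the semantic layer stays queryable for integration.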
ISBN (Digital): 9783319629117
ISBN (Print): 9783319629117; 9783319629100
As a form of random set, belief functions come with specific semantics and combination rules able to perform the representation and fusion of uncertain and imprecise information. The development of new combination rules able to manage conflict between data now offers a variety of tools for the robust combination of pieces of data from a database. The computation of multiple combinations arising from many query cases in a database makes it necessary to develop efficient approaches for concurrent belief computation. The approach should be generic in order to handle a variety of fusion rules. We present a generic implementation based on the map-reduce paradigm. An enhancement of this implementation is then proposed by means of a Markovian decomposition of the rule definition. Finally, comparative results are presented for these implementations within the frameworks Apache Spark and Apache Flink.
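Why belief combination maps naturally onto map-reduce can be seen with Dempster's rule, the classical combination rule: it is binary and associative, so fusing many sources is a fold (`reduce`) over pairwise combinations. The sketch below is a plain single-process illustration of that structure, not the paper's Spark/Flink implementation; mass functions are dicts from focal sets to masses.

```python
from functools import reduce

def dempster(m1, m2):
    """Dempster's rule of combination for two mass functions
    (dict: frozenset of hypotheses -> mass)."""
    raw, conflict = {}, 0.0
    for b, mb in m1.items():
        for c, mc in m2.items():
            inter = b & c
            if inter:
                raw[inter] = raw.get(inter, 0.0) + mb * mc
            else:
                conflict += mb * mc          # mass on empty intersections
    norm = 1.0 - conflict                    # renormalize the rest
    return {a: v / norm for a, v in raw.items()}

# Each "mapped" source emits a mass function; the reduce step fuses them.
A, B = frozenset({"a"}), frozenset({"b"})
AB = A | B
sources = [
    {A: 0.6, AB: 0.4},
    {A: 0.5, B: 0.2, AB: 0.3},
    {AB: 1.0},          # vacuous evidence leaves the combination unchanged
]
fused = reduce(dempster, sources)
print(fused)
```

Other conflict-managing rules mentioned in the abstract (e.g. conjunctive combination without renormalization) slot into the same fold, which is what makes the map-reduce implementation generic over fusion rules.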
ISBN (Print): 9781509027712
With the increasing amount of available data, distributed data processing systems like Apache Flink and Apache Spark have emerged that allow large-scale datasets to be analyzed. However, such engines introduce significant computational overhead compared to non-distributed implementations. Therefore, the question arises when using a distributed processing approach is actually beneficial. This paper helps answer this question with an evaluation of the performance of the distributed data processing framework Apache Flink. In particular, we compare Apache Flink executed on up to 50 cluster nodes to single-threaded implementations executed on a typical laptop for three different benchmarks: TPC-H Query 10, Connected Components, and Gradient Descent. The evaluation shows that the performance of Apache Flink is highly problem-dependent, varying from early outperformance in the case of TPC-H Query 10 to slower runtimes in the case of Connected Components. The reported results give hints as to the problems, input sizes, and cluster resources for which using a distributed data processing system like Apache Flink or Apache Spark is sensible.
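To make the comparison concrete, a single-threaded laptop baseline for one of the three benchmarks, Connected Components, fits in a few lines. This union-find sketch is my own illustration of the kind of non-distributed implementation such an evaluation would compare against, not the paper's code; its lack of per-iteration coordination is exactly why a laptop can beat a cluster on modest graphs.

```python
def connected_components(n, edges):
    """Count connected components with union-find (path halving)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving compresses chains
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)           # union the two components
    return len({find(x) for x in range(n)})

# 6 vertices, components {0,1,2}, {3,4}, {5}
print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))  # 3
```

A distributed engine instead runs iterative label propagation with network shuffles each round; that per-iteration overhead is the plausible reason the abstract reports Connected Components as a case where Flink trails the single-threaded baseline.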