ISBN (print): 9781509045518
This paper considers the subspace clustering problem in a decentralized setting. The core algorithm finds directions of novelty in the span of the data to identify the membership of a collection of distributed data points. The low-rank structure of the full M1 × M2 data matrix D is exploited to substantially reduce the processing and communication overhead. Two decentralized designs are presented. In the first design, each agent/sensor sends a compressed version of its data vector to a central processing unit, which applies the clustering algorithm to the compressed data vectors. In the second design, only a small random subset of the agents send their compressed data vectors to the central unit, and the clustering algorithm is applied to the sampled compressed data vectors. It is shown that the gain in communication overhead of the first and second decentralized designs relative to the centralized solution is of order O(M1/r) and O(M1M2/r^2 + M2), respectively, where r is the rank of D.
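The first design's O(M1/r) communication gain can be illustrated with a minimal sketch (the random-projection compressor and all dimensions below are illustrative assumptions, not the paper's actual operator): each agent projects its M1-dimensional column of D down to r dimensions before transmitting.

```python
import numpy as np

rng = np.random.default_rng(0)
M1, M2, r = 100, 40, 4            # ambient dim, number of agents, rank of D

# Low-rank data: each agent i holds one column d_i of the M1 x M2 matrix D.
U = rng.standard_normal((M1, r))
V = rng.standard_normal((r, M2))
D = U @ V

# Shared random projection A (an assumed stand-in for the paper's compressor):
# each agent sends A @ d_i (r values) instead of d_i (M1 values).
A = rng.standard_normal((r, M1)) / np.sqrt(M1)
compressed = A @ D                # r x M2 matrix received by the central unit

# Per-agent communication drops from M1 to r values: a gain of order M1/r.
print(compressed.shape, M1 / r)   # (4, 40) 25.0
```

The second design samples a random subset of agents on top of this, which is where the additional O(M2) term in the gain comes from.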
ISBN (print): 9781467399562
The recent development of Information and Communications Technology (ICT) has contributed to an explosive growth of high-volume, high-velocity, and high-variety information assets. Consequently, the concept of Big Data has emerged as a widely recognized trend. To accommodate these data-intensive services, a sustainable and resilient network infrastructure is essential, and mission-critical transactions must be assured in big data environments. However, there is an inherent contradiction between providing service resilience through over-provisioning of network resources and achieving sustainable transmission by switching off unnecessary network elements to minimize overall energy consumption. In this paper, we propose a sustainable transmission network design (SustainMe) approach that enables routing algorithms to seek trade-off solutions between service resilience and energy efficiency. The simulation results confirm that the SustainMe approach is feasible and that its derived routing algorithm for shared backup protection, SustainMe_SBP, is a promising mechanism for resolving the above trade-off: it consumes much less capacity at the cost of only a small increase in energy expenditure compared with other approaches.
ISBN (print): 9781509053827
The kNN join problem, denoted by R ×_kNN S, is to find the k nearest neighbors in a given dataset S for each point in the query set R. It is an operation required by many big data applications. As large volumes of data are continuously generated in more and more real-life cases, we address the problem of monitoring kNN join results on data streams. Specifically, we are concerned with answering the kNN join periodically at each snapshot, which is called snapshot kNN join. Existing kNN join solutions mainly solve the problem on static datasets, or on a single centralized machine, and are difficult to scale to large data on data streams. In this paper, we propose to incrementally calculate the kNN join results at time t_i from the results at snapshot t_{i-1}. For the data continuously generated on the stream, the valid datasets of adjacent snapshots satisfy S_i = S_{i-1} + ΔS_i, where ΔS_i denotes the points updated between times t_{i-1} and t_i. Our basic idea is to first find the queries in R whose kNN results can be affected by the updated points in ΔS_i, and then update the kNN results of only this small subset of queries. In this way, we avoid recalculating the kNN join results over the whole dataset S_i at time t_i. We propose a MapReduce implementation of the search for affected query points to scale to large volumes of data. In brief, the mappers partition the datasets into groups, and the reducers search for affected queries separately within each group of points. Furthermore, we present enhanced data partitioning and grouping strategies to reduce the shuffling and computational costs. Extensive experiments on real-world datasets demonstrate that the proposed methods are efficient, robust, and scalable.
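The affected-query idea can be sketched on a single machine (function names and the two-dimensional points are illustrative; the paper's actual implementation runs in MapReduce):

```python
import math

def knn(q, S, k):
    """Plain k-nearest-neighbour list of q within S."""
    return sorted(S, key=lambda s: math.dist(q, s))[:k]

def incremental_knn_join(R, results, delta_S, k):
    """Update snapshot results after new points delta_S arrive.

    A query q is 'affected' only if some new point lies closer than its
    current k-th neighbour; only those results are merged and re-trimmed,
    avoiding a full recomputation over the whole dataset S_i."""
    for q in R:
        kth = math.dist(q, results[q][-1])            # current k-th distance
        close = [s for s in delta_S if math.dist(q, s) < kth]
        if close:
            results[q] = sorted(results[q] + close,
                                key=lambda s: math.dist(q, s))[:k]
    return results

# Snapshot t_{i-1}: full join; snapshot t_i: incremental update with ΔS_i.
R = [(0.0, 0.0)]
S = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
res = {q: knn(q, S, 2) for q in R}
res = incremental_knn_join(R, res, [(0.5, 0.0)], 2)
print(res[(0.0, 0.0)])   # [(0.0, 0.0), (0.5, 0.0)]
```

In the MapReduce version described above, this per-query affected test is what the reducers perform independently within each data group.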
ISBN (print): 9781467366816
This paper aims at realization of the Automated Teller Machine (ATM) network around the globe using IPv6, thereby reducing the complexity and the total number of transactions involved in the process of cash withdrawal. The major challenge in connecting the ATM network to the public domain, however, is security. Near-Field Communication (NFC) is proposed to be used, wherein the user, after inserting the ATM card, communicates only via an NFC-enabled mobile phone. Reserving the NFC spectrum band for the government is proposed to be made mandatory to ensure that no eavesdropping occurs. The proposed technique would help achieve a unified interface for ATM terminals among all financial institutions. A session is first established between a trusted third-party application and the ATM terminal, wherein the ATM card is checked for validity; after this validation, a session is established between the ATM terminal and the financial institution's server. The server then sends the registered mobile number of the swiped ATM card to the terminal, and the terminal invokes NFC with that mobile number. Thereafter, all communication is between the mobile phone and the server via the ATM terminal, ensuring authenticity and confidentiality. Using this technique, an ATM terminal can connect directly to any bank server in the world, greatly reducing network traffic and providing scope to include other non-financial services as offerings.
Microgrids are desired in remote areas, such as islands and underdeveloped countries. However, given the limited capacities of local energy generation and storage in such a community, it is extremely challenging for an isolated microgrid to balance power demand and generation in real time under dynamically changing energy demand. Meanwhile, more and more sensing devices (such as smart meters) are deployed in individual homes to monitor real-time energy data, which can help homes and the microgrid better schedule workload and generation. However, it is still difficult to conduct real-time distributed control due to unreliable sensing devices and unreliable communication between sensing devices and controllers. To address these issues, we designed a novel approach for the system to i) process the collected sensing data, ii) reconstruct the missing data caused by sensing errors or unreliable communication, and iii) predict future demand for real-time distributed control with missing data in extreme situations. The control center then decides the operation of the local generator, and each home decides the scheduling of the flexible workload of its appliances based on the collected and predicted data. We conducted extensive experiments and simulations with real-world energy consumption data from 100 homes over one year. The evaluation results show that our design can recover the missing data with more than 99% accuracy, and that our distributed control can balance power demand and generation in real time and reduce the operational cost by 23%.
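A minimal sketch of the data-reconstruction step, assuming a simple linear-interpolation stand-in for the paper's recovery method (which the abstract does not specify); gaps in a meter trace are marked with None:

```python
def fill_missing(trace):
    """Fill None gaps in an energy-meter trace by linear interpolation.

    Assumes at least one reading is present; leading/trailing gaps are
    filled with the nearest known value.  Illustrative stand-in only."""
    out = list(trace)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is None:
            j = i
            while j < n and out[j] is None:   # find end of the gap
                j += 1
            left = out[i - 1] if i > 0 else out[j]
            right = out[j] if j < n else out[i - 1]
            gap = j - i + 1
            for k in range(i, j):             # interpolate across the gap
                t = (k - i + 1) / gap
                out[k] = left + t * (right - left)
            i = j
        else:
            i += 1
    return out

print(fill_missing([1.0, None, 3.0]))   # [1.0, 2.0, 3.0]
```

The reconstructed trace would then feed the demand predictor and the distributed controller described above.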
ISBN (print): 9781467399562
The data explosion in the emerging big data era imposes a heavy burden on the network infrastructure and has urged the evolution of computer networks. By softwarizing traditional dedicated-hardware functions into virtualized network functions (VNFs) that can run on standard commodity servers, network function virtualization (NFV) technology promises increased networking efficiency, flexibility, and scalability. From the perspective of network service providers, given big data traffic volumes, one primary concern is the communication cost, which is highly influenced by the VNF placement strategy. In this paper, we investigate the communication-cost-efficient VNF placement problem for big data processing, with joint consideration of network flow balancing and the predetermined network service semantics. We formulate this problem as a mixed-integer linear program (MILP) and then propose a low-complexity relaxation-based heuristic algorithm. The high efficiency of our proposal is validated by extensive simulation studies.
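The placement objective can be illustrated with a toy exhaustive search (server capacities, flow balancing, and service semantics are omitted, and all names are illustrative; the paper solves the full MILP with a relaxation-based heuristic):

```python
import itertools

def place_chain(n_vnfs, n_servers, cost):
    """Exhaustively place an n_vnfs-long service chain onto servers so the
    total inter-VNF communication cost is minimised.

    cost[a][b] is the per-unit cost of sending traffic from server a to
    server b.  Toy stand-in for the MILP: feasible only for tiny instances,
    since the search space is n_servers ** n_vnfs."""
    best_cost, best_place = None, None
    for placement in itertools.product(range(n_servers), repeat=n_vnfs):
        c = sum(cost[placement[i]][placement[i + 1]]
                for i in range(n_vnfs - 1))
        if best_cost is None or c < best_cost:
            best_cost, best_place = c, placement
    return best_cost, best_place

# Two servers with an expensive cross-server link: the optimal placement
# collapses the whole chain onto a single server.
cost = [[0, 10],
        [10, 0]]
print(place_chain(3, 2, cost))   # (0, (0, 0, 0))
```

Real instances add capacity and flow-balancing constraints, which is exactly why the paper resorts to an MILP formulation and a relaxation heuristic rather than enumeration.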
Project Matsu is a collaboration between the Open Commons Consortium and NASA focused on developing open source technology for the cloud-based processing of Earth satellite imagery. A particular focus is the development of applications for detecting fires and floods to help support natural disaster detection and relief. Project Matsu has developed an open source cloud-based infrastructure to process, analyze, and reanalyze large collections of hyperspectral satellite image data using OpenStack, Hadoop, MapReduce, Storm, and related technologies. We describe a framework for efficient analysis of large amounts of data called the Matsu "Wheel." The Matsu Wheel is currently used to process incoming hyperspectral satellite data produced daily by NASA's Earth Observing-1 (EO-1) satellite. The framework is designed to support scanning queries using cloud computing applications such as Hadoop and Accumulo; a scanning query processes all, or most, of the data in a database or data repository. We also describe our preliminary Wheel analytics, including an anomaly detector for rare spectral signatures or thermal anomalies in hyperspectral data, and a land cover classifier that can be used for water and flood detection. Each of these analytics can generate visual reports accessible via the web for the public and interested decision makers. The resultant products of the analytics are also made accessible through an Open Geospatial Consortium (OGC)-compliant Web Map Service (WMS) for further distribution. The Matsu Wheel allows many shared data services to be performed together, making efficient use of resources for processing hyperspectral satellite image data and other large datasets, e.g., environmental data, that may be analyzed for many purposes.
ISBN (print): 9781509035144
This paper introduces GlobalFS, a POSIX-compliant geographically distributed file system. GlobalFS builds on two fundamental building blocks, an atomic multicast group communication abstraction and multiple instances of a single-site data store. We define four execution modes and show how all file system operations can be implemented with these modes while ensuring strong consistency and tolerating failures. We describe the GlobalFS prototype in detail and report on an extensive performance assessment. We have deployed GlobalFS across all EC2 regions and show that the system scales geographically, providing performance comparable to other state-of-the-art distributed file systems for local commands and allowing for strongly consistent operations over the whole system. The code of GlobalFS is available as open source.
ISBN (print): 9781509006700
A computer system has basically three types of resources: software, hardware, and data. Data is the most important resource, because whatever computing we do is done for the sake of data. Data science deals with large amounts of data, inferring knowledge from data sets and rationalizing information to achieve business value. Traditionally, information was in structured form, generally generated by business transactions, and was easily processed by traditional data management systems. In the past few years, the amount of digital data generated has grown exponentially. Much of this data is unstructured and cannot be processed and extracted efficiently by traditional systems; it includes text files, sensor data, log data, web data, social networking data, etc. The major source of this unstructured data is the variety of applications used via the Internet, e.g., smart devices, the web, mobile, social media, and sensor devices. Mining this data is essential for achieving business goals. This large volume of varied, unstructured data is called Big Data [1]. Various tools are available for processing such data; Hadoop is one of the most popular and efficient. Hadoop provides a framework for distributed computing that runs tasks in parallel so that such complex data can be processed efficiently with respect to time, performance, and resources. This paper covers the major resources used by a Hadoop cluster.
ISBN (print): 9781467376181
Smart grid solutions are increasingly making use of wireless communications as a core component for moving information to make more intelligent decisions. This paper considers a synchrophasor system implemented on a distribution feeder that uses two different wireless communications technologies to transmit synchrophasor measurement data simultaneously. This system was implemented to demonstrate advanced distributed generation control and its impacts in terms of grid support on the broader electrical feeder. A wireless serial communications solution and a 3G cellular solution were implemented, allowing a comparison of the performance of these different technologies. Because phasor measurement unit data are continuously streamed at up to 60 messages per second, these data provide a means to continuously evaluate the communications systems. Performance data were measured and archived over a one-week period, providing detailed information to compare the wireless communications technologies implemented. Using this information, recommendations are made in this paper about which wireless technology may be better suited for a variety of utility applications.