Recommender systems are an important tool for helping users find items of interest among massive amounts of user-generated content. Because user interests often change over time and content arrives in a streaming fashion, it is highly desirable to support real-time recommendation that adapts to changes in both user interests and content. If we represent both user interests and items as high-dimensional points in the same vector space, we can recommend to a user the k items that are that user's k nearest neighbors (kNN). The problem of real-time recommendation thus translates to computing the kNNs over the most recent items whenever the user's interests change. The main issue we tackle in this paper is therefore the efficient processing of high-dimensional kNN queries over a sliding window on data streams. In particular, we are interested in developing a scalable distributed solution that can handle the ever-increasing number of users and volume of data. We propose a new index structure, the dynamic bounded rings index (DBRI), to index the data points in data streams. The basic idea is to first find a set of pivots and assign each point to its nearest pivot to form subsets, and then to partition each subset into finer-grained bounded rings that can be dynamically adjusted as points change. The design of DBRI lends itself to easy adoption in a distributed setting. We further present the distributed high-dimensional kNN query algorithm (DHDKNN), based on DBRI, which aims to reduce both the communication and the computational cost of query processing. The experiments demonstrate that our algorithm scales well and significantly outperforms existing methods.
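The pivot-and-ring idea in this abstract can be illustrated with a small single-machine sketch. It is not the paper's DBRI or DHDKNN: the class name `RingIndex`, the fixed `ring_width`, and the pruning rule (a triangle-inequality lower bound per ring) are assumptions made for illustration, and ring boundaries here are static rather than dynamically adjusted.

```python
import heapq
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class RingIndex:
    """Toy pivot-and-ring index (hypothetical; not the paper's DBRI)."""

    def __init__(self, pivots, ring_width):
        self.pivots = list(pivots)
        self.ring_width = ring_width
        self.rings = {}  # (pivot id, ring id) -> list of points

    def _locate(self, point):
        # Nearest pivot, then the ring given by the distance to that pivot.
        pid = min(range(len(self.pivots)),
                  key=lambda i: dist(point, self.pivots[i]))
        rid = int(dist(point, self.pivots[pid]) // self.ring_width)
        return pid, rid

    def insert(self, point):
        self.rings.setdefault(self._locate(point), []).append(point)

    def remove(self, point):
        # Called when a point slides out of the window.
        key = self._locate(point)
        self.rings[key].remove(point)
        if not self.rings[key]:
            del self.rings[key]

    def knn(self, query, k):
        w = self.ring_width
        candidates = []
        for pid, rid in self.rings:
            d = dist(query, self.pivots[pid])
            # Triangle inequality: no point in this ring is closer than this.
            lower = max(0.0, d - (rid + 1) * w, rid * w - d)
            candidates.append((lower, pid, rid))
        candidates.sort()
        best = []  # max-heap of (-distance, point), at most k entries
        for lower, pid, rid in candidates:
            if len(best) == k and lower >= -best[0][0]:
                break  # every remaining ring is at least this far away
            for p in self.rings[(pid, rid)]:
                dp = dist(query, p)
                if len(best) < k:
                    heapq.heappush(best, (-dp, p))
                elif dp < -best[0][0]:
                    heapq.heapreplace(best, (-dp, p))
        return sorted((-nd, p) for nd, p in best)
```

Rings are visited in order of their lower bound, so the scan can stop as soon as the k-th best distance found so far is no larger than the bound of the next ring. In a distributed setting one could imagine placing the rings of different pivots on different workers, but that mapping is not specified in the abstract.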
ISBN: 9781538655559 (print)
The explosion of data-centric and data-dependent applications requires new storage devices, interfaces, and software stacks. Big data analytics solutions such as Hadoop, MapReduce and Spark have addressed the performance challenge by using a distributed architecture based on a new paradigm that relies on moving computation closer to data. In this paper, we describe a novel approach aimed at pushing the "move computation to data" paradigm to its ultimate limit by enabling highly efficient and flexible in-storage processing capability in solid state drives (SSDs). We have designed CompStor, an FPGA-based SSD that implements computational storage through a software stack (devices, protocol, interface, software, and systems) and dedicated hardware for in-storage processing, including a quad-core ARM processor subsystem. The dedicated hardware resources provide in-storage data analytics capability without degrading the performance of common storage device functions such as read, write and trim. Experimental results show up to 3X energy savings for some applications compared with the host CPU. To the best of our knowledge, the 24TB CompStor SSD is the first SSD capable of supporting in-storage computation running an operating system, enabling all types of applications and Linux shell commands to be executed in place with no modification.
ISBN: 9781728118680 (print)
A rapidly growing volume of electrophysiological signals is being generated for clinical research on neurological disorders. European Data Format (EDF) is a standard format for storing electrophysiological signals. However, existing signal analysis tools are bottlenecked on large-scale datasets by the sequential loading of large EDF files before any signal analysis can be performed. To overcome this, we develop Hadoop-EDF, a distributed signal processing tool that loads EDF data in parallel using Hadoop MapReduce. Hadoop-EDF uses a robust data partition algorithm that makes EDF data processable in parallel. We evaluate Hadoop-EDF's scalability and performance using two datasets from the National Sleep Research Resource and running experiments on Amazon Web Services clusters. On a 20-node cluster, Hadoop-EDF was about 26 times and 47 times faster than sequential processing of 200 small files and 200 large files, respectively. The results demonstrate that Hadoop-EDF is more suitable and effective in processing large EDF files.
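The abstract does not describe Hadoop-EDF's partition algorithm, so the following is only a sketch of one way a record-aligned split could be computed. It relies on the EDF layout (a 256-byte fixed header plus 256 bytes per signal, followed by fixed-size data records of 2-byte samples); the function names and the `target_split_bytes` parameter are hypothetical.

```python
def edf_header_bytes(num_signals):
    """EDF header size: 256 fixed bytes plus 256 bytes per signal."""
    return 256 + 256 * num_signals


def edf_record_bytes(samples_per_record):
    """Size of one data record; every EDF sample is a 2-byte integer."""
    return 2 * sum(samples_per_record)


def record_aligned_splits(samples_per_record, num_records, target_split_bytes):
    """Return (byte offset, byte length) pairs that never cut a data record.

    samples_per_record lists the number of samples each signal contributes
    to one data record. Illustrative only; not Hadoop-EDF's algorithm.
    """
    header = edf_header_bytes(len(samples_per_record))
    record = edf_record_bytes(samples_per_record)
    # At least one whole record per split, even if the target is smaller.
    records_per_split = max(1, target_split_bytes // record)
    splits = []
    for first in range(0, num_records, records_per_split):
        count = min(records_per_split, num_records - first)
        splits.append((header + first * record, count * record))
    return splits
```

Because each split starts and ends on a record boundary, a worker could decode its byte range using only the header, without seeing neighboring splits.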
ISBN: 9781538663844 (print)
Distributed processing and control are critical to supporting distributed intelligence and autonomy in multi-degree-of-freedom motion systems. Measurement and fusion of spatiotemporal physical quantities imply data-intensive computing and spatial distribution of computing resources to enable control and processing of large data sets from image and inertial sensors. We examine distributed and asynchronous processing nodes that process information independently, deriving partial solutions. Each individual agent contains multiple sensing-and-processing nodes. Each node comprises solid-state or MEMS multi-degree-of-freedom sensors with ASICs that process and fuse data. On-node computing supports distributed processing. Adaptive bottom-up organization ensures data aggregation and data management with operation on sub-samples or hashed sets of large source datasets. Cooperative distributed processing is essential in centralized, decentralized and behavioral coordination. In the centralized organization, a central processor may not ensure adequacy. A network of semi-autonomous on-device processing sensors may interact to solve specific tasks and validate solutions. Problem allocation, partitioning, coordination and other tasks are implemented using software- and hardware-supported algorithms and protocols. This paper contributes to the design of the next generation of systems with distributed multi-node processing capabilities.
ISBN: 9783319633121 (digital); 9783319633121, 9783319633114 (print)
With the widespread use of wireless networks and mobile devices, the scale of spatio-temporal data is increasing dramatically, and many real-world applications can be formulated as processing continuous queries over moving objects. Most existing work on this problem focuses on centralized search algorithms for range queries over a limited volume of objects, and these approaches scale poorly on a cluster of servers. In addition, existing approaches seldom handle the situation in which the locations of objects and queries change simultaneously. To address this challenge, we propose a distributed grid index and a distributed incremental search approach to handle concurrent continuous range queries over a very large number of moving objects. The distributed grid index can be deployed on a distributed computing framework to support real-time maintenance of moving objects. Further, we fully account for the case in which the locations of objects and queries change at the same time, and put forward a parallel search approach based on the publish/subscribe mechanism that incrementally computes the results of each continuous range query on a cluster of servers. Finally, we conduct extensive experiments to evaluate the performance of our proposal.
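As a rough single-process illustration of a grid index combined with publish/subscribe style updates, the sketch below lets each continuous range query subscribe to the grid cells its rectangle overlaps, so an object update only touches the queries subscribed to the object's old and new cells. The class `GridIndex`, its method names, and the rectangle format `(xmin, ymin, xmax, ymax)` are assumptions; the paper's distributed design is not described in the abstract.

```python
from collections import defaultdict


class GridIndex:
    """Toy grid index with cell subscriptions (hypothetical sketch)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.positions = {}                  # object id -> (x, y)
        self.cells = defaultdict(set)        # cell -> object ids
        self.queries = {}                    # query id -> rectangle
        self.subscribers = defaultdict(set)  # cell -> query ids
        self.results = {}                    # query id -> object ids

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def _cells_of(self, rect):
        xmin, ymin, xmax, ymax = rect
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        return [(cx, cy)
                for cx in range(cx0, cx1 + 1)
                for cy in range(cy0, cy1 + 1)]

    @staticmethod
    def _inside(rect, pos):
        xmin, ymin, xmax, ymax = rect
        return xmin <= pos[0] <= xmax and ymin <= pos[1] <= ymax

    def move_query(self, qid, rect):
        """Register a query or move it; its result is rebuilt from its cells."""
        if qid in self.queries:
            for cell in self._cells_of(self.queries[qid]):
                self.subscribers[cell].discard(qid)
        self.queries[qid] = rect
        hits = set()
        for cell in self._cells_of(rect):
            self.subscribers[cell].add(qid)
            hits |= {o for o in self.cells[cell]
                     if self._inside(rect, self.positions[o])}
        self.results[qid] = hits

    def move_object(self, oid, x, y):
        """Insert or move an object; update only the subscribed queries."""
        affected = set()
        old = self.positions.get(oid)
        if old is not None:
            old_cell = self._cell(*old)
            self.cells[old_cell].discard(oid)
            affected |= self.subscribers[old_cell]
        new_cell = self._cell(x, y)
        self.positions[oid] = (x, y)
        self.cells[new_cell].add(oid)
        affected |= self.subscribers[new_cell]
        for qid in affected:
            if self._inside(self.queries[qid], (x, y)):
                self.results[qid].add(oid)
            else:
                self.results[qid].discard(oid)
        return affected
```

In this sketch object moves are handled incrementally, while a moved query is simply recomputed from the cells it now covers. A distributed version might shard cells across servers and deliver the per-cell notifications through a message broker, but that is a design guess rather than something stated here.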
ISBN: 9781538618103 (print)
Studies of pulsars as cosmic sources require solving complex problems, from detecting pulsed radio emission to determining the main parameters of the pulses. The authors solve these problems using a hybrid processor with distributed processing. Broadband spectral analysis of the radiation is performed in analog form in a coherent acousto-optic Fourier processor (conveyor-sliding integral transform), whose output is additionally compressed in a multi-element photodetector operating in time delay and integration mode (here the photodetector acts as a discrete-analog processor). A further increase in the signal-to-noise ratio is obtained by accumulating the photodetector output data in a digital microprocessor, which also controls the hybrid processor as a whole. The report shows the results of numerical simulation of the arrangement in MATLAB. Also discussed is the possibility of using a two-channel acousto-optic interferometric processor to estimate the parameters of polarized pulsar radio emission.
ISBN: 9781538605547 (print)
Vehicular cloud processing is a new field of research that aims to exploit vehicles' onboard computational resources as part of a cooperative distributed cloud computing environment. In this paper, we propose a vehicular cloud network architecture in which a group of vehicles near a traffic light cluster and form a temporal vehicular cloud by aggregating their computational resources in that cluster. The goal of the proposed architecture is to minimize the processing and network power consumed in the data center of a cloud operator. To this end, arriving processing tasks are optimally assigned to the centralized cloud and/or the formed vehicular clouds, reducing the total power consumption of the centralized cloud by lowering its average processing workload and network traffic. Furthermore, task assignment among vehicular clouds is constrained by task completion time. The proposed system is analyzed using a mixed integer linear programming (MILP) model under two task assignment approaches: single task assignment and distributed task assignment. In the first approach, a task cannot be split among multiple clouds, while splitting is allowed in the second. The power consumption of the centralized cloud is reduced by 45% (first approach) and 60% (second approach) compared to the case where all tasks are assigned to the centralized cloud only. The higher power saving in the second approach comes from the ability of vehicular clouds to host more processing workload, on average 37% more, than under the single task assignment approach.
ISBN: 9781538691540 (print)
Datacubes provide a suitable paradigm for storing, accessing and processing large-scale, multi-dimensional spatio-temporal raster data. As hardware and distributed infrastructure, such as clouds and federations, become common, enabling datacubes to fully exploit the available capabilities is a crucial step in building scalable systems for complex query answering in real time. In this contribution, we describe an approach to distributed datacube processing that enables datacube engines (in our case, rasdaman) to analyze, for every incoming query, a large number of equivalent distributed execution variants and to pick an efficient one.
ISBN: 9781509063673 (print)
A method to determine the orientation and speed of a gravitational wave is described, based on a coordinate system oriented along the inter-detector unit vectors across a network of detectors that successively detected the wave in time. A set of normalized wave propagation constraints is determined for the successive wave traversal events through the detectors, based on the time of arrival at each detector, to estimate the orientation and speed of the wave. Distributed processing across detectors, based on cross-correlation of detector streams, is suggested to determine the differences in arrival times across detectors and the sequence of wave arrivals across the network. From the inter-arrival time information across detectors, the wave orientation and speed can be determined.
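The two steps named in this abstract, estimating arrival-time differences by cross-correlation and then recovering orientation and speed from them, can be sketched with a standard plane-wave timing model. This is a substitute formulation, not the paper's inter-detector coordinate construction: it assumes four detectors with non-coplanar positions, arrival times t_i = t_0 + (n · (r_i − r_0)) / v, and hypothetical function names throughout.

```python
import math


def best_lag(a, b, max_lag):
    """Lag in samples that maximizes sum(a[n - lag] * b[n]).

    A positive result means stream b sees the signal later than stream a.
    """
    def corr(lag):
        return sum(a[n - lag] * b[n]
                   for n in range(len(b)) if 0 <= n - lag < len(a))
    return max(range(-max_lag, max_lag + 1), key=corr)


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, y):
    """Solve a 3x3 linear system with Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("baselines are (nearly) coplanar")
    solution = []
    for col in range(3):
        mc = [list(row) for row in m]
        for r in range(3):
            mc[r][col] = y[r]
        solution.append(det3(mc) / d)
    return solution


def wave_direction_and_speed(positions, arrival_times):
    """Plane-wave fit from four detectors; returns (unit direction, speed).

    Solves (r_i - r_0) . s = t_i - t_0 for the slowness vector s = n / v.
    """
    r0, t0 = positions[0], arrival_times[0]
    baselines = [[p[k] - r0[k] for k in range(3)] for p in positions[1:4]]
    delays = [t - t0 for t in arrival_times[1:4]]
    s = solve3(baselines, delays)
    norm = math.sqrt(sum(x * x for x in s))
    return [x / norm for x in s], 1.0 / norm
```

A lag from `best_lag` becomes a time delay once divided by the sampling rate. With more than four detectors, a least-squares fit over all baselines would be the natural extension.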
We consider the problem of distributed representation of signals in sensor networks, where sensors exchange quantized information with their neighbors. The signals of interest are assumed to have a sparse representation in spectral graph dictionaries. We further model the spectral dictionaries as polynomials of the graph Laplacian operator. We first study the impact of quantization noise on the distributed computation of matrix-vector multiplications, such as the forward and adjoint operators, which are used in many classical signal processing tasks. It turns out that performance is clearly penalized by the quantization noise, whose impact depends directly on the structure of the spectral graph dictionary. Next, we focus on the problem of sparse signal representation and propose an algorithm to learn polynomial graph dictionaries that are both adapted to the graph signals of interest and robust to quantization noise. Simulation results show that the learned dictionaries are efficient in processing graph signals in sensor networks where bandwidth constraints impose quantization of the messages exchanged in the network.
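To make the role of quantization concrete, the sketch below applies a polynomial of the graph Laplacian, y = Σ_k α_k L^k x, using only neighbor-to-neighbor exchanges in which every transmitted value is rounded by a uniform quantizer. The choice of the combinatorial Laplacian L = D − A, the uniform quantizer, and all function names are assumptions; the abstract does not specify them, and the dictionary learning algorithm itself is not shown.

```python
def quantize(value, step):
    """Uniform quantizer; step == 0 means the value is sent unquantized."""
    return step * round(value / step) if step > 0 else value


def laplacian_step(neighbors, z, step):
    """One distributed product L z with L = D - A.

    neighbors[i] lists the nodes adjacent to node i. Each node uses its own
    exact value but only quantized copies of its neighbors' values.
    """
    out = []
    for i, nbrs in enumerate(neighbors):
        received = sum(quantize(z[j], step) for j in nbrs)
        out.append(len(nbrs) * z[i] - received)
    return out


def apply_polynomial(neighbors, coeffs, x, step=0.0):
    """Compute sum_k coeffs[k] * L^k x with one exchange round per power."""
    z = list(x)
    y = [coeffs[0] * v for v in z]
    for alpha in coeffs[1:]:
        z = laplacian_step(neighbors, z, step)
        y = [yi + alpha * zi for yi, zi in zip(y, z)]
    return y
```

Quantization error introduced in one round is multiplied by L again in every later round, which gives some intuition for why the abstract reports that the impact of the noise depends on the structure of the dictionary.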