Preferential attachment drives the evolution of many complex networks. Its analytical studies mostly consider the simplest case of a network that grows uniformly in time despite the accelerating growth of many real networks. Motivated by the observation that the average degree growth of nodes is time invariant in empirical network data, we study the degree dynamics in the relevant class of network models where preferential attachment is combined with heterogeneous node fitness and aging. We propose an analytical framework based on the time invariance of the studied systems and show that it is self-consistent only for two special network growth forms: the uniform and the exponential network growth. Conversely, the breaking of such time invariance explains the winner-takes-all effect in some model settings, revealing the connection between the Bose-Einstein condensation in the Bianconi-Barabási model and similar gelation in superlinear preferential attachment. Aging is necessary to reproduce realistic node degree growth curves and can prevent the winner-takes-all effect under weak conditions. Our results are verified by extensive numerical simulations.
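For readers who want to experiment with this class of models, the following is a minimal Python sketch of network growth combining preferential attachment, heterogeneous node fitness, and aging. The power-law aging kernel, the uniform fitness distribution, and all parameter names are illustrative assumptions, not the paper's exact specification.

```python
import random

def grow_network(n_nodes=2000, m=2, aging_exp=1.0, seed=0):
    """Grow a network where node i attracts a new link with probability
    proportional to degree_i * fitness_i * (age_i)^(-aging_exp).

    Minimal illustration of preferential attachment with fitness and aging;
    the aging kernel and fitness distribution are assumptions."""
    rng = random.Random(seed)
    fitness = [rng.random() for _ in range(n_nodes)]
    # small fully connected seed graph of m+1 nodes
    degree = [m] * (m + 1)
    birth = [0] * (m + 1)
    edges = [(i, j) for i in range(m + 1) for j in range(i)]

    for t in range(m + 1, n_nodes):
        # attachment weight of each existing node i at time t
        weights = [degree[i] * fitness[i] * (t - birth[i] + 1.0) ** (-aging_exp)
                   for i in range(t)]
        targets = set(rng.choices(range(t), weights=weights, k=m))
        degree.append(len(targets))
        birth.append(t)
        for j in targets:
            degree[j] += 1
            edges.append((t, j))
    return edges, degree, fitness

if __name__ == "__main__":
    edges, degree, fitness = grow_network()
    print("edges:", len(edges), "max degree:", max(degree))
```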
In optimization or machine learning problems we are given a set of items, usually points in some metric space, and the goal is to minimize or maximize an objective function over some space of candidate solutions. For example, in clustering problems, the input is a set of points in some metric space, and a common goal is to compute a set of centers in some other space (points, lines) that will minimize the sum of distances to these points. In database queries, we may need to compute such a sum for a specific query set of k centers. However, traditional algorithms cannot handle modern systems that require parallel real-time computations on infinite distributed streams from sensors such as GPS, audio or video that arrive at a cloud, or on networks of weaker devices such as smartphones or robots. A coreset is a "small data" summarization of the input "big data," where every possible query has approximately the same answer on both data sets. Generic techniques enable efficient coreset maintenance for streaming, distributed and dynamic data. Traditional algorithms can then be applied on these coresets to maintain the approximated optimal solutions. The challenge is to design coresets with a provable tradeoff between their size and approximation error. This survey summarizes such constructions in a retrospective way that aims to unify and simplify the state of the art. This article is categorized under: Algorithmic Development > Structure Discovery; Fundamental Concepts of Data and Knowledge > Big Data Mining; Technologies > Machine Learning; Algorithmic Development > Scalable Statistical Methods.
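As a minimal illustration of the coreset idea (a weighted subset on which any query set of centers is answered approximately as on the full data), the sketch below uses plain uniform sampling with weight n/m per sampled point. This is the simplest possible construction, not one of the provably size-optimal schemes the survey covers, and all names are illustrative.

```python
import numpy as np

def uniform_coreset(points, m, rng=None):
    """Return (sample, weights): m points sampled uniformly, each weighted
    n/m so that weighted sum-of-distances queries approximate the full data.
    The weakest coreset construction; used only to illustrate the idea."""
    rng = rng or np.random.default_rng(0)
    idx = rng.choice(len(points), size=m, replace=False)
    weights = np.full(m, len(points) / m)  # each sample stands in for n/m points
    return points[idx], weights

def cost(points, centers, weights=None):
    """(Weighted) sum of distances from each point to its nearest center."""
    d = np.min(np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2), axis=1)
    return float(np.sum(d if weights is None else weights * d))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    data = rng.normal(size=(100_000, 2))
    core, w = uniform_coreset(data, m=1_000, rng=rng)
    query = np.array([[0.0, 0.0], [2.0, 2.0]])  # a query set of k=2 centers
    print("full cost   :", cost(data, query))
    print("coreset cost:", cost(core, query, w))
```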
ISBN (print): 9781728125190
SSDs are increasingly used in data centers, and designing efficient software that exploits the benefits of fast SSD hardware is a critical issue. In this paper, we propose OCStore, an object store based on open-channel SSDs for distributed object storage systems. OCStore manages objects directly on raw flash memory, avoiding redundant functionality across the object store, the file system, and the FTL layers. It provides streamed transactional updates, which not only ensure multi-page atomicity by leveraging the non-overwrite nature of flash writes, but also provide isolation for independent I/O streams while enabling parallel accesses to different channels. OCStore also coordinates different channels to enable transaction-aware scheduling, so as to reduce transaction-level latency and provide low response times to distributed storage. We implement OCStore in the Linux kernel on real open-channel SSDs and evaluate it as an OSD in Ceph. Evaluations show that OCStore outperforms state-of-the-art object stores by 1.5x to 3.0x while providing much lower and more stable latencies, and reduces write traffic by up to 70% under heavy workloads.
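A toy Python sketch of the streamed-transactional-update idea is given below: data pages are appended to a non-overwrite channel first, and a single commit record then makes them visible, so a crash before the commit leaves only ignorable pages. All class and method names are hypothetical; this is not OCStore's actual interface or on-flash layout.

```python
class FlashChannel:
    """Toy model of one open-channel SSD channel: pages are append-only
    (non-overwrite), which is what makes multi-page atomicity cheap."""
    def __init__(self):
        self.pages = []  # appended (tag, payload) tuples

    def append(self, tag, payload):
        self.pages.append((tag, payload))
        return len(self.pages) - 1

class StreamedTransaction:
    """Sketch of a streamed transactional update: data pages of one object
    stream are appended first; a final commit record lists them and makes
    them visible. Names are illustrative, not OCStore's real API."""
    def __init__(self, channel, txid):
        self.channel, self.txid, self.page_ids = channel, txid, []

    def write(self, payload):
        self.page_ids.append(self.channel.append(("data", self.txid), payload))

    def commit(self):
        # recovery only trusts data pages referenced by a commit record
        self.channel.append(("commit", self.txid), tuple(self.page_ids))

if __name__ == "__main__":
    ch = FlashChannel()
    tx = StreamedTransaction(ch, txid=1)
    tx.write(b"object header")
    tx.write(b"object body")
    tx.commit()
    print(ch.pages)
```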
Solving ordinary differential equations (ODEs) on heterogeneous or multi-core/parallel embedded systems significantly increases the operational capacity of many sensing systems for processing tasks such as self-calibration, model-based measurement and self-diagnostics. The main challenge is usually the complexity of the processing task at hand, which requires more processing power than may be available to ensure real-time processing. A distributed solver involving multiple cores or nodes is therefore a valuable option. Speeding up the processing also affects the energy consumption of the sensor nodes involved. Several methods exist for solving differential equations on single processors, but most of them are not suitable for implementation on parallel (i.e., multi-core) systems because communication-related network delays between computing nodes become a serious bottleneck for solving such problems in a parallel computing context. Most of these difficulties stem from the very nature of differential equations: the calculation of the previous step must normally be completed before it can be used in the next step. Moreover, increasing performance (e.g., through larger step sizes) may decrease the accuracy of the calculations on parallel/multi-core systems such as GPUs. In this paper, we develop a new adaptive algorithm based on the Adams-Moulton and Parareal methods (which we call PAMCL) and compare it with the most relevant implementations/schemes, such as DOPRI5 and PAM. Our algorithm (PAMCL) shows very good performance (i.e., speed-up) compared to related competing algorithms while ensuring reasonable accuracy. For better usage of computing units/resources, the OpenCL platform is selected and the ODE solver algorithms are optimized to work on both GPUs and CPUs.
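The sketch below shows a plain Parareal iteration whose fine propagator is an Adams-Moulton (trapezoidal) corrector, roughly the combination the abstract names. The sub-step counts, the number of corrector iterations, and the sequential emulation of the parallelizable fine solves are illustrative assumptions; this is not PAMCL itself.

```python
import numpy as np

def coarse(f, t0, t1, u0):
    """Coarse propagator G: one explicit Euler step."""
    return u0 + (t1 - t0) * f(t0, u0)

def fine(f, t0, t1, u0, substeps=20):
    """Fine propagator F: order-2 Adams-Moulton (trapezoidal) steps, the
    implicit stage resolved by a few fixed-point corrector iterations."""
    h = (t1 - t0) / substeps
    u, t = u0, t0
    for _ in range(substeps):
        u_new = u + h * f(t, u)                       # explicit predictor
        for _ in range(3):                            # corrector iterations
            u_new = u + 0.5 * h * (f(t, u) + f(t + h, u_new))
        u, t = u_new, t + h
    return u

def parareal(f, u0, t_grid, iterations=3):
    """Plain Parareal: in a real multi-core/OpenCL setting the fine solves
    over each subinterval run in parallel; here they run sequentially."""
    n = len(t_grid) - 1
    U = [np.array(u0, dtype=float)]
    for i in range(n):                                # initial coarse sweep
        U.append(coarse(f, t_grid[i], t_grid[i + 1], U[i]))
    for _ in range(iterations):
        F = [fine(f, t_grid[i], t_grid[i + 1], U[i]) for i in range(n)]    # parallelizable
        G_old = [coarse(f, t_grid[i], t_grid[i + 1], U[i]) for i in range(n)]
        for i in range(n):                            # sequential correction sweep
            G_new = coarse(f, t_grid[i], t_grid[i + 1], U[i])
            U[i + 1] = G_new + F[i] - G_old[i]
    return U

if __name__ == "__main__":
    f = lambda t, u: -u                               # u' = -u, exact solution exp(-t)
    ts = np.linspace(0.0, 2.0, 9)
    U = parareal(f, [1.0], ts)
    print("parareal:", U[-1][0], " exact:", np.exp(-2.0))
```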
Stable local image feature detection is a fundamental problem in computer vision and is critical for obtaining corresponding interest points among images. As a popular and robust feature extraction algorithm, the scale-invariant feature transform (SIFT) is widely used in various domains, such as image stitching and remote sensing image registration. However, the computational complexity of SIFT is extremely high, which limits its application in real-time systems and large-scale data processing tasks. Thus, we propose several efficient optimizations to realize a high-performance SIFT (HartSift) by exploiting the computing resources of CPUs and GPUs in a heterogeneous machine. Our experimental results show that HartSift processes an image within 3.07~7.71 ms, which is 55.88~121.99 times, 5.17~6.88 times, and 1.25~1.79 times faster than OpenCV SIFT, SiftGPU, and CudaSift, respectively. (C) 2018 Elsevier Inc. All rights reserved.
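For context, the snippet below runs the baseline CPU SIFT from OpenCV, one of the reference implementations the abstract compares against; HartSift itself is not publicly packaged and is not shown here, and the random test image and parameters are placeholders.

```python
import time
import cv2
import numpy as np

# Baseline OpenCV SIFT run illustrating the workload HartSift accelerates:
# keypoint detection plus descriptor extraction on one grayscale image.
img = (np.random.rand(480, 640) * 255).astype(np.uint8)  # stand-in for a real image

sift = cv2.SIFT_create(nfeatures=2000)
t0 = time.perf_counter()
keypoints, descriptors = sift.detectAndCompute(img, None)
elapsed_ms = (time.perf_counter() - t0) * 1000.0

print(f"{len(keypoints)} keypoints in {elapsed_ms:.1f} ms; "
      f"descriptors: {None if descriptors is None else descriptors.shape}")
```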
Existing GPU-based approaches cannot yet meet the performance requirements for training very large convolutional neural networks (CNNs), where convolutional layers (Conv-layers) dominate the training time. In this paper, we find that no single convolution policy can always perform the fastest across all the computing phases. We therefore propose an approach called HyConv to accelerate multi-phase CNN computation by fine-grained policy selection. HyConv encapsulates existing convolution policies into a set of modules and selects the fastest policy (a.k.a. the winner policy) via one round of runtime measurement for computing each phase. Furthermore, HyConv uses a winner database to record the current winner policies, avoiding duplicate measurements later for the same parameter configuration. Our experimental results indicate that across all the real-world CNNs used, HyConv consistently outperforms existing approaches on both a single GPU and four GPUs, with speedups over cuDNN-MM of up to 3.3x and up to 1.6x, respectively. This improvement is explained by our finding that HyConv delivers noticeably better performance for most single Conv-layers. Furthermore, HyConv can work with any parameter configuration and thus offers better usability.
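The sketch below mimics the selection mechanism the abstract describes, i.e. one-round measurement of candidate policies and a winner database keyed by the parameter configuration, using two toy 1-D convolution policies; the policies, keys, and module names are illustrative, not HyConv's actual code.

```python
import time
import numpy as np

WINNER_DB = {}  # parameter configuration -> name of the fastest policy measured

def conv_direct(x, k):
    """Naive sliding-window convolution ('valid' output)."""
    h = np.empty(len(x) - len(k) + 1)
    for i in range(len(h)):
        h[i] = np.dot(x[i:i + len(k)], k[::-1])
    return h

def conv_fft(x, k):
    """FFT-based convolution, truncated to the same 'valid' output."""
    n = len(x) + len(k) - 1
    full = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)
    return full[len(k) - 1:len(x)]

POLICIES = {"direct": conv_direct, "fft": conv_fft}

def convolve(x, k):
    """Pick the fastest policy for this shape via one round of measurement
    and cache the winner, loosely mirroring HyConv's winner database."""
    key = (len(x), len(k))
    if key not in WINNER_DB:
        timings = {}
        for name, fn in POLICIES.items():
            t0 = time.perf_counter()
            fn(x, k)
            timings[name] = time.perf_counter() - t0
        WINNER_DB[key] = min(timings, key=timings.get)
    return POLICIES[WINNER_DB[key]](x, k)

if __name__ == "__main__":
    x, k = np.random.rand(4096), np.random.rand(64)
    y = convolve(x, k)
    print("winner per shape:", WINNER_DB, "output length:", len(y))
```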
The analysis of systems that can be modeled as networks of interacting entities is becoming often more important in different research fields. The application of machine learning algorithms, like prediction tasks over...
Text stream classification is an important problem that is difficult to solve at scale. Batch processing systems, widely adopted for text classification tasks, cannot provide for low latency. Distributed stream proces...
ISBN (print): 9781728159874
This paper proposes priority- and weight-based steal strategies for an idle worker (thief) to select a victim worker in work-stealing frameworks. Typical work-stealing frameworks employ uniformly random victim selection. We implemented the proposed strategies on a work-stealing framework called Tascell; Tascell programmers can let each worker estimate and declare, as a real number, the amount of remaining work required to complete its current task, so that the declared values are used as priorities or weights in the enhanced Tascell framework. To reduce the total task-division cost, the proposed strategies avoid stealing small tasks. With the priority-based strategy, a thief selects the victim that has the highest known priority at that point in time. With the weight-based non-uniformly random strategy, a thief uses the relative weights of victim candidates as their selection probabilities. The proposed selection strategies outperformed uniformly random victim selection. Our evaluation uses a parallel implementation of the "highly serial" version of the Barnes-Hut force-calculation algorithm in a shared memory environment and five benchmark programs in a distributed memory environment.
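A small Python sketch of the two proposed victim-selection strategies follows: "priority" picks the worker with the largest declared amount of remaining work, while "weight" samples victims with probability proportional to those declared values. The minimum-work threshold used to skip small tasks is an assumed placeholder, and none of this is Tascell code.

```python
import random

def choose_victim(candidates, strategy="weight", rng=random, min_work=1.0):
    """Select a victim for an idle worker (thief).

    candidates maps worker id -> declared remaining work (a real number).
    Workers declaring less than min_work are skipped to avoid stealing
    small tasks; the threshold value is an illustrative assumption."""
    eligible = {w: v for w, v in candidates.items() if v >= min_work}
    if not eligible:
        return None
    if strategy == "priority":
        # highest known priority (largest declared remaining work)
        return max(eligible, key=eligible.get)
    # weight-based non-uniformly random selection
    workers = list(eligible)
    return rng.choices(workers, weights=[eligible[w] for w in workers], k=1)[0]

if __name__ == "__main__":
    declared = {"w0": 0.2, "w1": 5.0, "w2": 12.5, "w3": 3.1}
    print("priority:", choose_victim(declared, "priority"))
    print("weighted:", [choose_victim(declared, "weight") for _ in range(5)])
```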
Distinct sampling is fundamental for computing statistics (e.g., the age and gender distribution of distinct users accessing a particular website) that depend on the set of distinct keys (e.g., user IDs) in a large, high-speed data stream such as a sequence of key-update pairs. However, the major shortcoming of existing methods is the high computational cost incurred by determining whether each incoming key in the data stream is currently in the set of sampled keys and by keeping track of the sampled keys' update aggregations. To address this challenge, we develop a new method, random projection and eviction (RPE), that uses a list of buckets to continuously sample distinct keys and their update aggregations. RPE processes each key-update pair with small and nearly constant time complexity O(1). Besides centralized data streams, we also develop a novel method, DRPE, to deal with distributed data streams consisting of key-update pairs observed at multiple distributed sites. We conduct extensive experiments on real-world datasets, and the results demonstrate that RPE and DRPE reduce the memory, computational, and message costs of state-of-the-art methods severalfold.
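Since the abstract does not spell out RPE's bucket structure, the sketch below only illustrates the general idea of sampling distinct keys while aggregating their updates, using a bottom-k hash-rank sample with eviction; the ranking and eviction scheme are assumptions, and the per-pair cost here is O(log m) rather than RPE's nearly constant O(1).

```python
import hashlib
import heapq

class DistinctSampler:
    """Bottom-k style distinct sampler with per-key update aggregation: each
    key is hashed to a fixed pseudo-random rank, and only the m distinct keys
    with the smallest ranks are kept, together with the sum of their updates.
    Illustrative only; this is not RPE's bucket-based structure."""
    def __init__(self, m):
        self.m = m
        self.agg = {}    # key -> aggregated update value
        self.heap = []   # max-heap of sampled keys, stored as (-rank, key)

    @staticmethod
    def _rank(key):
        return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")

    def update(self, key, value):
        if key in self.agg:                   # already sampled: just aggregate
            self.agg[key] += value
            return
        rank = self._rank(key)
        if len(self.agg) < self.m:
            self.agg[key] = value
            heapq.heappush(self.heap, (-rank, key))
        elif rank < -self.heap[0][0]:         # evict the key with the largest rank
            _, evicted = heapq.heapreplace(self.heap, (-rank, key))
            del self.agg[evicted]
            self.agg[key] = value

if __name__ == "__main__":
    s = DistinctSampler(m=100)
    for i in range(10_000):
        s.update(f"user{i % 1_000}", 1)       # 1,000 distinct keys, repeated updates
    print(len(s.agg), "sampled keys; example aggregate:", next(iter(s.agg.items())))
```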