ISBN: 9781467365994 (print)
Data clustering is usually time-consuming because it must iteratively aggregate and process large volumes of data. Approximate aggregation based on samples provides fast results with assured quality. In this paper, we propose applying approximation techniques to data clustering to obtain a trade-off between clustering efficiency and result quality, along with online accuracy estimation. The proposed method is based on bootstrap trials. We implemented this method as an Intelligent Bootstrap Library (IBL) on Spark to support efficient data clustering. Extensive evaluations show that IBL provides a 2x speed-up over the state-of-the-art solution with the same error bound.
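The bootstrap idea mentioned in this abstract can be illustrated with a minimal sketch. This is not IBL or its Spark API: the function name `bootstrap_mean_error`, the choice of the mean as the aggregate, and the percentile interval are all assumptions made for illustration. It shows how resampling a sample with replacement yields an error bound on an approximate aggregate, such as one coordinate of a cluster centroid.

```python
import random
import statistics


def bootstrap_mean_error(sample, trials=200, confidence=0.95, seed=0):
    """Estimate the mean of `sample` plus a bootstrap percentile interval.

    Hypothetical helper for illustration only; not part of IBL.
    """
    rng = random.Random(seed)
    n = len(sample)
    estimate = statistics.fmean(sample)

    # Each trial resamples n points with replacement and recomputes the aggregate.
    trial_means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(trials)
    )

    # Percentile interval: cut off the tails of the sorted trial results.
    lo_idx = int((1 - confidence) / 2 * trials)
    hi_idx = min(trials - 1, int((1 + confidence) / 2 * trials))
    return estimate, (trial_means[lo_idx], trial_means[hi_idx])


if __name__ == "__main__":
    data = [float(i) for i in range(100)]
    mean, (lower, upper) = bootstrap_mean_error(data)
    print(f"mean={mean:.2f}, interval=({lower:.2f}, {upper:.2f})")
```

A clustering loop could, under these assumptions, stop enlarging the sample once the interval width falls below a user-chosen error bound.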
The computation core of many big data applications can be expressed as general matrix computations, including linear algebra operations and irregular matrix operations. However, existing parallel programming systems such as Spark lack a programming abstraction and an efficient implementation for general matrix computations. In this paper, we present MatrixMap, a unified and efficient data-parallel system for general matrix computations. MatrixMap provides a powerful yet simple abstraction, consisting of a distributed data structure called the bulk key matrix and a computation interface defined by matrix patterns. Users can easily load data into bulk key matrices and program algorithms as parallel matrix patterns. MatrixMap outperforms current state-of-the-art systems by employing three key techniques: matrix patterns with lambda functions for irregular and linear algebra matrix operations, an asynchronous computation pipeline with optimized data shuffling strategies for specific matrix patterns, and in-memory data structures that reuse data across iterations. Moreover, it automatically handles the parallelization and distributed execution of programs on a large cluster. The experimental results show that MatrixMap is 12 times faster than Spark.
The publish/subscribe (pub/sub) paradigm is a popular communication model for data dissemination in large-scale distributed systems. However, scalability comes with a contradiction between the delivery latency and the memory cost. On one hand, constructing a separate overlay per topic guarantees real-time dissemination, while the node degree rapidly increases with the number of topics. On the other hand, maintaining a bounded number of connections per node guarantees small memory cost, while each message has to traverse a large number of uninterested nodes before reaching the subscribers. In this paper, we propose Feverfew, a coverage-based hybrid overlay that disseminates messages to all subscribers without involving uninterested nodes, and whose average number of node connections grows slowly as the number of subscribers and topics increases. The major novelty of Feverfew lies in its heuristic coverage mechanism, implemented by combining a gossip-based sampling protocol with a probabilistic searching [...]. Based on the practical workload, our experimental results show that Feverfew significantly outperforms existing coverage-based overlays and DHT-based overlays in various dynamic network environments.
Sparse coding has shown great potential in learning image feature representations. Recently developed methods such as group sparse coding focus on discovering the group relationships among examples and have achieved state-of-the-art results in image classification. However, they suffer from poor robustness in practice. This paper proposes a robust weighted supervised sparse coding method (RWSSC) to address this deficiency. In particular, RWSSC distinguishes different classes' contributions to the sparse coding through a novel weighting strategy, while removing outliers by imposing l1-regularization over the noisy entries. Benefitting from these strategies, RWSSC can effectively boost the performance of sparse coding in image classification. In addition, we developed a block coordinate descent algorithm to optimize it and proved its convergence. Experimental results of image classification on two popular datasets show that RWSSC quantitatively outperforms representative sparse coding methods.
ISBN: 9781467364300 (print)
Coordination among users is inevitable in wireless communication for efficient medium access. Even though the data rate of individual users has increased significantly, the performance of wireless networks has not grown accordingly because of the high MAC coordination overhead. In this paper, we present VFA, namely virtual frame aggregation, to achieve high coordination efficiency by amortizing the overhead over multiple transmissions. VFA provides a novel way to construct a winner cluster and allows the winners to transmit without interruption. Specifically, in a multicarrier network, every contending node chooses a subcarrier and the nodes are ordered by the index of the chosen subcarrier. When some subcarriers are chosen by two or more nodes, an additional slot is used to reorder the collided nodes. Finally, all ordered nodes form a cluster and the transmissions are issued sequentially and without interruption. Simulation results show that two slots are usually enough to construct a sufficiently large winner cluster. Moreover, VFA achieves a notable throughput gain of up to 120% over IEEE 802.11, with better fairness under various scenarios.
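The two-slot ordering described in this abstract can be sketched as a small simulation. This is only an illustration under stated assumptions, not the authors' protocol: the function name `build_winner_cluster` is hypothetical, subcarriers are chosen uniformly at random, and nodes that collide again in the second slot are simply left out of the cluster (the abstract does not say how they are handled).

```python
import random
from collections import defaultdict


def build_winner_cluster(nodes, num_subcarriers, rng):
    """Return an ordered list of winners after at most two contention slots.

    Hypothetical sketch: uniform random subcarrier choice is an assumption.
    """
    # Slot 1: every contending node picks a subcarrier index.
    first_slot = defaultdict(list)
    for node in nodes:
        first_slot[rng.randrange(num_subcarriers)].append(node)

    cluster = []
    for subcarrier in sorted(first_slot):
        group = first_slot[subcarrier]
        if len(group) == 1:
            # Unique choice: the node takes its place in the order directly.
            cluster.append(group[0])
            continue

        # Slot 2: nodes that collided on this subcarrier choose again
        # and are reordered among themselves by the new index.
        second_slot = defaultdict(list)
        for node in group:
            second_slot[rng.randrange(num_subcarriers)].append(node)
        for index in sorted(second_slot):
            if len(second_slot[index]) == 1:
                cluster.append(second_slot[index][0])
            # Assumption: nodes colliding twice are excluded from this cluster.

    return cluster


if __name__ == "__main__":
    winners = build_winner_cluster(list(range(10)), 64, random.Random(1))
    print("transmission order:", winners)
```

The returned list is the order in which the winners would transmit back to back, which is how the contention overhead is amortized over several frames.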
The volume of malware is growing at an exponential rate nowadays. This huge growth makes it extremely hard to analyse malware manually. Most existing signature extraction methods are based on string signatures, and...
The performance of virtualized networks is critical to cloud applications. "Distributed line graphs" (DLG) are a universal technique for designing network topologies based on arbitrary regular graphs. In this paper we implement a prototype (a C library) for a DLG-enabled network (called DLG-Kautz), as an application-layer virtualized network service. The effectiveness of our design and implementation is demonstrated through prototype evaluations.
The contribution of parasitic bipolar amplification to SETs is experimentally verified using two P-hit target chains in the normal layout and in the special layout. For PMOSs in the normal layout, the single-event charge collection is composed of diffusion, drift, and the parasitic bipolar effect, while for PMOSs in the special layout, the parasitic bipolar junction transistor cannot turn on. Heavy ion experimental results show that PMOSs without parasitic bipolar amplification have a 21.4% decrease in the average SET pulse width and roughly a 40.2% reduction in the SET cross-section.
ISBN: 9781467370066 (print)
Sparse coding has shown great potential in learning image feature representations. Recently developed methods such as group sparse coding focus on discovering the group relationships among examples and have achieved state-of-the-art results in image classification. However, they suffer from poor robustness in practice. This paper proposes a robust weighted supervised sparse coding method (RWSSC) to address this deficiency. In particular, RWSSC distinguishes different classes' contributions to the sparse coding through a novel weighting strategy, while removing outliers by imposing l1-regularization over the noisy entries. Benefitting from these strategies, RWSSC can effectively boost the performance of sparse coding in image classification. In addition, we developed a block coordinate descent algorithm to optimize it and proved its convergence. Experimental results of image classification on two popular datasets show that RWSSC quantitatively outperforms representative sparse coding methods.
To utilize the shared last-level cache (LLC) in chip multi-processors (CMP) more efficiently, the partitioning of LLC resources among all cores should offer low access latency, fine migration granularity, and low hardware complexity. This paper proposes a dynamic LLC management scheme to achieve these goals. The proposed scheme migrates cache resources among different cores at the granularity of cache blocks instead of ways. The quantity of victim cache blocks that each victim core can migrate to other target cores is related to an eviction probability, which is calculated according to the performance goal. The victim cache blocks for a target core are then chosen from the nearest victim core that has a non-zero eviction probability, using an innovative E-Table structure introduced in the CMP. The eviction probabilities are updated periodically. With the help of E-Tables, the proposal achieves low-latency accesses by always keeping the required cache blocks near the target cores, and fine granularity is guaranteed by maintaining an eviction probability for each core. In addition, only minor hardware changes to the traditional cache structure are required. Simulation results suggest significant performance improvements of 6.8% to 22.7% over related work.
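The victim selection rule in this abstract (take blocks from the nearest victim core with a non-zero eviction probability) can be sketched as follows. The abstract does not describe the E-Table layout, so this sketch does not model it: the 2D mesh coordinates, the Manhattan distance metric, the tie-break by core id, and the name `nearest_victim` are all assumptions for illustration.

```python
def nearest_victim(target, positions, eviction_prob):
    """Pick the closest core (other than `target`) with eviction probability > 0.

    positions: dict core_id -> (x, y) on an assumed 2D mesh.
    eviction_prob: dict core_id -> probability in [0, 1].
    Returns a core id, or None if no core can give up blocks.
    """
    candidates = [
        core for core, prob in eviction_prob.items()
        if prob > 0 and core != target
    ]
    if not candidates:
        return None

    tx, ty = positions[target]

    def hop_distance(core):
        # Manhattan distance on the mesh (an assumed latency proxy).
        x, y = positions[core]
        return abs(x - tx) + abs(y - ty)

    # Ties on distance are broken by the lower core id (an assumption).
    return min(candidates, key=lambda core: (hop_distance(core), core))


if __name__ == "__main__":
    mesh = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}
    probs = {0: 0.0, 1: 0.0, 2: 0.3, 3: 0.5}
    print("victim for core 0:", nearest_victim(0, mesh, probs))
```

In a periodic update step, the probabilities in `eviction_prob` would be recomputed from the performance goal, and the same lookup would then steer block migration toward nearby victims.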