ISBN (print): 9781467371957
SKVM is a high-performance in-memory key-value (KV) store for multicore machines, designed for highly concurrent data access. Existing systems face several problems when handling highly concurrent data processing on multicore: lock contention, cache-coherency overhead, and large numbers of concurrent network connections. To solve these problems and make the in-memory KV store scale well on multicore, highly concurrent data access is divided into two steps: highly concurrent connection processing and highly concurrent data processing. The half-sync/half-async (HSHA) model is adopted to eliminate the network bottleneck and support large numbers of concurrent connections. Through data partitioning, lock contention is eliminated and cache movement is reduced. Furthermore, consistent hashing is adopted as the data-distribution strategy, which improves the scalability of the system on multicore. Though some of these ideas appear elsewhere, SKVM is the first to combine them. Experimental results show that SKVM achieves up to 2.4× higher throughput than Memcached and scales near-linearly with the number of cores under all tested workloads.
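The abstract does not include implementation details, but the partitioning idea is easy to illustrate. The following Python sketch (the names, such as `HashRing`, and the virtual-node count are assumptions, not SKVM's actual code) shows consistent hashing used to assign each key to a single owning core, so that per-core data can be accessed without locks and adding cores only remaps a fraction of the keys:

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit hash for placing keys and virtual nodes on the ring.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    """Consistent-hash ring mapping keys to per-core partitions."""

    def __init__(self, cores, vnodes=64):
        self._ring = []  # sorted list of (hash, core_id)
        for core in cores:
            for v in range(vnodes):
                self._ring.append((_hash(f"core{core}#{v}"), core))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    def core_for(self, key: str) -> int:
        # Walk clockwise to the first virtual node at or after the key's hash.
        i = bisect.bisect(self._hashes, _hash(key)) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(cores=range(4))
print(ring.core_for("user:42"))  # each key has exactly one owning core,
                                 # so no locks are needed on the data path
```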
Cloud is an emerging computing paradigm that has drawn extensive attention from both academia and industry, but its security issues are considered a critical obstacle to its rapid development. When data owners store their data as plaintext in the cloud, they lose control over its security because the data becomes arbitrarily accessible, in particular by the untrusted cloud provider. To protect the confidentiality of data owners' cloud data, a promising idea is for data owners to encrypt the data before storing it in the cloud. However, straightforward use of traditional encryption algorithms does not solve the problem well, since it is hard for data owners to manage their private keys if they want to share their cloud data securely with others in a fine-grained manner. In this paper, we propose a fine-grained and heterogeneous proxy re-encryption (FH-PRE) system to protect the confidentiality of data owners' cloud data. By applying the FH-PRE system in the cloud, data owners' data can be securely stored and shared in a fine-grained manner. Moreover, the support for heterogeneity makes our FH-PRE system more efficient than previous work. Additionally, it provides secure data sharing between two heterogeneous cloud systems equipped with different cryptographic primitives.
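The FH-PRE construction itself is not given in the abstract. As a minimal sketch of the underlying primitive, here is a toy BBS98-style ElGamal proxy re-encryption in Python; the group parameters are far too small for real use, and this shows plain PRE only, not the fine-grained or heterogeneous features of FH-PRE:

```python
import random

# Demo parameters: p = 2q + 1, g generates the order-q subgroup.
# (Toy sizes for illustration only; real schemes use large groups.)
p, q, g = 2039, 1019, 4

def keygen():
    sk = random.randrange(1, q)
    return sk, pow(g, sk, p)

def encrypt(pk, m):
    # ElGamal-style: c = (m * g^r, pk^r) = (m * g^r, g^(a*r))
    r = random.randrange(1, q)
    return (m * pow(g, r, p)) % p, pow(pk, r, p)

def rekey(sk_a, sk_b):
    # Re-encryption key b/a mod q (BBS98-style).
    return (sk_b * pow(sk_a, -1, q)) % q

def reencrypt(rk, c):
    # Proxy turns g^(a*r) into g^(b*r) without ever seeing m.
    c1, c2 = c
    return c1, pow(c2, rk, p)

def decrypt(sk, c):
    c1, c2 = c
    # Recover g^r = c2^(1/sk), then m = c1 / g^r.
    gr = pow(c2, pow(sk, -1, q), p)
    return (c1 * pow(gr, -1, p)) % p

a_sk, a_pk = keygen()
b_sk, _ = keygen()
m = pow(g, 42, p)                      # message encoded in the subgroup
c = encrypt(a_pk, m)                   # data owner encrypts for herself
c_b = reencrypt(rekey(a_sk, b_sk), c)  # cloud proxy re-encrypts for Bob
assert decrypt(b_sk, c_b) == m
```

The key property the abstract relies on is visible in `reencrypt`: the proxy holds only the re-encryption key, never a private key or the plaintext.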
ISBN (print): 9781467371957
Code offloading has been proposed to improve the performance and energy efficiency of mobile devices by sending heavy computation tasks to the resourceful cloud instead of executing all tasks on the local device. Unfortunately, current code-offloading techniques are not efficient enough because of their high communication cost. In this paper, we propose a novel code-offloading strategy with cellular traffic aggregation, which aggregates the offloaded code of several mobile devices before sending it to the cloud and can thereby significantly reduce the tail-energy effect. With the objective of minimizing the total cost of computation and communication, we propose an optimization framework that jointly considers code partitioning and traffic aggregation. Due to the hardness of this problem, we design an efficient heuristic algorithm and evaluate its performance via extensive simulations. Simulation results demonstrate that the proposed algorithm significantly outperforms existing schemes.
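The paper's cost model and heuristic are not reproduced in the abstract. The hypothetical Python sketch below (the constants TAIL, PER_BYTE, and WAIT_PENALTY are pure assumptions) only illustrates the trade-off being optimized: every cellular transfer pays a fixed tail-energy cost, so batching several offload requests amortizes that cost at the price of queueing delay:

```python
# Toy energy model: every cellular transfer pays a fixed tail-energy cost
# plus a per-byte cost, so batching requests shares the tail cost.
TAIL = 5.0          # fixed energy per transfer (tail effect); assumed units
PER_BYTE = 0.001    # transmission energy per byte; assumed
WAIT_PENALTY = 0.4  # delay cost per request held back for one batch slot

def send_cost(batch):
    return TAIL + PER_BYTE * sum(batch)

def aggregated_cost(request_sizes, max_batch=4):
    """Group up to max_batch requests per transfer so the tail cost is
    shared, charging a small delay penalty for each held-back request."""
    total = 0.0
    for i in range(0, len(request_sizes), max_batch):
        batch = request_sizes[i:i + max_batch]
        total += send_cost(batch) + WAIT_PENALTY * (len(batch) - 1)
    return total

sizes = [1200, 800, 300, 2500, 400, 900]
print(aggregated_cost(sizes))              # batched: tails are shared
print(sum(send_cost([s]) for s in sizes))  # unbatched: one tail per request
```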
ISBN (print): 9781479989386
Recently, convolutional networks have achieved great success in the field of computer vision. To improve the efficiency of convolutional networks, a large number of solutions focusing on training algorithms and parallelization strategies have been proposed. In this paper, a novel look-up-table-based algorithm is proposed to speed up convolutional networks with small filters on GPUs. By transforming the multiplication operations in the convolution computation into table-based summation operations, the overhead of the convolution computation can be greatly reduced. The processes of building and querying the table are well suited to parallelization on the GPU. Experimental results show that the proposed approach improves the speed of the convolution computation by 20%-30% compared with state-of-the-art existing work.
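As a CPU stand-in for the GPU kernels described above, the following Python sketch shows the table-based idea on a 1-D convolution over 8-bit inputs (the 1-D setting and function names are illustrative): one 256-entry product table per filter weight replaces every multiplication with a lookup.

```python
import numpy as np

def lut_conv1d(x_u8, weights):
    """1-D 'valid' convolution over uint8 inputs in which precomputed
    per-weight product tables replace multiplications with lookups."""
    # One 256-entry table per filter weight: tables[k][v] = weights[k] * v.
    tables = np.outer(weights, np.arange(256))     # shape (K, 256)
    K, N = len(weights), len(x_u8)
    out = np.zeros(N - K + 1, dtype=tables.dtype)
    for k in range(K):
        # Table lookup instead of multiplication, then accumulate.
        out += tables[k][x_u8[k:k + N - K + 1]]
    return out

x = np.random.randint(0, 256, size=16).astype(np.uint8)
w = np.array([1, -2, 1])
assert np.array_equal(lut_conv1d(x, w),
                      np.convolve(x.astype(int), w[::-1], "valid"))
```

The table pays off when the filter is small and the input alphabet is limited (as with quantized activations), which matches the paper's focus on small filters.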
ISBN (print): 9781509001552
The increasing popularity of photo sharing in social networking services (SNS) complicates the challenge of storing and transmitting large amounts of photo data for SNS providers. Distributed web caches are generally used by SNS providers to improve data-transmission performance. Two critical factors affect the efficiency of web caches: the storage media and the replacement algorithm. Solid-state drives (SSDs) are often used in web caches because of their good performance, but SSD random-write performance and endurance are directly affected by write amplification, which in turn depends on the replacement algorithm. The least-recently-used (LRU) algorithm achieves a good cache hit ratio but induces severe write amplification; the first-in-first-out (FIFO) method minimizes write amplification, but its weak hit ratio increases bandwidth consumption. In this paper, we present a hybrid replacement strategy, named F-SkLRU, which facilitates a good trade-off between the performance and the cost of a web cache. F-SkLRU takes advantage of sequential writing to reduce SSD wear and consolidates multiple factors to improve the hit ratio. Simulation results show that F-SkLRU reduces bandwidth consumption by 14.11% to 23.4% compared with FIFO and LRU, and can reduce SSD cost by up to 300% relative to LRU in extreme situations.
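The exact F-SkLRU algorithm is not specified in the abstract. The Python sketch below shows one plausible way to combine the two baselines it mentions (class and field names are assumptions): eviction proceeds in FIFO write order, which keeps SSD writes sequential, while objects hit since being written get a second chance, recovering some of LRU's hit ratio.

```python
from collections import deque

class HybridCache:
    """FIFO eviction order (sequential SSD writes) with second-chance
    reinsertion for objects hit since being written. An illustrative
    hybrid, not the paper's exact F-SkLRU algorithm."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.fifo = deque()   # keys in (re)write order
        self.hot = set()      # keys hit since they were last written
        self.data = {}

    def get(self, key):
        if key in self.data:
            self.hot.add(key)             # mark for a second chance
            return self.data[key]
        return None                       # miss

    def put(self, key, value):
        if key in self.data:              # update in place
            self.data[key] = value
            return
        while len(self.data) >= self.capacity:
            victim = self.fifo.popleft()
            if victim in self.hot:
                self.hot.discard(victim)  # hot: rewrite sequentially at tail
                self.fifo.append(victim)
            else:
                del self.data[victim]     # cold: evict
        self.data[key] = value
        self.fifo.append(key)

c = HybridCache(capacity=2)
c.put("a", 1); c.put("b", 2)
c.get("a")      # "a" becomes hot
c.put("c", 3)   # evicts cold "b"; "a" survives its second chance
```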
Graph models have been widely applied to document summarization, using sentences as graph nodes and the similarity between sentences as edges. In this paper, a novel graph model for document summarization is presented that utilizes not only the relevance between sentences but also the relevance of the phrases contained in those sentences. In short, we construct a phrase-sentence two-layer graph structure model (PSG) to summarize documents. We use this model for both generic document summarization and query-focused summarization. The experimental results show that our model greatly outperforms existing work.
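The paper's exact graph construction is not given in the abstract; the following toy Python sketch shows the two-layer idea, with sentence and phrase nodes (single words stand in for phrases here) and a PageRank-style iteration ranking sentences over the combined graph:

```python
import itertools

def score_sentences(sentences, damping=0.85, iters=50):
    """Rank sentences on a two-layer graph: sentence-sentence similarity
    edges plus sentence-phrase containment edges (a sketch of the
    two-layer idea, not the paper's exact PSG model)."""
    phrases = sorted({w for s in sentences for w in s.split()})
    nodes = [("S", i) for i in range(len(sentences))] + \
            [("P", w) for w in phrases]
    edges = {n: set() for n in nodes}
    # Layer 1: connect sentences that share at least one word.
    for i, j in itertools.combinations(range(len(sentences)), 2):
        if set(sentences[i].split()) & set(sentences[j].split()):
            edges[("S", i)].add(("S", j)); edges[("S", j)].add(("S", i))
    # Layer 2: connect each sentence to the phrases it contains.
    for i, s in enumerate(sentences):
        for w in s.split():
            edges[("S", i)].add(("P", w)); edges[("P", w)].add(("S", i))
    # PageRank-style power iteration over the combined graph.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        rank = {n: (1 - damping) / len(nodes)
                + damping * sum(rank[m] / len(edges[m]) for m in edges[n])
                for n in nodes}
    return sorted(((rank[("S", i)], s) for i, s in enumerate(sentences)),
                  reverse=True)

docs = ["the cat sat on the mat", "the dog sat", "a bird flew away"]
for r, s in score_sentences(docs):
    print(f"{r:.3f}  {s}")
```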
ISBN (print): 9781509001552
The stacked autoencoder is a deep learning model that consists of multiple autoencoders and has been widely applied in numerous machine learning applications. A significant amount of effort has been made to improve performance by increasing the size of the deep learning model, in terms of both the training dataset and the number of model parameters. However, training a large deep learning model is highly time consuming. Recent studies have applied CPU clusters with thousands of machines, as well as single GPUs or GPU clusters, to train large-scale deep learning models. As a high-performance coprocessor like the GPU, the Xeon Phi can be an alternative for training large-scale deep learning models on a single machine. The Xeon Phi can be viewed as a small cluster featuring about 60 cores, each supporting four hardware threads. This massive parallelism offsets the low computing capacity of each core but makes an efficient parallel autoencoder design challenging. In this paper, we analyze the matrix-operation-based training algorithm of autoencoders and point out the thread-oversubscription problem, which results in performance degradation. Based on this observation, we propose a map-reduce implementation of autoencoders on the Xeon Phi coprocessor. Our basic idea is to parallelize multiple autoencoder model replicas under the bulk synchronous parallel (BSP) communication model, in which the parameters are updated after the computations of all replicas are completed. Each thread is responsible for one model replica, and all replicas work together on the same mini-batch. This data-parallelism method is suitable for training autoencoders on the Xeon Phi and can be extended to an asynchronous parallel training method without thread oversubscription. In our experiments, the speedup is four times that of the sequential implementation, and the speedup remains stable as the autoencoder model is enlarged.
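As an illustration of the BSP scheme described above, the following Python sketch (not the paper's Xeon Phi implementation; the model and sizes are assumptions) trains a toy linear tied-weight autoencoder: each worker computes gradients for its shard of the mini-batch, and the parameters are updated only after all workers finish, which acts as the synchronization barrier:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
n_in, n_hid, lr = 8, 4, 0.05
W = rng.normal(scale=0.1, size=(n_hid, n_in))  # tied weights: decode with W.T

def grad(W, X):
    # Linear autoencoder for brevity: reconstruct X as W.T @ (W @ X.T).
    H = W @ X.T                  # encode
    E = W.T @ H - X.T            # reconstruction error
    return (H @ E.T + (W @ E) @ X) / len(X)

def bsp_step(W, batch, n_workers=4):
    # Compute phase: each worker handles one shard of the mini-batch.
    shards = np.array_split(batch, n_workers)
    with ThreadPoolExecutor(n_workers) as pool:
        grads = list(pool.map(lambda s: grad(W, s), shards))
    # Sync phase: update parameters only after all workers are done.
    return W - lr * np.mean(grads, axis=0)

X = rng.normal(size=(64, n_in))
for _ in range(200):
    W = bsp_step(W, X)
# Reconstruction error decreases as training proceeds.
print("MSE:", np.mean((W.T @ (W @ X.T) - X.T) ** 2))
```

The barrier is implicit in `pool.map` returning only once every shard's gradient is ready; the asynchronous variant the paper mentions would instead apply each replica's gradient as soon as it arrives.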
With the rapid growth of cloud technologies, a large number of web applications have been developed, deployed, and run in clouds. Due to the scalability provided by clouds, a single web application may be concurrently visited by millions or billions of users, so the testing and performance evaluation of these applications is increasingly important. User-model-based evaluation can significantly reduce the manual work required and enables us to determine the performance of applications under real runtime environments; hence, it has become one of the most popular evaluation methods in both industry and academia. Significant efforts have focused on building different kinds of models by mining web access logs, such as Markov models and the Customer Behavior Model Graph (CBMG). This paper proposes a new model, named the User Representation Model Graph (URMG), which is built on CBMG: an algorithm refines CBMG and optimizes the evaluation execution process. Based on this model, an automatic testing and evaluation system for web applications is designed, implemented, and deployed in our test cloud, which is able to execute all of the analysis and testing operations using only web access logs. In our system, the error rate caused by random access to applications in the execution phase is also reduced, and the results show that the error rate of evaluations based on URMG is 50% lower than that of evaluations based on CBMG.
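URMG's refinement algorithm is not given in the abstract, but the behavior-model-graph workflow it builds on can be sketched: estimate page-transition probabilities from access logs, then random-walk the resulting graph to generate synthetic sessions for load testing (function names and the ENTRY/EXIT states below are assumptions):

```python
import random
from collections import defaultdict

def build_model(sessions):
    """Estimate a CBMG-style transition graph from logged page sequences
    (an illustrative sketch; URMG's refinement step is not shown)."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in sessions:
        for a, b in zip(["ENTRY"] + s, s + ["EXIT"]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}

def generate_session(model, rng=random.Random(1)):
    """Random-walk the graph to produce one synthetic user session."""
    page, session = "ENTRY", []
    while True:
        nxt = model[page]
        page = rng.choices(list(nxt), weights=list(nxt.values()))[0]
        if page == "EXIT":
            return session
        session.append(page)

logs = [["home", "search", "item"], ["home", "cart", "pay"], ["home", "search"]]
model = build_model(logs)
print(generate_session(model))   # e.g. ['home', 'search', 'item']
```

Replaying such generated sessions against the deployed application is what lets the evaluation run from access logs alone, as the abstract describes.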
The delta-based accumulative iterative computation (DAIC) model has recently been proposed to support iterative algorithms in either a synchronous or an asynchronous way. However, the synchronous and asynchronous DAIC models each perform well only under certain conditions and poorly under others, suffering either from high synchronization cost or from many redundant activations. As a result, the overall performance of both DAIC models suffers from the serious network jitter and load jitter caused by multi-tenancy in the cloud. In this paper, we develop a system, named Hyblter, to guarantee the performance of iterative algorithms under different conditions. Through an adaptive execution-model selection scheme, it can efficiently switch between the synchronous and asynchronous DAIC models to adapt to changing conditions, always getting the best performance in the cloud. Experimental results show that our approach can improve the performance of current solutions by up to 39.0%.
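The abstract does not describe Hyblter's selection scheme; the toy rule below (its signals and thresholds are pure assumptions) only illustrates the kind of decision such a scheme must make, trading barrier stalls under jitter against wasted work under asynchrony:

```python
def choose_mode(barrier_wait, compute_time, redundant_ratio,
                sync_threshold=0.3, redundancy_threshold=0.25):
    """Toy adaptive selection rule: prefer the asynchronous model when
    stragglers make barriers expensive, and the synchronous model when
    asynchrony would re-activate too many vertices. Not Hyblter's
    actual rule; signals and thresholds are assumptions."""
    sync_overhead = barrier_wait / (barrier_wait + compute_time)
    if sync_overhead > sync_threshold:
        return "async"   # load/network jitter: barrier stalls dominate
    if redundant_ratio > redundancy_threshold:
        return "sync"    # async wastes work on redundant activations
    return "sync"        # barriers are cheap: keep deterministic progress

# A jittery tenant makes workers wait 2s per 3s of compute -> go async.
print(choose_mode(barrier_wait=2.0, compute_time=3.0, redundant_ratio=0.1))
# Cheap barriers but 40% redundant activations -> stay synchronous.
print(choose_mode(barrier_wait=0.2, compute_time=3.0, redundant_ratio=0.4))
```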