P2P-VoD systems have gained tremendous popularity in recent years. While existing research is mostly based on theoretical or conventional assumptions, it is particularly valuable to understand and examine how these assumptions hold in realistic environments, so as to establish a solid foundation for mechanism design and optimization. In this paper, we present a comprehensive measurement study of CoolFish, a real-world P2P-VoD system. Our measurements yield several new findings that differ from traditional assumptions or observations: the access pattern does not follow a Poisson distribution; session time is not positively correlated with movie popularity; and jump frequency is not negatively correlated with movie popularity, as previous studies assumed. We analyze the reasons for these results and provide suggestions for future study of P2P-VoD services.
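One common way to check the claim that an access pattern does not match a Poisson distribution is the index of dispersion of arrival counts: for a Poisson process the variance of counts per time window equals their mean. The sketch below is only an illustration of that standard check, not the analysis method used in the paper; the function names, the window length and the arrival log are hypothetical, and a ratio near 1 is necessary but not sufficient for Poisson arrivals.

```python
from statistics import mean, pvariance


def counts_per_window(arrival_times, window):
    """Count session arrivals (non-empty list of timestamps, seconds) per equal-length window."""
    start = min(arrival_times)
    n_windows = int((max(arrival_times) - start) // window) + 1
    counts = [0] * n_windows
    for t in arrival_times:
        counts[int((t - start) // window)] += 1
    return counts


def dispersion_index(counts):
    """Variance-to-mean ratio of per-window counts.

    A Poisson process gives a ratio close to 1; values well above 1 indicate
    bursty (overdispersed) arrivals, values well below 1 overly regular ones.
    """
    return pvariance(counts) / mean(counts)


# Hypothetical arrival log with a burst of requests in the third window.
arrivals = [0.0, 12.0, 20.1, 20.2, 20.3, 20.4, 21.0, 29.9, 44.0]
counts = counts_per_window(arrivals, window=10.0)
print(counts, round(dispersion_index(counts), 2))
```

In practice the ratio would be computed over many windows and window lengths, since burstiness can appear at some time scales and not others.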
Retrieval from Hindi document image collections is a challenging task. This is partly due to the complexity of the script, which has more than 800 unique ligatures. In addition, segmentation and recognition of individual characters are often difficult because of the writing style and degradations in the print. For these reasons, robust OCR systems do not exist for Hindi, and Hindi document repositories are therefore not amenable to indexing and retrieval. In this paper, we propose a scheme for retrieving relevant Hindi documents in response to a query word. The approach uses BLSTM neural networks. Designed to take contextual information into account, these networks can handle word images that cannot be robustly segmented into individual characters. By zoning the Hindi words, we simplify the problem and obtain high retrieval rates. This simplification suits the retrieval problem, although it does not apply to recognition. Our scalable retrieval scheme avoids explicit recognition of characters. An experimental evaluation on a dataset of word images gathered from two complete books demonstrates good accuracy even in the presence of printing variations and degradations. The performance is compared with baseline methods.
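The abstract does not describe how words are zoned. A minimal sketch, assuming a binary word image given as a list of rows (1 = ink), is to locate the headline (shirorekha) that joins Devanagari characters with a horizontal projection profile and split off the upper zone above it. The function names and the "row with the most ink" rule are hypothetical simplifications; the lower-zone boundary and the BLSTM itself are not modelled here.

```python
def horizontal_profile(img):
    """Number of ink pixels (1s) in each row of a binary word image."""
    return [sum(row) for row in img]


def find_headline(img):
    """Index of the headline (shirorekha) row, assumed here to be the row with the most ink."""
    profile = horizontal_profile(img)
    return max(range(len(profile)), key=profile.__getitem__)


def split_upper_zone(img):
    """Split a word image into the upper zone (rows above the headline) and the rest."""
    h = find_headline(img)
    return img[:h], img[h:]


word = [
    [0, 0, 1, 0, 0, 0],  # a vowel sign above the headline
    [1, 1, 1, 1, 1, 1],  # headline joining the characters
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
]
upper, rest = split_upper_zone(word)
print(len(upper), len(rest))  # 1 3
```

Each zone could then be turned into a left-to-right sequence of column features for a sequence model, which is what lets a BLSTM avoid segmenting individual characters.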
It is an established fact that network topology can affect the performance of scientific parallel applications. However, little work has been done to design an easy-to-use solution, inside a communication library supporting a parallel programming model, that hides from the end user the complexity of making application performance agnostic to network topology. Furthermore, rapid improvements in networking technology and speed are making many commodity clusters heterogeneous with respect to link speed. For example, InfiniBand switches and adapters from different generations (SDR at 8 Gbps, DDR at 16 Gbps and QDR at 32 Gbps) are integrated into a single system. This poses an additional challenge: making the communication library aware of the performance implications of heterogeneous link speeds, so that it can perform optimizations that take link speed into account. In this paper, we propose a framework that automatically detects the topology and link speeds of an InfiniBand network and makes them available to users through an easy-to-use interface. We also make design changes inside the MPI library so that it dynamically queries this topology detection service and forms a topology model of the underlying network. We have redesigned the broadcast algorithm to take this network topology information into account and to dynamically adapt its communication pattern to the characteristics of the underlying network. To the best of our knowledge, this is the first such work for InfiniBand clusters. Our experimental results show that, for large homogeneous systems and large message sizes, the proposed network-topology-aware scheme improves broadcast latency by up to 14% over the default scheme at the micro-benchmark level. At the application level, the proposed framework improves total application run time by up to 8%, especially as job size scales up. The p...
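One way a broadcast can use topology information is a two-level schedule: first send one copy to a leader rank on each leaf switch, so each inter-switch link carries the message as few times as possible, then fan out inside every switch. The sketch below assumes a rank-to-switch map is already known (in the framework described above it would come from the topology detection service); the function names, the leader choice and the binomial tree within each level are hypothetical, and heterogeneous link speeds are not modelled.

```python
from collections import defaultdict


def binomial_tree_sends(ranks):
    """(src, dst) pairs of a binomial-tree broadcast rooted at ranks[0], in round order."""
    sends, have = [], 1
    while have < len(ranks):
        for i in range(min(have, len(ranks) - have)):
            sends.append((ranks[i], ranks[i + have]))
        have *= 2
    return sends


def topology_aware_bcast(root, switch_of):
    """Two-level broadcast schedule given a {rank: leaf_switch} map.

    Phase 1 reaches one leader per switch (inter-switch traffic);
    phase 2 fans out inside each switch (intra-switch traffic).
    """
    groups = defaultdict(list)
    for rank in sorted(switch_of):
        groups[switch_of[rank]].append(rank)
    root_switch = switch_of[root]
    # The root leads its own switch; the lowest rank leads every other switch.
    leader = {sw: (root if sw == root_switch else members[0])
              for sw, members in groups.items()}
    leaders = [root] + [leader[sw] for sw in sorted(groups) if sw != root_switch]
    schedule = binomial_tree_sends(leaders)  # phase 1: inter-switch
    for sw in sorted(groups):
        local = [leader[sw]] + [r for r in groups[sw] if r != leader[sw]]
        schedule += binomial_tree_sends(local)  # phase 2: intra-switch
    return schedule


switch_of = {0: "leaf1", 1: "leaf1", 2: "leaf2", 3: "leaf2", 4: "leaf2"}
print(topology_aware_bcast(1, switch_of))  # [(1, 2), (1, 0), (2, 3), (2, 4)]
```

Only the first send crosses between switches; a plain binomial tree over ranks 1, 0, 2, 3, 4 could cross the inter-switch link several times.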
Noise is an omnipresent phenomenon that obscures the real behavior of dynamical systems. Many methods have been proposed to remove noise from contaminated time series. However, almost all of them perform noise reduction in phase space, and they often leave sharp points in place. Unlike these methods, this paper proposes a method that works directly on the time series itself: taking into account the characteristics of Gaussian noise and the smoothness of the real data, it uses curve fitting to eliminate the sharp points. Numerical results verify the effectiveness of the method.
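The abstract gives no formulas, so the following is only a sketch of the general idea under stated assumptions: each point is predicted by a least-squares line fitted to its neighbours, the Gaussian noise level is estimated robustly from the median absolute first difference, and the point deviating most from its prediction is replaced while that deviation exceeds k standard deviations. The window size, the 3-sigma rule and all function names are hypothetical choices, not the paper's method.

```python
from statistics import median


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, my - slope * mx


def local_prediction(series, i, half_window):
    """Value at index i predicted from a line fitted to its neighbours (excluding i)."""
    n = len(series)
    xs = [j for j in range(max(0, i - half_window), min(n, i + half_window + 1)) if j != i]
    slope, intercept = fit_line(xs, [series[j] for j in xs])
    return slope * i + intercept


def noise_sigma(series):
    """Robust Gaussian noise estimate from the median absolute first difference."""
    diffs = [abs(b - a) for a, b in zip(series, series[1:])]
    return median(diffs) / (0.6745 * 2 ** 0.5)


def remove_sharp_points(series, half_window=2, k=3.0):
    """Repeatedly replace the sharpest point by its curve-fit value (needs >= 3 points)."""
    out = [float(v) for v in series]
    sigma = noise_sigma(out)
    for _ in range(len(out)):
        preds = [local_prediction(out, i, half_window) for i in range(len(out))]
        devs = [abs(v - p) for v, p in zip(out, preds)]
        worst = max(range(len(out)), key=devs.__getitem__)
        if devs[worst] <= k * sigma:
            break  # nothing left that Gaussian noise cannot explain
        out[worst] = preds[worst]
    return out


print(remove_sharp_points([0, 1, 2, 3, 50, 5, 6, 7, 8]))
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
```

Replacing only the worst point per pass matters: a single spike also distorts its neighbours' local fits, and fixing it first keeps those neighbours from being flagged wrongly.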
The era of petascale computing brought machines with hundreds of thousands of processors. The next generation of exascale supercomputers will provide clusters with millions of processors. In those machines, the mean time between failures will range from a few minutes to a few tens of minutes, making a processor crash the common case rather than a rarity. Parallel applications running on such large machines will need to survive crashes while maintaining high productivity. To achieve that, fault tolerance techniques will have to go beyond checkpoint/restart, which requires all processors to roll back after a failure. Incorporating some form of message logging provides a framework in which only a subset of processors roll back after a crash. In this paper, we discuss why a simple causal message logging protocol is a promising way to provide fault tolerance in large supercomputers. Unlike pessimistic message logging, it has low latency overhead, especially in collective communication operations. In addition, it saves messages when more than one thread is running per processor. Finally, we demonstrate that simple causal message logging has faster recovery and a low performance penalty compared to checkpoint/restart. Running the NAS Parallel Benchmarks (CG, MG, BT and DT) on 1024 processors, simple causal message logging has a latency overhead below 5%.
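The abstract does not spell out the protocol, so the following is only a simplified illustration of the general causal-logging idea, not the protocol evaluated in the paper. Senders keep copies of the messages they send, and each receive-order decision (a "determinant") is piggybacked on the receiver's next outgoing message, so a crashed process can be rebuilt without rolling anyone else back. All class and function names are hypothetical. There is no checkpointing, so recovery replays from the initial state. Determinants are also treated as safe once piggybacked a single time, whereas a real protocol would wait for an acknowledgement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Determinant:
    """Records that `receiver` delivered message `ssn` from `sender` as its `rsn`-th receive."""
    sender: int
    ssn: int
    receiver: int
    rsn: int


@dataclass(frozen=True)
class Message:
    sender: int
    receiver: int
    ssn: int
    payload: str
    piggyback: tuple  # determinants carried along with this message


class Process:
    def __init__(self, pid):
        self.pid = pid
        self.ssn = 0              # send sequence number
        self.rsn = 0              # receive sequence number
        self.sent_log = []        # sender-based log: copies of every sent message
        self.unsaved_dets = []    # own determinants not yet stored elsewhere
        self.stored_dets = set()  # determinants held on behalf of other processes
        self.state = []           # stands in for application state

    def send(self, dest, payload):
        self.ssn += 1
        msg = Message(self.pid, dest, self.ssn, payload, tuple(self.unsaved_dets))
        self.sent_log.append(msg)
        # Simplification: piggybacked determinants count as safe immediately.
        self.unsaved_dets.clear()
        return msg

    def deliver(self, msg):
        self.rsn += 1
        self.stored_dets.update(msg.piggyback)
        self.unsaved_dets.append(Determinant(msg.sender, msg.ssn, self.pid, self.rsn))
        self.state.append(msg.payload)


def recover(failed, procs):
    """Rebuild `failed` by replaying logged messages in the recorded receive order."""
    dets = sorted(
        (d for p in procs.values() if p.pid != failed
         for d in p.stored_dets if d.receiver == failed),
        key=lambda d: d.rsn,
    )
    fresh = Process(failed)
    for d in dets:
        msg = next(m for m in procs[d.sender].sent_log
                   if m.receiver == failed and m.ssn == d.ssn)
        fresh.deliver(msg)
    return fresh


procs = {pid: Process(pid) for pid in range(3)}
m_a = procs[0].send(1, "a")
m_b = procs[2].send(1, "b")
procs[1].deliver(m_b)  # nondeterministic arrival order: "b" first
procs[1].deliver(m_a)
procs[0].deliver(procs[1].send(0, "ack"))  # determinants travel to process 0
print(recover(1, procs).state)  # ['b', 'a']
```

Process 1 received "b" before "a". Process 0 holds both determinants, so the restarted process 1 replays the two messages in the same order while processes 0 and 2 keep running.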
Energy-efficient systems are in high demand as power consumption in the HPC domain increases. The use of GPUs has attracted attention as a possible solution to these problems because of their parallel performance and ...
Alignment of RNA structures is very important in biological ***. As with pairwise sequence alignment, there is often disagreement about how to weight matches, mismatches, indels and gaps when comparing two RNA structures. Here, we develop a visual tool for computing the parametric alignment of two RNA structures. With this tool, users can see explicitly and completely the effect of parameter choices on the optimal alignments of RNA structures. The software is available for academic use (http://***/wang/software/Para RNA/).
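To show what "the effect of parameter choices on the optimal alignments" means, here is a simplified sketch that uses plain sequence alignment as a stand-in for RNA structure alignment. It runs a Needleman-Wunsch global alignment with a linear gap cost, reports the match/mismatch/gap counts of an optimal alignment, and sweeps the gap penalty to see where the optimum changes. This is a coarse grid sweep, not the exact parametric decomposition a tool such as the one above computes. The function name and all scores are hypothetical, and there is no separate gap-opening term.

```python
def align_features(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Global (Needleman-Wunsch) alignment with a linear gap cost.

    Returns (score, matches, mismatches, gap_columns) for one optimal alignment.
    """
    n, m = len(a), len(b)
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0.0, 0, 0, 0)
    for i in range(1, n + 1):
        s, ma, mi, g = dp[i - 1][0]
        dp[i][0] = (s + gap, ma, mi, g + 1)
    for j in range(1, m + 1):
        s, ma, mi, g = dp[0][j - 1]
        dp[0][j] = (s + gap, ma, mi, g + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s, ma, mi, g = dp[i - 1][j - 1]
            if a[i - 1] == b[j - 1]:
                diag = (s + match, ma + 1, mi, g)
            else:
                diag = (s + mismatch, ma, mi + 1, g)
            s, ma, mi, g = dp[i - 1][j]
            up = (s + gap, ma, mi, g + 1)
            s, ma, mi, g = dp[i][j - 1]
            left = (s + gap, ma, mi, g + 1)
            dp[i][j] = max(diag, up, left, key=lambda cell: cell[0])
    return dp[n][m]


# Sweep the gap penalty and report where the optimal alignment changes.
previous = None
for gap in (-0.5, -1.0, -2.0, -3.0):
    features = align_features("AC", "CA", gap=gap)[1:]
    if features != previous:
        print(f"gap={gap}: matches, mismatches, gaps = {features}")
        previous = features
```

With cheap gaps the optimum shifts "A" to align with "A" at the cost of two gap columns; once gaps become expensive, two mismatches are preferred instead.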
In this paper, we propose an energy-aware dynamic consolidation algorithm for virtualized service centers based on reinforcement learning. Energy awareness is achieved by using the Energy Aware Context Model (EACM) to represent the current service center context programmatically by means of ontologies. We define an EACM model entropy metric to evaluate the service center's greenness level: if the entropy value is above a predefined threshold, the service center is not in a green state. In that case, consolidation or dynamic power management actions are selected by means of reinforcement learning and executed to bring the service center back to an energy-efficient state. The results are promising, showing that the proposed energy-aware consolidation algorithm reduces the total energy consumption of a service center by about 26%.
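The control loop described above (compare the entropy metric with a threshold, then let reinforcement learning choose an action) can be sketched with tabular Q-learning. Everything concrete here is assumed for illustration: the entropy value is taken as an input rather than computed from an EACM, the two-state environment and its rewards (negative energy cost) are invented toy values, and the action names are hypothetical.

```python
ACTIONS = ("none", "consolidate", "power_down_idle")


def context_state(entropy, threshold):
    """Greenness check: 0 = green, 1 = entropy above threshold (not green)."""
    return 1 if entropy > threshold else 0


def toy_step(state, action):
    """Hypothetical environment returning (next_state, reward), reward = -energy cost."""
    if state == 0:
        return 0, (0.0 if action == "none" else -1.0)
    if action == "consolidate":
        return 0, -1.0
    if action == "power_down_idle":
        return 0, -2.0
    return 1, -5.0  # doing nothing leaves the centre in a non-green state


def learn_q(step, states=(0, 1), alpha=0.5, gamma=0.9, sweeps=200):
    """Tabular Q-learning that visits every (state, action) pair once per sweep."""
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(sweeps):
        for s in states:
            for a in ACTIONS:
                s_next, reward = step(s, a)
                target = reward + gamma * max(q[(s_next, b)] for b in ACTIONS)
                q[(s, a)] += alpha * (target - q[(s, a)])
    return q


def choose_action(q, entropy, threshold):
    """Greedy action for the state implied by the current entropy value."""
    state = context_state(entropy, threshold)
    return max(ACTIONS, key=lambda a: q[(state, a)])


q = learn_q(toy_step)
print(choose_action(q, entropy=0.8, threshold=0.5))  # consolidate
print(choose_action(q, entropy=0.2, threshold=0.5))  # none
```

Sweeping every state-action pair keeps this toy deterministic. A deployed controller would instead explore, for example epsilon-greedily, while observing real energy measurements.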
When determining the class of an unknown example with a naïve Bayesian classifier, we need to estimate the class-conditional probabilities of continuous attributes. In the flexible naïve Bayesian classifier, the Gaussian kernel function is frequently used for classification under the framework of the Parzen window method. In this paper, six other kernel functions (uniform, triangular, Epanechnikov, biweight, triweight and cosine) are introduced into the flexible naïve Bayesian classifier. The performance of these seven kernels is compared on 30 UCI datasets. The experimental comparison covers three aspects: classification accuracy, ranking performance and class probability estimation. The latter two are measured by the area under the ROC curve (AUC) and the conditional log likelihood (CLL), respectively. The kernels are compared via a two-tailed t-test at the 95 percent confidence level and Friedman's test at the 0.05 significance level. The experimental results show that the most commonly used Gaussian kernel does not achieve the best classification accuracy or AUC. On CLL, however, the Gaussian kernel is statistically significantly better than the other six kernels. Finally, an analysis of these results is given.
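The seven kernels named above have standard textbook forms, and a flexible naïve Bayesian classifier multiplies per-attribute Parzen-window density estimates f(x) = 1/(n·h) · Σ K((x − xᵢ)/h) with the class prior. The sketch below shows that combination. The bandwidth rule h = 1/√n and the small floor that avoids log(0) when a compact kernel assigns zero density are assumptions, and the function names and toy data are hypothetical.

```python
import math

# Standard kernel functions K(u); all but the Gaussian have support |u| <= 1.
KERNELS = {
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi),
    "uniform": lambda u: 0.5 if abs(u) <= 1 else 0.0,
    "triangular": lambda u: 1 - abs(u) if abs(u) <= 1 else 0.0,
    "epanechnikov": lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,
    "biweight": lambda u: 15 / 16 * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0,
    "triweight": lambda u: 35 / 32 * (1 - u * u) ** 3 if abs(u) <= 1 else 0.0,
    "cosine": lambda u: math.pi / 4 * math.cos(math.pi * u / 2) if abs(u) <= 1 else 0.0,
}


def parzen_density(x, samples, kernel="gaussian"):
    """Parzen-window estimate f(x) = 1/(n*h) * sum K((x - xi)/h), with assumed h = 1/sqrt(n)."""
    n = len(samples)
    h = 1 / math.sqrt(n)
    k = KERNELS[kernel]
    return sum(k((x - xi) / h) for xi in samples) / (n * h)


def predict(x, data, kernel="gaussian"):
    """x: feature vector; data: {label: list of training vectors}. Returns the MAP label."""
    total = sum(len(rows) for rows in data.values())

    def log_posterior(label):
        rows = data[label]
        lp = math.log(len(rows) / total)  # class prior
        for j, xj in enumerate(x):  # naive independence across attributes
            column = [row[j] for row in rows]
            lp += math.log(parzen_density(xj, column, kernel) + 1e-300)
        return lp

    return max(data, key=log_posterior)


data = {"low": [[0.0, 1.0], [0.2, 1.1], [0.1, 0.9]],
        "high": [[5.0, 3.0], [5.2, 3.1], [4.9, 2.8]]}
for name in KERNELS:
    print(name, predict([0.1, 1.0], data, kernel=name))
```

The compact kernels give exactly zero density far from the training points, which is one plausible reason their probability estimates (and hence CLL) can suffer relative to the Gaussian.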
Running MapReduce programs in the public cloud raises an important problem: how can resource provisioning be optimized to minimize the financial charge for a specific job? In this paper, we study the whole MapReduce processing pipeline and build a cost function that explicitly models the relationship among the amount of input data, the available system resources (Map and Reduce slots), and the complexity of the Reduce function for the target MapReduce job. The model parameters can be learned from test runs with a small number of nodes. Based on this cost model, we can solve a number of decision problems, such as finding the amount of resources that minimizes the financial cost under a time deadline, or that minimizes the time under a given financial budget. Experimental results show that this cost model performs well on the tested MapReduce programs.
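Once a cost model has been learned, both decision problems mentioned above reduce to a search over slot configurations. The abstract does not give the model's form. The sketch assumes a hypothetical time model T = b0 + b1·D/m + b2·g(D/r), where D is the input size, m and r are the Map and Reduce slot counts, and g is the Reduce complexity. Billing per slot-hour, the parameter values and all function names are likewise made up for illustration.

```python
from itertools import product


def predicted_time(d, m, r, beta, reduce_complexity=lambda x: x):
    """Hypothetical model: hours to process d GB with m map and r reduce slots.

    beta = (b0, b1, b2) would be learned from small test runs.
    """
    b0, b1, b2 = beta
    return b0 + b1 * d / m + b2 * reduce_complexity(d / r)


def plans(d, beta, price, max_slots, reduce_complexity=lambda x: x):
    """Yield (cost, m, r, hours) per configuration; cost = price * slots * hours."""
    for m, r in product(range(1, max_slots + 1), repeat=2):
        hours = predicted_time(d, m, r, beta, reduce_complexity)
        yield price * (m + r) * hours, m, r, hours


def cheapest_within_deadline(d, beta, price, deadline, max_slots=64,
                             reduce_complexity=lambda x: x):
    feasible = [p for p in plans(d, beta, price, max_slots, reduce_complexity)
                if p[3] <= deadline]
    return min(feasible, default=None)  # tuples compare by cost first


def fastest_within_budget(d, beta, price, budget, max_slots=64,
                          reduce_complexity=lambda x: x):
    feasible = [p for p in plans(d, beta, price, max_slots, reduce_complexity)
                if p[0] <= budget]
    return min(feasible, key=lambda p: p[3], default=None)


beta = (0.1, 0.05, 0.02)  # made-up parameters
print(cheapest_within_deadline(100, beta, price=0.1, deadline=1.0))
print(fastest_within_budget(100, beta, price=0.1, budget=5.0))
```

Exhaustive search is fine for a few thousand configurations. For larger spaces, the model's monotonicity in m and r could be exploited to prune the search.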