The large amounts of software repositories over the Internet are fundamentally changing the traditional paradigms of software maintenance. Efficient categorization of the massive projects for retrieving the relevant s...
详细信息
ISBN:
(纸本)9781467352185
The large amounts of software repositories over the Internet are fundamentally changing the traditional paradigms of software maintenance. Efficient categorization of the massive projects for retrieving the relevant software in these repositories is of vital importance for Internet-based maintenance tasks such as solution searching, best practices learning and so on. Many previous works have been conducted on software categorization by mining source code or byte code, which are only verified on relatively small collections of projects with coarse-grained categories or clusters. However, Internet-based software maintenance requires finer-grained, more scalable and language-independent categorization approaches. In this paper, we propose a novel approach to hierarchically categorize software projects based on their online profiles across multiple repositories. We design a SVM-based categorization framework to classify the massive number of software hierarchically. To improve the categorization performance, we aggregate different types of profile attributes from multiple repositories and design a weighted combination strategy which assigns greater weights to more important attributes. Extensive experiments are carried out on more than 18,000 projects across three repositories. The results show that our approach achieves significant improvements by using weighted combination, and the overall precision, recall and F-Measure can reach 71.41%, 65.60% and 68.38% in appropriate settings. Compared to the previous work, our approach presents competitive results with 123 finer-grained and multi-layered categories. In contrast to those using source code or byte code, our approach is more effective for large-scale and language-independent software categorization.
GPGPUs are increasingly being used to as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, including TianHe-1A, the world's fastest supercomputer...
详细信息
GPGPUs are increasingly being used to as performance accelerators for HPC (High Performance Computing) applications in CPU/GPU heterogeneous computing systems, including TianHe-1A, the world's fastest supercomputer in the TOP500 list, built at NUDT (National University of Defense technology) last year. However, despite their performance advantages, GPGPUs do not provide built-in fault-tolerant mechanisms to offer reliability guarantees required by many HPC applications. By analyzing the SIMT (single-instruction, multiple-thread) characteristics of programs running on GPGPUs, we have developed PartialRC, a new checkpoint-based compiler-directed partial recomputing method, for achieving efficient fault recovery by leveraging the phenomenal computing power of GPGPUs. In this paper, we introduce our PartialRC method that recovers from errors detected in a code region by partially re-computing the region, describe a checkpoint-based faulttolerance framework developed on PartialRC, and discuss an implementation on the CUDA platform. Validation using a range of representative CUDA programs on NVIDIA GPGPUs against FullRC (a traditional full-recomputing Checkpoint-Rollback-Restart fault recovery method for CPUs) shows that PartialRC reduces significantly the fault recovery overheads incurred by FullRC, by 73.5% when errors occur earlier during execution and 74.6% when errors occur later on average. In addition, PartialRC also reduces error detection overheads incurred by FullRC during fault recovery while incurring negligible performance overheads when no fault happens.
Symbolic execution is an effective path oriented and constraint based program analysis technique. Recently, there is a significant development in the research and application of symbolic execution. However, symbolic e...
详细信息
Wireless sensor networks (WSN) is a critical technology for information gathering covering many areas, including health-care, transportation, air traffic control and environment monitoring. Despite wide use, the fast ...
详细信息
Context situation, which means a snapshot of the status of the real world, is formed by integrating a large amount of contexts collected from various resources. How to get the context situation and use the situation t...
详细信息
distributed social networks have emerged recently. Nevertheless, recommending friends in the distributed social networks has not been exploited fully. We propose FDist, a distributed common-friend estimation scheme th...
详细信息
distributed online social networks (DOSN) have emerged recently. Nevertheless, recommending friends in the distributed social networks has not been exploited fully. We propose BCE (Bloom Filter based Common-Friend Est...
详细信息
By amassing 'wisdom of the crowd', social tagging systems draw more and more academic attention in interpreting Internet folk knowledge. In order to uncover their hidden semantics, several researches have atte...
详细信息
As the energy consumption of embedded multiprocessor systems becomes increasingly prominent, it becomes an urgent problem of real-time energy-efficient scheduling in multiprocessor systems to reduce system energy cons...
详细信息
Nowadays, more and more scientific applications are moving to cloud computing. The optimal deployment of scientific applications is critical for providing good services to users. Scientific applications are usually to...
详细信息
暂无评论