When developers encounter an unfamiliar programming task (e.g., connecting to a database), they need to find working code examples. Instead of using code search engines, they usually post programming questions on online Q&A forums (e.g., Stack Overflow). One possible reason is that existing code search engines return effective code examples only if a query contains identifiers (e.g., class or method names). In other words, existing code search engines do not handle natural-language queries (e.g., a description of a programming task) well. However, developers may not know the appropriate identifiers at the time of the search. As the demand for searching code examples is increasing, it is of significant interest to enhance code search engines. We conjecture that expanding natural-language queries with semantically related identifiers has great potential to enhance code search engines. In this paper, we propose an automated approach to find identifiers (in particular, API class names) that are semantically related to a given natural-language query. We evaluate the effectiveness of our approach using 74 queries on a corpus of 23,677,216 code snippets extracted from 24,666 open-source Java projects. The results show that our approach can effectively recommend semantically related API class names to expand the original natural-language queries. For instance, our approach retrieves relevant code examples in the top 10 results for 76 percent of the 74 queries, compared with 36 percent when using the original natural-language query; and the median rank of the first relevant code example improves from 22 to 7.