A package's source code repository records the package's development history, which is critical for using the package and monitoring its risks. However, a package release often lacks a link to its source code repository because the package's development platform is separate from its distribution platform. To establish the link, existing tools retrieve the release's repository information from its metadata, which suffers from two limitations: the metadata may lack the information or contain wrong information. Our analysis shows that existing tools can only retrieve repository information for up to 70.5% of PyPI releases. To address these limitations, this paper proposes PyRadar, a novel framework that uses the metadata and the source distribution to retrieve and validate the repository information for PyPI releases. We start with an empirical study that compares four existing tools on 4,227,425 PyPI releases and analyzes phantom files (files appearing in the release's distribution but not in the release's repository) in 14,375 correct and 2,064 incorrect package-repository links. Based on the findings, we design PyRadar with three components, i.e., the Metadata-based Retriever, the Source Code Repository Validator, and the Source Code-based Retriever, that progressively retrieve correct source code repository information for PyPI releases. In particular, the Metadata-based Retriever combines the best practices of existing tools and successfully retrieves repository information from the metadata for 72.1% of PyPI releases. The Source Code Repository Validator applies common machine learning algorithms to six crafted features and achieves an AUC of up to 0.995. The Source Code-based Retriever queries World of Code with the SHA-1 hashes of all Python files in the release's source distribution and retrieves repository information for 90.2% of packages in our dataset with an accuracy of 0.970. Both practitioners and researchers can employ PyRadar to make better use of PyPI packages.
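The hashing step described in the abstract above can be illustrated with a minimal sketch. It is not PyRadar's implementation. It assumes the SHA-1 hashes are Git blob hashes (content prefixed with a `blob <size>\0` header), and it assumes a phantom file can be detected by checking whether its hash is absent from a candidate repository's set of blob hashes. The names `git_blob_sha1`, `python_file_hashes`, `phantom_files`, and `make_demo_sdist` are hypothetical, and the actual World of Code lookup is left out because its interface is not described here.

```python
import hashlib
import io
import tarfile
from typing import Dict, List, Set


def git_blob_sha1(data: bytes) -> str:
    """Hash content the way Git hashes a blob: SHA-1 over 'blob <size>\\0' + data."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def python_file_hashes(sdist: tarfile.TarFile) -> Dict[str, str]:
    """Map every regular .py member of a source distribution to its blob hash."""
    hashes = {}
    for member in sdist.getmembers():
        if member.isfile() and member.name.endswith(".py"):
            fileobj = sdist.extractfile(member)
            if fileobj is not None:
                hashes[member.name] = git_blob_sha1(fileobj.read())
    return hashes


def phantom_files(release_hashes: Dict[str, str], repo_blobs: Set[str]) -> List[str]:
    """Assumed criterion: a file is 'phantom' if its hash is not among the repo's blobs."""
    return sorted(name for name, h in release_hashes.items() if h not in repo_blobs)


def make_demo_sdist(files: Dict[str, bytes]) -> tarfile.TarFile:
    """Build an in-memory .tar.gz so the sketch runs without downloading anything."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return tarfile.open(fileobj=buf, mode="r:gz")


if __name__ == "__main__":
    demo = make_demo_sdist({
        "pkg-1.0/pkg/__init__.py": b"",
        "pkg-1.0/setup.py": b"hello\n",
        "pkg-1.0/README.md": b"docs",
    })
    hashes = python_file_hashes(demo)
    # Pretend the candidate repository only contains the empty blob.
    known_repo_blobs = {git_blob_sha1(b"")}
    print(hashes)
    print(phantom_files(hashes, known_repo_blobs))
```

With a real release, the in-memory archive would be replaced by `tarfile.open(path_to_sdist)`, and the resulting hashes would be the keys sent to whatever hash-to-repository index is available.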
ISBN: 9781479903979 (print)
Software reuse reduces the costs of estimation, design, development, and testing. Reusing source code frees developers from the time-consuming chore of writing the same code repeatedly for similar software projects. If the proposed distribution format (Application→Module→Class→Function→Sub-function) is followed when storing source code, requests for source code at any level are easier to fulfill. In this paper, a source code repository system is proposed that dynamically decomposes an application into its smallest-level components and stores their information. It also provides an interface to browse and retrieve the desired source code. During dynamic retrieval, the desired source code is merged by a bottom-up integration method. The implementation shows that the proposed repository outperforms related work by providing more features in a simpler way.
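The storage hierarchy and bottom-up retrieval described above might be sketched as a small tree structure. This is an illustrative assumption, not the system from the paper: the `Component` class, its fields, and the rule that a child must sit at a lower level than its parent are all hypothetical. Retrieval at any level merges the children first and then joins them under the component's own text, which is one simple reading of "bottom-up integration".

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Levels of the distribution format, from coarsest to finest.
LEVELS = ["application", "module", "class", "function", "sub-function"]


@dataclass
class Component:
    name: str
    level: str
    code: str = ""  # source text owned directly by this component
    children: List["Component"] = field(default_factory=list)

    def add(self, child: "Component") -> "Component":
        """Attach a child, requiring it to be at a finer level than this node."""
        if LEVELS.index(child.level) <= LEVELS.index(self.level):
            raise ValueError(f"{child.level} cannot be stored under {self.level}")
        self.children.append(child)
        return child

    def find(self, name: str) -> Optional["Component"]:
        """Depth-first lookup of a component by name at any level."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def merge(self) -> str:
        """Merge the children first, then place them under this node's own code."""
        parts = [child.merge() for child in self.children]
        if self.code:
            parts.insert(0, self.code)
        return "\n".join(parts)


if __name__ == "__main__":
    app = Component("shop", "application")
    billing = app.add(Component("billing", "module", "# module: billing"))
    invoice = billing.add(Component("Invoice", "class", "class Invoice:"))
    invoice.add(Component("total", "function", "    def total(self): return 0"))

    # Retrieve source at the class level, then at the application level.
    print(app.find("Invoice").merge())
    print(app.merge())
```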
ISBN: 9781450341554 (print)
Personality traits influence most, if not all, human activities, from those as natural as the way people walk, talk, dress, and write to those as complex as the way they interact with others. Most importantly, personality influences the way people make decisions, including, in the case of developers, the criteria they consider when selecting a software project to participate in. Most studies of the influence of social, technical, and human factors in software development projects have focused on the impact of communication on software quality, for instance, on identifying predictors to detect files that may contain bugs before an enhanced version of a software product is released. Only a few of these works analyze the personality traits of developers with commit permissions (committers) in Free/Libre and Open Source Software (FLOSS) projects and their relationship with the software artifacts they interact with. This paper presents an approach, based on the automatic recognition of personality traits from e-mails sent by committers in FLOSS projects, to uncover relationships between the social and technical aspects of the software development process. Our experimental results suggest that some relationships exist between the personality traits projected by committers through their e-mails and the social (communication) and technical activities they undertake. This work is a preliminary study aimed at supporting the formation of efficient work teams in software development projects based on an appropriate mix of stakeholders that takes their personality traits into account.
The products of Open Source Software (OSS) projects are widely used even in commercial mission-critical and high-availability systems. This is because the quality of these software products is high enough for such applications and the available support can meet their requirements. In general, anyone who wants to adopt OSS as part of a computer system must examine both the functional requirements (FR) and the nonfunctional requirements (NFR) for the OSS. In our previous paper, we focused on the NFR of OSS and proposed an evaluation method based on a maturity model of the OSS community. Based on the model, we evaluated four major OSS communities, drawing on human knowledge of the targeted communities. However, it was not clear how to evaluate an individual OSS project within an OSS community. In this paper, we focus on the continuity of an OSS project, as it is one of the most important factors in users' adoption decisions. To evaluate continuity, we propose a growth model of an OSS project based on the project's size and activity. We evaluate the growth model using information retrieved from both OSS community sites and source code repositories. (C) 2015 The Authors. Published by Elsevier B.V.
ISBN: 9781450320382 (print)
Online question-and-answer websites for developers have emerged as the main forums for interaction during the software development process. The veracity of an answer on such websites is typically verified by the number of 'upvotes' that the answer garners from peer programmers using the same forum. Although this mechanism has proved extremely successful in rating the usefulness of answers, it does not lend itself well to modeling the expertise of a user in a particular domain. In this paper, we propose a model to rank the expertise of developers in a target domain by mining their activity in different open source projects. To demonstrate the validity of the model, we built a recommendation system for StackOverflow that uses data mined from GitHub.
Software is more than cards, tapes, and disks; it is the meaning of the bits that they contain. To properly preserve the history of computing, we must preserve the software code, especially the source code, for analysis by future historians.