Using cloud-based computer vision services is gaining traction, where developers access AI-powered components through familiar RESTful APIs, without needing to orchestrate large training and inference infrastructures or curate and label training datasets. However, while these APIs seem familiar to use, their non-deterministic run-time behaviour and evolution are not adequately communicated to developers. Therefore, improving these services' API documentation is paramount: more extensive documentation facilitates the development process of intelligent software. In a prior study, we extracted 34 API documentation artefacts from 21 seminal works, devising a taxonomy of five key requirements for producing quality API documentation. We extend this study in two ways. First, we survey 104 developers of varying experience to understand which API documentation artefacts are of most value to practitioners. Second, we identify which of these highly valued artefacts are or are not well documented through a case study in the emerging computer vision service domain. We identify: (i) several gaps in the software engineering literature, where aspects of API documentation understanding are or are not extensively investigated; and (ii) where industry vendors (in contrast) document artefacts to better serve their end-developers. We provide a set of recommendations to enhance intelligent software documentation for both vendors and the wider research community.
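To make concrete the usage pattern this abstract describes, the following is a minimal Python sketch of consuming a cloud vision service over REST. The endpoint URL, auth scheme, and response fields are hypothetical and do not reflect any particular vendor's API; real services differ and, as the study notes, evolve over time.

import requests

def classify_image(image_bytes: bytes, api_key: str) -> list[dict]:
    # POST an image to a hypothetical cloud vision endpoint.
    resp = requests.post(
        "https://vision.example.com/v1/classify",  # hypothetical endpoint
        headers={"Authorization": f"Bearer {api_key}"},
        files={"image": image_bytes},
        timeout=30,
    )
    resp.raise_for_status()
    # The returned labels and confidence scores can change between calls
    # and across service versions, which is the non-deterministic run-time
    # behaviour the study argues documentation should communicate.
    return resp.json().get("labels", [])

The point of the sketch is that the call itself looks like any ordinary REST integration; nothing in the interface signals that the output may drift, which is why the authors argue this belongs in the documentation.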
Background: To determine the comprehensiveness of neonatal resuscitation documentation and the association of various patient, provider, and institutional factors with completeness of neonatal documentation. Methods: Multi-center retrospective chart review of a sequential sample of very low birth weight infants born in 2013. The description of resuscitation in each infant's record was evaluated for the presence of 29 Resuscitation Data Items and assigned a number of items documented per record. Covariates associated with this assessment were identified. Results: Charts of 263 infants were reviewed. The mean gestational age was 28.4 weeks, and the mean birth weight was 1050 g. Of the infants, 69% were singletons, and 74% were delivered by Cesarean section. A mean of 13.2 (SD 3.5) of the 29 Resuscitation Data Items were registered for each birth. Items most frequently present were: review of obstetric history (98%), Apgar scores (96%), oxygen use (77%), suctioning (71%), and stimulation (62%). In our model adjusted for measured covariates, the institution was significantly associated with documentation. Conclusions: Neonatal resuscitation documentation is not standardized and shows significant variation. Variation in documentation was mostly dependent on institutional factors, not infant or provider characteristics. Understanding this variation may lead to efforts to standardize documentation of neonatal resuscitation.
ISBN (print): 9781450380959
Many data scientists use computational notebooks to test and present their work, as a notebook can weave code and documentation together (computational narrative) and support rapid iteration on code experiments. However, it is not easy to write good documentation in a data science notebook, partially because there is a lack of a corpus of well-documented notebooks as exemplars for data scientists to follow. To cope with this challenge, this work looks at Kaggle, a large online community for data scientists to host and participate in machine learning competitions, and considers highly-voted Kaggle notebooks as a proxy for well-documented notebooks. Through a qualitative analysis at both the notebook level and the markdown-cell level, we find these notebooks are indeed well documented with reference to previous literature. Our analysis also reveals nine categories of content that data scientists write in their documentation cells, and these documentation cells often interplay with different stages of the data science lifecycle. We conclude the paper with design implications and future research directions.
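As an illustration of the "computational narrative" this abstract studies, the fragment below sketches one documented notebook step. In Jupyter the leading comment block would live in a markdown cell above the code cell; the dataset and column names are hypothetical, and the content types (describing the process and the reason behind it) are only examples of the kinds of categories such a study might surface.

# ## Step 2: Clean the data
# Rows with missing fares skew the model, so we drop them, then
# log-transform the fare to reduce skew.
import numpy as np
import pandas as pd

df = pd.read_csv("train.csv")            # hypothetical Kaggle dataset
df = df.dropna(subset=["fare"])          # process: remove missing fares
df["log_fare"] = np.log1p(df["fare"])    # rationale: reduce skew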
ISBN (print): 9798350329964
Computational notebooks have become the go-to way for solving data-science problems. While they are designed to combine code and documentation, prior work shows that documentation is largely ignored by developers because of the manual effort. Automated documentation generation can help, but existing techniques fail to capture algorithmic details, and developers often end up editing the generated text to provide more explanation and sub-steps. This paper proposes a novel machine-learning pipeline, Cell2Doc, for code cell documentation in Python data science notebooks. Our approach works by identifying different logical contexts within a code cell, generating documentation for them separately, and finally combining them to arrive at the documentation for the entire code cell. Cell2Doc takes advantage of the capabilities of existing pre-trained language models and improves their efficiency for code cell documentation. We also provide a new benchmark dataset for this task, along with a data-preprocessing pipeline that can be used to create new datasets, and we investigate an appropriate input representation for the task. Our automated evaluation suggests that our best input representation improves the pre-trained model's performance by 2.5x on average. Further, Cell2Doc achieves a 1.33x improvement during human evaluation in terms of correctness, informativeness, and readability against the corresponding standalone pre-trained model.
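The split-generate-combine idea the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's method: generate_doc is a stub standing in for any pre-trained code-to-text model, and the blank-line splitting heuristic is our assumption, not Cell2Doc's logical-context detection.

def generate_doc(code: str) -> str:
    """Placeholder: call a pre-trained code-summarization model here."""
    raise NotImplementedError

def split_into_contexts(cell_source: str) -> list[str]:
    # Naive assumption: treat blank-line-separated blocks as separate
    # logical contexts within the cell.
    blocks = [b.strip() for b in cell_source.split("\n\n")]
    return [b for b in blocks if b]

def document_cell(cell_source: str) -> str:
    # Generate documentation per context, then combine the pieces into
    # one description of the whole code cell.
    contexts = split_into_contexts(cell_source)
    return " ".join(generate_doc(ctx) for ctx in contexts)

The design intuition is that a model asked to summarize one coherent sub-step at a time produces more faithful text than one asked to summarize a whole heterogeneous cell, and the combination step restores cell-level coverage.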