ISBN: (print) 9781948087865
The proceedings contain 63 papers. The topics discussed include: deep graph convolutional encoders for structured data to text generation; syntactic manipulation for generating more diverse and interesting texts; automated learning of templates for data-to-text generation: comparing rule-based, statistical and neural methods; end-to-end content and plan selection for data-to-text generation; explainable autonomy: a study of explanation styles for building clear mental models; template-based multilingual football reports generation using wikidata as a knowledge base; sequence-to-sequence models for data-to-text natural language generation: word- vs. character-based processing and output diversity; and generation of company descriptions using concept-to-text and text-to-text deep models: dataset collection and systems evaluation.
In this paper we introduce an unsupervised learning approach for WordNet construction. The construction method is an Expectation Maximization (EM) approach that uses Princeton WordNet 3.0 (PWN) and a corpus as the data sources for unsupervised learning. The proposed method can be used to construct a WordNet in any language. Links between PWN synsets and target-language words are extracted using a bilingual dictionary. For each of these links, a parameter is defined that gives the probability of selecting the PWN synset for the target-language word in the corpus. Model parameters are adjusted iteratively. In our experiments on the Persian language, by selecting 10% of highly probable links trained by the EM method, a Persian WordNet was obtained that covered 7,109 out of 11,076 distinct words and 9,427 distinct PWN synsets with a precision of more than 86%.
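The iterative adjustment of link probabilities described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the context model (a hypothetical `related` set of synset pairs), the smoothing constant, the function names `em_link_probabilities` and `top_links`, and the toy data are all assumptions made for illustration. Only the overall shape (dictionary-derived candidate links, one probability per link, EM updates over a corpus, then keeping a top fraction of links) follows the text.

```python
from collections import defaultdict


def em_link_probabilities(corpus, candidates, related, iterations=10, smooth=0.1):
    """Estimate P(synset | word) for dictionary-derived word-synset links.

    corpus:     list of sentences, each a list of target-language words
    candidates: dict mapping a word to its candidate synset ids
                (as would be obtained from a bilingual dictionary)
    related:    set of frozenset({s1, s2}) pairs treated as related synsets
                (hypothetical stand-in for a real context model)
    """
    # Start from a uniform distribution over each word's candidate synsets.
    theta = {w: {s: 1.0 / len(ss) for s in ss} for w, ss in candidates.items()}

    for _ in range(iterations):
        counts = {w: defaultdict(float) for w in candidates}

        # E-step: for each word occurrence, compute a posterior over its
        # candidate synsets from the current link probabilities and the context.
        for sentence in corpus:
            for i, w in enumerate(sentence):
                if w not in candidates:
                    continue
                scores = {}
                for s in candidates[w]:
                    support = smooth  # keeps every candidate above zero
                    for j, c in enumerate(sentence):
                        if j == i or c not in candidates:
                            continue
                        for s2 in candidates[c]:
                            if s2 == s or frozenset((s, s2)) in related:
                                support += theta[c][s2]
                    scores[s] = theta[w][s] * support
                total = sum(scores.values())
                for s, value in scores.items():
                    counts[w][s] += value / total

        # M-step: renormalise expected counts into new link probabilities.
        for w, ss in candidates.items():
            total = sum(counts[w].values())
            if total > 0:  # words absent from the corpus keep their old values
                theta[w] = {s: counts[w][s] / total for s in ss}

    return theta


def top_links(theta, fraction=0.1):
    """Return the most probable (word, synset, probability) links."""
    links = sorted(
        ((p, w, s) for w, dist in theta.items() for s, p in dist.items()),
        reverse=True,
    )
    keep = max(1, int(len(links) * fraction))
    return [(w, s, p) for p, w, s in links[:keep]]


# Toy data (illustrative only): an ambiguous word with two candidate synsets.
toy_candidates = {"shir": ["lion.n.01", "milk.n.01"], "gav": ["cow.n.01"]}
toy_related = {frozenset(("milk.n.01", "cow.n.01"))}
toy_corpus = [["shir", "gav"], ["gav", "shir"]]
toy_theta = em_link_probabilities(toy_corpus, toy_candidates, toy_related)
```

In this toy run, the ambiguous word co-occurs with a word whose only synset is marked as related to one of its candidates, so that candidate's probability grows across iterations. A real system would replace `related` with a proper relatedness or word-sense-disambiguation model.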
Acquiring knowledge beyond the usual expertise is a critical challenge when implementing semantic information solutions for querying a knowledge base. To address this difficulty, one proposed solution was to use knowl...
ISBN: (print) 9781450394130
In modern code reviews, many artifacts play roles in knowledge-sharing and documentation: summaries, test plans, and comments, etc. Improving developer tools and facilitating better code reviews require an understanding of the quality of pull requests and their artifacts. This is difficult to measure, however, because they are often free-form natural language and unstructured text data. In this paper, we focus on measuring the quality of test plans at Meta. Test plans are used as a communication mechanism between the author of a pull request and its reviewers, serving as walkthroughs to help confirm that the changed code is behaving as expected. We collected developer opinions on over 650 test plans from more than 500 Meta developers, then introduced a transformer-based model to leverage the success of natural language processing (NLP) techniques in the code review domain. In our study, we show that the learned model is able to capture the sentiment of developers and reflect a correlation of test plan quality with review engagement and reversions: compared to a decision tree model, our proposed transformer-based model achieves a 7% higher F1-score. Finally, we present a case study of how such a metric may be useful in experiments to inform improvements in developer tools and experiences.
ISBN: (print) 9783031363351; 9783031363368
Rigorous and interactive class discussions that support students to engage in high-level thinking and reasoning are essential to learning and are a central component of most teaching interventions. However, formally assessing discussion quality 'at scale' is expensive and infeasible for most researchers. In this work, we experimented with various modern natural language processing (NLP) techniques to automatically generate rubric scores for individual dimensions of classroom text discussion quality. Specifically, we worked on a dataset of 90 classroom discussion transcripts consisting of over 18000 turns annotated with fine-grained Analyzing Teaching Moves (ATM) codes and focused on four Instructional Quality Assessment (IQA) rubrics. Despite the limited amount of data, our work shows encouraging results in some of the rubrics while suggesting that there is room for improvement in the others. We also found that certain NLP approaches work better for certain rubrics.
Author:
Sonntag, Daniel
Stuhlsatzenhausweg 3 66123 Saarbruecken Germany
Dialogue-based Question Answering (QA) in the context of information seeking applications is a highly complex user interaction task. QA systems normally include various natural language processing components (i.e., co...
ISBN: (print) 9798400718144
The proceedings contain 136 papers. The topics discussed include: financial engineering and computer science: application of deep learning in the field of big data finance: taking stock forecasting analysis for example; exploration of distributed power transaction mechanism based on blockchain; cohesive group queries of collective spatial keywords based on road network; power grid data model based on IFC and its application in the whole process management; research on Russian cultural transliteration algorithm based on Hidden Markov model; research on computer natural language processing intelligent question answering system based on knowledge graph; three-dimensional reconstruction of tomato fruit based on multi-view images; and research on real-time processing and stream analysis of unstructured data based on big data platforms.
Language modeling is a fundamental task for building any natural language processing application or language-understanding intelligent system. Language modeling becomes difficult with the increase of input sequence l...
ISBN: (print) 3540362975
At the moment, ontology-based applications do not provide a solution for handling vague information. Recently, some attempts have been made to integrate fuzzy set theory into the ontology domain. This paper presents an approach to handling the nuances of natural languages (i.e., adjectives, adverbs) in the context of fuzzy ontologies. On the one hand, we handle query processing to evaluate vague information. On the other hand, we manage the knowledge domain by extending ontology properties with quality concepts.
ISBN: (print) 9783030232078; 9783030232061
This paper describes Pique, a web-based recommendation system that applies word embedding and a sequence generator to present students with a sequence of scientific paper recommendations personalized to their background and interests. The use of natural language processing (NLP) on learning materials enables educational environments to present students with papers whose content is responsive to their knowledge history and interests. Instructors tend to focus on presenting learning materials based on the overall learning goals of a course rather than personalizing the presentation for each student. The ultimate goal of Pique is to provide learners with content that will encourage their curiosity to learn more by presenting sequences of papers with increasingly novel content. We piloted Pique with students in a course and report on their responses to the recommended sequences. The next steps are to improve the identification of relevant keywords to represent content and the algorithm for the sequence generator.