We present a series of empirical studies aimed at illuminating more precisely the likely contribution of semantic roles in improving statistical machine translation accuracy. The experiments reported study several asp...
In this paper, we deal with the problem of a large number of unaligned words in automatically learned word alignments for machine translation (MT). These unaligned words are the reason for ambiguous phrase pairs extra...
The number of research publications in various disciplines is growing exponentially. Researchers and scientists are increasingly finding themselves in the position of having to quickly understand large amounts of tech...
ISBN: 9781605609492 (print)
We present a characterization of a useful class of skills based on a graphical representation of an agent's interaction with its environment. Our characterization uses betweenness, a measure of centrality on graphs. It captures and generalizes (at least intuitively) the bottleneck concept, which has inspired many of the existing skill-discovery algorithms. Our characterization may be used directly to form a set of skills suitable for a given task. More importantly, it serves as a useful guide for developing incremental skill-discovery algorithms that do not rely on knowing or representing the interaction graph in its entirety.
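As a rough illustration of the betweenness measure mentioned in this abstract, the sketch below computes betweenness centrality on a small unweighted, undirected graph using Brandes' algorithm. The toy "two rooms joined by a doorway" graph, the node names, and the function name `betweenness` are assumptions made for illustration; the abstract does not specify how the authors compute betweenness or how they select skills from it.

```python
from collections import deque


def betweenness(adj):
    """Betweenness centrality for an unweighted graph (Brandes' algorithm).

    adj maps each node to a list of neighbours. For an undirected graph,
    every unordered pair is counted twice (once per direction).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest s-v paths
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Hypothetical interaction graph: two triangles ("rooms") joined by node "d".
rooms = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d", "f", "g"], "f": ["e", "g"], "g": ["e", "f"],
}
scores = betweenness(rooms)
doorway = max(scores, key=scores.get)
```

In this toy graph the doorway node `d` lies on every shortest path between the two rooms, so it receives the highest score, which matches the intuition that betweenness captures bottlenecks.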
The recently introduced online confidence-weighted (CW) learning algorithm for binary classification performs well on many binary NLP tasks. However, for multi-class problems CW learning updates and inference cannot b...
Sentiment analysis often relies on a semantic orientation lexicon of positive and negative words. A number of approaches have been proposed for creating such lexicons, but they tend to be computationally expensive, an...
Argumentation is a process that occurs often in ill-defined domains and that helps deal with the ill-definedness. Typically a notion of "correctness" for an argument in an ill-defined domain is impossible to define or verify formally because the underlying concepts are open-textured and the quality of the argument may be subject to discussion or even expert disagreement. Previous research has highlighted the advantages of graphical representations for learning argumentation skills. A number of intelligent tutoring systems have been built that support students in rendering arguments graphically, as they learn argumentation skills. The relative instructional benefits of graphical argument representations have not been reliably shown, however. In this paper we present a formative evaluation of LARGO (Legal ARgument Graph Observer), a system that enables law students to represent graphically examples of legal interpretation with hypotheticals they observe while reading texts of U.S. Supreme Court oral arguments. We hypothesized that, compared to a text-based alternative, LARGO's diagramming language geared toward depicting hypothetical reasoning processes, coupled with non-directive feedback, helps students better extract the important information from argument transcripts and better learn argumentation skills. A first pilot study, conducted with volunteer first-semester law students, provided support for the hypothesis. The system especially helped lower-aptitude students learn argumentation skills, and LARGO improved the reading skills of students as they studied expert arguments. A second study with LARGO was conducted as a mandatory part of a first-semester university law course. Although there were no differences in the learning outcomes of the two conditions, the second study showed some evidence that those students who engaged more with the argument diagrams through the advice did better than the text condition. One lesson learned from these two studies is that gr
ISBN: 9781615679119 (print)
We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, that does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and show empirically that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data.
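The abstract above does not state AROW's update rules, so the following is only a minimal sketch based on the commonly cited form of the algorithm: a Gaussian over weights with mean `mu` and covariance `sigma`, updated whenever the hinge loss is positive. The function name `arow_update`, the regularization parameter `r`, and the plain-list matrix representation are assumptions made for illustration, not the authors' implementation.

```python
def arow_update(mu, sigma, x, y, r=1.0):
    """One AROW-style update on example (x, y), with y in {-1, +1}.

    mu is the mean weight vector (list), sigma a symmetric covariance
    matrix (list of lists). Returns the updated (mu, sigma).
    Assumed form: beta = 1 / (x' Sigma x + r), alpha = hinge_loss * beta.
    """
    n = len(x)
    margin = y * sum(m * xi for m, xi in zip(mu, x))
    if margin >= 1.0:
        # No hinge loss: leave the model unchanged.
        return mu, sigma
    # sigma_x = Sigma x; v = x' Sigma x is the confidence (variance) term.
    sigma_x = [sum(sigma[i][j] * x[j] for j in range(n)) for i in range(n)]
    v = sum(xi * sxi for xi, sxi in zip(x, sigma_x))
    beta = 1.0 / (v + r)
    alpha = (1.0 - margin) * beta
    # The mean moves toward classifying x correctly, scaled by Sigma.
    new_mu = [m + alpha * y * sxi for m, sxi in zip(mu, sigma_x)]
    # The covariance shrinks along the direction of x (rank-one downdate).
    new_sigma = [
        [sigma[i][j] - beta * sigma_x[i] * sigma_x[j] for j in range(n)]
        for i in range(n)
    ]
    return new_mu, new_sigma


# Usage: start from zero mean and identity covariance, then see one example.
mu0 = [0.0, 0.0]
sigma0 = [[1.0, 0.0], [0.0, 1.0]]
mu1, sigma1 = arow_update(mu0, sigma0, x=[1.0, 0.0], y=1, r=1.0)
```

In this sketch a larger `r` makes each update more conservative, which is one way to read the "adaptive regularization" in the abstract. Variance shrinks only along features that have been observed.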
Audio segmentation is an essential preprocessing step in several audio processing applications, with a significant impact on, for example, speech recognition performance. We introduce a novel framework that combines the advantages of different well-known segmentation methods. An automatically estimated log-linear segment model is used to determine the segmentation of an audio stream in a holistic way by a maximum a posteriori decoding strategy, instead of classifying change points locally. A comparison to other segmentation techniques in terms of speech recognition performance is presented, showing a promising segmentation quality of our approach.
ISBN: 9781424454785 (print)
In this paper, we survey some central issues in the historical, current, and future landscape of statistical machine translation (SMT) research, taking as a starting point an extended three-dimensional MT model space. We posit a socio-geographical conceptual disparity hypothesis that aims to explain why language pairs like Chinese-English have presented MT with so much more difficulty than others. The evolution from simple token-based to segment-based to tree-based syntactic SMT is sketched. For tree-based SMT, we consider language bias rationales for selecting the degree of compositional power within the hierarchy of expressiveness for transduction grammars (or synchronous grammars). This leads us to inversion transductions and the ITG model prevalent in current state-of-the-art SMT, along with the underlying ITG hypothesis, which posits a language universal. Against this backdrop, we enumerate a set of key open questions for syntactic SMT. We then consider the more recent area of semantic SMT. We list principles for successful application of sense disambiguation models to semantic SMT, and describe early directions in the use of semantic role labeling for semantic SMT.