The integration of language models for neural machine translation has been extensively studied in the past. It has been shown that an external language model, trained on additional target-side monolingual data, can he...
Document-level context for neural machine translation (NMT) is crucial to improve the translation consistency and cohesion, the translation of ambiguous inputs, as well as several other linguistic phenomena. Many work...
Micropaleontology is a critical tool for determining the ages of geologic records, reconstructing ancient environments, and monitoring modern ecosystem health. However, most students are not exposed to micropaleontolo...
This paper presents a multi-objective version of the Cat Swarm Optimization Algorithm called the Grid-based Multiobjective Cat Swarm Optimization Algorithm (GMOCSO). Convergence and diversity preservation are the two ...
Protecting privacy in contemporary NLP models is gaining in importance. So does the need to mitigate social biases of such models. But can we have both at the same time? Existing research suggests that privacy preserv...
This article proposes a novel ontology design for intelligent controlling of traffic signals, considering the investigated factors, crowded factors, road factors, visibility conditions, and emergency situations. Essen...
Aggression during adolescence can lead to unhealthy outcomes. Prior research suggests that youth with disruptive behaviors filter information in a distorted manner and struggle with social information processing skills. Teaching effective social processing skills can help reduce aggressive behaviors. The current study aimed to investigate the effects of customized aggression interventions on adolescent males. We used a theory-informed framework to guide the development of the interventions using an N = 1/ABA single-case research design with four male adolescents aged 13–14 who volunteered to participate in our study. A female licensed clinical mental health counselor designed and delivered the interventions and collected the outcome data. Participants completed a series of temporal assessments examining proactive, reactive, and total aggression. We hypothesized that customized interventions would be an effective means to address and reduce problematic aggressive behaviors. The data produced small to large effect sizes for three of the four participants, and statistically significant differences were observed between phases. The results have implications for the contributions of utilizing social information processing theory-informed customized aggression interventions with adolescents using single-case research design methodology.
This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models. In hard distillation, a teacher model transcribes large amounts of unlabelled speech to train a student model. Soft distillation is another popular KD method that distills the output logits of the teacher model. Due to the nature of RNN-T alignments, applying soft distillation between RNN-T architectures having different posterior distributions is challenging. In addition, bad teachers with a high word error rate (WER) reduce the efficacy of KD. We investigate how to effectively distill knowledge from ASR teachers of variable quality, which, to the best of our knowledge, has not been studied before. We show that a sequence-level KD, full-sum distillation, outperforms other distillation methods for RNN-T models, especially for bad teachers. We also propose a variant of full-sum distillation that distills the sequence-discriminative knowledge of the teacher, leading to further improvement in WER. We conduct experiments on public datasets, namely SpeechStew and LibriSpeech, and on in-house production data.
We investigate a novel modeling approach for end-to-end neural network training using hidden Markov models (HMM) where the transition probabilities between hidden states are modeled and learned explicitly. Most contemporary sequence-to-sequence models allow for from-scratch training by summing over all possible label segmentations in a given topology. In our approach there are explicit, learnable probabilities for transitions between segments as opposed to a blank label that implicitly encodes duration. We implement a GPU-based forward-backward algorithm that enables the simultaneous training of label and transition probabilities. We investigate recognition results and additionally Viterbi alignments of our models. We find that while the transition model training does not improve recognition performance, it has a positive impact on the alignment quality. The generated alignments are shown to be viable targets in state-of-the-art Viterbi trainings.
Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated in RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk, which are used for the final posterior output of the phoneme-based neural transducer with a limited context dependency. Compared to criteria using N-best lists, lattice-free methods eliminate the decoding step for hypothesis generation during training, which leads to more efficient training. Experimental results show that lattice-free methods gain up to 6.5% relative improvement in word error rate compared to a sequence-level cross-entropy trained model. Compared to the N-best-list-based minimum Bayes risk objectives, lattice-free methods gain a 40% to 70% relative training time speedup with a small degradation in performance.