Search Results - Inner Mongolia University Library

arXiv, 2024

Authors: Kouzelis, Theodoros; Plitsis, Manos; Nicolaou, Mihalis A.; Panagakis, Yannis

Affiliations: National Technical University of Athens, Athens, Greece; Archimedes AI, Athena RC, Athens, Greece; Institute for Language and Speech Processing, Athena RC, Athens, Greece; Department of Informatics and Telecommunications, National and Kapodistrian University of Athens, Athens, Greece; Computation-based Science and Technology Research Center, The Cyprus Institute, Nicosia, Cyprus

Recent advances in Diffusion Models (DMs) have led to significant progress in visual synthesis and editing tasks, establishing them as a strong competitor to Generative Adversarial Networks (GANs). However, the latent space of DMs is not as well understood as that of GANs. Recent research has focused on unsupervised semantic discovery in the latent space of DMs by leveraging the bottleneck layer of the denoising network, which has been shown to exhibit properties of a semantic latent space. However, these approaches are limited to discovering global attributes. In this paper, we address the challenge of local image manipulation in DMs and introduce an unsupervised method to factorize the latent semantics learned by the denoising network of pre-trained DMs. Given an arbitrary image and defined regions of interest, we utilize the Jacobian of the denoising network to establish a relation between the regions of interest and their corresponding subspaces in the latent space. Furthermore, we disentangle the joint and individual components of these subspaces to identify latent directions that enable local image manipulation. Once discovered, these directions can be applied to different images to produce semantically consistent edits, making our method suitable for practical applications. Experimental results on various datasets demonstrate that our method can produce semantic edits that are more localized and have better fidelity compared to the state-of-the-art. https://***/localdiff/. © 2024, CC BY.

Keywords: Generative adversarial networks

An Asynchronous WFST-Based Decoder for Automatic Speech Recognition

IEEE International Conference on Acoustics, Speech and Signal Processing

Authors: Hang Lv; Zhehuai Chen; Hainan Xu; Daniel Povey; Lei Xie; Sanjeev Khudanpur

Affiliations: Audio Speech and Language Processing Lab (ASLP@NPU), School of Computer Science, Northwestern Polytechnical University, Xi’an, China; Center of Language and Speech Processing, Johns Hopkins University, Baltimore, MD, USA; Shanghai Jiao Tong University; Xiaomi Corporation, Beijing, China; Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, MD, USA

We introduce an asynchronous dynamic decoder, which adopts an efficient A* algorithm to incorporate big language models in one-pass decoding for large-vocabulary continuous speech recognition. Unlike standard one-pass decoding with an on-the-fly composition decoder, which might induce a significant computation overhead, the asynchronous dynamic decoder has a novel design with two fronts, one performing "exploration" and the other "backfill". The computation of the two fronts alternates during decoding, resulting in more effective pruning than standard one-pass decoding with an on-the-fly composition decoder. Experiments show that the proposed decoder is notably faster than standard one-pass decoding with an on-the-fly composition decoder, and that the speedup becomes more pronounced as data complexity increases.

Keywords: Vocabulary; Heuristic algorithms; Conferences; Computational modeling; Signal processing algorithms; Signal processing; Decoding

Unsupervised acoustic unit discovery by leveraging a language-independent subword discriminative feature representation

arXiv, 2021

Authors: Feng, Siyuan; Zelasko, Piotr; Moro-Velázquez, Laureano; Scharenborg, Odette

Affiliations: Multimedia Computing Group, Delft University of Technology, Delft, Netherlands; Center for Language and Speech Processing, Johns Hopkins University, Baltimore, MD, United States; Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, MD, United States

This paper tackles automatically discovering phone-like acoustic units (AUD) from unlabeled speech data. Past studies usually proposed single-step approaches. We propose a two-stage approach: the first stage learns a subword-discriminative feature representation, and the second stage applies clustering to the learned representation and obtains phone-like clusters as the discovered acoustic units. In the first stage, a recently proposed method in the task of unsupervised subword modeling is improved by replacing a monolingual out-of-domain (OOD) ASR system with a multilingual one to create a subword-discriminative representation that is more language-independent. In the second stage, segment-level k-means is adopted, and two methods to represent the variable-length speech segments as fixed-dimension feature vectors are compared. Experiments on a very low-resource Mboshi language corpus show that our approach outperforms state-of-the-art AUD in both normalized mutual information (NMI) and F-score. The multilingual ASR improved upon the monolingual ASR in providing OOD phone labels and in estimating the phone boundaries. A comparison of our systems with and without knowing the ground-truth phone boundaries showed a 16% NMI performance gap, suggesting that the current approach can significantly benefit from improved phone boundary estimation. Copyright © 2021, The Authors. All rights reserved.
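Illustrative note: the abstract reports results in normalized mutual information (NMI) between discovered units and reference phone labels. The sketch below shows one common way to compute such a score from two label sequences. The normalization used here (twice the mutual information divided by the sum of the two entropies) is an assumption; the abstract does not say which variant the authors use, and the function name `nmi` is hypothetical.

```python
import math
from collections import Counter


def nmi(ref, hyp):
    """NMI between two equal-length label sequences.

    Assumed normalization: 2 * I(ref; hyp) / (H(ref) + H(hyp)).
    """
    n = len(ref)
    joint = Counter(zip(ref, hyp))   # co-occurrence counts
    p_ref = Counter(ref)             # marginal counts of reference labels
    p_hyp = Counter(hyp)             # marginal counts of discovered units

    # Mutual information from joint and marginal relative frequencies.
    mi = sum(
        (c / n) * math.log((c / n) / ((p_ref[r] / n) * (p_hyp[h] / n)))
        for (r, h), c in joint.items()
    )
    h_ref = -sum((c / n) * math.log(c / n) for c in p_ref.values())
    h_hyp = -sum((c / n) * math.log(c / n) for c in p_hyp.values())

    denom = h_ref + h_hyp
    # Convention: if both sequences use a single label, treat them as identical.
    return 2.0 * mi / denom if denom > 0 else 1.0


# Units that match the phones up to renaming score 1; independent units score 0.
perfect = nmi(["a", "a", "b", "b"], [1, 1, 2, 2])
independent = nmi(["a", "a", "b", "b"], [1, 2, 1, 2])
```

Because NMI is invariant to relabeling, the discovered unit ids do not need to be mapped to phone symbols before scoring.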

Keywords: Telephone sets

Unsupervised Domain-Adaptive Semantic Segmentation for Surgical Instruments Leveraging Dropout-Enhanced Dual Heads and Coarse-Grained Classification Branch

IEEE Transactions on Medical Robotics and Bionics, 2025

Authors: Li, Ziqian; Wang, Zhengyu; Xu, Xinzhou; Chen, Yongfa; Schuller, Bjorn W.

Affiliations: Hefei University of Technology, School of Mechanical Engineering, Hefei, China; Nanjing University of Posts and Telecommunications, School of Internet of Things, Nanjing, China; Graz University of Technology, Signal Processing and Speech Communication Laboratory, Graz, Austria; Chair of Health Informatics, Munich, Germany; Munich Data Science Institute, Munich, Germany; Munich Center for Machine Learning, Munich, Germany; Imperial College London, GLAM – the Group on Language Audio and Music, London, United Kingdom

Accurate semantic segmentation for surgical instruments is crucial in robot-assisted minimally invasive surgery, mainly regarded as a core module in surgical-instrument tracking and operation guidance. Nevertheless, it is usually difficult for existing semantic surgical-instrument segmentation approaches to adapt to unknown surgical scenes, particularly due to their insufficient consideration for reducing the domain gaps across different scenes. To address this issue, we propose an unsupervised domain-adaptive semantic segmentation approach for surgical instruments, leveraging Dropout-enhanced Dual Heads and Coarse-Grained classification branch (D2HCG). The proposed approach comprises dropout-enhanced dual heads for diverse feature representation, and a coarse-grained classification branch for capturing complexities across varying granularities. This incorporates consistency loss functions targeting fine-grained features and coarse-grained granularities, aiming to reduce cross-scene domain gaps. Afterwards, we perform experiments in cross-scene surgical-instrument semantic segmentation cases, with the experimental results demonstrating the effectiveness of the proposed approach compared with state-of-the-art semantic segmentation approaches. © 2018 IEEE.

Keywords: coarse-grained classification; dropout enhancement; surgical instruments; unsupervised domain-adaptive semantic segmentation

A parallelizable lattice rescoring strategy with neural language models

arXiv, 2021

Authors: Li, Ke; Povey, Daniel; Khudanpur, Sanjeev

Affiliations: Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, MD 21218, United States; Human Language Technology Center of Excellence, The Johns Hopkins University, Baltimore, MD 21218, United States; Xiaomi Corp., Beijing, China

This paper proposes a parallel computation strategy and a posterior-based lattice expansion algorithm for efficient lattice rescoring with neural language models (LMs) for automatic speech recognition. First, lattices from first-pass decoding are expanded by the proposed posterior-based lattice expansion algorithm. Second, each expanded lattice is converted into a minimal list of hypotheses that covers every arc. Each hypothesis is constrained to be the best path for at least one arc it includes. For each lattice, the neural LM scores of the minimal list are computed in parallel and are then integrated back to the lattice in the rescoring stage. Experiments on the Switchboard dataset show that the proposed rescoring strategy obtains comparable recognition performance and generates more compact lattices than a competitive baseline method. Furthermore, the parallel rescoring method offers more flexibility by simplifying the integration of PyTorch-trained neural LMs for lattice rescoring with Kaldi. Copyright © 2021, The Authors. All rights reserved.
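Illustrative note: the second step in the abstract (turning a lattice into a list of hypotheses that covers every arc, where each hypothesis is the best path through at least one of its arcs) can be sketched as below. This is a simplified sketch under stated assumptions, not the authors' implementation: the lattice is a small acyclic graph with topologically numbered nodes, arc costs are plain numbers, the greedy pass is not guaranteed to yield the smallest possible list, and the posterior-based expansion step is not shown. The names `Arc` and `covering_hypotheses` are hypothetical.

```python
from collections import namedtuple

Arc = namedtuple("Arc", "src dst word cost")


def covering_hypotheses(arcs, start, final):
    """Return paths (tuples of arc indices) that together cover every arc.

    Assumes node ids are topologically ordered (src < dst for every arc)
    and that every arc lies on at least one start-to-final path.
    """
    inf = float("inf")
    num_nodes = max(a.dst for a in arcs) + 1

    # Forward pass: cheapest way to reach each node, with backpointers.
    fwd_cost = [inf] * num_nodes
    fwd_arc = [None] * num_nodes
    fwd_cost[start] = 0.0
    for i in sorted(range(len(arcs)), key=lambda k: arcs[k].src):
        a = arcs[i]
        if fwd_cost[a.src] + a.cost < fwd_cost[a.dst]:
            fwd_cost[a.dst] = fwd_cost[a.src] + a.cost
            fwd_arc[a.dst] = i

    # Backward pass: cheapest way to finish from each node.
    bwd_cost = [inf] * num_nodes
    bwd_arc = [None] * num_nodes
    bwd_cost[final] = 0.0
    for i in sorted(range(len(arcs)), key=lambda k: -arcs[k].dst):
        a = arcs[i]
        if a.cost + bwd_cost[a.dst] < bwd_cost[a.src]:
            bwd_cost[a.src] = a.cost + bwd_cost[a.dst]
            bwd_arc[a.src] = i

    def best_path_through(i):
        # Best prefix into the arc, the arc itself, best suffix out of it.
        prefix, node = [], arcs[i].src
        while node != start:
            j = fwd_arc[node]
            prefix.append(j)
            node = arcs[j].src
        suffix, node = [], arcs[i].dst
        while node != final:
            j = bwd_arc[node]
            suffix.append(j)
            node = arcs[j].dst
        return tuple(reversed(prefix)) + (i,) + tuple(suffix)

    covered, hyps = set(), []
    for i in range(len(arcs)):
        if i in covered:
            continue  # some earlier hypothesis already includes this arc
        path = best_path_through(i)
        hyps.append(path)
        covered.update(path)
    return hyps


# Toy lattice: two choices for the first word, two for the second.
lattice = [
    Arc(0, 1, "a", 1.0),
    Arc(0, 1, "the", 2.0),
    Arc(1, 2, "cat", 1.0),
    Arc(1, 2, "cap", 3.0),
    Arc(2, 3, "sat", 1.0),
]
hyps = covering_hypotheses(lattice, start=0, final=3)
sentences = [" ".join(lattice[i].word for i in path) for path in hyps]
```

Each resulting sentence could then be scored independently by a neural LM, which is what makes the scoring step easy to batch or parallelize.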

Keywords: speech recognition

Sources of Transfer in Multilingual Named Entity Recognition

arXiv, 2020

Authors: Mueller, David; Andrews, Nicholas; Dredze, Mark

Affiliations: Center for Language and Speech Processing, Johns Hopkins University; Human Language Technology Center of Excellence, Johns Hopkins University

Named entities are inherently multilingual, and annotations in any given language may be limited. This motivates us to consider polyglot named-entity recognition (NER), where one model is trained using annotated data drawn from more than one language. However, a straightforward implementation of this simple idea does not always work in practice: naive training of NER models using annotated data drawn from multiple languages consistently underperforms models trained on monolingual data alone, despite having access to more training data. The starting point of this paper is a simple solution to this problem, in which polyglot models are fine-tuned on monolingual data to consistently and significantly outperform their monolingual counterparts. To explain this phenomenon, we explore the sources of multilingual transfer in polyglot NER models and examine the weight structure of polyglot models compared to their monolingual counterparts. We find that polyglot models efficiently share many parameters across languages and that fine-tuning may utilize a large number of those parameters. Copyright © 2020, The Authors. All rights reserved.

Modelling Collocations in OntoLex-FrAC

2022 Globalex Workshop on Linked Lexicography, GWLL 2022

Authors: Chiarcos, Christian; Gkirtzou, Katerina; Ionov, Maxim; Kabashi, Besim; Khan, Anas Fahad; Truică, Ciprian-Octavian

Affiliations: Applied Computational Linguistics, Goethe University Frankfurt, Frankfurt am Main, Germany; Institute for Digital Humanities, University of Cologne, Germany; Institute of Language and Speech Processing, Athena Research Center, Athens, Greece; Computational and Corpus Linguistics, Friedrich-Alexander University of Erlangen-Nuremberg, Germany; Istituto di Linguistica Computazionale A. Zampolli, Consiglio Nazionale delle Ricerche, Italy; Department of Information Technology, Uppsala University, Sweden

ISBN: (print) 9791095546924

Following presentations of frequency and attestations, and embeddings and distributional similarity, this paper introduces the third cornerstone of the emerging OntoLex module for Frequency, Attestation and Corpus-based Information, OntoLex-FrAC. We provide an RDF vocabulary for collocations, established as a consensus over contributions from five different institutions and numerous data sets, with the goal of eliciting feedback from reviewers, workshop audience and the scientific community in preparation of the final consolidation of the OntoLex-FrAC module, whose publication as a W3C community report is foreseen for the end of this year. The novel collocation component of OntoLex-FrAC is described in application to a lexicographic resource and corpus-based collocation scores available from the web, and finally, we demonstrate the capability and genericity of the model by showing how to retrieve and aggregate collocation information by means of SPARQL, and its export to a tabular format, so that it can be easily processed in downstream applications. © European Language Resources Association (ELRA)

Keywords: collocation analysis; lexical resources; OntoLex; standards

DRAWING ORDER RECOVERY FOR HANDWRITING CHINESE CHARACTERS

44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Authors: Zhao, Bocheng; Yang, Minghao; Tao, Jianhua

Affiliations: Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, USA; Human Language Technology Center of Excellence, The Johns Hopkins University, Baltimore, USA

ISBN: (print) 9781479981311

Recovering drawing orders from a Chinese handwriting image is a challenging problem. Most English drawing order recovery (DOR) methods perform unsatisfactorily on Chinese. This paper proposes a novel image-to-sequence algorithm to deal with the Chinese DOR problem. The proposed method utilizes two regression convolutional neural network (CNN) models to generate two corresponding pen-tip movement heat-maps. To estimate pen-tip movement for most of the normal states in the writing process, the algorithm analyzes these two heat-maps with a specifically designed framework. The drawing order is then restored through a simple iterative process based on the proposed framework. Experiments on a public online handwriting database show that our method achieves remarkable results on Chinese DOR tasks. In addition, on English tasks, our method achieves superior performance among state-of-the-art methods.

Keywords: Drawing order recovery; Chinese handwriting; Convolution neural network; image-to-sequence model

Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

arXiv, 2020

Authors: Bhati, Saurabhchand; Villalba, Jesús; Żelasko, Piotr; Dehak, Najim

Affiliations: Center for Language and Speech Processing and Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, MD, United States

Unsupervised spoken term discovery consists of two tasks: finding the acoustic segment boundaries and labeling acoustically similar segments with the same labels. We perform segmentation based on the assumption that the frame feature vectors are more similar within a segment than across the segments. Therefore, for strong segmentation performance, it is crucial that the features represent the phonetic properties of a frame more than other factors of variability. We achieve this via a self-expressing autoencoder framework. It consists of a single encoder and two decoders with shared weights. The encoder projects the input features into a latent representation. One of the decoders tries to reconstruct the input from these latent representations and the other from the self-expressed version of them. We use the obtained features to segment and cluster the speech data. We evaluate the performance of the proposed method in the Zero Resource 2020 challenge unit discovery task. The proposed system consistently outperforms the baseline, demonstrating the usefulness of the method in learning representations. Copyright © 2020, The Authors. All rights reserved.
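Illustrative note: the segmentation assumption stated in the abstract (frames are more similar within a segment than across segments) can be sketched as a simple boundary detector that marks a boundary wherever the similarity between adjacent frames drops. The cosine measure, the fixed threshold, and the function names below are assumptions for illustration; the abstract does not specify how the authors turn the learned features into boundaries.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def segment_boundaries(frames, threshold=0.5):
    """Return frame indices at which a new segment is assumed to start.

    A boundary is placed before frame i when frames i-1 and i are less
    similar than `threshold` (a hypothetical tuning parameter).
    """
    return [
        i
        for i in range(1, len(frames))
        if cosine(frames[i - 1], frames[i]) < threshold
    ]


# Two frames pointing one way, then two pointing another way.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
boundaries = segment_boundaries(frames)
```

The better the features isolate phonetic content from other sources of variability, the cleaner such similarity drops become, which is the motivation the abstract gives for learning the representation first.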

Keywords: Signal encoding

Single channel far field feature enhancement for speaker verification in the wild

arXiv, 2020

Authors: Nidadavolu, Phani Sankar; Kataria, Saurabh; Perera, Paola Garcia; Villalba, Jesus; Dehak, Najim

Affiliations: Center for Language and Speech Processing and Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, MD, United States

We investigated an enhancement and a domain adaptation approach to make speaker verification systems robust to perturbations of far-field speech. In the enhancement approach, using paired (parallel) reverberant-clean speech, we trained a supervised Generative Adversarial Network (GAN) along with a feature mapping loss. For the domain adaptation approach, we trained a Cycle Consistent Generative Adversarial Network (CycleGAN), which maps features from the far-field domain to the speaker embedding training domain. This was trained on unpaired data in an unsupervised manner. Both networks, termed Supervised Enhancement Network (SEN) and Domain Adaptation Network (DAN) respectively, were trained with multi-task objectives in the (filter-bank) feature domain. On a simulated test setup, we first note the benefit of using feature mapping (FM) loss along with adversarial loss in SEN. Then, we tested both supervised and unsupervised approaches on several real noisy datasets. We observed relative improvements ranging from 2% to 31% in terms of DCF. Using three training schemes, we also establish the effectiveness of the novel DAN approach. Copyright © 2020, The Authors. All rights reserved.

Keywords: speech enhancement
