Search Results - Inner Mongolia University Library

Linear Discriminant Analysis Applied to the Detection of Allergic Rhinitis in Patients

2020 International Conference on Computational Science and Computational Intelligence, CSCI 2020

Authors: Stainhaouer, Gregory; Bakamidis, Stylianos; Dologlou, Ioannis. Affiliations: Institute for Language and Speech Processing, ILSP/ATHENA R.C., Athens, Greece

ISBN: (print) 9781728176246

This paper presents a system to detect symptoms of allergic rhinitis remotely by using uttered speech and by exploiting its specific spectral characteristics. Based on the principles of adaptive modeling and fundamental frequency variations (jitter) as well as speech analysis by means of acoustic models, the proposed technique achieves an efficient classification of patients from uttered speech using a Linear Discriminant Analysis (LDA) algorithm. An iterative approach based on Singular Value Decomposition (SVD) is used for the accurate estimation of the jitter, and Hidden Markov Models (HMM) are implemented to model the 32 phonemes. The final decision is derived by optimally combining the individual estimates to form vectors that are processed by an LDA-based iterative algorithm, providing better clustering of healthy and allergic subjects. © 2020 IEEE.

Keywords: Singular value decomposition
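
As a rough illustration of the final classification step described in the abstract above, the following sketch fits a two-class Fisher LDA on two-dimensional feature vectors in plain Python. The feature values, the function names (`fit_lda`, `predict`), and the midpoint threshold are assumptions made for this example. It is not the authors' iterative LDA algorithm, and the abstract does not specify the actual feature layout.

```python
# Two-class Fisher LDA in 2-D, pure Python.
# All numbers below are made up for illustration; they are not data from the paper.

def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter(points, m):
    # Within-class scatter: sum of outer products of the centered samples.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - m[0], p[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fit_lda(class0, class1):
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    # Invert the 2x2 pooled scatter matrix in closed form.
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Fisher direction: w = Sw^{-1} (m1 - m0).
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # Assumed decision rule: threshold at the projected midpoint of the class means.
    mid = [(m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2]
    threshold = w[0] * mid[0] + w[1] * mid[1]
    return w, threshold

def predict(w, threshold, x):
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0

# Hypothetical per-speaker vectors, e.g. (jitter estimate, acoustic-model score).
healthy = [(0.4, 1.0), (0.5, 1.2), (0.6, 0.9), (0.5, 1.1)]
allergic = [(1.4, 2.0), (1.6, 2.3), (1.5, 1.9), (1.7, 2.2)]
w, threshold = fit_lda(healthy, allergic)
```

Calling `predict(w, threshold, x)` on a new vector returns 0 for the first class and 1 for the second. With real data the pooled scatter matrix can be near-singular, in which case some regularization would be needed.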

A Multi-Task BERT Model for Schema-Guided Dialogue State Tracking

arXiv, 2022

Authors: Kapelonis, Eleftherios; Georgiou, Efthymios; Potamianos, Alexandros. Affiliations: School of ECE, National Technical University of Athens, Athens, Greece; Institute for Language and Speech Processing, Athena Research Center, Athens, Greece

Task-oriented dialogue systems often employ a Dialogue State Tracker (DST) to successfully complete conversations. Recent state-of-the-art DST implementations rely on schemata of diverse services to improve model robustness and handle zero-shot generalization to new domains [1]; however, such methods [2, 3] typically require multiple large-scale transformer models and long input sequences to perform well. We propose a single multi-task BERT-based model that jointly solves the three DST tasks of intent prediction, requested slot prediction and slot filling. Moreover, we propose an efficient and parsimonious encoding of the dialogue history and service schemata that is shown to further improve performance. Evaluation on the SGD dataset shows that our approach outperforms the baseline SGP-DST by a large margin and performs well compared to the state-of-the-art, while being significantly more computationally efficient. Extensive ablation studies are performed to examine the contributing factors to the success of our model. © 2022, CC BY.

Keywords: Zero-shot learning

Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss

arXiv, 2022

Authors: Georgiou, Efthymios; Kritsis, Kosmas; Paraskevopoulos, Georgios; Katsamanis, Athanasios; Katsouros, Vassilis; Potamianos, Alexandros. Affiliations: Institute for Language and Speech Processing, Athena Research Center, Athens, Greece; School of ECE, National Technical University of Athens, Athens, Greece

Recent deep learning Text-to-speech (TTS) systems have achieved impressive performance by generating speech close to human parity. However, they suffer from training stability issues as well as incorrect alignment of the intermediate acoustic representation with the input text sequence. In this work, we introduce Regotron, a regularized version of Tacotron2 which aims to alleviate the training issues and at the same time produce monotonic alignments. Our method augments the vanilla Tacotron2 objective function with an additional term, which penalizes non-monotonic alignments in the location-sensitive attention mechanism. By properly adjusting this regularization term we show that the loss curves become smoother, and at the same time Regotron consistently produces monotonic alignments in unseen examples even at an early stage (13% of the total number of epochs) of its training process, whereas the fully converged Tacotron2 fails to do so. Moreover, our proposed regularization method has no additional computational overhead, while reducing common TTS mistakes and achieving slightly improved speech naturalness according to subjective mean opinion scores (MOS) collected from 50 evaluators. © 2022, CC BY.

Keywords: Alignment
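
To make the idea of penalizing non-monotonic alignments concrete, here is a minimal sketch of one plausible penalty on an attention matrix. The hinge-on-centroids form and the names `attention_centroids` and `monotonic_penalty` are assumptions for illustration only; the abstract does not state the exact regularization term used in Regotron.

```python
# Toy monotonicity penalty on an attention matrix (assumed form, not the paper's).
# attn[i][j] is the weight that decoder step i places on input position j.

def attention_centroids(attn):
    # Expected input position attended to at each decoder step.
    return [sum(j * w for j, w in enumerate(row)) for row in attn]

def monotonic_penalty(attn):
    c = attention_centroids(attn)
    # Hinge on backward movement: a step is penalized only when its
    # centroid falls behind the centroid of the previous step.
    hinge = [max(0.0, c[i - 1] - c[i]) for i in range(1, len(c))]
    return sum(hinge) / len(hinge)

# Hypothetical 3-step alignments over 3 input positions.
monotonic = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]]
backtracking = [[0.1, 0.8, 0.1], [0.9, 0.1, 0.0], [0.0, 0.1, 0.9]]
```

In a training setup, a term like this would be scaled by a weight and added to the usual reconstruction loss; the first alignment above incurs no penalty, while the second is penalized for moving backward at its second step.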

SL-REDU GSL: A Large Greek Sign Language Recognition Corpus

IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)

Authors: Katerina Papadimitriou; Galini Sapountzaki; Kyriaki Vasilaki; Eleni Efthimiou; Stavroula-Evita Fotinea; Gerasimos Potamianos. Affiliations: Department of Electrical & Computer Engineering, University of Thessaly, Volos, Greece; Department of Special Education, University of Thessaly, Volos, Greece; Institute for Language & Speech Processing, Athena Research & Innovation Center, Athens, Greece

We present a large multi-signer video corpus for the Greek Sign Language (GSL), suitable for the development and evaluation of GSL recognition algorithms. The database has been collected as part of the “SL-ReDu” project that focuses on the education use-case of systematic teaching of GSL as a second language (L2). The project aims to assist this process by allowing self-monitoring and objective assessment of GSL learners’ productions through the use of recognition technology, thus requiring suitable data resources relevant to the aforementioned use-case. To this end, we present the SL-ReDu GSL corpus, an extensive RGB+D video collection of 21 informants with a duration of 36 hours, recorded under studio conditions, consisting of: (i) isolated signs; (ii) continuous signing (annotated at the sentence level); and (iii) fingerspelling of words. We provide a detailed description of the design and acquisition methods used to develop it, along with corpus statistics and a comparison to existing sign language datasets. The SL-ReDu GSL corpus, as well as proposed frameworks for recognition experiments on it, are publicly available at https://***/corpus.

HFabD+M: A Web-based Platform for Automated Hyperledger Fabric Deployment and Management

IEEE Global Emerging Technology Blockchain Forum: Blockchain & Beyond (iGETblockchain)

Authors: Ioannis Zikos; Andreas Sendros; George Drosatos; Pavlos S. Efraimidis. Affiliations: Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece; Institute for Language and Speech Processing, Athena Research Center, Xanthi, Greece

Hyperledger Fabric is an open-source private permissioned blockchain that supports the use of smart contracts (chaincode). It is aimed mainly at private networks of companies. To serve the different needs of each company and to be flexible in customer requirements, it consists of various adaptive components. Although this structure efficiently addresses a wide range of needs, deploying such a network for research purposes or rapid development is complex. In this paper, we present a web-based system architecture for the automated deployment of a Hyperledger Fabric network, and in addition, we describe the tools needed to manage and update such a network. Finally, as a proof-of-concept, we implement the proposed architecture to demonstrate the feasibility of our approach.

Keywords: Distributed ledger; Smart contracts; Prototypes; Systems architecture; Companies; Fabrics; Blockchains

Designing and Evaluating Speech Emotion Recognition Systems: A Reality Check Case Study with IEMOCAP

arXiv, 2023

Authors: Antoniou, Nikolaos; Katsamanis, Athanasios; Giannakopoulos, Theodoros; Narayanan, Shrikanth. Affiliations: Behavioral Signal Technologies, Los Angeles, CA, United States; Institute for Language and Speech Processing, Athena Research Center, Athens, Greece; SAIL, University of Southern California, Los Angeles, CA, United States

Keywords: Speech recognition

Alternating Objectives Generates Stronger PGD-Based Adversarial Attacks

arXiv, 2022

Authors: Nikolaos, Antoniou; Georgiou, Efthymios; Potamianos, Alexandros. Affiliations: School of Electrical and Computer Engineering, National Technical University of Athens, Athens, Greece; Institute for Language and Speech Processing, Athena Research Center, Athens, Greece

Designing powerful adversarial attacks is of paramount importance for the evaluation of ℓp-bounded adversarial defenses. Projected Gradient Descent (PGD) is one of the most effective and conceptually simple algorithms to generate such adversaries. The search space of PGD is dictated by the steepest ascent directions of an objective. Despite the plethora of objective function choices, there is no universally superior option and robustness overestimation may arise from ill-suited objective selection. Driven by this observation, we postulate that the combination of different objectives through a simple loss alternating scheme renders PGD more robust towards design choices. We experimentally verify this assertion on a synthetic-data example and by evaluating our proposed method across 25 different ℓ∞-robust models and 3 datasets. The performance improvement is consistent when compared to the single loss counterparts. On the CIFAR-10 dataset, our strongest adversarial attack outperforms all of the white-box components of the AutoAttack (AA) ensemble [1], as well as the most powerful attacks existing in the literature, achieving state-of-the-art results in the computational budget of our study (T = 100, no restarts). Copyright © 2022, The Authors. All rights reserved.

Keywords: Budget control
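
The loss-alternating idea in the abstract above can be sketched on a toy problem. The example below runs PGD under an L-infinity bound against a made-up 3-class linear classifier, switching between a cross-entropy gradient and a margin-loss gradient on successive steps. The model weights, the two chosen losses, the every-other-step schedule, and the step sizes are all assumptions for illustration; the abstract does not specify the authors' actual objectives or schedule.

```python
import math

# Toy 3-class linear model on 2-D inputs (weights are made up).
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]

def logits(x):
    return [row[0] * x[0] + row[1] * x[1] for row in W]

def grad_cross_entropy(x, y):
    # Gradient of the cross-entropy loss with respect to the input x.
    z = logits(x)
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    p = [v / s for v in e]
    # d CE / d z_k = p_k - 1[k == y], chained through z = W x.
    coeff = [p[k] - (1.0 if k == y else 0.0) for k in range(len(z))]
    return [sum(coeff[k] * W[k][d] for k in range(len(z))) for d in range(2)]

def grad_margin(x, y):
    # Margin loss: (largest wrong-class logit) - (true-class logit).
    z = logits(x)
    j = max((k for k in range(len(z)) if k != y), key=lambda k: z[k])
    return [W[j][d] - W[y][d] for d in range(2)]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_alternating(x0, y, eps=0.3, alpha=0.05, steps=20):
    x = list(x0)
    grads = [grad_cross_entropy, grad_margin]
    for t in range(steps):
        g = grads[t % 2](x, y)  # alternate the objective on every step
        x = [x[d] + alpha * sign(g[d]) for d in range(2)]  # ascent step
        # Project back into the L-infinity ball of radius eps around x0.
        x = [min(max(x[d], x0[d] - eps), x0[d] + eps) for d in range(2)]
    return x

x0, y = [0.6, 0.4], 0
x_adv = pgd_alternating(x0, y)
```

For a linear model the two gradients often share the same sign pattern, so this toy case mainly shows the mechanics of the loop. The benefit the abstract reports would only be expected on non-linear models where the objectives lead the search in different directions.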

Conditional Drums Generation using Compound Word Representations

arXiv, 2022

Authors: Makris, Dimos; Zixun, Guo; Kaliakatsos-Papakostas, Maximos; Herremans, Dorien. Affiliations: Information Systems Technology and Design, Singapore University of Technology and Design, Singapore; Institute for Language and Speech Processing, R.C. Athena, Athens, Greece

The field of automatic music composition has seen great progress in recent years, specifically with the invention of transformer-based architectures. When using any deep learning model which considers music as a sequence of events with multiple complex dependencies, the selection of a proper data representation is crucial. In this paper, we tackle the task of conditional drums generation using a novel data encoding scheme inspired by the Compound Word representation, a tokenization process of sequential data. Therefore, we present a sequence-to-sequence architecture where a Bidirectional Long Short-Term Memory (BiLSTM) Encoder receives information about the conditioning parameters (i.e., accompanying tracks and musical attributes), while a Transformer-based Decoder with relative global attention produces the generated drum sequences. We conducted experiments to thoroughly compare the effectiveness of our method to several baselines. Quantitative evaluation shows that our model is able to generate drum sequences that have similar statistical distributions and characteristics to the training corpus. These features include syncopation, compression ratio, and symmetry among others. We also verified, through a listening test, that generated drum sequences sound pleasant, natural and coherent while they "groove" with the given accompaniment. © 2022, CC BY.

Keywords: Music

Overview of MWE history, challenges, and horizons: standing at the 20th anniversary of the MWE workshop series via MWE-UD2024

arXiv, 2024

Authors: Han, Lifeng; Evang, Kilian; Bhatia, Archna; Bouma, Gosse; Doğruöz, A. Seza; Garcia, Marcos; Giouli, Voula; Nivre, Joakim; Rademacher, Alexandre. Affiliations: LIACS, LUMC, Leiden University, Netherlands; University of Manchester, United Kingdom; Heinrich Heine University Düsseldorf, Germany; Institute for Human and Machine Cognition, United States; Groningen University, Netherlands; Ghent University, Belgium; University of Santiago de Compostela, Spain; Institute for Language & Speech Processing, ATHENA RC, Greece; Uppsala University, Research Institutes of Sweden, Sweden; IBM Research, Brazil

The first MWE workshop was held with ACL in Sapporo, Japan, in 2003. This year, the joint MWE-UD workshop, co-located with the LREC-COLING 2024 conference, marked the 20th anniversary of MWE workshop events over the past nearly two decades. Standing at this milestone, we look back at this workshop series and summarise the research topics and methodologies researchers have pursued over the years. We also discuss the current challenges that we are facing and the broader impacts and synergies of MWE research within the CL and NLP fields. Finally, we give future research perspectives. We hope this position paper can help researchers, students, and industrial practitioners interested in MWE gain a brief, accessible understanding of its history, current state, and possible future. © 2024, CC BY.
