Search Results - Inner Mongolia University Library

Automated Arabic Text Classification Using Hyperparameter Tuned Hybrid Deep Learning Model

Computers, Materials & Continua, 2023, Vol. 74, No. 3, pp. 5447-5465

Authors: Badriyya B. Al-onazi; Saud S. Alotaib; Saeed Masoud Alshahrani; Najm Alotaibi; Mrim M. Alnfiai; Ahmed S. Salama; Manar Ahmed Hamza
Affiliations: Department of Language Preparation, Arabic Language Teaching Institute, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia; Department of Information Systems, College of Computing and Information System, Umm Al-Qura University, Saudi Arabia; Department of Computer Science, College of Computing and Information Technology, Shaqra University, Shaqra, Saudi Arabia; Prince Saud AlFaisal Institute for Diplomatic Studies, Saudi Arabia; Department of Information Technology, College of Computers and Information Technology, Taif University, Taif, P.O. Box 11099, Taif 21944, Saudi Arabia; Department of Electrical Engineering, Faculty of Engineering & Technology, Future University in Egypt, New Cairo 11845, Egypt; Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, AlKharj, Saudi Arabia

The text classification process has been extensively investigated in various languages, especially *** classification models are vital in several Natural Language Processing (NLP) *** Arabic language has a lot of *** instance, it is the fourth most-used language on the internet and the sixth official language of ***, there are few studies on the text classification process in Arabic. A few text classification studies have been published earlier in the Arabic *** general, researchers face two challenges in the Arabic text classification process: low accuracy and high dimensionality of the *** this study, an Automated Arabic Text Classification using Hyperparameter Tuned Hybrid Deep Learning (AATC-HTHDL) model is *** major goal of the proposed AATC-HTHDL method is to identify different class labels for the Arabic *** first step in the proposed model is to pre-process the input data to transform it into a useful *** Term Frequency-Inverse Document Frequency (TF-IDF) model is applied to extract the feature ***, the Convolutional Neural Network with Recurrent Neural Network (CRNN) model is utilized to classify the Arabic *** the final stage, the Crow Search Algorithm (CSA) is applied to fine-tune the CRNN model’s hyperparameters, showing the work’s *** proposed AATC-HTHDL model was experimentally validated under different parameters and the outcomes established the supremacy of the proposed AATC-HTHDL model over other approaches.

Keywords: Hybrid deep learning; natural language processing; Arabic language; text classification; parameter tuning
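
The TF-IDF feature-extraction step named in the abstract above can be sketched as follows. This is a minimal illustration that assumes the common weighting tf × log(N/df); the abstract does not say which TF-IDF variant the authors use, and the function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting (not taken from the paper):
    weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in corpus for term in set(doc))
    vectors = []
    for doc in corpus:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical toy corpus: two already-tokenized documents.
corpus = [["sports", "match", "goal"], ["economy", "market", "goal"]]
vectors = tf_idf(corpus)
```

A term that occurs in every document ("goal" here) gets weight zero under this weighting, which is why TF-IDF favors terms that discriminate between classes.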

One size does not fit all: Investigating strategies for differentially-private learning across NLP tasks

arXiv, 2021

Authors: Senge, Manuel; Igamberdiev, Timour; Habernal, Ivan
Affiliations: Trustworthy Human Language Technologies, Department of Computer Science, Technical University of Darmstadt, Germany

Preserving privacy in contemporary NLP models allows us to work with sensitive data, but unfortunately comes at a price. We know that stricter privacy guarantees in differentially-private stochastic gradient descent (DP-SGD) generally degrade model performance. However, previous research on the efficiency of DP-SGD in NLP is inconclusive or even counter-intuitive. In this short paper, we provide an extensive analysis of different privacy preserving strategies on seven downstream datasets in five different 'typical' NLP tasks with varying complexity using modern neural models based on BERT and XtremeDistil architectures. We show that unlike standard non-private approaches to solving NLP tasks, where bigger is usually better, privacy-preserving strategies do not exhibit a winning pattern, and each task and privacy regime requires a special treatment to achieve adequate performance. © 2021, CC BY-SA.

Keywords: Natural language processing systems

Using ASR methods for OCR

15th IAPR International Conference on Document Analysis and Recognition, ICDAR 2019

Authors: Arora, Ashish; Garcia, Paola; Watanabe, Shinji; Manohar, Vimal; Shao, Yiwen; Khudanpur, Sanjeev; Chang, Chun Chieh; Rekabdar, Babak; Babaali, Bagher; Povey, Daniel; Etter, David; Raj, Desh; Hadian, Hossein; Trmal, Jan
Affiliations: Center for Language and Speech Processing, Johns Hopkins University, Baltimore, United States; Human Language Technology Center of Excellence, Johns Hopkins University, Baltimore, United States; Department of Computer Engineering, Sharif University of Technology, Iran; School of Mathematics, Statistics and Computer Sciences, College of Science, University of Tehran, Iran

ISBN: (print) 9781728128610

Hybrid deep neural network hidden Markov models (DNN-HMM) have achieved impressive results on large vocabulary continuous speech recognition (LVCSR) tasks. However, recent approaches using DNN-HMM models have not been explored much for text recognition. Inspired by current work in automatic speech recognition (ASR) and machine translation, we present an open vocabulary sub-word text recognition system. The sub-word lexicon and sub-word language model (LM) help in overcoming the challenge of recognizing out-of-vocabulary (OOV) words, and a time delay neural network (TDNN) and convolutional neural network (CNN) based DNN-HMM optical model (OM) efficiently models the sequence dependency in the line image. We present results on 12 datasets with training data varying from 6k lines to 600k lines. The system is built for 8 languages, i.e., English, French, Arabic, Chinese, Farsi, Tamil, Russian, and Korean. We report competitive results on several commonly used handwritten and printed text datasets. © 2019 IEEE.

Keywords: Hidden Markov models

Pairwise document similarity in large collections with MapReduce

46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL 2008

Authors: Elsayed, Tamer; Lin, Jimmy; Oard, Douglas W.
Affiliations: Human Language Technology Center of Excellence; UMIACS Laboratory for Computational Linguistics and Information Processing, University of Maryland, College Park, MD 20742, United States; Department of Computer Science; The iSchool, College of Information Studies

This paper presents a MapReduce algorithm for computing pairwise document similarity in large document collections. MapReduce is an attractive framework because it allows us to decompose the inner products involved in computing document similarity into separate multiplication and summation stages in a way that is well matched to efficient disk access patterns across several machines. On a collection consisting of approximately 900,000 newswire articles, our algorithm exhibits linear growth in running time and space in terms of the number of documents. © 2008 Association for Computational Linguistics.

Keywords: MapReduce
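
The decomposition described in the abstract above, with inner products split into a multiplication stage and a summation stage, can be sketched in a single process as follows. This is one possible reading under stated assumptions, not the authors' implementation: the function names, the postings-list layout, and the toy term weights are all hypothetical, and plain Python generators stand in for the distributed map and reduce phases.

```python
from collections import defaultdict
from itertools import combinations

def build_postings(docs):
    """Indexing step: map each term to the (doc_id, weight) pairs containing it."""
    postings = defaultdict(list)
    for doc_id, weights in docs.items():
        for term, w in weights.items():
            postings[term].append((doc_id, w))
    return postings

def map_products(postings):
    """Multiplication stage: emit one partial product per term and document pair."""
    for plist in postings.values():
        for (d1, w1), (d2, w2) in combinations(sorted(plist), 2):
            yield (d1, d2), w1 * w2

def reduce_sums(pairs):
    """Summation stage: add up the partial products for each document pair."""
    sims = defaultdict(float)
    for key, value in pairs:
        sims[key] += value
    return dict(sims)

def pairwise_similarity(docs):
    return reduce_sums(map_products(build_postings(docs)))

# Hypothetical toy collection: doc_id -> {term: weight}.
docs = {
    "d1": {"a": 1.0, "b": 2.0},
    "d2": {"a": 3.0, "c": 1.0},
    "d3": {"b": 1.0, "c": 2.0},
}
sims = pairwise_similarity(docs)
```

Only document pairs that share at least one term ever produce a partial product, so pairs with zero similarity cost nothing in this formulation.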

Privacy-Preserving Models for Legal Natural Language Processing

arXiv, 2022

Authors: Yin, Ying; Habernal, Ivan
Affiliations: Trustworthy Human Language Technologies, Department of Computer Science, Technical University of Darmstadt, Germany

Pre-training large transformer models with in-domain data improves domain adaptation and helps gain performance on the domain-specific downstream tasks. However, sharing models pre-trained on potentially sensitive data is prone to adversarial privacy attacks. In this paper, we ask to what extent we can guarantee privacy of pre-training data and, at the same time, achieve better downstream performance on legal tasks without the need for additional labeled data. We extensively experiment with scalable self-supervised learning of transformer models under the formal paradigm of differential privacy and show that under specific training configurations we can improve downstream performance without sacrificing privacy protection for the in-domain data. Our main contribution is utilizing differential privacy for large-scale pre-training of transformer language models in the legal NLP domain, which, to the best of our knowledge, has not been addressed before. © 2022, CC BY-SA.

Keywords: Sensitive data

Privacy-Preserving Graph Convolutional Networks for Text Classification

arXiv, 2021

Authors: Igamberdiev, Timour; Habernal, Ivan
Affiliations: Trustworthy Human Language Technologies, Department of Computer Science, Technical University of Darmstadt, Germany

Graph convolutional networks (GCNs) are a powerful architecture for representation learning on documents that naturally occur as graphs, e.g., citation or social networks. However, sensitive personal information, such as documents with people's profiles or relationships as edges, is prone to privacy leaks, as the trained model might reveal the original input. Although differential privacy (DP) offers a well-founded privacy-preserving framework, GCNs pose theoretical and practical challenges due to their training specifics. We address these challenges by adapting differentially-private gradient-based training to GCNs and conduct experiments using two optimizers on five NLP datasets in two languages. We propose a simple yet efficient method based on random graph splits that not only improves the baseline privacy bounds by a factor of 2.7 while retaining competitive F1 scores, but also provides strong privacy guarantees of ϵ = 1.0. We show that, under certain modeling choices, privacy-preserving GCNs perform up to 90% of their non-private variants, while formally guaranteeing strong privacy measures. © 2021, CC BY-SA.

Keywords: Classification (of information)

FEATURE COMBINATION AND STACKING OF RECURRENT AND NON-RECURRENT NEURAL NETWORKS FOR LVCSR

IEEE International Conference on Acoustics, Speech, and Signal Processing

Authors: Christian Plahl; Michael Kozielski; Ralf Schluter; Hermann Ney
Affiliations: Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University

ISBN: (print) 9781479903573

This paper investigates the combination of different short-term features and the combination of recurrent and non-recurrent neural networks (NNs) on a Spanish speech recognition task. Several methods exist to combine different feature sets, such as concatenation or linear discriminant analysis (LDA). Even though all these techniques achieve reasonable improvements, feature combination by multi-layer perceptrons (MLPs) outperforms all known approaches. We develop the concept of MLP based feature combination further using recurrent neural networks (RNNs). The phoneme posterior estimates derived from an RNN lead to a significant improvement over the result of the MLPs and achieve a 5% relative improvement in word error rate (WER) with far fewer parameters. Moreover, we improve the system performance further by combining an MLP and an RNN in a hierarchical framework. The MLP benefits from the preprocessing of the RNN. All NNs are trained on phonemes. Nevertheless, the same concepts could be applied using context-dependent states. In addition to the improvements in recognition performance w.r.t. WER, NN based feature combination methods reduce both the training and the testing complexity. Overall, the systems are based on a single set of acoustic models, together with the training of different NNs.

Keywords: Feature combination; Multi-layer perceptron; Recurrent neural networks; Long-short-term-memory; Speech recognition; recurrent neural nets; CSRP3 gene; Stacking; Neural network; System performance; Training

The Legal Argument Reasoning Task in Civil Procedure

arXiv, 2022

Authors: Bongard, Leonard; Held, Lena; Habernal, Ivan
Affiliations: Trustworthy Human Language Technologies, Department of Computer Science, Technical University of Darmstadt, Germany

We present a new NLP task and dataset from the domain of U.S. civil procedure. Each instance of the dataset consists of a general introduction to the case, a particular question, and a possible solution argument, accompanied by a detailed analysis of why the argument applies in that case. Since the dataset is based on a book aimed at law students, we believe that it represents a truly complex task for benchmarking modern legal language models. Our baseline evaluation shows that fine-tuning a legal transformer provides some advantage over random baseline models, but our analysis reveals that the actual ability to infer legal arguments remains a challenging open research question. © 2022, CC BY-SA.

Generating high-coverage semantic orientation lexicons from overtly marked words and a thesaurus

2009 Conference on Empirical Methods in Natural Language Processing, EMNLP 2009, Held in Conjunction with ACL-IJCNLP 2009

Authors: Mohammad, Saif; Dunne, Cody; Dorr, Bonnie
Affiliations: Laboratory for Computational Linguistics and Information Processing, University of Maryland, United States; Human-Computer Interaction Lab., University of Maryland, United States; Institute for Advanced Computer Studies, University of Maryland, United States; Department of Computer Science, University of Maryland, United States; Human Language Technology Center of Excellence, United States

Sentiment analysis often relies on a semantic orientation lexicon of positive and negative words. A number of approaches have been proposed for creating such lexicons, but they tend to be computationally expensive, and usually rely on significant manual annotation and large corpora. Most of these methods use WordNet. In contrast, we propose a simple approach to generate a high-coverage semantic orientation lexicon, which includes both individual words and multi-word expressions, using only a Roget-like thesaurus and a handful of affixes. Further, the lexicon has properties that support the Pollyanna Hypothesis. Using the General Inquirer as gold standard, we show that our lexicon has 14 percentage points more correct entries than the leading WordNet-based high-coverage lexicon (SentiWordNet). In an extrinsic evaluation, we obtain significantly higher performance in determining phrase polarity using our thesaurus-based lexicon than with any other. Additionally, we explore the use of visualization techniques to gain insight into our algorithm beyond the evaluations mentioned above. © 2009 ACL and AFNLP.

Keywords: Thesauri