ISBN (Digital): 9798350386905
ISBN (Print): 9798350386912
This article studies the process of automatic text classification, focusing on two key steps: feature extraction and classification processing. Adopting suitable feature extraction methods and classification models can improve both the accuracy and the efficiency of automatic text classification. The effectiveness of the proposed methods is verified through experiments and evaluations, and a corresponding result evaluation model is proposed to measure classification performance. The findings confirm that selecting appropriate feature extraction methods and classification models enhances the accuracy and effectiveness of text classification. The conclusion summarizes the research contributions and outlines directions and prospects for future work. This study provides a complete pipeline for automatic text classification, offering practical guidance for real-world applications.
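A minimal sketch of the two steps named above, assuming TF-IDF features and a logistic-regression classifier (the abstract does not specify the exact feature extractor or model); a per-class classification report stands in for the proposed result evaluation model:

```python
# Hypothetical sketch: TF-IDF feature extraction + logistic-regression
# classification processing. The exact methods in the paper are not specified.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

texts = ["great battery life", "screen broke after a week", "fast shipping"]
labels = ["positive", "negative", "positive"]

clf = Pipeline([
    ("features", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),  # feature extraction
    ("model", LogisticRegression(max_iter=1000)),                 # classification processing
])
clf.fit(texts, labels)

# Result evaluation: per-class precision/recall/F1 (on held-out data in practice).
print(classification_report(labels, clf.predict(texts)))
```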
The amount of digital text-based consumer review data has increased dramatically, and many machine learning approaches exist for automated text-based sentiment analysis. Marketing researchers have employed various methods for analyzing text reviews but lack a comprehensive comparison of their performance to guide method selection in future applications. We focus on the fundamental relationship between a consumer's overall empirical evaluation and the text-based explanation of that evaluation, and study the empirical tradeoff between predictive and diagnostic ability when various methods are applied to estimate this relationship. We incorporate methods previously employed in the marketing literature as well as methods that are so far less common there. For generalizability, we analyze 25,241 products in nine product categories and 260,489 reviews across five review platforms. We find that neural network-based machine learning methods, in particular pre-trained versions, offer the most accurate predictions, while topic models such as Latent Dirichlet Allocation offer deeper diagnostics. However, neural network models are not suited for diagnostic purposes, and topic models are ill-equipped for making predictions. Consequently, future selection of methods for processing text reviews is likely to be based on the analyst's goal of prediction versus diagnostics. Published by Elsevier B.V.
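A rough illustration of the prediction-versus-diagnostics tradeoff described above, assuming a generic pre-trained sentiment model for prediction and a small LDA topic model for diagnostics; the model choices and hyperparameters are illustrative, not the paper's setup:

```python
# Hypothetical sketch of the prediction-vs-diagnostics tradeoff.
from transformers import pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = [
    "Battery lasts two days, very happy with this phone.",
    "Screen cracked in a week and support was unhelpful.",
]

# Prediction: a pre-trained neural model scores overall sentiment directly.
sentiment = pipeline("sentiment-analysis")
print(sentiment(reviews))

# Diagnostics: LDA topics expose what reviewers talk about,
# but they do not predict the overall evaluation.
counts = CountVectorizer(stop_words="english").fit(reviews)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts.transform(reviews))
vocab = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top_words = [vocab[i] for i in topic.argsort()[-5:]]
    print(f"topic {k}: {top_words}")
```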
Natural language processing (NLP) is a very important part of machine learning that can be applied to many real-world applications. Several NLP models with huge training datasets have been proposed. The primary purpose of th...
We address a generalization of the bandit with knapsacks problem, where a learner aims to maximize rewards while satisfying an arbitrary set of long-term constraints. Our goal is to design best-of-both-worlds algorith...
ISBN (Digital): 9798331535087
ISBN (Print): 9798331535094
As an important minority language, Tibetan carries rich cultural information, but related natural language processing research remains scarce. To address this gap, this study selects the first high-quality instruction dataset designed specifically for supervised fine-tuning of Tibetan Large Language Models (LLMs), the TIFD dataset, and for the first time applies a lightweight fine-tuning framework based on Low-Rank Adaptation (LoRA) to systematically evaluate the instruction-following ability of three base model families, GLM-4, Qwen2.5, and Llama-3, on Tibetan instruction tasks from the TIFD dataset. The experimental results show that combining TIFD's structured instruction triples (instruction-input-output) with LoRA significantly improves the models' instruction comprehension and generation ability. The study further reveals that the multitask coverage of the TIFD dataset and the low-rank constraint mechanism of LoRA synergistically optimize the models' handling of complex linguistic phenomena such as the Tibetan honorific system and verb tense. This synergy offers a highly efficient fine-tuning paradigm for low-resource-language NLP. The study not only verifies the general optimization effect of the TIFD dataset across multiple Tibetan base models but also provides empirical evidence for cross-lingual model design.
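A hedged sketch of what LoRA fine-tuning on structured instruction triples could look like with the Hugging Face peft library; the base model name, rank, and target modules below are assumptions rather than the paper's reported configuration:

```python
# Hypothetical LoRA setup for instruction fine-tuning (not the paper's exact config).
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "Qwen/Qwen2.5-7B"  # one of the three base model families evaluated
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_cfg = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,   # low-rank constraint on the update
    target_modules=["q_proj", "v_proj"],     # attention projections only (assumed)
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the low-rank adapters are trained

def format_example(ex):
    # Structured instruction triple, as in the TIFD dataset description.
    return (f"### Instruction:\n{ex['instruction']}\n"
            f"### Input:\n{ex['input']}\n"
            f"### Output:\n{ex['output']}")
```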
How often do we come across paragraphs that contain important information but are too long to read? Most people tend to overlook humongous paragraphs at the cost of losing out on crucial information. This leads to...
Autoregressive text generation for low-resource languages, particularly the option of using pre-trained language models, is a relatively under-explored problem. In this paper, we model Math Word Problem (MWP) generat...
ISBN (Print): 9798400701030
Unsupervised text style transfer aims to rewrite a text into a target style while preserving its main content. Traditional methods rely on a fixed-size vector to regulate text style, which makes it difficult to accurately convey the style strength of each individual token. In fact, each token of a text carries a different style intensity and makes a different contribution to the overall style. Our proposed method addresses this issue by assigning an individual style vector to each token in a text, allowing fine-grained control and manipulation of style strength. Additionally, an adversarial training framework integrated with teacher-student learning is introduced to enhance training stability and reduce the complexity of high-dimensional optimization. Experimental results demonstrate the efficacy of our method, with clearly improved style transfer accuracy and content preservation in both two-style and multi-style transfer settings.
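One way to realize per-token style vectors, sketched under assumptions (hidden size, sigmoid gating) not stated in the abstract: a learned style embedding is scaled by a per-token strength and added to each token representation.

```python
# Hypothetical sketch of token-level style injection (not the authors' exact model).
import torch
import torch.nn as nn

class TokenStyleInjector(nn.Module):
    def __init__(self, d_model=512, num_styles=2):
        super().__init__()
        self.style_emb = nn.Embedding(num_styles, d_model)  # one vector per style
        self.strength = nn.Linear(d_model, 1)                # per-token style strength

    def forward(self, token_hidden, style_id):
        # token_hidden: (batch, seq_len, d_model); style_id: (batch,)
        style = self.style_emb(style_id).unsqueeze(1)         # (batch, 1, d_model)
        gate = torch.sigmoid(self.strength(token_hidden))     # (batch, seq_len, 1)
        return token_hidden + gate * style                    # token-specific style vectors

injector = TokenStyleInjector()
hidden = torch.randn(2, 10, 512)
styled = injector(hidden, torch.tensor([0, 1]))
print(styled.shape)  # torch.Size([2, 10, 512])
```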
ISBN (Print): 9783031642982; 9783031642999
In the educational domain, identifying similarity among test items provides various advantages for exam quality management and personalized student learning. Existing studies have mostly relied on student performance data, such as the number of correct or incorrect answers, to measure item similarity. However, nuanced semantic information within the test items has been overlooked, possibly due to the lack of similarity-labeled data: human-annotated educational data demands costly expertise, and items comprising multiple aspects, such as questions and choices, require detailed criteria. In this paper, we introduce the task of aspect-based semantic textual similarity for educational test items (aSTS-EI), in which similarity is assessed by specific aspects within test items, and we present an LLM-guided benchmark dataset. We report baseline performance by extending existing STS methods, setting the groundwork for future aSTS-EI work. In addition, to assist data-scarce settings, we propose a progressive augmentation (ProAug) method, which generates item aspects step by step via recursive prompting. Experimental results indicate that existing STS methods suffice for shorter aspects while underlining the need for specialized approaches for relatively longer aspects. The markedly improved results with ProAug highlight how our augmentation strategy helps overcome data scarcity.
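A hypothetical sketch of progressive augmentation via recursive prompting, with `call_llm`, the aspect list, and the prompt template all placeholders rather than the paper's actual ProAug implementation:

```python
# Hypothetical recursive-prompting loop: each generated aspect is fed back
# into the next prompt so the test item is built step by step.
ASPECTS = ["question stem", "correct choice", "distractor choices", "explanation"]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")  # placeholder

def proaug_generate(topic: str) -> dict:
    item, context = {}, f"Topic: {topic}"
    for aspect in ASPECTS:
        prompt = (
            f"{context}\n"
            f"Generate the {aspect} for a test item on this topic. "
            f"Keep it consistent with everything above."
        )
        item[aspect] = call_llm(prompt)
        context += f"\n{aspect}: {item[aspect]}"  # recursion: feed the result back in
    return item
```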
ISBN (Digital): 9798350376876
ISBN (Print): 9798350376883
Image captioning has emerged as a rapidly thriving area for the machine learning research community. Generally, image captioning is performed by combining various computer vision features, natural language processing, and machine learning methods, along with additional inputs, to obtain more accurate context-dependent image captions. Bengali is a significant language in India, spoken by approximately 100 million people. Various state-of-the-art methods exist for generating captions in English; however, for Bengali there are very few, and existing English-language methods are not particularly helpful. Moreover, translations from English to Bengali may overlook or misinterpret subtle meanings, tones, or cultural nuances. Therefore, this work proposes a machine-learning model for captioning pictures in Bengali using an attention mechanism. The Flickr-8k dataset, which contains 8,000 images, is used to train the model. The proposed method generates image captions in Bengali and attains a BLEU score of 0.66.
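For context, a corpus-level BLEU score like the reported 0.66 could be computed along these lines; the tokenization and smoothing choices here are assumptions, not the paper's evaluation script:

```python
# Hypothetical BLEU evaluation of generated Bengali captions against references.
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# references: one list of reference captions (tokenized) per image;
# hypotheses: one generated caption (tokenized) per image.
references = [[["একটি", "কুকুর", "ঘাসে", "দৌড়াচ্ছে"]]]
hypotheses = [["একটি", "কুকুর", "দৌড়াচ্ছে"]]

smooth = SmoothingFunction().method1
score = corpus_bleu(references, hypotheses, smoothing_function=smooth)
print(f"BLEU: {score:.2f}")
```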