Search Results - Inner Mongolia University Library

5th Workshop on Gender Bias in Natural Language Processing, GeBNLP 2024, held in conjunction with the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024

Authors: Baghel, Bhiman Kumar; Narayanan, Arun Balajiee Lekshmi; Yoder, Michael Miller (Department of Computer Science, United States; Intelligent Systems Program, University of Pittsburgh, PA, United States)

ISBN: (print) 9798891761377

This study examines the fairness of human- and AI-generated summaries of student reflections in university STEM classes, focusing on potential gender biases. Using topic modeling, we first identify topics that are more prevalent in reflections from female students and others that are more common among male students. We then analyze whether human and AI-generated summaries reflect the concerns of students of any particular gender over others. Our analysis reveals that though human-generated and extractive AI summarization techniques do not show a clear bias, abstractive AI-generated summaries exhibit a bias towards male students. Pedagogical themes are overrepresented from male reflections in these summaries, while concept-specific topics are underrepresented from female reflections. This research contributes to a deeper understanding of AI-generated bias in educational contexts, highlighting the need for future work on mitigating these biases. ©2024 Association for Computational Linguistics.

Keywords: Contrastive Learning

Quantifying the Gaps Between Translation and Native Perception in Training for Multimodal, Multilingual Retrieval

2024 Conference on Empirical Methods in Natural Language Processing, EMNLP 2024

Authors: Buettner, Kyle; Kovashka, Adriana (Intelligent Systems Program, University of Pittsburgh, United States; Department of Computer Science, University of Pittsburgh, United States)

ISBN: (print) 9798891761643

There is a scarcity of multilingual vision-language models that properly account for the perceptual differences that are reflected in image captions across languages and cultures. In this work, through a multimodal, multilingual retrieval case study, we quantify the existing lack of model flexibility. We empirically show performance gaps between training on captions that come from native German perception and captions that have been either machine-translated or human-translated from English into German. To address these gaps, we further propose and evaluate caption augmentation strategies. While we achieve mean recall improvements (+1.3), gaps still remain, indicating an open area of future work for the community. © 2024 Association for Computational Linguistics.

Keywords: Machine translation

Dynamic Sampling-Based Meta-Learning Using Multilingual Acoustic Data for Under-Resourced Speech Recognition

IEEE Access, 2024, vol. 12, pp. 106070-106083

Authors: Hsieh, I-Ting; Wu, Chung-Hsien; Zhao, Zhe-Hong (National Cheng Kung University, Graduate Program of Multimedia Systems and Intelligent Computing, Tainan 70101, Taiwan; National Cheng Kung University, Department of Computer Science and Information Engineering, Tainan 70101, Taiwan)

Under-resourced automatic speech recognition (ASR) has become an active field of research and has experienced significant progress during the past decade. However, the performance of under-resourced ASR trained by existing methods is still far inferior to high-resourced ASR for practical applications. In this paper, speech data from languages that share the most phonemes with the under-resourced language are selected as supplementary resources for meta-training based on the Model-Agnostic Meta-Learning (MAML) strategy. Besides supplementary language selection, this paper proposes a dynamic sampling method instead of the original random sampling method to select support and query sets for each task in MAML to improve meta-training performance. In this study, Taiwanese is selected as the under-resourced language, and the speech corpus of five languages, including Mandarin, English, Japanese, Cantonese, and Thai, are chosen as supplementary training data for acoustic model training. The proposed dynamic sampling approach uses phonemes, pronunciation, and speech recognition models as the basis to determine the proportion of each supplementary language to select helpful utterances for MAML. For evaluation, with the selected utterances from each supplementary language for meta-training, we obtained a Word Error Rate of 20.24% and a Syllable Error Rate of 8.35% for Taiwanese ASR, which were better than the baseline model (26.18% and 13.99%) using only the Taiwanese corpus and other methods. © 2013 IEEE.

Keywords: Speech recognition

An Approach to Detect Abnormal Submissions for CodeWorkout Dataset

8th Educational Data Mining in Computer Science Education Workshop, CSEDM 2024

Authors: Hicks, Alex; Shi, Yang; Lekshmi-Narayanan, Arun-Balajiee; Yan, Wei; Marwan, Samiha (Dept. of Computer Science, Virginia Tech, Blacksburg, VA; Dept. of Computer Science, Utah State University, Logan, UT; Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA; School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ; Dept. of Computer Science, University of Virginia, Charlottesville, VA)

Students’ interactions while solving problems in learning environments (i.e. log data) are often used to support students’ learning. For example, researchers use log data to develop systems that can provide students with personalized problem recommendations based on their knowledge level. However, anomalies in the students’ log data, such as cheating to solve programming problems, could introduce a hidden bias in the log data. As a result, these systems may provide inaccurate problem recommendations, and therefore, defeat their purpose. Classical cheating detection methods, such as MOSS, can be used to detect code plagiarism. However, these methods cannot detect other abnormal events such as a student gaming a system with multiple attempts of similar solutions to a particular programming problem. This paper presents a preliminary study to analyze log data with anomalies. The goal of our work is to overcome the abnormal instances when modeling personalizable recommendations in programming learning environments. © 2024 Copyright for this paper by its authors.

Keywords: CS1; Dataset Cleaning; Dataset Standards; Educational Data Mining; Introductory programming

Intelligent Constellation Generation based on Autoencoder Communication System

2nd IEEE International Conference on Computer Vision and Machine Intelligence, CVMI 2023

Authors: Matsumoto, Kaisei; Toma, Takao; Oshiro, Shiho; Wada, Tomohisa (Computer Science and Intelligent Systems Program, University of the Ryukyus, Okinawa, Japan; Magna Design Net Inc, Okinawa, Japan; Information Technology Center, University of the Ryukyus, Okinawa, Japan; Area of Computer Science and Intelligent Systems, University of the Ryukyus, Dept. of Engineering, Okinawa, Japan)

ISBN: (print) 9798350305142

This paper discusses intelligent constellation generation based on an autoencoder communication system. In previous studies, the amplitude was set to fluctuate between r=0.0 and 1.0. However, when checking the generated constellation, distortion was observed instead of the conventional symbol arrangement. Therefore, this paper compares three cases: the amplitude is constant, the average amplitude within a minibatch is 1, and the average amplitude within the interval time is 1. The communication standard used in this research is IEEE 802.11a, assuming wireless Local Area Network (LAN) specifications. The IEEE 802.11a standard has a Fast Fourier Transform (FFT) length of 64, 52 subcarriers, and Quadrature Phase Shift Keying (QPSK) and 16 Quadrature Amplitude Modulation (16QAM) as modulation methods. A guard interval of 800 ns is added and the symbol length is 4000 ns. First, a simulation was performed under the condition that the amplitude was kept constant. For QPSK with 4 symbols, the constant-amplitude constellation is more rounded than the previous research result. For 16QAM with 16 symbols, the symbols are arranged regularly, as if lined up on a line. Second, a simulation was performed under the condition that the average amplitude within the minibatch was set to 1. The QPSK constellation with 4 symbols appears to rotate clockwise. The 16QAM constellation with 16 symbols has a more uniform symbol placement than the previous research result. Third, a simulation was performed under the condition that the average amplitude within the interval time was set to 1. The QPSK constellation with 4 symbols is the closest to square among the QPSK output results so far. The direction is slightly tilted, but if it can be rotated a little more, it may be possible to reproduce the same symbol arrangement as before. For 16QAM with 16 symbols, the symbol arrangement is biased as a whole. However, the symbols can be seen to be arranged in lines, perhaps due to regularity. As future work, in addition to the conditions set this time, it will exa

Keywords: Quadrature phase shift keying

Scalability in Autoencoder-based OFDM Communication System

2nd IEEE International Conference on Computer Vision and Machine Intelligence, CVMI 2023

Authors: Tsugawa, Seizan; Toma, Takao; Oshiro, Shiho; Wada, Tomohisa (University of the Ryukyus, Computer Science and Intelligent Systems Program, Okinawa, Japan; Magna Design Net Inc, Okinawa, Japan; University of the Ryukyus, Information Technology Center, Okinawa, Japan; University of the Ryukyus, Area of Computer Science and Intelligent Systems, Dept. of Engineering, Okinawa, Japan)

ISBN: (print) 9798350305142

This paper investigates scalability in an autoencoder-based Orthogonal Frequency Division Multiplexing (OFDM) communication system. Previous research only compared an autoencoder-based system with the conventional OFDM communication system under IEEE 802.11a, and showed that the communication system created by the autoencoder exceeded the performance of the conventional system. Therefore, this paper uses IEEE 802.11n and examines whether performance can be improved by expanding the bandwidth and using ***802.11n standard has an FFT length of 128, a subcarrier number of 114 (108 for data), and modulation schemes of Quadrature Phase Shift Keying (QPSK) and 16 Quadrature Amplitude Modulation (16QAM). The GI length is 800 ns and the symbol length is 4000 ns. A computer simulation was performed using a conventional OFDM communication system and a communication system generated by the autoencoder. Assuming a simulation environment with a Signal-to-Noise Ratio (SNR) of 0 to 30 and an amplitude r of 0.0 to 1.0, the Symbol Error Rate (SER) for each SNR was output. In the computer simulation, QPSK converged at SNR 27 with IEEE 802.11a, but with IEEE 802.11n the SER was reduced markedly, down to SNR 12. Also, for 16QAM, the convergence at r=0.0 is the same, at SNR 22; however, IEEE 802.11a does not converge after r=0.6, whereas IEEE 802.11n fails to converge only at r=0.9 and r=1.0. As a future task, we will use IEEE 802.11ac, which enables communication speeds several times faster than IEEE 802.11n, and examine whether accuracy can be improved further. We will also continue our research to support MIMO communication. © 2023 IEEE.

Keywords: Orthogonal frequency division multiplexing

A Multimodal Recaptioning Framework to Account for Perceptual Diversity in Multilingual Vision-Language Modeling

arXiv, 2025

Authors: Buettner, Kyle; Emmerson, Jacob; Kovashka, Adriana (Intelligent Systems Program, Department of Computer Science, University of Pittsburgh, United States)

There are many ways to describe, name, and group objects when captioning an image. Differences are evident when speakers come from diverse cultures due to the unique experiences that shape perception. Machine translation of captions has pushed multilingual capabilities in vision-language models (VLMs), but data comes mainly from English speakers, indicating a perceptual bias and lack of model flexibility. In this work, we address this challenge and outline a data-efficient framework to instill multilingual VLMs with greater understanding of perceptual diversity. We specifically propose an LLM-based, multimodal recaptioning strategy that alters the object descriptions of English captions before translation. The greatest benefits are demonstrated in a targeted multimodal mechanism guided by native speaker data. By adding produced rewrites as augmentations in training, we improve on German and Japanese text-image retrieval case studies (up to +3.5 mean recall overall, +4.7 on non-native error cases). We further propose a mechanism to analyze the specific object description differences across datasets, and we offer insights into cross-dataset and cross-language generalization. Copyright © 2025, The Authors. All rights reserved.

Keywords: Computational grammars

Towards Generalization of Tactile Image Generation: Reference-Free Evaluation in a Leakage-Free Setting

arXiv, 2025

Authors: Gungor, Cagri; Eppinger, Derek; Kovashka, Adriana (Intelligent Systems Program, Department of Computer Science, University of Pittsburgh, United States)

Tactile sensing, which relies on direct physical contact, is critical for human perception and underpins applications in computer vision, robotics, and multimodal learning. Because tactile data is often scarce and costly to acquire, generating synthetic tactile images provides a scalable solution to augment real-world measurements. However, ensuring robust generalization in synthesizing tactile images—capturing subtle, material-specific contact features—remains challenging. We demonstrate that overlapping training and test samples in commonly used datasets inflate performance metrics, obscuring the true generalizability of tactile models. To address this, we propose a leakage-free evaluation protocol coupled with novel, reference-free metrics—TMMD, I-TMMD, CI-TMMD, and D-TMMD—tailored for tactile generation. Moreover, we propose a vision-to-touch generation method that leverages text as an intermediate modality by incorporating concise, material-specific descriptions during training to better capture essential tactile features. Experiments on two popular visuo-tactile datasets, Touch and Go and HCT, show that our approach achieves superior performance and enhanced generalization in a leakage-free setting. © 2025, CC BY.

Keywords: Information leakage

Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Kyle Buettner; Sina Malakouti; Xiang Lorraine Li; Adriana Kovashka (Intelligent Systems Program, Department of Computer Science, University of Pittsburgh, PA, USA)

ISBN: (digital) 9798350353006

ISBN: (print) 9798350353013

Existing object recognition models have been shown to lack robustness in diverse geographical scenarios due to domain shifts in design and context. Class representations need to be adapted to more accurately reflect an object concept under these shifts. In the absence of training data from target geographies, we hypothesize that geographically diverse descriptive knowledge of categories can enhance robustness. For this purpose, we explore the feasibility of probing a large language model for geography-based object knowledge, and we examine the effects of integrating knowledge into zero-shot and learnable soft prompting with CLIP. Within this exploration, we propose geography knowledge regularization to ensure that soft prompts trained on a source set of geographies generalize to an unseen target set. Accuracy gains over prompting baselines on DollarStreet while training only on Europe data are up to +2.8/1.2/1.6 on target data from Africa/Asia/Americas, and +4.6 overall on the hardest classes. Competitive performance is shown vs. few-shot target training, and analysis is provided to direct future study of geographical robustness.

Keywords: Geography; Training; Computer vision; Large language models; Training data; Europe; Robustness

Boosting Weakly Supervised Object Detection using Fusion and Priors from Hallucinated Depth

IEEE Workshop on Applications of Computer Vision (WACV)

Authors: Cagri Gungor; Adriana Kovashka (Intelligent Systems Program, University of Pittsburgh; Department of Computer Science, University of Pittsburgh)

Despite recent attention to depth for various tasks, it is still an unexplored modality for weakly-supervised object detection (WSOD). We propose an amplifier method for enhancing the performance of WSOD by integrating depth information. Our approach can be applied to different WSOD methods based on multiple-instance learning, without necessitating additional annotations or inducing large computational cost. Our proposed method employs monocular depth estimation to obtain hallucinated depth information, which is then incorporated into a Siamese WSOD network using contrastive loss and fusion. By analyzing the relationship between language context and depth, we calculate depth priors to identify the bounding box proposals that may contain an object of interest. These depth priors are then utilized to update the list of pseudo ground-truth boxes, or adjust the confidence of per-box predictions. We evaluate our proposed method on three datasets (COCO, PASCAL VOC, and Conceptual Captions) by implementing it on top of two state-of-the-art WSOD methods, and we demonstrate a substantial enhancement in performance.
