Search results - Inner Mongolia University Library

OCRBench: on the hidden mystery of OCR in large multimodal models

Science China (Information Sciences), 2024, Vol. 67, No. 12, pp. 23-35

Authors: Yuliang LIU, Zhang LI, Mingxin HUANG, Biao YANG, Wenwen YU, Chunyuan LI, Xu-Cheng YIN, Cheng-Lin LIU, Lianwen JIN, Xiang BAI

Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; School of Electronic and Information Engineering, South China University of Technology; Microsoft Research; School of Computer & Communication Engineering, University of Science and Technology Beijing; Institute of Automation, Chinese Academy of Sciences; School of Software Engineering, Huazhong University of Science and Technology

Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, in various text-related visual tasks including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression *** importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal *** evaluation pipeline and benchmark are available at https://***/Yuliang-Liu/Multimodal OCR.

Keywords: large multimodal model; OCR; text recognition; scene text-centric VQA; document-oriented VQA; key information extraction; handwritten mathematical expression recognition


TangibleMoments: Embedding XR Memories onto Physical Objects


2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, VRW 2025

Authors: Khan, Omar; Ahmed, Zaid; Nam, Hyeongil; Kim, Kangsoo

Affiliations: University of Calgary, Department of Computer Science, Canada; University of Calgary, Department of Electrical and Software Engineering, Canada

ISBN: (print) 9798331514846

The increasing prevalence of Extended Reality (XR) and head-mounted displays (HMDs), alongside rapid advancements in 3D reality capture technology, unlocks a new paradigm for capturing and reliving past memories/experiences through XR. Current methods for accessing and interacting with these "XR Memories" still lack the ability to fully leverage the range of capabilities afforded by XR and HMDs. We introduce TangibleMoments, a novel framework that enables users to embed XR memories onto physical objects, transforming those objects into "Moments", tangible user interfaces for accessing and interacting with XR memories. We describe and illustrate five interaction methods as part of this framework: Creating Moments, Recalling Moments, Sharing Moments, Copying Moments, and Clearing Moments. We showcase an initial prototype and discuss possible extensions. © 2025 IEEE.

Keywords: Mixed reality


A Novel Light Weight CNN Framework Integrated with Marine Predator Optimization for the Assessment of Tear Film-Lipid Layer Patterns


Computer Modeling in Engineering & Sciences, 2023, Vol. 136, No. 7, pp. 87-106

Authors: Bejoy Abraham, Jesna Mohan, Linu Shine, Sivakumar Ramachandran

Affiliations: Department of Computer Science and Engineering, College of Engineering Muttathara, Thiruvananthapuram, Kerala 695008, India; Department of Computer Science and Engineering, Mar Baselios College of Engineering and Technology, Thiruvananthapuram, Kerala 695015, India; Department of Electronics and Communication Engineering, College of Engineering Trivandrum, Kerala 695016, India

Tear film, the outermost layer of the eye, is a complex and dynamic structure responsible for tear *** tear film lipid layer is a vital component of the tear film that provides a smooth optical surface for the cornea and wetting the ocular *** eye syndrome (DES) is a symptomatic disease caused by reduced tear production, poor tear quality, or excessive *** diagnosis is a difficult task due to its multifactorial *** of several clinical tests available, the evaluation of the interference patterns of the tear film lipid layer forms a potential tool for DES *** instrument known as Tearscope Plus allows the rapid assessment of the lipid layer. A grading scale composed of five categories is used to classify lipid layer *** reported work proposes the design of an automatic system employing light weight convolutional neural networks (CNN) and nature inspired optimization techniques to assess the tear film lipid layer patterns by interpreting the images acquired with the Tearscope *** designed framework achieves promising results compared with the existing state-of-the-art techniques.

Keywords: dry-eye syndrome; Tearscope Plus; tear film; deep neural networks


Forecasting land use changes in crop classification and drought using remote sensing


Journal of Arid Land, 2025, Vol. 17, No. 5, pp. 575-589

Authors: Mashael MAASHI, Nada ALZABEN, Noha NEGM, Venkatesan VEERAMANI, Sabarunisha Sheik BEGUM, Geetha PALANIAPPAN

Affiliations: Department of Software Engineering, College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia; Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia; Department of Computer Science, Applied College at Mahayil, King Khalid University, Abha 61421, Saudi Arabia; Department of Civil Engineering, University College of Engineering, Anna University, Ariyalur 621731, India; Department of Biotechnology, P.S.R. Engineering College, Sivakasi 626140, India; Department of Electronics and Communication Engineering, School of Engineering and Technology, Dhanalakshmi Srinivasan University, Samayapuram 621112, India

Challenges in land use and land cover (LULC) include rapid urbanization encroaching on agricultural land, leading to fragmentation and loss of natural ***, the effects of urbanization on LULC of different crop types are less *** study assessed the impacts of LULC changes on agriculture and drought vulnerability in the Aguascalientes region, Mexico, from 1994 to 2024, and predicted the LULC in 2034 using remote sensing data, with the goals of sustainable land management and climate resilience *** increasing urbanization and drought, the integration of satellite imagery and machine learning models in LULC analysis has been underutilized in this *** Landsat imagery, we assessed crop attributes through indices such as normalized difference vegetation index (NDVI), normalized difference water index (NDWI), normalized difference moisture index (NDMI), and vegetation condition index (VCI), alongside watershed delineation and spectral *** random forest model was applied to classify LULC, providing insights into both historical and future *** indicated a significant decline in vegetation cover (109.13 km²) from 1994 to 2024, accompanied by an increase in built-up land (75.11 km²) and bare land (67.13 km²). Projections suggested a further decline in vegetation cover (41.51 km²) and continued urban land expansion by *** study found that paddy crops exhibited the highest values, while common bean and maize performed *** analysis revealed that mildly dry areas in 2004 became severely dry in 2024, highlighting the increasing vulnerability of agriculture to climate *** study concludes that sustainable land management, improved water resource practices, and advanced monitoring techniques are essential to mitigate the adverse effects of LULC changes on agricultural productivity and drought resilience in the *** findings contribute to the understanding of how remote sensing can be effectively used for long-t

Keywords: land use and land cover (LULC); crop attributes; drought vulnerability; machine learning models; remote sensing
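The indices named in the abstract above follow widely used band-ratio definitions. The sketch below shows them for a single pixel. It is an illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, NDWI is taken in its green/NIR form (the abstract does not say which NDWI variant was used), and the reflectance values in the usage lines are made up.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when the denominator is zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized difference water index (green/NIR form, an assumption here)."""
    return normalized_difference(green, nir)


def ndmi(nir, swir1):
    """Normalized difference moisture index from NIR and shortwave infrared."""
    return normalized_difference(nir, swir1)


def vci(ndvi_now, ndvi_min, ndvi_max):
    """Vegetation condition index: current NDVI scaled to 0-100 within its historical range."""
    span = ndvi_max - ndvi_min
    return 100.0 * (ndvi_now - ndvi_min) / span if span != 0 else 0.0


# Hypothetical reflectance values for one pixel, for illustration only.
print(round(ndvi(nir=0.5, red=0.1), 3))
print(round(vci(ndvi_now=0.4, ndvi_min=0.2, ndvi_max=0.6), 1))
```

In practice these functions would be applied per pixel across whole Landsat bands, and the historical minimum and maximum for VCI would come from a multi-year NDVI record.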


A feature-enhanced multiscale attention approach for automated hand bone segmentation


Multimedia Tools and Applications, 2025, Vol. 84, No. 16, pp. 15949-15969

Authors: Nagaraju, Y. Venkatesh Thanu Shree Yadav, P.R. Vaishnavi, A. Tejashree, S.V.

Affiliation: Department of Computer Science and Engineering, University Visvesvaraya College of Engineering, Bangalore University, Karnataka, Bengaluru 560001, India

In the medical field, comprehensive analysis of bone structures is paramount for assessing skeletal health and diagnosing conditions. X-ray imaging serves as a cornerstone in bone age evaluation and the fabrication of implants. In addition to clinical applications, automated bone structure analysis aids students in studying anatomy and radiology by providing accurate and consistent representations. This automation has the potential to significantly reduce processing time and enhance students’ understanding of bone structures and their variations, thereby improving their training in the medical field. This study presents the Adaptive Multiscale Expansion Segmentation Model (AMESM), a lightweight encoder-decoder-based multiclass semantic segmentation model for segmenting hand bones. The proposed method assists physicians in tasks such as growth analysis, hand movement assessment, and implant design. The model incorporates a Multi-Scale Attention Block that improves feature representation and segmentation accuracy. The experimental results demonstrate promising performance, with a Mean Intersection over Union (mIoU) of 88.47% and a Dice Score of 93.86%. The comprehensive analysis confirms the model’s ability to achieve accurate segmentation while remaining computationally efficient. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.

Keywords: Automation
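For readers unfamiliar with the two metrics reported in the abstract above, the following sketch computes mean IoU and mean Dice for a multiclass segmentation from flat lists of per-pixel labels. It is a generic illustration, not the paper's evaluation code: the function name and the toy labels are hypothetical, and the choice to skip classes absent from both prediction and ground truth is an assumption (evaluation protocols differ on this point).

```python
def mean_iou_and_dice(pred, truth, num_classes):
    """Return (mean IoU, mean Dice) over classes present in pred or truth."""
    inter = [0] * num_classes    # pixels where prediction and truth agree on class c
    pred_n = [0] * num_classes   # pixels predicted as class c
    truth_n = [0] * num_classes  # pixels labelled as class c in the ground truth
    for p, t in zip(pred, truth):
        pred_n[p] += 1
        truth_n[t] += 1
        if p == t:
            inter[p] += 1

    ious, dices = [], []
    for c in range(num_classes):
        total = pred_n[c] + truth_n[c]
        if total == 0:
            continue  # class absent from both masks: skipped (an assumption)
        union = total - inter[c]
        ious.append(inter[c] / union)
        dices.append(2 * inter[c] / total)

    if not ious:
        return 0.0, 0.0
    return sum(ious) / len(ious), sum(dices) / len(dices)


# Toy four-pixel example with two classes, for illustration only.
miou, dice = mean_iou_and_dice(pred=[0, 0, 1, 1], truth=[0, 1, 1, 1], num_classes=2)
print(round(miou, 4), round(dice, 4))
```

For a single class the two metrics are related by Dice = 2·IoU / (1 + IoU), which is why a Dice score is never lower than the corresponding IoU.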


Fine-Tuning Cyber Security Defenses: Evaluating Supervised Machine Learning Classifiers for Windows Malware Detection


Computers, Materials & Continua, 2024, Vol. 80, No. 8, pp. 2917-2939

Authors: Islam Zada, Mohammed Naif Alatawi, Syed Muhammad Saqlain, Abdullah Alshahrani, Adel Alshamran, Kanwal Imran, Hessa Alfraihi

Affiliations: Department of Software Engineering, International Islamic University, Islamabad 25000, Pakistan; Information Technology Department, Faculty of Computers and Information Technology, University of Tabuk, Tabuk 71491, Saudi Arabia; Department of Computer Science and Artificial Intelligence, College of Computer Science and Engineering, University of Jeddah, Jeddah 21493, Saudi Arabia; Department of Cybersecurity, College of Computer Science and Engineering, University of Jeddah, Jeddah 21493, Saudi Arabia; Department of Computer Science, University of Peshawar, Peshawar 25121, Pakistan; Department of Information Systems, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia

Malware attacks on Windows machines pose significant cybersecurity threats, necessitating effective detection and prevention *** machine learning classifiers have emerged as promising tools for malware ***, there remains a need for comprehensive studies that compare the performance of different classifiers specifically for Windows malware *** this gap can provide valuable insights for enhancing cybersecurity *** numerous studies have explored malware detection using machine learning techniques, there is a lack of systematic comparison of supervised classifiers for Windows malware *** the relative effectiveness of these classifiers can inform the selection of optimal detection methods and improve overall security *** study aims to bridge the research gap by conducting a comparative analysis of supervised machine learning classifiers for detecting malware on Windows *** objectives include investigating the performance of various classifiers, such as Gaussian Naïve Bayes, K Nearest Neighbors (KNN), Stochastic Gradient Descent Classifier (SGDC), and Decision Tree, in detecting Windows *** the accuracy, efficiency, and suitability of each classifier for real-world malware detection *** the strengths and limitations of different classifiers to provide insights for cybersecurity practitioners and *** recommendations for selecting the most effective classifier for Windows malware detection based on empirical *** study employs a structured methodology consisting of several phases: exploratory data analysis, data preprocessing, model training, and *** data analysis involves understanding the dataset’s characteristics and identifying preprocessing *** preprocessing includes cleaning, feature encoding, dimensionality reduction, and optimization to prepare the data for *** training utilizes various

Keywords: Security and privacy challenges in the context of requirements engineering; supervised machine learning; malware detection; Windows systems; comparative analysis; Gaussian Naive Bayes; K Nearest Neighbors; Stochastic Gradient Descent Classifier; Decision Tree


Investigating Visual Guide Cues in VR: Impacts of Virtual Humans and Symbol-Based Navigation on Real-World Performance and Experience


2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops, VRW 2025

Authors: Khan, Omar; Nguyen, Anh; Nam, Hyeongil; Kim, Kangsoo

Affiliations: University of Calgary, Department of Computer Science, Canada; University of Calgary, Department of Electrical and Software Engineering, Canada

ISBN: (print) 9798331514846

Virtual experiences can significantly influence our perception and behavior in the real world, shaping how we interact with and navigate physical environments. In this paper, we examine the impact of learning navigation routes in an immersive virtual environment (IVE) on navigation performance and user experience in a corresponding real-world indoor setting. We developed a guide system with two distinct audiovisual representations: a human agent guide and a symbol-based guide. A preliminary user study (N = 10) was conducted to evaluate the system. While no significant differences were observed between the two guide conditions, the findings reveal valuable insights into user-perceived confidence and enjoyment during real-world navigation tasks. Contrary to our expectations, the symbol-based guide elicited slightly higher positive scores compared to the human agent guide. We discuss these findings and outline directions for future research. © 2025 IEEE.

Keywords: Virtual environments


MMInstruct: a high-quality multi-modal instruction tuning dataset with extensive diversity


Science China (Information Sciences), 2024, Vol. 67, No. 12, pp. 36-51

Authors: Yangzhou LIU, Yue CAO, Zhangwei GAO, Weiyun WANG, Zhe CHEN, Wenhai WANG, Hao TIAN, Lewei LU, Xizhou ZHU, Tong LU, Yu QIAO, Jifeng DAI

Affiliations: School of Computer Science, Nanjing University; School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University; Shanghai AI Laboratory; School of Computer Science, Fudan University; Department of Information Engineering, The Chinese University of Hong Kong; SenseTime Research; Department of Electronic Engineering, Tsinghua University

Despite the effectiveness of vision-language supervised fine-tuning in enhancing the performance of vision large language models (VLLMs), existing visual instruction tuning datasets have the following limitations. (1) Instruction annotation quality: despite existing VLLMs exhibiting strong performance, instructions generated by those advanced VLLMs may still suffer from inaccuracies, such as hallucinations. (2) Instruction and image diversity: the limited range of instruction types and the lack of diversity in image data may impact the model's ability to generate outputs that are diverse and closer to real-world scenarios. To address these challenges, we construct a high-quality, diverse visual instruction tuning dataset, MMInstruct, which consists of 973k instructions from 24 domains. There are four instruction types: judgment, multiple-choice, long visual question answering, and short visual question answering. To construct MMInstruct, we propose an instruction generation data engine that leverages GPT-4V, GPT-3.5, and manual correction. Our instruction generation engine enables semi-automatic, low-cost, and multi-domain instruction generation at 1/6 the cost of manual construction. Through extensive experimental validation and ablation experiments, we demonstrate that MMInstruct could significantly improve the performance of VLLMs, e.g., a model fine-tuned on MMInstruct achieves new state-of-the-art performance on 10 out of 12 benchmarks. The code and data shall be available at https://***/yuecao0119/MMInstruct.

Keywords: instruction tuning; multi-modal; multi-domain dataset; vision large language model


Integrated Privacy Preserving Healthcare System Using Posture-Based Classifier in Cloud


Intelligent Automation & Soft Computing, 2023, Vol. 35, No. 3, pp. 2893-2907

Authors: C. Santhosh Kumar, K. Vishnu Kumar

Affiliations: Department of Computer Science and Engineering, Priyadarshini Engineering College, 635751, Tamilnadu, India; Department of Computer Science and Engineering, KPR Institute of Engineering & Technology, Coimbatore 641407, India

Privacy-preserving online disease prediction and diagnosis are critical issues in the emerging edge-cloud-based healthcare *** patient data processing from remote places may lead to severe privacy ***, the existing cloud-based healthcare system takes more latency and energy consumption during diagnosis due to offloading of live patient data to remote cloud *** the privacy *** proposed research introduces the edge-cloud enabled privacy-preserving healthcare system by exploiting additive homomorphic encryption *** can help maintain the privacy preservation and confidentiality of patients’ medical data during diagnosis of Parkinson’s *** addition, the energy and delay aware computational offloading scheme is proposed to minimize the uncertainty and energy consumption of end-user *** proposed research maintains the better privacy and robustness of live video data processing during prediction and diagnosis compared to existing healthcare systems.

Keywords: Peer-to-peer computing; energy and delay aware offloading; edge-cloud enabled healthcare system; Parkinson’s disease prediction
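The abstract above relies on additive homomorphic encryption, where multiplying two ciphertexts yields an encryption of the sum of their plaintexts. The toy sketch below illustrates that property with the Paillier cryptosystem. This is an assumption made purely for illustration: the abstract does not name the scheme the authors use, the function names are hypothetical, and the tiny primes offer no security whatsoever.

```python
import math
import random


def keygen(p, q):
    """Build a toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) as g^m * r^n mod n^2 with g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover m as L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(n, c1, c2):
    """Homomorphic addition: the product of ciphertexts encrypts the sum of plaintexts."""
    return (c1 * c2) % (n * n)


n, priv = keygen(61, 53)
total = add_encrypted(n, encrypt(n, 12), encrypt(n, 30))
print(decrypt(n, priv, total))  # prints 42
```

The party performing the addition never sees 12 or 30 in the clear, which is the property that lets an untrusted edge or cloud node aggregate encrypted patient values.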


Student’s Health Exercise Recognition Tool for E-Learning Education


Intelligent Automation & Soft Computing, 2023, Vol. 35, No. 1, pp. 149-161

Authors: Tamara al Shloul, Madiha Javeed, Munkhjargal Gochoo, Suliman A. Alsuhibany, Yazeed Yasin Ghadi, Ahmad Jalal, Jeongmin Park

Affiliations: Department of Humanities and Social Science, Al Ain University, Al Ain 15551, UAE; Department of Computer Science, Air University, Islamabad 44000, Pakistan; Department of Computer Science and Software Engineering, United Arab Emirates University, Al Ain 15551, UAE; Department of Computer Science, College of Computer, Qassim University, Buraydah 51452, Saudi Arabia; Department of Computer Science and Software Engineering, Al Ain University, Al Ain 15551, UAE; Department of Computer Engineering, Korea Polytechnic University, Siheung-si, Gyeonggi-do 237, Korea

Due to the recently increased requirements of e-learning systems, multiple educational institutes such as kindergarten have transformed their learning towards virtual *** student health exercise is a difficult task but an important one due to the physical education needs especially in young *** proposed system focuses on the necessary implementation of student health exercise recognition (SHER) using a modified quaternion-based filter for inertial data refining and data fusion as the pre-processing ***, cleansed data has been segmented using an overlapping windowing approach followed by patterns identification in the form of static and kinematic signal ***, these patterns have been utilized to extract cues for both patterned signals, which are further optimized using Fisher’s linear discriminant analysis (FLDA) ***, the physical exercise activities have been categorized using extended Kalman filter (EKF)-based neural *** system can be implemented in multiple educational establishments including intelligent training systems, virtual mentors, smart simulations, and interactive learning management methods.

Keywords: e-learning; exercise recognition; online physical education; student’s healthcare
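The overlapping windowing step mentioned in the abstract above can be sketched as follows. This is a generic illustration of the technique, not the authors' implementation: the function name, window size, and overlap are hypothetical choices, and trailing samples that do not fill a whole window are dropped (an assumption).

```python
def overlapping_windows(samples, size, overlap):
    """Split a sequence into fixed-size windows that share `overlap` samples."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far each window start advances
    # Only full windows are kept; a short tail at the end is discarded.
    return [samples[i:i + size] for i in range(0, len(samples) - size + 1, step)]


# Ten hypothetical inertial readings, windows of 4 samples with 50% overlap.
signal = list(range(10))
for window in overlapping_windows(signal, size=4, overlap=2):
    print(window)
```

Overlap means a movement that straddles a window boundary still appears whole in at least one window, at the cost of producing more windows to process.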
