Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, in various text-related visual tasks including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression *** importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal *** evaluation pipeline and benchmark are available at https://***/Yuliang-Liu/Multimodal OCR.
The increasing prevalence of Extended Reality (XR) and head-mounted displays (HMDs), alongside rapid advancements in 3D reality capture technology, unlocks a new paradigm for capturing and reliving past memories/exper...
Tear film, the outermost layer of the eye, is a complex and dynamic structure responsible for tear *** tear film lipid layer is a vital component of the tear film that provides a smooth optical surface for the cornea and wets the ocular *** eye syndrome (DES) is a symptomatic disease caused by reduced tear production, poor tear quality, or excessive *** diagnosis is a difficult task due to its multifactorial *** of several clinical tests available, the evaluation of the interference patterns of the tear film lipid layer forms a potential tool for DES *** instrument known as Tearscope Plus allows the rapid assessment of the lipid layer. A grading scale composed of five categories is used to classify lipid layer *** reported work proposes the design of an automatic system employing lightweight convolutional neural networks (CNN) and nature-inspired optimization techniques to assess the tear film lipid layer patterns by interpreting the images acquired with the Tearscope *** designed framework achieves promising results compared with the existing state-of-the-art techniques.
Challenges in land use and land cover (LULC) include rapid urbanization encroaching on agricultural land, leading to fragmentation and loss of natural ***, the effects of urbanization on LULC of different crop types are less *** study assessed the impacts of LULC changes on agriculture and drought vulnerability in the Aguascalientes region, Mexico, from 1994 to 2024, and predicted the LULC in 2034 using remote sensing data, with the goals of sustainable land management and climate resilience *** increasing urbanization and drought, the integration of satellite imagery and machine learning models in LULC analysis has been underutilized in this *** Landsat imagery, we assessed crop attributes through indices such as the normalized difference vegetation index (NDVI), normalized difference water index (NDWI), normalized difference moisture index (NDMI), and vegetation condition index (VCI), alongside watershed delineation and spectral *** random forest model was applied to classify LULC, providing insights into both historical and future *** indicated a significant decline in vegetation cover (109.13 km²) from 1994 to 2024, accompanied by an increase in built-up land (75.11 km²) and bare land (67.13 km²). Projections suggested a further decline in vegetation cover (41.51 km²) and continued urban land expansion by *** study found that paddy crops exhibited the highest values, while common bean and maize performed *** analysis revealed that mildly dry areas in 2004 became severely dry in 2024, highlighting the increasing vulnerability of agriculture to climate *** study concludes that sustainable land management, improved water resource practices, and advanced monitoring techniques are essential to mitigate the adverse effects of LULC changes on agricultural productivity and drought resilience in the *** findings contribute to the understanding of how remote sensing can be effectively used for long-t
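The four indices named in the abstract above have widely used band-ratio definitions, which can be sketched in a few lines of Python. This is an illustrative sketch, not the study's code: it assumes per-pixel surface reflectance values as inputs, takes NDWI in the green/NIR form (McFeeters) and NDMI in the NIR/SWIR form, and computes VCI from a pixel's current NDVI relative to its historical minimum and maximum. The function and parameter names are hypothetical.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b); 0.0 when both bands are zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - Red) / (NIR + Red)."""
    return normalized_difference(nir, red)


def ndwi(green, nir):
    """Normalized difference water index, assumed here in the
    green/NIR form: (Green - NIR) / (Green + NIR)."""
    return normalized_difference(green, nir)


def ndmi(nir, swir1):
    """Normalized difference moisture index: (NIR - SWIR1) / (NIR + SWIR1)."""
    return normalized_difference(nir, swir1)


def vci(ndvi_now, ndvi_min, ndvi_max):
    """Vegetation condition index in percent: where the current NDVI
    sits between the historical minimum and maximum for the pixel."""
    span = ndvi_max - ndvi_min
    if span == 0:
        raise ValueError("ndvi_max and ndvi_min must differ")
    return 100.0 * (ndvi_now - ndvi_min) / span


if __name__ == "__main__":
    # Hypothetical reflectance values for a single vegetated pixel.
    print(ndvi(nir=0.5, red=0.1))
    print(vci(ndvi_now=0.4, ndvi_min=0.2, ndvi_max=0.6))
```

Low VCI values indicate vegetation that is stressed relative to its own history, which is why the index is commonly used for drought mapping.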
In the medical field, comprehensive analysis of bone structures is paramount for assessing skeletal health and diagnosing conditions. X-ray imaging serves as a cornerstone in bone age evaluation and the fabrication of...
Malware attacks on Windows machines pose significant cybersecurity threats, necessitating effective detection and prevention *** machine learning classifiers have emerged as promising tools for malware ***, there remains a need for comprehensive studies that compare the performance of different classifiers specifically for Windows malware *** this gap can provide valuable insights for enhancing cybersecurity *** numerous studies have explored malware detection using machine learning techniques, there is a lack of systematic comparison of supervised classifiers for Windows malware *** the relative effectiveness of these classifiers can inform the selection of optimal detection methods and improve overall security *** study aims to bridge the research gap by conducting a comparative analysis of supervised machine learning classifiers for detecting malware on Windows *** objectives include investigating the performance of various classifiers, such as Gaussian Naïve Bayes, K Nearest Neighbors (KNN), Stochastic Gradient Descent Classifier (SGDC), and Decision Tree, in detecting Windows *** the accuracy, efficiency, and suitability of each classifier for real-world malware detection *** the strengths and limitations of different classifiers to provide insights for cybersecurity practitioners and *** recommendations for selecting the most effective classifier for Windows malware detection based on empirical *** study employs a structured methodology consisting of several phases: exploratory data analysis, data preprocessing, model training, and *** data analysis involves understanding the dataset’s characteristics and identifying preprocessing *** preprocessing includes cleaning, feature encoding, dimensionality reduction, and optimization to prepare the data for *** training utilizes various
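A comparison of the four classifiers named above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the data is synthetic (generated with `make_classification` as a stand-in for a malware feature table), the hyperparameters are library defaults, accuracy is the only metric, and the helper name `compare_classifiers` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_classifiers(X, y, seed=0):
    """Train each classifier on one stratified split and return
    a dict mapping classifier name to held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    models = {
        "GaussianNB": GaussianNB(),
        # KNN and SGD are scale-sensitive, so they get a scaler in front.
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "SGDC": make_pipeline(StandardScaler(), SGDClassifier(random_state=seed)),
        "DecisionTree": DecisionTreeClassifier(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


if __name__ == "__main__":
    # Synthetic binary data standing in for benign/malicious samples.
    X, y = make_classification(n_samples=600, n_features=20, random_state=0)
    for name, acc in compare_classifiers(X, y).items():
        print(f"{name}: {acc:.3f}")
```

A single split keeps the sketch short; cross-validation and metrics such as precision and recall would give a more reliable ranking on imbalanced malware data.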
Virtual experiences can significantly influence our perception and behavior in the real world, shaping how we interact with and navigate physical environments. In this paper, we examine the impact of learning navigati...
Despite the effectiveness of vision-language supervised fine-tuning in enhancing the performance of vision large language models (VLLMs), existing visual instruction tuning datasets have the following limitations. (1) Instruction annotation quality: despite existing VLLMs exhibiting strong performance, instructions generated by those advanced VLLMs may still suffer from inaccuracies, such as hallucinations. (2) Instruction and image diversity: the limited range of instruction types and the lack of diversity in image data may impact the model's ability to generate diverse outputs that are closer to real-world scenarios. To address these challenges, we construct a high-quality, diverse visual instruction tuning dataset, MMInstruct, which consists of 973k instructions from 24 domains. There are four instruction types: judgment, multiple-choice, long visual question answering, and short visual question answering. To construct MMInstruct, we propose an instruction generation data engine that leverages GPT-4V, GPT-3.5, and manual correction. Our instruction generation engine enables semi-automatic, low-cost, and multi-domain instruction generation at 1/6 the cost of manual construction. Through extensive experimental validation and ablation experiments, we demonstrate that MMInstruct could significantly improve the performance of VLLMs, e.g., a model fine-tuned on MMInstruct achieves new state-of-the-art performance on 10 out of 12 benchmarks. The code and data will be available at https://***/yuecao0119/MMInstruct.
Privacy-preserving online disease prediction and diagnosis are critical issues in the emerging edge-cloud-based healthcare *** patient data processing from remote places may lead to severe privacy ***, the existing cloud-based healthcare system incurs higher latency and energy consumption during diagnosis due to offloading of live patient data to remote cloud *** the privacy *** proposed research introduces the edge-cloud enabled privacy-preserving healthcare system by exploiting additive homomorphic encryption *** can help maintain the privacy preservation and confidentiality of patients’ medical data during diagnosis of Parkinson’s *** addition, the energy and delay aware computational offloading scheme is proposed to minimize the uncertainty and energy consumption of end-user *** proposed research maintains better privacy and robustness of live video data processing during prediction and diagnosis compared to existing healthcare systems.
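The property that additive homomorphic encryption provides, namely that ciphertexts can be combined so the result decrypts to the sum of the plaintexts, can be illustrated with a toy Paillier cryptosystem. The abstract does not name the scheme it uses, so Paillier is an assumption chosen here as a standard additive example. The tiny primes, the function names, and the sample values are hypothetical and for illustration only; this sketch is not secure and is not the authors' implementation.

```python
import math
import random


def _L(u, n):
    """Paillier's L function: L(u) = (u - 1) / n, as exact integer division."""
    return (u - 1) // n


def keygen(p, q):
    """Build a toy key pair from two distinct primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                                          # common simple choice
    mu = pow(_L(pow(g, lam, n * n), n), -1, n)         # modular inverse
    return (n, g), (lam, mu)


def encrypt(pub, m, rng):
    """Encrypt integer m (0 <= m < n) with fresh randomness r coprime to n."""
    n, g = pub
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    n2 = n * n
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    return (_L(pow(c, lam, n * n), n) * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiply ciphertexts; the product decrypts to m1 + m2 (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)


if __name__ == "__main__":
    rng = random.Random(0)
    pub, priv = keygen(1009, 1013)   # toy primes, far too small for real use
    c1 = encrypt(pub, 120, rng)      # hypothetical sensor readings
    c2 = encrypt(pub, 80, rng)
    print(decrypt(pub, priv, add_encrypted(pub, c1, c2)))
```

In an edge-cloud setting, this is what lets a remote node aggregate encrypted measurements without ever seeing the individual values; only the key holder can decrypt the total.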
Due to the recently increased requirements of e-learning systems, multiple educational institutes such as kindergartens have transformed their learning towards virtual *** student health exercise is a difficult task but an important one due to the physical education needs especially in young *** proposed system focuses on the necessary implementation of student health exercise recognition (SHER) using a modified quaternion-based filter for inertial data refining and data fusion as the pre-processing ***, cleansed data has been segmented using an overlapping windowing approach followed by pattern identification in the form of static and kinematic signal ***, these patterns have been utilized to extract cues for both patterned signals, which are further optimized using Fisher’s linear discriminant analysis (FLDA) ***, the physical exercise activities have been categorized using extended Kalman filter (EKF)-based neural *** system can be implemented in multiple educational establishments including intelligent training systems, virtual mentors, smart simulations, and interactive learning management methods.
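The overlapping windowing step mentioned above can be sketched as a fixed-size sliding window whose step is smaller than its length. The abstract does not give the window length or overlap, so the values below are placeholders, and the function name is hypothetical.

```python
def overlapping_windows(samples, size, step):
    """Split a sequence into fixed-size windows that advance by `step`.

    Consecutive windows share `size - step` samples when step < size.
    A trailing remainder shorter than `size` is dropped.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [
        list(samples[start:start + size])
        for start in range(0, len(samples) - size + 1, step)
    ]


if __name__ == "__main__":
    # Ten hypothetical inertial samples, windows of 4 with 50% overlap.
    signal = list(range(10))
    for window in overlapping_windows(signal, size=4, step=2):
        print(window)
```

Overlap means an activity that straddles a window boundary still appears whole in at least one window, at the cost of producing more windows to classify.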