Online reviews play an integral part in making mobile applications stand out from the large number of applications available on the Google Play store. Predominantly, users consider posted reviews for appropriate app s...
This study presents an innovative approach that utilizes neural network-based techniques to address the Ordered Escape Routing (OER) *** OER problem holds a crucial position in integrated circuit design, requiring mul...
Unusual crowd analysis is an important problem in surveillance video because features cannot be extracted efficiently from crowd scenes. To overcome this challenge, this paper introduced the appearance and moti...
This article proposes a multimodal sentiment analysis system for recognizing a person’s aggressiveness in pain. The implementation has been divided into five components. The first three steps are related to a text-ba...
The management of healthcare data has significantly benefited from the use of cloud-assisted MediVault for healthcare systems, which can offer patients efficient and convenient digital storage services for storin...
In the era of advancement in technology and modern agriculture, early disease detection of potato leaves will improve crop yield. Various researchers have focussed on disease due to different types of microbial infect...
Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, in various text-related visual tasks including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression *** importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal *** evaluation pipeline and benchmark are available at https://***/Yuliang-Liu/Multimodal OCR.
Background: Pneumonia is one of the leading causes of death and disability due to respiratory infections. The key to successful treatment of pneumonia is in its early diagnosis and correct classification. PneumoniaNet...
INTRODUCTION With the rapid development of remote sensing technology, high-quality remote sensing images have become widely *** automated object detection and recognition of these images, which aims to automatically locate objects of interest in remote sensing images and distinguish their specific categories, is an important fundamental task in the *** provides an effective means for geospatial object monitoring in many social applications, such as intelligent transportation, urban planning, environmental monitoring and homeland security.
With the continuous advancement of satellite technology, remote sensing imagery has been increasingly applied in fields such as urban planning, environmental monitoring, and disaster response. However, remote sensing images often feature small target sizes and complex backgrounds, posing significant computational challenges for object detection tasks. To address this issue, this paper proposes a lightweight object detection algorithm for remote sensing images based on YOLOv9. The proposed algorithm incorporates the SimRMB module, which effectively reduces computational complexity while improving the efficiency and accuracy of feature extraction. Through a dynamic attention mechanism, SimRMB is capable of focusing on important regions while minimizing background interference, and by integrating residual learning and skip connections, it ensures the stability of deep networks. To further enhance detection performance, the FasterRepNCSPELAN4 module is introduced, which employs PConv operations to reduce computational load and memory usage. It also utilizes dilated convolutions and DFC attention mechanisms to strengthen feature extraction, thereby increasing the efficiency and accuracy of object detection. Additionally, this study integrates the GhostModuleV2 module, which generates core feature maps and employs lightweight operations to create redundant features, greatly reducing the computational complexity of *** results show that on the SIMD dataset, the improved YOLOv9 model has a parameter size of 167.88 MB and GFLOPs of 208.6. Compared to the baseline YOLOv9 model (parameter size: 194.57 MB, GFLOPs: 239.0), the parameter size is reduced by 13.71%, GFLOPs are reduced by 12.72%, and detection accuracy is improved by 1.4%. These results demonstrate that the proposed lightweight YOLOv9 model effectively reduces computational overhead while maintaining excellent detection performance, providing an efficient solution for object detection tasks in resou
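
The reduction percentages quoted in this abstract can be checked from the four reported figures. The sketch below is only an arithmetic illustration, not part of the authors' work; the helper name `relative_reduction` is hypothetical. Rounded to two decimals, the parameter-size reduction comes out at 13.72%, so the reported 13.71% appears to be a truncated rather than a rounded value.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the percentage by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Figures quoted in the abstract (baseline YOLOv9 vs. the lightweight variant).
params = relative_reduction(194.57, 167.88)  # parameter size in MB
gflops = relative_reduction(239.0, 208.6)    # GFLOPs

print(f"parameter size reduction: {params:.2f}%")
print(f"GFLOPs reduction: {gflops:.2f}%")
```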