INTRODUCTION With the rapid development of remote sensing technology, high-quality remote sensing images have become widely *** automated object detection and recognition of these images, which aims to automatically locate objects of interest in remote sensing images and distinguish their specific categories, is an important fundamental task in the *** provides an effective means for geospatial object monitoring in many social applications, such as intelligent transportation, urban planning, environmental monitoring and homeland security.
Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, in various text-related visual tasks including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression *** importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal *** evaluation pipeline and benchmark are available at https://***/Yuliang-Liu/Multimodal OCR.
Pancreatic cancer's devastating impact and low survival rates call for improved detection methods. While Artificial Intelligence has shown remarkable progress, its increasing complexity has led to "black box&...
With the continuous advancement of satellite technology, remote sensing images have been increasingly applied in fields such as urban planning, environmental monitoring, and disaster response. However, remote sensing images often feature small target sizes and complex backgrounds, posing significant computational challenges for object detection tasks. To address this issue, this paper proposes a lightweight remote sensing image object detection algorithm based on YOLOv9. The proposed algorithm incorporates the SimRMB module, which reduces computational complexity while improving the efficiency and accuracy of feature extraction. Through a dynamic attention mechanism, SimRMB focuses on important regions while minimizing background interference, and by integrating residual learning and skip connections, it keeps deep networks stable. To further enhance detection performance, the FasterRepNCSPELAN4 module is introduced, which employs PConv operations to reduce computational load and memory usage. It also uses dilated convolutions and DFC attention mechanisms to strengthen feature extraction, thereby increasing the efficiency and accuracy of object detection. Additionally, this study integrates the GhostModuleV2 module, which generates core feature maps and employs lightweight operations to create redundant features, greatly reducing the computational complexity of *** results show that on the SIMD dataset, the improved YOLOv9 model has a parameter size of 167.88 MB and GFLOPs of 208.6. Compared to the baseline YOLOv9 model (parameter size: 194.57 MB, GFLOPs: 239.0), the parameter size is reduced by 13.71%, GFLOPs are reduced by 12.72%, and detection accuracy is improved by 1.4%. These results demonstrate that the proposed lightweight YOLOv9 model reduces computational overhead while maintaining excellent detection performance, providing an efficient solution for object detection tasks in resou
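As a quick check on the arithmetic in the abstract above, the short sketch below recomputes the relative reductions from the four reported figures (194.57 MB and 167.88 MB; 239.0 and 208.6 GFLOPs). The helper name `relative_reduction` is hypothetical and is not taken from the paper. The results should agree with the stated 13.71% and 12.72% to within rounding.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the percentage by which `improved` is smaller than `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Figures as reported in the abstract (SIMD dataset).
params_reduction = relative_reduction(194.57, 167.88)  # parameter size in MB
gflops_reduction = relative_reduction(239.0, 208.6)    # GFLOPs

print(f"Parameter size reduction: {params_reduction:.2f}%")
print(f"GFLOPs reduction: {gflops_reduction:.2f}%")
```

The reported accuracy gain of 1.4% is not recomputed here, since the abstract does not give the underlying accuracy values.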
Vehicle to Everything (V2X) is a core 5G technology. V2X and its enabler, Device-to-Device (D2D), are essential for the Internet of Things (IoT) and the Internet of Vehicles (IoV). V2X enables vehicles to communicate ...
The Internet of Things (IoT) now reaches across the entire world. IoT devices are resource-constrained and form what are known as Low Power and Lossy Networks (LLN). The Routing Protocol for Low Power and Lossy Networks (R...
Alzheimer's disease is a common and complex brain disorder that primarily affects the elderly. Because it is progressive and has few effective therapies, it requires a thorough understanding of the condition; our s...
Because of the rapid development of communication and service in Taiwan, competition among telecommunication companies has become ever fiercer. Differences in marketing strategy usually become the key factor in keepin...
In this article, a novel method is proposed to facilitate the design of compact, low-profile, pattern reconfigurable antennas with fixed or switchable circular polarization (CP) for Internet of Vehicles (IoV) applicat...
As internet use in communication networks has grown, fake news has become a major problem. Misleading headlines erode readers' trust. Many techniques have emerged, but they fail because fraudsters...