Producing executable code from natural-language directives via Large Language Models (LLMs) involves obstacles like semantic uncertainty and the requirement for task-focused context interpretation. To resolve these di...
The proliferation of fake news on social media has intensified the spread of misinformation, promoting societal biases, hate, and violence. While recent advancements in Generative AI (GenAI), particularly large langua...
Colorectal cancer (CRC) is a global public health concern, and early detection through screening reduces mortality rates. It is one of the common types of cancer with a high mortality rate. Traditionally, colonoscopy ...
Human Activity Recognition (HAR) is a trending area in computer vision and deep learning. However, boosting the performance of deep learning models often necessitates increasing their size or capacity, which raises com...
Drug-Target Interaction (DTI) involves the observation and recognition of interactions that occur between chemical molecules and target proteins in the human body. However, lab experimentation for DTI can be time-cons...
Social network analysis provides quantifiable methods and topological metrics to examine networked structures for several interdisciplinary applications. In our research, a social network of the GitHub community is con...
Ultrasound imaging is a common and non-invasive method for diagnosing gallbladder diseases, including gallbladder cancer (GBC). However, the inherent challenges of ultrasound images, such as noise, low contrast, and va...
The testing phase of a software development project is usually expensive and complex, but it is essential for gauging the effectiveness of the developed software. Software Fault Prediction (SFP) primarily serves to detect f...
Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, in various text-related visual tasks including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression recognition. More importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal techniques. The evaluation pipeline and benchmark are available at https://***/Yuliang-Liu/Multimodal OCR.
The paper presents a novel framework for optimizing LoRaWAN gateway placement to enhance network performance and reliability. By integrating Network Time Protocol (NTP) for global synchronization and Precision Time Pr...