Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conduct a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, on various text-related visual tasks, including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression *** importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal *** evaluation pipeline and benchmark are available at https://***/Yuliang-Liu/Multimodal OCR.
Accurately detecting traffic anomalies is becoming increasingly crucial in network management. Algorithms that model the traffic data as a matrix suffer from low detection accuracy, while the work using the tensor model ...
In this paper, we propose a multi-task learning model for multilingual Optical Character Recognition. Our model performs script identification and text recognition simultaneously on offline machine-printed docu...
Digital pathology employing Whole Slide Images (WSIs) plays a pivotal role in cancer detection. Nevertheless, the manual examination of WSIs for the identification of various tissue regions presents formidable challen...
Discovering deep learning-based computer vision solutions for use with constrained devices is exceptionally hard, and the trade-offs are often too costly. Deep learning models are enormous, which makes it challen...
Addressing the persistent challenge of student dropout, which is particularly prevalent in developing countries such as India and Bangladesh, is of paramount importance. Factors such as poverty, natural calamities, and early m...
INTRODUCTION With the rapid development of remote sensing technology, high-quality remote sensing images have become widely *** automated object detection and recognition of these images, which aims to automatically locate objects of interest in remote sensing images and distinguish their specific categories, is an important fundamental task in the *** provides an effective means for geospatial object monitoring in many social applications, such as intelligent transportation, urban planning, environmental monitoring and homeland security.
Generating novel molecules to satisfy specific properties is a challenging task in modern drug discovery, which requires the optimization of a specific objective based on satisfying chemical ***, we aim to optimize the properties of a specific molecule to satisfy the specific properties of the generated *** Matched Molecular Pairs (MMPs), which contain the source and target molecules, are used herein, and logD and solubility are selected as the optimization *** main innovative work lies in the calculation related to a specific transformer from the perspective of a matrix *** intervals and state changes are then used to encode logD and solubility for subsequent *** the experiments, we screen the data based on the proportion of heavy atoms to all atoms in the groups and select 12365, 1503, and 1570 MMPs as the training, validation, and test sets, *** models are compared with the baseline models with respect to their abilities to generate molecules with specific *** show that the transformer model can accurately optimize the source molecules to satisfy specific properties.
Tensors are a popular programming interface for developing artificial intelligence (AI) *** refers to the order of placing tensor data in the memory and will affect performance by affecting data locality; therefore the deep neural network library has a convention on the *** AI applications can use arbitrary layouts, and existing AI systems do not provide programming abstractions to shield the layout conventions of libraries, operator developers need to write a lot of layout-related code, which reduces the efficiency of integrating new libraries or developing new ***, the developer assigns the layout conversion operation to the internal operator to deal with the uncertainty of the input layout, thus losing the opportunity for layout *** on the idea of polymorphism, we propose a layout-agnostic virtual tensor programming interface, namely the VTensor framework, which enables developers to write new operators without caring about the underlying physical layout of *** addition, the VTensor framework performs global layout inference at runtime to transparently resolve the required layout of virtual tensors, and runtime layout-oriented optimizations to globally minimize the number of layout transformation *** results demonstrate that with VTensor, developers can avoid writing layout-dependent *** with TensorFlow, for the 16 operations used in 12 popular networks, VTensor can reduce the lines of code (LOC) of writing a new operation by 47.82% on average, and improve the overall performance by 18.65% on average.
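To make the notion of tensor layout in the abstract above concrete, the following is a minimal sketch in plain Python, assuming a 4-D tensor with logical axes N, C, H, W stored as a flat row-major list. The helper names `flat_index` and `relayout` are hypothetical illustrations; they are not part of VTensor or TensorFlow, and the sketch does not reproduce the paper's layout inference.

```python
from itertools import product

def flat_index(coords, shape, order):
    """Return the row-major offset of a logical (n, c, h, w) coordinate
    when the axes are physically stored in the given order, e.g. "NHWC"."""
    dims = dict(zip("NCHW", shape))   # logical axis name -> size
    pos = dict(zip("NCHW", coords))   # logical axis name -> index
    offset = 0
    for axis in order:                # outermost axis first
        offset = offset * dims[axis] + pos[axis]
    return offset

def relayout(data, shape, src, dst):
    """Copy a flat buffer from physical layout `src` to layout `dst`."""
    out = [None] * len(data)
    for coords in product(*(range(d) for d in shape)):
        out[flat_index(coords, shape, dst)] = data[flat_index(coords, shape, src)]
    return out

shape = (1, 2, 2, 2)                  # N, C, H, W (illustrative sizes)
nchw = list(range(8))                 # buffer stored channel-first
nhwc = relayout(nchw, shape, "NCHW", "NHWC")
print(nhwc)                           # channel values become interleaved
```

The same logical element lands at a different memory offset under each layout, which is why an operator written against one convention needs a conversion step (or a layout-agnostic interface) when its input arrives in another.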
It is difficult to extract targets under strong environmental disturbance in *** imaging (GI) is an innovative anti-interference imaging *** this paper, we propose a scheme for target extraction based on characteristic-enhanced pseudo-thermal *** traditional GI which relies on training the detected signals or imaging results, our scheme trains the illuminating light fields using a deep learning network to enhance the target's characteristic *** simulation and experimental results prove that our imaging scheme is sufficient to perform single- and multiple-target extraction at low *** addition, the effect of a strong scattering environment is discussed, and the results show that the scattering disturbance hardly affects the target extraction *** proposed scheme presents the potential application in target extraction through scattering media.