Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conduct a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, on various text-related visual tasks, including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression *** importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal *** evaluation pipeline and benchmark are available at https://***/Yuliang-Liu/Multimodal OCR.
In this work, we introduce a class of black-box (BB) reductions called committed-programming reductions (CPReds) in the random oracle model (ROM) and obtain the following results: (1) we demonstrate that some well-known schemes, including the full-domain hash (FDH) signature (Eurocrypt 1996) and the Boneh-Franklin identity-based encryption (IBE) scheme (Crypto 2001), are provably secure under CPReds; (2) we prove that a CPRed associated with an instance-extraction algorithm implies a reduction in the quantum ROM (QROM). This unifies several recent results, including the security of the Gentry-Peikert-Vaikuntanathan IBE scheme by Zhandry (Crypto 2012) and the key encapsulation mechanism (KEM) variants using the Fujisaki-Okamoto transform by Jiang et al. (Crypto 2018) in the ***, we show that CPReds are incomparable to non-programming reductions (NPReds) and randomly-programming reductions (RPReds) formalized by Fischlin et al. (Asiacrypt 2010).
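For readers unfamiliar with the FDH signature named in the abstract above, the following is a minimal sketch of the hash-then-invert structure, assuming an RSA trapdoor permutation with toy parameters. The constants `P`, `Q`, `E`, `D` and the helper `hash_to_zn` are hypothetical illustrations and do not come from the paper; SHA-256 reduced modulo `N` merely stands in for the hash function that the ROM idealizes as a random oracle. It does not illustrate the CPRed technique itself.

```python
import hashlib

# Toy RSA parameters (hypothetical, far too small for any real use).
P, Q = 61, 53
N = P * Q      # modulus, 3233
E = 17         # public exponent
D = 2753       # private exponent: E * D = 1 mod (P - 1) * (Q - 1)


def hash_to_zn(message: bytes) -> int:
    """Map a message into Z_N; a stand-in for the full-domain hash."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") % N


def sign(message: bytes) -> int:
    """FDH signing: apply the RSA inverse permutation to H(m)."""
    return pow(hash_to_zn(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """FDH verification: check that signature^E equals H(m) modulo N."""
    return pow(signature, E, N) == hash_to_zn(message)


if __name__ == "__main__":
    msg = b"committed-programming reduction"
    sig = sign(msg)
    print("valid signature accepted:", verify(msg, sig))
    print("tampered signature accepted:", verify(msg, (sig + 1) % N))
```

Because RSA is a permutation on Z_N, any signature value other than the one produced by `sign` maps to a different hash value and is rejected.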
The rapid development of 5G/6G and AI enables an Internet of Everything (IoE) environment that can support millions of connected mobile devices and applications operating smoothly at high speed and low ***, these massive devices will lead to explosive traffic growth, which in turn causes a great burden for the data transmission and content *** challenge can be eased by sinking some critical content from cloud to *** this case, how to determine the critical content, where to sink it, and how to access the content correctly and efficiently become new *** work focuses on establishing a highly efficient content delivery framework in the IoE *** particular, the IoE environment is re-constructed as an end-edge-cloud collaborative system, in which the concept of digital twin is applied to promote the *** on the digital asset obtained by digital twin from end users, a content popularity prediction scheme is first proposed to decide the critical content by using the Temporal Pattern Attention (TPA) enabled Long Short-Term Memory (LSTM) ***, the prediction results are input for the proposed caching scheme to decide where to sink the critical content by using the Reinforcement Learning (RL) ***, a collaborative routing scheme is proposed to determine the way to access the content with the objective of minimizing *** experimental results indicate that the proposed schemes outperform the state-of-the-art benchmarks in terms of the caching hit rate, the average throughput, the successful content delivery rate, and the average routing overhead.
Infrared imaging technology is capable of capturing the thermal radiation emitted by the human body in conditions with insufficient visible light. Consequently, infrared behavior recognition leverages this capability ...
Bat Algorithm (BA) is a nature-inspired metaheuristic search algorithm designed to efficiently explore complex problem spaces and find near-optimal solutions. The algorithm is inspired by the echolocation behavior of ...
In the field of the Internet of Medical Things (IoMT), the demand for Human action recognition (HAR) is growing. Due to the limitations of portability and privacy of traditional sensors, many endeavors have made signi...
This article designs the PELAN structure based on the lightweight YOLOv7-tiny model for surface defect detection of hot-rolled steel strips. At the same time, the CA (Channel Attention) is embedded in the feature pyra...
Preserving crops depends on early and accurate detection of pests, which cause several diseases that reduce crop production and quality. Several deep-learning techniques have been applied to the problem of pest detection on crops. We have developed the YOLOCSP-PEST model for pest localization and classification. The proposed model is a modified version of You Only Look Once Version 7 (YOLOv7) with a Cross Stage Partial Network (CSPNET) backbone. It gives exceptionally good results under conditions that are very challenging for comparable models, especially where the luminance and orientation of the images are problematic. It helps farmers working on crops in remote areas to identify infestations quickly and accurately, which benefits both the quality and the quantity of the production yield. The model has been trained and tested on two datasets, the IP102 dataset and a local crop dataset, and has shown exceptional results on both. It achieved a mean average precision (mAP) of 88.40%, a precision of 85.55%, and a recall of 84.25% on the IP102 dataset, and a mAP of 97.18%, a precision of 97.50%, and a recall of 94.88% on the local dataset. These findings demonstrate that the proposed model is very effective at detecting pests in real-life scenarios and can help improve both the quality and the quantity of crop yield.
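The abstract above reports precision and recall separately. As a rough aid for comparing the two datasets, the sketch below computes the F1 score (the harmonic mean of precision and recall) from the reported figures. F1 is not reported in the abstract, so the printed values are derived here and are not the authors' results; the function name `f1_score` and the `reported` dictionary are illustrative assumptions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall in percent, as reported in the abstract.
reported = {
    "IP102": (85.55, 84.25),
    "local crop dataset": (97.50, 94.88),
}

if __name__ == "__main__":
    for name, (precision, recall) in reported.items():
        # Derived value, not a figure stated by the authors.
        print(f"{name}: F1 = {f1_score(precision, recall):.2f}%")
```

Note that mAP is computed from per-class precision-recall curves and cannot be recovered from a single precision and recall pair, so it is left out of this sketch.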
Dealing with classification problems requires the crucial step of feature selection (FS), which helps to reduce data dimensions and shorten classification time. Feature selection and support vector machines (SVM) clas...
The smart distribution network (SDN) is integrating increasing distributed generation (DG) and energy storage (ES). Hosting capacity evaluation is important for SDN planning with *** and ES are usually invested by users or a third party, and they may form friendly microgrids (MGs) and operate *** centralized dispatching method is no longer suitable for hosting capacity evaluation of SDN. A quick hosting capacity evaluation method based on distributed optimal dispatching is ***, a multi-objective DG hosting capacity evaluation model is established, and the hosting capacity for DG is determined by the optimal DG planning *** steady-state security region method is applied to speed up the solving process of the DG hosting capacity evaluation ***, the optimal dispatching models are established for MG and SDN respectively to realize the operating *** the distributed dispatching strategy, the dual-side optimal operation of SDN-MGs can be realized by several iterations of power exchange ***, an SDN with four MGs is conducted considering multiple flexible *** shows that the DG hosting capacity of SDN exceeds the sum of the maximum active power demand and the rated branch ***, the annual DG electricity exceeds the maximum active power demand value.