Search results - Inner Mongolia University Library

OCRBench: on the hidden mystery of OCR in large multimodal models

Science China (Information Sciences), 2024, Vol. 67, No. 12, pp. 23-35

Authors: Yuliang LIU, Zhang LI, Mingxin HUANG, Biao YANG, Wenwen YU, Chunyuan LI, Xu-Cheng YIN, Cheng-Lin LIU, Lianwen JIN, Xiang BAI. Affiliations: School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; School of Electronic and Information Engineering, South China University of Technology; Microsoft Research; School of Computer & Communication Engineering, University of Science and Technology Beijing; Institute of Automation, Chinese Academy of Sciences; School of Software Engineering, Huazhong University of Science and Technology

Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, in various text-related visual tasks including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression *** importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal *** evaluation pipeline and benchmark are available at https://***/Yuliang-Liu/Multimodal OCR.

Keywords: large multimodal model; OCR; text recognition; scene text-centric VQA; document-oriented VQA; key information extraction; handwritten mathematical expression recognition

Tensor Factorization for Accurate Anomaly Detection in Dynamic Networks

IEEE Transactions on Sustainable Computing, 2024, Vol. 10, No. 3, pp. 439-450

Authors: Li, Xiaocan; Wen, Jigang; Xie, Kun; Xie, Gaogang; Liang, Wei. Affiliations: Hunan University, College of Computer Science and Electronics Engineering, Changsha, China; Hunan University of Science and Technology, School of Computer Science and Engineering, Xiangtan, China; Chinese Academy of Sciences, Computer Network Information Center, China

Accurately detecting traffic anomalies is becoming increasingly crucial in network management. Algorithms that model the traffic data as a matrix suffer from low detection accuracy, while work using the tensor model often assumes the tensor is regular without considering that network nodes may dynamically join or leave; such methods fail in a practical network where the node set changes as a result of mobility and churn behaviors. We propose a novel Tensor Recovery scheme in a Dynamic Network (TRDN) with traffic data modeled as a practical irregular tensor for accurate anomaly detection. To take advantage of correlations among small tensors, each formed with a short time duration to capture more hidden information in the data for higher detection accuracy, we propose several novel techniques: 1) a new joint tensor factorization model to capture the characteristic shared by the common nodes of small tensors, 2) a tensor partition algorithm to identify the data that can be applied to train the shared parameters efficiently, and 3) a bar-based algorithm that partitions nodes into the minimum number of non-overlapping subsets to form the shared tensor model. Extensive experiments on two Internet traffic data sets, Abilene and GEANT, demonstrate the effectiveness of the proposed TRDN. © 2016 IEEE.

Keywords: Data accuracy

A Low Resource Multi-lingual Simultaneous Script Identification and Text Recognition Model

SN Computer Science, 2024, Vol. 5, No. 6, p. 740

Authors: Mukherjee, Jayati; Roy, Utpal. Affiliations: Computer Science and Engineering, Academy of Technology; Department of Computer and System Sciences, Visva-Bharati

In this paper, we propose a multi-task learning model for multi-lingual Optical Character Recognition. Our model performs script identification and text recognition simultaneously on offline machine-printed documents. We extract the spatial and temporal features of a line image using a combination of several CNN and BLSTM layers. These features are shared between the script identification and text recognition modules. A fully connected layer and softmax identify the script. The identified script works as a case selector for the text recognizer, which is a CTC layer. Finally, the text is identified by the text recognizer. The model is applied to two public datasets, ISIDDI and RETAS, containing degraded Bengali pages and English pages. We have created a dataset of Devanagari/Hindi and Tamil scripts to test our model. The model achieves 99.2% accuracy for script recognition. The text recognition accuracies achieved on the Bengali, English, Hindi, and Tamil scripts are 91.68%, 97.07%, 95.68% and 92.27%, respectively. © The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. 2024.

Keywords: Deep learning; Multi-task learning; Optical character recognition

ETiSeg-Net: edge-aware self attention to enhance tissue segmentation in histopathological images

Multimedia Tools and Applications, 2024, pp. 1-21

Authors: Rashmi, R.; Girisha, S. Affiliations: Department of Computer Science and Engineering, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, India; Department of Data Science and Computer Applications, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, India

Digital pathology employing Whole Slide Images (WSIs) plays a pivotal role in cancer detection. Nevertheless, the manual examination of WSIs for the identification of various tissue regions presents formidable challenges due to its labor-intensive nature and subjective interpretation. Convolutional Neural Network (CNN) based semantic segmentation algorithms have emerged as valuable tools for assisting in this task by automating ROI delineation. The incorporation of attention modules and carefully designed loss functions has shown promise in further augmenting the performance of these algorithms. However, there exists a notable gap in research regarding the utilization of attention modules specifically for tissue segmentation, thereby constraining our comprehension and application of these modules in this context. This study introduces ETiSeg-Net (Edge-aware self attention to enhance Tissue Segmentation), a CNN-based semantic segmentation model that uses a novel edge-based attention module to achieve effective delineation of class boundaries. In addition, an innovative iterative training strategy is devised to efficiently optimize the model parameters. The study also conducts a comprehensive investigation into the impact of attention modules and loss functions on the efficacy of semantic segmentation models. Qualitative and quantitative evaluations of these semantic segmentation models are conducted using publicly available datasets. The findings underscore the potential of attention modules in enhancing the accuracy and effectiveness of tissue semantic segmentation. © The Author(s) 2024.

Keywords: Semantic Segmentation

VireNet-SSD: object detection model for resource-constrained applications based on self-organized operational neural networks

Neural Computing and Applications, 2025, Vol. 37, No. 14, pp. 8547-8569

Authors: Kamath, Vidya; Renuka, A. Affiliations: Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Karnataka, Manipal 576104, India

Discovering deep learning-based computer vision solutions for use with constrained devices is exceptionally hard, and the trade-offs are often too undermining. Deep learning models are enormous, which makes it challenging to deploy them on constrained platforms. The convolutional neural network is the fundamental framework for the majority of the models that are currently in use. However, operational neural networks have recently been shown to be a better option than their convolutional equivalents on a variety of tasks due to their heterogeneous nature and greater resemblance to the functioning of biological neurons. The question of whether heterogeneous models could function on constrained devices and be deployed in real time remains a major concern. To address this problem, an object detection model architecture based on a single-shot multi-box detector with self-organized operational neural networks as its backbone was developed, which can perform efficiently on constrained devices such as Raspberry Pi. The resultant backbone architecture was named VireNet. In contrast to homogeneous conventional deep learning networks that use convolutions, heterogeneous networks were chosen to develop VireNet, which provides a more productive and effective solution. Furthermore, an in-depth explanation of the design space has been provided to aid any future research that is associated with this architectural search. This new approach might mark the very beginning of the use of heterogeneity to address issues on devices with constrained resources. © The Author(s) 2025.

Keywords: Convolutional neural networks

A Machine Learning Based Automated Model for Managing Student Dropout

22nd IEEE/ACIS International Conference on Software Engineering Research, Management and Applications, SERA 2024

Authors: Ghosh, Partha; Charit, Arnab; Banerjee, Hindol; Bandhu, Debanwesa; Ghosh, Agniv; Pal, Ankita; Goto, Takaaki; Sen, Soumya. Affiliations: Academy of Technology, Department of Computer Science and Engineering, Adisaptagram, India; Toyo University, Faculty of Information Sciences and Arts, Saitama, Japan; A.K.Choudhury School of Information Technology, University of Calcutta, Kolkata, India

ISBN: (print) 9798350391343

Addressing the persistent challenge of student dropout, which is particularly prevalent in developing countries such as India and Bangladesh, is of paramount importance. Factors such as poverty, natural calamities, and early marriages exacerbate this issue. High student dropout rates can negatively impact a country by diminishing its economic productivity, increasing social inequalities, and perpetuating a cycle of poverty. Addressing dropout issues requires comprehensive strategies to ensure a skilled and educated workforce, fostering societal well-being and global competitiveness. This research focuses on analysing comprehensive data on students who have dropped out. Thereafter, a machine learning based methodology is used to discern the underlying causes of student attrition in various schools. Furthermore, it allows for efficient monitoring of the state's educational landscape, with the ability to drill down to granular levels when necessary to identify specific regional challenges. The effectiveness of this approach is validated through the utilization of real-world datasets. © 2024 IEEE.

Keywords: Students

Automated object recognition in high-resolution optical remote sensing imagery

National Science Review, 2023, Vol. 10, No. 6, pp. 38-41

Authors: Yazhou Yao, Tao Chen, Hanbo Bi, Xinhao Cai, Gensheng Pei, Guoye Yang, Zhiyuan Yan, Xian Sun, Xing Xu, Hai Zhang. Affiliations: School of Computer Science and Engineering, Nanjing University of Science and Technology; Aerospace Information Research Institute, Chinese Academy of Sciences; School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences; Key Laboratory of Network Information System Technology (NIST), Aerospace Information Research Institute, Chinese Academy of Sciences; Department of Computer Science and Technology, Tsinghua University; School of Computer Science and Engineering, University of Electronic Science and Technology of China; Pazhou Laboratory (Huangpu); School of Mathematics, Northwest University

INTRODUCTION With the rapid development of remote sensing technology, high-quality remote sensing images have become widely *** automated object detection and recognition of these images, which aims to automatically locate objects of interest in remote sensing images and distinguish their specific categories, is an important fundamental task in the *** provides an effective means for geospatial object monitoring in many social applications, such as intelligent transportation, urban planning, environmental monitoring and homeland security.

Molecular Generation and Optimization of Molecular Properties Using a Transformer Model

Big Data Mining and Analytics, 2024, Vol. 7, No. 1, pp. 142-155

Authors: Zhongyin Xu, Xiujuan Lei, Mei Ma, Yi Pan. Affiliations: School of Computer Science, Shaanxi Normal University, Xi'an 710119, China; Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

Generating novel molecules to satisfy specific properties is a challenging task in modern drug discovery, which requires the optimization of a specific objective based on satisfying chemical ***, we aim to optimize the properties of a specific molecule to satisfy the specific properties of the generated *** Matched Molecular Pairs (MMPs), which contain the source and target molecules, are used herein, and logD and solubility are selected as the optimization *** main innovative work lies in the calculation related to a specific transformer from the perspective of a matrix *** intervals and state changes are then used to encode logD and solubility for subsequent *** the experiments, we screen the data based on the proportion of heavy atoms to all atoms in the groups and select 12365, 1503, and 1570 MMPs as the training, validation, and test sets, *** models are compared with the baseline models with respect to their abilities to generate molecules with specific *** show that the transformer model can accurately optimize the source molecules to satisfy specific properties.

Keywords: molecular optimization; transformer; Matched Molecular Pairs (MMPs); logD; solubility

VTensor: Using Virtual Tensors to Build a Layout-Oblivious AI Programming Framework

Journal of Computer Science & Technology, 2023, Vol. 38, No. 5, pp. 1074-1097

Authors: 俞峰, 赵家程, 崔慧敏, 冯晓兵, 薛京灵. Affiliations: Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100080, China; School of Computer Science and Engineering, University of New South Wales, Sydney 1466, Australia

Tensors are a popular programming interface for developing artificial intelligence (AI) *** refers to the order of placing tensor data in the memory and will affect performance by affecting data locality; therefore the deep neural network library has a convention on the *** AI applications can use arbitrary layouts, and existing AI systems do not provide programming abstractions to shield the layout conventions of libraries, operator developers need to write a lot of layout-related code, which reduces the efficiency of integrating new libraries or developing new ***, the developer assigns the layout conversion operation to the internal operator to deal with the uncertainty of the input layout, thus losing the opportunity for layout *** on the idea of polymorphism, we propose a layout-agnostic virtual tensor programming interface, namely the VTensor framework, which enables developers to write new operators without caring about the underlying physical layout of *** addition, the VTensor framework performs global layout inference at runtime to transparently resolve the required layout of virtual tensors, and runtime layout-oriented optimizations to globally minimize the number of layout transformation *** results demonstrate that with VTensor, developers can avoid writing layout-dependent *** with TensorFlow, for the 16 operations used in 12 popular networks, VTensor can reduce the lines of code (LOC) of writing a new operation by 47.82% on average, and improve the overall performance by 18.65% on average.

Keywords: artificial intelligence (AI) programming; layout-oblivious; tensor processing

Target extraction through strong scattering disturbance using characteristic-enhanced pseudo-thermal ghost imaging

Chinese Optics Letters, 2024, Vol. 22, No. 12, pp. 38-44

Authors: Xuanpengfan Zou, Xianwei Huang, Wei Tan, Liyu Zhou, Xiaohui Zhu, Qin Fu, Xiaoqian Liang, Suqin Nan, Yanfeng Bai, Xiquan Fu. Affiliations: College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; Hunan Police Academy, Changsha 410138, China; Hunan Provincial Key Laboratory of Network Investigational Technology, Hunan Police Academy, Changsha 410138, China; School of Computer Science, Hunan University of Technology and Business, Changsha 410205, China

It is difficult to extract targets under strong environmental disturbance in *** imaging (GI) is an innovative anti-interference imaging *** this paper, we propose a scheme for target extraction based on characteristic-enhanced pseudo-thermal *** traditional GI which relies on training the detected signals or imaging results, our scheme trains the illuminating light fields using a deep learning network to enhance the target’s characteristic *** simulation and experimental results prove that our imaging scheme is sufficient to perform single- and multiple-target extraction at low *** addition, the effect of a strong scattering environment is discussed, and the results show that the scattering disturbance hardly affects the target extraction *** proposed scheme presents the potential application in target extraction through scattering media.

Keywords: target extraction; ghost imaging; characteristic enhancement; strong scattering environment
