ISBN: 9781665493468 (print)
Deep learning techniques have achieved remarkable progress in document understanding. Most models use coordinates to represent the absolute or relative spatial information of components, but coordinates struggle to capture latent rules in the document layout, which makes learning layout representations more difficult. Unlike previous research, which has employed a coordinate system, graph, or grid to represent the document layout, we propose a novel layout representation, the cell-based layout, to provide easy-to-understand spatial information for backbone models. In line with human reading habits, it uses cell information, i.e., row and column indices, to represent the position of components in a document, making the document layout easier to understand. Furthermore, we propose a multi-scale layout to represent the hierarchical structure of the layout, and develop a data augmentation method to improve performance. Experimental results show that our method achieves state-of-the-art performance in text-based tasks, including form understanding and receipt understanding, and improves performance in image-based tasks such as document image classification. We released the code in the repo (a).
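As a rough illustration of what a row-and-column position might look like in practice, the sketch below maps bounding boxes to (row, column) indices with a simple greedy heuristic. It is not the method from the paper: the box format `(x0, y0, x1, y1)`, the function name `boxes_to_cells`, and the `row_tol` parameter are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1) with the origin at the top-left.
Box = Tuple[float, float, float, float]


def boxes_to_cells(boxes: List[Box], row_tol: float = 0.5) -> List[Tuple[int, int]]:
    """Map each box to a (row, column) index using a greedy row grouping."""
    if not boxes:
        return []
    # Visit boxes from top to bottom by vertical center.
    order = sorted(range(len(boxes)), key=lambda i: (boxes[i][1] + boxes[i][3]) / 2)
    rows: List[List[int]] = []
    row_center = 0.0
    for i in order:
        _, y0, _, y1 = boxes[i]
        center = (y0 + y1) / 2
        height = y1 - y0
        # Stay in the current row while the center is close to the row's anchor;
        # otherwise open a new row anchored at this box.
        if rows and abs(center - row_center) <= row_tol * height:
            rows[-1].append(i)
        else:
            rows.append([i])
            row_center = center
    cells: List[Tuple[int, int]] = [(0, 0)] * len(boxes)
    for r, members in enumerate(rows):
        # Within a row, column indices follow left-to-right order.
        for c, i in enumerate(sorted(members, key=lambda j: boxes[j][0])):
            cells[i] = (r, c)
    return cells
```

With such indices, two components on the same visual line share a row index regardless of their exact pixel coordinates, which is the kind of regularity the abstract argues raw coordinates make harder to learn.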
ISBN: 9798350307184 (print)
We present a comprehensive study on a new task named image color aesthetics assessment (ICAA), which aims to assess color aesthetics based on human perception. ICAA is important for various applications such as imaging measurement and image analysis. However, due to highly diverse aesthetic preferences and numerous color combinations, ICAA presents more challenges than conventional image quality assessment tasks. To advance ICAA research, 1) we propose a baseline model called the Delegate Transformer, which not only deploys deformable transformers to adaptively allocate interest points, but also learns human color space segmentation behavior through a dedicated module. 2) We carefully build a color-oriented dataset, ICAA17K, containing 17K images covering 30 popular color combinations, 80 devices, and 50 scenes, with each image densely annotated by more than 1,500 people. Moreover, we develop a large-scale benchmark of 15 methods, the most comprehensive one thus far, based on two datasets, SPAQ and ICAA17K. Our work not only achieves state-of-the-art performance but, more importantly, offers the community a roadmap to explore solutions for ICAA. Code and dataset are available here.
ISBN: 9798350307184 (print)
We show that crowd counting can be viewed as a decomposable point querying process. This formulation accepts arbitrary points as input and jointly reasons about whether the points belong to the crowd and where they are located. The querying process, however, raises an underlying problem regarding the number of querying points needed. Too few imply underestimation; too many increase computational overhead. To address this dilemma, we introduce a decomposable structure, i.e., the point-query quadtree, and propose a new counting model, termed Point quEry Transformer (PET). PET implements decomposable point querying via data-dependent quadtree splitting, where each querying point can split into four new points when necessary, thus enabling dynamic processing of sparse and dense regions. Such a querying process yields an intuitive, universal modeling of crowds, as both the input and output are interpretable and steerable. We demonstrate the applications of PET on a number of crowd-related tasks, including fully-supervised crowd counting and localization, partial annotation learning, and point annotation refinement, and also report state-of-the-art performance. For the first time, we show that a single counting model can address multiple crowd-related tasks across different learning paradigms. Code is available at https://***/cxliu0/PET.
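The idea of a query point splitting into four children can be sketched in a few lines. The following is only an illustrative toy, not the PET implementation: in PET the split decision is data-dependent and made by the model, whereas here it is delegated to a hypothetical `is_dense` callback, and the names `split_point`, `build_queries`, and `max_depth` are assumptions of this example.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def split_point(p: Point, stride: float) -> List[Point]:
    """Replace one query point with four children, one per quadrant of its cell."""
    x, y = p
    q = stride / 4
    return [(x - q, y - q), (x + q, y - q), (x - q, y + q), (x + q, y + q)]


def build_queries(
    width: float,
    height: float,
    stride: float,
    is_dense: Callable[[Point, float], bool],  # hypothetical stand-in for the learned split decision
    max_depth: int = 1,
) -> List[Point]:
    """Start from a coarse grid of query points and split only where is_dense says so."""
    # Initial queries sit at the centers of stride x stride cells.
    level = [
        ((c + 0.5) * stride, (r + 0.5) * stride)
        for r in range(int(height // stride))
        for c in range(int(width // stride))
    ]
    queries: List[Point] = []
    s = stride
    for depth in range(max_depth + 1):
        next_level: List[Point] = []
        for p in level:
            if depth < max_depth and is_dense(p, s):
                next_level.extend(split_point(p, s))  # dense region: refine
            else:
                queries.append(p)  # sparse region or depth limit: keep as a leaf
        level, s = next_level, s / 2
    return queries
```

For example, `build_queries(16, 16, 8, lambda p, s: p[0] < 8 and p[1] < 8)` keeps three coarse points and replaces the top-left one with four finer points, so sparse regions stay cheap while dense regions receive more queries.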
The purpose of low-light image enhancement is to improve the clarity of objects in low-light environments to facilitate the recognition and detection of targets later. Local Contrast Denoising and Fusion Network (LCDF...
Deep neural network models are more and more widely used in image reconstruction and generation tasks. By setting various loss functions, the model adaptively generates images that meet the corresponding constraints, ...
Magnetic compressors play a major role in image processing and microwave applications. AND-OR complex compound gates and XOR-XNOR modules are used to design the compressor to achieve low power consumption and less hardware ...
ISBN: 9781728198354 (print)
Recently there has been growing interest in learning generative models from a single image. This task is important because in many real-world applications, collecting a large dataset is not feasible. Existing work such as SinGAN is able to synthesize novel images that resemble the patch distribution of the training image. However, SinGAN cannot learn the high-level semantics of the image, and thus its synthesized samples tend to have unrealistic spatial layouts. To address this issue, this paper proposes a spatially adaptive style-modulation (SASM) module that learns to preserve the realistic spatial configuration of images. Specifically, it separately extracts a style vector (in the form of channel-wise attention) and a latent spatial mask (in the form of spatial attention) from a coarse-level feature. The style vector and spatial mask are then aggregated to modulate the features of deeper layers. The disentangled modulation of spatial and style attributes enables the model to preserve the spatial structure of the image without overfitting. Experimental results show that the proposed module learns to generate samples with better fidelity than prior work.
This paper deals with blind image separation by exploiting the statistical characteristics of the mixtures (information related to the sources independence) with the sparsity of the signals. More precisely, we investi...
ISBN: 9798350326970 (print)
The proliferation of technologies and unstructured data on the internet poses a persistent challenge in extracting valuable information from diverse formats. To address this, research leverages Machine Learning (ML) and Natural Language Processing (NLP) techniques. This study contributes to information extraction from unstructured text using a state-of-the-art pipeline, incorporating modules for coreference resolution (Neuralcoref), named entity linking (Wikifier API), and Relationship Extraction (RE) (OpenNRE and REBEL models). The resulting Knowledge Graph (KG), stored in Neo4j, captures entity relationships. Experiments on a BBC news dataset analyzed the pipeline's performance, focusing on RE. Accuracies of 61.4% (OpenNRE) and 87% (REBEL) were achieved. The research demonstrates the efficacy of the proposed pipeline in extracting structured knowledge from unstructured data, facilitating the preservation and utilization of valuable information.
The update of modern intelligent technology has driven the progress of computer related technology. The tide of technological revolution is impacting the economic system all over the world. The emergence of informatio...