Search results - Inner Mongolia University Library

Proceedings of the 2021 4th International Conference on Algorithms, Computing and Artificial Intelligence

Authors: Xueke Chi, Da-Han Wang, Yuefeng Wu, Yun Wu. Affiliations: Fujian Key Laboratory of Pattern Recognition and Image Understanding, China; School of Computer and Information Engineering, Xiamen University of Technology, China

ISBN: (print) 9781450385053

Attention-based encoder-decoder models have achieved great success in handwritten mathematical expression recognition in recent years. However, such methods suffer from attention drift: under the RNN-based local attention mechanism, the high similarity between encoded features can cause attention confusion. To address this problem, we propose an encoder-decoder model with self-attention, which captures the global information of the feature map and fuses the local information of the CNN as complementary features. Experiments are conducted on the CROHME 2014 and CROHME 2016 competition datasets. The experimental results show that, when only the official training dataset is used, the proposed method achieves recognition accuracies of 51.98% and 50.74% on the CROHME 2014 and CROHME 2016 competition datasets, respectively, which significantly outperforms the other methods. The improvements demonstrate the effectiveness of the self-attention module.

Keywords: handwritten mathematical expression; offline recognition; non-local; self-attention
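The idea of letting every position of a feature map attend to every other position, then fusing the result with the local CNN features, can be sketched as follows. This is a minimal illustration, not the architecture of the paper: the identity query/key/value projections, the scaled dot-product scores, and the element-wise addition used for fusion are all assumptions, and `self_attention` and `fuse` are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Global self-attention over N positions (a flattened H*W feature map).

    features: list of N vectors of equal length C.
    Assumption: queries, keys, and values are the features themselves.
    Returns (attended features, N x N attention weights).
    """
    dim = len(features[0])
    scale = math.sqrt(dim)
    attended, weights = [], []
    for query in features:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in features]
        row = softmax(scores)
        weights.append(row)
        attended.append(
            [sum(row[j] * features[j][c] for j in range(len(features))) for c in range(dim)]
        )
    return attended, weights


def fuse(local_features, global_features):
    """Fuse local and global features by element-wise addition (an assumption)."""
    return [[a + b for a, b in zip(lv, gv)] for lv, gv in zip(local_features, global_features)]


cnn_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three positions, two channels
global_features, attention = self_attention(cnn_features)
fused = fuse(cnn_features, global_features)
```

Each row of `attention` sums to 1, so every output position is a weighted average over all positions of the map rather than over a local window.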

ES6D: A Computation Efficient and Symmetry-Aware 6D Pose Regression Framework

arXiv, 2022

Authors: Mo, Ningkai; Gan, Wanshui; Yokoya, Naoto; Chen, Shifeng. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China; The University of Tokyo, Japan; RIKEN, Japan

In this paper, a computation-efficient regression framework is presented for estimating the 6D pose of rigid objects from a single RGB-D image, which is applicable to handling symmetric objects. This framework has a simple architecture that efficiently extracts point-wise features from RGB-D data using a fully convolutional network, called XYZNet, and directly regresses the 6D pose without any post-refinement. A symmetric object has multiple ground-truth poses, and this one-to-many relationship may lead to estimation ambiguity. To solve this ambiguity problem, we design a symmetry-invariant pose distance metric, called average (maximum) grouped primitives distance or A(M)GPD. The proposed A(M)GPD loss can make the regression network converge to the correct state, i.e., all minima in the A(M)GPD loss surface are mapped to the correct poses. Extensive experiments on the YCB-Video and TLESS datasets demonstrate the proposed framework's substantially superior performance in top accuracy and low computational cost. The relevant code is available at https://***/GANWANSHUI/***. Copyright © 2022, The Authors. All rights reserved.

Keywords: Computational efficiency

ARNET: ACTIVE-REFERENCE NETWORK FOR FEW-SHOT IMAGE SEMANTIC SEGMENTATION

2021 IEEE International Conference on Multimedia and Expo, ICME 2021

Authors: Shi, Guangchen; Wu, Yirui; Palaiahnakote, Shivakumara; Pal, Umapada; Lu, Tong. Affiliations: College of Computer and Information, Hohai University, China; Department of Computer System and Information Technology, University of Malaya, Malaysia; Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, India; National Key Lab for Novel Software Technology, Nanjing University, China

ISBN: (print) 9781665438643

Few-shot segmentation, which aims to make predictions on unseen classes, has recently become a research focus. However, most methods build on pixel-level annotation, which requires a large amount of manual work. Moreover, the inherent information that same-category objects provide to guide segmentation can vary widely in feature representation due to differences in size, appearance, layout, and so on. To tackle these problems, we present an active-reference network (ARNet) for few-shot segmentation. The proposed active-reference mechanism not only accurately supports co-occurring objects in either support or query images, but also relaxes the strict requirement for pixel-level labeling, allowing weak boundary labeling. To extract a more intrinsic feature representation, a category-modulation module (CMM) is further applied to fuse features extracted from multiple support images, thus forgetting useless information and enhancing contributive information. Experiments on the PASCAL-5i dataset show that the proposed method achieves an m-IOU score of 56.5% for 1-shot and 59.8% for 5-shot segmentation, which is 0.5% and 1.3% higher than the current state-of-the-art method. © 2021 IEEE

Keywords: Semantic Segmentation
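For readers unfamiliar with the m-IOU figure quoted in the abstract, the sketch below shows one common way to compute mean intersection-over-union from flattened label maps. It is an illustration only: the function name `mean_iou` is hypothetical, and the exact evaluation protocol used on PASCAL-5i (for example, which classes are averaged) is not given in this record and may differ.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred, target: flat sequences of integer class labels, one per pixel.
    Classes that appear in neither sequence are skipped (an assumption).
    """
    ious = []
    for cls in range(num_classes):
        intersection = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Four pixels, two classes: class 0 has IoU 1/2, class 1 has IoU 2/3.
score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
```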

New Traveling Wave Solutions and Dynamic Behavior Analysis of the Nonlinear Rangwala-Rao Model

SSRN, 2023

Authors: Peng, Chen; Li, Zhao. Affiliations: College of Computer Science, Chengdu University, Chengdu 610106, China; Key Laboratory of Pattern Recognition and Intelligent Information Processing, Institutions of Higher Education of Sichuan Province, Chengdu University, 610106, China

This work investigates the nonlinear Rangwala-Rao equation, which stems from the mixed derivative nonlinear Schrödinger equation. To retrieve new exact solutions to the equation, the complete discriminant system for polynomial method is employed. As a result, some novel traveling wave solutions, including solitary wave solutions, triangular function solutions, periodic solutions, and Jacobi elliptic function solutions, are obtained and demonstrated through numerical simulations. The bifurcations of phase portraits of the traveling wave solutions are depicted to reveal the dynamic behavior of the Rangwala-Rao equation using the qualitative theory of dynamical systems. Furthermore, considering external perturbation, the chaotic motions of the perturbed Rangwala-Rao equation are investigated. © 2023, The Authors. All rights reserved.

Keywords: Dynamical systems

Dynamic Effects on Traveling Wave Solutions of the Space-Fractional Long-Short-Wave Interaction System with Multiplicative White Noise

SSRN, 2023

In this paper, the stochastic space-fractional long-short-wave interaction system (SF-LSWIS) with multiplicative white noise is considered. The stochastic exact solutions, including triangular function solutions, hyperbolic function solutions, and Jacobian elliptic function solutions, are obtained via the complete discriminant system for polynomial method. For this purpose, the traveling wave transformation is applied to reduce the model to an ordinary differential equation, which is also converted to a dynamical system. Thereafter, the qualitative behavior, bifurcation of the phase portraits, and chaotic behaviors of the system are studied. Furthermore, 3D surfaces, contour plots, and level curves of some exact solutions are depicted by choosing proper parameter values. The influence of the fractional derivative and the noise strength on the solutions is also studied. © 2023, The Authors. All rights reserved.

Keywords: Hyperbolic functions

WaveDM: Wavelet-Based Diffusion Models for Image Restoration

arXiv, 2023

Authors: Huang, Yi; Huang, Jiancheng; Liu, Jianzhuang; Yan, Mingfu; Dong, Yu; Lyu, Jiaxi; Chen, Chaoqi; Chen, Shifeng. Affiliations: Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; University of Chinese Academy of Sciences, Beijing 100039, China; The University of Hong Kong, Hong Kong

Recent diffusion-based methods for many image restoration tasks outperform traditional models, but they suffer from long inference times. To tackle this, this paper proposes a Wavelet-Based Diffusion Model (WaveDM). WaveDM learns the distribution of clean images in the wavelet domain, conditioned on the wavelet spectrum of degraded images after wavelet transform, which is more time-saving in each sampling step than modeling in the spatial domain. To ensure restoration performance, a unique training strategy is proposed in which the low-frequency and high-frequency spectra are learned using distinct modules. In addition, an Efficient Conditional Sampling (ECS) strategy is developed from experiments, which reduces the total number of sampling steps to around 5. Evaluations on twelve benchmark datasets covering image raindrop removal, rain streak removal, dehazing, defocus deblurring, demoiréing, and denoising demonstrate that WaveDM achieves state-of-the-art performance with efficiency that is comparable to traditional one-pass methods and over 100× faster than existing image restoration methods using vanilla diffusion models. The code is available at https://***/stayalive16/WaveDM. Copyright © 2023, The Authors. All rights reserved.

Keywords: Image reconstruction
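To make the wavelet-domain idea concrete, the following sketch applies a single-level 2D Haar transform, which splits an image into one low-frequency band and three high-frequency bands, each with a quarter of the original pixels. The choice of the Haar basis and the function name `haar_dwt2` are assumptions for illustration; this record does not say which wavelet or how many decomposition levels WaveDM uses.

```python
def haar_dwt2(image):
    """Single-level orthonormal 2D Haar transform.

    image: list of rows (lists of floats) with even height and width.
    Returns four subbands of half the height and width:
      low       - local average of each 2x2 block (low frequency)
      col_diff  - left column minus right column
      row_diff  - top row minus bottom row
      diag_diff - diagonal difference
    """
    height, width = len(image), len(image[0])
    low, col_diff, row_diff, diag_diff = [], [], [], []
    for i in range(0, height, 2):
        rows = ([], [], [], [])
        for j in range(0, width, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2)
            rows[1].append((a - b + c - d) / 2)
            rows[2].append((a + b - c - d) / 2)
            rows[3].append((a - b - c + d) / 2)
        low.append(rows[0])
        col_diff.append(rows[1])
        row_diff.append(rows[2])
        diag_diff.append(rows[3])
    return low, col_diff, row_diff, diag_diff


image = [[1.0, 2.0], [3.0, 4.0]]
low, col_diff, row_diff, diag_diff = haar_dwt2(image)
```

Because the transform is orthonormal, the sum of squared coefficients across the four subbands equals the sum of squared pixel values, so no information is lost while the spatial size of each band is halved in both directions.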

Masked Image Training for Generalizable Deep Image Denoising

Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Haoyu Chen, Jinjin Gu, Yihao Liu, Salma Abdel Magid, Chao Dong, Qiong Wang, Hanspeter Pfister, Lei Zhu. Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Shanghai AI Lab; The University of Sydney; ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Harvard University; Guangdong Provincial Key Laboratory of Computer Vision and Virtual Reality Technology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; The Hong Kong University of Science and Technology

When capturing and storing images, devices inevitably introduce noise. Reducing this noise is a critical task called image denoising. Deep learning has become the de facto method for image denoising, especially with the emergence of Transformer-based models that have achieved notable state-of-the-art results on various image tasks. However, deep learning-based methods often suffer from a lack of generalization ability. For example, deep models trained on Gaussian noise may perform poorly when tested on other noise distributions. To address this issue, we present a novel approach to enhance the generalization performance of denoising networks, known as masked training. Our method involves masking random pixels of the input image and reconstructing the missing information during training. We also mask out the features in the self-attention layers to avoid the impact of training-testing inconsistency. Our approach exhibits better generalization ability than other deep learning models and is directly applicable to real-world scenarios. Additionally, our interpretability analysis demonstrates the superiority of our method.
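The input-masking step described in the abstract can be sketched as below. This is a minimal illustration under stated assumptions: masked pixels are replaced with a fill value of 0.0 (the paper may use a learned mask token instead), the image is a plain list of rows, and `mask_random_pixels` is a hypothetical helper name. The feature masking inside the self-attention layers is not shown.

```python
import random


def mask_random_pixels(image, mask_ratio, rng, fill_value=0.0):
    """Hide a random subset of pixels so a network must reconstruct them.

    image: list of rows of floats. mask_ratio: fraction of pixels to hide.
    rng: a random.Random instance, passed in for reproducibility.
    Returns (masked image, boolean mask where True marks a hidden pixel).
    """
    height, width = len(image), len(image[0])
    n_hidden = int(round(mask_ratio * height * width))
    hidden = set(rng.sample(range(height * width), n_hidden))
    mask = [[(i * width + j) in hidden for j in range(width)] for i in range(height)]
    masked = [
        [fill_value if mask[i][j] else image[i][j] for j in range(width)]
        for i in range(height)
    ]
    return masked, mask


noisy = [[1.0] * 4 for _ in range(4)]
masked, mask = mask_random_pixels(noisy, mask_ratio=0.5, rng=random.Random(0))
```

During training, `masked` would be fed to the denoising network and the loss computed against the clean target, so the model cannot simply copy the noisy input.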

From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models

2022 Findings of the Association for Computational Linguistics: EMNLP 2022

Authors: Li, Lei; Lin, Yankai; Ren, Xuancheng; Zhao, Guangxiang; Li, Peng; Zhou, Jie; Sun, Xu. Affiliations: MOE Key Lab of Computational Linguistics, School of Computer Science, Peking University, China; Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China; Tsinghua University, China; Pattern Recognition Center, WeChat AI, Tencent Inc., China

Investigating better ways to reuse released pre-trained language models (PLMs) can significantly reduce the computational cost and the potential environmental side-effects. This paper explores a novel PLM reuse paradigm, Knowledge Integration (KI). Without human annotations available, KI aims to merge the knowledge from different teacher-PLMs, each of which specializes in a different classification problem, into a versatile student model. To achieve this, we first derive the correlation between virtual golden supervision and teacher predictions. We then design a Model Uncertainty-aware Knowledge Integration (MUKI) framework to recover the golden supervision for the student. Specifically, MUKI adopts Monte-Carlo Dropout to estimate model uncertainty for the supervision integration. An instance-wise re-weighting mechanism based on the margin of uncertainty scores is further incorporated to deal with potentially conflicting supervision from teachers. Experimental results demonstrate that MUKI achieves substantial improvements over baselines on benchmark datasets. Further analysis shows that MUKI generalizes well to merging teacher models with heterogeneous architectures, and even teachers specialized in cross-lingual datasets. © 2022 Association for Computational Linguistics.

Keywords: Integration
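Monte-Carlo Dropout, which the abstract names as the uncertainty estimator, keeps dropout active at inference time and repeats the forward pass several times; the spread of the predictions serves as an uncertainty signal. The sketch below uses a toy linear classifier with dropout on its input features. The model, the use of per-class variance as the uncertainty score, and the names `predict_with_dropout` and `mc_dropout` are assumptions; how MUKI converts uncertainty into supervision weights is not described in this record.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_dropout(weights, features, drop_prob, rng):
    """One stochastic forward pass of a toy linear classifier.

    weights: one weight vector per class. drop_prob must be below 1.
    Inverted dropout: kept features are rescaled by 1 / (1 - drop_prob).
    """
    keep = 1.0 - drop_prob
    dropped = [x / keep if rng.random() < keep else 0.0 for x in features]
    logits = [sum(w * x for w, x in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout(weights, features, drop_prob, n_passes, rng):
    """Return the mean prediction and per-class variance over n_passes."""
    runs = [predict_with_dropout(weights, features, drop_prob, rng) for _ in range(n_passes)]
    n_classes = len(weights)
    mean = [sum(run[c] for run in runs) / n_passes for c in range(n_classes)]
    variance = [
        sum((run[c] - mean[c]) ** 2 for run in runs) / n_passes for c in range(n_classes)
    ]
    return mean, variance


teacher = [[2.0, -1.0, 0.5], [-1.0, 1.5, 0.0]]  # hypothetical two-class teacher
mean, variance = mc_dropout(teacher, [1.0, 0.5, -0.5], drop_prob=0.3, n_passes=50,
                            rng=random.Random(0))
```

A larger variance for an input suggests the teacher is less certain about it, which is the kind of signal an uncertainty-aware integration scheme could use when teachers disagree.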

Hierarchical Local Global Transformer for Point Clouds Analysis

SSRN, 2023

Authors: Li, Dilong; Zheng, Shenghong; Chen, Ziyi; Li, Xiang; Wang, Lanying; Du, Jixiang. Affiliations: College of Computer Science and Technology, Fujian Key Laboratory of Big Data Intelligence and Security, Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Xiamen Key Laboratory of Data Security and Blockchain Technology, Huaqiao University, Xiamen, FJ 361021, China; School of Economics and Finance, Huaqiao University, Quanzhou, FJ 362021, China; Department of Geography and Environmental Management, University of Waterloo, Waterloo, ON N2L 3G1, Canada

Transformer networks have demonstrated remarkable performance in point cloud analysis. However, achieving a balance between local regional context and global long-range context learning remains a significant challenge. In this paper, we propose a Hierarchical Local Global Transformer Network (LGTNet), designed to capture local and global contexts in a hierarchical manner. Specifically, we employ serial local and global Transformers to learn the inner-group and cross-group self-attention, respectively. In addition, we propose a geometric moment-based position encoding for the local Transformer, enabling the embedding of comprehensive local geometric relationships. We also introduce a global feature pooling module that extracts the global features from each encoder layer. Extensive experimental results demonstrate that LGTNet achieves state-of-the-art performance on the ShapeNetPart and ScanObjectNN datasets. This approach effectively enhances the understanding of point cloud scenes, thereby facilitating the use of point cloud data in remote sensing applications. © 2023, The Authors. All rights reserved.

Keywords: Geometry

Generating Cartoon Images from Face Photos with Cycle-Consistent Adversarial Networks

Computers, Materials & Continua, 2021, Vol. 69, No. 11, pp. 2733-2747

Authors: Tao Zhang, Zhanjie Zhang, Wenjing Jia, Xiangjian He, Jie Yang. Affiliations: School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214000, China; Key Laboratory of Artificial Intelligence, Jiangsu 214000, China; The Global Big Data Technologies Centre, University of Technology Sydney, Ultimo, NSW 2007, Australia; The Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 201100, China

The generative adversarial network (GAN) was first proposed in 2014, and this kind of network model is a machine learning system that can learn to measure a given distribution of data; one of the most important applications is style *** transfer is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output ***-GAN is a classic GAN model, which has a wide range of scenarios in style *** its unsupervised learning characteristics, the mapping is easy to be learned between an input image and an output ***, it is difficult for CYCLE-GAN to converge and generate high-quality *** order to solve this problem, spectral normalization is introduced into each convolutional kernel of the *** convolutional kernel reaches Lipschitz stability constraint with adding spectral normalization and the value of the convolutional kernel is limited to [0, 1], which promotes the training process of the proposed ***, we use pretrained model (VGG16) to control the loss of image content in the position of l1 *** avoid overfitting, l1 regularization term and l2 regularization term are both used in the object loss *** terms of Frechet Inception Distance (FID) score evaluation, our proposed model achieves outstanding performance and preserves more discriminative *** results show that the proposed model converges faster and achieves better FID scores than the state of the art.

Keywords: Generative adversarial network; spectral normalization; Lipschitz stability constraint; VGG16; l1 regularization term; l2 regularization term; Frechet inception distance
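Spectral normalization, as commonly formulated, divides a weight matrix by its largest singular value so that the corresponding linear map has spectral norm 1. The sketch below estimates that singular value with power iteration on a small matrix. It is an illustration of the general technique, not the training code of this paper: treating a convolutional kernel as a reshaped 2D matrix, the fixed iteration count, and the names `spectral_norm` and `spectrally_normalize` are assumptions.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def l2_norm(vector):
    return math.sqrt(sum(x * x for x in vector))


def spectral_norm(weight, n_iters=50):
    """Estimate the largest singular value of a non-zero matrix by power iteration.

    Assumes the all-ones start vector is not orthogonal to the top right
    singular vector; a random start would be more robust in practice.
    """
    transposed = [list(col) for col in zip(*weight)]
    v = [1.0] * len(weight[0])
    for _ in range(n_iters):
        u = matvec(weight, v)
        u_norm = l2_norm(u)
        u = [x / u_norm for x in u]
        v = matvec(transposed, u)
        v_norm = l2_norm(v)
        v = [x / v_norm for x in v]
    return l2_norm(matvec(weight, v))


def spectrally_normalize(weight, n_iters=50):
    """Rescale the matrix so its largest singular value is approximately 1."""
    sigma = spectral_norm(weight, n_iters)
    return [[x / sigma for x in row] for row in weight]


kernel = [[3.0, 0.0], [0.0, 1.0]]  # toy stand-in for a reshaped convolution kernel
normalized = spectrally_normalize(kernel)
```

Bounding the spectral norm of each layer bounds the Lipschitz constant of the network, which is the stability property the abstract refers to.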
