Search results - Inner Mongolia University Library

Tile selection method based on error minimization for photomosaic image creation

Frontiers of Computer Science, 2021, Vol. 15, No. 3, pp. 165-172

Authors: Hongbo ZHANG, Xin GAO, Jixiang DU, Qing LEI, Lijie YANG. Affiliations: Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China; Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen 361021, China; Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China; School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China

Photomosaic images are composite images composed of many small images called *** its overall visual effect, a photomosaic image is similar to the target image, and photomosaics are also called "montage art". Noisy blocks and the loss of local information are the major obstacles in most methods or programs that create photomosaic *** solve these problems and generate a photomosaic image in this study, we propose a tile selection method based on error minimization. A photomosaic image can be generated by partitioning the target image in a rectangular pattern, selecting appropriate tile images, and then adding them with a weight *** on the principles of montage art, the quality of the generated photomosaic image can be evaluated by both global and local *** the proposed framework, via an error function analysis, the results show that selecting a tile image using a global minimum distance minimizes both the global error and the local error ***, the weight coefficient of the image superposition can be used to adjust the ratio of the global and local ***, to verify the proposed method, we built a new photomosaic creation dataset during this *** experimental results show that the proposed method achieves a low mean absolute error and that the generated photomosaic images have a more artistic effect than do the existing approaches.
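The pipeline outlined in this abstract (partition the target into rectangular blocks, select a tile per block, superimpose with a weight) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: images are grayscale nested lists, the image dimensions are multiples of the block size, every tile is exactly one block in size, the per-block mean absolute error stands in for the distance measure, and the names `build_photomosaic`, `mean_abs_error`, `crop`, and the convention that `weight` is the share given to the tile are all hypothetical.

```python
from typing import List, Sequence

Image = List[List[float]]  # grayscale image stored as rows of intensities


def mean_abs_error(a: Image, b: Image) -> float:
    """Mean absolute error between two equally sized grayscale images."""
    total = sum(
        abs(x - y)
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )
    return total / (len(a) * len(a[0]))


def crop(img: Image, top: int, left: int, size: int) -> Image:
    """Return the size x size block whose top-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def build_photomosaic(
    target: Image, tiles: Sequence[Image], size: int, weight: float
) -> Image:
    """Assumed convention: weight is the share of the tile, 1 - weight the target."""
    height, width = len(target), len(target[0])
    out = [[0.0] * width for _ in range(height)]
    for top in range(0, height, size):
        for left in range(0, width, size):
            block = crop(target, top, left, size)
            # Pick the tile with the smallest distance to this block.
            best = min(tiles, key=lambda t: mean_abs_error(block, t))
            # Superimpose the chosen tile on the target block.
            for i in range(size):
                for j in range(size):
                    out[top + i][left + j] = (
                        weight * best[i][j] + (1.0 - weight) * block[i][j]
                    )
    return out
```

In this sketch, a `weight` of 1.0 pastes the selected tiles unchanged and a `weight` of 0.0 reproduces the target, so intermediate values trade tile visibility against fidelity to the target.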

Keywords: photomosaic image; tile image; target image; error minimization; mean absolute error


C²DFL: cross-view cross-layer discriminative feature learning for fine-grained 3D shape classification


Neural Computing and Applications, 2025, pp. 1-22

Authors: Jiang, Jinzhe; Bai, Jing; Ma, Xiangyu. Affiliations: The School of Computer Science and Engineering, North Minzu University, Yinchuan, China; The Key Laboratory of Images Processing and Pattern Recognition Laboratory, North Minzu University, Yinchuan, China

Fine-grained 3D shape classification poses challenges in effectively capturing and integrating discriminative features residing in subtle local regions. Previous methods typically extract features independently from individual views of 3D shapes, with a focus on various strategies for fusing these extracted view features. However, this approach neglects inter-view correlations and potential redundancies among different views. In this study, we introduce $\mathrm{C}^2$DFL, which consists of two primary modules: cross-view discriminative feature extraction (CV-DFE) and cross-layer discriminative feature fusion (CL-DFF). CV-DFE integrates discriminative features by merging inputs from multiple views, mitigating limitations associated with isolated feature extraction. CL-DFF dynamically selects key tokens using a transformer model to interactively fuse discriminative features from various levels. Extensive experiments conducted on three categories of the FG3D dataset demonstrate the efficacy of $\mathrm{C}^2$DFL in capturing and integrating discriminative features of 3D shapes. The proposed method achieves state-of-the-art accuracy in fine-grained 3D shape classification (FGSC).


Feature Decoupled of Deep Mutual Information Maximization

2nd International Conference on Automation, Robotics and Computer Engineering, ICARCE 2023

Authors: He, Xing; Peng, Changgen; Wang, Lin; Tan, Weijie; Wang, Zifan. Affiliations: State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University; Guizhou Key Laboratory of Pattern Recognition and Intelligent System, Guizhou Minzu University, Guiyang, China; Guizhou Big Data Academy, Guizhou University, Guiyang, China; Guizhou Minzu University, Guizhou Key Laboratory of Pattern Recognition and Intelligent System, Guiyang, China; Institute of Guizhou Aerospace Measuring and Testing Technology, Guiyang, China

ISBN: (print) 9798350308341

In deep learning, supervised learning techniques usually require a large amount of expensive labeled data to train the network, and the feature representations extracted by the model usually mix multiple attributes. The resulting representations are difficult to decouple and are non-interpretable, which restricts the application and development of deep learning techniques. For this reason, it is particularly important to study decoupled feature representation methods for unsupervised learning. Although the "Learning deep representations by mutual information estimation and maximization" (DIM) method achieves excellent results in unsupervised learning, the feature representations learned by DIM are still difficult to decouple. To address this problem, we minimize the mutual information between the intermediate-layer feature representations learned by the hidden layer of the encoder during encoder training, so that the features learned by each filter are as uncorrelated as possible, thus realizing feature decoupling. We call our method FP-DIM. The effectiveness of FP-DIM is verified on the CIFAR-10, STL-10, and Fashion-MNIST datasets. The experiments show that the middle-layer feature representations learned by FP-DIM are more readily decoupled. Finally, we reflect on future research for FP-DIM, aiming to provide a research idea and direction for unsupervised interpretable machine learning and to lay a solid theoretical and application foundation for machine learning fields such as image classification and transfer learning. © 2023 IEEE.

Keywords: Unsupervised learning


MoAFormer: Aggregating Adjacent Window Features into Local Vision Transformer Using Overlapped Attention Mechanism for Volumetric Medical Segmentation

11th International Conference on Computing and Pattern Recognition, ICCPR 2022

Authors: Luo, Yixi; Yin, Huayi; Du, Xia. Affiliations: Department of Computer and Information Engineering, Fujian Provincial Key Laboratory of Pattern Recognition and Image Understanding, Xiamen University of Technology, China

ISBN: (print) 9781450397056

Window-based attention is used to alleviate the abrupt increase in computation as the input image resolution grows, and it shows excellent performance. However, the problem of aggregating global features across different windows remains unresolved. Swin-Transformer constructs a hierarchical encoding with a shifted-window mechanism to interactively learn information between different windows. In this work, we investigate the outcome of applying an overlapped attention block (MoA) after the local attention layer and apply it to medical image segmentation tasks. The overlapped attention module employs slightly larger, overlapping patches for the key and value to enable information transmission between neighbouring pixels, which leads to a significant performance gain. The experimental results on the ACDC and Synapse datasets demonstrate that the proposed method performs better than previous Transformer models. © 2022 ACM.
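The idea of pairing each local window with a slightly larger, overlapping key/value region can be illustrated with a simplified index computation. This is a one-dimensional sketch under stated assumptions, not the MoAFormer code: the abstract concerns volumetric data, whereas the function below works on a single axis, clips the enlarged region at the borders, and uses the hypothetical names `overlapped_windows`, `window`, and `overlap`.

```python
from typing import List, Tuple


def overlapped_windows(
    length: int, window: int, overlap: int
) -> List[Tuple[range, range]]:
    """Pair each query window with an enlarged key/value window.

    Query windows tile the axis without overlap. Each key/value window
    extends its query window by `overlap` positions on both sides,
    clipped to the valid index range, so neighbouring windows share
    some positions.
    """
    pairs = []
    for start in range(0, length, window):
        end = min(start + window, length)
        kv_start = max(0, start - overlap)
        kv_end = min(length, end + overlap)
        pairs.append((range(start, end), range(kv_start, kv_end)))
    return pairs
```

With `length=8`, `window=4`, and `overlap=1`, the two key/value regions share positions 3 and 4, which is how information can pass between adjacent windows in this simplified picture.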

Keywords: Image resolution


Efficient Image Super-Resolution Using Vast-Receptive-Field Attention

17th European Conference on Computer Vision, ECCV 2022

Authors: Zhou, Lin; Cai, Haoming; Gu, Jinjin; Li, Zheyuan; Liu, Yingqi; Chen, Xiangyu; Qiao, Yu; Dong, Chao. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shanghai AI Laboratory, Shanghai, China; The University of Sydney, Sydney, Australia; University of Macau, Zhuhai, China

ISBN: (print) 9783031250620

The attention mechanism plays a pivotal role in designing advanced super-resolution (SR) networks. In this work, we design an efficient SR network by improving the attention mechanism. We start from a simple pixel attention module and gradually modify it to achieve better super-resolution performance with reduced parameters. The specific approaches include: (1) increasing the receptive field of the attention branch, (2) replacing large dense convolution kernels with depthwise separable convolutions, and (3) introducing pixel normalization. These approaches paint a clear evolutionary roadmap for the design of attention mechanisms. Based on these observations, we propose VapSR, the Vast-receptive-field Pixel attention network. Experiments demonstrate the superior performance of VapSR. VapSR outperforms the present lightweight networks with even fewer parameters. And the light version of VapSR can use only 21.68% and 28.18% parameters of IMDB and RFDN to achieve similar performances to those networks. The code and models are available at https://***/zhoumumu/VapSR. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords: Convolution


FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild


arXiv, 2024

Authors: Liu, Zhi-Song; Courant, Robin; Kalogeiton, Vicky. Affiliations: Computer Vision and Pattern Recognition Laboratory, Lappeenranta-Lahti University of Technology, Finland; LIX, Ecole Polytechnique, IP Paris, France

Automatically understanding funny moments (i.e., the moments that make people laugh) when watching comedy is challenging, as they relate to various features, such as body language, dialogues and culture. In this paper, we propose FunnyNet-W, a model that relies on cross- and self-attention for visual, audio and text data to predict funny moments in videos. Unlike most methods that rely on ground truth data in the form of subtitles, in this work we exploit modalities that come naturally with videos: (a) video frames as they contain visual information indispensable for scene understanding, (b) audio as it contains higher-level cues associated with funny moments, such as intonation, pitch and pauses and (c) text automatically extracted with a speech-to-text model as it can provide rich information when processed by a Large Language Model. To acquire labels for training, we propose an unsupervised approach that spots and labels funny audio moments. We provide experiments on five datasets: the sitcoms TBBT, MHD, MUStARD, Friends, and the TED talk UR-Funny. Extensive experiments and analysis show that FunnyNet-W successfully exploits visual, auditory and textual cues to identify funny moments, while our findings reveal FunnyNet-W’s ability to predict funny moments in the wild. FunnyNet-W sets the new state of the art for funny moment detection with multimodal cues on all datasets with and without using ground truth information. © 2024, CC BY.

Keywords: C (programming language)


Foreground Prediction for Image Composition with Local and Global Feature Fusion

2024 16th International Conference on Graphics and Image Processing, ICGIP 2024

Authors: Sun, Liliang; He, Yuanlie; Li, Wensheng; Feng, Fujian; Liang, Yihui. Affiliations: School of Computer, Guangdong University of Technology, Guangzhou 510000, China; School of Computer Science, Zhongshan Institute, University of Electronic Science and Technology of China, Zhongshan 528400, China; Guizhou Key Laboratory of Pattern Recognition and Intelligent System, Guizhou Minzu University, Guiyang 550025, China

ISBN: (digital) 9781510688780

ISBN: (print) 9781510688773

This paper focuses on the image composition of transparent objects, where existing image matting methods suffer from composition errors due to the lack of accurate foreground during the composition process. We propose a foreground prediction model named ALGM, which leverages the local feature extraction capabilities of Convolutional Neural Networks (CNNs) and incorporates an attention mechanism for global information modeling. The proposed alpha-assisted foreground prediction module extracts foreground information from the original image and conveys it. The extracted foreground color information is combined with the deep structural features of the encoder and used for foreground color prediction. ALGM reduces image composition errors in the quantitative data from the Composition-1k dataset and improves the visual quality of composed images on the AIM-500 and Transparent-460 datasets. © 2025 SPIE.

Keywords: Prediction models


FG3DFormer: Fine-Grained 3D Shape Classification Based on Vision Transformer

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Xiangyu Ma, Jing Bai, Jinzhe Jiang, Bin Peng. Affiliations: The School of Computer Science and Engineering, North Minzu University; The Key Laboratory of Images Processing and Pattern Recognition Laboratory, Yinchuan, China

ISBN: (digital) 9798350368741

ISBN: (print) 9798350368758

Fine-grained 3D shape classification (FGSC) remains challenging due to the difficulty of adaptively capturing global structure differences and subtle inter-class distinctions. This paper directly extends Vision Transformer (ViT) to FGSC, proposing a pure Transformer network, FG3DFormer, that fully leverages ViT's global correlation and local attention abilities. FG3DFormer comprises the Hierarchical Feature Extraction (HFE) and the Hierarchical Feature Refinement (HFR), interconnected through the Adaptive View Region Selection (AVRS). First, the HFE comprehensively evaluates the significance of intra-view patches and views, driven by inter-view and intra-view attention. Then, the AVRS adaptively selects crucial patch tokens from different views to serve as sources of subtle local features. Finally, the HFR refines the 3D shape descriptor, capturing more discriminative global and subtle local features by leveraging both the view tokens and the selected crucial patch tokens. Extensive experiments on FG3D and ModelNet40 demonstrate the superiority of FG3DFormer in FGSC and meta-category 3D shape classification tasks.

Keywords: Computer vision; Visualization; Solid modeling; Three-dimensional displays; Correlation; Shape; Signal processing; Transformers; Feature extraction; Speech processing


Multi-scale Promoted Self-adjusting Correlation Learning for Facial Action Unit Detection


IEEE Transactions on Affective Computing, 2024, Vol. 16, No. 2, pp. 697-711

Authors: Liu, Xin; Yuan, Kaishen; Niu, Xuesong; Shi, Jingang; Yu, Zitong; Yue, Huanjing; Yang, Jingyu. Affiliations: Tianjin University, School of Electrical and Information Engineering, Tianjin 300072, China; Lappeenranta-Lahti University of Technology LUT, Computer Vision and Pattern Recognition Laboratory, School of Engineering Science, Lappeenranta 53850, Finland; Beijing Institute for General Artificial Intelligence, Beijing 100080, China; Xi'an Jiaotong University, School of Software Engineering, Xi'an 710049, China; Great Bay University, Dongguan 523000, China

Facial Action Unit (AU) detection is a crucial task in affective computing and social robotics as it helps to identify emotions expressed through facial expressions. Anatomically, there are innumerable correlations between AUs, which contain rich information and are vital for AU detection. Previous methods used fixed AU correlations based on expert experience or statistical rules on specific benchmarks, but it is challenging to comprehensively reflect complex correlations between AUs via hand-crafted settings. Alternative methods employ a fully connected graph to learn these dependencies exhaustively. However, these approaches can result in a computational explosion and a heavy dependence on large datasets. To address these challenges, this paper proposes a novel self-adjusting AU-correlation learning (SACL) method with less computation for AU detection. This method adaptively learns and updates AU correlation graphs by efficiently leveraging the characteristics of different levels of AU motion and emotion representation information extracted in different stages of the network. Moreover, this paper explores the role of multi-scale learning in correlation information extraction and designs a simple yet effective multi-scale feature learning (MSFL) method to promote better performance in AU detection. By integrating AU correlation information with multi-scale features, the proposed method obtains a more robust feature representation for the final AU detection. Extensive experiments show that the proposed method outperforms the state-of-the-art methods on widely used AU detection benchmark datasets, with only 28.7% and 12.0% of the parameters and FLOPs of the best method, respectively. The code for this method is available at https://***/linuxsino/Self-adjusting-AU. © 2010-2012 IEEE.

Keywords: Contrastive Learning


UDC-UNet: Under-Display Camera Image Restoration via U-shape Dynamic Network

17th European Conference on Computer Vision, ECCV 2022

Authors: Liu, Xina; Hu, Jinfan; Chen, Xiangyu; Dong, Chao. Affiliations: Shenzhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; University of Macau, Zhuhai, China; Shanghai AI Laboratory, Shanghai, China

ISBN: (print) 9783031250712

Under-Display Camera (UDC) has been widely exploited to help smartphones realize full-screen displays. However, as the screen could inevitably affect the light propagation process, the images captured by the UDC system usually contain flare, haze, blur, and noise. Particularly, flare and blur in UDC images could severely deteriorate the user experience in high dynamic range (HDR) scenes. In this paper, we propose a new deep model, namely UDC-UNet, to address the UDC image restoration problem with an estimated PSF in HDR scenes. Our network consists of three parts, including a U-shape base network to utilize multi-scale information, a condition branch to perform spatially variant modulation, and a kernel branch to leverage the prior knowledge of the PSF. According to the characteristics of HDR data, we additionally design a tone mapping loss to stabilize network optimization and achieve better visual quality. Experimental results show that the proposed UDC-UNet outperforms the state-of-the-art methods in quantitative and qualitative comparisons. Our approach won second place in the UDC image restoration track of the MIPI challenge. Codes and models are available at https://***/J-FHu/UDCUNet. © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords: Image reconstruction

