Search Results - Inner Mongolia University Library

Fast Algorithm for Parallel Solving Inversion of Large Scale Small Matrices Based on GPU

SSRN, 2022

Authors: Jin, Xuebin; Chen, Yewang; Fan, Wentao; Zhang, Yong; Du, Jixiang. Affiliations: The College of Computer Science and Technology, Huaqiao University, Xiamen, China; Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen, China; Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, China; Provincial Key Laboratory for Computer Information Processing Technology, Soochow University, Soochow, China; College of Mechanical Engineering and Automation, Huaqiao University, Xiamen, China

Inverting a matrix is time-consuming, and many works focus on accelerating the inversion of a single large matrix on a GPU. However, the problem of inverting a large number of small matrices has received little attention. In this paper, we propose a Revised In-Place Inversion algorithm for inverting large-scale small matrices, which runs on the CUDA platform and utilizes multicore GPUs. The proposed algorithm is well suited to large-scale small matrices (dimension $n \leq 32$) and can invert millions of small matrices within tens of milliseconds. In addition, we find that each GPU device has an upper bound on input data size, beyond which performance degrades. Hence, we improve the proposed method by dividing the matrices into batches according to this finding. Experimental results show that this strategy effectively alleviates the performance degradation when inverting large-scale small matrices and maintains the high-performance execution of the algorithm. © 2022, The Authors. All rights reserved.

Keywords: Matrix algebra
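The abstract above describes in-place inversion of many small matrices, processed in batches. A minimal CPU sketch of that idea follows, assuming a textbook in-place Gauss-Jordan elimination with partial pivoting; it is not the authors' Revised In-Place Inversion algorithm, and the names `invert_in_place`, `invert_in_batches` and the `batch_size` parameter are hypothetical. On a GPU, each batch would presumably map to one kernel launch sized to stay under the device's input limit.

```python
def invert_in_place(a):
    """Invert square matrix `a` (a list of row lists) in place; returns `a`."""
    n = len(a)
    swaps = []  # pivot row chosen at each step, undone as column swaps at the end
    for k in range(n):
        # Partial pivoting: pick the largest entry in column k at or below row k.
        p = max(range(k, n), key=lambda r: abs(a[r][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        a[k], a[p] = a[p], a[k]
        swaps.append(p)
        pivot = a[k][k]
        a[k][k] = 1.0  # column k is reused to accumulate the inverse
        for j in range(n):
            a[k][j] /= pivot
        for i in range(n):
            if i != k:
                factor = a[i][k]
                a[i][k] = 0.0
                for j in range(n):
                    a[i][j] -= factor * a[k][j]
    # Row swaps applied to the input correspond to column swaps on the inverse.
    for k in reversed(range(n)):
        p = swaps[k]
        if p != k:
            for row in a:
                row[k], row[p] = row[p], row[k]
    return a


def invert_in_batches(matrices, batch_size):
    """Invert every matrix, handling at most `batch_size` matrices per batch."""
    for start in range(0, len(matrices), batch_size):
        for m in matrices[start:start + batch_size]:  # one batch
            invert_in_place(m)
    return matrices
```

No extra storage proportional to the matrix is allocated: each matrix is overwritten by its inverse, which is the property that makes the in-place approach attractive when millions of small matrices must fit in device memory.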

Blueprint Separable Residual Network for Efficient Image Super-Resolution

arXiv, 2022

Authors: Li, Zheyuan; Liu, Yingqi; Chen, Xiangyu; Cai, Haoming; Gu, Jinjin; Qiao, Yu; Dong, Chao. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; University of Macau, China; Shanghai AI Laboratory, Shanghai, China; The University of Sydney, Australia

Recent advances in single image super-resolution (SISR) have achieved extraordinary performance, but the computational cost is too heavy for edge devices. To alleviate this problem, many novel and effective solutions have been proposed. Convolutional neural networks (CNNs) with attention mechanisms have attracted increasing attention due to their efficiency and effectiveness. However, there is still redundancy in the convolution operation. In this paper, we propose Blueprint Separable Residual Network (BSRN), which contains two efficient designs. One is the use of blueprint separable convolution (BSConv), which takes the place of the redundant convolution operation. The other is to enhance the model's ability by introducing more effective attention modules. The experimental results show that BSRN achieves state-of-the-art performance among existing efficient SR methods. Moreover, a smaller variant of our model, BSRN-S, won first place in the model complexity track of the NTIRE 2022 Efficient SR Challenge. The code is available at https://***/xiaom233/BSRN. Copyright © 2022, The Authors. All rights reserved.

Keywords: Blueprints
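To make the redundancy argument in the abstract above concrete, the following sketch compares weight counts for a standard convolution and a blueprint separable one. It assumes the unconstrained BSConv variant, a 1x1 pointwise convolution followed by a k x k depthwise convolution, and ignores biases; the abstract does not say which variant BSRN uses, and the function names are hypothetical.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def bsconv_params(c_in, c_out, k):
    """Weights in a pointwise 1x1 conv (c_in -> c_out) followed by a
    depthwise k x k conv over the c_out channels (bias ignored)."""
    pointwise = c_in * c_out
    depthwise = c_out * k * k
    return pointwise + depthwise


# Example layer: 64 input channels, 64 output channels, 3 x 3 kernel.
standard = standard_conv_params(64, 64, 3)
separable = bsconv_params(64, 64, 3)
```

Under these assumptions the example layer drops from 36,864 weights to 4,672, roughly an eightfold reduction.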

A Simple yet Effective Network based on vision Transformer for Camouflaged Object and Salient Object Detection

arXiv, 2024

Authors: Hao, Chao; Yu, Zitong; Liu, Xin; Xu, Jun; Yue, Huanjing; Yang, Jingyu. Affiliations: The School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China; The School of Computing and Information Technology, Great Bay University, Dongguan 523000, China; The Computer Vision and Pattern Recognition Laboratory, Lappeenranta-Lahti University of Technology LUT, Lappeenranta 53850, Finland; The School of Statistics and Data Science, Nankai University, Tianjin 300072, China

Camouflaged object detection (COD) and salient object detection (SOD) are two distinct yet closely related computer vision tasks that have been widely studied over the past decades. Though they share the purpose of segmenting an image into binary foreground and background regions, COD focuses on concealed objects hidden in the image, while SOD concentrates on the most prominent objects in the image. Previous works achieved good performance by stacking various hand-designed modules and multi-scale features. However, these carefully designed complex networks often performed well on one task but not on the other. In this work, we propose a simple yet effective network (SENet) based on the vision Transformer (ViT). By employing a simple asymmetric ViT-based encoder-decoder structure, we obtain competitive results on both tasks, exhibiting greater versatility than meticulously crafted networks. Furthermore, to enhance the Transformer’s ability to model local information, which is important for pixel-level binary segmentation tasks, we propose a local information capture module (LICM). We also propose a dynamic weighted loss (DW loss) based on Binary Cross-Entropy (BCE) and Intersection over Union (IoU) loss, which guides the network to pay more attention to smaller and harder-to-find target objects according to their size. Moreover, we explore the issue of joint training of SOD and COD, and propose a preliminary solution to the conflict in joint training, further improving the performance of SOD. Extensive experiments on multiple benchmark datasets demonstrate the effectiveness of our method. The code is available at https://***/linuxsino/SENet. Copyright © 2024, The Authors. All rights reserved.

Keywords: Object detection
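The abstract above mentions a loss that combines BCE and IoU terms and weights them by object size, without giving the formula. The sketch below shows one way such a loss could look on a flattened mask. The BCE and soft-IoU terms are standard; the `size_weight` rule (a weight that grows as the foreground fraction shrinks) is purely an illustrative assumption and is not the paper's DW loss.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened per-pixel probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def iou_loss(pred, target, eps=1e-7):
    """Soft IoU loss: 1 - intersection / union on probabilities."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)


def size_weight(target):
    """HYPOTHETICAL weighting: smaller foreground fraction -> larger weight."""
    foreground_fraction = sum(target) / len(target)
    return 2.0 - foreground_fraction  # ranges from 1.0 (all foreground) to 2.0


def weighted_loss(pred, target):
    """Size-weighted sum of the BCE and IoU terms (illustrative only)."""
    return size_weight(target) * (bce_loss(pred, target) + iou_loss(pred, target))
```

With this rule, a mask whose object covers a quarter of the pixels is weighted 1.75, while one covering three quarters is weighted 1.25, so errors on small objects contribute more to the total.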

Multi-Task Learning with Deep Dual-Path Network for Facial Attribute Recognition

Proceedings of the 2020 9th International Conference on Computing and Pattern Recognition

Authors: Xinyu Lai; Si Chen; Da-Han Wang; Shunzhi Zhu. Affiliations: School of Computer and Information Engineering, Xiamen University of Technology; Fujian Key Laboratory of Pattern Recognition and Image Understanding

ISBN: (print) 9781450387835

Facial attribute recognition is a popular and challenging research topic in computer vision. In traditional deep-learning-based attribute recognition methods, the mid-level network features and the differences between attribute groups are not fully explored. To solve this problem, a deep dual-path network is proposed for facial attribute recognition. In the multi-task learning framework, two sub-networks are employed to extract the features of two attribute groups, i.e., local attributes and global ones, respectively, and are designed with different image scales and different network depths. Furthermore, an adaptive Focal loss penalty scheme is developed to automatically assign weights to handle the class imbalance problem in facial attribute recognition. Experimental results on the challenging CelebA dataset show that the proposed method achieves better performance than state-of-the-art methods.

Keywords: multi-task learning; dual-path network; facial attribute recognition; attribute groups
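The abstract above builds on the Focal loss to handle class imbalance. As background, here is a minimal sketch of the standard binary focal loss for a single prediction. The fixed `alpha` and `gamma` values are placeholder defaults; the paper's adaptive scheme for assigning the weights is not described in the abstract and is not reproduced here.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one predicted probability `p` and label `y` (0 or 1).

    FL = -alpha_t * (1 - p_t)**gamma * log(p_t), where p_t is the probability
    assigned to the true class. alpha and gamma are fixed placeholders here.
    """
    p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

The factor `(1 - p_t)**gamma` shrinks the loss of well-classified examples, so rare or hard attribute labels dominate the gradient; with `gamma=0` the expression reduces to class-weighted cross-entropy.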

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

arXiv, 2025

Authors: Chen, Boyu; Yue, Zhengrong; Chen, Siran; Wang, Zikang; Liu, Yang; Li, Peng; Wang, Yali. Affiliations: Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, China; Tsinghua University, Beijing, China; Dept. of Comp. Sci. & Tech., Institute for AI, Tsinghua University, Beijing, China; Shanghai Artificial Intelligence Laboratory, China; Shanghai Jiao Tong University, China

Existing Multimodal Large Language Models (MLLMs) encounter significant challenges in modeling the temporal context within long videos. Currently, mainstream agent-based methods use external tools (e.g., search engines, memory banks, OCR, retrieval models) to assist a single MLLM in answering long video questions. Despite such tool-based support, a solitary MLLM still offers only a partial understanding of long videos, resulting in limited performance. To better address long video tasks, we introduce LVAgent, the first framework enabling multi-round dynamic collaboration of MLLM agents in long video understanding. Our methodology consists of four key steps: 1) Selection: We pre-select appropriate agents from the model library to form optimal agent teams based on different tasks. 2) Perception: We design an effective retrieval scheme for long videos, improving the coverage of critical temporal segments while maintaining computational efficiency. 3) Action: Agents answer long video-related questions and exchange reasons. 4) Reflection: We evaluate each agent’s performance in each round of discussion and optimize the agent team for dynamic collaboration. The agents iteratively refine their answers through this multi-round dynamic collaboration. LVAgent is the first agent-based method that outperforms all closed-source models (including GPT-4o) and open-source models (including InternVL-2.5 and Qwen2-VL) in long video understanding tasks. Our LVAgent achieves an accuracy of 80% on four mainstream long video understanding tasks. Notably, on the LongVideoBench dataset, LVAgent improves accuracy by up to 13.3% compared with SOTA. © 2025, CC BY-NC-SA.

Keywords: Search engines

VFHQ: A High-Quality Dataset and Benchmark for Video Face Super-Resolution

arXiv, 2022

Authors: Xie, Liangbin; Wang, Xintao; Zhang, Honglun; Dong, Chao; Shan, Ying. Affiliations: Shenzhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; ARC Lab, Tencent PCG, China

Most existing video face super-resolution (VFSR) methods are trained and evaluated on VoxCeleb1, which is designed specifically for speaker identification and whose frames are of low quality. As a consequence, VFSR models trained on this dataset cannot output visually pleasing results. In this paper, we develop an automatic and scalable pipeline to collect a high-quality video face dataset (VFHQ), which contains over 16,000 high-fidelity clips of diverse interview scenarios. To verify the necessity of VFHQ, we further conduct experiments and demonstrate that VFSR models trained on our VFHQ dataset can generate results with sharper edges and finer textures than those trained on VoxCeleb1. In addition, we show that temporal information plays a pivotal role in eliminating video consistency issues as well as further improving visual performance. Based on VFHQ, we analyze a benchmarking study of several state-of-the-art algorithms under bicubic and blind settings. Copyright © 2022, The Authors. All rights reserved.

Keywords: Benchmarking

Stain-Adaptive Self-Supervised Learning for Histopathology Image Analysis

arXiv, 2022

Authors: Ye, Hai-Li; Wang, Da-Han. Affiliations: Department of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361000, China; Fujian Provincial Key Laboratory of Pattern Recognition and Image Understanding, Xiamen 361000, China

It is commonly recognized that color variations caused by differences in stains are a critical issue for histopathology image analysis. Existing methods adopt color matching, stain separation, stain transfer or a combination of them to alleviate the stain variation problem. In this paper, we propose a novel Stain-Adaptive Self-Supervised Learning (SASSL) method for histopathology image analysis. Our SASSL integrates a domain-adversarial training module into the SSL framework to learn distinctive features that are robust to both various transformations and stain variations. The proposed SASSL is regarded as a general method for domain-invariant feature extraction, which can be flexibly combined with arbitrary downstream histopathology image analysis modules (e.g., nuclei/tissue segmentation) by fine-tuning the features for specific downstream tasks. We conducted experiments on publicly available pathological image analysis datasets, including the PANDA, BreastPathQ, and CAMELYON16 datasets, achieving state-of-the-art performance. Experimental results demonstrate that the proposed method can robustly improve the feature extraction ability of the model and achieve stable performance improvement in downstream tasks. Copyright © 2022, The Authors. All rights reserved.

Keywords: Image analysis

Towards Unifying Multi-Lingual and Cross-Lingual Summarization

arXiv, 2023

Authors: Wang, Jiaan; Meng, Fandong; Zheng, Duo; Liang, Yunlong; Li, Zhixu; Qu, Jianfeng; Zhou, Jie. Affiliations: School of Computer Science and Technology, Soochow University, Suzhou, China; Pattern Recognition Center, WeChat AI, Tencent Inc, China; Beijing University of Posts and Telecommunications, China; Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China

To adapt text summarization to the multilingual world, previous work has proposed multilingual summarization (MLS) and cross-lingual summarization (CLS). However, these two tasks have been studied separately because of their different definitions, which limits compatible and systematic research on both. In this paper, we aim to unify MLS and CLS into a more general setting, i.e., many-to-many summarization (M2MS), where a single model can process documents in any language and generate their summaries in any language as well. As the first step towards M2MS, we conduct preliminary studies to show that M2MS can better transfer task knowledge across different languages than MLS and CLS. Furthermore, we propose PISCES, a pre-trained M2MS model that learns language modeling, cross-lingual ability and summarization ability via three-stage pre-training. Experimental results indicate that our PISCES significantly outperforms the state-of-the-art baselines, especially in the zero-shot directions, where there is no training data from the source-language documents to the target-language summaries. Copyright © 2023, The Authors. All rights reserved.

Keywords: Modeling languages

PON: Proposal Optimization Network for Temporal Action Proposal Generation

16th International Conference on Intelligent Computing, ICIC 2020

Authors: Peng, Xiaoxiao; Du, Jixiang; Zhang, Hongbo. Affiliations: Department of Computer Science and Technology, Huaqiao University, Quanzhou, China; Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Quanzhou, China; Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Quanzhou, China

ISBN: (digital) 9783030607968

ISBN: (print) 9783030607951

Temporal action localization is a challenging task in video understanding. Although great progress has been made in temporal action localization, the most advanced methods still suffer sharp performance degradation when action proposals are generated. Most methods use a sliding-window approach or simply group frames according to frame-level scores. These methods are insufficient to provide accurate action boundaries and maintain a reasonable temporal structure. To solve these problems, we propose a novel proposal optimization network that generates start, end, action and regression scores, and then removes redundant proposals with the NMS algorithm. In the proposed method, we introduce a metric loss function to maintain the temporal structure of action proposals during training. To verify the effectiveness of the proposed method, we conducted comparative experiments on the ActivityNet-1.3 dataset, and the proposed method surpasses some of the state-of-the-art methods on this dataset. © 2020, Springer Nature Switzerland AG.

Keywords: Passive optical networks
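The abstract above removes redundant temporal proposals with NMS. A minimal sketch of greedy non-maximum suppression on one-dimensional (start, end, score) proposals follows. The tuple layout, the function names and the 0.5 threshold are assumptions for illustration; the abstract does not state which NMS variant or threshold the authors use.

```python
def temporal_iou(a, b):
    """Intersection over union of two segments given as (start, end, ...)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(proposals, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring proposal, drop overlapping ones, repeat.

    `proposals` is a list of (start, end, score) tuples.
    """
    kept = []
    for candidate in sorted(proposals, key=lambda p: p[2], reverse=True):
        # Keep the candidate only if it overlaps no already-kept proposal too much.
        if all(temporal_iou(candidate, k) <= iou_threshold for k in kept):
            kept.append(candidate)
    return kept


proposals = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]
kept = temporal_nms(proposals)
```

In this example the second proposal overlaps the first with an IoU of 9/11 and is suppressed, so `kept` contains the first and third proposals.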