The attention mechanism plays a pivotal role in designing advanced super-resolution (SR) networks. In this work, we design an efficient SR network by improving the attention mechanism. We start from a simple pixel att...
Matrix inversion is time-consuming, and many works focus on accelerating the inversion of a single large matrix on GPUs. However, the problem of inverting small matrices at large scale has received little attention. In this paper, we p...
Recent advances in single image super-resolution (SISR) have achieved extraordinary performance, but their computational cost is too high for deployment on edge devices. To alleviate this problem, many novel and effective so...
Camouflaged object detection (COD) and salient object detection (SOD) are two distinct yet closely related computer vision tasks widely studied during the past decades. Though sharing the same purpose of segmenting an...
ISBN: (Print) 9781450387835
Facial attribute recognition is a popular and challenging research topic in computer vision. In traditional deep-learning-based attribute recognition methods, the mid-level network features and the differences between attribute groups are not fully explored. To solve this problem, a deep dual-path network is proposed for facial attribute recognition. Within a multi-task learning framework, two sub-networks extract the features of two attribute groups, i.e., local attributes and global ones, and are designed with input images of different scales and networks of different depths. Furthermore, an adaptive Focal loss penalty scheme is developed to automatically assign weights that handle the class imbalance problem in facial attribute recognition. Experimental results on the challenging CelebA dataset show that the proposed method achieves better performance than state-of-the-art methods.
Existing Multimodal Large Language Models (MLLMs) encounter significant challenges in modeling the temporal context within long videos. Currently, mainstream Agent-based methods use external tools (e.g., search engine...
Most existing video face super-resolution (VFSR) methods are trained and evaluated on VoxCeleb1, which was designed specifically for speaker identification and whose frames are of low quality. As a...
It is commonly recognized that color variations caused by differences in stains are a critical issue for histopathology image analysis. Existing methods adopt color matching, stain separation, stain transfer or the com...
To adapt text summarization to the multilingual world, previous work proposes multilingual summarization (MLS) and cross-lingual summarization (CLS). However, these two tasks have been studied separately due to the di...
Temporal action localization is a challenging task in video understanding. Although great progress has been made in temporal action localization, the most advanced methods still have the problem of sharp performance d...