With the explosive growth of information, recommendation systems have emerged to alleviate the problem of information overload. In order to improve the performance of recommendation systems, many existing methods intr...
In driving scenarios, automobile active safety systems are increasingly incorporating deep learning technology. These systems typically need to handle multiple tasks simultaneously, such as detecting fatigue driving a...
In the evolving landscape of sequential recommendation systems, the application of Large Language Models (LLMs) is increasingly prominent. However, current attempts typically utilize general-purpose LLMs, which presen...
Next-basket recommendation (NBR) infers a set of items that a user will interact with in the next basket. Existing methods often struggle with the data sparsity problem, particularly when the number of baskets is sign...
The natural environment presents a multitude of scenes with diverse content, posing challenges for satisfactory segmentation results using existing segmentation networks. In response, we propose a Cascaded Resolution ...
Virtual Reality (VR) is a key industry for the development of the digital economy in the *** VR has advantages in terms of mobility, light weight, and cost-effectiveness, which has gradually become the mainstream implementation of *** this paper, a mobile VR video adaptive transmission mechanism based on intelligent caching and hierarchical buffering strategy in Mobile Edge Computing (MEC)-equipped 5G networks is proposed, aiming at the low latency requirements of mobile VR services and flexible buffer management for VR video adaptive *** support VR content proactive caching and intelligent buffer management, users' behavioral similarity and head movement trajectory are jointly used for viewpoint *** tile-based content is proactively cached in the MEC nodes based on the popularity of the VR ***, a hierarchical buffer-based adaptive update algorithm is presented, which jointly considers bandwidth, buffer, and predicted viewpoint status to update the tile chunk in client ***, according to the decomposition of the problem, the buffer update problem is modeled as an optimization problem, and the corresponding solution algorithms are ***, the simulation results show that the adaptive caching algorithm based on 5G intelligent edge and hierarchical buffer strategy can improve the user experience in the case of bandwidth fluctuations, and the proposed viewpoint prediction method can significantly improve the accuracy of viewpoint prediction by 15%.
Authors:
Chen Jia, Fan Shi, Xu Cheng
School of Computer Science and Engineering; The Engineering Research Center of Learning-Based Intelligent System (Ministry of Education); The Key Laboratory of Computer Vision and System (Ministry of Education); Tianjin University of Technology, Tianjin, China
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
4D light field imaging captures rich spatial-angular information, providing essential geometric cues for semantic segmentation tasks. In this paper, we introduce a novel backbone network called the Light Field Extraction Interaction Network (LFEI-Net). LFEI-Net excels in extracting global structures and multi-scale spatial-angular features, capturing feature dependencies through channel modeling and diverse feature interactions. Unlike traditional methods that depend on pyramid and dilated feature extraction, LFEI-Net pioneers an efficient method by integrating large-scale horizontal depth-wise convolution (HDWC) and vertical depth-wise convolution (VDWC) with interactive operations for comprehensive spatial multi-scale feature extraction. Furthermore, we present the Multi-Angular Modeling (MAM) module, which effectively captures scene angle variations from multiple perspectives and precisely delineates object boundaries, thereby improving model adaptability. Our experimental evaluations on two datasets demonstrate that LFEI-Net significantly outperforms state-of-the-art (SOTA) 2D and 4D light field semantic segmentation methods, achieving mean Intersection over Union (mIoU) of 83.72% and 86.88%, respectively.
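The mIoU metric reported above can be illustrated with a minimal sketch. This is the standard per-class definition averaged over classes, not the authors' evaluation code; the function name `mean_iou`, the flat label lists, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean Intersection over Union over flattened per-pixel class labels.

    Assumption: classes that appear in neither pred nor target are skipped
    rather than counted as IoU 0 or 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            intersection[p] += 1

    ious = []
    for c in range(num_classes):
        # |A union B| = |A| + |B| - |A intersect B|
        union = pred_count[c] + target_count[c] - intersection[c]
        if union > 0:
            ious.append(intersection[c] / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Four pixels, two classes: IoU is 1/2 for class 0 and 2/3 for class 1.
    print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))
```

Benchmarks differ in how they treat ignored labels and absent classes, so the exact figures in a paper depend on the dataset's evaluation protocol.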
Although action recognition systems can achieve top performance when evaluated on in-distribution test points, they are vulnerable to unanticipated distribution shifts in test data. However, test-time adaptation of video action recognition models against common distribution shifts has so far not been demonstrated. We propose to address this problem with an approach tailored to spatio-temporal models that is capable of adapting on a single video sample per step. It consists of a feature distribution alignment technique that aligns online estimates of test set statistics towards the training statistics. We further enforce prediction consistency over temporally augmented views of the same test video sample. Evaluations on three benchmark action recognition datasets show that our proposed technique is architecture-agnostic and able to significantly boost performance on both the state-of-the-art convolutional architecture TANet and the Video Swin Transformer. Our proposed method demonstrates a substantial performance gain over existing test-time adaptation approaches, both when evaluated on a single distribution shift and in the challenging case of random distribution shifts. Code will be available at https://***/wlin-at/ViTTA.
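The idea of aligning online estimates of test statistics with training statistics can be sketched as follows. This is an illustrative sketch, not the released ViTTA code: the class `OnlineStats`, the exponential moving average with a `momentum` parameter, and the L1 distance in `alignment_loss` are all assumptions, and a real implementation would compute the loss on framework tensors so that gradients can update the model.

```python
from typing import List, Optional


class OnlineStats:
    """Running per-channel mean and variance, updated one sample at a time.

    Assumption: an exponential moving average is used as the online estimator.
    """

    def __init__(self, momentum: float = 0.1) -> None:
        self.momentum = momentum
        self.mean: Optional[List[float]] = None
        self.var: Optional[List[float]] = None

    def update(self, features: List[List[float]]) -> None:
        """features: rows are spatio-temporal positions, columns are channels."""
        n = len(features)
        channels = len(features[0])
        batch_mean = [sum(row[c] for row in features) / n for c in range(channels)]
        batch_var = [
            sum((row[c] - batch_mean[c]) ** 2 for row in features) / n
            for c in range(channels)
        ]
        if self.mean is None or self.var is None:
            # First test sample: initialise the estimates directly.
            self.mean, self.var = batch_mean, batch_var
        else:
            m = self.momentum
            self.mean = [(1 - m) * o + m * b for o, b in zip(self.mean, batch_mean)]
            self.var = [(1 - m) * o + m * b for o, b in zip(self.var, batch_var)]


def alignment_loss(
    stats: OnlineStats, train_mean: List[float], train_var: List[float]
) -> float:
    """L1 distance between online test statistics and stored training statistics."""
    if stats.mean is None or stats.var is None:
        raise ValueError("call update() at least once before computing the loss")
    mean_gap = sum(abs(a - b) for a, b in zip(stats.mean, train_mean))
    var_gap = sum(abs(a - b) for a, b in zip(stats.var, train_var))
    return mean_gap + var_gap
```

A smaller `momentum` gives smoother estimates at the cost of slower reaction to a new shift, which is the main trade-off when only one video arrives per step.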
In autonomous driving scenarios, current object detection models show strong performance when tested in clear weather. However, their performance deteriorates significantly when tested in degraded weather conditions. In addition, even when adapted to perform robustly in a sequence of different weather conditions, they are often unable to perform well in all of them and suffer from catastrophic forgetting. To efficiently mitigate forgetting, we propose Domain-Incremental learning through Activation Matching (DILAM), which employs unsupervised feature alignment to adapt only the affine parameters of a clear-weather pre-trained network to different weather conditions. We propose to store these affine parameters in a memory bank, one set per weather condition, and to plug in the weather-specific parameters during driving (i.e., at test time) when the respective weather conditions are encountered. Our memory bank is extremely lightweight, since affine parameters account for less than 2% of a typical object detector. Furthermore, contrary to previous domain-incremental learning approaches, we do not require the weather label when testing and propose to automatically infer the weather condition with a majority-voting linear classifier.
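The memory-bank and majority-voting ideas described above can be sketched roughly as follows. This is a hypothetical illustration, not the DILAM implementation: the names `AffineMemoryBank`, `majority_vote`, and `select_params`, the layer name `bn1`, and the representation of affine parameters as plain lists are all assumptions, and the per-frame weather predictions are taken as given rather than produced by a real linear classifier.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: layer name -> (scale, shift) of a normalisation layer.
AffineParams = Dict[str, Tuple[List[float], List[float]]]


class AffineMemoryBank:
    """Stores one set of affine parameters per weather condition."""

    def __init__(self) -> None:
        self._bank: Dict[str, AffineParams] = {}

    def store(self, condition: str, params: AffineParams) -> None:
        self._bank[condition] = params

    def lookup(self, condition: str) -> AffineParams:
        return self._bank[condition]


def majority_vote(frame_predictions: Sequence[str]) -> str:
    """Return the most frequent per-frame prediction (ties go to the first seen)."""
    if not frame_predictions:
        raise ValueError("need at least one prediction to vote")
    return Counter(frame_predictions).most_common(1)[0][0]


def select_params(bank: AffineMemoryBank, frame_predictions: Sequence[str]) -> AffineParams:
    """Infer the weather by majority vote, then fetch the matching parameters."""
    return bank.lookup(majority_vote(frame_predictions))


if __name__ == "__main__":
    bank = AffineMemoryBank()
    bank.store("clear", {"bn1": ([1.0, 1.0], [0.0, 0.0])})
    bank.store("fog", {"bn1": ([0.8, 1.1], [0.1, -0.2])})
    # Per-frame outputs of an assumed weather classifier over a short window.
    print(select_params(bank, ["fog", "clear", "fog"]))
```

Because only the affine parameters change between conditions, switching weather domains amounts to a dictionary lookup rather than loading a second detector.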
Owing to the large distribution gap between the heterogeneous data in Visible-Infrared Person Re-identification (VI Re-ID), we point out that existing paradigms often suffer from the inter-modal semantic misalignment issue and thus fail to align and compare local details properly. In this paper, we present Concordant Attention learning (CAL), a novel framework that learns semantic-aligned representations for VI Re-ID. Specifically, we design the Target-aware Concordant Alignment paradigm, which allows target-aware attention adaptation when aligning heterogeneous samples (i.e., adaptive attention adjustment according to the target image being aligned). This is achieved by exploiting the discriminative clues from the modality counterpart and designing effective modality-agnostic correspondence searching strategies. To ensure semantic concordance during the cross-modal retrieval stage, we further propose MatchDistill, which matches the attention patterns across modalities and learns their underlying semantic correlations by bipartite-graph-based similarity modeling and cross-modal knowledge exchange. Extensive experiments on VI Re-ID benchmark datasets demonstrate the effectiveness and superiority of the proposed CAL.