User behavior on online life service platforms such as Meituan and Yelp often occurs within specific fine-grained spatiotemporal contexts (i.e., when and where). Recommender systems, designed to serve millions of users, typically operate in a fully server-based manner, requiring users to upload their on-device behavioral data, including fine-grained spatiotemporal contexts, to the server, which has sparked public concern regarding privacy. Consequently, user devices upload only coarse-grained spatiotemporal contexts to protect user privacy. However, previous research mostly focuses on modeling fine-grained spatiotemporal contexts using knowledge graph convolutional models, which are not applicable to coarse-grained spatiotemporal contexts in privacy-constrained recommender systems. In this paper, we investigate privacy-preserving recommendation by leveraging coarse-grained spatiotemporal contexts. We propose the coarse-grained spatiotemporal knowledge graph for privacy-preserving recommendation (CSKG), which explicitly models spatiotemporal co-occurrences using common-sense knowledge from coarse-grained contexts. Specifically, we begin by constructing a spatiotemporal knowledge graph tailored to coarse-grained spatiotemporal contexts. We then employ a learnable metagraph network that integrates common-sense information to filter and extract co-occurrences. CSKG evaluates the impact of coarse-grained spatiotemporal contexts on user behavior using a knowledge graph convolutional network. Finally, we introduce joint learning to learn representations effectively. In experiments on two large-scale real-world datasets, we achieve an average improvement of about 11.0% on two ranking metrics. The results demonstrate that CSKG outperforms state-of-the-art baselines.
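To make the idea of a coarse-grained spatiotemporal context concrete, the following minimal sketch shows one way a device could reduce a precise location and timestamp before upload. The grid size, the time periods, and the function name `coarsen_context` are illustrative assumptions and are not taken from the CSKG paper.

```python
import math
from datetime import datetime

def coarsen_context(lat, lon, when, cell_deg=0.1):
    """Map a precise (lat, lon, datetime) to a coarse (grid cell, time slot) pair.

    Assumption: a fixed latitude/longitude grid and five broad periods of day.
    """
    # Snap the coordinates to the index of the grid cell that contains them.
    cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
    # Keep only weekday/weekend, not the calendar date.
    day_type = "weekend" if when.weekday() >= 5 else "weekday"
    # Keep only a broad period of day, not the exact time.
    hour = when.hour
    if 6 <= hour < 11:
        period = "morning"
    elif 11 <= hour < 14:
        period = "noon"
    elif 14 <= hour < 18:
        period = "afternoon"
    elif 18 <= hour < 22:
        period = "evening"
    else:
        period = "night"
    return cell, (day_type, period)

# Two nearby visits on the same Saturday around lunchtime become indistinguishable.
a = coarsen_context(39.9042, 116.4074, datetime(2024, 5, 18, 12, 30))
b = coarsen_context(39.9100, 116.4100, datetime(2024, 5, 18, 13, 5))
print(a, a == b)
```

Under such a scheme the server only ever sees the cell index and the time slot, which is the kind of input a coarse-grained knowledge graph would have to work with.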
Anomaly detection (AD) has been extensively studied and applied across various scenarios in recent years. However, gaps remain between the current performance and the desired recognition accuracy required for practical *** paper analyzes two fundamental failure cases in the baseline AD model and identifies key reasons that limit the recognition accuracy of existing approaches. Specifically, in Case-1, we found that the main factor harming current AD methods is that the inputs to the recovery model contain a large number of detailed features to be recovered, so that normal areas are not recovered to their original state while abnormal areas are. In Case-2, we found, surprisingly, that abnormal areas that cannot be recognized in image-level representations can be easily recognized in feature-level representations. Based on these observations, we propose a novel recover-then-discriminate (ReDi) framework for *** takes a self-generated feature map (e.g., histogram of oriented gradients) and a selected prompted image as explicit input information to address the issue identified in Case-1. Additionally, a feature-level discriminative network is introduced to amplify abnormal differences between the recovered and input representations. Extensive experiments on two widely used yet challenging AD datasets demonstrate that ReDi achieves state-of-the-art recognition accuracy.
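The abstract names the histogram of oriented gradients as an example of a self-generated feature map. The sketch below computes a heavily simplified version of it: a single unsigned orientation histogram over the whole image, without the cells, block normalization, or bin interpolation of a full HOG descriptor. The function name and the 9-bin choice are assumptions for illustration, not details from the ReDi paper.

```python
import math

def orientation_histogram(image, num_bins=9):
    """Accumulate gradient magnitude into unsigned orientation bins over [0, 180)."""
    height, width = len(image), len(image[0])
    hist = [0.0] * num_bins
    bin_width = 180.0 / num_bins
    # Skip the border so that central differences are always defined.
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal gradient
            gy = image[y + 1][x] - image[y - 1][x]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation: opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % num_bins] += magnitude
    return hist

# A horizontal intensity ramp has purely horizontal gradients,
# so all the magnitude lands in the first bin.
ramp = [[0, 1, 2, 3] for _ in range(4)]
print(orientation_histogram(ramp))
```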
Matrix minimization techniques that employ the nuclear norm have gained recognition for their applicability in tasks such as image inpainting, clustering, classification, and reconstruction. However, they come with inherent biases and computational burdens, especially when used to relax the rank function, making them less effective and efficient in real-world scenarios. To address these challenges, our research focuses on generalized nonconvex rank regularization problems in robust matrix completion, low-rank representation, and robust matrix regression. We introduce approaches for effective and efficient low-rank matrix learning, grounded in generalized nonconvex rank relaxations inspired by various substitutes for the ℓ0-norm relaxed functions. These relaxations allow us to capture low-rank structures more accurately. Our optimization strategy employs a nonconvex, multi-variable alternating direction method of multipliers, backed by rigorous theoretical analysis for complexity and *** algorithm iteratively updates blocks of variables, ensuring efficient convergence. Additionally, we incorporate the randomized singular value decomposition technique and/or other acceleration strategies to enhance the computational efficiency of our approach, particularly for large-scale constrained minimization problems. Our experimental results across a variety of image and vision-related application tasks demonstrate the superiority of the proposed methods in terms of both efficacy and efficiency when compared to most other related learning methods.
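The bias mentioned in the abstract can be seen directly on a list of singular values. The proximal operator of the nuclear norm (soft thresholding) shrinks every singular value by the same amount, including the large ones that carry the low-rank structure. As the simplest nonconvex contrast, the sketch below uses the proximal operator of the ℓ0 penalty (hard thresholding); this is only an illustrative stand-in and not one of the generalized relaxations studied in the paper. The function names are hypothetical.

```python
import math

def soft_threshold(singular_values, lam):
    """Proximal step for lam * (nuclear norm): shrink every value by lam."""
    return [max(s - lam, 0.0) for s in singular_values]

def hard_threshold(singular_values, lam):
    """Proximal step for lam * (l0 penalty): keep s unchanged if s > sqrt(2 * lam)."""
    cutoff = math.sqrt(2.0 * lam)
    return [s if s > cutoff else 0.0 for s in singular_values]

values = [10.0, 3.0, 0.5]
print(soft_threshold(values, 1.0))  # large values are biased downward
print(hard_threshold(values, 1.0))  # large values are preserved, small ones removed
```

Both operators suppress the small singular value, but only the nonconvex one leaves the dominant values untouched, which is the motivation for nonconvex rank surrogates.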
Despite the effectiveness of vision-language supervised fine-tuning in enhancing the performance of vision large language models (VLLMs), existing visual instruction tuning datasets have the following limitations. (1) Instruction annotation quality: although existing VLLMs exhibit strong performance, instructions generated by those advanced VLLMs may still suffer from inaccuracies, such as hallucinations. (2) Instruction and image diversity: the limited range of instruction types and the lack of diversity in image data may impair the model's ability to generate diverse outputs that are closer to real-world scenarios. To address these challenges, we construct a high-quality, diverse visual instruction tuning dataset, MMInstruct, which consists of 973k instructions from 24 domains. There are four instruction types: judgment, multiple-choice, long visual question answering, and short visual question answering. To construct MMInstruct, we propose an instruction generation data engine that leverages GPT-4V, GPT-3.5, and manual correction. Our instruction generation engine enables semi-automatic, low-cost, and multi-domain instruction generation at 1/6 the cost of manual construction. Through extensive validation and ablation experiments, we demonstrate that MMInstruct can significantly improve the performance of VLLMs; e.g., a model fine-tuned on MMInstruct achieves new state-of-the-art performance on 10 out of 12 benchmarks. The code and data will be available at https://***/yuecao0119/MMInstruct.
In this work, we present DocPedia, a novel large multimodal model (LMM) for versatile OCR-free document understanding, capable of parsing images up to 2560 × 2560 resolution. Unlike existing studies that either struggle with high-resolution documents or abandon the large language model, which constrains their vision or language abilities, DocPedia directly processes visual input in the frequency domain rather than in pixel space. This characteristic enables DocPedia to capture a greater amount of visual and textual information using a limited number of visual tokens. To consistently enhance both the perception and comprehension abilities of DocPedia, we develop a dual-stage training strategy and enrich the instructions/annotations of all training tasks, covering multiple document types. Extensive quantitative and qualitative experiments are conducted on various publicly available benchmarks, and the results confirm the mutual benefits of jointly learning perception and comprehension tasks. The results provide further evidence of the effectiveness and superior performance of DocPedia over other methods.
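To illustrate why a frequency-domain representation can be compact, the sketch below applies an unnormalized one-dimensional DCT-II to a short signal and keeps only the lowest-frequency coefficients. The abstract does not say which transform DocPedia uses, so the choice of DCT here is an assumption, and the function names are hypothetical.

```python
import math

def dct2(signal):
    """Unnormalized DCT-II: X[k] = sum_i x[i] * cos(pi / n * (i + 0.5) * k)."""
    n = len(signal)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(signal))
        for k in range(n)
    ]

def keep_low_frequencies(coeffs, keep):
    """Zero out every coefficient whose index is at or above `keep`."""
    return [c if k < keep else 0.0 for k, c in enumerate(coeffs)]

# A smooth (here constant) signal concentrates its energy in the first coefficient,
# so most of the remaining coefficients can be dropped with little loss.
coeffs = dct2([1.0, 1.0, 1.0, 1.0])
print([round(c, 6) for c in coeffs])
print(keep_low_frequencies(coeffs, 1))
```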
Delay/disruption tolerant networking (DTN) has been proposed as a networking architecture that overcomes challenging space communication characteristics to provide reliable data transmission in the presence of long propagation delays and/or lengthy link disruptions. The Bundle Protocol (BP) and the Licklider Transmission Protocol (LTP) are the key technologies for DTN. LTP red transmission offers a reliable transmission mechanism for space networks. One of the key metrics used to measure the performance of LTP in space applications is the end-to-end data delivery delay, which is influenced by factors such as the quality of space channels and the size of cross-layer packets. In this paper, an end-to-end reliable data delivery delay model of LTP red transmission is proposed using a roulette wheel algorithm, which is more in line with the typical random characteristics of space networks. The proposed models are validated through real data transmission experiments on a semi-physical testing platform. Furthermore, the impact of cross-layer packet size on the performance of LTP reliable transmission is analyzed, with a focus on bundle size, block size, and segment size. The analysis and results presented in this paper contribute towards enhancing the reliability of LTP transmission in space communication scenarios.
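A roulette wheel algorithm picks an outcome with probability proportional to its share of the wheel. The sketch below uses it to draw, segment by segment, whether a transmission is delivered or lost, and counts how many transmission rounds a block needs before every segment has arrived. This is a minimal illustration under the assumption of independent segment losses; the function names and the retransmission loop are hypothetical and do not reproduce the delay model proposed in the paper.

```python
import random

def roulette_wheel(weights, rng):
    """Return an index chosen with probability proportional to its weight."""
    pick = rng.random() * sum(weights)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if pick < cumulative:
            return index
    return len(weights) - 1  # guard against floating-point round-off

def rounds_to_deliver(num_segments, loss_prob, rng):
    """Count transmission rounds until every segment of a block is delivered."""
    if not 0.0 <= loss_prob < 1.0:
        raise ValueError("loss_prob must be in [0, 1)")
    remaining, rounds = num_segments, 0
    while remaining > 0:
        rounds += 1
        # Wheel slots: index 0 = delivered, index 1 = lost (to be retransmitted).
        remaining = sum(
            roulette_wheel([1.0 - loss_prob, loss_prob], rng) for _ in range(remaining)
        )
    return rounds

rng = random.Random(42)  # fixed seed so repeated runs give the same draws
samples = [rounds_to_deliver(100, 0.1, rng) for _ in range(1000)]
print(sum(samples) / len(samples))  # average number of rounds per block
```

Multiplying the number of rounds by the round-trip time of the link would give a rough delivery delay for each sampled block.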
Machine learning (ML) has been empowering all aspects of wireless communication system design. Among ML methods, reinforcement learning (RL)-based approaches have attracted considerable research attention because they can interact with the environment directly and learn efficiently from the collected experiences. In this paper, we propose a novel and efficient RL-based multi-beam combining scheme for future millimeter-wave (mmWave) three-dimensional (3D) multi-input multi-output (MIMO) communication systems. The proposed scheme does not require perfect channel state information (CSI) or precise user location information, both of which are generally difficult to obtain in practice, and it effectively addresses the crucial challenge of computational complexity incurred by the extremely large state and action spaces associated with multiple users, multiple paths, and multiple 3D beams. In particular, a self-attention deep deterministic policy gradient (DDPG)-based beam selection and combination framework is proposed to adaptively learn the 3D beamforming pattern without CSI. We aim to maximize the sum-rate of the mmWave 3D-MIMO system by optimizing the serving beam set and the corresponding combining weights for each user. To this end, a transformer is incorporated into the DDPG to obtain global information about the input elements and capture the signal directions precisely, which leads to a near-optimal beamformer design. Simulation results verify the superiority of the proposed self-attention DDPG over conventional ML-based beamforming schemes in terms of sum-rate under various scenarios.
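The optimization target named in the abstract, the sum-rate, is the sum over users of the Shannon rate log2(1 + SINR). The sketch below evaluates it from per-user received signal and interference powers. The function name, the inputs, and the unit noise power are illustrative assumptions; how the beam set and combining weights determine those powers is the subject of the paper and is not modeled here.

```python
import math

def sum_rate(signal_powers, interference_powers, noise_power=1.0):
    """Sum of per-user Shannon rates in bit/s/Hz."""
    total = 0.0
    for signal, interference in zip(signal_powers, interference_powers):
        # Signal-to-interference-plus-noise ratio of this user.
        sinr = signal / (interference + noise_power)
        total += math.log2(1.0 + sinr)
    return total

# Two users without interference: log2(1 + 1) + log2(1 + 3) = 1 + 2.
print(sum_rate([1.0, 3.0], [0.0, 0.0]))
# Interference from other beams lowers the achievable sum-rate.
print(sum_rate([1.0, 3.0], [1.0, 1.0]))
```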
This article proposes an open-space emergency guiding (OSEG) framework that explores deep learning techniques to predict individual densities for evacuation based on Internet of Things localization. The OSEG framework...
Effective management of electricity consumption (EC) in smart buildings (SBs) is crucial for optimizing operational efficiency, cost savings, and ensuring sustainable resource utilization. Accurate EC prediction enabl...
The 3GPP cellular vehicle-to-everything (C-V2X) technology is a key solution to provide communication services for applications of intelligent transportation systems (ITS). According to the C-V2X specification, vehicles are al...