This paper proposes employing a multi-dimensional controller to drive LED backlight scanning in a 120 Hz LCD, overcoming the hold-type characteristic of LCDs in time-multiplexed stereoscopic displays. A synchron...
Session-based recommendation (SBR) methods often rely on user behavior data, and the sparsity of session data can limit their performance. Researchers have identified that, beyond behavioral signals, the rich semantic information in item descriptions is crucial for capturing hidden user intent. While large language models (LLMs) offer new ways to leverage this semantic data, the challenges of session anonymity, short sequences, and high LLM training costs have hindered the development of a lightweight, efficient LLM framework for SBR. To address the above challenges, we propose an LLM-enhanced SBR framework that integrates semantic and behavioral signals from multiple views. This two-stage framework leverages the strengths of both LLMs and traditional SBR models while minimizing training costs. In the first stage, we use multi-view prompts to infer latent user intentions at the session semantic level, supported by an intent localization module to alleviate LLM hallucinations. In the second stage, we align and unify these semantic inferences with behavioral representations, effectively merging insights from both large and small models. Extensive experiments on two real datasets demonstrate that the LLM4SBR framework can effectively improve model performance. We release our code along with the baselines at https://***/tsinghua-fib-lab/LLM4SBR.
Empowered by the continuous integration of social multimedia and artificial intelligence, the application scenarios of information retrieval (IR) are becoming increasingly diversified and personalized. Currently, User-Generated Content (UGC) systems have great potential to handle the interactions between large-scale users and massive media content. As an emerging multimedia IR task, Fashion Compatibility Modeling (FCM) aims to predict the matching degree of each given outfit and provide complementary item recommendations for user queries. Although existing studies explore the FCM task from a multimodal perspective with promising progress, they still fail to fully leverage the interactions between multimodal information or ignore the contextual item-item connections within an outfit. In this paper, a novel fashion compatibility modeling scheme is proposed based on a Correlation-aware Cross-modal Attention Network. To better tackle these issues, our work mainly focuses on enhancing comprehensive multimodal representations of fashion items by integrating cross-modal collaborative content and uncovering contextual correlations. Since the multimodal information of fashion items can deliver various semantic clues from multiple aspects, a modality-driven collaborative learning module is presented to explicitly model the interactions of modal consistency and complementarity via a co-attention mechanism. Considering the rich connections among the items in each outfit as contextual cues, a correlation-aware information aggregation module is further designed to adaptively capture significant item-item correlations within an outfit for characterizing content-aware outfit representations. Experiments conducted on two real-world fashion datasets demonstrate the superiority of our approach over state-of-the-art methods.
With the help of 5G networks, edge intelligence (EI) can not only provide distributed, low-latency, and highly reliable intelligent services, but also enable intelligent maintenance and management of smart cities. However, the constantly changing computing resources available on end devices and edge servers cannot continuously guarantee the performance of intelligent inference. To guarantee the sustainability of intelligent services in smart cities, we propose the Adaptive Model Selection and Partition Mechanism (AMSPM) for 5G smart cities in which EI provides services. AMSPM mainly consists of Adaptive Model Selection (AMS) and Adaptive Model Partition (AMP). In AMSPM, the selection and partition of the deep neural network (DNN) model are formulated as an optimization problem. First, we propose a recursive algorithm named AMS that, based on the computing resources of edge devices, derives an appropriate DNN model satisfying the latency demand of intelligent services. Then, we adaptively partition the selected DNN model according to the computing resources of edge devices. The experimental results demonstrate that, compared with state-of-the-art model selection and partition mechanisms, AMSPM not only reduces latency but also enhances computing resource utilization.
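As an illustration of the select-then-partition idea in the abstract above, the following is a minimal sketch only, not the AMS or AMP algorithms from the paper. It assumes a hypothetical cost model (per-layer compute cost, per-layer output size, fixed device and server speeds, fixed bandwidth) and uses plain exhaustive search over split points instead of the recursive algorithm the abstract mentions. All names (`Model`, `best_partition`, `select_model`) are invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Model:
    name: str
    accuracy: float               # hypothetical validation accuracy
    layer_flops: List[float]      # assumed compute cost of each layer
    layer_out_bytes: List[float]  # assumed output size of each layer
    input_bytes: float            # size of the raw input


def best_partition(m: Model, device_speed: float, server_speed: float,
                   bandwidth: float) -> Tuple[int, float]:
    """Return (split, latency); layers [0, split) run on the device,
    the remaining layers run on the edge server."""
    n = len(m.layer_flops)
    best: Optional[Tuple[int, float]] = None
    for split in range(n + 1):
        device_t = sum(m.layer_flops[:split]) / device_speed
        server_t = sum(m.layer_flops[split:]) / server_speed
        if split == n:
            transfer_t = 0.0  # everything runs locally, nothing is sent
        elif split == 0:
            transfer_t = m.input_bytes / bandwidth  # send the raw input
        else:
            transfer_t = m.layer_out_bytes[split - 1] / bandwidth
        latency = device_t + transfer_t + server_t
        if best is None or latency < best[1]:
            best = (split, latency)
    return best


def select_model(models: List[Model], budget: float, device_speed: float,
                 server_speed: float, bandwidth: float):
    """Pick the most accurate model whose best split meets the latency budget.
    Returns (model, split, latency), or None if no model fits."""
    for m in sorted(models, key=lambda x: x.accuracy, reverse=True):
        split, latency = best_partition(m, device_speed, server_speed, bandwidth)
        if latency <= budget:
            return m, split, latency
    return None
```

Under this toy cost model, re-running `select_model` whenever the measured device speed or bandwidth changes gives a simple form of adaptivity: a tighter budget or a slower link pushes the choice toward a smaller model or a different split point.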
Artificial intelligence (AI) empowered edge computing has given rise to a new paradigm and effectively facilitated the promotion and development of multimedia applications. The speech assistant is one of the significant services provided by multimedia applications, aiming to offer intelligent interactive experiences between humans and machines. However, malicious attackers may exploit spoofed speech to deceive speech assistants, posing great challenges to the security of multimedia applications. The limited resources of multimedia terminal devices hinder their ability to effectively load speech spoofing detection models. Furthermore, processing and analyzing speech in the cloud can result in poor real-time performance and potential privacy risks. Existing speech spoofing detection methods rely heavily on annotated data and generalize poorly to unseen spoofed speech. To address these challenges, this paper first proposes the Coordinate Attention Network (CA2Net), which consists of coordinate attention blocks and Res2Net blocks. CA2Net can simultaneously extract temporal and spectral speech feature information and represent multi-scale speech features at a granular level. In addition, a contrastive learning-based speech spoofing detection framework named GEMINI is proposed. GEMINI can be effectively deployed on edge nodes and autonomously learn speech features with strong generalization capabilities. GEMINI first performs data augmentation on speech signals and extracts conventional acoustic features to enhance feature robustness. Subsequently, GEMINI uses the proposed CA2Net to further explore discriminative speech features. Then, a tensor-based multi-attention comparison model is employed to maximize the consistency between speech contexts. GEMINI continuously updates CA2Net with contrastive learning, which enables CA2Net to effectively represent speech signals and accurately detect spoofed speech. Extensive experiments on
Graph pattern mining is essential for deciphering complex networks. In the real world, graphs are dynamic and evolve over time, so mined patterns must be updated to reflect these changes. Traditional methods use fine-grained incremental computation to avoid full re-mining after each update, which improves speed but often overlooks potential gains from examining inter-update interactions holistically, thus missing out on overall efficiency gains. In this paper, we introduce Cheetah, a dynamic graph mining system that processes updates in a coarse-grained manner by leveraging exploration domains. These domains exploit the community structure of real-world graphs to uncover data reuse opportunities typically missed by existing approaches. Exploration domains, which encapsulate extensive portions of the graph relevant to updates, allow multiple updates to explore the same regions efficiently. Cheetah dynamically constructs these domains using a management module that identifies and maintains areas of redundancy as the graph changes. By grouping updates within these domains and employing a neighbor-centric expansion strategy, Cheetah minimizes redundant data accesses. Our evaluation of Cheetah across five real-world datasets shows that it outperforms current leading systems by an average factor of 2.63×.
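To make the data-reuse argument in the abstract above concrete, here is a toy sketch, not Cheetah's actual design. It counts the triangles created by a batch of edge insertions, grouping updates by a hypothetical precomputed community label (`domain_of`) so that each vertex's neighbor set is fetched once per group instead of once per update. Both the grouping rule (by the first endpoint's label) and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def count_new_triangles(adj: Dict[int, Set[int]],
                        updates: Iterable[Tuple[int, int]],
                        domain_of: Dict[int, Hashable]) -> Tuple[int, int]:
    """Apply edge insertions grouped by domain.

    Returns (new_triangles, neighbor_fetches). `adj` is an undirected
    adjacency map and is updated in place.
    """
    # Group updates by the (assumed) domain of their first endpoint.
    groups = defaultdict(list)
    for u, v in updates:
        groups[domain_of[u]].append((u, v))

    triangles = 0
    fetches = 0
    for domain_updates in groups.values():
        cache: Dict[int, Set[int]] = {}  # neighbor sets fetched once per group

        def neighbors(x: int) -> Set[int]:
            nonlocal fetches
            if x not in cache:
                cache[x] = adj.setdefault(x, set())
                fetches += 1
            return cache[x]

        for u, v in domain_updates:
            nu, nv = neighbors(u), neighbors(v)
            if u == v or v in nu:
                continue  # skip self-loops and edges already present
            # Each common neighbor closes one new triangle with edge (u, v).
            triangles += len(nu & nv)
            nu.add(v)
            nv.add(u)
    return triangles, fetches
```

Processing the same updates one at a time would fetch two neighbor sets per update; with grouping, updates that touch the same region share those fetches, which is the kind of redundancy the abstract describes at a much larger scale.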