The number of vulnerabilities reported in open source software has increased substantially in recent years. Security patches provide the necessary measures to protect software from attacks and vulnerabilities. In prac...
Video-based human activity recognition (HAR) is an important task in many fields, such as healthcare monitoring, video surveillance, and sports analysis. This review paper aims to give an in-depth look at the current ...
With the wide adoption of recommender systems, fairness has increasingly become a critical topic in many applications, such as e-commerce, job search, and online entertainment. Collaborative filtering is susceptible t...
It has recently been discovered that using a pretrained vision-language model (VLM), e.g., CLIP, to align a whole query image with several finer text descriptions generated by a large language model can significantly enhance zero-shot performance. However, in this paper, we empirically find that the finer descriptions tend to align more effectively with local areas of the query image than with the whole image, and we then validate this finding theoretically. We therefore present a method called weighted visual-text cross alignment (WCA). This method begins with a localized visual prompting technique designed to identify local visual areas within the query image. The local visual areas are then cross-aligned with the finer descriptions by creating a similarity matrix using the pretrained VLM. To determine how well a query image aligns with each category, we develop a score function based on the weighted similarities in this matrix. Extensive experiments demonstrate that our method significantly improves zero-shot performance across various datasets, achieving results that are even comparable to few-shot learning methods. The code is available at ***/tmlr-group/WCA.
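The cross-alignment and weighted scoring steps described above can be illustrated with a small sketch. It assumes the local-area and description embeddings have already been produced by a VLM encoder (toy 2-D vectors stand in for them here), and the weighting is an assumed form: caller-supplied, normalized weights per local area and uniform weights per description. The helper names (`similarity_matrix`, `weighted_score`, `classify`) are hypothetical and are not taken from the WCA code release; the exact weighting used in the paper may differ.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(crops: List[Vector], descs: List[Vector]) -> List[List[float]]:
    """S[i][j] = similarity between local visual area i and description j."""
    return [[cosine(c, d) for d in descs] for c in crops]


def normalize(weights: Sequence[float]) -> List[float]:
    """Rescale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def weighted_score(crops: List[Vector], descs: List[Vector],
                   crop_weights: Sequence[float],
                   desc_weights: Sequence[float]) -> float:
    """Assumed score: sum_i sum_j v_i * t_j * S[i][j] over the similarity matrix."""
    s = similarity_matrix(crops, descs)
    v = normalize(crop_weights)
    t = normalize(desc_weights)
    return sum(v[i] * t[j] * s[i][j]
               for i in range(len(crops)) for j in range(len(descs)))


def classify(crops: List[Vector], crop_weights: Sequence[float],
             class_descs: Dict[str, List[Vector]]) -> str:
    """Return the category whose descriptions align best with the local areas.

    Description weights are uniform here (a hypothetical simplification).
    """
    scores = {
        name: weighted_score(crops, descs, crop_weights, [1.0] * len(descs))
        for name, descs in class_descs.items()
    }
    return max(scores, key=scores.get)


# Toy usage: two local areas and two descriptions per category.
crops = [[1.0, 0.0], [0.9, 0.1]]
crop_weights = [0.6, 0.4]  # hypothetical per-area weights
class_descs = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.1, 0.9]],
}
print(classify(crops, crop_weights, class_descs))  # prints "cat"
```

Because every local area is compared with every description, the matrix lets a description that matches only one region still contribute to the category score, which is the intuition the abstract gives for aligning finer descriptions with local areas rather than the whole image.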
In recent years, deep-learning-based rolling bearing fault diagnosis methods have become an emerging research direction. However, there is still a gap between the existing diagnostic model and the pra...
The IEEE 802.15.4 standard is designed for low-rate wireless personal area networks (LR-WPANs). Deterministic and Synchronous Multi-channel Extension (DSME) is one of the key Medium Access Control (MAC) modes of the I...
ChatGPT is a language model based on generative AI. Existing research on ChatGPT has focused on its use in various domains. However, its potential for Sign Language Translation (SLT) is yet to be explored. This paper...
Abstract: There are conflicting views about how social enterprise logic affects the sustainability of community networks (CNs). Some authors believe that operating under the social enterprise logic spells doom for CNs. Conv...
The rapid growth and widespread adoption of containerization technologies, such as Docker, and the increasing popularity of Kubernetes as a container orchestration platform have significantly shaped the landscape of m...
MSC codes for the weighted visual-text cross alignment (WCA) paper: 68T45, 68T10.