Metasurfaces have enabled the realization of several optical functionalities over an ultrathin platform, fostering the exciting field of flat *** metasurfaces are achieved by arranging a layout of static meta-atoms to imprint a desired operation on the impinging wavefront, but their functionality cannot be *** and programmability of metasurfaces are the next important step to broaden their impact, adding customized on-demand functionality in which each meta-atom can be individually *** demonstrate a mechanical metasurface platform with controllable rotation at the meta-atom level, which can implement continuous Pancharatnam–Berry phase control of circularly polarized *** the proof-of-concept experiments, we demonstrate metalensing, focused vortex beam generation, and holographic imaging in the same metasurface template, exhibiting versatility and superior *** dynamic control of electromagnetic waves using a single, low-cost metasurface paves an avenue towards practical applications, driving the field of reprogrammable intelligent metasurfaces for a variety of applications.
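As an illustration of the rotation-based phase control described in this abstract, the sketch below converts a target phase profile into per-meta-atom rotation angles using the standard Pancharatnam–Berry relation, phase = 2·σ·θ with σ = ±1 for the two circular polarizations. The hyperbolic lens profile, the vortex term, the sign conventions, and all function and parameter names are assumptions chosen for the example; they are not taken from the paper.

```python
import math

def target_phase(x, y, focal_length, wavelength, charge=0):
    """Phase (radians) of a focusing lens, optionally with a vortex term.

    Assumed profile: hyperbolic lens phase plus charge * azimuthal angle.
    All lengths share one unit (for example millimetres).
    """
    r2 = x * x + y * y
    lens = -(2 * math.pi / wavelength) * (math.sqrt(r2 + focal_length ** 2) - focal_length)
    return lens + charge * math.atan2(y, x)

def rotation_angle(phase, handedness=1):
    """Meta-atom rotation angle in [0, pi) that yields `phase`.

    Uses phase = 2 * handedness * angle; handedness is +1 or -1
    (sign convention assumed for this sketch).
    """
    return (phase / (2 * handedness)) % math.pi

# Rotation angle for a meta-atom 30 units off-axis (illustrative numbers).
phi = target_phase(30.0, 0.0, focal_length=100.0, wavelength=10.0)
theta = rotation_angle(phi)
print(f"phase = {phi:.4f} rad, rotation = {math.degrees(theta):.2f} deg")
```

Because a rotation by π returns the same geometric phase, the angle only needs to span half a turn. Reprogramming the surface from a lens to a vortex lens then amounts to recomputing `target_phase` with a non-zero `charge` and re-rotating each element.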
Existing text-video retrieval solutions are, in essence, discriminant models focused on maximizing the conditional likelihood, i.e., p(candidates|query). While straightforward, this de facto paradigm overlooks the underlying data distribution p(query), which makes it challenging to identify out-of-distribution data. To address this limitation, we tackle this task from a generative viewpoint and model the correlation between the text and the video as their joint probability p(candidates,query). This is accomplished through a diffusion-based text-video retrieval framework (DiffusionRet), which models the retrieval task as a process of gradually generating the joint distribution from noise. During training, DiffusionRet is optimized from both the generation and discrimination perspectives, with the generator optimized by a generation loss and the feature extractor trained with a contrastive loss. In this way, DiffusionRet leverages the strengths of both generative and discriminative methods. Extensive experiments on five commonly used text-video retrieval benchmarks, namely MSRVTT, LSMDC, MSVD, ActivityNet Captions, and DiDeMo, show superior performance and support the efficacy of our method. More encouragingly, without any modification, DiffusionRet also performs well in out-of-domain retrieval settings. We believe this work brings fundamental insights into the related fields. Code is available at https://***/jpthu17/DiffusionRet.
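The discriminative half of the training objective mentioned above is a contrastive loss. As a rough sketch of what such a loss commonly looks like, the code below implements a generic symmetric InfoNCE over a text-by-video similarity matrix. It is not taken from the DiffusionRet code; the function name, the temperature value, and the toy matrices are assumptions.

```python
import math

def symmetric_info_nce(sim, temperature=0.05):
    """Symmetric InfoNCE over a square matrix sim[text][video].

    Matched text-video pairs are assumed to lie on the diagonal.
    The temperature is an illustrative value, not one from the paper.
    """
    n = len(sim)

    def directional_loss(rows):
        loss = 0.0
        for i, row in enumerate(rows):
            logits = [s / temperature for s in row]
            m = max(logits)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(l - m) for l in logits))
            loss += log_z - logits[i]  # -log softmax of the matched pair
        return loss / n

    columns = [list(col) for col in zip(*sim)]
    # Average the text-to-video and video-to-text directions.
    return 0.5 * (directional_loss(sim) + directional_loss(columns))

aligned = [[1.0, 0.0], [0.0, 1.0]]     # matched pairs score highest
misaligned = [[0.0, 1.0], [1.0, 0.0]]  # matched pairs score lowest
print(symmetric_info_nce(aligned), symmetric_info_nce(misaligned))
```

The loss is small when each query scores its own candidate highest and large otherwise, which is the discriminative signal that the abstract pairs with the generation loss.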
Received signal strength (RSS) clustering has played a significant role in site surveying and data pre-processing for location tracking in Wi-Fi environments. To this end, this paper presents a novel clustering...
Floor plan construction is the basis for fingerprint-based localization and people's activity learning, especially in environments where the geometrical map is not available or does not exist. To this end, we ...
Contrastive learning-based video-language representation learning approaches, e.g., CLIP, have achieved outstanding performance by pursuing semantic interaction over pre-defined video-text pairs. To refine this coarse-grained global interaction and move a step further, we must confront challenging shell-breaking interactions for fine-grained cross-modal learning. In this paper, we model video and text as game players with multivariate cooperative game theory to handle the uncertainty during fine-grained semantic interaction with diverse granularity, flexible combination, and vague intensity. Concretely, we propose Hierarchical Banzhaf Interaction (HBI) to value possible correspondence between video frames and text words for sensitive and explainable cross-modal contrast. To efficiently realize the cooperative game of multiple video frames and multiple text words, the proposed method clusters the original video frames (text words) and computes the Banzhaf Interaction between the merged tokens. By stacking token merge modules, we achieve cooperative games at different semantic levels. Extensive experiments on commonly used text-video retrieval and video-question answering benchmarks show superior performance and support the efficacy of our HBI. More encouragingly, it can also serve as a visualization tool to promote the understanding of cross-modal interaction, which could have a far-reaching impact on the community. Project page is available at https://***/HBI/.
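To make the game-theoretic quantity concrete, the sketch below computes the textbook pairwise Banzhaf interaction index for two players i and j: the average, over all coalitions S of the remaining players, of v(S ∪ {i, j}) − v(S ∪ {i}) − v(S ∪ {j}) + v(S). The toy value function and the player names are hypothetical, and the code does not reproduce the hierarchical token merging or the value function used in HBI.

```python
from itertools import combinations

def banzhaf_interaction(players, i, j, value):
    """Pairwise Banzhaf interaction index of players i and j.

    `value` maps a frozenset of players to a real number.
    Cost grows as 2 ** (len(players) - 2) evaluations of `value`.
    """
    others = [p for p in players if p not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            s = frozenset(subset)
            total += value(s | {i, j}) - value(s | {i}) - value(s | {j}) + value(s)
    return total / 2 ** len(others)

# Hypothetical game: the coalition is worth 1 only when the frame showing
# a dog and the word "dog" are both present.
players = ["frame_dog", "frame_sky", "word_dog"]

def toy_value(coalition):
    return 1.0 if {"frame_dog", "word_dog"} <= coalition else 0.0

print(banzhaf_interaction(players, "frame_dog", "word_dog", toy_value))  # 1.0
print(banzhaf_interaction(players, "frame_sky", "word_dog", toy_value))  # 0.0
```

The exhaustive sum over coalitions is exponential in the number of players, which is one way to read the abstract's motivation for clustering frames and words into merged tokens before computing interactions.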
Path tracking in wireless and mobile environments is a fundamental technology for ubiquitous location-based services (LBSs). In particular, it is very challenging to develop highly accurate and cost-efficient tracking...
Recently, self-supervised large-scale visual pre-training models have shown great promise in representing pixel-level semantic relationships, significantly promoting the development of unsupervised dense prediction tasks, e.g., unsupervised semantic segmentation (USS). The extracted relationship among pixel-level representations typically contains rich class-aware information: semantically identical pixel embeddings gather together in the representation space to form sophisticated concepts. However, leveraging the learned models to ascertain semantically consistent pixel groups or regions in the image is non-trivial, since over- or under-clustering overwhelms the conceptualization procedure under the various semantic distributions of different images. In this work, we investigate pixel-level semantic aggregation in self-supervised ViT pre-trained models as image segmentation and propose the Adaptive Conceptualization approach for USS, termed ACSeg. Concretely, we explicitly encode concepts into learnable prototypes and design the Adaptive Concept Generator (ACG), which adaptively maps these prototypes to informative concepts for each image. Meanwhile, considering the scene complexity of different images, we propose the modularity loss to optimize ACG independently of the concept number, based on estimating the intensity of pixel pairs belonging to the same concept. Finally, we turn the USS task into classifying the discovered concepts in an unsupervised manner. Extensive experiments with state-of-the-art results demonstrate the effectiveness of the proposed ACSeg.
As Metaverse emerges as the next-generation Internet paradigm, the ability to efficiently generate content is paramount. AI-Generated Content (AIGC) emerges as a key solution, yet the resource-intensive nature of larg...
Dynamically encircling exceptional points (EPs) has unveiled intriguing chiral dynamics in photonics. However, the traditional approach based on an open manifold of Hamiltonian parameter space fails to explore trajectories that pass through an infinite boundary. Here, by mapping the full parameter space onto a closed manifold of the Riemann sphere, we introduce a framework to describe encircling-EP loops. We demonstrate that an encircling trajectory crossing the north vertex can realize near-unity asymmetric transmission. An efficient gain-free, broadband asymmetric polarization-locked device is realized by mapping the encircling path onto L-shaped silicon waveguides.
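The idea of closing an open parameter plane onto a sphere can be illustrated with the standard inverse stereographic projection, which sends a complex parameter z to a point on the unit sphere and sends the point at infinity to the north pole. This is a sketch of the general construction only; the paper's actual parameterization of the Hamiltonian is not given in the abstract, and the function name and the choice of the unit sphere are assumptions.

```python
import cmath

def to_riemann_sphere(z):
    """Map a complex number to (X, Y, Z) on the unit Riemann sphere.

    Inverse stereographic projection from the north pole (0, 0, 1):
    z = 0 lands on the south pole, |z| = 1 on the equator, and
    infinity on the north pole.
    """
    z = complex(z)
    if cmath.isinf(z):
        return (0.0, 0.0, 1.0)
    r2 = abs(z) ** 2
    d = 1.0 + r2
    return (2.0 * z.real / d, 2.0 * z.imag / d, (r2 - 1.0) / d)

# A trajectory running off to infinity in the plane approaches the north
# pole on the sphere, so it can be continued through it as a closed loop.
for z in (0j, 1 + 0j, 10 + 0j, 1e6 + 0j, complex(float("inf"), 0.0)):
    print(z, to_riemann_sphere(z))
```

On the sphere, a path that escapes through the "infinite boundary" of the plane is an ordinary curve through one point, which matches the abstract's description of trajectories crossing the north vertex.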