Education, healthcare, poverty, and equity are just some of the social problems in which XR researchers leverage augmented reality, virtual reality, and mixed reality to create novel solutions. In this workshop propos...
The performance of learning models often deteriorates when deployed in out-of-sample environments. To ensure reliable deployment, we propose a stability evaluation criterion based on distributional perturbations. Conc...
Non-autoregressive Transformers (NATs) significantly reduce the decoding latency by generating all tokens in parallel. However, such independent predictions prevent NATs from capturing the dependencies between the tok...
ISBN: (digital) 9798350353006
ISBN: (print) 9798350353013
Face Video Retouching is a complex task that often requires labor-intensive manual editing. Conventional image retouching methods perform less satisfactorily in terms of generalization performance and stability when applied to videos without exploiting the correlation among frames. To address this issue, we propose a Video Retouching transformEr to remove facial imperfections in videos, which is referred to as VRetouchEr. Specifically, we estimate the apparent motion of imperfections between two consecutive frames, and the resulting displacement vectors are used to refine the imperfection map, which is synthesized from the current frame together with the corresponding encoder features. The flow-based imperfection refinement is critical for precise and stable retouching across frames. To leverage the temporal contextual information, we inject the refined imperfection map into each transformer block for multi-frame masked attention computation, such that we can capture the interdependence between the current frame and multiple reference frames. As a result, the imperfection regions can be replaced with normal skin with high fidelity, while at the same time keeping the other regions unchanged. Extensive experiments are performed to verify the superiority of VRetouchEr over state-of-the-art image retouching methods in terms of fidelity and stability.
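The flow-based refinement described above can be illustrated with a minimal sketch. It assumes integer per-pixel displacement vectors, a forward warp of the previous frame's imperfection map, and a simple weighted average with the current map; the abstract does not specify how the two are fused, so `warp_map`, `refine_map`, and `weight` are hypothetical names and choices, not VRetouchEr's actual implementation.

```python
def warp_map(prev_map, flow):
    """Forward-warp prev_map: move each value by its integer (dy, dx) displacement."""
    h, w = len(prev_map), len(prev_map[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            ty, tx = y + dy, x + dx
            if 0 <= ty < h and 0 <= tx < w:
                # Keep the strongest response if several pixels land on one target.
                warped[ty][tx] = max(warped[ty][tx], prev_map[y][x])
    return warped


def refine_map(curr_map, prev_map, flow, weight=0.5):
    """Blend the current map with the motion-compensated previous map (assumed fusion rule)."""
    warped = warp_map(prev_map, flow)
    return [
        [(1.0 - weight) * c + weight * p for c, p in zip(curr_row, warp_row)]
        for curr_row, warp_row in zip(curr_map, warped)
    ]


# A blemish at (1, 1) in the previous frame moves one pixel to the right.
prev_map = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
flow = [[(0, 1)] * 3 for _ in range(3)]
curr_map = [[0.4, 0.0, 0.0], [0.0, 0.0, 0.6], [0.0, 0.0, 0.0]]
refined = refine_map(curr_map, prev_map, flow)
```

In this toy case the response at (1, 2), which both frames agree on after motion compensation, is reinforced, while the isolated response at (0, 0) is damped. That is the kind of temporal stabilization the abstract attributes to the refinement step.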
Power consumption of household and commercial establishments is increasing every year. The vast amount of data generated by the smart energy metres cannot be monitored manually. Hence, automated power consumption moni...
ISBN: (digital) 9798331519186
ISBN: (print) 9798331519193
WiFi-based technology is appealing for indoor localization because of its widely deployed infrastructure. Recently, path separation solutions have been proposed to address multipath effects and achieve decimeter-level localization accuracy in line-of-sight (LoS) scenarios. However, these solutions suffer serious performance degradation in non-line-of-sight (NLoS) scenarios and cannot be used for mobile device tracking, where continuous LoS/NLoS switching occurs. In this paper, we propose SaTrack, a LoS/NLoS state-aware mobile device tracking system. SaTrack identifies LoS/NLoS states based on the diversity of the strongest estimated paths when using different reference antennas. Based on the observation that the Tx-Rx direct path exhibits spatial aggregation and temporal continuity, SaTrack chooses the direct path through two-step clustering, i.e., clustering in the spatial domain to form candidates and clustering again in the temporal domain to select the winner. Extensive experiments are conducted to evaluate the effectiveness of SaTrack. In a typical indoor environment with abundant multipath, SaTrack achieves median and 90th-percentile tracking errors of 0.64 m and 1.27 m, respectively, outperforming state-of-the-art (SOTA) solutions.
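The two-step clustering idea can be sketched as follows. This is an illustrative toy, not SaTrack's algorithm: it assumes each path estimate is a 2-D feature (for example, angle and delay already scaled to comparable units), uses greedy radius-based clustering in the spatial step, and links candidates across consecutive frames in the temporal step. The function names and the `radius` and `max_jump` thresholds are hypothetical.

```python
from math import hypot


def spatial_candidates(points, radius):
    """Step 1: greedily group one frame's path estimates; return cluster centroids."""
    clusters = []
    for p in points:
        for c in clusters:
            cx = sum(q[0] for q in c) / len(c)
            cy = sum(q[1] for q in c) / len(c)
            if hypot(p[0] - cx, p[1] - cy) <= radius:
                c.append(p)
                break
        else:
            clusters.append([p])
    return [
        (sum(q[0] for q in c) / len(c), sum(q[1] for q in c) / len(c))
        for c in clusters
    ]


def select_direct_path(frames, radius, max_jump):
    """Step 2: chain candidates across consecutive frames; the longest chain wins."""
    tracks = []  # each track is a list of (frame_index, centroid)
    for t, points in enumerate(frames):
        for cand in spatial_candidates(points, radius):
            for track in tracks:
                last_t, last = track[-1]
                close = hypot(cand[0] - last[0], cand[1] - last[1]) <= max_jump
                if last_t == t - 1 and close:
                    track.append((t, cand))
                    break
            else:
                tracks.append([(t, cand)])
    return max(tracks, key=len) if tracks else []


frames = [
    [(10.0, 5.0), (10.2, 5.1), (40.0, 9.0)],
    [(10.5, 5.2), (10.4, 5.0), (-20.0, 12.0)],
    [(11.0, 5.3), (10.9, 5.4), (55.0, 7.0)],
]
winner = select_direct_path(frames, radius=1.0, max_jump=2.0)
```

Here the estimates near (10, 5) aggregate within each frame and drift smoothly over time, so they form the longest chain, whereas the scattered reflections never link up across frames.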
ISBN: (print) 9798331314385
User intentions are typically formalized as evaluation rewards to be maximized when fine-tuning language models (LMs). Existing alignment methods, such as Direct Preference Optimization (DPO), are mainly tailored for pairwise preference data where rewards are implicitly defined rather than explicitly given. In this paper, we introduce a general framework for LM alignment, leveraging Noise Contrastive Estimation (NCE) to bridge the gap in handling reward datasets explicitly annotated with scalar evaluations. Our framework comprises two parallel algorithms, NCA and InfoNCA, both enabling the direct extraction of an LM policy from reward data as well as preference data. Notably, we show that the DPO loss is a special case of our proposed InfoNCA objective under pairwise preference settings, thereby integrating and extending current alignment theories. By comparing NCA and InfoNCA, we demonstrate that the well-observed decreasing-likelihood trend of DPO/InfoNCA is caused by their focus on adjusting relative likelihood across different responses. In contrast, NCA optimizes the absolute likelihood for each response, thereby effectively preventing the chosen likelihood from decreasing. We evaluate our methods in both reward and preference settings with Mistral-8x7B and 7B models. Experiments suggest that InfoNCA/NCA surpasses various preference baselines when reward datasets are available. We also find NCA significantly outperforms DPO in complex reasoning tasks like math and coding. Code: https://***/thu-ml/Noise-Contrastive-Alignment.
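The claim that DPO is a special case of InfoNCA can be checked numerically with a small sketch. It assumes InfoNCA takes the form of a soft-label cross-entropy between a softmax over the annotated rewards (with temperature `alpha`) and a softmax over implicit rewards, and that implicit rewards follow the DPO-style definition `beta * (log pi - log mu)`. The abstract does not spell out either form, so treat both, along with all function and parameter names, as assumptions.

```python
import math


def log_softmax(xs):
    """Numerically stable log-softmax over a list of floats."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def implicit_reward(logp_policy, logp_ref, beta=1.0):
    """Assumed DPO-style implicit reward: beta * log(pi(y|x) / mu(y|x))."""
    return beta * (logp_policy - logp_ref)


def infonca_loss(implicit_rewards, rewards, alpha=1.0):
    """Cross-entropy between softmax(rewards / alpha) and softmax(implicit_rewards)."""
    targets = [math.exp(v) for v in log_softmax([r / alpha for r in rewards])]
    log_probs = log_softmax(implicit_rewards)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


def dpo_loss(chosen, rejected):
    """Standard DPO loss on implicit rewards: -log sigmoid(chosen - rejected)."""
    return math.log1p(math.exp(-(chosen - rejected)))


# Two responses; a tiny alpha makes the reward target effectively one-hot.
pair = [implicit_reward(-1.3, -2.0), implicit_reward(-2.4, -2.2)]
pairwise_infonca = infonca_loss(pair, rewards=[1.0, 0.0], alpha=1e-3)
pairwise_dpo = dpo_loss(pair[0], pair[1])
```

Under these assumptions, with two responses and a near-zero temperature the soft target collapses onto the preferred response, and the two losses agree to floating-point precision. With more than two responses, or a larger `alpha`, the same function consumes scalar reward annotations directly.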
The automotive sector, which has a significant impact on economic and social growth, has been a driving force behind the development of Industry 4.0 and its associated technologies such as Smart Production, Smart Manu...
ISBN: (digital) 9798350363012
ISBN: (print) 9798350363029
Advances in machine learning and neural networks have transformed natural language processing (NLP) and computer vision (CV) applications. Recent research efforts have begun to bridge the gap between the two domains. In this work, we propose a semi-supervised Multi-Modal Encoder Decoder Network (MMEDN) to capture the relationship between images and textual descriptions, allowing us to generate meaningful descriptions of images and retrieve images from a database using cross-modality search. The semi-supervised training approach, which combines ground-truth text descriptions with pseudo-text generated by the text decoder within the model, requires far fewer image-text pairs in the training data and can directly add new raw images for training without manual text labelling. This approach is particularly useful in active learning environments, where labels are expensive and hard to obtain. Qualitative evaluations show that our model performs well. We applied our model to finding images of a person in large databases and to generating descriptions of people involved in an event for inclusion in an automatically generated report. The model was able to retrieve relevant images and generate accurate descriptions, demonstrating its applicability to more practical use cases.
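Cross-modality search of the kind mentioned above is commonly implemented by ranking image embeddings against a text embedding in a shared space. The sketch below assumes such a shared space and cosine-similarity ranking; the abstract does not say how MMEDN performs retrieval, so `cosine`, `retrieve`, and the toy vectors are hypothetical illustrations only.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, image_index, top_k=2):
    """Return the names of the top_k images whose embeddings best match the query."""
    ranked = sorted(
        image_index.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Toy embeddings standing in for encoder outputs.
image_index = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}
text_query = [1.0, 0.05, 0.0]
matches = retrieve(text_query, image_index)
```

In practice the query vector would come from encoding a text description and the index from encoding the image database; a linear scan like this is adequate for a sketch but would typically be replaced by an approximate nearest-neighbor index at scale.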
The assessment of power quality is vital for evaluating the existing state of the electricity supply, pinpointing its deficiencies, and guiding enhancements to guarantee a stable and consistent power supply, particula...