Search Results - Inner Mongolia University Library

Prototypical Residual Networks for Anomaly Detection and Localization

arXiv, 2022

Authors: Zhang, Hui; Wu, Zuxuan; Wang, Zheng; Chen, Zhineng; Jiang, Yu-Gang

Affiliations: Shanghai Key Lab of Intell. Info. Processing, School of CS, Fudan University, China; Shanghai Collaborative Innovation Center of Intelligent Visual Computing, China; School of Computer Science, Zhejiang University of Technology, China

Anomaly detection and localization are widely used in industrial manufacturing for their efficiency and effectiveness. Anomalies are rare and hard to collect, and supervised models easily over-fit to the seen anomalies when given only a handful of abnormal samples, producing unsatisfactory performance. On the other hand, anomalies are typically subtle, hard to discern, and of various appearance, making it difficult to detect anomalies, let alone locate anomalous regions. To address these issues, we propose a framework called Prototypical Residual Network (PRN), which learns feature residuals of varying scales and sizes between anomalous and normal patterns to accurately reconstruct the segmentation maps of anomalous regions. PRN mainly consists of two parts: multi-scale prototypes that explicitly represent the residual features of anomalies to normal patterns; and a multi-size self-attention mechanism that enables variable-sized anomalous feature learning. Besides, we present a variety of anomaly generation strategies that consider both seen and unseen appearance variance to enlarge and diversify anomalies. Extensive experiments on the challenging and widely used MVTec AD benchmark show that PRN outperforms current state-of-the-art unsupervised and supervised methods. We further report SOTA results on three additional datasets to demonstrate the effectiveness and generalizability of PRN. Copyright © 2022, The Authors. All rights reserved.

Keywords: Anomaly detection
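
The abstract above describes prototypes that represent the residual features of anomalies relative to normal patterns. As a rough illustration of that idea only, the sketch below subtracts the nearest "normal" prototype from a feature vector. The function name, the toy 2-D vectors, and the nearest-prototype rule are assumptions made for this example; they are not taken from the PRN paper.

```python
import math

def nearest_prototype_residual(feature, prototypes):
    """Return (residual, distance) of `feature` relative to its closest prototype.

    Hypothetical helper: the nearest-prototype rule is an assumption for
    illustration, not the network described in the abstract.
    """
    # Pick the prototype with the smallest Euclidean distance to the feature.
    best = min(prototypes, key=lambda p: math.dist(feature, p))
    # The residual is the element-wise difference to that prototype.
    residual = [f - p for f, p in zip(feature, best)]
    return residual, math.dist(feature, best)

# Toy "normal" prototypes and one query feature (made-up values).
normal_prototypes = [[0.0, 0.0], [1.0, 1.0]]
residual, distance = nearest_prototype_residual([1.0, 3.0], normal_prototypes)
print(residual, distance)
```

In this toy setting a large residual suggests the feature is far from every normal pattern; a real system would compute such residuals on learned feature maps at several scales.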

A Novel Smartphone Recommendation System Using Ensemble Machine Learning

2023 IEEE Asia-Pacific Conference on Computer Science and Data Engineering, CSDE 2023

Authors: Almadhor, Ahmad; Abbas, Sidra; Sampedro, Gabriel Avelino; Abisado, Mideth; Gadekallu, Thippa Reddy

Affiliations: College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia; Comsats University Islamabad, Department of Computer Science, Islamabad, Pakistan; University of the Philippines Open University, Faculty of Information and Communication Studies, Los Baños 4031, Philippines; De la Salle University, Center for Computational Imaging and Visual Innovations, 2401 Taft Ave., Manila 1004, Philippines; College of Computing and Information Technologies, National University, Manila, Philippines; Zhongda Group, Jiaxing City, Zhejiang Province, Haiyan County 314312, China; Lebanese American University, Department of Electrical and Computer Engineering, Byblos, Lebanon; School of Information Technology and Engineering, Vellore Institute of Technology, Tamil Nadu, India; College of Information Science and Engineering, Jiaxing University, Jiaxing 314001, China; Lovely Professional University, Division of Research and Development, Phagwara, India

ISBN: (print) 9798350341072

Due to the proliferation of internet evaluations brought on by the rising demand for smartphones, consumers find it challenging to make accurate selections when purchasing. In this paper, we offer ensemble voting methods based on TF-IDF (Term Frequency-Inverse Document Frequency) features for classifying mobile phone ratings. We use a recently assembled dataset comprising over 13,000 smartphone reviews from the Flipkart website. The suggested approach includes feature extraction using the TF-IDF, data cleaning, balancing, and voting-based model prediction. To identify the recently created Flipkart dataset, the suggested method created an ensemble voting mechanism based on machine learning techniques. According to the experimental findings, the suggested method performs more accurately and efficiently than conventional machine learning techniques. At 98.0%, the model achieved the greatest accuracy. The suggested method can be expanded to additional e-commerce platforms with sizable datasets of online evaluations and assist customers in making updated purchase preferences. © 2023 IEEE.

Keywords: Machine learning
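
The record above combines TF-IDF features with voting-based prediction. The sketch below shows both ingredients in plain Python on made-up review strings. The weighting `tf * log(N / df)` is one common TF-IDF variant chosen here as an assumption (the abstract does not say which variant or which base classifiers were used), and the labels passed to the voter are hypothetical classifier outputs.

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute per-document TF-IDF weights using tf * log(N / df).

    This unsmoothed variant is an assumption for illustration.
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append({
            term: (count / len(toks)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

def majority_vote(predictions):
    """Hard voting: return the label predicted by the most classifiers."""
    return Counter(predictions).most_common(1)[0][0]

# Toy reviews (made up); "phone" occurs in every document, so its weight is 0.
vectors = tfidf(["good phone", "bad phone"])
print(vectors[0])

# Hypothetical outputs of three base classifiers for one review.
print(majority_vote(["positive", "negative", "positive"]))
```

Terms that appear in every review carry no weight under this variant, which is why TF-IDF tends to emphasize the words that distinguish one rating class from another.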

ObjectFormer for Image Manipulation Detection and Localization

arXiv, 2022

Authors: Wang, Junke; Wu, Zuxuan; Chen, Jingjing; Han, Xintong; Shrivastava, Abhinav; Lim, Ser-Nam; Jiang, Yu-Gang

Affiliations: Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University, China; Shanghai Collaborative Innovation Center on Intelligent Visual Computing, China; Huya Inc; University of Maryland, United States; Meta AI

Recent advances in image editing techniques have posed serious challenges to the trustworthiness of multimedia data, which drives the research of image tampering detection. In this paper, we propose ObjectFormer to detect and localize image manipulations. To capture subtle manipulation traces that are no longer visible in the RGB domain, we extract high-frequency features of the images and combine them with RGB features as multimodal patch embeddings. Additionally, we use a set of learnable object prototypes as mid-level representations to model the object-level consistencies among different regions, which are further used to refine patch embeddings to capture the patch-level consistencies. We conduct extensive experiments on various datasets and the results verify the effectiveness of the proposed method, outperforming state-of-the-art tampering detection and localization methods. Copyright © 2022, The Authors. All rights reserved.

Keywords: Embeddings

Modality-Aware Contrastive Instance Learning with Self-Distillation for Weakly-Supervised Audio-visual Violence Detection

arXiv, 2022

Authors: Yu, Jiashuo; Liu, Jinyu; Cheng, Ying; Feng, Rui; Zhang, Yuejie

Affiliations: School of Computer Science, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, China; Academy for Engineering and Technology, Fudan University, China; Shanghai Collaborative Innovation Center of Intelligent Visual Computing, Fudan University, China

Weakly-supervised audio-visual violence detection aims to distinguish snippets containing multimodal violence events with video-level labels. Many prior works perform audio-visual integration and interaction in an early or intermediate manner, yet overlook the modality heterogeneity of the weakly-supervised setting. In this paper, we analyze the modality asynchrony and undifferentiated instances phenomena of the multiple instance learning (MIL) procedure, and further investigate their negative impact on weakly-supervised audio-visual learning. To address these issues, we propose a modality-aware contrastive instance learning with self-distillation (MACIL-SD) strategy. Specifically, we leverage a lightweight two-stream network to generate audio and visual bags, in which unimodal background, violent, and normal instances are clustered into semi-bags in an unsupervised way. Then audio and visual violent semi-bag representations are assembled as positive pairs, and violent semi-bags are combined with background and normal instances in the opposite modality as contrastive negative pairs. Furthermore, a self-distillation module is applied to transfer unimodal visual knowledge to the audio-visual model, which alleviates noise and closes the semantic gap between unimodal and multimodal features. Experiments show that our framework outperforms previous methods with lower complexity on the large-scale XD-Violence dataset. Results also demonstrate that our proposed approach can be used as plug-in modules to enhance other networks. Code is available at https://***/JustinYuu/MACIL-SD. Copyright © 2022, The Authors. All rights reserved.

Keywords: Distillation

Fast peer adaptation with context-aware exploration

Proceedings of the 41st International Conference on Machine Learning

Authors: Long Ma; Yuanfei Wang; Fangwei Zhong; Song-Chun Zhu; Yizhou Wang

Affiliations: Academy for Advanced Interdisciplinary Studies, Peking University and Nat'l Key Laboratory of General Artificial Intelligence, BIGAI&PKU; Center on Frontiers of Computing Studies, School of Computer Science, Peking University and Nat'l Key Laboratory of General Artificial Intelligence, BIGAI&PKU; School of Intelligence Science and Technology, Peking University and Nat'l Key Laboratory of General Artificial Intelligence, BIGAI&PKU; Inst. for Artificial Intelligence and School of Intelligence Science and Technology, Peking University and Nat'l Key Laboratory of General Artificial Intelligence, BIGAI&PKU; Center on Frontiers of Computing Studies, School of Computer Science and Inst. for Artificial Intelligence and Nat'l Eng. Research Center of Visual Technology, Peking University and Nat'l Key Laboratory of General Artificial Intelligence, BIGAI&PKU

Fast adapting to unknown peers (partners or opponents) with different strategies is a key challenge in multi-agent games. To do so, it is crucial for the agent to probe and identify the peer's strategy efficiently, as this is the prerequisite for carrying out the best response in adaptation. However, exploring the strategies of unknown peers is difficult, especially when the games are partially observable and have a long horizon. In this paper, we propose a peer identification reward, which rewards the learning agent based on how well it can identify the behavior pattern of the peer over the historical context, such as the observation over multiple episodes. This reward motivates the agent to learn a context-aware policy for effective exploration and fast adaptation, i.e., to actively seek and collect informative feedback from peers when uncertain about their policies and to exploit the context to perform the best response when confident. We evaluate our method on diverse testbeds that involve competitive (Kuhn Poker), cooperative (PO-Overcooked), or mixed (Predator-Prey-W) games with peer agents. We demonstrate that our method induces more active exploration behavior, achieving faster adaptation and better outcomes than existing methods.

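
The abstract above proposes rewarding the agent according to how well it can identify the peer's behavior pattern from accumulated context. One way to picture such a reward is sketched below: a belief over a fixed set of candidate peer types is updated with Bayes' rule, and the reward is the log-probability assigned to the true peer. The discrete peer set, the Bayes update, the log-probability form, and all names are assumptions made for this illustration; the abstract does not specify the paper's actual formulation.

```python
import math

def update_belief(belief, likelihoods):
    """Bayes update of a belief over peer types.

    `likelihoods[i]` is the probability of the latest observation under
    peer type i (hypothetical values supplied by the caller).
    """
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

def identification_reward(belief, true_peer):
    """Assumed reward: log-probability the belief assigns to the true peer."""
    return math.log(belief[true_peer])

# Start uncertain between two peer types, then observe an action that is
# three times more likely under peer type 0 (made-up likelihoods).
prior = [0.5, 0.5]
belief = update_belief(prior, [0.9, 0.3])
print(belief)
print(identification_reward(prior, 0), identification_reward(belief, 0))
```

Under this assumed form, observations that discriminate between peer types raise the reward, so an agent maximizing it is pushed toward actions that elicit informative feedback.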

Medical Image Registration and Its Application in Retinal Images: A Review

arXiv, 2024

Authors: Nie, Qiushi; Zhang, Xiaoqing; Hu, Yan; Gong, Mingdao; Liu, Jiang

Affiliations: Research Institute of Trustworthy Autonomous Systems, Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen 518055, China; Center for High Performance Computing, Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China; Singapore Eye Research Institute, 169856, Singapore; State Key Laboratory of Ophthalmology, Optometry and Visual Science, Eye Hospital, Wenzhou Medical University, Wenzhou 325027, China

Medical image registration is vital for disease diagnosis and treatment because it can merge diverse information from images that may be captured at different times, angles, or modalities. Although several surveys have reviewed the development of medical image registration, these surveys have not systematically summarized the methodologies of existing medical image registration methods. To this end, we provide a comprehensive review of these methods from traditional and deep learning-based directions, aiming to help audiences understand the development of medical image registration quickly. In particular, we review recent advances in retinal image registration at the end of each section, a topic which has not attracted much attention. Additionally, we discuss the current challenges of retinal image registration and provide insights and prospects for future research. © 2024, CC BY-NC-SA.

Keywords: Deep learning

Motion Guided Region Message Passing for Video Captioning

International Conference on Computer Vision (ICCV)

Authors: Shaoxiang Chen; Yu-Gang Jiang

Affiliations: Shanghai Key Lab of Intelligent Information Processing, School of Computer Science, Fudan University; Shanghai Collaborative Innovation Center on Intelligent Visual Computing

ISBN: (print) 9781665428132

Video captioning is an important vision task and has been intensively studied in the computer vision community. Existing methods that utilize fine-grained spatial information have achieved significant improvements; however, they either rely on costly external object detectors or do not sufficiently model the spatial/temporal relations. In this paper, we aim at designing a spatial information extraction and aggregation method for video captioning without the need for external object detectors. For this purpose, we propose a Recurrent Region Attention module to better extract diverse spatial features, and by employing Motion-Guided Cross-frame Message Passing, our model is aware of the temporal structure and able to establish high-order relations among the diverse regions across frames. They jointly encourage information communication and produce compact and powerful video representations. Furthermore, an Adjusted Temporal Graph Decoder is proposed to flexibly update video features and model high-order temporal relations during decoding. Experimental results on three benchmark datasets (MSVD, MSR-VTT, and VATEX) demonstrate that our proposed method can outperform state-of-the-art methods.

Keywords: Location awareness; Computer vision; Visualization; Message passing; Computational modeling; Detectors; Feature extraction

Divide and Conquer: Text Semantic Matching with Disentangled Keywords and Intents

arXiv, 2022

Authors: Zou, Yicheng; Liu, Hongwei; Gui, Tao; Wang, Junzhe; Zhang, Qi; Tang, Meng; Li, Haixiang; Wang, Daniel

Affiliations: Institute of Modern Languages and Linguistics, Fudan University, Shanghai, China; School of Computer Science, Fudan University, Shanghai, China; Shanghai Collaborative Innovation Center of Intelligent Visual Computing, Shanghai, China; IPS, Tencent PCG, Beijing, China

Text semantic matching is a fundamental task that has been widely used in various scenarios, such as community question answering, information retrieval, and recommendation. Most state-of-the-art matching models, e.g., BERT, directly perform text comparison by processing each word uniformly. However, a query sentence generally comprises content that calls for different levels of matching granularity. Specifically, keywords represent factual information such as action, entity, and event that should be strictly matched, while intents convey abstract concepts and ideas that can be paraphrased into various expressions. In this work, we propose a simple yet effective training strategy for text semantic matching in a divide-and-conquer manner by disentangling keywords from intents. Our approach can be easily combined with pre-trained language models (PLM) without influencing their inference efficiency, achieving stable performance improvements against a wide range of PLMs on three benchmarks. Copyright © 2022, The Authors. All rights reserved.

Keywords: Semantics

An Optimization Method of Primer Design Based on Attention-BiLSTM

International Conference on Robotics, Artificial Intelligence and Intelligent Control (RAIIC)

Authors: Binhao Bai; Jinyu Long; Zhibo Yang; Junli Li; Ping Wei

Affiliations: College of Computer Science, Sichuan Normal University, Chengdu, China; Visual Computing and Virtual Reality Key Laboratory of Sichuan, Sichuan Normal University, Chengdu, China; Sichuan Key Laboratory of Translational Medicine of Traditional Chinese Medicine, Sichuan Academy of Traditional Chinese Medicine, Sichuan Center of Translational Medicine, Chengdu, China

In this paper, we propose a method to predict the success of primer amplification based on the relationship between the primer and template sequences, which can optimize primer design and select the primers with better amplification from the candidate primer set. The double-stranded structure between primer and template nucleotide sequences is represented by a number of five-character words that form sentences, which serve as the dataset for the experiment. An attention-based bidirectional long short-term memory neural network model (Attention-BiLSTM) learns from this dataset and then predicts primer amplification. The model predicted the results of polymerase chain reaction (PCR) involving specific primers and specific DNA templates with 82% accuracy, an improvement of about 2% over the LSTM, with more stable values. These results show that the model can be used to effectively predict the results of PCR. This is the first paper to optimize primer design by screening the candidate primer set with a neural network model.
