ISBN: 9798400707032 (print)
Visible-Infrared Person Re-identification (VI-ReID) is a challenging cross-modality person retrieval task. Traditional approaches, hindered by significant inter-modality variations, have predominantly targeted the extraction of shared features in the network's final output layer, often overlooking valuable shallow-level information. To counter this, the paper introduces a novel framework for VI-ReID, the Deep-Shallow Spatial-Frequency Feature Fusion (DSSF3), which prioritizes the integration of rich, multi-level features. First, the Four-Stream Feature Extraction network (FSFE) is designed to bridge the gap between visible and infrared images, strengthening the network's fine-grained semantic feature extraction through data augmentation. In addition, the paper proposes the Spatial-Frequency Fusion Module (SFFM), which captures critical spatial and frequency domain details that are commonly neglected during training. Results on two public datasets demonstrate a significant improvement from the proposed method.
ISBN: 9781665448994 (print)
Brand logos are often rendered in a different style depending on context, such as an event promotion. For example, Warner Bros. uses different variants of its brand logo for different movies, for promotion and aesthetic appeal. In this paper, we propose an automated method to render brand logos in the coloring style of branding material such as movie posters. For this, we adopt a photo-realistic neural style transfer method using movie posters as the style source. We propose a color-based image segmentation and matching method to assign style segments to logo segments. Using these, we render the well-known Warner Bros. logo in the coloring style of 141 movie posters. We also present survey results in which 287 participants rate the machine-stylized logos for their representativeness and visual appeal.
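As a rough illustration of assigning style segments to logo segments by color, the sketch below pairs each logo segment with the style segment whose mean RGB color is nearest. The nearest-color rule, the Euclidean RGB distance, and all segment names and color values are assumptions made for illustration; the abstract does not specify the matching criterion.

```python
import math


def match_segments(logo_colors, style_colors):
    """Map each logo segment to the style segment with the nearest mean color.

    Both arguments are dicts of segment name -> (R, G, B) mean color.
    Hypothetical helper: the paper's actual matching rule is not given.
    """
    if not style_colors:
        raise ValueError("at least one style segment is required")
    matches = {}
    for logo_name, logo_rgb in logo_colors.items():
        # Pick the style segment that minimizes Euclidean distance in RGB.
        matches[logo_name] = min(
            style_colors,
            key=lambda style_name: math.dist(logo_rgb, style_colors[style_name]),
        )
    return matches


# Illustrative values only; these are not taken from the paper.
logo = {"shield": (200, 170, 40), "banner": (20, 30, 120)}
poster = {"sky": (30, 40, 140), "fire": (220, 150, 30)}
print(match_segments(logo, poster))
```

A perceptual color space such as CIELAB would likely give more faithful matches than raw RGB; RGB is used here only to keep the sketch dependency-free.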
The accurate recognition and comprehensive understanding of medical images depicting human tissue represent a central focus in computer vision research. Many tasks within medical imaging rely on deep neural networks, ...
ISBN: 9783031702587; 9783031702594 (print)
This paper proposes an advanced traffic light control system using IoT devices and computer vision, integrated through M2M interactions and modeled with AnyLogic PLE. The key contribution is the combination of IoT and computer vision for real-time, adaptive traffic light control. The study highlights the practical value of M2M technology, which facilitates seamless interaction between web camera-equipped traffic lights and personal computers and avoids the complexity of traditional wired methods such as Siemens microcontrollers. Using a socket library for communication between Windows and a Linux-based Raspberry Pi, the system implements interactive Wi-Fi information exchange for video monitoring and real-time road situation recognition. These data inputs control traffic lights via computer vision, enabling automated, adaptive traffic management. The prototype demonstrates real-time animated simulation managed by a dispatcher, enhancing the efficiency of traffic systems. The integration of M2M, IoT, and computer vision marks a significant advancement in intelligent transportation systems.
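The socket-based exchange described above could look roughly like the following sketch, which sends a length-prefixed JSON message over a stream socket. The message framing, the field names (`lane`, `vehicles`), and the helper functions are all hypothetical; the abstract states only that a socket library is used between Windows and the Raspberry Pi.

```python
import json
import socket
import struct


def send_message(sock, payload):
    """Send a dict as JSON, prefixed with a 4-byte big-endian length."""
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack("!I", len(body)) + body)


def _recv_exact(sock, size):
    """Read exactly `size` bytes, since recv() may return fewer."""
    data = bytearray()
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection early")
        data.extend(chunk)
    return bytes(data)


def recv_message(sock):
    """Receive one length-prefixed JSON message and decode it to a dict."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))


# Local demonstration with a connected socket pair standing in for the
# camera node and the controller PC (assumed roles, for illustration).
camera_side, controller_side = socket.socketpair()
send_message(camera_side, {"lane": "north", "vehicles": 7})
print(recv_message(controller_side))
camera_side.close()
controller_side.close()
```

In a real deployment the two ends would be separate TCP sockets connected over Wi-Fi; the length prefix keeps message boundaries intact because TCP itself delivers an unframed byte stream.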
Intelligent monitoring technology has become a new research direction in the field of computer vision in recent years. The computer vision system is the video data received from the camera, analyzed and learned throug...
ISBN: 9781665448994 (print)
Deep-learning-based generative models have proven capable of achieving excellent results in numerous image processing tasks with a wide range of applications. One significant improvement of deep-learning approaches over traditional approaches is their ability to regenerate semantically coherent images by relying only on an input with limited information. This advantage becomes even more crucial when the input size is only a very minor proportion of the output size. Such image expansion tasks can be more challenging, as the missing area may originally contain many semantic features that are critical in judging the quality of an image. In this paper, we propose an edge-guided generative network model for producing semantically consistent output from a small image input. Our experiments show that the proposed network is able to regenerate high-quality images even when some structural features are missing from the input.
The idea of the proposed methodology is to develop an efficient system that can recognize handwritten Kannada characters, a language whose script is quite complex. It makes use of feature extraction by using CNN in co...
ISBN: 9798350365474 (digital)
ISBN: 9798350365481 (print)
This paper introduces a new approach for food image segmentation utilizing the Segment Anything Model (SAM), with additional refinement achieved through fine-tuning with Low-Rank Adaptation (LoRA) layers. The segmentation task involves generating a binary mask for food in RGB images, with pixels categorized as background or food. We conduct various experiments to assess and compare the performance of our proposed method with previous approaches. Our findings indicate that our method consistently outperforms other techniques, achieving an accuracy of 94.14%. The improved accuracy of our approach highlights its potential for various applications in food image analysis, contributing to the advancement of computer vision techniques in the realm of food recognition and segmentation.
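For a binary food/background mask, one common way to score a prediction is pixel accuracy, the fraction of pixels labeled correctly. The sketch below assumes that metric; the abstract does not say how the reported accuracy is computed, and the function name and the tiny masks are illustrative only.

```python
def pixel_accuracy(predicted, target):
    """Fraction of pixels where two binary masks agree.

    Masks are lists of rows, with 1 for food and 0 for background.
    Assumed metric: the abstract does not define "accuracy".
    """
    if len(predicted) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(predicted, target)
    ):
        raise ValueError("masks must have the same shape")
    total = sum(len(row) for row in target)
    if total == 0:
        raise ValueError("masks must contain at least one pixel")
    correct = sum(
        p == t
        for p_row, t_row in zip(predicted, target)
        for p, t in zip(p_row, t_row)
    )
    return correct / total


# Illustrative 2x2 masks: three of the four pixels agree.
predicted_mask = [[0, 1], [1, 1]]
ground_truth = [[0, 1], [0, 1]]
print(pixel_accuracy(predicted_mask, ground_truth))
```

Pixel accuracy can look high when background dominates the image, which is why segmentation work often reports intersection-over-union alongside it.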
ISBN: 9781665448994 (print)
Convolutional neural networks are able to learn realistic image priors from numerous training samples in low-level image generation and restoration [66]. We show that, for high-level image recognition tasks, we can further reconstruct "realistic" images of each category by leveraging intrinsic Batch Normalization (BN) statistics without any training data. Inspired by the popular VAE/GAN methods, we regard the zero-shot optimization process of synthetic images as generative modeling to match the distribution of BN statistics. The generated images serve as a calibration set for the subsequent zero-shot network quantization. Our method meets the need to quantize models based on sensitive information, e.g., when no data is available due to privacy concerns. Extensive experiments on benchmark datasets show that, with the help of generated data, our approach consistently outperforms existing data-free quantization methods.
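The idea of matching BN statistics can be sketched as a loss that compares the per-channel mean and variance of a synthetic batch against the statistics stored in a BN layer. The squared-difference form below is an assumption chosen for simplicity, not the paper's stated objective, and the function name and input layout are hypothetical.

```python
from statistics import fmean, pvariance


def bn_statistics_loss(activations, running_mean, running_var):
    """Sum of squared gaps between batch statistics and stored BN statistics.

    activations: list of samples, each a list with one value per channel.
    running_mean, running_var: per-channel statistics stored in the BN layer.
    Assumed loss form: the abstract does not give the exact objective.
    """
    loss = 0.0
    for channel, (mean_ref, var_ref) in enumerate(zip(running_mean, running_var)):
        values = [sample[channel] for sample in activations]
        mean = fmean(values)
        var = pvariance(values, mean)  # population (biased) variance
        loss += (mean - mean_ref) ** 2 + (var - var_ref) ** 2
    return loss


# Illustrative batch of two samples with two channels each.
batch = [[1.0, 0.0], [3.0, 0.0]]
print(bn_statistics_loss(batch, running_mean=[2.0, 0.0], running_var=[1.0, 0.0]))
print(bn_statistics_loss(batch, running_mean=[0.0, 0.0], running_var=[1.0, 0.0]))
```

In a data-free setting, synthetic images would be updated by gradient descent to reduce a loss of this kind across the network's BN layers, which requires an automatic differentiation framework rather than plain Python.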
ISBN: 9798350360806; 9798350360790 (print)
Emotion recognition and detection technology is currently a popular research topic. Automatically recognizing facial emotions from a face image is a challenging task in computer vision with a wide range of potential applications, including driving safety, human-computer interaction, healthcare, psychology, video conferencing, cognitive research, and others. In this study, a deep-learning-based method is recommended for assessing a person's expressions from their face. While these emotions are simple for humans to comprehend, they are extremely complicated and hard for robots to comprehend. Facial expression recognition (FER) can be used in lie detection, where it predicts the result by analyzing human micro-facial emotions. Using deep learning and CNN methods, it can detect emotions automatically from human facial expressions in a real-time environment. Deep learning can be used to develop robust and reliable systems. FER is capable of recognizing and detecting facial expressions automatically. It involves training convolutional neural networks to analyze facial expressions and classify facial emotions. The choice of dataset depends on the specific application and the desired model accuracy. Some of the most popular and well-regarded datasets for facial expression recognition include AffectNet, FER-2013, JAFFE, CK+, and DISFA. However, because they do not account for the difficulties presented by variations in head position, their usefulness is limited, and the accuracy still falls short of expectations. The work focuses on methods and datasets that can be used to predict all kinds of emotions, such as joy, sorrow, rage, contempt, indifference, and surprise. This review also provides brief information about the working methods of FER and evaluates future challenges. The review can help improve the user experience in applications such as educational software, virtual assistants, and entertainment. It provides highly accurate