ISBN: 9781665487399 (digital); 9781665487399 (print)
Understanding the mechanisms underlying human visual attention is an important research problem in cognitive neuroscience and computer vision. While existing models predict salient regions (i.e., saliency maps) and temporal sequences of eye fixations (i.e., scanpaths) in images, their designs often follow theoretical frameworks only partially. Here, we introduce ScanpathNet, a deep learning model inspired by the latest theoretical model in neuroscience. It is 'guided' by a dynamic priority map influenced by semantic content and fixation history. The model uses convolutional neural networks to extract rich semantic features, convolutional long short-term memory networks to model the inhibition-of-return mechanism and the sequential dependencies of fixations, and mixture density networks to predict probability distributions of fixations for each pixel. Simulated human scanpaths can then be generated by sequentially sampling the output of the proposed model. Despite its simplicity, ScanpathNet showed promising qualitative and quantitative scanpath prediction performance in extensive experiments on numerous eye-tracking benchmark datasets.
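The sequential sampling step described in this abstract can be illustrated with a minimal sketch. It is not the ScanpathNet implementation: it assumes a fixed, isotropic 2-D Gaussian mixture in place of the network's per-step mixture density output, and it imitates inhibition of return by hand-damping mixture weights near earlier fixations rather than with a convolutional LSTM. All function names and the `radius` parameter are hypothetical.

```python
import math
import random


def sample_fixation(weights, means, sigmas, rng):
    """Draw one (x, y) fixation from an isotropic 2-D Gaussian mixture."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    mx, my = means[k]
    return rng.gauss(mx, sigmas[k]), rng.gauss(my, sigmas[k])


def inhibit(weights, means, fixation, radius):
    """Down-weight components close to a fixation (crude inhibition of return)."""
    fx, fy = fixation
    damped = []
    for w, (mx, my) in zip(weights, means):
        d2 = (mx - fx) ** 2 + (my - fy) ** 2
        # The factor is 0 at the fixated location and approaches 1 far from it.
        damped.append(w * (1.0 - math.exp(-d2 / (2.0 * radius ** 2))))
    total = sum(damped)
    # Fall back to the incoming weights if every component was suppressed.
    return [w / total for w in damped] if total > 0 else list(weights)


def sample_scanpath(weights, means, sigmas, length, radius=0.1, seed=0):
    """Sample fixations one at a time, updating the mixture after each one."""
    rng = random.Random(seed)
    weights = list(weights)
    path = []
    for _ in range(length):
        fixation = sample_fixation(weights, means, sigmas, rng)
        path.append(fixation)
        weights = inhibit(weights, means, fixation, radius)
    return path
```

Calling `sample_scanpath([0.5, 0.5], [(0.2, 0.2), (0.8, 0.8)], [0.05, 0.05], length=4)` returns four (x, y) pairs in normalized image coordinates. In a learned model the mixture parameters themselves would be re-predicted at every step from the image and the fixation history.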
ISBN: 9798350301298 (print)
The relatively high temperature of the human body makes people long-wave infrared light sources. Since this emitted light has a longer wavelength than visible light, many surfaces in typical scenes act as infrared mirrors with strong specular reflections. We exploit the thermal reflections of a person on objects to locate the person and reconstruct their pose, even if they are not visible to a normal camera. We propose an analysis-by-synthesis framework that jointly models the objects, the people, and their thermal reflections, combining generative models with differentiable rendering of reflections. Quantitative and qualitative experiments show that our approach works in highly challenging cases, such as with curved mirrors or when the person is completely unseen by a normal camera.
ISBN: 9781665487399 (digital); 9781665487399 (print)
Interaction recognition from multi-person videos is a challenging yet essential task in computer vision. Such videos often depict multiple actors, only some of whom participate in the main event, while the rest are present in the scene without being part of it. This paper proposes a model to tackle the problem of interaction recognition from multi-person videos. Our model consists of a Recurrent Neural Network (RNN) equipped with a time-varying attention mechanism. It receives scene features and localized actor features to predict the interaction class. Additionally, the attention model identifies the people responsible for the main event. We chose penalty classification from ice hockey broadcast videos as our application. These videos are multi-person and depict complex interactions between players in a non-laboratory recording setup. We evaluate our model on a new dataset of ice hockey penalty videos and report 93.93% classification accuracy. We include a qualitative analysis of the attention mechanism by visualizing the attention weights. Our code is publicly available (1).
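As a rough illustration of attention over actors, the sketch below computes softmax weights over per-actor feature vectors and the resulting weighted context vector for a single frame. The abstract does not specify the scoring function, so dot-product scoring against a query vector (standing in for the RNN hidden state at that time step) is an assumption, and the names `softmax` and `attend` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(actor_features, query):
    """Return (weights, context) for one frame.

    actor_features: one feature vector per detected actor.
    query: vector of the same length, assumed to come from the RNN state.
    """
    # Assumed scoring function: dot product between each actor and the query.
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in actor_features]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the actor features.
    context = [
        sum(w * feat[i] for w, feat in zip(weights, actor_features))
        for i in range(len(query))
    ]
    return weights, context
```

Calling `attend` once per frame with that frame's query makes the weights vary over time, and inspecting the largest weight in each frame is one simple way to visualize which actor the model treats as responsible for the event.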
Empirical robustness evaluation (RE) of deep learning models against adversarial perturbations involves solving non-trivial constrained optimization problems. Recent work has shown that these RE problems can be reliab...
ISBN: 9798350370287; 9798350370713 (print)
Attribute-based person retrieval enables individuals to be searched for and retrieved using their soft biometric features, such as gender, accessories, and clothing colors. The process has numerous practical use cases, such as surveillance, retail, or smart cities. Notably, attribute-based person retrieval allows law enforcement agencies to efficiently comb through vast volumes of surveillance footage from extensive multi-camera networks, facilitating the swift localization of missing persons or criminals. However, for real-world application, attribute-based person retrieval must generalize to multiple settings in indoor and outdoor scenarios, each with its own challenges. In its second edition, the WACV 2024 Pedestrian Attribute Recognition and Attribute-based Person Retrieval Challenge (UPAR-Challenge) aimed once again to spotlight the current challenges and limitations of existing methods in bridging the domain gaps in real-world surveillance contexts. As in the first edition, two tracks are offered: pedestrian attribute recognition and attribute-based person retrieval. The UPAR-Challenge 2024 dataset extends the UPAR dataset with harmonized annotations for the MEVID dataset, which is used as a novel test domain. To this end, 1.1M additional annotations were manually labeled and validated. Each track evaluates the robustness of the competing methods to domain shifts by training and evaluating on data from entirely different domains. The challenge attracted 82 registered participants, which the organizers considered a success. While ten competing teams surpassed the baseline for track 1, no team managed to outperform the baseline on track 2, emphasizing the task's difficulty. This work describes the challenge design, the adopted dataset, the obtained results, and future directions on the topic. The UPAR-Challenge dataset is available on GitHub: https://***/speckean/upar_challenge.
ISBN: 9781665487399 (digital); 9781665487399 (print)
A new non-central model suitable for calibrating fisheye cameras is proposed. It is a direct extension of the popular central model developed by Scaramuzza et al., which is used by the MATLAB Computer Vision Toolbox fisheye calibration tool. It allows existing applications that use this central model to be adapted to a non-central projection that is more accurate, especially when the objects captured in the images are close to the camera. It also makes it easy to switch, as needed, between the more accurate non-central characterization of the fisheye camera and the more convenient central approximation. It is shown that the algorithms proposed by Scaramuzza et al. for their central model can be modified to accommodate the angle-dependent axial viewpoint shift. This means, among other things, that a similar process can be used for calibration that includes the viewpoint shift characterization, and that a user-friendly calibration tool can be produced with this new non-central model that does not require the user to provide detailed lens design specifications or an educated guess for the initial parameter values. Several other improvements to Scaramuzza's central model are also introduced, improving the performance of both the central model and its non-central extension.
Vision Transformer models process input images by dividing them into a spatially regular grid of equal-size patches. Conversely, Transformers were originally introduced over natural language sequences, where each toke...
ISBN: 9781665487399 (digital); 9781665487399 (print)
Older people may be prone to accidents in bathrooms and toilets. The detection of strain motion for smart toilet applications has not been studied sufficiently. In this paper, we propose a method for strain detection from a force sensor placed on a toilet seat for a smart toilet healthcare application. The method first extracts breath and motion features that are assumed to be key components for strain detection. It then trains a discriminator model based on a random forest classifier using these features. Finally, it recognizes actions in the toilet room. Five actions are detected: three normal actions performed while sitting on a toilet seat (sitting, taking toilet paper, and wiping the bottom) and two strain actions (strong and weak). In an experiment with 19 subjects, our method recognized the actions with an accuracy of 80.2%, compared with 61.6% for a conventional microwave sensor-based recognition method (significance test: T = 12.7, P < 0.01). Our strain detection method has the potential to be used in a smart toilet system to prevent blood pressure elevation and collapse caused by strain.
ISBN: 9798350349467; 9798350349450 (print)
In recent years, pattern recognition has advanced, allowing computers to detect and classify objects from various sources. Recent advances in image and pattern recognition research can enhance the user experience in various applications. Our research aims to address a common issue for shoppers: locating apparel with a specific style or pattern. Our system uses computer vision and pattern recognition to recognise designs on clothing based on user sketches and finds the best-matching images or products online. To train our model, we will collect doodle sketches from end users by presenting them with a clothing product image. The model will extract features based on shape, colour, and design. It will then generate a search query, which will be submitted to a search engine to find the product that most closely resembles the doodle. Our methodology enables shoppers to find desired clothing without excessive browsing, improving human-computer interaction and simplifying the purchasing experience.
ISBN: 9781665487399 (digital); 9781665487399 (print)
Sketch-based understanding is a critical component of human cognitive learning and a primitive means of communication between humans. This topic has recently attracted the interest of the computer vision community, as sketching is a powerful tool for expressing static objects and dynamic scenes. Unfortunately, despite their broad application domains, current sketch-based models rely strongly on labels for supervised training and ignore knowledge from unlabeled data, which limits their generalization and applicability. Therefore, we present a study on the use of unlabeled data to improve a sketch-based model. To this end, we evaluate variations of VAE and semi-supervised VAE, and present an extension of BYOL to deal with sketches. Our results show the superiority of sketch-BYOL, which outperforms other self-supervised approaches, increasing retrieval performance for known and unknown categories. Furthermore, we show how other tasks can benefit from our proposal.
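For readers unfamiliar with BYOL, the sketch below shows two ingredients of the standard method: the regression loss between the online network's prediction and the target network's projection (equal to 2 minus twice their cosine similarity) and the exponential-moving-average update of the target parameters. It does not reproduce the sketch-specific extension described in the abstract; the function names, the flat parameter lists, and the default `tau` are assumptions for illustration.

```python
import math


def byol_loss(prediction, target):
    """BYOL regression loss: 2 - 2 * cosine_similarity(prediction, target)."""
    dot = sum(p * t for p, t in zip(prediction, target))
    norm_p = math.sqrt(sum(p * p for p in prediction))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 2.0 - 2.0 * dot / (norm_p * norm_t)


def ema_update(target_params, online_params, tau=0.99):
    """Move target parameters slowly toward the online parameters."""
    return [tau * t + (1.0 - tau) * o for t, o in zip(target_params, online_params)]
```

The loss is 0 when the two vectors point in the same direction and 4 when they point in opposite directions. Only the online network would be trained by gradient descent on this loss; the target network changes solely through `ema_update`.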