Theoretical and practical issues in developing a control system for robot technical vision, based on the specifics of the processed information and the principles of building an integral s...
Object detection tasks have made significant progress on class-balanced, high-quality datasets. However, training data often exhibit a long-tail distribution in real-world scenarios. Existing re-weighting methods have...
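Re-weighting methods for long-tailed data give each class a loss weight that grows as the class becomes rarer. The sketch below shows one widely used variant, class-balanced weighting by the inverse "effective number of samples" $E_n = (1 - \beta^n)/(1 - \beta)$. It is an illustration of the general idea only, not the method of the paper above; the function name `class_balanced_weights` and the default `beta` are assumptions.

```python
from typing import Dict


def class_balanced_weights(counts: Dict[str, int], beta: float = 0.999) -> Dict[str, float]:
    """Hypothetical helper: per-class loss weights proportional to 1 / E_n,
    where E_n = (1 - beta**n) / (1 - beta) is the effective number of samples.
    Weights are normalised so that they sum to the number of classes."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must be in [0, 1)")
    if any(n <= 0 for n in counts.values()):
        raise ValueError("every class needs at least one sample")
    # 1 / E_n, written as (1 - beta) / (1 - beta**n)
    raw = {c: (1.0 - beta) / (1.0 - beta ** n) for c, n in counts.items()}
    scale = len(raw) / sum(raw.values())
    return {c: w * scale for c, w in raw.items()}


if __name__ == "__main__":
    # A head class ("car") and a tail class ("scooter"); the tail class gets the larger weight.
    print(class_balanced_weights({"car": 1000, "scooter": 10}))
```

With `beta = 0` every class gets weight 1 (no re-weighting), and as `beta` approaches 1 the weights approach plain inverse class frequency, so `beta` interpolates between the two extremes.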
This paper introduces a novel denoising approach. Image denoising is one of the fundamental challenges in image processing and computer vision. The main aim of this project is to obtain a completely noiseless im...
Lip-reading is the task of recognizing speech from lip movements. It is difficult because some words produce very similar lip movements when pronounced. The term viseme is used to describe lip...
In in vitro fertilization (IVF), the blastocyst stage is one of the most important stages in determining the fate of the embryo. There is currently no way to diagnose the blastocyst. In this study, using ResNet and U-Net networks, the ...
We implemented a real-time ensemble model for face detection by combining the results of YOLOv1 to YOLOv4. We used the WIDER FACE benchmark to train YOLOv1 to YOLOv4 in the Darknet framework. Then, we ensembled their resu...
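The abstract does not say how the detector outputs are combined. One simple way to ensemble several object detectors is to pool all their boxes and run greedy non-maximum suppression (NMS) over the pooled set, keeping the highest-scoring box in each cluster of overlapping boxes. The sketch below assumes that scheme, an `(x1, y1, x2, y2, score)` box format, and an IoU threshold of 0.5; the function names are hypothetical and this is not necessarily the authors' method.

```python
from typing import List, Tuple

# Assumed box format: (x1, y1, x2, y2, score) in pixel coordinates.
Box = Tuple[float, float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ensemble_nms(per_model: List[List[Box]], iou_thr: float = 0.5) -> List[Box]:
    """Hypothetical ensemble: pool detections from all models, sort by score,
    and greedily keep a box only if it does not overlap an already-kept box
    by iou_thr or more."""
    pooled = sorted((b for boxes in per_model for b in boxes),
                    key=lambda b: b[4], reverse=True)
    kept: List[Box] = []
    for box in pooled:
        if all(iou(box, k) < iou_thr for k in kept):
            kept.append(box)
    return kept


if __name__ == "__main__":
    model_a = [(10, 10, 50, 50, 0.9)]
    model_b = [(12, 11, 51, 49, 0.8), (100, 100, 140, 140, 0.7)]
    # The two overlapping faces collapse to the 0.9 box; the separate face is kept.
    print(ensemble_nms([model_a, model_b]))
    # prints [(10, 10, 50, 50, 0.9), (100, 100, 140, 140, 0.7)]
```

Plain NMS discards the lower-scoring boxes entirely; averaging the coordinates of each overlapping cluster (weighted box fusion) is a common alternative when the models disagree slightly on box placement.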
Sign language detection has become an important and effective aid for people, and research in this area, one of the applications of computer vision, is ongoing. Earlier works included detection using static signs with ...
Underwater images usually have low contrast, blurring, and extreme color distortion because the light is refracted, scattered, and absorbed as it passes through the water. These features can lead to challenges in imag...
ISBN: (print) 9798400701245
Multimodal human understanding and analysis is an emerging research area that cuts across several disciplines, such as computer vision (CV), Natural Language Processing (NLP), speech processing, Human-Computer Interaction (HCI), and multimedia. Several multimodal learning techniques have recently shown the benefit of combining multiple modalities in image-text, audio-visual and video representation learning and in various downstream multimodal tasks. At their core, these methods model the modalities and their complex interactions using large amounts of data, different loss functions and deep neural network architectures. However, many Web and social media applications need to model humans, including their behaviour and perception. For this, it becomes important to consider interdisciplinary approaches, including the social sciences, semiotics and psychology. The core challenges are understanding various cross-modal relations, quantifying biases such as social bias, and assessing the applicability of models to real-world problems. Interdisciplinary theories such as semiotics or gestalt psychology can provide additional insight into perceptual understanding through signs and symbols across multiple modalities. In general, these theories provide a compelling view of multimodality and perception that can further expand computational research and multimedia applications on the Web and social media. The theme of the MUWS workshop, multimodal human understanding, includes various interdisciplinary challenges related to social bias analysis, multimodal representation learning, detection of human impressions or sentiment, hate speech, sarcasm in multimodal data, multimodal rhetoric and semantics, and related topics. The MUWS workshop will be an interactive event and will include keynotes by relevant experts, poster and demo sessions, research presentations and discussions.
Near-infrared (NIR) and visible-light (VIS) image matching is a critical issue in computer vision, aiming to align and correlate images from different spectral ranges. The structural deformation between NIR a...