Face recognition is a fundamental problem in numerous real-world applications, and it can be tackled using classification deep learning models. This kind of solution requires extensive datasets to represent the target...
详细信息
ISBN:
(纸本)9798350391893;9798350391886
Face recognition is a fundamental problem in numerous real-world applications, and it can be tackled using classification deep learning models. This kind of solution requires extensive datasets to represent the target subjects, where maintaining their privacy is essential. Further, there are challenging scenarios where the face of the subject can be partially occluded, e.g. a subject wearing a mask, making it difficult the recognition process. In this paper, extensive experiments are presented for deep learning-based face recognition, considering different dataset settings. More precisely, cropping operations are proposed to partially represent the face focusing on the eyes and nose region for model training, minimizing privacy issues, storage space, and allowing the trained model to process partially occluded faces. Two publicly available datasets named VGG-Face and DigiFace-1M are adapted for evaluation, and five convolutional neural network models are used for comparison, including ResNet, VGG-16, AlexNet, DenseNet-169, and EfficientNetV2-M. The results suggest that, although this cropping operation impacts the accuracy of the model when compared to the processing of full faces, it can be viable solution to minimize the pointed issues while preserving acceptable performance.
IR image recognition has been a promising field for the past few years. However, it is difficult to identify facial emotions when it is dark, the lighting is poor, or there are other elements present. Thermal pictures...
详细信息
Advances in machinelearning and neuralnetworks have transformed natural language processing (NLP) and computer vision (CV) applications. Recent research efforts have begun to bridge the gap between the two domains. ...
详细信息
ISBN:
(纸本)9798350363029;9798350363012
Advances in machinelearning and neuralnetworks have transformed natural language processing (NLP) and computer vision (CV) applications. Recent research efforts have begun to bridge the gap between the two domains. In this work, we propose a semi supervised Multi-Modal Encoder Decoder Network (MMEDN) to capture the relationship between images and textual descriptions, allowing us to generate meaningful descriptions of images and retrieve images from a database using cross-modality search. The semi-supervised training approach, which combines ground truth text descriptions and pseudotext generated by the text decoder within the model, requires far fewer image-text pairs in the training data and can directly add new raw images without manual text labelling for training. This approach is particularly useful for active learning environments, where labels are expensive and hard to obtain. We show that our model performs well with qualitative evaluations. We applied our model for finding images of a person from large databases and generating descriptions of people involved in an event for adding to an automatically generated report. The model was able to retrieve relevant images and generate accurate descriptions, demonstrating its applicability to more practical use cases.
Melanoma represents one of the most lethal forms of skin cancer, underscoring the importance of early detection for effective treatment and improved survival rates. Traditional diagnostic methods, which predominantly ...
详细信息
Breast cancer detection presents considerable challenges in terms of diagnostic accuracy and efficiency, particularly when relying on traditional manual examination techniques. As medical imaging data grows in scale a...
详细信息
This research explores a human multi-modal behavior identification algorithm using deep neuralnetworks. The algorithm leverages various deep neuralnetworks tailored to different types of modal information to process...
详细信息
This research presents a deep learning framework designed to automatically detect and classify liver tumors in CT images, leveraging Convolutional neuralnetworks (CNNs). The suggested method includes a sequence of pr...
详细信息
This research delves into the prospects of using deep learning and data mining to design an English teaching and quality system model. The paper first conducts a research and analysis on the relevant literature that e...
详细信息
The rise of social media and global communication requires multilingual image captioning for diverse participants, enabling cross-cultural understanding and engagement. By automatically generating captions in multiple...
详细信息
Foggy or hazy images result from light scattering and absorption by atmospheric particles. Intensity transformation techniques offer solutions to solve this problem, but param-eters selection significantly impacts the...
详细信息
暂无评论