ISBN (digital): 9783031581816
ISBN (print): 9783031581809; 9783031581816
In this paper, we propose a gait recognition method using a convolutional neural network (CNN). A CNN architecture is designed and trained to learn an efficient representation with which walking patterns, i.e., gait, can be disentangled from the visual appearance of the subjects caused by covariate factors such as variation in view angles, clothing and carrying conditions. Since dynamic areas contain the most informative part of the human gait and are insensitive to changes in various covariate conditions, we feed gait entropy images as input to the CNN model so that it captures mostly motion information. The gait features learned by the CNN are then fed into a K-NN classifier to identify individuals based on their unique gait patterns. Experiments are carried out for cross-view and cross-walking gait recognition using the CASIA-B dataset. Our experimental results demonstrate the effectiveness of the proposed method.
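The gait entropy image mentioned above is commonly computed as the per-pixel Shannon entropy of a binary silhouette sequence, so static body regions score near zero and moving regions score high. The following is a minimal sketch of that computation together with a plain k-NN vote. It assumes aligned binary silhouettes as input; the CNN feature extractor is omitted, and the function names `gait_entropy_image` and `knn_predict` are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def gait_entropy_image(silhouettes):
    """Per-pixel Shannon entropy (in bits) over a sequence of binary silhouettes.

    silhouettes: list of equally sized 2D lists holding 0 (background) or 1 (body).
    """
    n = len(silhouettes)
    rows, cols = len(silhouettes[0]), len(silhouettes[0][0])
    geni = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Fraction of frames in which this pixel belongs to the body.
            p = sum(frame[r][c] for frame in silhouettes) / n
            # Pixels that never change (p == 0 or p == 1) carry zero entropy.
            if 0.0 < p < 1.0:
                geni[r][c] = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return geni


def knn_predict(train_feats, train_labels, query, k=1):
    """Majority vote among the k training vectors closest (Euclidean) to query."""
    order = sorted(
        range(len(train_feats)),
        key=lambda i: math.dist(train_feats[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

In a full pipeline the vectors passed to `knn_predict` would be features extracted by the trained CNN from the entropy images rather than raw pixels.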
ISBN (print): 9783031581809; 9783031581816
This work proposes a novel approach to talking face generation using driving audio. The driving audio and a single image of the target person are provided as input to the proposed model. The model generates a realistic video of the target person uttering the driving audio. Recent works in this domain have focused on only one of expression, lip-sync, or identity preservation. This model provides supervision over photorealism, expression fulfilment, identity preservation and audio-visual synchrony, which are crucial factors in synthesizing a realistic video. The proposed system is end-to-end trainable and the learning is performed with six losses. This method can generate photorealistic, expressive and audio-synced talking faces while preserving the identity of the target person. This work proposes a discriminator network to impose audio-visual synchrony in the generated video. The proposed model is trained on the RAVDESS dataset containing 24 professional actors (12 female and 12 male), uttering two statements in a neutral North American accent with disgust, sad, angry, happy, surprise, fearful and calm emotions. This work is benchmarked on the VID-TIMIT dataset against three baseline models.
ISBN (print): 9783031581809; 9783031581816
Deep learning-based methods are extensively used in image captioning, but most of these methods depend on features from a single encoder for generating captions. Different encoders capture different features of an image, and thus, using features from multiple encoders may help improve the models' performance. Moreover, Hindi caption generation on large datasets such as MSCOCO remains under-researched. Recently, transformers have performed well on tasks such as image classification and object detection. One such transformer is the Swin Transformer. It captures both local and global information present in the image. A Faster RCNN, on the other hand, captures only local (object-level) information but does not capture global details. Using a single image feature generation method might sometimes result in incorrect feature generation, or some important objects may be missed while generating the feature vector. This problem can be mitigated by combining features from different methods. Furthermore, as local features-based models have produced better results in different domains, utilizing both Swin Transformer and Faster RCNN may result in better captioning models. This work proposes to use Swin Transformer-based image features along with Faster RCNN-based image features to generate Hindi captions for images. A decoder with two GRUs and Multi-head Attention uses these image features to build Hindi captions. Experiments demonstrate that the proposed method can generate high-quality captions while improving scores on automatic evaluation metrics, establishing the method's efficacy.
ISBN (print): 9798350351088; 9798350351095
Image-based road detection is a significant problem in the applications of autonomous vehicles and mobile robots. The bend direction and angle are the main concerns on inclined roads. To move safely on the road, it is very important to determine the road lines and the direction of the bend. For this reason, a Vanishing Point (VP) detection algorithm is proposed to estimate the road boundaries for an Electric Vehicle (EV). However, it is challenging to effectively estimate the VP from a video image of inclined roads. Therefore, a reliable VP estimation approach is suggested that makes use of the junction points of the line segments taken from an image and a probabilistic voting process. In addition, the OpenCV library, which provides widely used computer vision algorithms, and the Python programming language are used in the study. As a result, a Driver Assistance System (DAS), which is an important step in autonomous driving, is developed, and safe driving is supported by this work.
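The idea of locating a vanishing point from the junction points of line segments can be illustrated with a simplified sketch. It intersects every pair of lines and lets each intersection cast one vote in a coarse grid, then averages the points in the winning cell. This hard-count voting is an assumption standing in for the paper's probabilistic voting, whose details are not given here, and the segments are assumed to have been extracted already (for example with a line detector from OpenCV). All function names and the `cell` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def line_through(p, q):
    """Coefficients (a, b, c) of the line a*x + b*y + c = 0 through points p and q."""
    a = p[1] - q[1]
    b = q[0] - p[0]
    c = p[0] * q[1] - q[0] * p[1]
    return a, b, c


def intersect(l1, l2, eps=1e-9):
    """Intersection point of two lines, or None if they are (nearly) parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < eps:
        return None
    return (b1 * c2 - b2 * c1) / det, (a2 * c1 - a1 * c2) / det


def estimate_vanishing_point(segments, cell=10.0):
    """Vote pairwise line intersections into grid cells; return the mean of the top cell.

    segments: list of ((x1, y1), (x2, y2)) line segments in pixel coordinates.
    cell: side length of a voting cell in pixels (assumed tuning parameter).
    """
    lines = [line_through(p, q) for p, q in segments]
    votes = defaultdict(list)
    for l1, l2 in combinations(lines, 2):
        pt = intersect(l1, l2)
        if pt is not None:
            votes[(int(pt[0] // cell), int(pt[1] // cell))].append(pt)
    if not votes:
        return None
    best = max(votes.values(), key=len)
    n = len(best)
    return sum(x for x, _ in best) / n, sum(y for _, y in best) / n
```

A probabilistic variant would weight each vote, for instance by segment length or angular consistency, instead of counting every intersection equally.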
ISBN (print): 9789819752119; 9789819752126
Rice is a popular staple food in India, and its demand has recently increased. Thanjavur, located in the Cauvery Delta region, is known as the rice granary of South India. Due to recent technological advancements, digital farming and globalization have significantly impacted the agricultural industry. It is crucial to differentiate between types of rice grains to prevent fraudulent labeling during import and export. To achieve this, a dataset, namely the "TaPaSe Dataset", comprising five varieties of rice, including MTU 1010, MTU 1290, Narmadha, Pacha Ponni, and Sonna Masur, which are mainly cultivated in Thanjavur, has been collected. We designed an image acquisition system to capture the aforementioned varieties in real time. The captured paddy rice images are highly challenging in the sense that all the images are captured under illumination and scale variations. We evaluated existing deep learning models to understand their ability to classify paddy seed varieties. The existing pre-trained models attain remarkable recognition rates on the proposed paddy seed varieties dataset.
ISBN (print): 9798400710759
Air quality estimation through sensor-based methods is widely used. Nevertheless, their frequent failures and maintenance challenges constrain the scalability of air pollution monitoring efforts. Recently, it has been demonstrated that air quality estimation can be done using image-based methods. These methods offer several advantages including ease of use, scalability, and low cost. However, the accuracy of these methods hinges significantly on the diversity and magnitude of the dataset utilized. The advancement of air quality estimation through image analysis has been limited due to the lack of available datasets. Addressing this gap, we present TRAQID (Traffic-Related Air Quality Image Dataset), a novel dataset capturing 26,678 front and rear images of traffic alongside co-located weather parameters, multiple levels of Particulate Matter (PM) and Air Quality Index (AQI) values. Spanning multiple seasons, with over 70 hours of data collection in the twin cities of Hyderabad and Secunderabad, India, TRAQID offers diverse day and night imagery amid unstructured traffic conditions, encompassing six AQI categories ranging from "Good" to "Severe". State-of-the-art air quality estimation techniques, which were trained on a smaller and less diverse dataset, showed poor results on the dataset presented in this paper. TRAQID models various uncertainty types, including seasonal changes, unstructured traffic patterns, and lighting conditions. The information from the two views (front and rear) of the traffic can be combined to improve the estimation performance in such challenging conditions. As such, TRAQID serves as a benchmark for image-based air quality estimation tasks and AQI prediction, given its diversity and magnitude.
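To make the six AQI categories concrete, here is a small sketch that maps a numeric AQI value to a category label. The band boundaries and the four intermediate names are an assumption based on India's National Air Quality Index; the abstract itself only names the endpoints "Good" and "Severe", so the dataset's exact labeling may differ. The function name `aqi_category` is hypothetical.

```python
import bisect

# Assumed bands from India's National Air Quality Index (not stated in the abstract):
# 0-50 Good, 51-100 Satisfactory, 101-200 Moderate,
# 201-300 Poor, 301-400 Very Poor, above 400 Severe.
_UPPER_BOUNDS = [50, 100, 200, 300, 400]
_CATEGORIES = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]


def aqi_category(aqi):
    """Return the category name for a non-negative AQI value."""
    if aqi < 0:
        raise ValueError("AQI must be non-negative")
    # bisect_left keeps a value equal to an upper bound inside the lower band.
    return _CATEGORIES[bisect.bisect_left(_UPPER_BOUNDS, aqi)]
```

Framing AQI prediction as classification over these bands is one option; regressing the numeric value and then binning it with a function like this is another.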
This paper presents an integrated model that uses machine learning techniques to perform text-to-text, image-to-text, and audio-to-text conversions, with a particular focus on Indian languages. The proposed model whic...
ISBN (print): 9798400710759
Single-image 3D reconstruction is a research challenge focused on predicting 3D object shapes from single-view images, requiring all training data for all objects to be available from the start. In dynamic environments, it is impractical to gather data for all objects at once; data becomes available in phases with restrictions on past data access. Therefore, the model must reconstruct new objects while retaining the ability to reconstruct previous objects without accessing prior data. Additionally, existing 3D reconstruction methods in continual learning fail to reproduce previous shapes accurately, as they are not designed to manage changing shape information in dynamic scenes. To this end, we propose a continual learning-based 3D reconstruction method. Our goal is to design a model that can accurately reconstruct previously seen classes even after training on new ones, ensuring faithful reconstruction of both current and previous objects. To achieve this, we propose using variational distributions from the latent space, which represent abstract shapes and effectively retain shape information within a simplified code structure that requires minimal memory. Additionally, saliency maps preserve object attributes, capturing both local minor shape details and the overall shape structure. We employ experience replay to leverage these saliency maps effectively. Both methods ensure that the shape is faithfully reconstructed, preserving all minor details from the previous dataset. This is vital due to resource constraints in storing extensive training data. Thorough experiments show competitive results compared to established methods, both quantitatively and qualitatively.
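Experience replay in continual learning generally means keeping a small, bounded memory of past items and mixing them into training on new data. The sketch below shows one common way to maintain such a memory, reservoir sampling, which keeps every item seen so far with equal probability. It is an illustration under assumptions: the abstract does not say how its replay memory is populated, and the class name `ReplayBuffer` and the idea of storing saliency maps as opaque items are hypothetical.

```python
import random


class ReplayBuffer:
    """Fixed-capacity memory filled by reservoir sampling (Algorithm R)."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of items offered so far
        self._rng = random.Random(seed)

    def add(self, item):
        """Offer one item (e.g. a stored saliency map or latent code)."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Replace a stored item with probability capacity / seen.
            j = self._rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        """Draw up to k distinct stored items for replay alongside new data."""
        return self._rng.sample(self.items, min(k, len(self.items)))
```

During training on a new phase, a batch from `sample` would be combined with the current batch so that the loss also covers previously seen shapes.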
ISBN (print): 9798350383638; 9798350383645
The field of artificial intelligence (AI) holds a variety of algorithms designed with the goal of achieving high accuracy at low computational cost and latency. One popular algorithm is the vision transformer (ViT), which excels at various computer vision tasks owing to its ability to capture long-range dependencies effectively. This paper analyzes a computing paradigm, namely, spatial transformer networks (STN), in terms of accuracy and hardware complexity for image classification tasks. The paper shows that for 2D applications, such as image recognition and classification, STN is an effective backbone for AI algorithms owing to its efficiency and fast inference time. This framework offers a promising solution for efficient and accurate AI for resource-constrained Internet of Things (IoT) and edge devices. The comparative analysis of STN implementations on the central processing unit (CPU), Raspberry Pi (RPi), and Resistive Random Access Memory (RRAM) architectures reveals nuanced performance variations, providing valuable insights into their respective computational efficiency and energy utilization.
To build a smarter and safer city, a secure, efficient, and sustainable transportation system is a key requirement. The autonomous driving system (ADS) plays an important role in the development of smart transportatio...