The recent state of the art in monocular 3D face reconstruction from image data has made impressive advancements thanks to the advent of Deep Learning. However, it has mostly focused on input coming from a single RGB image, overlooking the following important factors: a) Nowadays, the vast majority of facial image data of interest do not originate from single images but rather from videos, which contain rich dynamic information. b) Furthermore, these videos typically capture individuals in some form of verbal communication (public talks, teleconferences, audiovisual human-computer interactions, interviews, monologues/dialogues in movies, etc.). When existing 3D face reconstruction methods are applied to such videos, the artifacts in the reconstruction of the shape and motion of the mouth area are often severe, since they do not match the speech well. To overcome the aforementioned limitations, we present the first method for visual speech-informed perceptual reconstruction of 3D mouth expressions. We do this by proposing a "lipreading" loss, which guides the fitting process so that the perception elicited by the 3D reconstructed talking head resembles that of the original video footage. We demonstrate that, interestingly, the lipreading loss is better suited for the 3D reconstruction of mouth movements than traditional landmark losses, and even direct 3D supervision. Furthermore, the devised method does not rely on any text transcriptions or corresponding audio, rendering it ideal for training on unlabeled datasets. We verify the efficacy of our method through objective evaluations on three large-scale datasets, as well as subjective evaluation with two web-based user studies. Project webpage: https://***/spectre/
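The "lipreading" loss described above can be illustrated with a minimal sketch: compare the features a lipreading network extracts from the rendered 3D talking head against those from the original footage, and penalize the distance. The feature extractor below is a toy stand-in (the abstract's actual loss uses a pretrained visual speech model); all names and the per-frame statistics are illustrative assumptions, not the paper's code.

```python
# Hedged sketch of a perceptual "lipreading" loss: distance between features
# extracted from rendered mouth frames and from the original video frames.

def lipreader_features(frames):
    """Toy stand-in for a pretrained lipreading network: maps each mouth
    crop (a flat list of pixel intensities) to a small feature vector."""
    feats = []
    for frame in frames:
        mean = sum(frame) / len(frame)
        var = sum((p - mean) ** 2 for p in frame) / len(frame)
        feats.append((mean, var))
    return feats

def lipreading_loss(rendered_frames, original_frames):
    """Mean squared distance between lipreader features of the rendered
    talking-head sequence and the original footage."""
    f_r = lipreader_features(rendered_frames)
    f_o = lipreader_features(original_frames)
    return sum(
        (a - b) ** 2
        for fr, fo in zip(f_r, f_o)
        for a, b in zip(fr, fo)
    ) / len(f_r)

# Identical mouth motion gives zero loss; mismatched motion increases it.
video = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
render_good = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
render_bad = [[0.9, 0.9, 0.9], [0.0, 0.0, 0.0]]
print(lipreading_loss(render_good, video))  # 0.0
print(lipreading_loss(render_bad, video))   # > 0
```

Minimizing such a loss during fitting pushes the rendered mouth toward sequences that "read" the same as the original video, without needing transcripts or audio.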
This research introduces a facial recognition-based attendance system that leverages an open-source computer vision library and integrates with a real-time database system. The system comprises essential components, i...
India has a large population of spinach eaters. Despite this, most people, especially the younger generation, have difficulty distinguishing spinach species because of the structural similarity of many plant species. So, a...
LiDAR-camera fusion methods have shown impressive performance in 3D object detection. Recent advanced multi-modal methods mainly perform global fusion, where image features and point cloud features are fused across the whole scene. Such practice lacks fine-grained region-level information, yielding suboptimal fusion performance. In this paper, we present the novel Local-to-Global fusion network (LoGoNet), which performs LiDAR-camera fusion at both local and global levels. Concretely, the Global Fusion (GoF) of LoGoNet is built upon previous literature, but we exclusively use point centroids to represent the position of voxel features more precisely, thus achieving better cross-modal alignment. For the Local Fusion (LoF), we first divide each proposal into uniform grids and then project these grid centers to the images. The image features around the projected grid points are sampled and fused with position-decorated point cloud features, maximally utilizing the rich contextual information around the proposals. The Feature Dynamic Aggregation (FDA) module is further proposed to achieve information interaction between these locally and globally fused features, thus producing more informative multi-modal features. Extensive experiments on both the Waymo Open Dataset (WOD) and KITTI datasets show that LoGoNet outperforms all state-of-the-art 3D detection methods. Notably, LoGoNet ranks 1st on the Waymo 3D object detection leaderboard, obtaining 81.02 mAPH (L2) detection performance. It is noteworthy that, for the first time, the detection performance on three classes surpasses 80 APH (L2) simultaneously. Code will be available at https://***/sankin97/LoGoNet.
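The Local Fusion step described above (divide each proposal into uniform grids, project grid centers to the image, sample image features there) can be sketched in a few lines. The pinhole camera model, grid resolution, and nearest-neighbour sampling below are illustrative assumptions; the paper's actual implementation and calibration handling are not given in the abstract.

```python
# Hedged sketch of LoGoNet-style Local Fusion: grid a 3D proposal box,
# project the grid centers into the image, and sample image features there.

def grid_centers(box_min, box_max, n):
    """Uniform n x n x n grid centers inside an axis-aligned 3D proposal."""
    (x0, y0, z0), (x1, y1, z1) = box_min, box_max
    step = lambda a, b, i: a + (b - a) * (i + 0.5) / n
    return [(step(x0, x1, i), step(y0, y1, j), step(z0, z1, k))
            for i in range(n) for j in range(n) for k in range(n)]

def project(point, focal=1.0):
    """Pinhole projection of a 3D point (camera looking along +z)."""
    x, y, z = point
    return (focal * x / z, focal * y / z)

def sample_feature(feature_map, uv):
    """Nearest-neighbour sampling from a 2D feature map indexed [row][col],
    with (u, v) in [0, 1] clamped to the map bounds."""
    u, v = uv
    h, w = len(feature_map), len(feature_map[0])
    r = min(h - 1, max(0, int(round(v * (h - 1)))))
    c = min(w - 1, max(0, int(round(u * (w - 1)))))
    return feature_map[r][c]

# Toy proposal in front of the camera and a tiny 2x2 image feature map.
centers = grid_centers((0.0, 0.0, 4.0), (1.0, 1.0, 6.0), 2)  # 8 grid centers
fmap = [[0.0, 1.0], [2.0, 3.0]]
local_feats = [sample_feature(fmap, project(p)) for p in centers]
```

The sampled image features would then be fused with the position-decorated point cloud features of the same proposal, giving each region its own cross-modal evidence rather than relying only on scene-level fusion.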
3D interacting hand pose estimation from a single RGB image is a challenging task, due to serious self-occlusion and inter-occlusion of the hands, confusingly similar appearance patterns between the two hands, the ill-posed mapping of joint positions from 2D to 3D, etc. To address these issues, we propose to extend A2J, the state-of-the-art depth-based 3D single-hand pose estimation method, to the RGB domain under interacting-hand conditions. Our key idea is to equip A2J with a strong local-global awareness that jointly captures interacting hands' local fine details and global articulation clues among joints. To this end, A2J is evolved under the Transformer's non-local encoding-decoding framework to build A2J-Transformer. It holds three main advantages over A2J. First, self-attention across local anchor points is built to make them aware of the global spatial context, better capturing joints' articulation clues to resist occlusion. Second, each anchor point is treated as a learnable query with adaptive feature learning to facilitate pattern-fitting capacity, instead of sharing the same local representation with the others. Last but not least, anchor points are located in 3D space instead of 2D as in A2J, to leverage 3D pose prediction. Experiments on the challenging InterHand 2.6M dataset demonstrate that A2J-Transformer achieves state-of-the-art model-free performance (a 3.38 mm MPJPE improvement in the two-hand case) and can also be applied to the depth domain with strong generalization. The code is available at https://***/ChanglongJiangGit/A2J-Transformer.
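The first advantage claimed above, self-attention across anchor points so each one becomes aware of the global spatial context, is plain scaled dot-product self-attention applied to anchor features. The sketch below shows that mechanism in its simplest single-head form; the feature dimensions, anchor count, and absence of learned projections are simplifications, not the A2J-Transformer architecture itself.

```python
# Hedged sketch of self-attention over anchor-point features: each anchor
# attends to all others (queries = keys = values, no learned projections).
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(anchor_feats):
    """Scaled dot-product self-attention: every anchor's output is a
    convex combination of all anchors' features."""
    d = len(anchor_feats[0])
    out = []
    for q in anchor_feats:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in anchor_feats]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, anchor_feats))
                    for j in range(d)])
    return out

# Three toy anchor features; after attention each row mixes in global context.
anchors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = self_attention(anchors)
```

Because each output is a weighted mix of every anchor's features, an anchor near an occluded joint can still draw articulation evidence from anchors elsewhere on both hands.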
Many rural communities have a strong belief in plant diversity. They collect useful plants and herbs and use them according to indigenous knowledge and customs. One of the oldest such systems built on the use of medicinal herbs is Ayurveda. Approximately 10,000 plants are used medicinally in India, but not all of them are included in the official Ayurvedic Pharmacopoeia. Before becoming part of Ayurvedic medicine, every plant needs to be thoroughly studied. For this reason, identifying herbs is the most important step. Many of these identifications rely entirely on human perception, leaving room for error and misjudgement. It is therefore necessary to develop an efficient system using computer vision, pattern recognition, and image processing algorithms. Various combinations of feature detection methods and classifiers are commonly utilized to build an automatic identification system for herbal leaves that works from leaf images and reveals the associated plant information.
The ability to identify specific cows is a critical aspect of keeping cattle. Registration is essential for the production, distribution, and breeding of cows. Branding cows in the ear in the traditional manner is dangerous. To solve this issue, we recommend using a biometric scanner for identification. Numerous studies have shown that the muzzle pattern can be used to distinguish animals and eliminate inconsistent patterns. This strategy might be effective in reducing spurious muzzle-pattern matches. The majority of animals have a distinctive pattern on the tip of their nose or muzzle. An animal's characteristic pattern is visible to the unaided eye from birth. Animals can be recognized by humans from their muzzle patterns, just as people can be identified by fingerprints. Systems for identifying animals are needed, as they would be useful for submitting loan and insurance applications. An approach based on a biometric muzzle scanner, combining machine learning, image processing, and computer vision, has been assessed for identification purposes.
ISBN (print): 9798400709388
In recent years, artificial intelligence applications have been on the rise. Many enterprises have embraced digital transformation and have established new business models based on artificial intelligence and the Internet of Things, such as in the telerehabilitation industry. Companies may utilize sensors or cameras to collect user data, and data mining is applied to discover insights that aid doctors. This paper establishes a novel two-stage data mining model combining gait recognition and sequential pattern mining. In the first stage, a particular computer vision application, gait recognition, identifies possible diseases from the subject's walking postures. The gaits in a video can be converted to a temporal sequence according to user-defined events. For example, (normal gait, Parkinsonian gait, normal gait) is a temporal sequence in which the identified gaits are arranged in temporal order. In the second stage, after collecting a dataset of temporal sequences, frequent patterns are discovered by sequential pattern mining. Our preliminary experiment collected 30 samples from the real world and demonstrated the model's feasibility.
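The second stage above, sequential pattern mining over gait sequences, can be sketched with a simple support count: a pattern is supported by a sequence if its events appear in order, not necessarily contiguously, which is the standard sequential-pattern notion. The candidate enumeration and threshold below are illustrative; the paper's actual mining algorithm is not specified in the abstract.

```python
# Hedged sketch of sequential pattern mining over gait sequences:
# count how many sequences contain each candidate pattern in order.

def supports(sequence, pattern):
    """True if `pattern` occurs as an in-order (not necessarily
    contiguous) subsequence of `sequence`."""
    it = iter(sequence)
    return all(event in it for event in pattern)

def frequent_patterns(sequences, candidates, min_support):
    """Return candidate patterns whose support meets the threshold."""
    result = {}
    for pat in candidates:
        support = sum(supports(seq, pat) for seq in sequences)
        if support >= min_support:
            result[pat] = support
    return result

# Example: three subjects' gait sequences produced by stage one.
data = [
    ("normal", "parkinsonian", "normal"),
    ("normal", "parkinsonian", "parkinsonian"),
    ("normal", "normal"),
]
candidates = [("normal", "parkinsonian"), ("parkinsonian", "normal")]
print(frequent_patterns(data, candidates, min_support=2))
# {('normal', 'parkinsonian'): 2}
```

A frequently occurring pattern such as (normal, Parkinsonian) across many subjects is the kind of insight the model surfaces for clinicians.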
Batik is a traditional textile art form from Indonesia, known for its distinctive patterns and designs. With the increasing popularity of computer vision applications, interest in the automatic detection and recognition of batik patterns is growing. Convolutional Neural Networks (CNNs) show very good performance in detecting objects. The problem addressed in this study is the variation in image sizes, which affects accuracy: the choice of pre-processing technique impacts the accuracy of the results. It is therefore important to compare tests on data processed with resizing against data processed with a Region of Interest (ROI). This study aims to determine the impact of these two pre-processing techniques on the accuracy of batik pattern recognition using the Visual Geometry Group (VGG)-16 model. The dataset consisted of 1,445 images, with 1,301 used for training and 144 for testing. The classes used were Kawung, Parang, Satriomanah, Sawat, Sementrante, Sidomukti, Tambal, and Truntum. The experimental results demonstrate that the choice of pre-processing technique significantly affects the accuracy of batik pattern detection. Resizing provides a computationally efficient solution, while ROI achieves a detection accuracy of 0.96, superior to resizing's accuracy of 0.89. This study highlights the importance of pre-processing techniques in detecting batik patterns with the VGG-16 model: both resizing and ROI have advantages and disadvantages, but ROI generally results in higher accuracy.
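The two pre-processing routes being compared can be shown side by side on a toy image: plain resizing squeezes the whole frame, while the ROI route crops the pattern region first and only then resizes. The nearest-neighbour resize, the fixed crop box, and the tiny 2x2 target (VGG-16 actually expects 224x224 inputs) are simplifications for illustration, not the study's parameters.

```python
# Hedged sketch of the study's two pre-processing routes before VGG-16:
# (a) resize the whole image, (b) crop a Region of Interest, then resize.

def resize(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D image given as a list of rows."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)] for r in range(out_h)]

def crop_roi(img, top, left, h, w):
    """Crop a rectangular region of interest."""
    return [row[left:left + w] for row in img[top:top + h]]

# A toy 4x4 "batik" image with four quadrant motifs.
img = [[1, 1, 2, 2],
       [1, 1, 2, 2],
       [3, 3, 4, 4],
       [3, 3, 4, 4]]

resized = resize(img, 2, 2)                    # whole image, squeezed
roi = resize(crop_roi(img, 0, 0, 2, 2), 2, 2)  # one motif region only
print(resized)  # [[1, 2], [3, 4]]
print(roi)      # [[1, 1], [1, 1]]
```

The contrast mirrors the study's finding: resizing preserves the global layout but dilutes each motif, whereas the ROI route keeps one pattern at full detail, which plausibly explains its higher accuracy.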
Agriculture plays a major role in today's world. 70% of rice production is cultivated in south India. Thailand, the US, India, Vietnam, etc., export rice all over the world, but they face one of the most difficult tasks: assuring rice seed quality. Good seeds can increase production, so seed selection is very important. Most of this work has been done manually or with huge, expensive machines. When choosing good seeds, agronomists consider two factors: the variety, which is determined by certain quality criteria, and the judgment of the skilled worker who analyzes the rice seed physically. Both tasks take a long time, and the resulting accuracy is also not as good as expected. Nowadays, computer vision technology with digital image processing plays a major role in many fields, namely agronomy, healthcare, machine vision, pattern recognition, remote sensing, video processing, etc. Our research work focuses on the agricultural arena: providing good-quality seeds and identifying dissimilar varieties of rice seeds through digital image processing techniques, such as image capture, segmenting the rice seeds, extracting features from the seeds, and finally feeding them into a machine learning algorithm to measure the accuracy.
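The pipeline outlined above (segment the seeds, extract features, feed a classifier) can be sketched end to end on a toy grayscale image. The threshold value, the two shape features, and the nearest-centroid classifier are illustrative stand-ins for whichever segmentation, feature extraction, and machine learning steps the actual system uses.

```python
# Hedged sketch of a rice-seed pipeline: threshold segmentation,
# shape-feature extraction, and nearest-centroid variety classification.

def segment(gray, threshold=0.5):
    """Binary mask: pixels brighter than the threshold count as seed."""
    return [[1 if p > threshold else 0 for p in row] for row in gray]

def extract_features(mask):
    """Area and bounding-box aspect ratio (width/height) of the seed."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    area = sum(sum(row) for row in mask)
    height = rows[-1] - rows[0] + 1
    width = max(cols) - min(cols) + 1
    return (area, width / height)

def classify(features, centroids):
    """Nearest-centroid classification over labelled feature centroids."""
    return min(centroids, key=lambda label: sum(
        (f - c) ** 2 for f, c in zip(features, centroids[label])))

# Toy grayscale image of a single seed against a dark background.
gray = [[0.0, 0.9, 0.9, 0.0],
        [0.0, 0.9, 0.9, 0.0],
        [0.0, 0.0, 0.0, 0.0]]

feats = extract_features(segment(gray))   # (area=4, aspect=1.0)
variety = classify(feats, {"long-grain": (6.0, 3.0),
                           "short-grain": (4.0, 1.0)})
print(variety)  # short-grain
```

In a real system the hand-set centroids would be replaced by a classifier trained on labelled seed images, and accuracy would be measured on a held-out test set, as the abstract describes.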