Semantic segmentation and image parsing have rapidly become a prominent research area in computer vision and machine learning. Many applications, such as self-driving, augmented reality, and object recognition, require a robust segmentation mechanism. Given this wide applicability, this paper introduces a two-step framework that parses an image into predefined labels by using a novel CNN architecture and then improving the likelihood of the labels. In step 1, a nine-layer CNN architecture is introduced that trains on minimal training samples and produces pixel-wise Soft-Max probabilities. These probabilities are soft estimates derived from a hard classifier, i.e., an MLP. The data for step 1 is prepared as a patch-label set. In step 2, a Jacobian optimization-based label relaxation method is introduced that fuses local extrema as an edge prior. The proposed framework is denoted CNN-EFF. CNN-EFF was evaluated on two publicly available benchmark datasets, each arranged as images with their pixel-level ground-truth labels, and the experimental results were compared with previously proposed state-of-the-art methods. CNN-EFF significantly improved semantic labeling accuracy over past techniques, reporting 84.42%, 85.91%, 94.66%, 97.14%, and 98.27% accuracy for the Highway, House, Sheep, Horse-rider, and Horse-keeper images, respectively. In conclusion, the proposed framework outperformed the previously proposed state-of-the-art methods.
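As a minimal illustration of the step-1 output described above, the sketch below turns raw per-class scores for a single pixel into Soft-Max probabilities. The label names and scores are hypothetical, and this is not the CNN-EFF architecture itself, only the final normalization step.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for one pixel over three illustrative labels.
labels = ["sky", "road", "grass"]
pixel_scores = [2.0, 1.0, 0.1]

probs = softmax(pixel_scores)
best_label = labels[probs.index(max(probs))]  # most likely label for this pixel
```

A label relaxation step such as the one in step 2 would then operate on these soft probabilities rather than on the hard `best_label` decision.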
ISBN (digital): 9798350355413
ISBN (print): 9798350355420
In recent years, multi-person pose estimation has emerged as a prominent research direction in computer vision, with significant importance in applications such as human-computer interaction, action analysis, and virtual reality. However, traditional methods are often complex and inefficient, particularly during feature fusion, which can lose critical information and increase errors under occlusion and complex poses. To address this, multiple attention modules are introduced in the early stages of the network to better model the dependencies between key points, and a coordinate classification approach is used to overcome the limitations of conventional heatmaps. The multi-attention design balances a lightweight structure against improved accuracy. It also reduces information loss, which makes the model more robust to occlusion and other challenges in complex scenarios. Compared to existing advanced methods, the approach presented in this paper increases average precision by 1.5 percentage points on the COCO dataset.
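The abstract does not spell out its coordinate classification scheme. The sketch below assumes a common formulation in which each axis is divided into bins and a keypoint is read off as the highest-scoring bin per axis, instead of the peak of a 2D heatmap. The function name, the `split_ratio` parameter (bins per pixel), and the scores are all hypothetical.

```python
def decode_keypoint(x_scores, y_scores, split_ratio=2.0):
    """Decode one keypoint from per-axis classification scores.

    Each axis is split into bins; the coordinate is the index of the
    highest-scoring bin divided by split_ratio (assumed bins per pixel).
    """
    x_bin = max(range(len(x_scores)), key=x_scores.__getitem__)
    y_bin = max(range(len(y_scores)), key=y_scores.__getitem__)
    return x_bin / split_ratio, y_bin / split_ratio

# Hypothetical scores for a single keypoint on a tiny 2x2-pixel crop.
x_scores = [0.1, 0.3, 2.5, 0.2]  # peak at bin 2
y_scores = [0.0, 1.9, 0.4, 0.1]  # peak at bin 1

keypoint = decode_keypoint(x_scores, y_scores)
```

With more than one bin per pixel, this kind of decoding can express sub-pixel positions without upsampling a dense heatmap, which is one reason such designs can stay lightweight.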
Daily inspection of high-speed electric multiple unit (EMU) bodies is very important for China's railway maintenance system. This paper proposes a new method based on machine vision to detect bolts and switches on EMU ...
Offline Handwritten Text Recognition (HTR) has been an active area of research due to its wide range of applications and challenges. Many offline HTR techniques have been developed recently, but most of them were trained on datasets containing handwritten text images on plain pages. In real life, handwritten text can appear on either plain or ruled-line pages, so the approaches proposed in recent literature are unable to accurately convert handwritten text on ruled-line pages into digital text. This study therefore proposes a tailor-made end-to-end offline HTR technique that accurately converts offline handwritten text on ruled-line pages into digital text with the help of computer vision and deep neural network-based techniques. To evaluate the performance of the proposed technique, we developed a relatively complex dataset of handwritten text images on ruled-line pages. Our experimental results show that the proposed technique converts handwritten text on ruled-line pages into digital text with an overall accuracy of 76.7%, and that it obtained 20% more accurate results than baseline techniques. We believe that the proposed technique will contribute positively to the body of knowledge in the field of offline HTR. Moreover, its modular design allows modifications tailored to the data without the need to retrain the neural network-based models.
ISBN (digital): 9798331515683
ISBN (print): 9798331515690
This work uses computer vision to control the popular Chrome Dino game through hand recognition and gesture control. The system leverages real-time image processing and machine learning algorithms to detect and interpret hand gestures, allowing for an intuitive and interactive gaming experience. The proposed method uses a webcam to capture a live video feed, from which hand landmarks are extracted using a pre-trained neural network model. Hand gestures, such as swipe and hold, are then mapped to corresponding in-game actions such as jumping and ducking. This gesture-based control mechanism not only enhances user engagement but also demonstrates the potential of computer vision in creating touchless interfaces for gaming applications.
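The gesture-to-action mapping can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes a landmark model already supplies a normalized wrist height per frame (0 at the top of the image), and the threshold, the function names, and the pairing of swipe with jump and hold with duck are all hypothetical.

```python
def classify_gesture(wrist_y_history, swipe_threshold=0.15):
    """Label a short window of normalized wrist heights (0 = top of frame).

    A large upward movement counts as a swipe; anything else counts as
    a hold. Both rules are illustrative assumptions.
    """
    if len(wrist_y_history) < 2:
        return "none"  # not enough frames to judge motion
    upward_travel = wrist_y_history[0] - wrist_y_history[-1]
    if upward_travel >= swipe_threshold:
        return "swipe"
    return "hold"

# Assumed pairing of gestures with in-game actions.
GESTURE_TO_ACTION = {"swipe": "jump", "hold": "duck", "none": "idle"}

def action_for(wrist_y_history):
    """Map a window of wrist heights to an in-game action name."""
    return GESTURE_TO_ACTION[classify_gesture(wrist_y_history)]
```

In a complete system, the returned action name would be translated into the corresponding key press for the game.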
Peripheral vision is a vital component of human visual processing that allows for efficient and accurate recognition of visual features across diverse regions of the visual field. Analogously, endoscopic images often exhibit blurred peripheral regions due to their inherent imaging properties. Previous strategies that employ either coarse-grained global attention or fine-grained local attention to enhance performance have often inadvertently compromised the intrinsic self-attention mechanism of multilayer transformers, leading to suboptimal solutions. This research introduces Self-Peripheral-Attention (SPA), a mechanism that incorporates peripheral vision modeling into self-attention to enhance the accuracy and efficiency of classification and segmentation tasks in endoscopic imaging. SPA synthesizes fine-grained central and coarse-grained peripheral interactions and has three primary characteristics: (i) peripheral contextualization aggregation; (ii) interaction between coarse-grained peripheral and fine-grained central features, facilitated by depthwise dilated convolution; (iii) element-wise affine transformation to integrate attention into the value. The effectiveness and generalizability of the proposed SPA-Net were assessed on the XJUEE, XJUEESEG, Kvasir, and Kvasir-SEG endoscopy datasets. The results underscore the potential of peripheral vision modeling in self-attention for augmenting machine perception models. The associated code can be accessed at https://***/huoxiangzuo/SPA.
Without agriculture, human existence would be inconceivable. A large percentage of the world's population relies on agriculture for their daily needs. In addition, it creates a large number of jobs in the area. Usin...
In this paper, we address the problem of smoke plume segmentation from background clutter. Smoke plumes can be generated from fires, explosions, etc. In the mining industry, plumes from blasts need to be characterized...
ISBN (digital): 9798331542726
ISBN (print): 9798331542733
Recent years have seen rapid development in machine learning, which has profoundly influenced many areas of science and engineering. Computer vision takes a leading place among them, with image classification powered by CNNs as one of its important tasks. Despite the strong performance of CNNs in complicated scenarios, they remain sensitive to so-called adversarial attacks: deliberate perturbations that lead them to incorrect predictions. Beyond more innocuous consequences, this has serious security implications for critical applications, including medical diagnostics, where misclassifications might result in disastrous outcomes. This research work discusses adversarial attacks on CNNs and other DNNs in computer vision, studying the full range of generation and detection methods in detail while discussing intrinsic vulnerability and robustness. It also proposes a learning framework to enhance the robustness and security of DNNs and CNNs against such adversarial threats. The ultimate goal is to improve the reliability of such models so that they can be deployed safely in critical applications where accuracy is crucial.
ISBN (print): 9798350385939; 9798350385922
In modern agriculture, crop growth monitoring is a crucial component, as it offers intuitive information about the health and growth of the plant, assisting farmers and other agricultural specialists. Such systematic growth monitoring is necessary for crop health and agricultural productivity. We selected YOLOv8, which uses machine learning and offers efficient plant analysis in agriculture. This method predicts bounding boxes and the probability of each possible class, allowing it to achieve high detection speed without trading off accuracy. The created "Okra-dataset" was pre-processed to standardize the images to a fixed resolution and to strengthen the dataset. We tested our work using different models: YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. Our tests revealed that YOLOv8x achieved the highest mean average precision (mAP) of 82.9%. The results indicate that YOLOv8x is a good tool for agricultural applications and can be very helpful to farmers.
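Mean average precision depends on deciding whether a predicted bounding box matches a ground-truth box, which is usually done with intersection over union (IoU). The sketch below shows that computation for illustration; the box format `(x1, y1, x2, y2)` and the example boxes are assumptions, and it is not taken from the paper's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp at zero so disjoint boxes get no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical predicted and ground-truth boxes for one plant.
predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
overlap = iou(predicted, ground_truth)
```

A prediction is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold, and precision is then averaged over classes to obtain mAP.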