Object recognition is a challenging computer vision application that finds wide use in various fields such as autonomous cars, robotics, security tracking and guiding visually impaired individuals. People with visual ...
1. Animal phenotypic traits are utilised in a variety of studies. Often the traits are measured from images. The processing of a large number of images can be challenging; nevertheless, image analytical applications based on neural networks can be an effective tool in automatic trait collection. 2. Our aim was to develop a stand-alone application to effectively segment an arthropod from an image and to recognise individual body parts: namely, head, thorax (or prosoma), abdomen and four pairs of appendages. It is based on a convolutional neural network with a U-Net architecture trained on more than a thousand images showing dorsal views of arthropods (mainly wingless insects and spiders). The segmentation model gave very good results, with the automatically generated segmentation masks usually requiring only slight manual adjustments. 3. The application, named MAPHIS, can further (1) organise and preprocess the images; (2) adjust segmentation masks using a simple graphical editor; and (3) calculate various size, shape, colouration and pattern measures for each body part, organised in a hierarchical manner. In addition, a special plug-in function can align body profiles of selected individuals to match a median profile and enable comparison among groups. The usability of the application is shown in three practical examples. 4. The application can be used in a variety of fields where measures of phenotypic diversity are required, such as taxonomy, ecology and evolution (e.g. mimetic similarity). Currently, the application is limited to arthropods, but it can be easily extended to other animal taxa.
Applying machine learning techniques to fiber speckle images to infer fiber deformation allows an unmodified multimode fiber to act as a shape sensor. This approach eliminates the need for complex fiber design or construction (e.g., Bragg gratings and time-of-flight). Prior work in shape determination using neural networks trained on a finite number of possible fiber shapes (formulated as a classification task), or trained on a few continuous degrees of freedom, has been limited to reconstructing fiber shapes only one bend at a time. Furthermore, generalization to shapes that were not used in training is challenging. Our approach improves generalization capabilities by using computer vision-assisted parameterization of the actual fiber shape to provide a ground truth, and multiple specklegrams per fiber shape obtained by controlling the input field. Results from experimenting with several neural network architectures, shape parameterizations, numbers of inputs, and specklegram resolutions show that fiber shapes with multiple bends can be accurately predicted. Our approach is able to generalize to new shapes that were not in the training set. This approach of end-to-end training on parameterized ground truth opens new avenues for fiber-optic sensor applications. We publish the datasets used for training and validation, as well as an out-of-distribution (OOD) test set, and encourage interested readers to access these datasets for their own model development.
There is amazing progress in deep learning-based models for image captioning and low-light image enhancement. For the first time in literature, this paper develops a deep learning model that translates night scenes to...
The problem of cheating in handwritten academic essays has become more significant over the past few years. One type of cheating involves submitting the same paper, photographed in a different environment (for example, from another angle, in a different light, or in lower quality) or changed by automatic augmentation. The existing methods for detecting near-duplicates are not designed to work on large collections of handwritten documents, which significantly limits their use in practice. A machine learning-based method is presented that enables the detection of near-duplicate handwritten text images among large collections of potential sources. The proposed approach consists of three stages: converting the image into a vector representation, searching for candidates, and then selecting the source of duplication among the candidates. Our method achieved 80% and 59% recall-at-1 with false positive rates of 4.8% and 5.5% on Synthetic and Real data, respectively. The search latency is 5.5 seconds per query for a collection of 10 000 images. The results showed that the developed method is sufficiently robust to solve problems that require checking large collections of handwritten documents for cheating.
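The three stages named in this abstract can be illustrated with a minimal sketch. It assumes the first stage (image to vector) has already been done by some embedding model, so the vectors below are made-up toy values. The cosine ranking, the `threshold` value and all function names are hypothetical choices for illustration, not the authors' method.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_candidates(query, collection, k=3):
    """Stage 2 (assumed): rank stored vectors by similarity and keep the k best."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in collection.items()]
    scored.sort(reverse=True)
    return scored[:k]

def select_source(candidates, threshold=0.9):
    """Stage 3 (assumed): accept the best candidate only if it is similar enough."""
    if candidates and candidates[0][0] >= threshold:
        return candidates[0][1]
    return None  # treated as "no duplicate found"

def recall_at_1(predictions, truths):
    """Fraction of duplicate queries whose top prediction is the true source."""
    hits = sum(1 for p, t in zip(predictions, truths) if p == t)
    return hits / len(truths)

# Toy vectors standing in for stage 1 output (hypothetical values).
collection = {
    "essay_a": [1.0, 0.0, 0.2],
    "essay_b": [0.0, 1.0, 0.1],
    "essay_c": [0.5, 0.5, 0.9],
}
queries = [
    [0.98, 0.05, 0.22],  # re-photographed copy of essay_a
    [0.02, 0.90, 0.12],  # re-photographed copy of essay_b
    [1.0, -1.0, 0.0],    # original work with no source in the collection
]
predictions = [select_source(top_k_candidates(q, collection)) for q in queries]
```

The threshold in the last stage is what trades recall against the false positive rate: lowering it finds more true sources but also flags more original essays.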
We present Depth-Informed Crop Segmentation (DepthCropSeg), an almost unsupervised crop segmentation approach without manual pixel-level *** segmentation is a fundamental vision task in agriculture, which benefits a number of downstream applications such as crop growth monitoring and yield *** the past decade, image-based crop segmentation approaches have shifted from classic color-based paradigms to recent deep learning-based *** latter, however, rely heavily on large amounts of data with high-quality manual annotation such that considerable human labor and time are *** this work, we leverage Depth Anything V2, a vision foundation model, to produce high-quality pseudo crop masks for training segmentation *** compile a dataset of 17,199 images from six public plant segmentation sources, generating pseudo masks from depth maps after normalization and *** a coarse-to-fine manual screening, 1378 images with reliable masks are *** compare four semantic segmentation models and enhance the top-performing one with depth-informed two-stage self-training and depth-informed *** evaluate the feasibility and robustness of DepthCropSeg, we benchmark the segmentation performance on 10 public crop segmentation testing sets and a self-collected dataset covering in-field, laboratory, and unmanned aerial vehicle (UAV) *** results show that our DepthCropSeg approach can achieve crop segmentation performance comparable to the fully supervised model trained with manually annotated data (86.91 vs. 87.10). For the first time, we demonstrate almost unsupervised, close-to-full-supervision crop segmentation successfully.
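As a rough illustration of turning a depth map into a pseudo mask, the sketch below min-max normalises a small depth map and binarises it. The abstract names normalization, but the step that follows it is elided in this listing, so the fixed threshold used here is purely an assumption. So are the toy depth values, the convention that larger values mean closer to the camera, and the function names.

```python
def normalize(depth):
    """Min-max normalise a 2D depth map (list of rows) to the range [0, 1]."""
    flat = [v for row in depth for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no foreground/background information.
        return [[0.0 for _ in row] for row in depth]
    return [[(v - lo) / span for v in row] for row in depth]

def pseudo_mask(depth, threshold=0.5):
    """Assumed second step: label pixels at or above the threshold as crop (1)."""
    return [[1 if v >= threshold else 0 for v in row] for row in normalize(depth)]

# Toy relative-depth map; larger values are assumed to be closer to the camera.
depth = [
    [2.0, 2.5, 9.0],
    [2.0, 8.0, 10.0],
    [3.0, 2.0, 2.0],
]
mask = pseudo_mask(depth)
```

A fixed threshold will not suit every image, which is one plausible reason a manual screening of the generated masks is still needed.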
ISBN (print): 9798350318920; 9798350318937
We present LInKs, a novel unsupervised learning method to recover 3D human poses from 2D kinematic skeletons obtained from a single image, even when occlusions are present. Our approach follows a two-step process, which involves first lifting the occluded 2D pose to the 3D domain, followed by filling in the occluded parts using the partially reconstructed 3D coordinates. This lift-then-fill approach leads to significantly more accurate results compared to models that complete the pose in 2D space alone. Additionally, we improve the stability and likelihood estimation of normalising flows through a custom sampling function replacing the PCA dimensionality reduction used in prior work. Furthermore, we are the first to investigate whether different parts of the 2D kinematic skeleton can be lifted independently, which we find by itself reduces the error of current lifting approaches. We attribute this to the reduction of long-range keypoint correlations. In our detailed evaluation, we quantify the error under various realistic occlusion scenarios, showcasing the versatility and applicability of our model. Our results consistently demonstrate the superiority of handling all types of occlusions in 3D space when compared to others that complete the pose in 2D space. Our approach also exhibits consistent accuracy in scenarios without occlusion, as evidenced by a 7.9% reduction in reconstruction error compared to prior works on the Human3.6M dataset. Furthermore, our method excels in accurately retrieving complete 3D poses even in the presence of occlusions, making it highly applicable in situations where complete 2D pose information is unavailable.
ISBN (digital): 9789819738106
ISBN (print): 9789819738090
It is challenging to find a solution for lane detection. It has aroused the curiosity of the computer vision field for many years. It has been found that computer vision and machine learning algorithms struggle to tackle the multi-feature identification problem known as lane detection. Even though there are a few different machine learning approaches that may be used for lane identification, these approaches are often employed for classification rather than feature development. On the other hand, contemporary techniques of machine learning may be used to discover features that have a high recognition value, and they have shown success in feature identification tests. These strategies haven’t been applied correctly, which compromises their efficiency and accuracy when it comes to lane recognition. In this study, we provide a fresh approach to solving the problem. A new preprocessing and Region of Interest (ROI) selection method is presented in this article. The major objective is to extract white features by making use of the HSV color transformation, adding preliminary edge feature detection during preprocessing, and then selecting the ROI based on the proposed preprocessing. With the help of this preprocessing strategy, the lane may be found. The integrated autonomous vehicle that we envision is one that is controlled by a Robotic Operating System and that is capable of making intelligent driving choices. The filtering and noise reduction techniques that the processing unit applied to the visual feedback served as the basis for the digital image-processing algorithm that was responsible for the greatest performance achieved by the autonomous vehicle. Within the control system, we used two separate control units, one of which was a master and the other of which was a slave. The master control unit is in charge of the visual processing and filtering, while the slave control unit is in charge of the vehicle’s propulsion ...
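The white-feature extraction and ROI selection described in this abstract might look roughly like the sketch below. It is a simplified illustration: the saturation and value limits, the choice of the lower half of the frame as the ROI, the toy image and the function names are all assumptions, and the edge-detection step mentioned in the abstract is left out.

```python
import colorsys

def white_mask(image, max_saturation=0.2, min_value=0.8):
    """Mark pixels that look white in HSV space: low saturation, high value."""
    mask = []
    for row in image:
        mask_row = []
        for r, g, b in row:
            # colorsys expects channel values in the range [0, 1].
            _, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(1 if s <= max_saturation and v >= min_value else 0)
        mask.append(mask_row)
    return mask

def apply_roi(mask, top_fraction=0.5):
    """Keep only the lower part of the frame, where the road is assumed to be."""
    first_row = int(len(mask) * top_fraction)
    return [row if i >= first_row else [0] * len(row) for i, row in enumerate(mask)]

# Toy 4x3 RGB frame: sky with a bright cloud on top, grey road with a white
# lane marking in the lower half (hypothetical pixel values).
SKY, CLOUD, ROAD, LANE = (30, 60, 200), (255, 255, 255), (90, 90, 90), (250, 250, 245)
image = [
    [CLOUD, SKY, SKY],
    [SKY, SKY, SKY],
    [ROAD, LANE, ROAD],
    [ROAD, LANE, ROAD],
]
lane_mask = apply_roi(white_mask(image))
```

In this toy frame the cloud passes the colour test but is discarded by the ROI, which shows why the two steps are combined.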
ISBN (print): 9783031723582; 9783031723599
The development of deep learning (DL) models has dramatically improved marker-free human pose estimation, including the important task of hand tracking. However, for applications in real-time critical and embedded systems, e.g. in robotics or augmented reality, hand tracking based on standard frame-based cameras is too slow and/or power hungry. The latency is already limited by the frame rate of the image sensor, and any subsequent DL processing further increases the latency gap, while requiring substantial power for processing. Dynamic vision sensors, on the other hand, enable sub-millisecond time resolution and output sparse signals that can be processed with an efficient Sigma Delta Neural Network (SDNN) model that preserves the sparsity advantage in the neural network. This paper presents the training and evaluation of a small SDNN for hand detection, based on event data from the DHP19 dataset, deployed on Intel's Loihi 2 neuromorphic development board. We found it possible to deploy a hand detection model on a neuromorphic hardware backend without a notable performance difference to the original GPU implementation, at an estimated mean dynamic power consumption for the network running on the chip of approximately 7 mW.
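The sparsity advantage of sigma-delta processing can be shown with a generic sketch of the encoding idea: a value is transmitted only when it has changed enough, and the receiver sums the transmitted changes. This is not the Loihi 2 or any library API; the threshold, the toy signal and the function names are assumptions for illustration.

```python
def delta_encode(signal, threshold):
    """Send a change only when the input has moved at least `threshold`
    away from the last transmitted value; otherwise send 0.0 (no message)."""
    reference = 0.0
    messages = []
    for x in signal:
        delta = x - reference
        if abs(delta) >= threshold:
            messages.append(delta)
            reference = x
        else:
            messages.append(0.0)
    return messages

def sigma_decode(messages):
    """Accumulate the received changes to rebuild an approximation of the input."""
    total = 0.0
    decoded = []
    for m in messages:
        total += m
        decoded.append(total)
    return decoded

# Slowly varying toy activation with two larger jumps (hypothetical values).
signal = [0.0, 0.05, 0.5, 0.52, 0.55, 1.0, 1.0]
messages = delta_encode(signal, threshold=0.1)
decoded = sigma_decode(messages)
active = sum(1 for m in messages if m != 0.0)
```

Only the time steps with a large enough change produce a message, so a mostly static scene costs little communication, while the reconstruction error stays below the threshold.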
Lung diseases are among the most common diseases worldwide. The risk of these diseases is higher in under-developed and developing countries, where millions of people are battling poverty and living in polluted air. Chest X-ray images are a helpful screening tool for lung disease detection. However, disease diagnosis requires expert medical professionals. Furthermore, in developing and under-developed nations, the doctor-to-patient ratio is comparatively poor. Deep learning algorithms have recently demonstrated promise in the analysis of medical images and the discovery of patterns. In this work, we propose a model, MLDC (Multi-Lung Disease Classification), to detect common lung diseases. It introduces an MLDC feature extraction model with two new classifiers: an ANN (artificial neural network) and a QC (quantum classifier). The proposed model is tested on the LDD (Lung Disease Dataset), which includes chest X-ray images of COVID-19, pneumonia, tuberculosis, and healthy lungs. Our proposed model achieves an accuracy of 95.6% for MLDC-ANN and 97.5% for MLDC-QC at a lower computational cost.