To create a clean living environment, governments around the world have hired large numbers of workers to clean up waste on pavements, which is inefficient for waste management. To alleviate this problem, scholars have proposed several deep learning methods based on RGB images for waste detection and recognition. Considering the limitations of color images, we propose an efficient multi-modal learning solution for pavement waste detection and recognition. Specifically, we construct a high-quality outdoor pavement waste dataset called OPWaste, which is more in line with real needs. Compared to other waste datasets, the OPWaste dataset not only offers rich backgrounds and high diversity, but also provides both color and depth images. Meanwhile, we explore six different multi-modal fusion methods and propose a novel multi-modal multi-scale network (MM-Net) for RGB-D waste detection and recognition. MM-Net introduces a novel multi-scale refinement module (MRM) and a multi-scale interaction module (MIM). The MRM effectively refines critical features using attention mechanisms, while the MIM gradually realizes information interaction between hierarchical features. In addition, we select several representative methods and perform comparative experiments. Experimental results show that MM-Net based on the image-addition fusion method outperforms other deep learning models, reaching 97.3% on mAP@0.5 and 84.4% on AR. Multi-modal learning plays an important role in intelligent waste recycling, and as a promising auxiliary tool, our solution can be applied to intelligent cleaning robots for automatic outdoor waste management.
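The image-addition fusion strategy named above can be sketched in a few lines. The normalization steps and channel replication below are illustrative assumptions, since the abstract does not specify MM-Net's preprocessing:

```python
import numpy as np

def add_fusion(rgb, depth):
    """Fuse an RGB image with a depth map by element-wise addition.

    Hypothetical preprocessing: the real MM-Net pipeline is not given in the
    abstract, so normalization and channel replication here are assumptions.
    """
    rgb = rgb.astype(np.float32) / 255.0                    # H x W x 3 in [0, 1]
    depth = depth.astype(np.float32)
    depth = (depth - depth.min()) / (np.ptp(depth) + 1e-8)  # normalize to [0, 1]
    depth3 = np.repeat(depth[..., None], 3, axis=2)         # replicate to 3 channels
    return (rgb + depth3) / 2.0                             # fused detector input

rgb = np.zeros((4, 4, 3), dtype=np.uint8)
depth = np.arange(16, dtype=np.float32).reshape(4, 4)
fused = add_fusion(rgb, depth)   # shape (4, 4, 3)
```

One practical appeal of addition fusion is that the fused tensor has the same shape as a plain RGB image, so it drops into standard detection backbones unchanged.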
ISBN:
(Print) 9780738142661
Three-dimensional (3D) object recognition is becoming a key capability for many computer vision systems, such as autonomous vehicles, service robots, and surveillance drones, to operate more effectively in unstructured environments. These real-time systems require effective classification methods that are robust to varying sampling resolutions, noisy measurements, and unconstrained pose configurations. Previous research has shown that the inherent sparsity, rotation, and positional variance of point clouds can lead to a significant drop in the performance of point cloud-based classification techniques, and that none of these techniques is sufficiently robust to multifactorial variance and significant sparsity. In this regard, we propose a novel approach to 3D classification that simultaneously achieves invariance to rotation, positional shift, and scaling, and is robust to point sparsity. To this end, we introduce a new feature that utilizes the graph structure of point clouds and can be learned end-to-end with our proposed neural network to acquire a robust latent representation of the 3D object. We show that such latent representations can significantly improve the performance of object classification and retrieval tasks when points are sparse. Further, we show that our approach outperforms PointNet and 3DmFV by 35.0% and 28.1%, respectively, on ModelNet40 classification tasks using sparse point clouds of only 16 points under arbitrary SO(3) rotations.
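The rotation and translation invariance of graph-based point cloud features comes from the fact that edge lengths survive rigid motions, and normalizing by a global edge scale adds scale invariance. A minimal numeric sketch of that property, not the paper's learned feature:

```python
import numpy as np

def knn_graph_features(points, k=4):
    """Per-point features from the k-nearest-neighbour graph of a point cloud.

    Edge lengths are invariant to rotation and translation; dividing by the
    cloud's mean edge length adds scale invariance. This is an illustrative
    stand-in for the learned graph feature, not the paper's actual design.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)  # N x N
    np.fill_diagonal(d, np.inf)               # ignore self-distances
    knn = np.sort(d, axis=1)[:, :k]           # k smallest distances per point
    return knn / knn.mean()                   # scale-normalized edge lengths

pts = np.random.rand(16, 3)                   # a sparse 16-point cloud
R = np.linalg.qr(np.random.randn(3, 3))[0]    # random orthogonal matrix
f1 = knn_graph_features(pts)
f2 = knn_graph_features(pts @ R.T + 5.0)      # same cloud, rotated and shifted
```

Because the rigid motion preserves every pairwise distance, `f1` and `f2` agree up to floating-point error, which is the invariance the paper builds its latent representation on.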
ISBN:
(Digital) 9798331523893
ISBN:
(Print) 9798331523909
Recognizing three-dimensional (3D) objects is crucial for several computer vision applications, such as service robots, self-driving cars, and surveillance drones, to navigate effectively in complex environments. However, existing classification techniques struggle with challenges such as varying resolutions, noisy data, and diverse object poses. Previous studies have highlighted the limitations of point cloud-based methods in handling sparsity, rotation, and positional variance. In this study, we concurrently address these difficulties with a unique technique for 3D object categorization. Our method leverages the graph structure of point clouds and employs a neural network to develop a strong latent representation of 3D objects. This representation achieves invariance to rotation, positional shift, and scaling while remaining resilient to point sparsity. Our technique outperforms existing approaches, as evidenced by experimental findings on the ModelNet40 dataset, which show gains of 45.0% and 38.1% in classification accuracy over existing models when employing sparse point clouds.
ISBN:
(Digital) 9781728163741
ISBN:
(Print) 9781728163758
Target detection and segmentation in a synthetic aperture radar (SAR) image is a vital step for its interpretation. It is quite challenging for most conventional methods due to complex backgrounds and speckle. Furthermore, the sizes of targets in a scene are variable. Inspired by the success of neural networks in computer vision, in this paper we propose a 3D dilated multi-scale U-shape convolutional neural network (3DdM-UNet). In the proposed method, we first build a 3D image block via a multi-scale stationary wavelet transform to exploit the structural information of targets with various sizes. Then, the built 3D image block is fed into the 3D dilated multi-scale U-Net. To train the proposed network, we build a dataset from a SAR image scene containing ship targets of various sizes and shapes. Finally, the trained network is applied to the testing set to obtain the segmentation results. Experimental results on test images show that the proposed method achieves better performance than conventional methods.
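The core of the input construction is stacking progressively smoothed copies of the image into one multi-channel block, so each channel carries structure at a different scale. The sketch below substitutes a repeated 3x3 box filter for the stationary wavelet transform used in the paper, so the filter choice is an assumption:

```python
import numpy as np

def multiscale_block(img, levels=3):
    """Stack progressively smoothed copies of a SAR image into a 3D block.

    The paper uses a multi-scale stationary wavelet transform; a repeated
    3x3 box filter stands in for it here. Output shape: H x W x levels,
    one channel per scale.
    """
    def box3(x):
        p = np.pad(x, 1, mode="edge")         # edge-pad so output size matches
        return sum(p[i:i + x.shape[0], j:j + x.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0
    cur = img.astype(np.float32)
    scales = [cur]
    for _ in range(levels - 1):
        cur = box3(cur)                        # each level smooths further
        scales.append(cur)
    return np.stack(scales, axis=-1)

block = multiscale_block(np.random.rand(8, 8), levels=3)   # shape (8, 8, 3)
```

Because every scale keeps the full image resolution (as a stationary transform does), small and large ship targets remain aligned across channels for the 3D convolutions.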
Binocular depth estimation is a hot research topic in computer vision. Traditional methods need high-precision camera calibration and key point matching, but the results are not ideal. In this paper, we introduce an a...
ISBN:
(Print) 9781467369541
Automatic age estimation from human facial images is a key technology in many real-world applications, yet it remains a challenging task in the computer vision field. A facial age estimation system comprises three cascaded modules: facial aging feature extraction, dimension reduction (or feature selection), and the estimation method. Much of the existing literature focuses on the first or last module, but for an age estimation system it is also important to construct a reasonable overall framework. Our work focuses on creating an effective framework by reasonably selecting methods for these modules. First, a BIM (bio-inspired model) is employed to extract facial aging features, because it can not only capture discriminative local and global features but also overcome the interference of some 2D deformations to some extent. Then, LDA (linear discriminant analysis) is used to reduce the BIF (bio-inspired features) to lower dimensions while extracting more discriminative information. Finally, CS-OHRank (cost-sensitive ordinal hyperplane rank), which handles sparse data well and reflects the cumulative nature of aging, is applied as the estimation method. Experimental results on the benchmark dataset FG-NET show that our framework combining BIF, LDA, and CS-OHRank is competitive with the state of the art, with MAE (mean absolute error) = 4.72 years.
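The ordinal-ranking idea behind OHRank can be illustrated with a toy scalar version: each age threshold acts as a binary "older than k?" classifier, and the prediction is the count of positive answers. The 1-D score and evenly spaced thresholds below are hypothetical; the real CS-OHRank learns one cost-sensitive hyperplane per age:

```python
import numpy as np

def ordinal_rank_age(score, thresholds):
    """Ordinal ranking in miniature: each threshold is a binary
    'older than age k?' decision, and the predicted age is the number of
    positive decisions. Toy stand-in for CS-OHRank's learned hyperplanes.
    """
    return int(np.sum(score > thresholds))

thresholds = np.linspace(0.0, 1.0, 70)   # one toy threshold per age 0..69
age = ordinal_rank_age(0.5, thresholds)  # -> 35
```

Summing binary decisions is what gives the method its cumulative character: an error at one hyperplane shifts the estimate by only one year, matching the gradual nature of aging.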
ISBN:
(Print) 0819410276
This conference proceedings contains 50 papers. The topics discussed are wavelets, multiresolution, and Gabor techniques; neural net methods in robotics and computer vision; neuromorphology of biological vision as a basis for machine vision; fuzzy reasoning in pattern recognition; predictive 3-D vision; and 3-D vision methods.
ISBN:
(Print) 0819410276
The neural architecture, neurophysiology, and behavioral abilities of insect vision are described and compared with those of mammals. Insects have a hardwired neural architecture of highly differentiated neurons, quite different from the cerebral cortex, yet their behavioral abilities are in important respects similar to those of mammals. These observations challenge the view, dominant since Pitts and McCulloch's seminal work in the 1940s, that the key to the power of biological neural computation is distributed processing by a plastic, highly interconnected network of individually undifferentiated and unreliable neurons.
ISBN:
(Print) 0819410276
In this paper, we discuss improved techniques for surface interpolation, clipping, and hidden surface elimination to solve some problems in generating perspective views of digital terrain model (DTM) data based on the matching of stereo image pairs.
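Hidden surface elimination for a terrain perspective view can be sketched with a z-buffer: project each DTM sample into screen space and keep only the nearest point per pixel. The camera model and all parameters below are illustrative assumptions, not the paper's method, which also interpolates the surface and clips against the view frustum:

```python
import numpy as np

def perspective_zbuffer(heights, focal=10.0, screen=16):
    """Project a DTM height grid to a perspective view with z-buffer
    hidden surface elimination. Camera placement, focal length, and the
    point-wise (rather than surface-interpolated) projection are all
    simplifying assumptions for illustration.
    """
    n = heights.shape[0]
    zbuf = np.full((screen, screen), np.inf)   # nearest depth seen per pixel
    img = np.zeros((screen, screen))
    for i in range(n):
        for j in range(n):
            x, y, z = j - n / 2, heights[i, j], i + 1.0   # depth grows with row
            u = int(screen / 2 + focal * x / z)           # perspective divide
            v = int(screen / 2 - focal * y / z)
            if 0 <= u < screen and 0 <= v < screen and z < zbuf[v, u]:
                zbuf[v, u] = z                 # nearer terrain point wins
                img[v, u] = heights[i, j]
    return img

view = perspective_zbuffer(np.random.rand(8, 8))   # 16 x 16 rendered view
```

The z-buffer comparison is what performs hidden surface elimination: terrain samples occluded by nearer ridges never overwrite the pixel they project to.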