To create a clean living environment, governments around the world have hired large numbers of workers to clean up waste on pavements, which is inefficient for waste management. To alleviate this problem, scholars have proposed several deep learning methods based on RGB images for waste detection and recognition. Considering the limitations of color images, we propose an efficient multi-modal learning solution for pavement waste detection and recognition. Specifically, we construct a high-quality outdoor pavement waste dataset called OPWaste, which is more in line with real needs. Compared to other waste datasets, the OPWaste dataset not only offers rich backgrounds and high diversity, but also provides both color and depth images. Meanwhile, we explore six different multi-modal fusion methods and propose a novel multi-modal multi-scale network (MM-Net) for RGB-D waste detection and recognition. MM-Net introduces a novel multi-scale refinement module (MRM) and a multi-scale interaction module (MIM). The MRM effectively refines critical features using attention mechanisms, while the MIM gradually realizes information interaction between hierarchical features. In addition, we select several representative methods and perform comparative experiments. Experimental results show that MM-Net based on the image-addition fusion method outperforms other deep learning models, reaching 97.3% on mAP@0.5 and 84.4% on AR. Multi-modal learning plays an important role in intelligent waste recycling, and as a promising auxiliary tool, our solution can be applied to intelligent cleaning robots for automatic outdoor waste management.
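The image-addition fusion strategy named above can be sketched in a few lines. The normalization steps and channel replication below are illustrative assumptions, since the abstract does not specify MM-Net's preprocessing:

```python
import numpy as np

def add_fusion(rgb, depth):
    """Fuse an RGB image with a depth map by element-wise addition.

    Hypothetical preprocessing: the real MM-Net pipeline is not given in the
    abstract, so normalization and channel replication here are assumptions.
    """
    rgb = rgb.astype(np.float32) / 255.0                    # H x W x 3 in [0, 1]
    depth = depth.astype(np.float32)
    depth = (depth - depth.min()) / (np.ptp(depth) + 1e-8)  # normalize to [0, 1]
    depth3 = np.repeat(depth[..., None], 3, axis=2)         # replicate to 3 channels
    return (rgb + depth3) / 2.0                             # fused detector input

rgb = np.zeros((4, 4, 3), dtype=np.uint8)
depth = np.arange(16, dtype=np.float32).reshape(4, 4)
fused = add_fusion(rgb, depth)   # shape (4, 4, 3)
```

One practical appeal of addition fusion is that the fused tensor has the same shape as a plain RGB image, so it drops into standard detection backbones unchanged.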
ISBN:
(Print) 9780738142661
Three-dimensional (3D) object recognition is becoming a key capability for many computer vision systems, such as autonomous vehicles, service robots, and surveillance drones, to operate more effectively in unstructured environments. These real-time systems require effective classification methods that are robust to varying sampling resolutions, noisy measurements, and unconstrained pose configurations. Previous research has shown that the inherent sparsity, rotation, and positional variance of point clouds can lead to a significant drop in the performance of point cloud-based classification techniques, and that none of these techniques is sufficiently robust to multifactorial variance and significant sparsity. In this regard, we propose a novel approach to 3D classification that simultaneously achieves invariance to rotation, positional shift, and scaling, and is robust to point sparsity. To this end, we introduce a new feature that utilizes the graph structure of point clouds and can be learned end-to-end with our proposed neural network to acquire a robust latent representation of the 3D object. We show that such latent representations can significantly improve the performance of object classification and retrieval tasks when points are sparse. Further, we show that our approach outperforms PointNet and 3DmFV by 35.0% and 28.1%, respectively, on ModelNet40 classification tasks using sparse point clouds of only 16 points under arbitrary SO(3) rotations.
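The rotation and translation invariance of graph-based point cloud features comes from the fact that edge lengths survive rigid motions, and normalizing by a global edge scale adds scale invariance. A minimal numeric sketch of that property, not the paper's learned feature:

```python
import numpy as np

def knn_graph_features(points, k=4):
    """Per-point features from the k-nearest-neighbour graph of a point cloud.

    Edge lengths are invariant to rotation and translation; dividing by the
    cloud's mean edge length adds scale invariance. This is an illustrative
    stand-in for the learned graph feature, not the paper's actual design.
    """
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)  # N x N
    np.fill_diagonal(d, np.inf)               # ignore self-distances
    knn = np.sort(d, axis=1)[:, :k]           # k smallest distances per point
    return knn / knn.mean()                   # scale-normalized edge lengths

pts = np.random.rand(16, 3)                   # a sparse 16-point cloud
R = np.linalg.qr(np.random.randn(3, 3))[0]    # random orthogonal matrix
f1 = knn_graph_features(pts)
f2 = knn_graph_features(pts @ R.T + 5.0)      # same cloud, rotated and shifted
```

Because the rigid motion preserves every pairwise distance, `f1` and `f2` agree up to floating-point error, which is the invariance the paper builds its latent representation on.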
ISBN:
(Digital) 9798331523893
ISBN:
(Print) 9798331523909
Recognizing three-dimensional (3D) objects is crucial for several computer vision applications, such as service robots, self-driving cars, and surveillance drones, to navigate effectively in complex environments. However, existing classification techniques struggle with challenges such as varying resolutions, noisy data, and diverse object poses. Previous studies have highlighted the limitations of point cloud-based methods in handling sparsity, rotation, and positional variance. In this study, we concurrently address these difficulties with a unique technique for 3D object categorization. Our method leverages the graph structure of point clouds and employs a neural network to develop a strong latent representation of 3D objects. This representation achieves invariance to rotation, positional shift, and scaling while remaining resilient to point sparsity. Our technique outperforms existing approaches, as evidenced by experimental findings on the ModelNet40 dataset, which show gains of 45.0% and 38.1% in classification accuracy over existing models when employing sparse point clouds.
ISBN:
(Digital) 9781728163741
ISBN:
(Print) 9781728163758
Target detection and segmentation in a synthetic aperture radar (SAR) image is a vital step for its interpretation. It is quite challenging for most conventional methods due to complex backgrounds and speckle. Furthermore, the sizes of targets in a scene are variable. Inspired by the success of neural networks in computer vision, in this paper we propose a 3D dilated multi-scale U-shape convolutional neural network (3DdM-UNet). In the proposed method, we first build a 3D image block via a multi-scale stationary wavelet transform to exploit the structural information of targets with various sizes. Then, the built 3D image block is fed into the 3D dilated multi-scale U-Net. To train the proposed network, we build a dataset from a SAR image scene containing ship targets of various sizes and shapes. Finally, the trained network is applied to the testing set to obtain the segmentation results. Experimental results on test images show that the proposed method achieves better performance than conventional methods.
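The core of the input construction is stacking progressively smoothed copies of the image into one multi-channel block, so each channel carries structure at a different scale. The sketch below substitutes a repeated 3x3 box filter for the stationary wavelet transform used in the paper, so the filter choice is an assumption:

```python
import numpy as np

def multiscale_block(img, levels=3):
    """Stack progressively smoothed copies of a SAR image into a 3D block.

    The paper uses a multi-scale stationary wavelet transform; a repeated
    3x3 box filter stands in for it here. Output shape: H x W x levels,
    one channel per scale.
    """
    def box3(x):
        p = np.pad(x, 1, mode="edge")         # edge-pad so output size matches
        return sum(p[i:i + x.shape[0], j:j + x.shape[1]]
                   for i in range(3) for j in range(3)) / 9.0
    cur = img.astype(np.float32)
    scales = [cur]
    for _ in range(levels - 1):
        cur = box3(cur)                        # each level smooths further
        scales.append(cur)
    return np.stack(scales, axis=-1)

block = multiscale_block(np.random.rand(8, 8), levels=3)   # shape (8, 8, 3)
```

Because every scale keeps the full image resolution (as a stationary transform does), small and large ship targets remain aligned across channels for the 3D convolutions.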
Binocular depth estimation is a hot research topic in computer vision. Traditional methods need high-precision camera calibration and key point matching, but the results are not ideal. In this paper, we introduce an a...
ISBN:
(Print) 9781467369541
Automatic age estimation from human facial images is a key technology in many real-world applications, yet it remains a challenging task in the computer vision field. A facial age estimation system comprises three cascaded modules: facial aging feature extraction, dimension reduction (or feature selection), and the estimation method. Much of the existing literature focuses on the first or last module, but for an age estimation system it is also important to construct a reasonable overall framework. Our work focuses on creating an effective framework by reasonably selecting methods for these modules. First, a BIM (bio-inspired model) is employed to extract facial aging features, because it can not only capture discriminative local and global features but also overcome the interference of some 2D deformations to some extent. Then, LDA (linear discriminant analysis) is used to reduce the BIF (bio-inspired features) to lower dimensions while extracting more discriminative information. Finally, CS-OHRank (cost-sensitive ordinal hyperplane rank), which handles sparse data well and reflects the cumulative nature of aging, is applied as the estimation method. Experimental results on the benchmark dataset FG-NET show that our framework combining BIF, LDA, and CS-OHRank is competitive with the state of the art, with MAE (mean absolute error) = 4.72 years.
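The ordinal-ranking idea behind OHRank can be illustrated with a toy scalar version: each age threshold acts as a binary "older than k?" classifier, and the prediction is the count of positive answers. The 1-D score and evenly spaced thresholds below are hypothetical; the real CS-OHRank learns one cost-sensitive hyperplane per age:

```python
import numpy as np

def ordinal_rank_age(score, thresholds):
    """Ordinal ranking in miniature: each threshold is a binary
    'older than age k?' decision, and the predicted age is the number of
    positive decisions. Toy stand-in for CS-OHRank's learned hyperplanes.
    """
    return int(np.sum(score > thresholds))

thresholds = np.linspace(0.0, 1.0, 70)   # one toy threshold per age 0..69
age = ordinal_rank_age(0.5, thresholds)  # -> 35
```

Summing binary decisions is what gives the method its cumulative character: an error at one hyperplane shifts the estimate by only one year, matching the gradual nature of aging.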
ISBN:
(Print) 0819410276
This conference proceedings contains 50 papers. The topics discussed are wavelets, multiresolution, and Gabor techniques; neural net methods in robotics and computer vision; neuromorphology of biological vision as a basis for machine vision; fuzzy reasoning in pattern recognition; predictive 3-D vision; and 3-D vision methods.
ISBN:
(Print) 0819410276
The neural architecture, neurophysiology, and behavioral abilities of insect vision are described and compared with those of mammals. Insects have a hardwired neural architecture of highly differentiated neurons, quite different from the cerebral cortex, yet their behavioral abilities are in important respects similar to those of mammals. These observations challenge the view, dominant since Pitts and McCulloch's seminal work in the 1940s, that the key to the power of biological neural computation is distributed processing by a plastic, highly interconnected network of individually undifferentiated and unreliable neurons.
ISBN:
(Print) 0819410276
In this paper, we discuss improved techniques for surface interpolation, clipping, and hidden surface elimination to solve some problems in generating perspective views of digital terrain model (DTM) data based on the matching of stereo image pairs.
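Hidden surface elimination for a terrain perspective view can be sketched with a z-buffer: project each DTM sample into screen space and keep only the nearest point per pixel. The camera model and all parameters below are illustrative assumptions, not the paper's method, which also interpolates the surface and clips against the view frustum:

```python
import numpy as np

def perspective_zbuffer(heights, focal=10.0, screen=16):
    """Project a DTM height grid to a perspective view with z-buffer
    hidden surface elimination. Camera placement, focal length, and the
    point-wise (rather than surface-interpolated) projection are all
    simplifying assumptions for illustration.
    """
    n = heights.shape[0]
    zbuf = np.full((screen, screen), np.inf)   # nearest depth seen per pixel
    img = np.zeros((screen, screen))
    for i in range(n):
        for j in range(n):
            x, y, z = j - n / 2, heights[i, j], i + 1.0   # depth grows with row
            u = int(screen / 2 + focal * x / z)           # perspective divide
            v = int(screen / 2 - focal * y / z)
            if 0 <= u < screen and 0 <= v < screen and z < zbuf[v, u]:
                zbuf[v, u] = z                 # nearer terrain point wins
                img[v, u] = heights[i, j]
    return img

view = perspective_zbuffer(np.random.rand(8, 8))   # 16 x 16 rendered view
```

The z-buffer comparison is what performs hidden surface elimination: terrain samples occluded by nearer ridges never overwrite the pixel they project to.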