In the field of multi-label image classification, accurately identifying multiple relevant labels in an image is crucial for applications such as image understanding, automatic annotation, and intelligent search. Howe...
In the realm of computer vision, the registration of infrared and visible images is pivotal, harnessing the strengths of distinct imaging modalities to surmount challenges in low-light or adverse weather conditions. G...
As processing technology advances rapidly and image editors become widely used, computer-generated (CG) images are becoming increasingly lifelike, making it harder and harder to distinguish them from photographic (PG)...
ISBN: (print) 9798350386783; 9798350386776
To address complex backgrounds, self-occlusion, and inaccurate pose detection in 3D human pose estimation, a hybrid attention multilayer perceptron graph convolutional network (HAMLP-Graph) is proposed. Built on an improved multi-layer perceptron mixer (MLP-Mixer) architecture, the network introduces spatial and channel attention mechanisms that adaptively adjust the weights of different locations and channels, strengthening useful spatial and channel information and enabling the network to identify key human feature points. In addition, a graph convolution module is added to the network so that the connection and symmetry relations between human skeleton joints are fully exploited. Convolution operations on the graph structure extract node and edge features, improving the accuracy of joint detection. Experimental results show that, compared with GraphMLP, the MPJPE index on the Human3.6M dataset decreased by 3.5 percentage points. On the MPI-INF-3DHP dataset, the 3DPCK and AUC indices increased by 1.7 and 2.1 percentage points, respectively, indicating that the proposed network is more robust.
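The abstract above does not give the layer definitions, so the following is only a minimal sketch of the two generic ideas it names: a graph convolution over a skeleton graph and the MPJPE metric. It assumes the common symmetric-normalized form of graph convolution with self-loops, not necessarily the HAMLP-Graph formulation, and the 5-joint skeleton, `EDGES`, and all function names are hypothetical.

```python
import math

# Hypothetical 5-joint toy skeleton (NOT the Human3.6M joint layout):
# 0 = pelvis, 1 = left hip, 2 = right hip, 3 = left knee, 4 = right knee
EDGES = [(0, 1), (0, 2), (1, 3), (2, 4)]


def normalized_adjacency(num_joints, edges):
    """Return D^(-1/2) (A + I) D^(-1/2) for an undirected skeleton graph."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]  # degree including the self-loop
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_joints)]
            for i in range(num_joints)]


def matmul(x, y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*y)]
            for row in x]


def graph_conv(features, adj_norm, weight):
    """One graph convolution step: ReLU(adj_norm @ features @ weight)."""
    mixed = matmul(matmul(adj_norm, features), weight)
    return [[max(v, 0.0) for v in row] for row in mixed]


def mpjpe(pred, target):
    """Mean per-joint position error: average Euclidean joint distance."""
    dists = [math.dist(p, t) for p, t in zip(pred, target)]
    return sum(dists) / len(dists)


if __name__ == "__main__":
    adj = normalized_adjacency(5, EDGES)
    joints = [[1.0, 0.0, 0.0] for _ in range(5)]    # toy per-joint features
    weight = [[1.0, -1.0], [0.0, 0.0], [0.0, 0.0]]  # toy 3 -> 2 projection
    out = graph_conv(joints, adj, weight)
    print(len(out), len(out[0]))                    # 5 2

    target = [[0.0, 0.0, 0.0] for _ in range(5)]
    pred = [[3.0, 4.0, 0.0] for _ in range(5)]
    print(mpjpe(pred, target))                      # 5.0
```

Each joint's output mixes its own features with those of its skeleton neighbours, which is how connection relations can inform per-joint predictions. MPJPE is lower-is-better, consistent with the reported decrease being an improvement.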
Colorectal polyps can evolve into colon cancer over time. Early screening or detection of colon polyps using computer-aided detection (CAD) techniques along with removal of the polyps can lower the risk of colon cance...
Focused on the issues of blurring effect and spectral distortion in current pansharpening approaches, we propose a multiscale pansharpening method based on frequency feature guidance. Firstly, we extract frequency fea...
In the current competitive environment, businesses are looking at how to lower their IT expenses while simultaneously improving their productivity and flexibility to respond quickly to changes in their operational pro...
ISBN: (print) 9783031683015; 9783031683022
The transition to Industry 4.0 intensifies the demand for advanced manufacturing techniques and efficient data processing capabilities. A notable challenge in engineering is that many older engineering drawings are only available in paper form, creating significant barriers for modern automated systems. This study tackles these challenges by employing advanced deep-learning techniques alongside traditional image processing to convert legacy engineering drawings into structured, machine-readable formats. After digitization, this multi-modal approach further processes drawings that contain large amounts of heterogeneous data, filtering out non-essential details to isolate and extract critical features. This enables the conversion of complex drawings into formats suitable for computer vision and deep learning applications. The resulting structured datasets are then used to make automated processes significantly more efficient. For instance, they enable more efficient pick-and-place operations by providing the data necessary for machine learning-driven automation.
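The abstract does not say which image-processing steps are used. As one hedged illustration of what "filtering non-essential details" could look like on a scanned drawing, the sketch below binarizes a grayscale image and drops small connected components (specks). The function names, the threshold of 128, and the choice of 4-connectivity are all assumptions made for the example, not the study's pipeline.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Mark dark pixels (ink) as 1 and light pixels (paper) as 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def remove_small_components(mask, min_size):
    """Keep only 4-connected ink regions with at least min_size pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c] == 0 or seen[r][c]:
                continue
            # Breadth-first search collects one connected component.
            queue = deque([(r, c)])
            seen[r][c] = True
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) >= min_size:
                for y, x in component:
                    out[y][x] = 1
    return out


if __name__ == "__main__":
    # Toy scan: one 4-pixel line plus an isolated dark speck at the top right.
    gray = [
        [255, 255, 255, 255, 0],
        [0, 0, 0, 0, 255],
        [255, 255, 255, 255, 255],
    ]
    cleaned = remove_small_components(binarize(gray), min_size=2)
    for row in cleaned:
        print(row)
```

In this toy input the speck touches the line only diagonally, so under 4-connectivity it forms its own one-pixel component and is removed while the line survives. A real pipeline would more likely rely on an image-processing library and tuned thresholds.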
This paper presents a novel approach for dense scene text detection called DSSNet (Dense Script Spotter Network). The network leverages ResNet and FPN for feature extraction, employing multi-scale feature fusion and T...
Typical computer interfaces are designed for one-on-one interaction, leading to decreased efficiency in shared spaces like conference rooms where a single mouse or keyboard is shared among multiple users. Virtual mous...