ISBN: 9781665475921 (print)
Currently, action recognition is predominantly performed on video data processed by CNNs. We investigate whether the representation process of CNNs can also be leveraged for multimodal action recognition by incorporating image-based audio representations of actions in a task. To this end, we propose the Multimodal Audio-image and Video Action Recognizer (MAiVAR), a CNN-based audio-image to video fusion model that accounts for video and audio modalities to achieve superior action recognition performance. MAiVAR extracts meaningful image representations of audio and fuses them with the video representation, achieving better performance than either modality individually on a large-scale action recognition dataset.
ISBN: 9781665475921 (print)
Because of their large memory and computation requirements, traditional deep learning networks cannot run on mobile or embedded devices. In this paper, we propose a new mobile architecture combining MobileNetV2 and pruning, which further decreases the FLOPs and the number of parameters. The performance of MobileNetV2 has been widely demonstrated, and the pruning operation not only allows further model compression but also helps prevent overfitting. We performed ablation experiments on the CIIP Tire Data for different pruning combinations. In addition, we introduced a global hyperparameter to effectively weigh accuracy and precision. Experiments show that an accuracy of 98.3% is maintained with a model size of only 804.5 KB, showing better performance than the baseline method.
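The abstract does not say which pruning criterion is used. As a rough illustration only, the sketch below assumes simple magnitude pruning, in which the weights with the smallest absolute values are set to zero. The function name `magnitude_prune` and the `sparsity` parameter are hypothetical and are not taken from the paper.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Assumption: plain magnitude pruning on a flat list of weights. The paper's
    actual pruning scheme is not described in the abstract.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    num_to_drop = int(len(weights) * sparsity)
    if num_to_drop == 0:
        return list(weights)
    # Indices sorted by increasing absolute value; the first ones are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:num_to_drop])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


layer = [0.5, -0.1, 0.02, -2.0]
pruned = magnitude_prune(layer, sparsity=0.5)
print(pruned)
```

In a real network the zeroed weights would typically be removed or stored sparsely to obtain the reduction in model size; this sketch only shows the selection step.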
Learning-based image compression methods have emerged as the state of the art, showcasing higher performance compared to conventional compression solutions. These data-driven approaches aim to learn the parameters of a ne...
ISBN: 9781665475921 (print)
This paper presents a novel video encoding algorithm called Block Importance Mapping. In this method, blocks are assigned a quantization parameter (QP) offset based on how likely the block samples are to be used as references in the encoding of nearby pictures. The reusability estimation is based on a motion search in pictures immediately before and after the current picture in output order. Blocks that are likely to be used as reference blocks are coded with a lower QP value, i.e., higher quality, whereas blocks that are deemed unlikely to be referenced are coded with a higher QP value. This method has been implemented in the VVC reference software VTM and tested on various configurations and contents. It is reported to provide BD-rate savings of 1.93% for PSNR and 3.69% for MS-SSIM in Random Access and 2.26% for PSNR and 4.29% for MS-SSIM in Low Delay coding. For HDR content, wPSNR BD-rate savings of 1.18% in the Random Access configuration are reported.
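The mapping from reusability to QP can be illustrated with a minimal sketch. It assumes that the motion search has already produced a per-block reusability score in [0, 1]. The linear mapping, the `max_offset` value, and the clipping range 0 to 63 are illustrative assumptions, not the rule implemented in VTM.

```python
def qp_offset(reuse_score, max_offset=3):
    """Map a reusability score in [0, 1] to an integer QP offset.

    Assumption: a linear mapping, so a score of 1.0 gives -max_offset
    (lower QP, higher quality) and a score of 0.0 gives +max_offset.
    """
    if not 0.0 <= reuse_score <= 1.0:
        raise ValueError("reuse_score must lie in [0, 1]")
    return round((1.0 - 2.0 * reuse_score) * max_offset)


def block_qps(base_qp, reuse_scores, max_offset=3, qp_min=0, qp_max=63):
    """Apply the per-block offsets to a base QP and clip to [qp_min, qp_max]."""
    return [
        min(qp_max, max(qp_min, base_qp + qp_offset(score, max_offset)))
        for score in reuse_scores
    ]


# Three blocks: very likely, possibly, and unlikely to be used as references.
print(block_qps(32, [1.0, 0.5, 0.0]))
```

Blocks with high scores end up with a QP below the base value and blocks with low scores end up above it, which matches the direction described in the abstract.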
Most deep learning models for image dehazing target synthetic datasets of hazy images and therefore do not consider features of natural hazy images. Leveraging depth attention with adaptation, we propose a novel dehazi...
In the Internet era, the explosive growth of media data processing poses significant challenges for research on Image Coding for Machines (ICM) in improving the efficiency of AI models while reducing the burdens o...
ISBN: 9781728180687 (print)
End-to-end optimized image compression has emerged as a disruptive technique to reduce spatial redundancies with improved reconstruction quality. However, existing entropy models for latent representations cannot sufficiently exploit their spatial and channel-wise correlations. In this paper, we propose a novel entropy model based on spatial-channel contexts for end-to-end optimized image compression. The proposed model jointly leverages spatial structural dependencies and channel-wise correlations to improve the probabilistic estimation of latent representations. Instead of a complex autoregressive hyperprior network, shallow artificial neural networks (ANNs) incorporating 3-D masks are developed to efficiently realize the entropy model with a guarantee of causality. Experimental results demonstrate that the proposed model achieves competitive rate-distortion performance and reduces model complexity in comparison to recent end-to-end optimized methods for image compression.
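To make the causality guarantee concrete, the following sketch builds a 3-D mask over a (channel, height, width) kernel. It assumes latents are decoded in raster order, channel first, then row, then column. The mask keeps only positions decoded strictly before the kernel center, as in the masked convolutions used by autoregressive context models. The function name and the decoding order are assumptions, not the paper's exact design.

```python
def causal_mask_3d(depth, height, width):
    """Return a depth x height x width mask of 0/1 values.

    A position is 1 if it precedes the kernel center in raster order
    (channel, then row, then column), and 0 for the center and everything
    after it. Tuple comparison in Python is lexicographic, which matches
    that ordering.
    """
    center = (depth // 2, height // 2, width // 2)
    return [
        [
            [1 if (d, h, w) < center else 0 for w in range(width)]
            for h in range(height)
        ]
        for d in range(depth)
    ]


mask = causal_mask_3d(3, 3, 3)
visible = sum(v for plane in mask for row in plane for v in row)
print(visible)  # number of already-decoded neighbours the kernel may use
```

Multiplying a convolution kernel elementwise by such a mask ensures that the probability estimate for a latent depends only on values the decoder already has.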
A seam is a set of pixels with minimum energy forming a continuous line in an image. By eliminating or duplicating seams iteratively, an input image can be retargeted. However, this process often results in blurring, ...
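The seam definition above can be illustrated with the standard dynamic-programming search for a vertical seam. This is a generic sketch of classic seam carving on a precomputed energy map, not the method proposed in this paper, and the function names are hypothetical.

```python
def find_vertical_seam(energy):
    """Return one column index per row for a minimum-energy vertical seam.

    `energy` is a list of rows. Consecutive seam pixels differ by at most
    one column, so the seam forms a continuous line from top to bottom.
    """
    rows, cols = len(energy), len(energy[0])
    # cost[r][c] is the minimum cumulative energy of a seam ending at (r, c).
    cost = [list(energy[0])]
    for r in range(1, rows):
        prev = cost[-1]
        row = []
        for c in range(cols):
            lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
            row.append(energy[r][c] + min(prev[lo:hi + 1]))
        cost.append(row)
    # Backtrack from the cheapest pixel in the bottom row.
    c = min(range(cols), key=lambda j: cost[-1][j])
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
        c = min(range(lo, hi + 1), key=lambda j: cost[r - 1][j])
        seam.append(c)
    seam.reverse()
    return seam


def remove_vertical_seam(image, seam):
    """Delete the seam pixel from every row, narrowing the image by one column."""
    return [row[:c] + row[c + 1:] for row, c in zip(image, seam)]


energy = [[1, 9, 9], [9, 1, 9], [9, 9, 1]]
seam = find_vertical_seam(energy)
print(seam)
print(remove_vertical_seam([[1, 2, 3], [4, 5, 6], [7, 8, 9]], seam))
```

Repeating the search and removal reduces the width one column at a time. Duplicating the seam instead of deleting it widens the image.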
ISBN: 9781728185514 (print)
Underwater images suffer from low contrast, color distortion, and visibility degradation due to light scattering and attenuation. Over the past few years, the importance of underwater image enhancement has increased because of ocean engineering and underwater robotics. Existing underwater image enhancement methods are based on various assumptions. However, it is almost impossible to define appropriate assumptions for underwater images because of their diversity, so these methods are effective only for specific types of underwater images. Recently, underwater image enhancement algorithms using CNNs and GANs have been proposed, but they are not as advanced as other image processing methods due to the lack of suitable training datasets and the complexity of the problem. To solve these problems, we propose a novel underwater image enhancement method that combines a residual feature attention block with a novel combination of multi-scale and multi-patch structures. The multi-patch network extracts local features to adapt to various underwater images, which are often non-homogeneous. In addition, our network includes a multi-scale network, which is often effective for image restoration. Experimental results show that our proposed method outperforms the conventional method for various types of images.
ISBN: 9781479902880 (print)
Depth image upsampling is an important issue in three-dimensional (3D) applications. However, edge blurring remains a challenging problem in depth image upsampling, resulting in jagged artifacts in synthesized views that are visually unpleasant. In this paper, an edge-preserving single depth image interpolation (ESDI) method is proposed. Specifically, a local planar hypothesis (LPH), which assumes that depth in natural scenes is clustered into local planes, is first explored. Then finite candidates generation (FCG) is proposed to generate a limited set of discrete values that satisfy the LPH for the interpolated pixels. Finally, the optimal combination of candidates is formulated as an energy minimization problem with a constraint in the gradient domain, which is solved by the iterated conditional modes (ICM) algorithm. Experiments demonstrate that ESDI produces high-resolution (HR) depth images with clear and sharp edges, and synthesized views of desirable quality.
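The final optimization step can be sketched in one dimension. The example assumes that each pixel already has a finite candidate list, which is the role played by FCG in the paper. It also assumes a simple energy that sums squared differences between neighbouring pixels. That energy is a stand-in chosen for brevity, not the paper's gradient-domain formulation, and the function names are hypothetical.

```python
def total_energy(labels):
    """Sum of squared differences between neighbouring values (assumed energy)."""
    return sum((b - a) ** 2 for a, b in zip(labels, labels[1:]))


def icm(candidates, init, max_iters=20):
    """Iterated conditional modes over per-pixel candidate lists.

    Each pixel is visited in turn and set to the candidate that minimizes
    its local energy while its neighbours stay fixed. Sweeps repeat until
    no pixel changes, so the result is a local minimum, not a global one.
    """
    labels = list(init)
    n = len(labels)

    def local_energy(i, value):
        energy = 0.0
        if i > 0:
            energy += (value - labels[i - 1]) ** 2
        if i < n - 1:
            energy += (labels[i + 1] - value) ** 2
        return energy

    for _ in range(max_iters):
        changed = False
        for i in range(n):
            best = min(candidates[i], key=lambda v: local_energy(i, v))
            # Accept only strict improvements so the total energy never rises.
            if local_energy(i, best) < local_energy(i, labels[i]):
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


candidates = [[0], [0, 10], [0, 10], [0]]
start = [0, 10, 0, 0]
result = icm(candidates, start)
print(result, total_energy(start), total_energy(result))
```

Because every accepted update lowers the local energy, the total energy is non-increasing across sweeps, which is why ICM terminates.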