Symmetry is present in nature and science. In image processing, kernels for spatial filtering possess some symmetry (e.g. Sobel operators, Gaussian, Laplacian). Convolutional layers in artificial feed-forward neural networks have typically treated the kernel weights as unconstrained. We propose to investigate the impact of a symmetry constraint in convolutional layers for image classification tasks, taking our inspiration from the processes involved in the primary visual cortex and common image processing techniques. The goal is to determine whether each filter weight must be learned independently, to what extent symmetry constraints can be enforced on the filters throughout training by modifying the weight update performed during the backpropagation algorithm, and how performance changes as a result. The symmetry constraint reduces the number of free parameters in the network while achieving nearly identical performance. We address the following cases: x/y-axis symmetry, point reflection, and anti-point reflection. The performance is evaluated on four databases of images representing handwritten digits. The results support the conclusion that while random weights offer more freedom to the model, the symmetry constraint provides a similar level of performance while substantially decreasing the number of free parameters in the model. Such an approach can be valuable in phase-sensitive applications that require a linear phase property throughout the feature extraction process.
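To make the constraint concrete, here is a minimal sketch (assuming PyTorch and a simple projection step, not necessarily the authors' exact modification of the backpropagation update): after each optimizer step the kernel weights are averaged with their flipped copies, which keeps every filter symmetric about the x-axis while the rest of training proceeds as usual.

```python
# Minimal sketch (not the authors' exact update rule): enforcing x-axis
# kernel symmetry by projecting the weights onto the symmetric subspace
# after each optimizer step. Layer sizes and loss are illustrative.
import torch
import torch.nn as nn

def symmetrize_x(conv: nn.Conv2d) -> None:
    """Average each kernel with its vertical flip so rows mirror about the x-axis."""
    with torch.no_grad():
        w = conv.weight                      # shape: (out_ch, in_ch, kH, kW)
        conv.weight.copy_(0.5 * (w + torch.flip(w, dims=[2])))

conv = nn.Conv2d(1, 8, kernel_size=5, padding=2)
opt = torch.optim.SGD(conv.parameters(), lr=0.1)

x = torch.randn(4, 1, 28, 28)                # dummy batch of digit-sized images
loss = conv(x).pow(2).mean()                 # placeholder loss
loss.backward()
opt.step()
symmetrize_x(conv)                           # re-impose the constraint each step
```

Analogous projections (flipping along the width axis, or rotating the kernel by 180 degrees with a sign flip) would cover the y-axis, point-reflection, and anti-point-reflection cases mentioned above.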
ISBN:
(Print) 9789819785070; 9789819785087
Artificial Intelligence Generated Content (AIGC) has experienced significant advancements, particularly in the areas of natural language processing and 2D image generation. However, the generation of three-dimensional (3D) content from a single image still poses challenges, particularly when the input image contains complex backgrounds. This limitation hinders the potential applications of AIGC in areas such as human-machine interaction, virtual reality (VR), and architectural design. Despite the progress made so far, existing methods face difficulties when dealing with single images that have intricate backgrounds: their reconstructed 3D shapes tend to be incomplete, noisy, or missing parts of the geometric structure. In this paper, we introduce a 3D generation framework for indoor scenes from a single image to generate realistic and visually pleasing 3D geometry, without requiring point clouds, multi-view images, depth, or masks as input. The main idea of our method is clustering-based 3D shape learning and prediction, followed by a shape deformation. Since indoor scenes typically contain more than one object, our framework simultaneously generates multiple objects and predicts the layout with a camera pose, as well as 3D object bounding boxes, for holistic 3D scene understanding. We have evaluated the proposed framework on benchmark datasets including ShapeNet, SUN RGB-D and Pix3D, and state-of-the-art performance has been achieved. We also give examples to illustrate immediate applications in virtual reality.
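One plausible reading of the clustering-plus-deformation idea is sketched below; the embedding sizes, cluster count, and deformation network are illustrative placeholders rather than the paper's actual architecture. A query embedding is assigned to the nearest shape cluster, and a small network predicts per-point offsets that deform the retrieved template toward the target shape.

```python
# Rough sketch of clustering-based shape prediction followed by deformation
# (our reading of the abstract, not the authors' code). All sizes are dummies.
import numpy as np
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

# Hypothetical data: latent codes of training shapes and a query image embedding.
train_codes = np.random.randn(500, 64).astype("float32")
query_code = np.random.randn(64).astype("float32")

kmeans = KMeans(n_clusters=8, n_init=10).fit(train_codes)
cluster_id = kmeans.predict(query_code[None])[0]            # nearest shape cluster

template = torch.randn(1, 1024, 3)                           # template point cloud (placeholder)
deform = nn.Sequential(nn.Linear(3 + 64, 128), nn.ReLU(), nn.Linear(128, 3))

code = torch.from_numpy(kmeans.cluster_centers_[cluster_id].astype("float32"))
inp = torch.cat([template, code.expand(1, 1024, 64)], dim=-1)
deformed = template + deform(inp)                            # per-point offsets
```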
With the advent of the big data era, there has been a surge in research focused on the application of convolutional neural networks (CNNs) and image processing. Similar to how humans effortlessly identify cats and dog...
Computer vision object detection is the task of detecting and identifying objects present in an image or a video sequence. Models based on artificial convolutional neural networks are commonly used as detector models. Object detection precision and inference efficiency are crucial for surveillance-based applications. Reducing the complexity of the detector model as well as of the post-processing computations increases inference efficiency. Modern object detectors for surveillance applications usually make use of a regression algorithm and bounding box priors, referred to as anchor boxes, to compute bounding box proposals, and the proposal selection algorithm contributes to the computational cost at inference. In this study, an anchor-free and low-complexity deep learning detector model was implemented within a surveillance applications setting and was evaluated and compared to a reference baseline state-of-the-art anchor-based object detector. A key-point-based detector model (CenterNet), predicting object centers as Gaussian distributions, was selected for the evaluation against the baseline. The surveillance-adapted anchor-free detector exhibited 2.4 times lower complexity than the baseline detector. Further, a significant shift toward shorter post-processing times was demonstrated at inference for the anchor-free surveillance-adapted CenterNet detector, with modal post-processing times around 0.6 times those of the baseline detector. Furthermore, the surveillance-adapted CenterNet model was shown to outperform the baseline in terms of detection precision for several surveillance-relevant classes and for objects of smaller spatial scale.
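As an illustration of the key-point formulation that replaces anchor boxes, the sketch below (plain NumPy, with hypothetical map and object sizes) builds the kind of Gaussian center heatmap a CenterNet-style detector is trained to regress; at inference, detections are read from local maxima of such maps rather than from anchor matching.

```python
# Illustrative sketch of a CenterNet-style training target: object centers are
# encoded as Gaussian peaks on a per-class heatmap. Sizes are illustrative.
import numpy as np

def draw_center(heatmap: np.ndarray, cx: int, cy: int, sigma: float) -> None:
    """Splat a Gaussian peak for one object center onto the class heatmap."""
    h, w = heatmap.shape
    ys, xs = np.ogrid[:h, :w]
    g = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
    np.maximum(heatmap, g, out=heatmap)          # keep the stronger peak on overlap

heatmap = np.zeros((128, 128), dtype=np.float32)  # one class channel (example size)
draw_center(heatmap, cx=40, cy=60, sigma=3.0)     # hypothetical small object
draw_center(heatmap, cx=90, cy=30, sigma=5.0)     # hypothetical larger object
```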
Forward-only learning algorithms have recently gained attention as alternatives to gradient backpropagation, replacing its backward step with an additional contrastive forward pass. Among these a...
In this paper, we propose three different methods for anomaly detection in surveillance videos based on modeling of observation likelihoods. By means of the methods we propose, normal (typical) events in a scene are learned in a probabilistic framework by estimating the features of consecutive frames taken from the surveillance camera. The proposed methods are based on long short-term memory (LSTM) and linear regression. To decide whether an observation sequence (i.e., a small video patch) contains an anomaly or not, its likelihood under the modeled typical observation distribution is thresholded. An anomaly is decided to be present if the threshold is exceeded. Due to their effectiveness in object detection and action recognition applications, covariance features are used in this study to compactly reduce the dimensionality of the shape and motion cues of spatiotemporal patches obtained from the video segments. The two most successful methods are based on the final state vector of LSTM and support vector regression applied to mean covariance features and achieve an average performance of up to 0.95 area under the curve (AUC) on benchmark datasets.
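The covariance-descriptor step can be illustrated with a short sketch (the exact cue set used in the paper may differ; intensity plus spatial and temporal gradients are used here as stand-ins): per-pixel cues of a spatiotemporal patch are stacked and summarized by their covariance matrix, giving a fixed-size feature regardless of patch size.

```python
# Small sketch of a covariance descriptor for a spatiotemporal patch.
# The chosen cues (intensity, spatial gradients, temporal gradient) are
# illustrative stand-ins for the paper's shape and motion features.
import numpy as np

def covariance_descriptor(patch: np.ndarray) -> np.ndarray:
    """patch: (T, H, W) grayscale video patch -> (d, d) covariance matrix."""
    patch = patch.astype(np.float64)
    gy, gx = np.gradient(patch, axis=(1, 2))      # spatial gradients
    gt = np.gradient(patch, axis=0)               # temporal gradient
    feats = np.stack([patch.reshape(-1),
                      gx.reshape(-1), gy.reshape(-1), gt.reshape(-1)])
    return np.cov(feats)                          # 4x4, independent of patch size

patch = np.random.rand(5, 16, 16)                 # dummy spatiotemporal patch
C = covariance_descriptor(patch)                  # compact fixed-size feature
```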
Power supply interruptions in low-voltage customers, caused by blackouts and other factors, can significantly impact the functioning of rural healthcare centres. To address this issue, the development of a predictive ...
ISBN:
(Print) 9783031243660; 9783031243677
The leap in film and television special effects technology has updated a series of film production methods. Post-production using the most advanced computer graphics technology stimulates the creativity of the producers, simplifies the post-production process, and improves the quality of the entire film. The purpose of this paper is to study the application of digital special effects technology in film and television post-production based on neural network algorithms. First, the digital technology widely used in film and television post-production is introduced, together with some applications and problems of artificial neural networks. Then the PointNet network structure, a deep learning framework for 3D point clouds, is introduced; it eliminates the ambiguity caused by the disorder and rotation of point clouds by introducing a T-Net and utilizing max-pooling. Finally, we introduce an encoder-decoder network for 3D human reconstruction: the encoder extracts features, and the decoder learns the transformation between the template and the input point cloud, so as to complete the deformation fitting between the template and the point cloud.
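The permutation-invariance property attributed to PointNet above comes from applying a shared per-point MLP and then max-pooling over points; the toy sketch below (PyTorch, with the T-Net alignment step omitted) demonstrates only that mechanism, not the paper's full reconstruction pipeline.

```python
# Minimal PointNet-flavored sketch: a shared per-point MLP followed by
# max-pooling gives a global feature that ignores the ordering of the points.
import torch
import torch.nn as nn

class TinyPointNet(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                                 nn.Linear(64, feat_dim), nn.ReLU())

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (B, N, 3) unordered point cloud
        per_point = self.mlp(pts)              # same weights applied to every point
        return per_point.max(dim=1).values     # order-independent global feature

net = TinyPointNet()
cloud = torch.randn(2, 1024, 3)
perm = torch.randperm(1024)
assert torch.allclose(net(cloud), net(cloud[:, perm]))   # permuting points changes nothing
```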
ISBN:
(Print) 9781510651524
The proceedings contain 35 papers. The topics discussed include: noise robust focal distance detection in laser material processing using CNNs and Gaussian processes; machine learning-based high-precision and real-time focus detection for laser material processing systems; sargassum detection and path estimation using neural networks; neuron segmentation in epifluorescence microscopy imaging with deep learning; multimodal super-resolution reconstruction based on encoder-decoder network; synthetic apertures for array ptychography imaging via deep learning; infrared image super-resolution pseudo-color reconstruction based on dual-path propagation; effective laser pest control with modulated UV-A light trapping for mushroom fungus gnats; and integration of augmented reality and image processing in plasma dynamic analysis: digital concepts and structural system design.
Low level image restoration is an integral component of modern artificial intelligence (AI) driven camera pipelines. Most of these frameworks are based on deep neural networks, which present a massive computational overhead on resource-constrained platforms like mobile phones. In this paper, we propose several lightweight low-level modules which can be used to create a computationally low-cost variant of a given baseline model. Recent works on efficient neural network design have mainly focused on classification. However, low-level image processing falls under the 'image-to-image' translation genre, which requires some additional computational modules not present in classification. This paper seeks to bridge this gap by designing generic efficient modules which can replace essential components used in contemporary deep learning based image restoration networks. We also present and analyse results highlighting the drawbacks of applying depthwise separable convolutional kernels (a popular method for efficient classification networks) to sub-pixel convolution based upsampling (a popular upsampling strategy for low-level vision applications). This shows that concepts from the domain of classification cannot always be seamlessly integrated into 'image-to-image' translation tasks. We extensively validate our findings on three popular tasks of image inpainting, denoising and super-resolution. Our results show that the proposed networks consistently produce visually similar reconstructions compared to full-capacity baselines, with significant reductions in parameters, memory footprint and execution time on contemporary mobile devices.
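For reference, the combination under discussion, a depthwise separable convolution feeding a sub-pixel (PixelShuffle) upsampler, can be written in a few lines of PyTorch; the channel counts and scale factor below are illustrative, not taken from the paper.

```python
# Sketch of the two building blocks discussed above: a depthwise separable
# convolution (cheap per-pixel filtering) feeding a sub-pixel (PixelShuffle)
# upsampler. Layer sizes are illustrative placeholders.
import torch
import torch.nn as nn

class SeparableSubPixelUp(nn.Module):
    def __init__(self, channels: int = 32, scale: int = 2):
        super().__init__()
        self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        self.pointwise = nn.Conv2d(channels, channels * scale ** 2, 1)
        self.shuffle = nn.PixelShuffle(scale)    # rearranges channels into spatial pixels

    def forward(self, x):
        return self.shuffle(self.pointwise(self.depthwise(x)))

x = torch.randn(1, 32, 64, 64)
y = SeparableSubPixelUp()(x)                     # -> (1, 32, 128, 128)
```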