Neural Radiance Fields (NeRF) rendering is a promising Artificial intelligence (AI) technology for generating photorealistic views, with significant potential for automotive applications. However, traditional metrics ...
Due to unavoidable environmental and/or technical constraints, many photographs are taken in low-light conditions, which results in underexposure and severe noise. Existing low-light enhancement and denoising methods can deal with each problem individually, but forcibly cascading such methods does not handle the combined degradation of light and noise well, and is also time-consuming. To address this problem, we propose an efficient network, LDNet, to perform joint low-light enhancement and denoising. LDNet contains an encoder for low-light enhancement, L-Encoder, and a decoder for denoising, D-Decoder. Specifically, we customize the lighten enhancement block (LEB) in L-Encoder to recover rich texture and luminance information. In D-Decoder, we use image adaptive projection for denoising. Furthermore, since training an end-to-end network requires paired data, we collect a large-scale paired dataset of real low-light images (LN-data). Both the proposed network and dataset provide the basis for this challenging joint task. Extensive experimental results show that our approach achieves better results in both qualitative and quantitative evaluation, notably with a PSNR value of 27.69 and an SSIM value of 0.91 on the LN-data dataset, outperforming the best competing methods.
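For readers unfamiliar with the PSNR figure quoted above, the following is a minimal sketch of how peak signal-to-noise ratio is commonly computed between a reference image and a restored image. It is not the authors' evaluation code: the function name `psnr`, the flat-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Both images are given as flat sequences of pixel intensities.
    max_value is the assumed peak intensity (255 for 8-bit images).
    """
    if len(reference) != len(restored) or not reference:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - t) ** 2 for r, t in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 4-pixel example: two pixels are off by 10 intensity levels.
print(round(psnr([100, 100, 100, 100], [110, 90, 100, 100]), 2))
```

A higher value means the restored image is closer to the reference; SSIM, the other metric cited, measures structural similarity instead and is not covered by this sketch.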
ISBN: (print) 9789819784899; 9789819784905
Despite advances in neural network technologies driving interactive image segmentation forward, challenges persist, especially segmentation ambiguities caused by overlapping or visually similar objects against complex backgrounds, as well as intricate object boundaries. To address these challenges, we introduce FusionNet, which focuses on effective feature fusion. First, the Hierarchical Context Fusion Module helps capture the holistic structure and multi-scale contextual information of target objects. Second, the Attention Feature Fusion Module captures more representative feature expressions. This design enables FusionNet to capture details and contextual relationships better, thereby improving segmentation accuracy. For fine-grained boundary details, we propose the Local Correction Module, which refines local mask details. This module first focuses on information around newly clicked areas and applies discriminative correction feedback to process details more accurately. Experiments on datasets including SBD, DAVIS, GrabCut, and Berkeley validate our model's effectiveness, with segmentation results supporting the superiority of our approach.
This paper provides an in-depth analysis and study of the simulation of 3D human animation visualization techniques by enhancing machine learning algorithms. Based on the statistical analysis of the data obtained from different measurement methods, human body feature parameters are extracted from millimeter-wave point cloud data, and the 3D reconstruction and simulation of the human body are realized using parametric human modeling software. In video-based action recognition, most methods are data-driven and use deep networks to automatically learn features of the entire video image. This process does not include or reflect specific research on human actions. However, human action recognition operates at the semantic level of video content, and realizing universal human action recognition requires a semantic understanding of human behavior. First, geometric feature analysis of the 3D scanned human model is performed to extract the human body shape characteristic parameters, and analysis and estimation methods for these parameters are studied to establish a relationship model of human body shape parameters; then, the millimeter-wave point cloud is calculated and measured. The Lie group features extracted using the group skeletal representation model have high data dimensionality; to process this high-dimensional data while reducing the complexity of the recognition process and speeding up computation, feature learning and classification are performed with convolutional neural networks. To verify the library portability and robustness of the method in this paper, the method was tested on a self-built human action database in the laboratory, and an average recognition rate of 97.26% was achieved. Meanwhile, this paper investigates the natural interaction application of virtual characters in a virtual learning environment based on human action recognition. Four testers tested t
The size distribution of iron ore sinter is critical to efficient blast furnace operation and is an optimised variable in sinter plants globally. Prompt process control response to discrepancies in sinter size is essential, yet the standard sieve measurement test introduces significant delay in data acquisition. We introduce a networked optical sensor system that is shown to accurately measure size distribution within 5 s, collect data continuously at 0.5 Hz, and correlate well with sieving measurements. This system is deployed at the end of a sinter plant, providing real-time process control data with digital image analysis performed on an integrated microprocessor. The system's performance was assessed over a 12-week validation period, showing excellent correlation with sieve data. Systems such as ours can be widely implemented in sinter plants, and in similar steelmaking applications, owing to their cost-effective continuous data acquisition and their adaptability.
ISBN: (print) 9781665493468
Activation functions play a pivotal role in determining the training dynamics and performance of neural networks. The widely adopted activation function ReLU, despite being simple and effective, has a few disadvantages, including the Dying ReLU problem. To tackle such problems, we propose a novel activation function called Serf, which is self-regularized and non-monotonic in nature. Like Mish, Serf also belongs to the Swish family of functions. Based on several experiments on computer vision (image classification and object detection) and natural language processing (machine translation, sentiment classification and multi-modal entailment) tasks with different state-of-the-art architectures, it is observed that Serf vastly outperforms ReLU (baseline) and other activation functions including both Swish and Mish, with a markedly bigger margin on deeper architectures. Ablation studies further demonstrate that Serf-based architectures perform better than those based on Swish and Mish in varying scenarios, validating the effectiveness and compatibility of Serf with varying depth, complexity, optimizers, learning rates, batch sizes, initializers and dropout rates. Finally, we investigate the mathematical relation between Swish and Serf, thereby showing the impact of the pre-conditioner function ingrained in the first derivative of Serf, which provides a regularization effect that makes gradients smoother and optimization faster.
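The abstract does not state Serf's formula. As an assumption, the sketch below uses the form usually given for it, Serf(x) = x · erf(ln(1 + e^x)), which mirrors Mish with tanh replaced by the error function; treat the definition as unconfirmed by this listing. ReLU, Swish (with its beta parameter fixed to 1, another assumption) and Mish are included only for comparison.

```python
import math


def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def relu(x):
    return max(x, 0.0)


def swish(x):
    """Swish with beta = 1 (assumed): x * sigmoid(x), via tanh for stability."""
    return x * 0.5 * (1.0 + math.tanh(0.5 * x))


def mish(x):
    return x * math.tanh(softplus(x))


def serf(x):
    """Assumed definition of Serf: x * erf(softplus(x))."""
    return x * math.erf(softplus(x))


# Compare the functions at a few sample inputs.
for x in (-10.0, -1.0, -0.1, 0.0, 1.0, 5.0):
    print(f"x={x:6.1f}  relu={relu(x):8.4f}  swish={swish(x):8.4f}  "
          f"mish={mish(x):8.4f}  serf={serf(x):8.4f}")
```

Under this definition, small negative inputs produce small negative outputs that return toward zero as the input becomes more negative, which is the non-monotonic behaviour the abstract refers to, whereas ReLU outputs exactly zero for all negative inputs.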
The project addresses the challenge of accurately identifying blurred faces in computer vision and facial recognition. It introduces a novel framework that integrates deblurring techniques, utilizing point spread func...
Deep learning models are now the state of the art for computer vision applications specifically and for machine learning generally. The growth in size and complexity of neural networks has made them more and more reliable, yet in greater need of computational power and memory, as is evident from the heavy reliance on graphics processing units and cloud computing for training them. As the complexity of deep neural networks increases, the need for fast neural network processing in real-time embedded applications at the edge also increases, and accelerating them using reconfigurable hardware offers a solution. In this work, a convolutional neural network based on the Inception architecture is first optimized in software and then accelerated by taking advantage of field programmable gate array (FPGA) parallelism. Genetic algorithm augmented training is proposed and used on the neural network to produce an optimum model from the first training run without re-training iterations. Quantization of the network parameters is performed according to the weights of the network. The resulting neural network is then transformed into hardware by writing the register transfer level (RTL) code for FPGAs, exploiting layer parallelism and a simple trial-and-error allocation of resources with the help of the roofline model. The approach is simple and easy to use compared to many complex existing methods in the literature, and relies on trial and error to customize the FPGA design to the model needed for any computer vision or multimedia deep learning application. Simulation and synthesis are performed. The results show that the genetic algorithm reduces the number of back-propagation epochs in software and brings the network closer to the global optimum in terms of performance. Quantization to 16 bits also reduces the network size by almost half with no performance drop. The synthesis of our design also shows that the Inception-based classifier is cap
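The abstract says that quantization is driven by the network's weights and that 16-bit quantization roughly halves the model size. The sketch below illustrates one common scheme that fits this description, symmetric linear quantization with a scale derived from the largest absolute weight. It is an illustrative assumption, not the authors' procedure; the function names and the sample weights are hypothetical, and the originals are assumed to be stored as 32-bit floats.

```python
from array import array


def quantize_int16(weights):
    """Map float weights to signed 16-bit integers.

    The scale is derived from the weights themselves: the largest absolute
    weight maps to 32767 (an assumed, commonly used symmetric scheme).
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 32767 if max_abs > 0 else 1.0
    quantized = [max(-32767, min(32767, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the 16-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical weights for illustration only.
weights = [0.5, -1.25, 0.003, 2.0]
q, scale = quantize_int16(weights)

# Storage comparison: 4 bytes per float32 weight vs 2 bytes per int16 weight.
bytes_fp32 = len(array("f", weights).tobytes())
bytes_int16 = len(array("h", q).tobytes())
print(bytes_fp32, bytes_int16)

# Worst-case rounding error is about half a quantization step.
max_error = max(abs(w - d) for w, d in zip(weights, dequantize(q, scale)))
print(max_error <= scale / 2 + 1e-12)
```

In this sketch the stored size is exactly half; a real model would land slightly above that once the per-tensor scales and any unquantized parameters are counted, which is consistent with "almost half".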
Deep learning is a powerful analytic tool for recognizing patterns in data and making appropriate predictions. It has tremendous potential in data analysis, particularly in the cell biology domain, owing to the growing scale and inherent complexity of biological data. The core purpose of this research is to design, implement, and calibrate an efficient deep convolutional neural network (DCNN) architecture for a binary classification problem. The network is developed to precisely identify human induced pluripotent stem cell-derived endothelial cells (hiPSC-derived EC) from photomicrographs. The proposed architecture is carefully designed with numerous convolutional modules, multiple kernel sizes, and various pooling layers, activation functions and strides, yet with fewer trainable parameters, to strengthen the network and enhance its performance. The proposed feature fusion framework is compared with the classifier fusion approach in terms of Matthews correlation coefficient (MCC), training time, inference time, number of layers, number of parameters, graphics processing unit (GPU) memory utilization, and floating-point operations (FLOPS). Specifically, it achieves 94.6% sensitivity, 94.5% specificity, and 94.7% precision. An induced pluripotent stem cell (iPS) dataset is also introduced in this work; it contains 16278 images labelled by three independent, experienced experts in cell biology to facilitate future research. Experimental results show that the proposed framework offers an innovative and attainable algorithm for accelerating and systematizing the classification task while saving time and effort.
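The evaluation metrics named above (sensitivity, specificity, precision and MCC) all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""

    def ratio(numerator, denominator):
        # Return 0.0 when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else 0.0

    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "precision": ratio(tp, tp + fp),    # positive predictive value
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts for illustration only.
for name, value in binary_metrics(tp=90, fp=10, tn=85, fn=15).items():
    print(f"{name}: {value:.4f}")
```

MCC ranges from -1 to 1 and uses all four cells, so it stays informative when the two classes are imbalanced, which is one reason it is often reported alongside sensitivity and specificity.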
The 'Smart Exercise Counter using Computer vision' is a groundbreaking system that blends cutting-edge computer vision technology with exercise monitoring. In an age where fitness and health are paramount, thi...