Underwater imaging plays a critical role in various fields such as marine biology, environmental monitoring, underwater archaeology, and defense. However, it faces unique challenges including light absorption and scat...
Modeling non-Euclidean data is drawing extensive attention along with the unprecedented successes of deep neural networks in diverse fields. In particular, symmetric positive definite (SPD) matrices are being actively studied in computer vision, signal processing, and medical image analysis because of their ability to learn beneficial statistical representations. However, owing to their rigid constraints, SPD matrices make optimization difficult and computation costly, especially when they are incorporated into a deep learning framework. In this paper, we propose a framework that exploits a diffeomorphism between Riemannian manifolds and a Cholesky space, which makes it feasible not only to solve optimization problems efficiently but also to greatly reduce computation costs. Further, for dynamic modeling of time-series data, we devise a continuous manifold learning method by systematically integrating a manifold ordinary differential equation and a gated recurrent neural network. It is worth noting that, owing to the convenient parameterization of matrices in a Cholesky space, training our proposed network equipped with Riemannian geometric metrics is straightforward. We demonstrate through experiments on regular and irregular time-series datasets that our proposed model can be trained efficiently and reliably and that it outperforms existing manifold methods and state-of-the-art methods in various time-series tasks.
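The idea of moving SPD matrices into a Cholesky space can be illustrated with a small sketch. The abstract does not spell out the exact mapping, so the code below assumes one common choice, a log-Cholesky parameterization: take the Cholesky factor, keep its strictly lower part, and take the logarithm of its diagonal. The function names `spd_to_params` and `params_to_spd` are hypothetical and are not taken from the paper.

```python
import math


def cholesky(a):
    """Return the lower-triangular L with a = L L^T for an SPD matrix (list of lists)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def spd_to_params(a):
    """Assumed log-Cholesky map: SPD matrix -> unconstrained lower-triangular array."""
    n = len(a)
    L = cholesky(a)
    # The diagonal of L is positive, so its logarithm is an unconstrained real number.
    return [[math.log(L[i][j]) if i == j else L[i][j] for j in range(n)]
            for i in range(n)]


def params_to_spd(p):
    """Inverse map: any real lower-triangular array -> SPD matrix."""
    n = len(p)
    L = [[math.exp(p[i][j]) if i == j else (p[i][j] if j < i else 0.0)
          for j in range(n)] for i in range(n)]
    # L has a positive diagonal, so L L^T is symmetric positive definite.
    return [[sum(L[i][k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]
```

Under this assumption, an optimizer can update the parameter array freely, and `params_to_spd` always returns a valid SPD matrix without a projection step or an eigendecomposition.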
The rapid evolution of 6G and augmented reality (AR) technology heralds a transformative era in education and training, promising to redefine traditional learning paradigms through immersive, interactive experiences. This paper explores the integration of advanced capsule network (CapsNet) architectures with semantic segmentation techniques to address the inherent challenges of applying 6G and AR in academic contexts. CapsNets, well known for their efficiency in discerning spatial hierarchies and relationships, are paired with semantic segmentation to enhance object recognition and interaction within AR environments. This pairing offers the deep, contextual understanding essential for developing dynamic, personalised learning experiences. The synergy between these technologies allows for the precise delineation and classification of each image pixel into its corresponding object class, facilitating seamless interaction between virtual content and the real world. Building on the highly reliable, low-latency communication capabilities of 6G, this combination supports robust processing of complex scenes in real time, which is crucial for preserving student engagement and facilitating adaptive learning pathways. The paper discusses optimisation techniques for real-time performance, including model pruning and efficient routing mechanisms, to cope with the computational needs of CapsNets and semantic segmentation. Moreover, it highlights the role of 6G in meeting the bandwidth and latency requirements critical for deploying these state-of-the-art AI models in AR-based instructional programs. Through this integration, we aim to unlock new potential for personalised, context-aware learning experiences, which can considerably enhance knowledge retention, engagement, and accessibility in education and training across various fields.
To address the difficulty teachers face in obtaining accurate classroom status during daily teaching, which hinders targeted adjustment of teaching methods, this paper proposes a deep-learning-based system for analyzing students' classroom status. The system uses a camera to capture classroom video and applies image recognition, object detection, deep learning, and related technologies to detect students' behavioral state and concentration in real time. Statistical analysis of the collected data gives the teacher timely feedback in the classroom and supports better judgment of students' learning state and concentration. To achieve both real-time performance and accuracy, the system introduces the OpenPose model into the YOLOv5 network to identify students' skeletal keypoints, and combines the outputs of the YOLOv5 and OpenPose models to analyze students' classroom behaviors and concentration. The experimental results show that the loss curve converges well and that AP and mAP reach 95.1% and 88.0%, respectively.
Low-power edge devices equipped with graphics processing units (GPUs) are a popular target platform for real-time scheduling of inference pipelines. Such application-architecture combinations are popular in Advanced Driver-Assistance Systems, where they aid the real-time decision-making of automotive controllers. However, the real-time throughput sustainable by such inference pipelines is limited by the resource constraints of the target edge devices. Modern GPUs, in both edge-device and workstation variants, support concurrent execution of computation kernels and data transfers through the primitive of streams, and also allow priorities to be assigned to these streams. This opens up the possibility of executing the computation layers of inference pipelines within a multi-priority, multi-stream environment on the GPU. However, manually co-scheduling such applications while satisfying their throughput requirements and the platform memory budget may require an unmanageable number of profiling runs. In this work, we propose a deep reinforcement learning (DRL)-based method for deciding the start time of the various operations in each pipeline layer while optimizing both the execution latency of inference pipelines and memory consumption. Experimental results demonstrate the promising efficacy of the proposed DRL approach in comparison with the baseline methods, particularly in terms of real-time performance enhancements, schedulability ratio, and memory savings. We have additionally assessed the effectiveness of the proposed DRL approach using the real-time traffic simulation tool IPG CarMaker.
Denoising is one of the most significant procedures in the image processing pipeline. Deep-learning-based algorithms now achieve better denoising quality than traditional algorithms. However, noise becomes severe in dark environments, where even the SOTA algorithms fail to achieve satisfactory performance. In addition, the high computational complexity of deep-learning-based denoising algorithms makes them hardware-unfriendly and makes it difficult to process high-resolution images in real time. To address these issues, a novel low-light RAW denoising algorithm, Two-Stage-Denoising (TSDN), is proposed in this paper. In TSDN, denoising consists of two procedures: noise removal and image restoration. First, in the noise-removal stage, most of the noise is removed from the image, yielding an intermediate image from which the network can more easily recover the clean image. Then, in the restoration stage, the clean image is restored from the intermediate image. The TSDN is designed to be lightweight so that it can run in real time and remain hardware-friendly. However, such a tiny network cannot reach satisfactory performance if trained directly from scratch. Therefore, we present an Expand-Shrink-Learning (ESL) method to train the TSDN. In the ESL method, the tiny network is first expanded to a larger one with a similar architecture but more channels and layers, which enhances the learning ability of the network because of the additional parameters. Second, the larger network is shrunk and restored to the original small network through fine-grained learning procedures, including Channel-Shrink-Learning (CSL) and Layer-Shrink-Learning (LSL). Experimental results demonstrate that the proposed TSDN achieves better performance (PSNR and SSIM) than other SOTA algorithms in dark environments. In addition, the model size of TSDN is one-eighth of that of the U-Net for denoising (a classical denoising network).
License plate recognition is crucial in Intelligent Transportation Systems (ITS) for vehicle management, traffic monitoring, and security inspection. In highway scenarios, this task faces challenges such as diversity, blurriness, occlusion, and illumination variation of license plates. This article explores Recurrent Neural Networks based on Connectionist Temporal Classification (RNN-CTC) for license plate recognition in challenging highway conditions. Four neural network models (ResNet50, ResNeXt, InceptionV3, and SENet), each combined with RNN-CTC, are comparatively evaluated. Furthermore, a novel architecture named ResNet50 Deep Fusion Network using Connectionist Temporal Classification (ResNet50-DFN-CTC) is proposed. Comparative and ablation experiments are conducted using the Highway License Plate Dataset of Southeast University (HLPD-SU). Results demonstrate the superior performance of ResNet50-DFN-CTC in challenging highway conditions, achieving 93.158% accuracy with a processing time of 7.91 ms and outperforming the other tested models. This research contributes to advancing license plate recognition technology for real-world highway applications under adverse conditions.
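To make the CTC part concrete, here is a minimal sketch of greedy (best-path) CTC decoding, the standard way to turn per-frame class scores into a character string: pick the best class in each frame, collapse consecutive repeats, then drop blanks. The abstract does not state which decoder the authors use, so this is an assumption for illustration only. The function name, the alphabet, and the convention that index 0 is the blank are all hypothetical.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Best-path CTC decoding.

    frame_scores: list of per-frame score lists, one score per class.
    alphabet: list mapping class index -> character (the blank entry is unused).
    blank: index of the CTC blank class (assumed to be 0 here).
    """
    # Step 1: take the highest-scoring class in every frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]

    # Step 2: collapse consecutive repeats, then remove blanks.
    chars = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)
```

A blank between two identical labels is what lets the decoder emit a doubled character, which matters for plates that contain repeated letters or digits.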
Forest ecosystems are of paramount importance to the sustainable existence of life on earth. Unique natural and artificial phenomena pose severe threats to the persistence of such ecosystems. With the advancement of artificial intelligence technologies, forest monitoring systems based on acoustic surveillance have proven effective because of the practicality of the approach. With the support of transfer learning, deep learning algorithms have been found to outperform conventional machine learning algorithms for forest acoustic classification. Further, the research community has raised a clear requirement to move conventional cloud-based sound classification to the edge to ensure real-time identification of acoustic incidents. This article presents a comprehensive survey of state-of-the-art forest sound classification approaches, publicly available datasets for forest acoustics, and the associated infrastructure. Further, we discuss the open challenges and future research aspects that govern forest acoustic classification.
A lightweight seedling detection model based on an improved YOLOv8s is proposed to address the seedling identification problem in the replenishment process in industrial vegetable seedling *** CBS module for feature extraction in Backbone and Neck has been replaced with a lightweight depthwise separable convolution (DSC) in order to reduce the number of model parameters and increase the speed of detection. Furthermore, the fifth layer of Backbone has been augmented with efficient multiscale attention (EMA), which can aggregate multi-scale spatial structure information more rapidly through the two branches of its parallel structure, thereby enhancing the extraction of multi-scale features. Finally, the computational complexity of the model is further reduced by enhancing the structure of the bottleneck to form the VoVGSCSP module, which replaces the C2f module in Neck. The mAP of the improved model on the test set is 96.2%; its parameter count, GFLOPS, and model size are 7.88 M, 20.9, and 16.1 MB, respectively. The detection speed of the algorithm is 116.3 frames per second (FPS), which is higher than that of the original model (107.5 FPS). The results indicate that the improved model can accurately identify empty cells and unqualified seedlings in the plug tray in real time and has fewer parameters and lower GFLOPS, making it suitable for use on embedded or mobile devices for seedling replenishment and contributing to the realization of automated and unmanned seedling replenishment.
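The parameter saving from replacing a standard convolution with a depthwise separable convolution follows from simple counting, sketched below. The channel counts and kernel size in the usage note are illustrative assumptions, not values from the paper, and the helper names are hypothetical.

```python
def conv_params(c_in, c_out, k, bias=False):
    """Weights in a standard k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def dsc_params(c_in, c_out, k, bias=False):
    """Weights in a depthwise separable convolution.

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution mixing c_in channels into c_out.
    """
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise
```

For an assumed layer with 64 input channels, 128 output channels, and a 3 x 3 kernel, `conv_params(64, 128, 3)` gives 73,728 weights and `dsc_params(64, 128, 3)` gives 8,768. Without bias, the ratio is 1/c_out + 1/k^2, roughly 0.12 for this layer.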
Owing to time and scene constraints, a significant number of sectional maps exist in paper form. These maps contain a vast amount of data and hold high information value. However, they often suffer from issues such as annotations, stains, deformation, and missing content during preservation. Traditional processing methods require a large amount of manual image registration, which is extremely inconvenient. In this study, a map image labeling program is designed using OpenCV to prepare a map image dataset, and the U2Net-p algorithm for map segmentation is trained on this dataset. Furthermore, a comprehensive method for automatically merging sectional maps is designed and implemented, which can repair and process sectional maps and seamlessly integrate them into target grids according to map sheet numbering rules. This method has been applied to the production of base maps for natural resource demarcation projects, achieving a stitching accuracy of 96.67% on marked anchor points and considerably improving processing speed. This indicates that our approach has broad application value in the field of automatic stitching and fusion of sectional map images.