As an interferometric imaging method, digital holography has shown unique potential in many fields, especially in the mature and diverse field of crystallography. Compared with earlier microscopy imaging and X-ray diffraction approaches, this technique captures and accurately reproduces the three-dimensional information of a crystal in real time. It offers advantages such as fast imaging, nondestructive testing, and optimized data processing. This review discusses the progress of digital holography in crystallography, covering crystallization, mineral imaging, and microstructure analysis of two-dimensional materials. The reconstruction of copper sulfate pentahydrate and sodium chloride crystallization serves as an example to demonstrate its capabilities. Particular emphasis is placed on advances in optical instruments and the development of image reconstruction approaches. Regarding solutions to problems such as dataset processing and field-of-view limitations, this paper summarizes research results that combine digital holography with deep learning models and the free field-of-view method. In addition, the operating principle of the technique is explained and future development directions are discussed.
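Numerical reconstruction is the core computational step behind the "image reconstruction approaches" mentioned above. As a rough illustration only, the sketch below implements angular spectrum propagation, one widely used reconstruction method that this abstract does not specifically name. The function name, the square-pixel assumption, and the wavelength and pixel pitch used in the usage note are illustrative assumptions.

```python
import numpy as np


def angular_spectrum_propagate(field: np.ndarray, wavelength: float, dx: float, z: float) -> np.ndarray:
    """Propagate a sampled complex field over distance z (all lengths in metres).

    Illustrative angular spectrum method on a square-pixel grid of pitch dx;
    evanescent components are discarded.
    """
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)  # spatial frequencies along x (cycles per metre)
    fy = np.fft.fftfreq(ny, d=dx)  # spatial frequencies along y
    FX, FY = np.meshgrid(fx, fy)   # both shaped (ny, nx), matching the field
    # Squared longitudinal spatial frequency; non-positive values are evanescent
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    propagating = arg > 0
    kz = 2 * np.pi * np.sqrt(np.where(propagating, arg, 0.0))
    transfer = np.where(propagating, np.exp(1j * kz * z), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)
```

To refocus a recorded hologram (after any off-axis filtering), one would call `angular_spectrum_propagate(hologram, 633e-9, 5e-6, -z)` with `z` the assumed recording distance; `np.abs` and `np.angle` of the result give the amplitude and phase images.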
Lane detection is an important aspect of autonomous driving. For real-world applications, both accuracy and processing time are critical. This study therefore proposes a dynamic, real-time, deep-learning-based lane detection method built on adaptive scheduling of input frames. An adaptive scheduling network separates the input into keyframes and adjacent frames based on the expected confidence. An improved LaneNet encoder-decoder model serves as the primary image segmentation network and performs lane detection on keyframes to ensure high accuracy and robustness, while a neural optical flow network predicts lanes on adjacent frames to increase detection speed. We evaluated the proposed method on the TuSimple, UCLane, and our own campus datasets. The experimental results indicate that the method performs well on both the challenging public datasets and the self-made campus dataset, with substantial improvements over several well-known methods.
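The key/adjacent-frame split described above can be pictured as a simple router. The following is a minimal sketch, assuming a scalar expected confidence per frame and a fixed threshold; `predicted_confidence`, `segment_keyframe`, `propagate_with_flow`, and the 0.8 threshold are hypothetical stand-ins for the learned scheduling network, the LaneNet-style segmenter, and the optical flow network, not the paper's code.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def schedule_frames(
    frames: Sequence[object],
    predicted_confidence: Callable[[object, object], float],
    segment_keyframe: Callable[[object], object],
    propagate_with_flow: Callable[[object, object, object], object],
    threshold: float = 0.8,  # hypothetical confidence cut-off
) -> List[Tuple[str, object]]:
    """Route each frame to the heavy segmenter (keyframe) or the light flow path."""
    results: List[Tuple[str, object]] = []
    last_key_frame: Optional[object] = None
    last_key_lanes: Optional[object] = None
    for frame in frames:
        # Run the expensive network when there is no keyframe yet, or when the
        # flow-based prediction is expected to be unreliable for this frame.
        if last_key_frame is None or predicted_confidence(last_key_frame, frame) < threshold:
            lanes = segment_keyframe(frame)
            last_key_frame, last_key_lanes = frame, lanes
            results.append(("key", lanes))
        else:
            # Cheap path: warp the last keyframe's lanes to the current frame
            lanes = propagate_with_flow(last_key_frame, frame, last_key_lanes)
            results.append(("adjacent", lanes))
    return results
```

With a toy confidence that drops once a frame is more than two steps from the last keyframe, every third frame is re-segmented and the rest reuse the flow path; in the actual method this decision is learned rather than hand-set.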
Layout analysis is the main component of a typical Document Image Analysis (DIA) system and plays an important role in pre-processing. However, document images in the Pashto language have not been explored so far. This research is the first to examine Pashto text together with graphics, and it proposes a deep-learning-based classifier that detects Pashto text and graphics in each document. Another notable contribution is the creation of a real dataset containing more than 1,000 camera-captured images of Pashto documents. On this dataset, we applied a convolutional neural network (CNN)-based deep learning technique. Our method is based on the Single-Shot Detector (SSD), an advanced alternative to the classical Faster R-CNN. The evaluation was performed on 300 images from the test set, on which we achieved a mean average precision (mAP) of 84.90%.
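For readers unfamiliar with the reported metric, mAP is the mean of per-class average precision (AP), here over the two classes text and graphics. A minimal sketch of all-point interpolated AP for one class, assuming detections have already been matched to ground truth at some IoU threshold (the abstract does not state which), might look like this:

```python
from typing import List, Tuple


def average_precision(scored: List[Tuple[float, bool]], num_ground_truth: int) -> float:
    """All-point interpolated AP for one class.

    `scored` holds (confidence, is_true_positive) for every detection;
    matching against ground truth is assumed to be done already.
    """
    ranked = sorted(scored, key=lambda s: s[0], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in ranked:
        tp += is_tp
        fp += not is_tp
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Make precision non-increasing when read from right to left
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum precision times recall increment (area under the PR curve)
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap: List[float]) -> float:
    """mAP: unweighted mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

For example, three detections ranked as hit, miss, hit against two ground-truth boxes give AP = 0.5 * 1 + 0.5 * 2/3 ≈ 0.833.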
ISBN (print): 9781510673199; 9781510673182
Augmented reality (AR) is a visualization technology that displays information by adding virtual images to the real world. Effective AR requires recognition of the current scene, and identifying objects in real-time video on computationally limited hardware is demanding. One way to solve this problem is a hybrid system that uses machine learning and computer vision to process and analyze visual data and to identify and classify real-world objects. The proposed architecture is built on the Vuforia augmented reality system, which provides good performance by balancing prediction accuracy and efficiency. The Vuforia neural network architecture allows convenient interaction with AR in Unity and provides the initial conditions for detecting 3D objects. The AR construction algorithm is based on the ARCore framework and the OpenGL interface for embedded systems (OpenGL ES). The system integrates recognition data with an AR platform to display the corresponding 3D models, allowing users to interact with them through the AR application. The method also includes an enhanced AR user interface that makes the augmented environment easier to navigate and control. Experiments show that the proposed method significantly improves object recognition accuracy and makes working with 3D models in AR easier.
In recent years, photovoltaic (PV) power generation has received increasing attention. However, uncertainties in PV power output, especially the random variations caused by cloud cover, make accurate short-term forecasting essential for efficient power dispatch. This study proposes a novel ultra-short-term PV power forecasting framework that, unlike previous methods, integrates deep image features from ground-based sky images with multi-source meteorological and power generation data. Central to this approach is PatchDLinear, an accurate and lightweight time series forecasting algorithm that improves on traditional models by incorporating normalization and patch-based processing, which better capture local temporal patterns and trends. First, the Cloud Y-Net model calculates cloud coverage and classifies cloud types, extracting deep-level information from the sky images. These features are then combined with the meteorological and power generation data and fed into PatchDLinear for joint forecasting. Experiments on real-world images and data show that the improved module achieves RMSE improvements of 5.10% and 7.84% over the original model structure, while the overall framework improves RMSE by 17.61% to 43.24% compared with baseline methods. These results confirm that PatchDLinear significantly enhances prediction accuracy, offering a robust solution for ultra-short-term PV power forecasting.
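The abstract attributes PatchDLinear's gains to normalization and patch-based processing layered on a DLinear-style model, but does not give the exact design. The sketch below therefore only illustrates, under stated assumptions, the generic preprocessing such a model might use: per-window (instance) normalization, a DLinear-style moving-average split into trend and remainder, and fixed-length patching. The kernel size, patch length, and stride are hypothetical, and the learned linear heads and the fusion with sky-image features are omitted.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def normalize(series: List[float]) -> Tuple[List[float], float, float]:
    """Instance normalization: zero mean, unit std (std guarded against 0)."""
    mu = mean(series)
    sigma = pstdev(series) or 1.0
    return [(x - mu) / sigma for x in series], mu, sigma


def decompose(series: List[float], kernel: int = 3) -> Tuple[List[float], List[float]]:
    """DLinear-style split into a moving-average trend and a remainder.

    Assumes an odd kernel; the series is edge-padded so the trend keeps
    the input length.
    """
    half = kernel // 2
    padded = [series[0]] * half + list(series) + [series[-1]] * half
    trend = [mean(padded[i:i + kernel]) for i in range(len(series))]
    remainder = [x - t for x, t in zip(series, trend)]
    return trend, remainder


def make_patches(series: List[float], patch_len: int, stride: int) -> List[List[float]]:
    """Cut the series into (possibly overlapping) fixed-length patches."""
    return [series[i:i + patch_len] for i in range(0, len(series) - patch_len + 1, stride)]
```

In a full model, each trend and remainder patch would be mapped by learned linear layers to the forecast horizon, and the forecast would be de-normalized with the stored `mu` and `sigma`.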
The purpose of this study is to examine the applicability of digital, intelligent real-time image processing (IP) to fitness motion detection in an Internet of Things (IoT) environment. Given the absence of real-time training standards and the risk of injury during workouts, an intelligent real-time fitness IP system based on deep learning (DL) is implemented. Specifically, keyframes are collected from the fitness monitoring video, and a DL algorithm is introduced to analyze the fitness motions. The performance of the proposed system is then evaluated through simulation. The noise reduction (NR) performance of the proposed algorithm is measured by the Peak Signal-to-Noise Ratio (PSNR), which remains above 20 dB for severely noisy images (noise density up to 90%). By comparison, the PSNR of the Standard Median Filter (SMF) and Ranked-order Based Adaptive Median Filter (RAMF) algorithms is no higher than 10 dB. Meanwhile, the proposed algorithm reaches a detection accuracy of 97.80%, outperforming other DL algorithms by more than 2.24%, and the system adaptively detects fitness motions with a transmission delay of no more than 1 s for up to 750 keyframes. Therefore, the proposed DL-based intelligent real-time fitness IP algorithm offers strong robustness, high detection accuracy, and effective real-time image diagnosis and processing, providing an experimental reference for the digitalization and intelligent development of sports.
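The comparison above is expressed in PSNR, defined as PSNR = 10 log10(MAX² / MSE). A minimal sketch for 8-bit grayscale images stored as nested lists (the 255 peak assumes 8-bit data), together with the standard 3 × 3 median filter used as the SMF baseline, might look like this; the proposed DL denoiser itself is not reproduced.

```python
import math
from statistics import median
from typing import List

Image = List[List[int]]


def psnr(reference: Image, test: Image, max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    diffs = [(r - t) ** 2 for row_r, row_t in zip(reference, test) for r, t in zip(row_r, row_t)]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


def median_filter_3x3(image: Image) -> Image:
    """Standard median filter (SMF baseline) with edge pixels replicated."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # Gather the 3x3 neighbourhood, clamping coordinates at the borders
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(median(window))
        out.append(row)
    return out
```

A single salt-noise pixel in a flat patch is removed by the median filter, which then yields an infinite PSNR against the clean patch; at 90% noise density, however, most windows are dominated by noise, which is consistent with the low SMF figures reported above.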
Blind lanes (tactile paving) must be located correctly for blind people to travel safely. To address the low accuracy and slow speed of traditional blind-lane image segmentation algorithms, a semantic segmentation method based on SegNet and MobileNetV3 is proposed. The main idea is to replace the encoder of the original SegNet model with the feature extraction part of MobileNetV3 and to remove the pooling layer. Blind-lane images were collected through online search and self-shooting, manually annotated with the LabelMe software, and used for training with the TensorFlow deep learning framework. The experimental results show that the improved model achieves high segmentation accuracy and recognition speed: the pixel accuracy of blind-lane segmentation is 98.21%, the mean intersection over union is 96.29%, and the average time to process a 416 × 416 image is 0.057 s, which meets the real-time requirements of a blind guidance system.
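Pixel accuracy and mean intersection over union (mIoU) are both derived from a per-pixel confusion matrix. A minimal sketch, assuming flattened label maps with integer class IDs (for example 0 = background, 1 = blind lane), might look like this:

```python
from typing import List, Sequence

Matrix = List[List[int]]


def confusion_matrix(pred: Sequence[int], truth: Sequence[int], num_classes: int) -> Matrix:
    """cm[t][p] counts pixels of true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, truth):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm: Matrix) -> float:
    """Fraction of all pixels whose predicted class matches the label."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(map(sum, cm))
    return correct / total


def mean_iou(cm: Matrix) -> float:
    """Mean over classes of TP / (TP + FP + FN)."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[t][c] for t in range(n)) - tp  # predicted c, labelled otherwise
        fn = sum(cm[c]) - tp                       # labelled c, predicted otherwise
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and label
            ious.append(tp / denom)
    return sum(ious) / len(ious)
```

Because mIoU penalizes false positives and false negatives for each class separately, it is usually lower than pixel accuracy, as in the 96.29% versus 98.21% reported above.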
Blurred images reduce the accuracy of many computer vision tasks, so deblurring is important. A common multi-stage network can capture more image detail, but at a higher computational cost than a single-stage network; a single-stage network, however, cannot capture multi-scale information well. To tackle this problem, a novel convolutional encoder-decoder-restorer architecture is proposed. The encoder uses a multi-scale input structure, and an improved supervised attention module is inserted into it for enhanced feature acquisition. In the decoder, an information supplement block is proposed to fuse multi-scale features. Finally, the restorer uses the fused features for image recovery. To optimize the model in multiple domains, the loss function is computed separately in the spatial and frequency domains. Our method is compared with existing methods on the GOPRO dataset. In addition, to verify its applicability, we conduct experiments on the real image dataset, the VOC2007 dataset, and the LFW dataset. Experimental results show that our method outperforms state-of-the-art deblurring methods and improves the accuracy of different vision tasks.
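A loss "computed separately in the spatial and frequency domains" can be sketched as an L1 term on pixels plus a weighted L1 term on Fourier coefficients. The abstract does not give the exact formulation, so the frequency term (mean magnitude of the complex difference) and the 0.1 weight below are assumptions. A naive pure-Python DFT keeps the example dependency-free and is only suitable for tiny images.

```python
import cmath
from typing import List, Sequence

Image = List[List[float]]


def dft2(image: Image) -> List[List[complex]]:
    """Naive 2D discrete Fourier transform; O((HW)^2), only for tiny examples."""
    h, w = len(image), len(image[0])
    return [
        [
            sum(
                image[y][x] * cmath.exp(-2j * cmath.pi * (u * y / h + v * x / w))
                for y in range(h)
                for x in range(w)
            )
            for v in range(w)
        ]
        for u in range(h)
    ]


def mean_abs_diff(a: Sequence[Sequence[complex]], b: Sequence[Sequence[complex]]) -> float:
    """Mean absolute (magnitude) difference between two equally shaped grids."""
    diffs = [abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def spatial_frequency_loss(restored: Image, sharp: Image, freq_weight: float = 0.1) -> float:
    """L1 in the pixel domain plus a weighted L1 on DFT coefficients (assumed form)."""
    spatial = mean_abs_diff(restored, sharp)
    frequency = mean_abs_diff(dft2(restored), dft2(sharp))
    return spatial + freq_weight * frequency
```

The frequency term penalizes missing high-frequency content (edges, texture) that a pixel-wise loss tends to under-weight, which is the usual motivation for pairing the two.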
Deep learning models have recently performed remarkably well on many classification tasks. The superior performance of deep neural networks relies on a large amount of training data, which at the same time must have an equ...
In Advanced Driving Assistance Systems (ADAS), Automated Driving Systems (ADS), and Driver Assistance Systems (DAS), RGB camera sensors are extensively used for object detection, semantic segmentation, and object tracking. Despite their popularity due to low cost, RGB cameras are weakly robust in complex environments and perform particularly poorly in low-light conditions, which raises a significant concern. To address these challenges, multi-sensor fusion systems or specialized low-light cameras have been proposed, but their high cost makes them unsuitable for widespread deployment. Improvements in post-processing algorithms, on the other hand, offer a more economical and effective solution. However, current low-light image enhancement research still shows substantial gaps in detail enhancement on nighttime driving datasets and has high deployment costs, failing to achieve real-time inference and edge deployment. This paper therefore combines the Swin Vision Transformer with a U-Net that integrates a gamma transformation for decoupled enhancement of the initial low-light input, and proposes a deep learning enhancement network named Vehicle-based Efficient Low-light image Enhancement (VELIE). VELIE achieves state-of-the-art performance on various driving datasets with a processing time of only 0.19 s, significantly enhancing high-dimensional environmental perception tasks in low-light conditions.
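The classical gamma transformation maps a normalized intensity r in [0, 1] to r^γ, so γ < 1 brightens dark regions while leaving black and white fixed. A minimal sketch on 8-bit pixel values follows; how VELIE integrates or learns γ inside its U-Net is not described in the abstract and is not reproduced here.

```python
from typing import List


def gamma_transform(pixels: List[int], gamma: float, max_value: int = 255) -> List[int]:
    """Apply out = max * (in / max) ** gamma to integer pixel values."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    return [round(max_value * (p / max_value) ** gamma) for p in pixels]
```

For real-time use on full frames, the same mapping is typically precomputed once as a 256-entry lookup table and applied per pixel.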