ISBN (print): 9798400716553
Modern smartphones usually have automatic camera adjustment features that predetermine how images will be processed. Without intervention from the user (e.g., manual adjustment of exposure settings or addition/removal of certain image filters), the predetermined camera settings dictate the look and feel of the images taken. Since higher-end mobile devices tend to produce a more visually appealing style and clearer images, image enhancement on entry-level devices can be performed by transferring the style of a higher-end device to a lower-end one. This paper proposes a learning-based, style-driven image enhancement method for entry-level devices. Using a deep residual style transfer network, we train a model that learns the relationship between images taken with a high-end device and those taken with an entry-level device, creating a filter that can enhance images captured on an entry-level device. Our quantitative and qualitative analyses show that the proposed method can enhance images to match the quality produced by higher-end mobile device cameras.
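The core building block of such deep residual networks is the skip connection: the network learns only the *difference* between the entry-level and high-end styles. A minimal NumPy sketch of that idea (the layer shapes, names, and weights below are illustrative, not the paper's actual architecture):

```python
import numpy as np

def residual_block(x, w1, w2):
    """One residual block: the output adds the input back onto a learned
    transform, so the network only models the style difference."""
    h = np.maximum(0.0, x @ w1)   # ReLU after the first linear map
    return x + h @ w2             # skip connection preserves the input

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))            # 4 "pixels", 16 features each
w1 = rng.standard_normal((16, 16)) * 0.01
w2 = rng.standard_normal((16, 16)) * 0.01
y = residual_block(x, w1, w2)
print(y.shape)                              # (4, 16)
```

Note that with all-zero weights the block reduces to the identity, which is what makes residual stacks easy to train: each block starts near "change nothing" and learns only a small correction.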
Application of artificial intelligence methods in agriculture is gaining research attention, with a focus on improving planting, harvesting, and post-harvesting. Fruit quality recognition is crucial for farmers during harvesting and sorting, for food retailers for quality monitoring, and for consumers for freshness evaluation. However, there is a lack of multi-fruit datasets to support real-time fruit quality evaluation. To address this gap, we present a new dataset of fruit images aimed at evaluating fruit freshness. The dataset contains images of 11 fruits categorized into three freshness classes, and five well-known deep learning models (ShuffleNet, SqueezeNet, EfficientNet, ResNet-18, and MobileNet-v2) were adopted as baselines for fruit quality recognition on the dataset. The study provides a benchmark dataset for the classification task, which could advance research in fruit quality recognition. The dataset is systematically organized and annotated, making it suitable for testing the performance of state-of-the-art methods and new classifiers. The research community in computer vision, machine learning, and pattern recognition could apply it to tasks such as fruit classification and fruit quality recognition. The best classifier was ResNet-18, with an overall accuracy of 99.8%. The study also identified limitations, such as the small size of the dataset, and proposed future work to improve deep learning techniques for fruit quality classification.
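When benchmarking classifiers on a three-class freshness dataset like this, overall accuracy can hide per-class imbalance, so it is worth computing both. A small sketch with hypothetical class labels (the label names below are illustrative, not taken from the dataset):

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Overall accuracy, as reported for the baseline classifiers."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def per_class_accuracy(y_true, y_pred):
    """Accuracy split by freshness class, to expose class imbalance."""
    totals, hits = Counter(y_true), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}

# Toy predictions over three hypothetical freshness classes.
y_true = ["fresh", "fresh", "medium", "rotten", "rotten", "rotten"]
y_pred = ["fresh", "medium", "medium", "rotten", "rotten", "fresh"]
print(accuracy(y_true, y_pred))            # 4/6 correct
print(per_class_accuracy(y_true, y_pred))
```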
For a traditional traffic situational awareness system (TSAS), the "road-side unit (RSU) + cloud-based analysis" structure struggles to meet the demands of rapidly expanding urban areas: the relatively high cost of microwave speed detection modules and the bandwidth requirements of the information systems significantly increase construction costs. With computer vision (CV) and edge computing technologies, traffic situational awareness tasks can instead be integrated into cheaper edge devices (roadside surveillance, RSS), effectively addressing these challenges. In this study, we present a low-cost TSAS built on YOLO v8 and a grey wolf optimizer-long short-term memory (GWO-LSTM) neural network. The proposed system automatically performs vehicle and license plate recognition, speed measurement, and data recording within the field of view of the RSS, and it accurately predicts the future traffic conditions of monitored roads using the recorded information. Experimental results demonstrate that the proposed TSAS achieves a license plate recognition accuracy of 97.7%, a vehicle type recognition accuracy of 98.1%, and a speed measurement error of less than 0.45 km/h, with an R2 of 0.8971 for the GWO-LSTM predictions. The system is sufficiently effective for traffic monitoring and situational awareness tasks, though not for enforcement forensics applications.
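The grey wolf optimizer used to tune the LSTM is a simple population-based metaheuristic: the three best "wolves" (alpha, beta, delta) pull the rest of the pack toward them while an exploration coefficient decays. A minimal sketch, minimizing a test function rather than LSTM hyperparameters (population size, iteration count, and bounds below are arbitrary choices, not the paper's settings):

```python
import numpy as np

def gwo(f, dim, n_wolves=20, iters=200, lo=-5.0, hi=5.0, seed=0):
    """Minimal grey wolf optimizer (GWO) for minimizing f over a box."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lo, hi, (n_wolves, dim))
    for t in range(iters):
        fit = np.array([f(x) for x in X])
        alpha, beta, delta = X[np.argsort(fit)[:3]]   # three best wolves
        a = 2.0 * (1 - t / iters)                     # decays 2 -> 0
        for i in range(n_wolves):
            cand = []
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                D = np.abs(C * leader - X[i])          # distance to leader
                cand.append(leader - A * D)
            X[i] = np.clip(np.mean(cand, axis=0), lo, hi)
    fit = np.array([f(x) for x in X])
    return X[np.argmin(fit)]

best = gwo(lambda x: np.sum(x ** 2), dim=3)   # sphere function, optimum at 0
print(np.sum(best ** 2))
```

In the paper's setting, f would instead score an LSTM configuration (e.g., via validation loss), with each dimension of a wolf encoding one hyperparameter.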
Subject of study. The study investigated the possibility of using neural network models of second-order visual mechanisms as inputs for neural network classifiers. Second-order visual mechanisms can detect spatial inhomogeneities in the contrast, orientation, and spatial frequency of an image. These mechanisms are traditionally considered one of the stages of early visual processing; their role in the perception of textures has been well studied. Aim of study. The study aimed to investigate whether classifier input modules pretrained to demodulate the spatial modulations of luminance gradients contribute to object and scene classification. Method. Neural network modeling was used as the main method. In the first stage of the study, a set of texture images was generated to train neural network models of second-order visual mechanisms. In the second stage, object and scene samples were prepared, on which classifier networks were trained. Pretrained models of second-order visual mechanisms with fixed weights served as the inputs of these networks. Main results. The second-order information, presented as a map of instantaneous values of the modulation function of contrast, orientation, and spatial frequency, was sufficient for identifying only some of the scene classes. In general, using the values of luminance gradient modulation functions for object classification proved ineffective within the proposed neural network architecture. Thus, the hypothesis that second-order visual filters encode features enabling object identification was not confirmed. This result makes it necessary to test an alternative hypothesis: that the role of second-order filters is limited to constructing saliency maps, the filters acting as windows through which information is received from the first-order filter outputs. Practical significance. The possibility of using second-order models of visual mechanism
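Second-order mechanisms of the kind described are commonly modeled as a filter-rectify-filter cascade: rectifying a fast carrier and then low-pass filtering recovers the slow modulation envelope. A 1-D illustration of that demodulation step (frequencies and kernel size are arbitrary, chosen only so the carrier is much faster than the envelope):

```python
import numpy as np

n = 2048
t = np.arange(n) / n
envelope = 1.0 + 0.5 * np.sin(2 * np.pi * 3 * t)   # slow contrast modulation
carrier = np.sin(2 * np.pi * 200 * t)              # fast luminance carrier
signal = envelope * carrier                        # contrast-modulated input

rectified = np.abs(signal)                         # rectify
kernel = np.ones(64) / 64                          # crude low-pass filter
recovered = np.convolve(rectified, kernel, mode="same")

# The recovered signal tracks the true modulation function up to a
# constant scale factor introduced by rectification.
corr = np.corrcoef(recovered[100:-100], envelope[100:-100])[0, 1]
print(round(corr, 3))
```

A map of such recovered envelope values, computed per location and per carrier orientation/frequency channel, is essentially the "map of instantaneous values of the modulation function" the study fed to its classifiers.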
Image captioning is a relatively recent area at the convergence of computer vision and natural language processing, and is widely used in applications such as multi-modal search, robotics, security, remote sensing, medicine, and visual aid. Image captioning techniques have witnessed a paradigm shift from classical machine-learning-based approaches to contemporary deep-learning-based techniques. In this survey, we present an in-depth investigation of image captioning methodologies using our proposed taxonomy. The study covers several eras of image captioning advancements, including template-based, retrieval-based, and encoder-decoder-based models. We also explore captioning in languages other than English. A thorough review of benchmark image captioning datasets and assessment measures is also provided. The limited effectiveness of real-time image captioning is a severe barrier that prevents its use in sensitive applications such as visual aid, security, and medicine. Another observation from our research is the scarcity of personalized domain datasets, which limits adoption in more advanced settings. Despite influential contributions from several academics, further efforts are required to construct substantially more robust and reliable image captioning models.
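The encoder-decoder models surveyed share one inference-time skeleton: the decoder conditions on the encoded image and the tokens emitted so far, and greedily picks the next word until an end token. A toy sketch of that loop, where a lookup table stands in for a trained decoder (the table, tokens, and caption are all illustrative):

```python
def toy_decoder_step(image_feature, tokens):
    """Stand-in for one decoder step of a trained captioning model:
    maps (image feature, prefix) -> most probable next token.
    This hypothetical transition table is NOT a trained model."""
    table = {
        ("<bos>",): "a",
        ("<bos>", "a"): "dog",
        ("<bos>", "a", "dog"): "running",
        ("<bos>", "a", "dog", "running"): "<eos>",
    }
    return table.get(tuple(tokens), "<eos>")

def greedy_caption(image_feature, max_len=10):
    """Greedy decoding: append the top token until <eos> or max_len."""
    tokens = ["<bos>"]
    while len(tokens) < max_len:
        nxt = toy_decoder_step(image_feature, tokens)
        tokens.append(nxt)
        if nxt == "<eos>":
            break
    return " ".join(tokens[1:-1])   # strip <bos>/<eos>

print(greedy_caption(image_feature=None))   # a dog running
```

Real systems usually replace the greedy choice with beam search, keeping the k best prefixes at each step; the loop structure is otherwise the same.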
In the realm of mechanical machining, tool wear is an unavoidable phenomenon. Monitoring the condition of tool wear is crucial for enhancing machining quality and advancing automation in the manufacturing process. This paper investigates an innovative approach to tool wear monitoring that integrates machine vision with force signal analysis. It relies on a deep residual two-stream convolutional model optimized with the scSE (concurrent spatial and channel squeeze and excitation) attention mechanism (scSE-ResNet-50-TSCNN). The force signals are converted into the corresponding wavelet scale images following wavelet threshold denoising and a continuous wavelet transform. Concurrently, the images undergo processing with contrast-limited adaptive histogram equalization and the structural similarity index method, allowing the selection of the most suitable image inputs. The processed data are then input into the developed scSE-ResNet-50-TSCNN model for precise identification of the tool wear state. To validate the model, the paper employed X850 carbon fibre reinforced polymer and Ti-6Al-4V titanium alloy as laminated experimental materials, conducting a series of tool wear tests while collecting the relevant machining data. The experimental results underscore the model's effectiveness, achieving a recognition accuracy of 93.86%. Compared with alternative models on the identical dataset, the proposed approach performs best, showcasing efficient monitoring capabilities in contrast to single-stream or unoptimized networks. Consequently, it excels in monitoring tool wear status and provides crucial technical support for enhancing machining quality control and advancing intelligent manufacturing.
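The wavelet threshold denoising step applied to the force signals typically uses soft thresholding: wavelet coefficients are shrunk toward zero by a threshold, zeroing the small (mostly noise) coefficients while preserving the large ones. A minimal sketch of that shrinkage rule (the threshold value here is arbitrary; in practice it is often derived from the noise estimate, e.g., a universal threshold):

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Soft thresholding: shrink each coefficient toward zero by t,
    mapping anything with magnitude below t to exactly zero."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

c = np.array([-3.0, -0.2, 0.05, 0.4, 2.5])   # toy wavelet coefficients
print(soft_threshold(c, 0.3))                 # [-2.7  0.  0.  0.1  2.2]
```

In the paper's pipeline this would be applied to the detail coefficients before the continuous wavelet transform produces the scale images fed to the network.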
Traditional machine learning, mainly supervised learning, follows the assumptions of closed-world learning, i.e., for each testing class, a training class is available. Such machine learning models fail to identify classes that were not available at training time, which can be referred to as unseen classes. Open-world machine learning (OWML) is a novel technique that deals with unseen classes. Although OWML has been around for a few years and many significant research works have been carried out in this domain, there is no comprehensive survey of the characteristics, applications, and impact of OWML on the major research areas. In this article, we aim to capture the different dimensions of OWML with respect to traditional machine learning models. We have thoroughly analyzed the existing literature and provide a novel taxonomy of OWML considering its two major application domains: computer vision and natural language processing. We list the available software packages and open datasets in OWML for future researchers. Finally, the article concludes with a set of research gaps, open challenges, and future directions.
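The simplest open-world rejection strategy, and a common baseline in this literature, is to threshold the classifier's confidence: if the top softmax score falls below a cutoff, the input is labeled as belonging to an unseen class rather than forced into a known one. A sketch (the class names, logits, and threshold are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())        # subtract max for numerical stability
    return e / e.sum()

def open_world_predict(logits, classes, threshold=0.7):
    """Reject-at-threshold rule: low top-class confidence -> 'unknown'."""
    p = softmax(np.asarray(logits, dtype=float))
    i = int(np.argmax(p))
    return classes[i] if p[i] >= threshold else "unknown"

classes = ["cat", "dog", "car"]
print(open_world_predict([4.0, 0.1, 0.2], classes))   # confident -> cat
print(open_world_predict([1.0, 0.9, 1.1], classes))   # uncertain -> unknown
```

More sophisticated OWML methods replace this flat threshold with calibrated open-set scores or distance-based rejection, but the decision structure (classify or reject) is the same.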
Low-light image enhancement is highly desirable for outdoor image processing and computer vision applications. Research conducted in recent years has shown that images taken in low-light conditions often pose two main...
Models based on the transformer architecture have seen widespread application across fields such as natural language processing (NLP), computer vision, and robotics, with large language models (LLMs) like ChatGPT revolutionizing machine understanding of human language and demonstrating impressive memory and reproduction capabilities. Traditional machine learning algorithms struggle with catastrophic forgetting, which is detrimental to the diverse and generalized abilities required for robotic deployment. This article investigates the Receptance Weighted Key Value (RWKV) framework, known for efficient and effective sequence modeling, and its integration with the decision transformer (DT) and experience replay architectures, focusing on potential performance gains in sequential decision-making and lifelong robotic learning tasks. We introduce the Decision-RWKV (DRWKV) model and conduct extensive experiments using the D4RL database within the OpenAI Gym environment and on the D'Claw platform to assess the DRWKV model's performance in single-task tests and lifelong learning scenarios, showing its ability to handle multiple subtasks efficiently. The code for all algorithms, training, and image rendering in this study is available online (open source).
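The experience replay component mentioned here is conceptually simple: past transitions are stored in a bounded buffer and re-sampled during training so that earlier tasks keep contributing gradients, mitigating catastrophic forgetting. A minimal stdlib sketch (capacity, transition format, and batch size are illustrative, not the paper's settings):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal experience replay buffer: keeps the most recent
    transitions and re-samples them uniformly for training."""
    def __init__(self, capacity, seed=0):
        self.buf = deque(maxlen=capacity)   # oldest entries fall out
        self.rng = random.Random(seed)

    def add(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        return self.rng.sample(list(self.buf), batch_size)

buf = ReplayBuffer(capacity=100)
for step in range(250):
    buf.add(("state", "action", float(step)))   # toy transitions
print(len(buf.buf))                 # capped at 100 (steps 150..249 remain)
batch = buf.sample(8)
print(len(batch))                   # 8
```

In lifelong learning setups, sampling is often stratified across tasks instead of uniform, so old tasks are not crowded out as the buffer fills with new experience.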
In the current subsea industry, autonomous underwater vehicles (AUVs) are widely used for expeditions and exploration. However, mission duration is limited by battery capacity. To increase endurance, a submerged docking station (DS) is needed to charge the battery and upload the next mission profile. In this letter, deep learning (DL)-aided short-range vision guidance is envisaged for reliable and precise AUV homing. Intelligent control algorithms with efficient DL-based you only look once (YOLO) v5 image processing techniques are used for DS detection and tracking, deployed on an edge computer integrated into an AUV prototype. The developed illuminated DS and the AUV prototype with a high-definition camera were demonstrated in a test tank at a depth of 2 m. The DS dataset comprised 132 images of clear and turbid water, of which 79 were used for training, 40 for validation, and 13 for testing. The results show that, in less-turbid waters, the probability of detecting the DS is 95% at a detection range of 5 m, and the probability of homing toward the DS is CEP 90 with a position error of 5%; in highly turbid waters, the probability of DS detection is 60%, with a position error of up to 25% and a detectable range of 1 m. The proposed embedded hardware is extremely useful for reliable underwater homing applications.
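The CEP 90 figure quoted for homing is a circular error probable: the radius of the circle, centered on the target, that contains 90% of the 2-D position errors. A short sketch of how it is computed from logged errors (the simulated error distribution below is illustrative, not the letter's data):

```python
import numpy as np

def cep(errors_xy, percentile=90):
    """Circular error probable: radius containing `percentile` percent
    of the 2-D position errors around the target."""
    r = np.linalg.norm(np.asarray(errors_xy, dtype=float), axis=1)
    return float(np.percentile(r, percentile))

rng = np.random.default_rng(1)
errors = rng.normal(0.0, 0.1, size=(500, 2))   # simulated homing errors (m)
print(round(cep(errors, 90), 3))
```

For isotropic Gaussian errors of standard deviation sigma, CEP 90 is about 2.15 sigma, which is a quick sanity check on logged trial data.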