Photo enhancement is a long-standing and challenging problem in the image processing community. Despite significant achievements in recent years, many existing methods are built on supervised learning and thus require expertise in constructing a huge collection of paired data, which is a well-known obstacle because acquiring such data in real life can be impractical. We address this issue by proposing a multi-scale GAN framework that can be trained in an unsupervised fashion. Notably, we unify the design principle of the generator and discriminator in our framework so as to maximize the ability to learn deep latent representations. Specifically, rather than maintaining content consistency through a complicated two-way loss, we present a one-way loss that measures the content distance between multi-scale latent representations of inputs and outputs, which speeds up training by 1.7x. Furthermore, we redesign the discriminator in a multi-scale, multi-stage manner to strengthen adversarial learning: the main discriminator produces multiple latent features at varying scales, and these features are then sent to auxiliary discriminators for final recognition. Extensive experiments on the well-known MIT-Adobe FiveK and HDR+ datasets demonstrate that the proposed multi-scale representation learning framework achieves outstanding performance on the photo enhancement task.
Quality assessment of pome fruits (i.e., apples and pears) is used not only to determine the optimal harvest time but also to track the progression of fruit-quality attributes during storage. Therefore, it is typical to evaluate fruits repeatedly during the course of a postharvest experiment. This evaluation often includes careful visual assessment of fruit for apparent defects and physiological symptoms. A general best practice for quality assessment is to rate fruit using the same individual rater or group of raters to reduce bias. However, such consistency across labs, facilities, and experiments is often not feasible or attainable. Moreover, while these visual assessments are critical empirical data, they are often coarse-grained and lack consistent objective criteria. Granny is a tool designed for rating fruit using machine learning and image processing to address rater bias and improve resolution. Additionally, Granny supports backward compatibility by providing ratings compatible with long-established standards and references, promoting research program continuity. Current Granny ratings include starch content assessment, rating levels of peel defects, and peel color analyses. Integrative analyses enhanced by Granny's improved resolution and reduced bias, such as linking fruit outcomes to global-scale -omics data, environmental changes, and other quantitative fruit quality metrics like soluble solids content and flesh firmness, will further enrich our understanding of fruit quality dynamics. Lastly, Granny is open-source and freely available.
Existing pre-trained models like Distil HuBERT excel at uncovering hidden patterns and facilitating accurate recognition across diverse data types, such as audio and visual information. We harnessed this capability to develop a deep learning model that utilizes Distil HuBERT for jointly learning these combined features in speech emotion recognition (SER). Our experiments highlight its distinct advantages: it significantly outperforms Wav2vec 2.0 in both offline and real-time accuracy on the RAVDESS and BAVED datasets. Although it slightly trails HuBERT's offline accuracy, Distil HuBERT delivers comparable performance at a fraction of the model size, making it an ideal choice for resource-constrained environments like mobile devices. The smaller size does come with a slight trade-off in accuracy. In offline evaluation, Distil HuBERT achieved 96.33% accuracy on the BAVED database and 87.01% on the RAVDESS database. In real-time evaluation, the accuracy decreased to 79.3% on the BAVED database and 77.87% on the RAVDESS database. This decrease is likely a result of the challenges associated with real-time processing, including latency and noise, but the results still demonstrate strong performance in practical scenarios. Therefore, Distil HuBERT emerges as a compelling choice for SER, especially when prioritizing accuracy over real-time processing. Its compact size further enhances its potential for resource-limited settings, making it a versatile tool for a wide range of applications.
We introduce an interpretable deep-learning (DL) approach for direction-of-arrival (DOA) estimation with a single snapshot. Classical subspace-based methods, such as multiple signal classification (MUSIC) and estimation of signal parameters via rotational invariance techniques (ESPRIT), use spatial smoothing on uniform linear arrays (ULAs) for single-snapshot DOA estimation but suffer from reduced array aperture and are inapplicable to sparse arrays. Single-snapshot methods, such as compressive sensing (CS) and the iterative adaptive approach (IAA), encounter challenges with high computational cost and slow convergence, hampering real-time use. Recent DL DOA methods offer promising accuracy and speed. However, the practical deployment of deep networks is hindered by their black-box nature. To address this, we propose a deep minimum power distortionless response (MPDR) network that translates an MPDR-type beamformer into DL, enhancing generalization and efficiency. Comprehensive experiments conducted using both simulated and real-world datasets substantiate its advantage in inference time and accuracy over conventional methods. Moreover, it excels in efficiency, generalizability, and interpretability when contrasted with other DL DOA estimation networks.
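For readers unfamiliar with the beamformer named above, the textbook MPDR formulation can be written as follows. This is the classical definition only, not the network described in the abstract; the symbols (array covariance R, steering vector a(θ), weight vector w) are notation assumed here for illustration.

```latex
% Classical MPDR beamformer (textbook form; symbols are assumed notation)
% Minimize output power subject to unit gain in the look direction \theta:
\min_{\mathbf{w}} \; \mathbf{w}^{H}\mathbf{R}\,\mathbf{w}
\quad \text{s.t.} \quad \mathbf{w}^{H}\mathbf{a}(\theta) = 1
% Closed-form solution:
\mathbf{w}_{\mathrm{MPDR}}(\theta)
  = \frac{\mathbf{R}^{-1}\mathbf{a}(\theta)}
         {\mathbf{a}^{H}(\theta)\,\mathbf{R}^{-1}\mathbf{a}(\theta)}
% Resulting spatial power spectrum, whose peaks indicate the DOAs:
P(\theta) = \frac{1}{\mathbf{a}^{H}(\theta)\,\mathbf{R}^{-1}\mathbf{a}(\theta)}
```

With a single snapshot x, the sample covariance x xᴴ has rank one and cannot be inverted directly, so some form of regularization (for example diagonal loading) is needed in the classical setting. How the proposed network handles this is not stated in the abstract.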
In this study, we propose an efficient fusion framework that utilizes deep learning and a genetic algorithm for the classification of femoral neck fracture images. This is the first study to utilize a genetic algorithm (GA) to optimize the architecture of a convolutional neural network (CNN) model for the classification of femoral neck fractures. The proposed CNN was trained on a large dataset of 10 000 real patient cases, all of whom underwent both skeletal bone mineral density measurement and hip X-ray at the University Hospital Center of Oujda between 2016 and 2023. The performance of the model was extensively evaluated and compared to various machine learning and deep learning models, including Random Forest, SVM, VGG19, ResNet50, InceptionV3, and EfficientNet. The experimental results demonstrate that the proposed CNN achieved an accuracy of 97%, and it is currently being used by seven doctors at the University Hospital Center of Oujda, Morocco.
Image segmentation plays a crucial role in the roadwork operations of autonomous line-painting machines. However, the limited resources of mobile platforms in intelligent line-painting applications pose a dual challenge of ensuring both accuracy and real-time performance in road segmentation. To address this issue, this study introduces a lightweight yet efficient image segmentation model, termed the SLTM Network. Central to this network is the lightweight SLTM module, which significantly reduces the model's parameter count and lowers the computational overhead of the decoder. To enhance the interplay of information at different spatial resolutions, the network incorporates an SE attention-enhanced upsampling module (SAUM) and employs a Spatial Attention Sequence (SAS) unit to improve global environment perception at a low computational cost. Comprehensive experimental evaluations on the Cityscapes dataset demonstrate that the SLTM Network excels in balancing speed and accuracy, achieving an mIoU of 70.5% with only 4.07M parameters and an inference speed of 267.1 FPS. On the embedded device Jetson Xavier NX, it achieves an inference speed of 34.2 FPS. Compared to existing lightweight image segmentation models, the SLTM Network exhibits significant advantages in both processing speed and accuracy, making it particularly suitable for real-time autonomous line-painting machine applications.
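The mIoU figure quoted above is the mean, over classes, of intersection over union between predicted and ground-truth label maps. A minimal sketch of that metric on flattened label lists follows; the function name and the toy labels are hypothetical, and benchmark tooling typically accumulates the counts over a whole dataset rather than a single image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes present in pred or target.

    pred and target are flat sequences of integer class labels (assumed layout).
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, 3)
print(score)
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5.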
Physiological signals obtained from electroencephalography (EEG), electromyography (EMG), and electrocardiography (ECG) provide valuable clinical information but pose challenges for analysis due to their high-dimensional nature. Traditional machine learning techniques, relying on hand-crafted features from fixed analysis windows, can lead to the loss of discriminative information. Recent studies have demonstrated the effectiveness of deep convolutional neural networks (CNNs) for robust automated feature learning from raw physiological signals. However, standard CNN architectures require two-dimensional image data as input. This has motivated research into innovative signal-to-image (STI) transformation techniques to convert one-dimensional time series into images preserving spectral, spatial, and temporal characteristics. This paper reviews recent advances in strategies for physiological signal-to-image conversion and their applications using CNNs for automated processing tasks. A systematic analysis of EEG, EMG, and ECG signal transformation and CNN-based analysis techniques is presented, spanning diverse applications including brain-computer interfaces, seizure detection, motor control, sleep stage classification, arrhythmia detection, and more. Key insights are synthesized regarding the relative merits of different transformation approaches, CNN model architectures, training procedures, and benchmark performance. Current challenges and promising research directions at the intersection of deep learning and physiological signal processing are discussed. By providing a comprehensive overview of state-of-the-art techniques, this review aims to catalyze continued innovation in effective end-to-end systems that use convolutional neural networks to extract clinically relevant information from multidimensional physiological data.
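To make the idea of a signal-to-image transformation concrete, here is a minimal sketch of one well-known example, the thresholded recurrence plot, which turns a 1-D series of length n into an n x n binary image. The abstract does not say which transformations the review covers, so this choice, the function name, and the threshold `eps` are assumptions made purely for illustration.

```python
def recurrence_plot(signal, eps):
    """Binary recurrence plot: pixel (i, j) is 1 when samples i and j are within eps.

    signal is a list of floats; eps is a hypothetical distance threshold.
    """
    n = len(signal)
    return [
        [1 if abs(signal[i] - signal[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy periodic signal (hypothetical values).
sig = [0.0, 1.0, 0.0, -1.0, 0.0]
img = recurrence_plot(sig, 0.1)
for row in img:
    print(row)
```

The resulting matrix is symmetric with ones on the diagonal, and repeated values in the series show up as off-diagonal ones, which is the kind of temporal structure a 2-D CNN can then pick up.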
Mobile target tracking remains a significant issue in smart cities. Because targets change in complex ways over time and space, real-time tracking remains a challenging problem. To address this, this paper proposes a real-time tracking approach for moving objects that combines the advantages of the YOLOv7 and SORT algorithms. First, we use the YOLOv7 algorithm for object detection, which offers high accuracy and efficiency. Then, we apply the SORT algorithm to the target tracking stage, which estimates and updates the target state through Kalman filtering. The combination of the two parts is expected to achieve high-quality tracking of moving targets. This paper also presents experiments and analysis on image datasets. The experimental results show that the proposed algorithm achieves good performance in real-time tracking of moving targets. Compared with traditional methods, it predicts the position and trajectory of targets more accurately and has better real-time performance. In addition, the proposed algorithm is equally effective for target tracking in complex scenes, such as multi-target tracking and target occlusion. Future research can further optimize the performance of the algorithm to cope with more complex scenarios and problems.
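The Kalman filtering step mentioned above follows a predict/update cycle. The sketch below shows that cycle for a single coordinate with a constant-velocity model, assuming a state of [position, velocity] and a position-only measurement; the noise values `q` and `r` and the toy measurements are hypothetical, and a full tracker would apply the same idea to a bounding-box state per target.

```python
def kalman_predict(x, P, dt=1.0, q=1e-2):
    """Constant-velocity prediction for state x = [position, velocity]."""
    pos, vel = x
    x_pred = [pos + dt * vel, vel]
    # P_pred = F P F^T + Q, with F = [[1, dt], [0, 1]] and Q = q * I.
    p00, p01, p10, p11 = P[0][0], P[0][1], P[1][0], P[1][1]
    P_pred = [
        [p00 + dt * (p10 + p01) + dt * dt * p11 + q, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]
    return x_pred, P_pred


def kalman_update(x, P, z, r=1.0):
    """Correct the state with a position measurement z (H = [1, 0])."""
    y = z - x[0]                     # innovation
    s = P[0][0] + r                  # innovation variance
    k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain
    x_new = [x[0] + k0 * y, x[1] + k1 * y]
    # P_new = (I - K H) P
    P_new = [
        [(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
        [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
    ]
    return x_new, P_new


# Toy target moving 2 units per frame, observed without noise (hypothetical data).
x, P = [0.0, 0.0], [[10.0, 0.0], [0.0, 10.0]]
for z in [2.0 * k for k in range(10)]:
    x, P = kalman_predict(x, P)
    x, P = kalman_update(x, P, z)
print(x)
```

After ten frames the velocity estimate has moved from its initial guess of 0 toward the true value of 2, which is what lets the tracker predict where a target will be in the next frame before the detector reports it.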
This article presents a comprehensive survey of the integration of machine learning techniques into robotic grasping, with a special emphasis on the challenges and advancements for space applications. The incorporation of artificial intelligence, particularly through deep learning, reinforcement learning (RL), transfer learning, convolutional neural networks, and recurrent neural networks (RNNs), has significantly revolutionized robotic grasping. These advancements facilitate autonomous, efficient, and sophisticated manipulation in the challenging environment of outer space, transitioning from traditional mechanical grippers to sophisticated systems powered by advanced algorithms. This transition highlights the critical integration of sensory perception, grasp planning, and execution mechanisms, enhancing robots' capabilities to perceive, interact with, and manipulate objects with unprecedented precision and adaptability. The article outlines significant advancements achieved through the deployment of convolutional neural networks for visual information processing, RNNs for sequential decision-making, RL for autonomous strategy refinement, and transfer learning for leveraging pre-learned knowledge in novel tasks. These technologies address the unique challenges of space environments, such as varied textures, occlusions, microgravity conditions, and the sim-to-real gap, by enhancing sample efficiency, improving sim-to-real transfer capabilities, and integrating multimodal data for better object localization and pose estimation. Furthermore, the review explores the specific challenges faced in space robotic grasping, including handling varied textures and occlusions, adapting to unpredictable conditions, achieving real-time processing, and ensuring safety and reliability. It proposes future research directions focused on overcoming these hurdles, such as enhanced generalization through multimodal learning, robust sim-to-real transfer techniques, and the development of
Mineral recovery and product grade are two key performance indicators that define the profitability of an integrated mineral processing plant. Online monitoring and optimization of these parameters help improve process performance in real time. However, achieving high product grade and high mineral recovery simultaneously is challenging because the two objectives conflict. We have applied machine-learning and deep-learning algorithms to build models for predicting recovery and grade on an hourly and daily basis. We have further formulated and solved a multi-objective optimization problem that maximizes recovery and grade to obtain Pareto-optimal solutions using NSGA-II, a non-dominated sorting-based evolutionary algorithm. The results are useful for identifying the operating conditions under which a mineral processing plant achieves the optimum grade and recovery for a given feed grade and processing circuit.
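The notion of Pareto optimality used above can be illustrated with a minimal non-dominated filter for two objectives that are both maximized. This sketch covers only the dominance test and the first front; NSGA-II additionally ranks successive fronts and uses crowding distance within an evolutionary loop. The function names and the (recovery, grade) values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the points that no other point dominates (maximization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (recovery, grade) pairs for candidate operating points.
candidates = [(0.90, 0.60), (0.85, 0.70), (0.80, 0.65), (0.70, 0.80), (0.88, 0.58)]
front = pareto_front(candidates)
print(front)
```

Here (0.80, 0.65) is dominated by (0.85, 0.70) and (0.88, 0.58) by (0.90, 0.60), so three candidates remain. Each remaining point trades recovery against grade, which is why the optimization yields a set of solutions rather than a single optimum.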