Search Results - Inner Mongolia University Library

Learning Deep Representations for Photo Retouching

IEEE TRANSACTIONS ON MULTIMEDIA, 2024, vol. 26, pp. 3153-3163

Authors: Li, Di; Rahardja, Susanto. Affiliations: Northwestern Polytech Univ Ctr Intelligent Acoust & Immers Commun Sch Marine Sci & Technol Xian 710072 Peoples R China; Singapore Inst Technol Singapore 138683 Singapore

Photo enhancement is a long-standing and challenging problem in the image processing community. Despite significant achievements in recent years, many methods are built upon supervised learning and thus require expertise in constructing a huge collection of paired data, which is well known to be a problem because acquiring such data in real life can be impractical. We address this issue by proposing a multi-scale GAN framework that can be trained in an unsupervised fashion. Notably, we unify the design principle of the generator and discriminator in our framework so as to maximize the ability to learn deep latent representations. Specifically, rather than maintaining content consistency through a complicated two-way loss, we present a one-way loss that measures the content distance between multi-scale latent representations of inputs and outputs, which speeds up training by 1.7x. Furthermore, we redesign the discriminator in a multi-scale, multi-stage manner to strengthen the adversarial learning: the main discriminator produces multiple latent features at varying scales, and these features are then sent to auxiliary discriminators for final recognition. Extensive experiments have been conducted on the well-known MIT-Adobe FiveK and HDR+ datasets, and the results demonstrate that the proposed multi-scale representation learning framework shows outstanding performance in the photo enhancement task.

Keywords: Contrastive learning; deep learning; generative adversarial nets; image enhancement; unsupervised learning

Rating pome fruit quality traits using deep learning and image processing

PLANT DIRECT, 2024, vol. 8, no. 10, p. e70005

Authors: Nguyen, Nhan H.; Michaud, Joseph; Mogollon, Rene; Zhang, Huiting; Hargarten, Heidi; Leisso, Rachel; Torres, Carolina A.; Honaas, Loren; Ficklin, Stephen. Affiliations: Washington State Univ Dept Hort Pullman WA 99164 USA; ARS Physiol & Pathol Tree Fruits Res Unit Hood River W USDA Hood River OR USA; Washington State Univ Tree Fruit Res & Extens Ctr Dept Hort Wenatchee WA USA; ARS Physiol & Pathol Tree Fruits Res Unit USDA Wenatchee WA 98801 USA

Quality assessment of pome fruits (i.e., apples and pears) is used not only for determining the optimal harvest time but also for following the progression of fruit-quality attributes during storage. Therefore, it is typical to repeatedly evaluate fruits during the course of a postharvest experiment. This evaluation often includes careful visual assessments of fruit for apparent defects and physiological symptoms. A general best practice for quality assessment is to rate fruit using the same individual rater or group of individual raters to reduce bias. However, such consistency across labs, facilities, and experiments is often not feasible or attainable. Moreover, while these visual assessments are critical empirical data, they are often coarse-grained and lack consistent objective criteria. Granny is a tool designed for rating fruit using machine learning and image processing to address rater bias and improve resolution. Additionally, Granny supports backward compatibility by providing ratings compatible with long-established standards and references, promoting research program continuity. Current Granny ratings include starch content assessment, rating levels of peel defects, and peel color analyses. Integrative analyses enhanced by Granny's improved resolution and reduced bias, such as linking fruit outcomes to global-scale -omics data, environmental changes, and other quantitative fruit quality metrics like soluble solids content and flesh firmness, will further enrich our understanding of fruit quality dynamics. Lastly, Granny is open-source and freely available.

Keywords: machine learning; pome fruit; trait prediction

Self-supervised Learning for Speech Emotion Recognition Task Using Audio-visual Features and Distil Hubert Model on BAVED and RAVDESS Databases

Journal of Systems Science and Systems Engineering, 2024, vol. 33, no. 5, pp. 576-606

Authors: Karim Dabbabi; Abdelkarim Mars. Affiliations: Research Unite of Analyse and Processing of Electrical and Energetic Systems, Faculty of Sciences of Tunis, Tunis El-Manar University, 2092 Tunis, Tunisia; Research Laboratory in Algebra, Numbers Theory and Intelligent Systems, Faculty of Sciences of Monastir, 90 Mohamed V street, 5000 Monastir, Tunisia

Existing pre-trained models like Distil HuBERT excel at uncovering hidden patterns and facilitating accurate recognition across diverse data types, such as audio and visual information. We harnessed this capability to develop a deep learning model that utilizes Distil HuBERT for jointly learning these combined features in speech emotion recognition (SER). Our experiments highlight its distinct advantages: it significantly outperforms Wav2vec 2.0 in both offline and real-time accuracy on RAVDESS and BAVED datasets. Although slightly trailing HuBERT’s offline accuracy, Distil HuBERT shines with comparable performance at a fraction of the model size, making it an ideal choice for resource-constrained environments like mobile devices. This smaller size does come with a slight trade-off: Distil HuBERT achieved notable accuracy in offline evaluation, with 96.33% on the BAVED database and 87.01% on the RAVDESS database. In real-time evaluation, the accuracy decreased to 79.3% on the BAVED database and 77.87% on the RAVDESS database. This decrease is likely a result of the challenges associated with real-time processing, including latency and noise, but still demonstrates strong performance in practical scenarios. Therefore, Distil HuBERT emerges as a compelling choice for SER, especially when prioritizing accuracy over real-time processing. Its compact size further enhances its potential for resource-limited settings, making it a versatile tool for a wide range of applications.

Keywords: Wav2vec 2.0; Distil HuBERT; HuBERT; SER; audio and audio-visual features

Interpretable and Efficient Beamforming-Based Deep Learning for Single-Snapshot DOA Estimation

IEEE SENSORS JOURNAL, 2024, vol. 24, no. 14, pp. 22096-22105

Authors: Zheng, Ruxin; Sun, Shunqiao; Liu, Hongshan; Chen, Honglei; Li, Jian. Affiliations: Univ Alabama Dept Elect & Comp Engn Tuscaloosa AL 38457 USA; Univ Florida Dept Elect & Comp Engn Gainesville FL 32611 USA; Univ Florida Dept Elect & Comp Engn Gainesville FL 32611 USA

We introduce an interpretable deep-learning (DL) approach for direction-of-arrival (DOA) estimation with a single snapshot. Classical subspace-based methods, such as multiple signal classification (MUSIC) and estimation of parameters by rotational invariant technique (ESPRIT), use spatial smoothing on uniform linear arrays (ULAs) for single-snapshot DOA estimation but suffer from a reduced array aperture and are inapplicable to sparse arrays. Single-snapshot methods, such as compressive sensing (CS) and the iterative adaptive approach (IAA), encounter challenges with high computational costs and slow convergence, hampering real-time use. Recent DL DOA methods offer promising accuracy and speed. However, the practical deployment of deep networks is hindered by their black-box nature. To address this, we propose a deep-minimum power distortionless response (MPDR) network that translates an MPDR-type beamformer into DL, enhancing generalization and efficiency. Comprehensive experiments conducted using both simulated and real-world datasets substantiate its dominance in terms of inference time and accuracy in comparison with conventional methods. Moreover, it excels in terms of efficiency, generalizability, and interpretability when contrasted with other DL DOA estimation networks.

Keywords: Direction-of-arrival estimation; Estimation; Deep learning; Covariance matrices; Sensors; Mathematical models; Array signal processing; automotive radar; deep learning (DL); interpretability; single-snapshot direction-of-arrival (DOA) estimation
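
For readers unfamiliar with the beamformer named in the abstract above, the classical MPDR weights can be written as follows. This is the textbook formulation only, not the network proposed in the paper; the symbols are assumptions introduced here: R is the array covariance matrix, a(θ) the steering vector for direction θ, and w the beamformer weight vector.

```latex
% Classical MPDR: minimize output power subject to unit gain toward \theta
\min_{\mathbf{w}} \; \mathbf{w}^{H}\mathbf{R}\,\mathbf{w}
\quad \text{s.t.} \quad \mathbf{w}^{H}\mathbf{a}(\theta) = 1

% Closed-form weights and the resulting spatial power spectrum
\mathbf{w}_{\mathrm{MPDR}}(\theta)
  = \frac{\mathbf{R}^{-1}\mathbf{a}(\theta)}
         {\mathbf{a}^{H}(\theta)\,\mathbf{R}^{-1}\,\mathbf{a}(\theta)},
\qquad
P_{\mathrm{MPDR}}(\theta)
  = \frac{1}{\mathbf{a}^{H}(\theta)\,\mathbf{R}^{-1}\,\mathbf{a}(\theta)}
```

With a single snapshot, the sample covariance has rank one and cannot be inverted directly, which is one reason the single-snapshot setting is hard for covariance-based beamformers.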

Advanced CNN based on genetic algorithm to automated femoral neck fracture classification

SIGNAL IMAGE AND VIDEO PROCESSING, 2024, vol. 18, no. 6-7, pp. 5229-5238

Authors: Berrajaa, Achraf; Merras, Mostafa; Berrajaa, Issam. Affiliations: Univ Mohamed First Fac Sci Dept Comp Sci Oujda Morocco; Moulay Ismail Univ Sch Technol Meknes Morocco; Mohammed VI Univ Hosp Oujda Morocco

In this study, we propose an efficient fusion framework that utilizes deep learning and a genetic algorithm for the classification of femoral neck fracture images. This is the first study to utilize a genetic algorithm (GA) to optimize the architecture of a convolutional neural network (CNN) model for the classification of femoral neck fractures. The proposed CNN was trained on a large dataset of 10,000 real patient cases, all of whom underwent both skeletal bone mineral density measurement and hip X-ray at the University Hospital Center of Oujda between 2016 and 2023. The performance of the model was extensively evaluated and compared to various machine learning and deep learning models, including Random Forest, SVM, VGG19, ResNet50, InceptionV3, and EfficientNet. The experimental results demonstrate that the proposed CNN achieved an accuracy of 97%, and it is currently being used by seven doctors at the University Hospital Center of Oujda, Morocco.

Keywords: Femoral neck fractures; Smart model; Deep learning; CNN; Genetic algorithm

SLTM Network: Efficient Application of Lightweight Image Segmentation Technology in Detecting Drivable Areas for Unmanned Line-Marking Machines

IEEE ACCESS, 2024, vol. 12, pp. 169001-169012

Authors: Wang, Chao; Chen, Xiangkai; Wang, Bingtao; Zhang, Liang; Liu, Bing. Affiliations: Shandong Univ Sch Mech Elect & Informat Engn Weihai 264209 Peoples R China; Shandong Univ Off Acad Affairs Weihai 264209 Peoples R China

Image segmentation plays a crucial role in the roadwork operations of autonomous line-painting machines. However, the limited resources of mobile platforms in intelligent line-painting applications pose a dual challenge of ensuring both accuracy and real-time performance in road segmentation. To address this issue, this study introduces a lightweight yet efficient image segmentation model, termed the SLTM Network. Central to this network is the lightweight SLTM module, which significantly reduces the model's parameter count and lowers the computational overhead of the decoder. To enhance the interplay of information at different spatial resolutions, the network incorporates an SE attention-enhanced upsampling module (SAUM) and employs a Spatial Attention Sequence (SAS) unit to improve global environment perception at a low computational cost. Comprehensive experimental evaluations on the Cityscapes dataset demonstrate that the SLTM Network excels in balancing speed and accuracy, achieving an mIoU of 70.5% with only 4.07M parameters and an impressive inference speed of 267.1 FPS. On the embedded device Jetson Xavier NX, it achieves an inference speed of 34.2 FPS. Compared to existing lightweight image segmentation models, the SLTM Network exhibits significant advantages in both processing speed and accuracy, making it particularly suitable for real-time autonomous line-painting machine applications.

Keywords: Roads; Image segmentation; Feature extraction; Computer architecture; Computational efficiency; Synthetic aperture sonar; Semantics; Lightweight image segmentation; unmanned line-marking machines; deep learning in autonomous driving; embedded systems optimization; SLTM network; real-time road segmentation
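
The abstract above reports segmentation quality as mIoU (mean intersection over union). As a reminder of what that metric measures, here is a minimal sketch over flat lists of per-pixel class labels. The function name `mean_iou` and the toy labels are hypothetical, and benchmark protocols such as the one used for Cityscapes typically accumulate counts over a whole dataset and skip ignored labels, which this sketch does not do.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over the classes that appear in pred or target."""
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: six pixels, three classes (made-up labels).
pred = [0, 0, 1, 1, 1, 2]
target = [0, 1, 1, 1, 2, 2]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case each of the three classes has an IoU of 0.5, so the mean is 0.5.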

Signal to Image Conversion and Convolutional Neural Networks for Physiological Signal Processing: A Review

IEEE ACCESS, 2024, vol. 12, pp. 66726-66764

Authors: Vidyasagar, K. E. Ch; Kumar, K. Revanth; Sai, G. N. K. Anantha; Ruchita, Munagala; Saikia, Manob Jyoti. Affiliations: Univ North Florida Biomed Sensors & Syst Lab Jacksonville FL 32224 USA; Osmania Univ Univ Coll Engn Dept Biomed Engn Hyderabad 500007 India; Univ North Florida Dept Elect Engn Jacksonville FL 32224 USA

Physiological signals obtained from electroencephalography (EEG), electromyography (EMG), and electrocardiography (ECG) provide valuable clinical information but pose challenges for analysis due to their high-dimensional nature. Traditional machine learning techniques, relying on hand-crafted features from fixed analysis windows, can lead to the loss of discriminative information. Recent studies have demonstrated the effectiveness of deep convolutional neural networks (CNNs) for robust automated feature learning from raw physiological signals. However, standard CNN architectures require two-dimensional image data as input. This has motivated research into innovative signal-to-image (STI) transformation techniques to convert one-dimensional time series into images preserving spectral, spatial, and temporal characteristics. This paper reviews recent advances in strategies for physiological signal-to-image conversion and their applications using CNNs for automated processing tasks. A systematic analysis of EEG, EMG, and ECG signal transformation and CNN-based analysis techniques is presented, spanning diverse applications including brain-computer interfaces, seizure detection, motor control, sleep stage classification, arrhythmia detection, and more. Key insights are synthesized regarding the relative merits of different transformation approaches, CNN model architectures, training procedures, and benchmark performance. Current challenges and promising research directions at the intersection of deep learning and physiological signal processing are discussed. By providing a comprehensive overview of state-of-the-art techniques, this review aims to catalyze continued innovation in effective end-to-end systems that use convolutional neural networks to extract clinically relevant information from multidimensional physiological data.

Keywords: Biomedical signal analysis; convolutional neural networks; deep learning; machine learning; physiological signals; signal-to-image conversion
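
To make the idea of signal-to-image conversion in the abstract above concrete, the sketch below turns a short one-dimensional series into a binary recurrence plot, one well-known way of encoding a time series as a 2-D array. Whether the review covers this particular transformation is an assumption; the function name `recurrence_plot`, the threshold, and the sample values are all hypothetical.

```python
def recurrence_plot(signal, threshold):
    """Return an n x n binary image: pixel (i, j) is 1 when samples i and j are close."""
    n = len(signal)
    return [
        [1 if abs(signal[i] - signal[j]) <= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


# Made-up 4-sample signal that alternates between two levels.
image = recurrence_plot([0.0, 1.0, 0.1, 0.9], threshold=0.2)
for row in image:
    print(row)
```

The alternating signal yields a checkerboard pattern, which is the kind of 2-D structure a CNN could then take as input.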

A Real-Time Tracking Approach for Moving Objects Based on an Integrated Algorithm of YOLOv7 and SORT

JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2025, vol. 34, no. 4

Authors: Wang, Lei; Liu, Tongtong. Affiliations: Yellow River Conservancy Tech Inst Coll Elect Engn Kaifeng 475004 Henan Peoples R China

Mobile target tracking remains a significant issue in smart cities. Due to complex changes in time and space of targets, real-time tracking remains a challenging problem. As a result, this paper proposes a real-time tracking approach for moving objects by combining the advantages of YOLOv7 and SORT algorithms. First, we use the YOLOv7 algorithm for object detection, which has the characteristics of high accuracy and efficiency. Then, we apply the SORT algorithm to the target tracking stage, which estimates and updates the target state through Kalman filtering. The collaborative function of the two parts is expected to achieve high-quality tracking of moving targets. Besides, this paper also demonstrates experiments and analysis on image datasets. The experimental results show that the proposed algorithm has achieved good performance in real-time tracking of moving targets. Compared with traditional methods, it can more accurately predict the position and trajectory of targets and has better real-time performance. In addition, the proposed algorithm is equally effective for target tracking in complex scenes, such as multi-target tracking and target occlusion. Future research can further optimize the performance of algorithms to cope with more complex scenarios and problems.

Keywords: Moving target tracking; real-time image processing; YOLOv7; Kalman filtering; deep learning
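
The abstract above says the tracking stage estimates and updates the target state through Kalman filtering. The predict/update cycle can be sketched with a toy 1-D constant-velocity filter. This is an illustration under stated assumptions, not the paper's tracker: SORT works on bounding-box states, and the class name `ConstantVelocityKalman1D` and every noise value below are hypothetical.

```python
class ConstantVelocityKalman1D:
    """Toy Kalman filter with state [position, velocity] and position-only measurements."""

    def __init__(self, position, dt=1.0, process_var=0.01, meas_var=1.0):
        self.pos, self.vel = position, 0.0
        # Symmetric 2x2 covariance stored as p00, p01, p11 (assumed initial uncertainty).
        self.p00, self.p01, self.p11 = 10.0, 0.0, 10.0
        self.dt, self.q, self.r = dt, process_var, meas_var

    def predict(self):
        """Propagate one step: x = F x, P = F P F^T + Q, with F = [[1, dt], [0, 1]]."""
        dt = self.dt
        self.pos += dt * self.vel
        p00 = self.p00 + 2 * dt * self.p01 + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p11 = self.p11 + self.q
        self.p00, self.p01, self.p11 = p00, p01, p11
        return self.pos

    def update(self, measured_position):
        """Correct the prediction with a position measurement (H = [1, 0])."""
        innovation = measured_position - self.pos
        s = self.p00 + self.r                 # innovation variance
        k0, k1 = self.p00 / s, self.p01 / s   # Kalman gain
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        # P = (I - K H) P, written out for the 2x2 case.
        p00 = (1 - k0) * self.p00
        p01 = (1 - k0) * self.p01
        p11 = self.p11 - k1 * self.p01
        self.p00, self.p01, self.p11 = p00, p01, p11


# Noise-free target moving 2 units per frame (made-up data).
track = ConstantVelocityKalman1D(position=0.0)
for frame in range(1, 31):
    track.predict()
    track.update(2.0 * frame)
```

In a detector-plus-tracker pipeline, `update` would be fed the position of the detection matched to the track in each frame, and `predict` alone would carry the track through frames where the target is occluded.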

Review of machine learning in robotic grasping control in space application

ACTA ASTRONAUTICA, 2024, vol. 220, pp. 37-61

Authors: Jahanshahi, Hadi; Zhu, Zheng H. Affiliations: York Univ Dept Mech Engn 4700 Keele St Toronto ON M3J 1P3 Canada

This article presents a comprehensive survey of the integration of machine learning techniques into robotic grasping, with a special emphasis on the challenges and advancements for space applications. The incorporation of artificial intelligence, particularly through deep learning, reinforcement learning, transfer learning, convolutional neural networks and recurrent neural networks, has significantly revolutionized robotic grasping. These advancements facilitate autonomous, efficient, and sophisticated manipulation in the challenging environment of outer space, transitioning from traditional mechanical grippers to sophisticated systems powered by advanced algorithms. This transition highlights the critical integration of sensory perception, grasp planning, and execution mechanisms, enhancing robots' capabilities to perceive, interact with, and manipulate objects with unprecedented precision and adaptability. The article meticulously outlines significant advancements achieved through the deployment of convolutional neural networks for visual information processing, RNNs for sequential decision-making, RL for autonomous strategy refinement, and transfer learning for leveraging pre-learned knowledge in novel tasks. These technologies address the unique challenges of space environments, such as varied textures, occlusions, microgravity conditions, and the sim-to-real gap, by enhancing sample efficiency, improving sim-to-real transfer capabilities, and integrating multimodal data for better object localization and pose estimation. Furthermore, the review explores the specific challenges faced in space robotic grasping, including handling varied textures and occlusions, adapting to unpredictable conditions, achieving real-time processing, and ensuring safety and reliability. It proposes future research directions focused on overcoming these hurdles, such as enhanced generalization through multimodal learning, robust sim-to-real transfer techniques, and the development of

Keywords: Machine learning; Space robotics; Robotic grasping; Deep learning; Reinforcement learning; Transfer learning; Convolutional neural networks; Recurrent neural networks; Sim-to-real transfer; Multimodal learning

Artificial Intelligence for Monitoring and Optimization of an Integrated Mineral Processing Plant

TRANSACTIONS OF THE INDIAN INSTITUTE OF METALS, 2024, vol. 77, no. 12, pp. 4231-4240

Authors: Masampally, Vishnu Swaroopji; Pareek, Aditya; Nadimpalli, Naga Ravikumar Varma; Runkana, Venkataramana. Affiliations: Tata Consultancy Serv Ltd TCS Res Pune 411013 India

Recovery of the mineral and grade of the product in an integrated mineral processing plant are two key performance indicators that define plant profitability. Online monitoring and optimization of these parameters helps improve process performance in real time. However, achieving high product grade and high mineral recovery simultaneously is challenging due to their conflicting nature. We have applied machine-learning and deep-learning algorithms to build models for predicting recovery and grade on an hourly and a daily basis. We have further formulated and solved a multi-objective optimization problem maximizing recovery and grade to obtain a Pareto-optimal solution using a non-dominated sorting-based evolutionary algorithm, NSGA-II. The results obtained are useful in identifying the operability of a mineral processing plant to achieve the optimum grade and recovery for a given feed grade and processing circuit.

Keywords: Mineral processing; Multi-objective optimization; Genetic algorithm; Machine learning; Deep learning
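
The abstract above describes maximizing two conflicting objectives, recovery and grade, and obtaining Pareto-optimal solutions. The notion of non-dominance behind this can be sketched in a few lines. This is only the first step of what NSGA-II does (the full algorithm also ranks later fronts and uses crowding distance within an evolutionary loop); the function name `pareto_front` and the (recovery, grade) pairs are hypothetical.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (all objectives maximized)."""

    def dominates(a, b):
        # a dominates b if it is at least as good everywhere and strictly better somewhere.
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (recovery, grade) operating points.
candidates = [(0.90, 0.60), (0.85, 0.70), (0.80, 0.65), (0.70, 0.80), (0.70, 0.75)]
front = pareto_front(candidates)
```

Here (0.80, 0.65) is dominated by (0.85, 0.70), and (0.70, 0.75) by (0.70, 0.80), so three trade-off points remain; none of them can raise recovery without lowering grade.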
