Search Results - Inner Mongolia University Library

Tensor Network Methods for Hyperparameter Optimization and Compression of Convolutional Neural Networks

APPLIED SCIENCES-BASEL, 2025, Vol. 15, No. 4, p. 1852

Authors: Naumov, A.; Melnikov, A.; Perelshtein, M.; Melnikov, Ar.; Abronin, V.; Oksanichenko, F. Affiliation: Terra Quantum AG, Kornhausstr 25, CH-9000 St Gallen, Switzerland

Neural networks have become a cornerstone of computer vision applications, with tasks ranging from image classification to object detection. However, challenges such as hyperparameter optimization (HPO) and model compression remain critical for improving performance and deploying models on resource-constrained devices. In this work, we address these challenges using Tensor Network-based methods. For HPO, we propose and evaluate the TetraOpt algorithm against various optimization algorithms. These evaluations were conducted on subsets of the NATS-Bench dataset, including CIFAR-10, CIFAR-100, and ImageNet subsets. TetraOpt consistently demonstrated superior performance, effectively exploring the global optimization space and identifying configurations with higher accuracies. For model compression, we introduce a novel iterative method that combines CP, SVD, and Tucker tensor decompositions. Applied to ResNet-18 and ResNet-152, we evaluated our method on the CIFAR-10 and Tiny ImageNet datasets. Our method achieved compression ratios of up to 14.5x for ResNet-18 and 2.5x for ResNet-152. Additionally, the inference time for processing an image on a CPU remained largely unaffected, demonstrating the practicality of the method.

Keywords: tensor train optimisation; hyperparameter optimisation; image classification; computer vision and pattern recognition; convolutional neural networks; model compression; CP decomposition; SVD decomposition; Tucker decomposition
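
The abstract above names SVD as one of the decompositions used for compression. As a rough illustration of why a low-rank factorization saves parameters, here is a minimal NumPy sketch of truncated SVD applied to a single weight matrix. It is not the authors' iterative CP/SVD/Tucker method; the layer shape (64 x 576), the rank (8), and the function names are assumptions chosen only for illustration.

```python
import numpy as np


def truncated_svd_factors(weight, rank):
    """Factor `weight` (m x n) into a (m x rank) and b (rank x n) via truncated SVD."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold the kept singular values into the left factor
    b = vt[:rank, :]
    return a, b


def compression_ratio(shape, rank):
    """Parameters of the dense matrix divided by parameters of the two factors."""
    m, n = shape
    return (m * n) / (rank * (m + n))


rng = np.random.default_rng(0)
# Hypothetical layer: a 64 x 576 matrix that is close to rank 8 (assumption).
weight = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 576))
weight += 0.01 * rng.standard_normal(weight.shape)

a, b = truncated_svd_factors(weight, rank=8)
rel_err = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)
ratio = compression_ratio(weight.shape, rank=8)
print(f"relative error: {rel_err:.4f}, compression ratio: {ratio:.1f}x")
```

The ratio here depends only on the chosen shape and rank, so it says nothing about the ratios reported in the abstract; real layers are rarely this close to low rank, which is presumably why the paper combines several decompositions.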

A Recurrent CNN for Online Object Detection on Raw Radar Frames

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2024, Vol. 25, No. 10, pp. 13432-13441

Authors: Decourt, Colin; VanRullen, Rufin; Salle, Didier; Oberlin, Thomas. Affiliations: Univ Toulouse, Artificial & Nat Intelligence Toulouse Inst, F-31400 Toulouse, France; Univ Toulouse, ISAE SUPAERO, F-31400 Toulouse, France; CerCO, CNRS UMR5549, F-31052 Toulouse, France; NXP Semicond, F-31100 Toulouse, France

Automotive radar sensors provide valuable information for advanced driving assistance systems (ADAS). Radars can reliably estimate the distance to an object and the relative velocity, regardless of weather and light conditions. However, radar sensors suffer from low resolution and huge intra-class variations in the shape of objects. Exploiting the time information (e.g., multiple frames) has been shown to help better capture the dynamics of objects and, therefore, the variation in the shape of objects. Most temporal radar object detectors use 3D convolutions to learn spatial and temporal information. However, these methods are often non-causal and unsuitable for real-time applications. This work presents RECORD, a new recurrent CNN architecture for online radar object detection. We propose an end-to-end trainable architecture mixing convolutions and ConvLSTMs to learn spatio-temporal dependencies between successive frames. Our model is causal and requires only the past information encoded in the memory of the ConvLSTMs to detect objects. Our experiments show such a method's relevance for detecting objects in different radar representations (range-Doppler, range-angle) and that it outperforms state-of-the-art models on the ROD2021 and CARRADA datasets while being less computationally expensive.

Keywords: computer vision and pattern recognition; radar object detection; autonomous driving; radar imaging

Visual acuity prediction on real-life patient data using a machine learning based multistage system

SCIENTIFIC REPORTS, 2024, Vol. 14, No. 1, pp. 1-18

Authors: Schlosser, Tobias; Beuth, Frederik; Meyer, Trixy; Kumar, Arunodhayan Sampath; Stolze, Gabriel; Furashova, Olga; Engelmann, Katrin; Kowerko, Danny. Affiliations: Tech Univ Chemnitz, Jr Professorship Media Comp, D-09107 Chemnitz, Germany; Klinikum Chemnitz gGmbH, Dept Ophthalmol, D-09116 Chemnitz, Germany

In ophthalmology, intravitreal operative medication therapy (IVOM) is a widespread treatment for diseases related to age-related macular degeneration (AMD), diabetic macular edema, and retinal vein occlusion. However, in real-world settings, patients often suffer from loss of vision on time scales of years despite therapy, whereas the prediction of the visual acuity (VA) and the earliest possible detection of deterioration under real-life conditions is challenging due to heterogeneous and incomplete data. In this contribution, we present a workflow for the development of a research-compatible data corpus fusing different IT systems of the department of ophthalmology of a German maximum care hospital. The extensive data corpus allows predictive statements of the expected progression of a patient and his or her VA in each of the three diseases. For the disease AMD, we found a significant deterioration of the visual acuity over time. Within our proposed multistage system, we subsequently classify the VA progression into the three groups of therapy "winners", "stabilizers", and "losers" (WSL classification scheme). Our OCT biomarker classification using an ensemble of deep neural networks results in a classification accuracy (F1-score) of over 98%, enabling us to complete incomplete OCT documentations while allowing us to exploit them for a more precise VA modelling process. Our VA prediction requires at least four VA examinations and optionally OCT biomarkers from the same time period to predict the VA progression within a forecasted time frame, whereas our prediction is currently restricted to IVOM/no therapy. We achieve a final prediction accuracy of 69% in macro average F1-score, while being in the same range as the ophthalmologists with 57.8 and 50 +/- 10.7 ...

Keywords: Ophthalmology; Ophthalmology diseases; Treatment progression; OCT biomarkers; computer vision and pattern recognition; Predictive statistics; Machine learning; Deep learning

Hybrid quantum ResNet for car classification and its hyperparameter optimization

QUANTUM MACHINE INTELLIGENCE, 2023, Vol. 5, No. 2, pp. 1-15

Authors: Sagingalieva, Asel; Kordzanganeh, Mo; Kurkin, Andrii; Melnikov, Artem; Kuhmistrov, Daniil; Perelshtein, Michael; Melnikov, Alexey; Skolik, Andrea; Von Dollen, David. Affiliations: Terra Quantum AG, CH-9000 St Gallen, Switzerland; Volkswagen Data Lab, D-80805 Munich, Germany; Volkswagen Grp Amer, Auburn Hills, MI 48326, USA; Leiden Univ, NL-2333 CA Leiden, Netherlands

Image recognition is one of the primary applications of machine learning algorithms. Nevertheless, machine learning models used in modern image recognition systems consist of millions of parameters that usually require significant computational time to be adjusted. Moreover, adjustment of model hyperparameters leads to additional overhead. Because of this, new developments in machine learning models and hyperparameter optimization techniques are required. This paper presents a quantum-inspired hyperparameter optimization technique and a hybrid quantum-classical machine learning model for supervised learning. We benchmark our hyperparameter optimization method over standard black-box objective functions and observe performance improvements in the form of reduced expected run times and fitness in response to the growth in the size of the search space. We test our approaches in a car image classification task and demonstrate a full-scale implementation of the hybrid quantum ResNet model with the tensor train hyperparameter optimization. Our tests show a qualitative and quantitative advantage over the corresponding standard classical tabular grid search approach used with a deep neural network ResNet34. A classification accuracy of 0.97 was obtained by the hybrid model after 18 iterations, whereas the classical model achieved an accuracy of 0.92 after 75 iterations.

Keywords: Hybrid quantum neural networks; Tensor train optimisation; Hybrid quantum machine learning; Hybrid quantum computing; Hyperparameter optimisation; Image classification; computer vision and pattern recognition; Machine learning

RoDLA: Benchmarking the Robustness of Document Layout Analysis Models

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Authors: Chen, Yufan; Zhang, Jiaming; Peng, Kunyu; Zheng, Junwei; Liu, Ruiping; Torre, Philip; Stiefelhagen, Rainer. Affiliations: Karlsruhe Inst Technol, Karlsruhe, Germany; Univ Oxford, Oxford, England

ISBN: (print) 9798350353006

Before developing a Document Layout Analysis (DLA) model in real-world applications, conducting comprehensive robustness testing is essential. However, the robustness of DLA models remains underexplored in the literature. To address this, we are the first to introduce a robustness benchmark for DLA models, which includes 450K document images of three datasets. To cover realistic corruptions, we propose a perturbation taxonomy with 12 common document perturbations with 3 severity levels inspired by real-world document processing. Additionally, to better understand document perturbation impacts, we propose two metrics, Mean Perturbation Effect (mPE) for perturbation assessment and Mean Robustness Degradation (mRD) for robustness evaluation. Furthermore, we introduce a self-titled model, i.e., Robust Document Layout Analyzer (RoDLA), which improves attention mechanisms to boost extraction of robust features. Experiments on the proposed benchmarks (PubLayNet-P, DocLayNet-P, and M6Doc-P) demonstrate that RoDLA obtains state-of-the-art mRD scores of 115.7, 135.4, and 150.4, respectively. Compared to previous methods, RoDLA achieves notable improvements in mAP of +3.8%, +7.1% and +12.1%, respectively.

Keywords: computer vision and pattern recognition; Document Analysis; Robustness

Weber local descriptor for image analysis and recognition: a survey

VISUAL COMPUTER, 2022, Vol. 38, No. 1, pp. 321-343

Authors: Banerjee, Arnab; Das, Nibaran; Santosh, K. C. Affiliations: Jadavpur Univ, Kolkata 700032, W Bengal, India; Univ South Dakota, Vermillion, SD 57069, USA

Weber local descriptor (WLD) is applied for addressing the challenges in image/pattern problems, especially in computer vision and pattern recognition domains. In this paper, we review literature on theories and applications of WLD. Using WLD, we address the different challenges of image analysis and recognition features with respect to illumination changes, contrast differences, and geometrical transformations like rotation, scaling, translation, and mirroring. Further, the role of the classifiers and experimental protocols used in the different applications are discussed. Applications include texture classification, medical imaging, agricultural safety, fingerprint analysis, forgery analysis, and face recognition.

Keywords: Weber local descriptor; Image analysis and recognition; computer vision and pattern recognition

Balinese traditional building architecture dataset

DATA IN BRIEF, 2025, Vol. 60

Authors: Kadyanan, I. Gusti Agung Gede Arya; Sanjaya, E. R. Ngurah Agus; Dwidasmara, Ida Bagus Gede; Adnyana, I. Kadek Dwi; Pratama, I. Komang Widia; Wiradhanta, Ngakan Made Alit; Mahayasa, I. Agung Gede Ary. Affiliation: Udayana Univ, Fac Math & Nat Sci, Dept Informat, Badung, Indonesia

This study introduces a dataset focused on Balinese traditional architecture, comprising two main categories, residential and sacred buildings. The dataset includes 14 types of residential buildings with 68 images and 29 types of sacred buildings with 76 images. Each design is represented through 2D drawings, such as front, side, section views, and floor plans, created using AutoCAD software. These drawings are meticulously derived from traditional Balinese architectural texts to ensure alignment with cultural norms. The completed designs were exported into JPG format using Figma, with accompanying metadata that includes size and structural details to enhance usability. This dataset aims to support researchers, cultural preservationists, and educators in exploring and preserving Balinese architectural heritage. (c) 2025 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC license ( http://***/licenses/by-nc/4.0/ )

Keywords: Balinese Traditional Architecture; Balinese Traditional Buildings; Kosala Kosali; Balinese Traditional Architecture Dataset; Residential Buildings; Sacred Buildings; Cultural Preservation; computer vision and pattern recognition; computer science

Improving learning-based birdsong classification by utilizing combined audio augmentation strategies

ECOLOGICAL INFORMATICS, 2024, Vol. 82

Authors: Kumar, Arunodhayan Sampath; Schlosser, Tobias; Kahl, Stefan; Kowerko, Danny. Affiliations: Tech Univ Chemnitz, Media Comp, Str Nationen 62, D-09107 Chemnitz, Saxony, Germany; Tech Univ Chemnitz, Media Informat, Str Nationen 62, D-09107 Chemnitz, Saxony, Germany; Cornell Univ, K Lisa Yang Ctr Conservat Bioacoust, Cornell Lab Ornithol, 159 Sapsucker Woods Rd, Ithaca, NY 14850, USA

In ecology, changes in environmental conditions are often closely linked to shifts in species diversity. This relationship can be investigated by analyzing avian vocalizations, which are robust indicators of trends in biodiversity. Within this contribution, we explored various data augmentation techniques and deep learning strategies for the classification of birdsong within natural soundscapes. For this purpose, we employed three fundamental deep neural network architectures, such as vision transformers, to classify 397 different bird species. To improve both the accuracy and generalizability of our models, we incorporated up to 19 well-established data augmentation techniques commonly used in audio classification. This included an iterative selection process where only augmentations that enhanced classification performance were selected. The primary augmentation technique involved the integration of various noise samples and non-bird audio elements, which significantly improved model performance as assessed on the BirdCLEF 2021 data set. Individual augmentations achieved F1-scores from 48.0 % (vertical flip) to 72.6 % (primary background noise soundscapes). Through the strategic combination of key techniques - namely simulated pink noise, interspecies sound mixing, and loudness normalization - we achieved a top F1-score of 73.7 %. Depending on the selected classification model, this corresponds to an improvement by 4.81 % to 10.5 %. Improvements and deteriorations of all applied augmentation techniques appeared to be robust across our three evaluated models. Therefore, our approach highlights the potential of sophisticated audio augmentations in refining the accuracy and robustness of birdsong classification models.

Keywords: Audio classification; Augmentation strategies; Birdsong soundscapes; computer vision and pattern recognition; Convolutional neural networks; vision transformers
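
The abstract above lists simulated pink noise among the augmentations that worked best. A minimal NumPy sketch of one common way to do this follows: white noise is shaped to an approximate 1/f power spectrum and mixed into a clip at a chosen signal-to-noise ratio. The function names, the 16 kHz sample rate, the 10 dB SNR, and the synthetic sine "clip" are all assumptions for illustration; the paper's actual pipeline and parameters are not given in this record.

```python
import numpy as np


def pink_noise(n, rng):
    """Approximate pink (1/f power) noise by shaping white noise in the frequency domain."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]  # avoid division by zero at the DC bin
    return np.fft.irfft(spectrum / np.sqrt(freqs), n=n)


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so the mix has the requested SNR in dB."""
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    return signal + np.sqrt(target_noise_power / noise_power) * noise


rng = np.random.default_rng(0)
sample_rate = 16000  # assumed sample rate
t = np.arange(sample_rate) / sample_rate
clip = np.sin(2 * np.pi * 3000 * t)  # stand-in for a one-second birdsong clip

augmented = mix_at_snr(clip, pink_noise(clip.size, rng), snr_db=10.0)

# Check the SNR actually achieved by the mix.
added = augmented - clip
achieved_snr_db = 10 * np.log10(np.mean(clip ** 2) / np.mean(added ** 2))
```

Drawing the SNR at random for each training example is a common variant, so the model sees a range of noise levels rather than a single one.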

Optimizing Input Selection for Cardiac Model Training and Inference: An Efficient 3D Convolutional Neural Networks-Based Approach to Automate Coronary Angiogram Video Selection

Mayo Clinic Proceedings: Digital Health, 2025, Vol. 3, No. 1, p. 100195

Authors: Chang, Shih-Sheng; Rostami, Behrouz; LoRusso, Gerardo; Liu, Chia-Hao; Alkhouli, Mohamad. Affiliations: Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN, United States; Division of Cardiology, Department of Internal Medicine, China Medical University Hospital, Taichung, Taiwan; Artificial Intelligence Center, China Medical University Hospital, Taichung, Taiwan; School of Medicine, College of Medicine, China Medical University, Taichung, Taiwan; Department of Clinical Sciences and Community Health, Cardiovascular Section, University of Milan, Milan, Italy

Objective: To develop an efficient and automated method for selecting appropriate coronary angiography videos for training deep learning models, thereby improving the accuracy and efficiency of medical image analysis. Patients and Methods: We developed deep learning models using 232 coronary angiographic studies from the Mayo Clinic. We utilized 2 state-of-the-art convolutional neural networks (CNN: ResNet and X3D) to identify low-quality angiograms through binary classification (satisfactory/unsatisfactory). Ground truth for the quality of the input angiogram was determined by 2 experienced cardiologists. We validated the developed model in an independent dataset of 3208 procedures from 3 Mayo sites. Results: The 3D-CNN models outperformed their 2D counterparts, with the X3D-L model achieving superior performance across all metrics (AUC 0.98, accuracy 0.96, precision 0.87, and F1 score 0.92). Compared with 3D models, 2D architectures are smaller and less computationally complex. Despite having a 3D architecture, the X3D-L model had lower computational demand (19.34 Giga Multiply Accumulate Operation) and parameter count (5.34 M) than 2D models. When validating models on the independent dataset, slight decreases in all metrics were observed, but AUC and accuracy remained robust (0.95 and 0.92, respectively, for the X3D-L model). Conclusion: We developed a rapid and effective method for automating the selection of coronary angiogram video clips using 3D-CNNs, potentially improving model accuracy and efficiency in clinical applications. The X3D-L model reports a balanced trade-off between computational efficiency and complexity, making it suitable for real-life clinical applications. © 2025 The Authors

Keywords: AUC, area under the curve; CNN, convolutional neural networks; CVPR, computer vision and pattern recognition; ML, machine learning; NPV, negative predictive value; QC, quality control
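
The abstract above reports precision (0.87) and F1 score (0.92) for the X3D-L model. Because F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), the recall implied by those two figures can be recovered as R = F1 * P / (2P - F1). The short sketch below performs that arithmetic; the function name is hypothetical, and because the reported values are rounded, the result is only an approximation and not a figure taken from the paper.

```python
def recall_from_precision_and_f1(precision, f1):
    """Solve F1 = 2PR / (P + R) for recall R, given precision P and F1."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("F1 must be smaller than 2 * precision")
    return f1 * precision / denominator


# Rounded values as reported in the abstract for the X3D-L model.
implied_recall = recall_from_precision_and_f1(precision=0.87, f1=0.92)
print(f"implied recall: {implied_recall:.3f}")

# Sanity check: plugging the recall back in reproduces the F1 score.
f1_check = 2 * 0.87 * implied_recall / (0.87 + implied_recall)
```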

Technical note: Impact of tissue section thickness on accuracy of cell classification with a deep learning network

Journal of Pathology Informatics, 2025, Vol. 17, p. 100440

Authors: Christiansen, Ida Skovgaard; Hartvig, Rasmus; Jensen, Thomas Hartvig Lindkær. Affiliations: Department of Pathology, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark; Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark

Introduction: We are currently developing a cell classification system intended for routine histopathology. During observation, cells of interest are added to a deep learning (DL) network, which after training classifies the remaining cells of interest with high and immediately validatable accuracy. In this study, we identify the optimal histological microsection thickness for this process and describe in high detail the morphological differences introduced by variation in microsection thickness. Method: From HE-stained digitized sections of liver cut manually at 5 thicknesses and on an automated microtome (DS), hepatocytes and non-hepatocytes were manually annotated and loaded into a DL convolutional neural network (ResNet). The network was trained at different settings to identify the thickness with optimal relation between number of training cells and validation accuracy. To shed interpretable light on the impact of thickness, exhaustive morphological details of the annotated cells were quantified and the differences between hepatocytes and non-hepatocytes were analyzed with random forest. Results: Classifying hepatocytes from DS sections clearly resulted in highest validation accuracy with least number of cells and for the remaining thicknesses a trend towards thin sections being more efficient was observed. Random forest analysis generally identified variations in nuclear granularity as the most important features in distinguishing cells. In DS and the thinner tissue sections, nuclear granularity features were more distinguished. Conclusion: Microsections cut with DS in particular and thin sections in general are better suited for the intended cell classification system. © 2025 The Authors

Keywords: computer vision and pattern recognition; Deep learning; Histology; Machine learning; Tissues and organs
