Neural networks have become a cornerstone of computer vision applications, with tasks ranging from image classification to object detection. However, challenges such as hyperparameter optimization (HPO) and model compression remain critical for improving performance and deploying models on resource-constrained devices. In this work, we address these challenges using Tensor Network-based methods. For HPO, we propose the TetraOpt algorithm and evaluate it against various optimization algorithms. These evaluations were conducted on subsets of the NATS-Bench dataset, including CIFAR-10, CIFAR-100, and ImageNet subsets. TetraOpt consistently demonstrated superior performance, effectively exploring the global optimization space and identifying configurations with higher accuracies. For model compression, we introduce a novel iterative method that combines CP, SVD, and Tucker tensor decompositions. We applied the method to ResNet-18 and ResNet-152 and evaluated it on the CIFAR-10 and Tiny ImageNet datasets. Our method achieved compression ratios of up to 14.5x for ResNet-18 and 2.5x for ResNet-152. Additionally, the inference time for processing an image on a CPU remained largely unaffected, demonstrating the practicality of the method.
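To make the notion of a compression ratio concrete, the following is a minimal sketch of standard parameter counting for a truncated-SVD factorization of a dense layer and a Tucker-2 factorization of a convolution kernel. It is not the iterative method described in the abstract; the function names, layer shapes, and ranks are hypothetical and chosen only for illustration.

```python
def dense_params(m, n):
    """Parameters of an m x n weight matrix (bias ignored)."""
    return m * n


def svd_params(m, n, rank):
    """Parameters after W (m x n) is replaced by U (m x rank) @ V (rank x n)."""
    return rank * (m + n)


def conv_params(c_out, c_in, k):
    """Parameters of a k x k convolution kernel (bias ignored)."""
    return c_out * c_in * k * k


def tucker2_conv_params(c_out, c_in, k, r_out, r_in):
    """Parameters of a Tucker-2 factorized convolution:
    a 1x1 conv (c_in -> r_in), a k x k core conv (r_in -> r_out),
    and a 1x1 conv (r_out -> c_out)."""
    return c_in * r_in + r_in * r_out * k * k + r_out * c_out


def compression_ratio(original, compressed):
    """Ratio of original to compressed parameter counts."""
    return original / compressed


# Hypothetical layer sizes and ranks, for illustration only.
dense_ratio = compression_ratio(dense_params(512, 512), svd_params(512, 512, 32))
conv_ratio = compression_ratio(conv_params(64, 64, 3), tucker2_conv_params(64, 64, 3, 16, 16))
print(f"dense layer: {dense_ratio:.2f}x, conv layer: {conv_ratio:.2f}x")
```

The ratio for a whole network depends on which layers are factorized and at which ranks, so per-layer figures like these do not translate directly into the network-level ratios reported above.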
Automotive radar sensors provide valuable information for advanced driving assistance systems (ADAS). Radars can reliably estimate the distance to an object and the relative velocity, regardless of weather and light conditions. However, radar sensors suffer from low resolution and huge intra-class variations in the shape of objects. Exploiting the time information (e.g., multiple frames) has been shown to help better capture the dynamics of objects and, therefore, the variation in the shape of objects. Most temporal radar object detectors use 3D convolutions to learn spatial and temporal information. However, these methods are often non-causal and unsuitable for real-time applications. This work presents RECORD, a new recurrent CNN architecture for online radar object detection. We propose an end-to-end trainable architecture mixing convolutions and ConvLSTMs to learn spatio-temporal dependencies between successive frames. Our model is causal and requires only the past information encoded in the memory of the ConvLSTMs to detect objects. Our experiments show the relevance of such a method for detecting objects in different radar representations (range-Doppler, range-angle) and show that it outperforms state-of-the-art models on the ROD2021 and CARRADA datasets while being less computationally expensive.
In ophthalmology, intravitreal operative medication therapy (IVOM) is a widespread treatment for diseases related to the age-related macular degeneration (AMD), the diabetic macular edema, as well as the retinal vein occlusion. However, in real-world settings, patients often suffer from loss of vision on time scales of years despite therapy, whereas the prediction of the visual acuity (VA) and the earliest possible detection of deterioration under real-life conditions is challenging due to heterogeneous and incomplete data. In this contribution, we present a workflow for the development of a research-compatible data corpus fusing different IT systems of the department of ophthalmology of a German maximum care hospital. The extensive data corpus allows predictive statements of the expected progression of a patient and his or her VA in each of the three diseases. For the disease AMD, we found a significant deterioration of the visual acuity over time. Within our proposed multistage system, we subsequently classify the VA progression into the three groups of therapy "winners", "stabilizers", and "losers" (WSL classification scheme). Our OCT biomarker classification using an ensemble of deep neural networks results in a classification accuracy (F1-score) of over 98%, enabling us to complete incomplete OCT documentations while allowing us to exploit them for a more precise VA modelling process. Our VA prediction requires at least four VA examinations and optionally OCT biomarkers from the same time period to predict the VA progression within a forecasted time frame, whereas our prediction is currently restricted to IVOM/no therapy. We achieve a final prediction accuracy of 69% in macro average F1-score, while being in the same range as the ophthalmologists with 57.8 and 50 +/- 10.7...
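Since the result above is reported as a macro average F1-score over three classes, a short sketch of how that metric is computed may help. The labels follow the WSL scheme named in the abstract, but the example predictions are invented purely for illustration and are not data from the study.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1-scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples contributes 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Invented toy labels (W = winner, S = stabilizer, L = loser).
labels = ["W", "S", "L"]
y_true = ["W", "W", "S", "S", "L", "L"]
y_pred = ["W", "S", "S", "S", "L", "W"]
score = macro_f1(y_true, y_pred, labels)
print(f"macro F1 = {score:.4f}")
```

Because every class is weighted equally, a poorly predicted minority class lowers the macro average more than it would lower plain accuracy.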
Image recognition is one of the primary applications of machine learning algorithms. Nevertheless, machine learning models used in modern image recognition systems consist of millions of parameters that usually require significant computational time to be adjusted. Moreover, adjustment of model hyperparameters leads to additional overhead. Because of this, new developments in machine learning models and hyperparameter optimization techniques are required. This paper presents a quantum-inspired hyperparameter optimization technique and a hybrid quantum-classical machine learning model for supervised learning. We benchmark our hyperparameter optimization method over standard black-box objective functions and observe performance improvements in the form of reduced expected run times and fitness in response to the growth in the size of the search space. We test our approaches in a car image classification task and demonstrate a full-scale implementation of the hybrid quantum ResNet model with the tensor train hyperparameter optimization. Our tests show a qualitative and quantitative advantage over the corresponding standard classical tabular grid search approach used with a deep neural network ResNet34. A classification accuracy of 0.97 was obtained by the hybrid model after 18 iterations, whereas the classical model achieved an accuracy of 0.92 after 75 iterations.
ISBN: 9798350353006 (print)
Before developing a Document Layout Analysis (DLA) model in real-world applications, conducting comprehensive robustness testing is essential. However, the robustness of DLA models remains underexplored in the literature. To address this, we are the first to introduce a robustness benchmark for DLA models, which includes 450K document images of three datasets. To cover realistic corruptions, we propose a perturbation taxonomy with 12 common document perturbations with 3 severity levels inspired by real-world document processing. Additionally, to better understand document perturbation impacts, we propose two metrics, Mean Perturbation Effect (mPE) for perturbation assessment and Mean Robustness Degradation (mRD) for robustness evaluation. Furthermore, we introduce a self-titled model, i.e., Robust Document Layout Analyzer (RoDLA), which improves attention mechanisms to boost extraction of robust features. Experiments on the proposed benchmarks (PubLayNet-P, DocLayNet-P, and M6Doc-P) demonstrate that RoDLA obtains state-of-the-art mRD scores of 115.7, 135.4, and 150.4, respectively. Compared to previous methods, RoDLA achieves notable improvements in mAP of +3.8%, +7.1% and +12.1%, respectively.
Weber local descriptor (WLD) is applied for addressing the challenges in image/pattern problems, especially in computer vision and pattern recognition domains. In this paper, we review literature on theories and applications of WLD. Using WLD, we address the different challenges of image analysis and recognition features with respect to illumination changes, contrast differences, and geometrical transformations like rotation, scaling, translation, and mirroring. Further, the role of the classifiers and experimental protocols used in the different applications are discussed. Applications include texture classification, medical imaging, agricultural safety, fingerprint analysis, forgery analysis, and face recognition.
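As a brief illustration of the descriptor under review, the sketch below computes the differential excitation component of WLD on a 3x3 neighborhood, i.e. the arctangent of the summed neighbor-minus-center differences divided by the center intensity. The function name, the list-of-lists image format, and the small `eps` guard against a zero center pixel are assumptions made for this example; the orientation component and the histogram construction are omitted.

```python
import math


def differential_excitation(image, eps=1e-6):
    """Differential excitation for each interior pixel of a 2D grayscale image.

    image: list of equal-length rows of numeric intensities.
    eps:   assumed small constant that avoids division by zero.
    Returns a (h-2) x (w-2) list of values in (-pi/2, pi/2).
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * (w - 2) for _ in range(h - 2)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            center = image[r][c]
            # Sum of intensity differences between the 8 neighbors and the center.
            diff_sum = sum(
                image[r + dr][c + dc] - center
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            out[r - 1][c - 1] = math.atan(diff_sum / (center + eps))
    return out


flat = [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
dark_spot = [[20, 20, 20], [20, 10, 20], [20, 20, 20]]
print(differential_excitation(flat)[0][0], differential_excitation(dark_spot)[0][0])
```

A flat patch gives zero excitation, while a center pixel darker than its surroundings gives a positive value, which reflects the Weber's-law idea that a change is judged relative to the background intensity.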
This study introduces a dataset focused on Balinese traditional architecture, comprising two main categories, residential and sacred buildings. The dataset includes 14 types of residential buildings with 68 images and 29 types of sacred buildings with 76 images. Each design is represented through 2D drawings, such as front, side, section views, and floor plans, created using AutoCAD software. These drawings are meticulously derived from traditional Balinese architectural texts to ensure alignment with cultural norms. The completed designs were exported into JPG format using Figma, with accompanying metadata that includes size and structural details to enhance usability. This dataset aims to support researchers, cultural preservationists, and educators in exploring and preserving Balinese architectural heritage. (c) 2025 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC license ( http://***/licenses/by-nc/4.0/ )
In ecology, changes in environmental conditions are often closely linked to shifts in species diversity. This relationship can be investigated by analyzing avian vocalizations, which are robust indicators of trends in biodiversity. Within this contribution, we explored various data augmentation techniques and deep learning strategies for the classification of birdsong within natural soundscapes. For this purpose, we employed three fundamental deep neural network architectures, including vision transformers, to classify 397 different bird species. To improve both the accuracy and generalizability of our models, we incorporated up to 19 well-established data augmentation techniques commonly used in audio classification. This included an iterative selection process where only augmentations that enhanced classification performance were selected. The primary augmentation technique involved the integration of various noise samples and non-bird audio elements, which significantly improved model performance as assessed on the BirdCLEF 2021 data set. Individual augmentations achieved F1-scores from 48.0% (vertical flip) to 72.6% (primary background noise soundscapes). Through the strategic combination of key techniques, namely simulated pink noise, interspecies sound mixing, and loudness normalization, we achieved a top F1-score of 73.7%. Depending on the selected classification model, this corresponds to an improvement by 4.81% to 10.5%. Improvements and deteriorations of all applied augmentation techniques appeared to be robust across our three evaluated models. Therefore, our approach highlights the potential of sophisticated audio augmentations in refining the accuracy and robustness of birdsong classification models.
Objective: To develop an efficient and automated method for selecting appropriate coronary angiography videos for training deep learning models, thereby improving the accuracy and efficiency of medical image analysis....
Introduction: We are currently developing a cell classification system intended for routine histopathology. During observation, cells of interest are added to a deep learning (DL) network, which after training classif...