ISBN (digital): 9781665463829
ISBN (print): 9781665463836
Road surface monitoring in winter conditions is of great importance for the safety of road users. Estimates of snow coverage on roads can be used in intelligent transportation systems to alert drivers or to improve snow removal processes. Several models have been proposed for estimating snow coverage from surveillance cameras, but they predict only a few snow levels, which limits their usefulness in practice. In this paper, we present a model that gives a more granular estimate of the percentage of road surface covered by snow, predicting coverage from 0% (no snow) to 100% (fully snow-covered) in increments of 10%. We propose an ensemble learning model that combines a deep convolutional neural network (CNN) and a support-vector machine (SVM). The accuracy of our model is similar to the state of the art despite the higher task complexity that comes with the finer granularity of predictions.
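The 10% granularity described above amounts to an 11-class prediction problem. A minimal sketch of that label mapping follows; the function names `coverage_to_class` and `class_to_coverage` are hypothetical, and the rounding rule is an assumption rather than something stated in the abstract.

```python
def coverage_to_class(coverage_percent: float) -> int:
    """Map a snow coverage percentage (0-100) to the nearest 10% class (0..10)."""
    if not 0.0 <= coverage_percent <= 100.0:
        raise ValueError("coverage must be between 0 and 100")
    # Assumption: halves round upward, so 25% falls into the 30% class.
    return int((coverage_percent + 5.0) // 10)


def class_to_coverage(class_index: int) -> int:
    """Map a class index (0..10) back to the coverage percentage it represents."""
    if not 0 <= class_index <= 10:
        raise ValueError("class index must be between 0 and 10")
    return class_index * 10
```

A classifier trained on these labels would output one of the 11 indices, which `class_to_coverage` turns back into a percentage for reporting.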
Wireless sensor networks (WSNs) are a broad and active research area that has grown steadily over the past decades. WSNs contain many (hundreds of thousands of) micro-sized, cheap chips that are battery-powered, low-cost, and wirelessly interconnected. These chips are called nodes and can be of several types, including acoustic, radar, low-frequency magnetic, thermal, and visual sensors. Many applications are now based on WSNs, such as environmental control, smart cities, wildlife monitoring, vehicles and infrastructure, natural disasters, home security, underwater investigations, military use, airplane surveillance, and drone body sensors. One of the aims of the paper is to develop a smart architecture based on intelligent microprocessor sensors, WSN802GCA/GPA, and the software needed to operate with a TINI control unit. TINI allows this system to be connected to Ethernet via TCP/IP.
Skin cancer diagnosis, a critical task in the medical domain, can be revolutionized through the application of advanced deep-learning techniques. This work investigates the efficacy of Convolutional Neural Networks (CNNs) in the automated classification of skin cancer. It begins by explaining the key CNN layers: Conv2D, MaxPool2D, Dropout, and Dense. The Conv2D layers apply learnable filters to localized image segments, while MaxPool2D downsamples the feature maps, reducing computational cost and overfitting risk. Combining these layers enables the network to capture both local and global characteristics, which is crucial for accurate classification. Dropout layers improve generalization and mitigate overfitting by introducing randomness during training. ReLU activation functions introduce non-linearity, and the Flatten layer provides the transition to the fully connected layers. The proposed CNN architecture is designed with attention to filter counts, kernel sizes, and pooling dimensions. The model is trained on the HAM10000 dataset, which contains diverse skin lesion images across seven classes. The model's parameters and architecture are presented systematically, offering insight into the design rationale. The model is optimized with the Adam optimizer and annealing techniques to facilitate convergence. It achieves accuracies of 78.55% and 76.49% on the validation and test sets, respectively, for skin cancer classification. Data augmentation strategies are introduced to further enhance generalization. The results underscore the potential of CNNs as a robust tool for automating skin cancer diagnosis, in line with the broader trend of leveraging deep learning for medical image analysis.
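To make the roles of ReLU and MaxPool2D concrete, the following is a small pure-Python sketch of what those two operations do to a single feature map. It is an illustration only, not the paper's implementation: the helper names `relu` and `max_pool_2x2`, the 2x2 window with stride 2, and the example values are all assumptions.

```python
from typing import List

Matrix = List[List[float]]


def relu(feature_map: Matrix) -> Matrix:
    """Apply ReLU element-wise: negative activations become zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool_2x2(feature_map: Matrix) -> Matrix:
    """Downsample with a 2x2 window and stride 2, keeping each window's maximum.

    A trailing odd row or column is dropped, as in unpadded pooling.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled.append([
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ])
    return pooled


# Hypothetical 4x4 feature map, as might come out of a convolution.
feature_map = [
    [1.0, -2.0, 3.0, 0.5],
    [-1.0, 4.0, -3.0, 2.0],
    [0.0, -5.0, -1.0, -2.0],
    [2.0, 1.0, -4.0, -0.5],
]
pooled = max_pool_2x2(relu(feature_map))
```

The 4x4 input becomes a 2x2 output, so each pooling stage cuts the number of activations passed to later layers by a factor of four, which is the source of the reduction in computational cost mentioned above.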
Affine correspondences have traditionally been used to improve feature matching over wide baselines. While recent work has successfully used affine correspondences to solve various relative camera pose estimation problems, less attention has been given to their use in absolute pose estimation. We introduce the first general solution to the problem of estimating the pose of a calibrated camera given a single observation of an oriented point and an affine correspondence. The advantage of our approach (P1AC) is that it requires only a single correspondence, in comparison to the traditional point-based approach (P3P), significantly reducing the combinatorics in robust estimation. P1AC provides a general solution that removes restrictive assumptions made in prior work and is applicable to large-scale image-based localization. We propose a minimal solution to the P1AC problem and evaluate our novel solver on synthetic data, showing its numerical stability and performance under various types of noise. On standard image-based localization benchmarks we show that P1AC achieves more accurate results than the widely used P3P algorithm. Code for our method is available at https://***/jonathanventura/P1AC/.
In incremental learning, replaying stored samples from previous tasks together with current task samples is one of the most efficient approaches to address catastrophic forgetting. However, unlike in incremental classification, image replay has not been successfully applied to incremental object detection (IOD). In this paper, we identify the overlooked problem of foreground shift as the main reason for this. Foreground shift occurs only when replaying images of previous tasks and refers to the fact that their background might contain foreground objects of the current task. To overcome this problem, a novel and efficient Augmented Box Replay (ABR) method is developed that stores and replays only foreground objects and thereby circumvents the foreground shift problem. In addition, we propose an innovative Attentive RoI Distillation loss that uses spatial attention from region-of-interest (RoI) features to constrain the current model to focus on the most important information from the old model. ABR significantly reduces forgetting of previous classes while maintaining high plasticity in current classes. Moreover, it considerably reduces the storage requirements compared to standard image replay. Comprehensive experiments on the Pascal-VOC and COCO datasets support the state-of-the-art performance of our model.
ISBN (digital): 9798350368604
ISBN (print): 9798350368611
Traditional fully annotated closed-set 3D object detection methods improve model performance but are impractical in real-world settings due to the emergence of new categories and the complexity of 3D annotations. Open-World Object Detection (OWOD) addresses these issues but relies heavily on manual labeling, which is costly. This paper focuses on open-world active learning and proposes an entropy-guided reinforced open-world active 3D object detection method (EROA). EROA treats active learning as a reinforcement learning problem tailored to open driving scenarios. We use entropy as a reward metric for efficient reinforcement learning. We also leverage knowledge from the 2D domain, using object-level large-scale vision-language models to enhance sample selection. Extensive experiments demonstrate that the proposed EROA meets the dynamic and cost-sensitive requirements of autonomous driving, enabling real-time detection of both known and unknown objects.
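As a rough illustration of entropy as an uncertainty signal, the sketch below computes the Shannon entropy of a predicted class distribution and ranks unlabeled samples by it. This is not the EROA method: the reinforcement learning policy is omitted and replaced by a plain greedy ranking, and the names `prediction_entropy` and `select_most_uncertain` are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def prediction_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    # Zero-probability classes contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(candidates: Dict[str, Sequence[float]], budget: int) -> List[str]:
    """Return the `budget` sample ids whose predictions have the highest entropy."""
    ranked = sorted(
        candidates,
        key=lambda sample_id: prediction_entropy(candidates[sample_id]),
        reverse=True,
    )
    return ranked[:budget]
```

In a reinforcement learning formulation, a quantity like `prediction_entropy` could serve as the reward for selecting a sample, so that the selection policy is pushed toward samples the detector is least sure about.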
Exemplar-free class-incremental learning (CIL) poses several challenges since it prohibits the rehearsal of data from previous tasks and thus suffers from catastrophic forgetting. Recent approaches to incrementally le...
This paper proposes the use of an event camera as a component of a vision system that enables counting of fast-moving objects, in this case falling corn grains. This type of camera transmits information about the ...
3D object detection from LiDAR sensor data is an important topic in the context of autonomous cars and drones. In this paper, we present the results of experiments on the impact of backbone selection of a deep convolu...
ISBN (print): 9781713871088
Transformers have achieved remarkable success in several domains, ranging from natural language processing to computer vision. Nevertheless, it has recently been shown that stacking self-attention layers, the distinctive architectural component of Transformers, can result in rank collapse of the tokens' representations at initialization. The question of whether and how rank collapse affects training is still largely unanswered, and investigating it is necessary for a more comprehensive understanding of this architecture. In this work, we shed new light on the causes and effects of this phenomenon. First, we show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish at initialization. Furthermore, we provide a thorough description of the origin of rank collapse and discuss how to prevent it via an appropriate depth-dependent scaling of the residual branches. Finally, our analysis reveals that specific architectural hyperparameters affect the gradients of queries and values differently, leading to disproportionate gradient norms. This suggests an explanation for the widespread use of adaptive methods for optimizing Transformers.
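The collapse of token representations under stacked self-attention can be seen in a toy setting. The sketch below is a simplification under stated assumptions, not the paper's analysis: query, key, and value maps are taken to be the identity, there are no residual branches or MLP blocks, and `token_spread` is a hypothetical proxy for rank collapse (it reaches zero when all tokens are identical, i.e. the token matrix has rank at most one).

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_layer(tokens: Matrix) -> Matrix:
    """Self-attention with identity query/key/value maps and no residual branch."""
    d = len(tokens[0])
    scores = [
        [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        for q in tokens
    ]
    weights = [softmax(r) for r in scores]
    # Each output token is a convex combination of the input tokens.
    return [
        [sum(w * t[j] for w, t in zip(w_row, tokens)) for j in range(d)]
        for w_row in weights
    ]


def token_spread(tokens: Matrix) -> float:
    """Largest per-coordinate gap between tokens; 0 means all tokens coincide."""
    return max(max(col) - min(col) for col in zip(*tokens))


# Three hypothetical 2-dimensional tokens passed through three attention layers.
tokens = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
spreads = [token_spread(tokens)]
for _ in range(3):
    tokens = attention_layer(tokens)
    spreads.append(token_spread(tokens))
```

Because every softmax weight is strictly positive, each layer pulls the tokens toward one another, so the values in `spreads` shrink with depth. A residual branch adds the input back to each layer's output, which is why the scaling of those branches matters for whether the collapse occurs.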