ISBN (digital): 9798331542726
ISBN (print): 9798331542733
Recent years have seen rapid development in machine learning, which has profoundly influenced many areas of science and engineering. Computer vision takes a leading place among them, where a central task is image classification powered by CNNs. Despite their strong performance in complicated scenarios, CNNs remain sensitive to so-called adversarial attacks: deliberate perturbations that lead them to incorrect predictions. Beyond more innocuous consequences, this has serious security implications for critical applications, including medical diagnostics, where misclassifications might result in disastrous outcomes. This work discusses adversarial attacks on CNNs and other DNNs in computer vision, studying a full range of generation and detection methods in detail while discussing intrinsic vulnerability and robustness. It also proposes a learning framework to enhance the robustness and security of DNNs and CNNs against such adversarial perils. The ultimate goal is to improve the reliability of such models in safety-critical scenarios, enabling safe deployment in applications where accuracy is crucial.
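To make the threat model concrete, here is a minimal sketch of the fast gradient sign method (FGSM), a standard attack from the literature rather than the framework proposed in this work; the model, image, and label arguments are placeholders.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=8 / 255):
    """One-step FGSM: perturb each pixel by +/-epsilon in the direction
    that increases the classification loss. `model`, `image` (1xCxHxW in
    [0, 1]) and `label` are placeholders, not objects from the paper."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0.0, 1.0).detach()
```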
Optical analog computing based on flat optical structures offers significant advantages in system miniaturization, loss reduction, and computational speed compared to traditional systems requiring complex optical conf...
As brain-inspired optical computing architectures, diffractive optical neural networks (DONNs) harness light's wave nature for high-speed, energy-efficient, and parallel information processing, enabling applications such as image classification and wavefront shaping. However, conventional spatially encoded DONNs struggle with robustness in complex and unpredictable environments, where occlusions and distortions degrade processing accuracy. To address these challenges, we propose a robust all-optical feature extraction framework based on orbital angular momentum (OAM). This approach converts optical information into target OAM modes using a diffractive processing framework trained via deep learning, enabling stable and efficient information representation in the OAM domain. Unlike conventional DONNs, our method maintains high performance across diverse and irregular occlusions without requiring network retraining. This self-adaptive occlusion immunity operates with zero additional training samples, effectively enhancing optical computing tasks under dynamic and uncertain conditions. By fully utilizing the helical wavefront and the orthogonality of OAM modes, our approach improves the robustness and scalability of DONNs, demonstrating superior performance in challenging optical environments. Our work paves the way for next-generation optical computing systems that can operate reliably in unpredictable, occlusion-rich environments, unlocking what we believe to be new possibilities for robust, real-time processing in a variety of applications.
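For intuition about why the OAM domain is a stable encoding basis, the following sketch (not the paper's diffractive framework; the mode profile, beam waist, and grid size are illustrative assumptions) generates two OAM modes and verifies their orthogonality numerically.

```python
import numpy as np

def oam_mode(l, size=256, w0=0.3):
    """Gaussian beam carrying OAM charge l: helical phase exp(i*l*phi)."""
    x = np.linspace(-1, 1, size)
    X, Y = np.meshgrid(x, x)
    r, phi = np.hypot(X, Y), np.arctan2(Y, X)
    field = (r / w0) ** abs(l) * np.exp(-(r / w0) ** 2) * np.exp(1j * l * phi)
    return field / np.sqrt((np.abs(field) ** 2).sum())  # unit power

a, b = oam_mode(1), oam_mode(3)
# Distinct OAM charges are orthogonal, which is what makes the OAM
# domain a stable basis for encoding information under occlusion.
print(abs(np.vdot(a, a)))  # ~1.0: self-overlap
print(abs(np.vdot(a, b)))  # ~0.0: orthogonal modes
```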
Deep learning models based on graph neural networks have emerged as a popular approach for solving computer vision problems. They encode the image into a graph structure and can be beneficial for efficiently capturing...
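As a generic illustration of the image-to-graph encoding this entry alludes to (the specific construction is not visible in the truncated abstract), the sketch below builds a simple grid graph whose nodes are patch features and whose edges connect 4-adjacent patches.

```python
import numpy as np

def image_to_grid_graph(image, patch=8):
    """Encode an image as a graph: nodes are mean-color patch features,
    edges connect 4-adjacent patches. A generic illustration, not a
    specific paper's construction."""
    H, W, C = image.shape
    gh, gw = H // patch, W // patch
    nodes = image[:gh * patch, :gw * patch].reshape(
        gh, patch, gw, patch, C).mean(axis=(1, 3)).reshape(-1, C)
    edges = []
    for i in range(gh):
        for j in range(gw):
            if j + 1 < gw: edges.append((i * gw + j, i * gw + j + 1))
            if i + 1 < gh: edges.append((i * gw + j, (i + 1) * gw + j))
    return nodes, np.array(edges).T  # features (N, C), edge index (2, E)
```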
ISBN (print): 9783031777370; 9783031777387
Object detection in aerial imagery presents significant challenges in computer vision due to the varied orientations and complex backgrounds of objects such as buildings and vehicles. Current annotation tools often fail to accurately delineate these objects, relying on manual bounding box methods that are both time-consuming and inconsistent. Our novel methodology automates the conversion of axis-aligned annotations into polygonal and rotated annotations, prioritising systematic and scalable enhancements to data quality rather than modifying the model itself. Precise annotations, crucial for determining object locations and boundaries, are fundamental to this approach. We evaluated this methodology through a case study involving electrical transmission towers in aerial images, using advanced object detectors based on variations of the YOLOv8 algorithm. Preliminary results indicate that our automated method not only improves annotation accuracy but also significantly reduces the manual effort required, thereby lowering overall costs and time for data preparation in object detection training. The success of this methodology underscores its potential for broader applications and further advancements in automated annotation technologies.
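The paper's exact conversion pipeline is not spelled out in the abstract, but a common recipe, sketched below under the assumption that a binary object mask can be obtained inside each axis-aligned box, derives a polygon and a minimum-area rotated box with OpenCV.

```python
import cv2
import numpy as np

def mask_to_rotated_annotation(mask):
    """From a binary object mask to (polygon, rotated-box corners).
    A generic recipe, not necessarily the paper's exact pipeline."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)    # keep the largest object
    polygon = cv2.approxPolyDP(contour, 2.0, True)  # simplified outline
    rect = cv2.minAreaRect(contour)                 # ((cx, cy), (w, h), angle)
    corners = cv2.boxPoints(rect)                   # 4 corners of rotated box
    return polygon.reshape(-1, 2), corners
```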
ISBN (digital): 9798331510831
ISBN (print): 9798331510848
Longitudinal medical image processing is a significant task for understanding the dynamic changes of disease by taking and comparing image series over time, providing insights into how conditions evolve and enabling more accurate diagnosis and treatment planning. While recent advancements in biomedical Vision-Language Pre-training (VLP) have enabled label-efficient representation learning with paired medical images and reports, existing methods primarily pair a single image with the corresponding textual report, limiting their ability to capture temporal relationships. To address this limitation, it is essential to learn temporal-aware cross-modal representations from sequential medical images and text reports that highlight the temporal changes occurring between examinations. Specifically, we introduce TempA-VLP, a temporal-aware vision-language pre-training framework with a cross-exam encoder that integrates information from both prior and current examinations. This approach enables the model to capture dynamic representations that reflect disease progression over time, which allows us to (i) achieve state-of-the-art performance in disease progression classification, (ii) localize dynamic progression regions across consecutive examinations, as demonstrated in our new task, dynamic phrase grounding on the Chest ImaGenome Gold dataset, and (iii) highlight progression-localized regions, often relevant to lesion areas, which in turn improves disease classification tasks on a single image.
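A minimal sketch of what a cross-exam encoder could look like, assuming the common cross-attention formulation (layer sizes and names are illustrative, not the paper's configuration):

```python
import torch
import torch.nn as nn

class CrossExamEncoder(nn.Module):
    """Hypothetical cross-exam encoder: current-exam image tokens attend
    to prior-exam tokens so fused features encode the change between
    examinations. Dimensions are assumptions, not the paper's values."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, current_tokens, prior_tokens):
        # Query with the current exam; key/value with the prior exam.
        delta, _ = self.cross_attn(current_tokens, prior_tokens, prior_tokens)
        return self.norm(current_tokens + delta)  # temporal-aware tokens
```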
ISBN (digital): 9798331517649
ISBN (print): 9798331517656
Summarization approaches aim to meaningfully reduce different types of data such as text, audio, and video. Many techniques, including machine learning, signal processing, image processing, computer vision, and deep learning, can be used to develop summarization approaches. In this study, we performed object detection on videos suitable for smart city applications using a pretrained YOLOv8 model. Based on the detections, we created a feature vector for each image frame using the location information covered by the classes involved in the object detection process. Then, we used several different approaches to determine a reference feature vector for the video. Finally, we calculated the cosine similarity of each frame's feature vector to this reference feature vector using different methods. With the method we developed, we presented a similarity-focused summary created by selecting the video frames with maximum similarity. We also developed an evaluation approach for the summaries we produced, comparing the overall heat maps of the video with the heat maps of the summary videos. Experimental results demonstrate the efficiency of our summarization approaches.
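A condensed sketch of this pipeline, assuming the ultralytics YOLOv8 API and using the mean vector as one possible reference; the video path, feature definition, and summary length are placeholders, not the paper's exact choices.

```python
import numpy as np
from ultralytics import YOLO  # assumes the ultralytics package

NUM_CLASSES = 80  # COCO classes detected by the pretrained model

def frame_vector(result):
    """Per-class sum of normalized detection-box areas for one frame,
    a plausible reading of the location-based feature vector."""
    vec = np.zeros(NUM_CLASSES)
    for box in result.boxes:
        x1, y1, x2, y2 = box.xyxyn[0].tolist()  # normalized coordinates
        vec[int(box.cls)] += (x2 - x1) * (y2 - y1)
    return vec

model = YOLO("yolov8n.pt")
results = model("city_video.mp4", stream=True)   # placeholder input video
vectors = np.stack([frame_vector(r) for r in results])
reference = vectors.mean(axis=0)                 # one choice of reference
sims = vectors @ reference / (
    np.linalg.norm(vectors, axis=1) * np.linalg.norm(reference) + 1e-9)
summary_frames = np.argsort(sims)[-50:]          # most-similar frames
```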
The absence of standardized evaluation methodologies for single-layer dimensional accuracy significantly hinders the broader implementation of direct ink writing (DIW) technology. Addressing the critical need for prec...
Image loading represents a critical bottleneck in modern machine learning pipelines, particularly in computer vision tasks where JPEG remains the dominant format. This study presents a systematic performance analysis ...
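The libraries and metrics under test are not visible in this truncated entry; the snippet below is only a generic way to benchmark JPEG decoding with two common loaders, with 'photo.jpg' a placeholder path.

```python
import time
import cv2
from PIL import Image

def time_decode(fn, path, repeats=100):
    """Average wall-clock time to decode one JPEG with loader `fn`."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(path)
    return (time.perf_counter() - start) / repeats

pil_load = lambda p: Image.open(p).convert("RGB")      # forces full decode
cv2_load = lambda p: cv2.imread(p, cv2.IMREAD_COLOR)

for name, fn in [("PIL", pil_load), ("OpenCV", cv2_load)]:
    print(name, f"{time_decode(fn, 'photo.jpg') * 1e3:.2f} ms")
```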
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Recent Compositional Zero-Shot Learning (CZSL) methods increasingly adopt pre-trained vision-language models to capture the contextual relations between image and text spaces. However, the single-class-token design of the Transformer-based encoder inevitably captures contextual information from unrelated objects and background, hindering the modeling of fine-grained class-specific visual features. Suffering from the cross-modal gap, prior methods also struggle to improve compositional recognition performance. To address these issues, we propose a fine-grained cross-modal concepts refinement framework, termed Refiner, which comprises two pivotal components: (i) fine-grained concept refinement of image embeddings to capture state-object context within visual scenes, and (ii) cross-modal information fusion to mitigate the modality gap. By leveraging learnable query vectors to capture region-specific semantic information pertinent to composition labels, our approach refines visual representations with fine-grained state-object context information. For cross-modal information fusion, we construct a robust image-to-text mapping by aligning visual embeddings with states, objects, and compositions, respectively. Extensive experiments demonstrate that our Refiner achieves new state-of-the-art performance across all popular benchmarks in both closed- and open-world settings.
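A hedged sketch of the learnable-query refinement idea, assuming standard cross-attention from queries to patch tokens (dimensions, head counts, and names are illustrative, not taken from the paper):

```python
import torch
import torch.nn as nn

class ConceptRefiner(nn.Module):
    """Sketch of refining image embeddings with learnable queries: a
    small set of query vectors cross-attends to patch tokens to pool
    region-specific state/object context. Sizes are assumptions."""
    def __init__(self, num_queries=8, dim=512, heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, patch_tokens):          # (B, N_patches, dim)
        q = self.queries.expand(patch_tokens.size(0), -1, -1)
        refined, _ = self.attn(q, patch_tokens, patch_tokens)
        return refined                        # (B, num_queries, dim)
```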