We present a novel multiscale approach that combines segmentation with classification to detect abnormal brain structures in medical imagery, and demonstrate its utility in detecting multiple sclerosis lesions in 3D MRI data. Our method uses segmentation to obtain a hierarchical decomposition of a multi-channel, anisotropic MRI scan. It then produces a rich set of features describing the segments in terms of intensity, shape, location, and neighborhood relations. These features are fed into a decision-tree-based classifier, trained on data labeled by experts, enabling the detection of lesions at all scales. Unlike common approaches that use voxel-by-voxel analysis, our system can exploit regional properties that are often important for characterizing abnormal brain structures. We provide experiments showing successful detection of lesions in both simulated and real MR images.
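As a rough illustration of the region-level features described above, the sketch below computes per-segment area, mean intensity, centroid, a crude shape descriptor, and adjacency for a single 2D label map. It assumes one intensity channel and one segmentation level, whereas the paper works with multi-channel, anisotropic 3D MRI and a hierarchy of segments; all function and feature names are hypothetical. Feature vectors like these could then be passed to a decision-tree classifier (for example scikit-learn's `DecisionTreeClassifier`), but that pairing is an assumption, not the authors' implementation.

```python
from collections import defaultdict


def segment_features(labels, intensity):
    """Compute simple regional features for each segment of a 2D label map.

    labels[y][x] is an integer segment id; intensity[y][x] is a float.
    Returns {segment_id: {"area", "mean_intensity", "centroid",
    "bbox_fill", "neighbors"}}. Feature names are illustrative only.
    """
    area = defaultdict(int)
    total = defaultdict(float)
    sum_y = defaultdict(float)
    sum_x = defaultdict(float)
    bbox = {}  # segment id -> [ymin, xmin, ymax, xmax]
    neighbors = defaultdict(set)
    height, width = len(labels), len(labels[0])

    for y in range(height):
        for x in range(width):
            s = labels[y][x]
            area[s] += 1
            total[s] += intensity[y][x]
            sum_y[s] += y
            sum_x[s] += x
            b = bbox.setdefault(s, [y, x, y, x])
            b[0], b[1] = min(b[0], y), min(b[1], x)
            b[2], b[3] = max(b[2], y), max(b[3], x)
            # 4-connectivity: check right and lower pixels only, but record
            # the adjacency in both directions.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < height and nx < width and labels[ny][nx] != s:
                    neighbors[s].add(labels[ny][nx])
                    neighbors[labels[ny][nx]].add(s)

    features = {}
    for s, n in area.items():
        ymin, xmin, ymax, xmax = bbox[s]
        box_area = (ymax - ymin + 1) * (xmax - xmin + 1)
        features[s] = {
            "area": n,
            "mean_intensity": total[s] / n,
            "centroid": (sum_y[s] / n, sum_x[s] / n),
            "bbox_fill": n / box_area,  # crude shape descriptor in (0, 1]
            "neighbors": sorted(neighbors[s]),
        }
    return features


labels = [[0, 0, 1], [0, 0, 1], [2, 2, 1]]
intensity = [[1.0, 1.0, 5.0], [1.0, 1.0, 5.0], [3.0, 3.0, 5.0]]
print(segment_features(labels, intensity)[0])
```

Neighborhood relations are kept as sorted lists of adjacent segment ids so they can later be turned into features such as "mean intensity of neighbors".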
ISBN: 9798350365474 (digital)
ISBN: 9798350365481 (print)
In recent years, we have witnessed the collection of ever larger multi-modal image-caption datasets: from hundreds of thousands of such pairs to hundreds of millions. Such datasets allow researchers to build powerful deep learning models, at the cost of requiring intensive computational resources. In this work, we ask: can we use such datasets efficiently without sacrificing performance? We tackle this problem by extracting difficulty scores from each image-caption sample and using these scores to make training more effective and efficient. We compare two ways of using difficulty scores to influence training: filtering a representative subset of each dataset, and ordering samples through curriculum learning. We analyze and compare difficulty scores extracted from a single modality, namely captions (i.e., caption length and number of object mentions) or images (i.e., region proposals’ size and number), or based on the alignment of image-caption pairs (i.e., CLIP and concreteness). We focus on weakly supervised object detection, where image-level labels are extracted from captions. We find that (1) combining filtering and curriculum learning can achieve large gains in performance, but not all methods are stable across experimental settings, (2) single-modality scores often outperform alignment-based ones, and (3) alignment scores show the largest gains when training time is limited.
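To make the two uses of difficulty scores concrete, the sketch below scores captions by object mentions and caption length (two of the caption-only signals named above), filters the dataset, and orders the remaining samples easy-to-hard as a curriculum. The object vocabulary, the filtering rule (drop captions with no object mention), and the easy-to-hard ordering are assumptions for illustration; the image-based and alignment-based scores (CLIP, concreteness) are not sketched.

```python
def difficulty(caption, object_vocab):
    """Hypothetical caption-only difficulty score: (object mentions, caption length)."""
    tokens = [t.strip(".,!?;:").lower() for t in caption.split()]
    mentions = sum(t in object_vocab for t in tokens)
    return mentions, len(tokens)


def filter_and_order(captions, object_vocab):
    """Filter samples by score, then order the kept ones easy-to-hard (a curriculum)."""
    scored = [(difficulty(c, object_vocab), c) for c in captions]
    # Filtering (assumed rule): drop captions with no object mention, since
    # image-level labels for weakly supervised detection come from captions.
    kept = [(score, c) for score, c in scored if score[0] > 0]
    # Curriculum: fewer object mentions first; shorter captions break ties.
    kept.sort(key=lambda pair: pair[0])
    return [c for _, c in kept]


vocab = {"dog", "cat", "ball"}
captions = [
    "A dog chases a cat near the ball.",
    "Sunset over the sea.",
    "A cat sleeps.",
    "A dog.",
]
print(filter_and_order(captions, vocab))
# ['A dog.', 'A cat sleeps.', 'A dog chases a cat near the ball.']
```

Sorting on the score tuple is stable, so samples with identical scores keep their original dataset order.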
ISBN: 0769525210 (print)
A regression model in the tensorPCA subspace is proposed in this paper for face super-resolution reconstruction. An approximate conditional probability model is used for the tensor subspace coefficients, and the maximum-likelihood estimator yields a linear regression model. The approximation is corrected by adding a non-linear component from an RBF-type regressor. Experiments on face images from the FERET database validate the algorithm. Although each projection coefficient is estimated by a local estimator, tensorPCA subspace analysis is still a global descriptor, which gives the algorithm some ability to deal with partially occluded images.
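One way to see why a maximum-likelihood estimate can reduce to linear regression: if, for illustration, the low-resolution subspace coefficients x and the high-resolution coefficients y are assumed jointly Gaussian, the maximizer of p(y | x) is the conditional mean, which is linear in x. An RBF term can then absorb the residual non-linearity. The form below is a hedged sketch with hypothetical symbols (means mu, covariance blocks Sigma, centers c_k, weight vectors alpha_k, width sigma), not the paper's exact estimator.

```latex
% Linear part: conditional mean of y given x under a joint Gaussian assumption
% Non-linear correction: K Gaussian RBF units with centers c_k and weights alpha_k
\hat{\mathbf{y}}(\mathbf{x}) =
  \boldsymbol{\mu}_y + \Sigma_{yx}\,\Sigma_{xx}^{-1}\left(\mathbf{x} - \boldsymbol{\mu}_x\right)
  + \sum_{k=1}^{K} \boldsymbol{\alpha}_k
    \exp\!\left(-\frac{\lVert \mathbf{x} - \mathbf{c}_k \rVert^2}{2\sigma^2}\right)
```

In such a scheme the RBF weights would typically be fitted to the residuals left by the linear part on training pairs.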
ISBN: 9781728125060 (digital)
ISBN: 9781728125077 (print)
For any dance form, whether classical or folk, visual expressions (facial expressions and hand gestures) play a key role in conveying the storyline of the accompanying music to the audience. Bharatanatyam, a classical dance form that originated in the southern states of India, is on the verge of being completely automated, partly due to an acute dearth of qualified and dedicated teachers (gurus). In an effort to speed up this automation process while preserving the cultural heritage, we identify and classify single-hand gestures (mudras/hastas) against their true labels using two variations of convolutional neural networks (CNNs), which demonstrate the strong effectiveness of transfer learning irrespective of the domain difference between the pre-training and training datasets. This work primarily aims at 1) building a novel dataset of 2D single-hand gestures belonging to 27 classes, collected from the Google search engine (Google Images), YouTube videos (dynamic, with background considered), and professional artists under staged environment constraints (plain backgrounds); 2) exploring the effectiveness of CNNs in identifying and classifying single-hand gestures by optimizing the hyperparameters; and 3) evaluating the impact of transfer learning and double transfer learning, a novel concept explored in this paper, on achieving higher classification accuracy.
ISBN: 9798350365474 (digital)
ISBN: 9798350365481 (print)
The detection of deepfakes is crucial for mitigating the societal impact of falsified video content. Despite the development of various algorithms for this purpose, detectors face challenges in real-world scenarios, especially when users capture deepfake content from screens and upload it online, or when detectors run on external devices such as smartphones and must capture potential deepfakes through the camera for evaluation. A significant challenge in these scenarios is the presence of Moiré patterns, which degrade image quality and complicate conventional classification methods, notably deep neural networks (DNNs). However, the impact of Moiré patterns on the effectiveness of deepfake detection systems has not been adequately explored. This study investigates how capturing deepfake videos from digital screens with a camera affects the accuracy of detection mechanisms. We introduced Moiré patterns by capturing a monitor's display with a smartphone camera and conducted empirical evaluations on four widely recognized datasets: CelebDF, DFD, DFDC, and FF++. We compare the performance of twelve state-of-the-art (SOTA) detectors on deepfake videos captured under the influence of Moiré patterns. Our findings reveal performance decreases of up to 33.1 and 31.3 percentage points for image-based and video-based detectors, respectively. Highlighting the challenges posed by Moiré patterns and other naturally induced artifacts is therefore critical for improving the effectiveness of real-world deepfake detection. To facilitate further research, we will release Moiré-pattern-affected versions of the CelebDF, DFD, DFDC, and FF++ datasets with this paper. Our code is available here: https://***/Razaib-Tariq/deepmoire
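Moiré patterns in screen captures are a form of aliasing: the display's fine pixel grid, projected onto the camera sensor, can exceed what the sensor grid can resolve, and the excess frequency folds back into a coarse, visible fringe. The 1D sketch below illustrates only that mechanism; the grid frequency of 0.9 cycles per sensor pixel is a made-up example, and nothing here reproduces the authors' capture setup or datasets.

```python
import math


def aliased_frequency(f, fs):
    """Apparent frequency of a sinusoid of frequency f sampled at rate fs.

    Components above the Nyquist limit fs / 2 fold back into [0, fs / 2].
    """
    return abs(f - fs * round(f / fs))


def sample_grating(f, n_samples):
    """Sample an ideal 1D grating cos(2*pi*f*x) at integer sensor positions x."""
    return [math.cos(2 * math.pi * f * x) for x in range(n_samples)]


# Hypothetical setup: the screen's pixel grid, as projected onto the sensor,
# has 0.9 cycles per sensor pixel, above the Nyquist limit of 0.5.
apparent = aliased_frequency(0.9, 1.0)   # about 0.1 cycles per pixel
fringe_period = 1.0 / apparent           # about a 10-pixel moire fringe
print(round(apparent, 6), round(fringe_period, 6))  # 0.1 10.0
```

Sampling the fine 0.9-cycle grating at integer positions gives the same values as a 0.1-cycle grating, which is the low-frequency fringe that then overlays the captured frame.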
ISBN: 9798350365474 (digital)
ISBN: 9798350365481 (print)
This paper reviews the NTIRE 2024 RAW Image Super-Resolution Challenge, highlighting the proposed solutions and results. New methods for RAW super-resolution could be essential in modern Image Signal Processing (ISP) pipelines; however, this problem is not as well explored as in the RGB domain. The goal of this challenge is to upscale RAW Bayer images by 2x, considering unknown degradations such as noise and blur. In total, 230 participants registered for the challenge, and 45 submitted results during the challenge period. The performance of the top-5 submissions is reviewed and provided here as a gauge of the current state of the art in RAW image super-resolution.
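A common preprocessing step for RAW Bayer super-resolution is to pack the mosaic into four half-resolution color planes so that each plane contains a single color. The sketch below assumes an RGGB pattern and plain nested lists; the actual CFA layout and data format used in the challenge are not stated here, and the function names are hypothetical.

```python
def pack_rggb(raw):
    """Split an H x W RGGB Bayer mosaic into four (H/2) x (W/2) planes."""
    h, w = len(raw), len(raw[0])
    if h % 2 or w % 2:
        raise ValueError("Bayer mosaic needs even height and width")

    def plane(dy, dx):
        return [[raw[y + dy][x + dx] for x in range(0, w, 2)] for y in range(0, h, 2)]

    return plane(0, 0), plane(0, 1), plane(1, 0), plane(1, 1)  # R, G1, G2, B


def unpack_rggb(r, g1, g2, b):
    """Inverse of pack_rggb: interleave four planes back into one mosaic."""
    h, w = 2 * len(r), 2 * len(r[0])
    raw = [[0] * w for _ in range(h)]
    for (dy, dx), p in (((0, 0), r), ((0, 1), g1), ((1, 0), g2), ((1, 1), b)):
        for y, row in enumerate(p):
            for x, v in enumerate(row):
                raw[2 * y + dy][2 * x + dx] = v
    return raw


raw = [[4 * y + x for x in range(4)] for y in range(4)]
r, g1, g2, b = pack_rggb(raw)
print(r)  # [[0, 2], [8, 10]]
```

For 2x upscaling, a model that maps the packed (H/2) x (W/2) x 4 input to an H x W x 4 packed output would, after unpacking, produce a 2H x 2W mosaic.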
We present a novel method to obtain a 3D Euclidean reconstruction of both the background and moving objects in a video sequence. We assume that multiple objects are moving rigidly on a ground plane observed by a moving camera. The video sequence is first segmented into static background and motion blobs by a homography-based motion segmentation method. Then classical "Structure from Motion" (SfM) techniques are applied to obtain a Euclidean reconstruction of the static background. The motion blob corresponding to each moving object is treated as if it were a static object observed by a hypothetical moving camera, called a "virtual camera". This virtual camera shares the same intrinsic parameters as the real camera but moves differently due to the object's motion. The same SfM techniques are applied to estimate the 3D shape of each moving object and the pose of the virtual camera. We show that the unknown scale of moving objects can be approximately determined from the ground plane, which is a key contribution of this paper. Another key contribution is a proof that the 3D motion of moving objects can be recovered from the virtual camera motion with a linear constraint imposed on the object translation. In our approach, a planar-translation constraint is formulated: "the 3D instantaneous translation of moving objects must be parallel to the ground plane". Results on real-world video sequences demonstrate the effectiveness and robustness of our approach.
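The planar-translation constraint is linear in the translation: with a ground-plane normal n, it reads n · t = 0. The sketch below shows only that constraint, by removing the normal component from a translation estimate (an orthogonal projection onto the plane). The function name and the idea of enforcing the constraint as a post-hoc projection are illustrative assumptions; the paper imposes the constraint while solving for object motion.

```python
def project_onto_plane(t, n):
    """Remove the component of translation t along the ground-plane normal n.

    The result satisfies the linear constraint dot(n, t_parallel) == 0,
    i.e. the translation is parallel to the ground plane.
    """
    nn = sum(ni * ni for ni in n)
    if nn == 0:
        raise ValueError("plane normal must be non-zero")
    scale = sum(ti * ni for ti, ni in zip(t, n)) / nn
    return tuple(ti - scale * ni for ti, ni in zip(t, n))


# Hypothetical ground plane y = 0 (normal along the y axis).
print(project_onto_plane((1.0, 2.0, 3.0), (0.0, 1.0, 0.0)))  # (1.0, 0.0, 3.0)
```

Because the normal need not be unit length, the projection divides by n · n rather than assuming a normalized vector.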
In multi-target tracking, maintaining the correct identity of targets is challenging. In the presented tracking method, accurate target identification is achieved by incorporating the appearance information of the spatial and temporal context of each target. The spatial context of a target involves the local background and nearby targets. The first contribution of the paper is a new discriminative model for multi-target tracking with embedded classification of each target against its context. As a result, the tracker not only searches for the image region similar to the target but also avoids latching onto nearby targets or onto a background region. The temporal context of a target includes its appearances seen earlier during tracking. These past appearances are used to train a probabilistic PCA that serves as the target's current measurement model. As the second contribution, we develop a new incremental scheme for probabilistic PCA. It accurately updates the full set of parameters, including a noise parameter that is still ignored in the related literature. Experiments show robust tracking performance under severe clutter, occlusions, and pose changes.
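The noise parameter mentioned above is the isotropic variance sigma^2 of probabilistic PCA. For reference, the standard batch formulation (Tipping and Bishop) is recalled below; the paper's incremental update of these parameters is not reproduced here, and reading the marginal likelihood as the tracker's measurement model is an interpretation of the abstract rather than something it states explicitly.

```latex
% Probabilistic PCA: observation t in R^d, latent x in R^q (q < d)
\mathbf{t} = W\mathbf{x} + \boldsymbol{\mu} + \boldsymbol{\epsilon},\qquad
\mathbf{x} \sim \mathcal{N}(\mathbf{0}, I_q),\qquad
\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 I_d)

% Marginal likelihood of an appearance vector t
p(\mathbf{t}) = \mathcal{N}\!\left(\mathbf{t} \mid \boldsymbol{\mu},\; W W^{\top} + \sigma^2 I_d\right)

% Batch ML solution: mu_ML is the sample mean; the sample covariance has
% eigenvalues \lambda_1 \ge \dots \ge \lambda_d, leading eigenvectors U_q,
% Lambda_q = diag(\lambda_1, ..., \lambda_q), and R is any q x q rotation
\sigma^2_{\mathrm{ML}} = \frac{1}{d - q} \sum_{j = q + 1}^{d} \lambda_j,\qquad
W_{\mathrm{ML}} = U_q \left(\Lambda_q - \sigma^2_{\mathrm{ML}} I_q\right)^{1/2} R
```

The sigma^2 estimate is the average variance in the discarded directions, which is why an update that tracks only U_q and Lambda_q leaves it out.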
Car plate detection is a key component of an automatic license plate recognition system. This paper adopts an enhanced cascaded tree-style learner framework for car plate detection using hybrid object features, including simple statistical features and Haar-like features. The statistical features help simplify processing in the cascade classifier. The cascaded tree-style detector design further reduces false alarms and false dismissals while retaining a high detection rate. Experimental results obtained with the proposed algorithm show encouraging performance.
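The sketch below shows the two standard building blocks behind Haar-like features in a cascade: an integral image that makes any rectangle sum O(1), and early rejection, where a cheap statistical stage (window mean) runs before a Haar-like stage. It is a plain linear cascade in the Viola-Jones style, not the paper's tree-style cascade, and the window size, stages, and thresholds are hypothetical.

```python
def integral_image(img):
    """Summed-area table with a zero border: ii[y][x] = sum of img[:y][:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, y, x, h, w):
    """Sum over the h x w rectangle whose top-left pixel is (y, x), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def haar_two_rect(ii, y, x, h, w):
    """Two-rectangle Haar-like feature: left-half sum minus right-half sum."""
    half = w // 2
    return rect_sum(ii, y, x, h, half) - rect_sum(ii, y, x + half, h, half)


def cascade_accepts(ii, stages):
    """Run (score_fn, threshold) stages in order; reject at the first failure."""
    for score_fn, threshold in stages:
        if score_fn(ii) < threshold:
            return False  # early exit: later stages are never evaluated
    return True


# Hypothetical 2 x 4 window: a cheap statistical stage (window mean) first,
# then a Haar-like vertical-edge stage.
stages = [
    (lambda ii: rect_sum(ii, 0, 0, 2, 4) / 8, 1.0),
    (lambda ii: abs(haar_two_rect(ii, 0, 0, 2, 4)), 10.0),
]
print(cascade_accepts(integral_image([[0, 0, 9, 9], [0, 0, 9, 9]]), stages))  # True
print(cascade_accepts(integral_image([[5, 5, 5, 5], [5, 5, 5, 5]]), stages))  # False
```

Placing the cheapest stage first is what makes a cascade fast: most background windows are discarded before any costlier feature is computed.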
ISBN: 9798350365474 (digital)
ISBN: 9798350365481 (print)
Neural networks are notorious for being overconfident predictors, posing a significant challenge to their safe deployment in real-world applications. While feature normalization has garnered considerable attention in the deep learning literature, current train-time regularization methods for out-of-distribution (OOD) detection have yet to fully exploit this potential. Indeed, naively incorporating feature normalization into neural networks does not guarantee substantial improvement in OOD detection performance. In this work, we introduce T2FNorm, a novel approach that transforms features to hyperspherical space during training while using the non-transformed space for OOD scoring. This method yields a surprising enhancement in OOD detection capability without compromising in-distribution (ID) accuracy. Our investigation demonstrates that the proposed technique substantially diminishes the feature norms of all samples, more so for out-of-distribution samples, thereby addressing the prevalent concern of overconfidence in neural networks. The proposed method also significantly improves various post-hoc OOD detection methods.
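Based only on the description above (features mapped to a hypersphere during training, OOD scores computed on non-transformed features), a minimal sketch might look like the following. The temperature `tau`, the linear head, and the use of an energy score for OOD scoring are assumptions for illustration; consult the paper for the actual T2FNorm formulation and training details.

```python
import math


def hypersphere(h, tau):
    """Project feature h onto a sphere of radius 1 / tau (assumed transform)."""
    norm = math.sqrt(sum(v * v for v in h))
    return [v / (tau * norm) for v in h] if norm > 0 else list(h)


def linear_head(h, weights, bias):
    """Logits of a linear classifier: weights is a list of rows, one per class."""
    return [sum(w * v for w, v in zip(row, h)) + b for row, b in zip(weights, bias)]


def energy(logits, temperature=1.0):
    """T * logsumexp(logits / T), computed stably; higher suggests in-distribution."""
    m = max(logits)
    return m + temperature * math.log(
        sum(math.exp((z - m) / temperature) for z in logits))


def train_logits(h, weights, bias, tau=0.1):
    # Training: the head only sees hyperspherical features, so the raw
    # feature norm cannot inflate the logits.
    return linear_head(hypersphere(h, tau), weights, bias)


def ood_score(h, weights, bias):
    # Scoring: use the non-transformed features, so feature norm still counts.
    return energy(linear_head(h, weights, bias))


W = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical 2-class head
b = [0.0, 0.0]
print(train_logits([3.0, 4.0], W, b, tau=0.5))
print(ood_score([3.0, 4.0], W, b), ood_score([0.3, 0.4], W, b))
```

The split mirrors the abstract's claim: normalization shapes what the network learns, while the scorer still sees norm differences between samples.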