ISBN (print): 9781450347532
Manual analysis of pedestrians for surveillance of large crowds in real-time applications is not practical. Tracking-Learning-Detection (TLD), proposed by Kalal, Mikolajczyk and Matas [1], is one of the most prominent automatic object tracking systems. TLD can track a single object and can handle occlusion and appearance change, but it suffers from limitations. In this paper, tracking of multiple objects and estimation of their trajectories is proposed using an improved TLD. Feature tracking is used in place of grid-based tracking to overcome the limitation of tracking during out-of-plane rotation; this also optimizes the algorithm. The proposed algorithm also achieves auto-initialization by detecting pedestrians in the first frame, which makes it suitable for real-time pedestrian tracking.
ISBN (print): 9781424442195
This paper presents a new approach to improving the performance of traditional palmprint authentication approaches. The cohort information is used in the matching stage, but only when the matching scores are inadequate to generate reliable decisions. The cohort information can also be used to achieve a significant performance improvement for the combination of modalities, as demonstrated by the experimental results in this paper. The rigorous palmprint authentication results presented in this paper are the best in the literature and confirm the utility of the significant information that can be extracted from the imposter scores. The statistical estimation of the confidence level for palmprint matching requires an excellent match between the theoretical distribution and the real score distribution. The performance analysis presented in this paper, from over 29.96 million imposter matching scores, suggests that the Beta-Binomial function can more accurately model the distribution of real palmprint matching scores.
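For readers unfamiliar with the Beta-Binomial model mentioned in the abstract above, the following minimal Python sketch evaluates its probability mass function. It assumes matching scores have been quantized to integer counts k out of n, which the abstract does not state; the function name and the parameter values (n, alpha, beta) are illustrative and are not the authors' fitted values.

```python
import math


def beta_binomial_pmf(k, n, alpha, beta):
    """P(K = k) for a Beta-Binomial(n, alpha, beta) distribution.

    Evaluated in log space for numerical stability:
    log C(n, k) + log B(k + alpha, n - k + beta) - log B(alpha, beta).
    """
    def log_beta(a, b):
        return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(
        log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)
    )


# Illustrative parameters only (hypothetical, not taken from the paper).
n, alpha, beta = 20, 2.0, 5.0
pmf = [beta_binomial_pmf(k, n, alpha, beta) for k in range(n + 1)]
```

With alpha = beta = 1 the distribution reduces to a uniform distribution over 0..n, which is a convenient sanity check when fitting the model to an empirical score histogram.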
ISBN (print): 9781424442195
The paper presents a hybrid thresholding approach for binarization and enhancement of degraded documents. Historical documents contain information of great cultural and scientific value, but such documents are frequently degraded over time. Digitized degraded documents require specialized processing to remove different kinds of noise and to improve readability. The approach for enhancing degraded documents uses a combination of two thresholding algorithms. First, iterative global thresholding is applied to the smoothed degraded image until the stopping criterion is reached; then a threshold selection method based on the gray-level histogram is used to binarize the image. The next step is to detect areas where noise still remains and to apply iterative thresholding locally. A method to improve the quality of textual information in the document is also applied as a post-processing stage, making the approach efficient and better suited for character recognition applications.
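The two global steps described in the abstract above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the iterative step is assumed to be an isodata-style midpoint iteration, the histogram-based selection is assumed to be Otsu's between-class variance criterion, and the smoothing, local refinement and post-processing stages are omitted. The image is represented as a flat list of integer gray levels, and all function names are hypothetical.

```python
def iterative_threshold(pixels, eps=0.5, max_iter=100):
    """Isodata-style iteration: move the threshold to the midpoint of the
    means of the two classes it currently separates, until it stabilises."""
    t = sum(pixels) / len(pixels)
    for _ in range(max_iter):
        low = [p for p in pixels if p <= t]
        high = [p for p in pixels if p > t]
        if not low or not high:
            break  # all pixels fell on one side; nothing more to refine
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < eps:  # assumed stopping criterion
            return new_t
        t = new_t
    return t


def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximises between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        if w0 == 0:
            continue  # no pixels at or below t yet
        w1 = total - w0
        if w1 == 0:
            break  # no pixels above t remain
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    """Pixels strictly above the threshold become 1, the rest 0."""
    return [1 if p > t else 0 for p in pixels]


# Toy "image": dark text-like pixels and bright background pixels.
image = [10] * 50 + [200] * 50
t_iter = iterative_threshold(image)
t_otsu = otsu_threshold(image)
mask = binarize(image, t_otsu)
```

On real scans the two thresholds generally differ, which is presumably why the approach combines them rather than relying on either one alone.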
ISBN (print): 9781450347532
Saliency computation is widely studied in computer vision but not in medical imaging. Existing computational saliency models have been developed for general (natural) images and hence may not be suitable for medical images. This is due to the variety of imaging modalities and the requirement that models capture not only normal anatomy but also deviations from it. We present a biologically inspired model for colour fundus images and illustrate it for the case of diabetic retinopathy. The proposed model uses spatially varying morphological operations to enhance lesions locally and combines an ensemble of the results of such operations to generate the saliency map. The model is validated against an average human gaze map of 15 experts and found to have 10% higher recall (at 100% precision) than four leading saliency models proposed for natural images. The F-score for the match with manual lesion markings by 5 experts was 0.4 for our model (as opposed to 0.532 for the gaze map) and very poor for existing models. The model's utility is shown via a novel enhancement method that employs saliency to selectively enhance the abnormal regions; this was found to boost their contrast-to-noise ratio by approximately 30%.
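The precision, recall and F-score figures quoted in the abstract above are standard pixel-wise measures. The sketch below shows how such scores can be computed from a saliency map and a binary lesion marking. It assumes the saliency map is binarized with a fixed cut-off (0.5 here, a hypothetical value) and that both maps are flat lists of equal length; the abstract does not say how the authors performed the comparison, and the toy values are invented for illustration.

```python
def precision_recall_f(predicted, reference):
    """Pixel-wise precision, recall and F-score between two binary masks
    given as flat lists of 0/1 values."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, r in zip(predicted, reference) if p and r)
    fp = sum(1 for p, r in zip(predicted, reference) if p and not r)
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data (invented): six pixels of saliency and expert markings.
saliency = [0.9, 0.8, 0.2, 0.7, 0.1, 0.05]
markings = [1, 1, 1, 0, 0, 0]
binary = [1 if s >= 0.5 else 0 for s in saliency]  # assumed cut-off
precision, recall, f_score = precision_recall_f(binary, markings)
```

Sweeping the cut-off instead of fixing it yields a precision-recall curve, from which an operating point such as "recall at 100% precision" can be read off.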
ISBN (print): 9789897584022
In the past decade, additive manufacturing technology has gained immense attention in numerous research areas and has already been adopted in a wide range of industries relevant to transportation, healthcare, electronics and energy. However, the presence of defects and dimensional deviations that occur during the process hinders the broad exploitation of 3D printing. In order to enhance the capabilities of this emerging technology, online quality control methodologies and verification of the manufacturing process need to be developed. In the present article, a low-cost in-situ vision-based monitoring technique applied to Fused Deposition Modeling (FDM) 3D printing technology is introduced. An optical scanning system was integrated into a commercial 3D printer in order to scan and validate the performance of the procedure. The proposed methodology monitors the FDM process and correlates the theoretical 3D model with the manufactured one. This technique can be used in various additive manufacturing technologies, providing integrity and reliability of the process, high quality standards and reduced production costs.
ISBN (print): 9798400710759
Contextual information plays a critical role in object recognition models within computer vision, where changes in context can significantly affect accuracy, underscoring models' dependence on contextual cues. This study investigates how context manipulation influences both model accuracy and feature attribution, providing insights into the reliance of object recognition models on contextual information as understood through the lens of feature attribution methods. We employ a range of feature attribution techniques to decipher the reliance of deep neural networks on context in object recognition tasks. Using the ImageNet-9 and our curated ImageNet-CS datasets, we conduct experiments to evaluate the impact of contextual variations, analyzed through feature attribution methods. Our findings reveal several key insights: (a) Correctly classified images predominantly emphasize object volume attribution over context volume attribution. (b) The dependence on context remains relatively stable across different context modifications, irrespective of classification accuracy. (c) Context change exerts a more pronounced effect on model performance than context perturbations. (d) Surprisingly, context attribution in 'no-information' scenarios is non-trivial. Our research moves beyond traditional methods by assessing the implications of broad-level modifications on object recognition, either in the object or its context. Code available at https://***/nineRishav/Lost-In-Context
ISBN (print): 9798400710759
End-to-end automatic speech recognition (ASR) systems achieve promising performance on large-scale speech datasets. However, these systems experience performance degradation when a domain mismatch exists between training and test datasets. This paper addresses the domain adaptation problem by employing adversarial learning in an unsupervised manner, along with the ASR training. We propose frame-level and character-level domain adversarial training, which reduces the domain shift between source and target data. Frame-level adversarial training selects all source and target speech frames and tries to classify them into two domains. In contrast, character-level training generates pseudo-labels for source and target batches and finds the feature distribution for each pseudo-character label. A random feature is selected for each character from the source and target domains. This feature set of all characters is used in the domain classification. Experiments on the LibriAdapt and LibriSpeech clean datasets show that our approaches achieve word error rate (WER) reductions similar to those of state-of-the-art approaches, with lower time complexity. The proposed approaches are expected to yield promising results for other speech adaptation applications, which will be analyzed in the future.
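The character-level sampling step described in the abstract above can be sketched as follows. This is a minimal illustration, assuming each frame feature is a short list of floats and each frame already carries a pseudo-character label; the function names and toy data are hypothetical, and the domain classifier and adversarial training loop themselves are omitted.

```python
import random
from collections import defaultdict


def sample_feature_per_character(features, pseudo_labels, rng):
    """Group frame features by pseudo-character label and pick one
    feature at random for each label."""
    by_label = defaultdict(list)
    for feature, label in zip(features, pseudo_labels):
        by_label[label].append(feature)
    return {label: rng.choice(group) for label, group in sorted(by_label.items())}


def build_domain_batch(source, target, rng):
    """Return (feature, domain_id) pairs for the domain classifier:
    domain_id is 0 for source features and 1 for target features."""
    batch = []
    for domain_id, (features, labels) in enumerate((source, target)):
        picked = sample_feature_per_character(features, labels, rng)
        batch.extend((feature, domain_id) for feature in picked.values())
    return batch


# Toy data (invented): three frames per domain, pseudo-labels "a" and "b".
rng = random.Random(0)
source = ([[0.1, 0.2], [0.3, 0.1], [0.9, 0.8]], ["a", "a", "b"])
target = ([[0.2, 0.2], [0.7, 0.9], [0.8, 0.7]], ["a", "b", "b"])
batch = build_domain_batch(source, target, rng)
```

Because only one feature per character enters the domain classifier, the batch size scales with the number of distinct characters rather than the number of frames, which is consistent with the lower time complexity the abstract reports relative to frame-level training.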
ISBN (print): 9798400710759
Despite significant advancements in large-scale text-to-image generation and text-conditioned image editing, appearance transfer remains relatively unexplored. Appearance transfer aims to transfer an object's appearance in an appearance image to an object in the structure image so that background details are preserved and the result accurately reflects the transferred object's characteristics. Appearance transfer has practical applications in areas like virtual try-on and e-commerce product placement. Existing methods often require fine-tuning text-to-image diffusion models or are not applicable to virtual try-on and e-commerce scenarios, which is not ideal. In this paper, we introduce a Mask-Guided attention mechanism that replaces the existing self-attention in the U-Net architecture of Stable Diffusion [29]. This approach can be easily integrated into the MasaCtrl [6] framework, enabling appearance transfer without model fine-tuning and making it suitable for a wide range of applications. Our method uses masks of objects in images to guide the appearance transfer process, with these masks obtained from the Segment Anything Model (SAM) [17]. This integration of SAM-generated masks allows for precise object localization and more accurate appearance transfer. We have conducted comprehensive experiments on transferring various clothing items (shirts, jeans, t-shirts) onto people, as well as transferring sofas into living spaces.
ISBN (print): 9781467385640
Face frontalization is the process of synthesizing a frontal view of a face, given its non-frontal view. Frontalization is used in intelligent photo editing tools and also aids in improving the accuracy of face recognition systems. For example, in the case of photo editing, faces of persons in a group photo can be corrected to look into the camera if they are looking elsewhere. Similarly, even though recent face recognition methods claim accuracy that surpasses that of humans in some cases, the performance of recognition systems degrades when profile views of faces are given as input. One way to address this issue is to synthesize frontal views of faces before recognition. We propose a simple and efficient method to address the face frontalization problem. Our method leverages the fact that faces in general have a definite structure and can be represented in a low-dimensional subspace. We employ an exemplar-based approach to find the transformation that relates the profile view to the frontal view, and use it to generate realistic frontalizations. Our method does not involve estimating a 3D model of the face, which is a common approach in previous work in this area. This leads to an efficient solution, since we avoid the complexity of adding one more dimension to the problem. Our method also retains the structural information of the individual, in contrast to a recent method [4] that assumes a generic 3D model for synthesis. We show impressive qualitative and quantitative results in comparison to the state of the art in this field.
ISBN (print): 9789897584022
This paper describes a method that generates in-between frames of two videos of a musical instrument being played. While image generation has achieved successful outcomes in recent years, there is ample scope for improvement in video generation. The keys to improving the quality of video generation are the high resolution and temporal coherence of videos. We address these requirements by using not only visual information but also aural information. The critical point of our method is using two-dimensional pose features to generate high-resolution in-between frames from the input audio. We constructed a deep neural network with a recurrent structure for inferring pose features from the input audio and an encoder-decoder network for padding and generating video frames using pose features. Our method, moreover, adopts a fusion approach of generating, padding, and retrieving video frames to improve the output video. Pose features play an essential role both in end-to-end training, owing to their differentiability, and in combining the generating, padding, and retrieving approaches. We conducted a user study and confirmed that the proposed method is effective in generating interpolated videos.