Automatic recognition of handwritten text in video lectures has important applications. In video lectures, the presenter usually writes on a white or colored board. The video camera often captures the writing board along with other objects, possibly including the presenter. Recognition of handwritten text from such a video frame requires prior detection of the text regions in the frame. In this article, we present our recent study of text localization in such video lecture frames. We compute Scale Invariant Feature Transform (SIFT) descriptors densely over the entire frame. Following the usual practice, the descriptors are located on a regular grid with a spacing of 5 pixels, and, on the basis of an empirical study, each uses a uniform patch of 60 × 60 pixels as its support. The SIFT descriptor at each grid point is fed as a 128-dimensional input feature vector to a Multilayer Perceptron (MLP) network, which classifies each grid point as either text or non-text. Depending on an aggregate response at each pixel, we localize text regions in the input video frame. Next, we employ K-means clustering to detect the text components present in the localized region of the video frame. Finally, two simple rules are applied to reject certain detected text components as noise. We obtained encouraging simulation results with this approach on a variety of video lecture frames.
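The dense-grid voting idea described in this abstract can be illustrated with a minimal sketch. The abstract does not define the "aggregate response", so the sketch assumes it is the fraction of covering patches that were labelled as text. The function names `grid_points` and `aggregate_text_score` are hypothetical, and the SIFT and MLP stages are replaced by a precomputed dictionary of labels.

```python
def grid_points(width, height, step=5):
    """Return (x, y) grid locations spaced `step` pixels apart."""
    return [(x, y) for y in range(0, height, step) for x in range(0, width, step)]


def aggregate_text_score(width, height, labels, patch=60):
    """Per-pixel fraction of covering patches that were labelled text.

    `labels` maps a grid point (x, y) to 1 (text) or 0 (non-text); in the
    described pipeline these labels would come from the MLP. Each grid point
    is treated as the centre of a patch x patch support (an assumption).
    """
    half = patch // 2
    votes = [[0] * width for _ in range(height)]
    counts = [[0] * width for _ in range(height)]
    for (cx, cy), label in labels.items():
        # Clip the patch to the frame and add this grid point's vote to
        # every pixel the patch covers.
        for y in range(max(0, cy - half), min(height, cy + half)):
            for x in range(max(0, cx - half), min(width, cx + half)):
                votes[y][x] += label
                counts[y][x] += 1
    return [
        [votes[y][x] / counts[y][x] if counts[y][x] else 0.0 for x in range(width)]
        for y in range(height)
    ]
```

Thresholding the returned scores (for example at 0.5, also an assumption) would give a binary text-region mask.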
Recent advances in optical coherence tomography such as the development of high speed ultrahigh resolution scanners and corresponding signal processing techniques may reveal new potential biomarkers in retinal disease...
Many crowd abnormal motion detection methods in video surveillance have been proposed in recent ***, most of them are based on low semantic features, such as gray value, velocity and ***, low semantic features contain weak discriminative information of the *** addition, these methods often ignore important information in time and space *** this work, a high semantic representation is *** feature analysis (SFA) is adopted to provide high semantic ***, a random walk model, which takes into account the spatio-temporal information, is used to detect the abnormal motions in *** conduct extensive experiments on two datasets to demonstrate the effectiveness of the proposed *** results suggest that our method outperforms the state-of-the-art methods.
This paper addresses the issue of tracking tubular objects, particularly blood vessels from MR images. A model-based approach is adopted. The generalized stochastic tube (GST) model is developed which is an extension ...
In this paper, we propose the use of a new modality characterized by a richer information content, namely acoustic images, for the sake of audio-visual scene understanding. Each pixel in such images is characterized b...
A model-based approach is used for recognizing arterial blood vessels from MRA volumetric data. The modeling includes (1) a generalized stochastic tube model characterizing the structural properties of the vessels, an...
ISBN: (print) 9781728132945
Semantic segmentation has made great progress through the adoption of deep Fully Convolutional Networks (FCN). However, the performance of FCN-based models relies heavily on the amount of pixel-level annotation, which is expensive and time-consuming to produce. To address this problem, a good choice is to learn to segment with weak supervision from bounding boxes. The critical challenge for this weakly supervised learning task is how to make full use of the class-level and region-level supervision that bounding boxes provide. In this paper, we first introduce a box-driven class-wise masking model (BCM) to remove irrelevant regions of each class. Moreover, based on the pixel-level segment proposals generated from the bounding box supervision, we calculate the mean filling rate of each class to serve as an important prior cue, and we propose a filling rate guided adaptive loss (FR-Loss) to help the model ignore wrongly labeled pixels in the proposals. Unlike previous methods, which train models directly on fixed individual segment proposals, our method can adjust the model learning with global statistical information, which helps reduce the negative impact of wrongly labeled proposals. We evaluate the proposed method on the challenging PASCAL VOC 2012 benchmark and compare it with other methods. Extensive experimental results show that the proposed method is effective and achieves state-of-the-art results.
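As a rough illustration of the per-class prior mentioned in this abstract, the sketch below computes a mean filling rate per class. The abstract does not give the formula, so the sketch assumes that a proposal's filling rate is the number of proposal pixels inside a box divided by the box area. The function name `mean_filling_rates` and the input tuple layout are hypothetical.

```python
from collections import defaultdict


def mean_filling_rates(proposals):
    """Average filling rate per class.

    `proposals` is an iterable of (class_name, box_area, proposal_pixels).
    Filling rate is assumed here to be proposal_pixels / box_area.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cls, box_area, proposal_pixels in proposals:
        if box_area <= 0:
            continue  # skip degenerate boxes
        sums[cls] += proposal_pixels / box_area
        counts[cls] += 1
    return {cls: sums[cls] / counts[cls] for cls in sums}
```

A loss guided by this prior could then use each class's rate to decide how many pixels inside a box to trust, but the exact form of FR-Loss is not stated in the abstract and is not sketched here.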
Handwritten text appears in many video images, so detecting handwritten scene text in video is useful for applications such as efficient indexing and retrieval. In many video frames the text lines are also multi-oriented. To the best of our knowledge, there is no prior work on detecting multi-oriented handwritten text in video. In this paper, we present a new method based on maximum color difference and boundary growing for detecting multi-oriented handwritten scene text in video. The method computes the maximum color difference on the average of the R, G and B channels of the original frame to enhance the text information. The output of the maximum color difference is fed to a K-means algorithm with K = 2 to separate text and non-text clusters. Text candidates are obtained by intersecting the text cluster with the Sobel output of the original frame. To tackle the fundamental problem of different orientations and skews of handwritten text, a boundary growing method based on a nearest neighbor concept is employed. We evaluate the proposed method on our own handwritten text database and on publicly available video data (Hua's data). The experimental results obtained with the proposed method are promising.
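The K = 2 clustering step in this abstract can be sketched on scalar values, such as per-pixel maximum color difference responses. This is a minimal pure-Python illustration, not the authors' implementation. The function name `two_means`, the min/max initialization, and the convention that label 1 marks the higher-mean cluster are assumptions. Treating the higher-mean cluster as text is likewise an assumption.

```python
def two_means(values, iterations=50):
    """1-D K-means with K = 2.

    Returns (labels, (low_center, high_center)); label 1 marks the cluster
    with the higher mean. Centers start at the min and max of `values`.
    """
    lo, hi = float(min(values)), float(max(values))
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins the nearer center.
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        # Update step: recompute centers, keeping the old one if a cluster is empty.
        new_lo = sum(low) / len(low) if low else lo
        new_hi = sum(high) / len(high) if high else hi
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return labels, (lo, hi)
```

For example, `two_means([1, 2, 3, 50, 52, 54])` assigns the first three values to cluster 0 and the last three to cluster 1.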
Embedding data into vector spaces is a very popular strategy of pattern recognition methods. When distances between embeddings are quantized, performance metrics become ambiguous. In this paper, we present an analysis...