Nestor is a real-time recognition and camera pose estimation system for planar shapes. The system allows shapes that carry contextual meanings for humans to be used as Augmented Reality (AR) tracking targets. The user can teach the system new shapes in real time. New shapes can be shown to the system frontally, or they can be automatically rectified according to previously learned shapes. Shapes can be automatically assigned virtual content by classification according to a shape class library. Nestor performs shape recognition by analyzing contour structures and generating projective-invariant signatures from their concavities. The concavities are further used to extract features for pose estimation and tracking. Pose refinement is carried out by minimizing the reprojection error between sample points on each image contour and its library counterpart. Sample points are matched by evolving an active contour in real time. Our experiments show that the system provides stable and accurate registration, and runs at interactive frame rates on a Nokia N95 mobile phone.
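The pose refinement step described above minimizes a reprojection error between matched contour points. A minimal sketch of that error term follows, assuming the planar mapping is represented as a 3x3 homography and that point matches are already available; the function names `project` and `reprojection_error` are hypothetical and are not taken from the Nestor system.

```python
import numpy as np

def project(H, pts):
    """Apply a 3x3 homography H to an (N, 2) array of planar points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates, (N, 3)
    mapped = homog @ np.asarray(H, dtype=float).T
    return mapped[:, :2] / mapped[:, 2:3]             # back to Cartesian coordinates

def reprojection_error(H, library_pts, image_pts):
    """Mean squared distance between projected library points and matched image points."""
    diff = project(H, library_pts) - np.asarray(image_pts, dtype=float)
    return float(np.mean(np.sum(diff ** 2, axis=1)))
```

A refinement loop would adjust the pose parameters to reduce this value; the optimizer itself is omitted here because the abstract does not specify it.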
There is now a growing need to identify various kinds of activities that occur in videos. In this paper, we first present a logical language called Probabilistic Activity Description Language (PADL) in which users can specify activities of interest. We then develop a probabilistic framework which assigns to any subvideo of a given video sequence a probability that the subvideo contains the given activity, and we finally develop two fast algorithms to detect activities within this framework. OffPad finds all minimal segments of a video that contain a given activity with a probability exceeding a given threshold. In contrast, the OnPad algorithm examines a video during playout (rather than afterwards as OffPad does) and computes the probability that a given activity is occurring (even if the activity is only partially complete). Our prototype Probabilistic Activity Detection System (PADS) implements the framework and the two algorithms, building on top of existing image processing algorithms. We have conducted detailed experiments and compared our approach to four different approaches presented in the literature. We show that, for complex activity definitions, our approach outperforms all the other approaches.
Spatial augmented reality is especially interesting for the design process of a car, because a lot of virtual content and corresponding real objects are used. One important issue in such a process is that the designer can trust the visualized colors on the real object, because design decisions are made on the basis of the projection. In this paper, we present an interactive visualization technique which is able to exactly compute the RGB values for the projected image, so that the resulting colors on the real object are perceived as the desired colors. Our approach uses a physically based computation to determine how the ambient light, the material, the pose and the color model of the projector influence the colors that result from the projected RGB values. This information allows us to compute the adjustment for the RGB values for varying projector positions at interactive rates. Since the range of projectable colors depends not only on the material and the ambient light, but also on the pose of the projector, our method can be used to interactively adjust the range of projectable colors by moving the projector to arbitrary positions around the real object. We further extend the mentioned method so that it is applicable to multiple projectors. All methods are evaluated in a number of experiments.
Introduction: The application of artificial intelligence to facial aesthetics has been limited by the inability to discern facial zones of interest, as defined by complex facial musculature and underlying structures. Although semantic segmentation models (SSMs) could potentially overcome this limitation, existing facial SSMs distinguish only three to nine facial zones of *** developed a new supervised SSM, trained on 669 high-resolution clinical-grade facial images; a subset of these images was used in an iterative process between facial aesthetics experts and manual annotators that defined and labeled 33 facial zones of *** some zones overlap, some pixels are included in multiple zones, violating the one-to-one relationship between a given pixel and a specific class (zone) required for SSMs. The full facial zone model was therefore used to create three sub-models, each with completely non-overlapping zones, generating three outputs for each input image that can be treated as standalone models. For each facial zone, the output demonstrating the best Intersection Over Union (IOU) value was selected as the winning *** new SSM demonstrates mean IOU values superior to manual annotation and landmark analyses, and it is more robust than landmark methods in handling variances in facial shape and structure.
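The per-zone selection described above relies on Intersection Over Union. A minimal sketch follows, assuming each sub-model output and the reference annotation for a zone are binary masks of equal shape; the helper names `iou` and `best_output` are hypothetical, and the abstract does not state how ties or empty masks are handled.

```python
import numpy as np

def iou(pred, truth):
    """Intersection Over Union of two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    union = np.logical_or(pred, truth).sum()
    if union == 0:
        return 0.0  # assumption: two empty masks score 0 rather than 1
    return float(np.logical_and(pred, truth).sum() / union)

def best_output(candidate_masks, truth):
    """Return (index, score) of the candidate mask with the highest IOU."""
    scores = [iou(mask, truth) for mask in candidate_masks]
    best = int(np.argmax(scores))  # first index wins on ties
    return best, scores[best]
```

Calling `best_output` once per facial zone, with the three sub-model masks for that zone as candidates, would reproduce the selection rule in the form this sketch assumes.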
Satellite imagery is changing the way we understand and predict economic activity in the world. Advancements in satellite hardware and low-cost rocket launches have enabled near-real-time, high-resolution images covering the entire Earth. It is too labour-intensive, time-consuming and expensive for human annotators to analyse petabytes of satellite imagery manually. Current computer vision research exploring this problem still lacks accuracy and prediction speed, both significantly important metrics for latency-sensitive automated industrial applications. Here we address both of these challenges by proposing a set of improvements to the object recognition model design, training and complexity regularisation, applicable to a range of neural networks. Furthermore, we propose a fully convolutional neural network (FCN) architecture optimised for accurate and accelerated object recognition in multispectral satellite imagery. We show that our FCN exceeds human-level performance with state-of-the-art 97.67% accuracy over multiple sensors, is able to generalize across dispersed scenery, and outperforms other proposed methods to date. Its computationally light architecture delivers a fivefold improvement in training time and rapid prediction, essential to real-time applications. To illustrate practical model effectiveness, we analyse it in an algorithmic trading environment. Additionally, we publish a proprietary annotated satellite imagery dataset for further development in this research field. Our findings can be readily implemented for other real-time applications too.
Chart data extraction is a crucial research field in recovering information from chart images. With the recent rise in image processing and computer vision algorithms, researchers have presented various approaches to tackle this problem. Nevertheless, most of them use different datasets, often not publicly available to the research community. Therefore, the main focus of this research was to create a chart data extraction algorithm for circular-shaped and grid-like chart types, which will accelerate research in this field and allow uniform result comparison. A large-scale dataset is provided containing 120,000 chart images organized into 20 categories, with corresponding ground truth for each image. Based on extensive research and to the best of our knowledge, no other author reports chart data extraction from sunburst diagrams, heatmaps, and waffle charts. In this research, a new, fully automatic low-level algorithm is also presented that uses a raster image as input and generates an object-oriented structure of the chart in that image. The main novelty of the proposed approach is in chart processing on binary images instead of commonly used pixel counting techniques. The experiments were performed with a synthetic dataset and with real-world chart images. The obtained results demonstrate two things: First, a low-level bottom-up approach can be shared among different chart types. Second, the proposed algorithm achieves superior results on a synthetic dataset. The achieved average data extraction accuracy on the synthetic dataset can be considered state-of-the-art within multiple error rate groups.
The task of document binarization of degraded complex documents is tremendously challenging due to the various forms of noise often present in these documents. While the current state-of-the-art deep learning approaches are capable of removing various noise types in documents with high accuracy, they employ a supervised learning scheme which requires matching clean and noisy document image pairs, which are difficult and costly to obtain for complex documents such as engineering drawings. In this paper, we propose a method for document binarization of engineering drawings using 'Multi Noise CycleGAN'. The method uses unsupervised learning with adversarial and cycle-consistency losses and is trained on unpaired noisy document images of various noise and image conditions. Experimental results for the removal of various noise types demonstrate that the method is able to reliably produce a clean image for any given noisy image and, for certain noisy images, achieves significant improvements over existing methods.
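The cycle-consistency loss mentioned above can be sketched in a few lines. This is a minimal illustration of one direction of the standard CycleGAN cycle term (L1 distance between an image and its round-trip reconstruction), not the 'Multi Noise CycleGAN' implementation; the generators are passed in as plain callables, and the names `forward` and `backward` are hypothetical.

```python
import numpy as np

def cycle_consistency_loss(x, forward, backward):
    """Mean absolute difference between x and backward(forward(x)).

    forward:  hypothetical generator mapping noisy images to clean images
    backward: hypothetical generator mapping clean images back to noisy images
    """
    x = np.asarray(x, dtype=float)
    reconstructed = backward(forward(x))
    return float(np.mean(np.abs(reconstructed - x)))
```

In a full training setup the same term would also be computed in the opposite direction (clean to noisy to clean) and combined with the adversarial losses; the weighting between them is not given in the abstract.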
In recovering information from a chart image, the first step should be chart type classification. Over time, many approaches have been used, and some of them achieve better results than others. The latest articles use a Support Vector Machine (SVM) in combination with a Convolutional Neural Network (CNN), which achieves almost perfect results with datasets of a few thousand images per class. The datasets containing chart images are primarily synthetic and lack real-world examples. To overcome the problem of small datasets, we use a Siamese CNN architecture for chart type classification; to our knowledge, this is the first report of doing so. Multiple network architectures are tested, and the results of different dataset sizes are compared. The network verification is conducted using few-shot learning (FSL). Many of the described advantages of Siamese CNNs are shown in examples. In the end, we show that the Siamese CNN can work with one image per class, and a 100% average classification accuracy is achieved with 50 images per class, whereas the CNN achieves an average classification accuracy of only 43% on the same dataset.
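Few-shot classification with a Siamese network is commonly done by comparing the embedding of a query image against the embeddings of a few labeled support images. A minimal sketch of that comparison follows, assuming the embeddings have already been produced by a trained network and that Euclidean distance is the similarity measure; both are assumptions, and the function name `classify_few_shot` is hypothetical.

```python
import numpy as np

def classify_few_shot(query_emb, support_embs, support_labels):
    """Return the label of the support embedding closest to the query embedding."""
    support = np.asarray(support_embs, dtype=float)   # shape (n_support, dim)
    query = np.asarray(query_emb, dtype=float)        # shape (dim,)
    dists = np.linalg.norm(support - query, axis=1)   # Euclidean distance per support image
    return support_labels[int(np.argmin(dists))]
```

With one support image per class this corresponds to the one-image-per-class setting mentioned above; with more support images per class, the nearest single image still decides the label in this sketch.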
ISBN: 9781728156064 (digital)
ISBN: 9781728156071 (print)
Manga, or Japanese comics, are a popular medium, and their images comprise line drawings and screentones. This study investigates the screentone synthesis task, which involves translation from line drawings to manga images. Screentones have regular patterns that are difficult to synthesize. To address this problem, we propose a method to translate line drawings into manga images by generating pixel-wise screentone class labels instead of generating manga images directly. To train a screentone label generator, we create paired data of line drawings and pixel-wise screentone class labels, which we obtain by applying screentone removal and a screentone classifier, respectively, to manga images. We train the screentone classifier using paired data of simulated manga images and pixel-wise screentone class labels. In tests, we conduct post-processing to reduce noise in the generated pixel-wise screentone labels. Experiments show that our proposed method produces reasonable screentone patterns. In comparison with results obtained using a baseline image-to-image translation method, our results are comparable or more visually appealing.