There is a growing need to identify the various kinds of activities that occur in videos. In this paper, we first present a logical language called Probabilistic Activity Description Language (PADL) in which users can specify activities of interest. We then develop a probabilistic framework that assigns to any subvideo of a given video sequence a probability that the subvideo contains the given activity. Finally, we develop two fast algorithms to detect activities within this framework. OffPad finds all minimal segments of a video that contain a given activity with a probability exceeding a given threshold. In contrast, OnPad examines a video during playout (rather than afterwards, as OffPad does) and computes the probability that a given activity is occurring, even if the activity is only partially complete. Our prototype Probabilistic Activity Detection System (PADS) implements the framework and the two algorithms, building on top of existing image processing algorithms. We have conducted detailed experiments and compared our approach to four different approaches presented in the literature. We show that, for complex activity definitions, our approach outperforms all the other approaches.
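The notion of a "minimal segment whose probability exceeds a threshold" can be illustrated with a brute-force sketch. This is not the OffPad algorithm, which the abstract does not describe. The scoring function below is a hypothetical stand-in for the paper's probabilistic framework: it assumes a toy activity made of an action A followed by an action B, with per-frame detection probabilities `p_a` and `p_b`.

```python
def segment_probability(p_a, p_b, start, end):
    """Hypothetical score: best product p_a[i] * p_b[j] with start <= i < j <= end."""
    best = 0.0
    for i in range(start, end + 1):
        for j in range(i + 1, end + 1):
            best = max(best, p_a[i] * p_b[j])
    return best


def minimal_segments(p_a, p_b, threshold):
    """Return all segments (start, end) scoring above threshold that contain
    no other such segment. Brute force, for illustration only."""
    n = len(p_a)
    hits = [
        (s, e)
        for s in range(n)
        for e in range(s, n)
        if segment_probability(p_a, p_b, s, e) > threshold
    ]
    return [
        (s, e)
        for (s, e) in hits
        if not any((s2, e2) != (s, e) and s <= s2 and e2 <= e for (s2, e2) in hits)
    ]


# Toy per-frame probabilities for five frames (made-up values).
p_a = [0.1, 0.9, 0.2, 0.8, 0.1]
p_b = [0.0, 0.1, 0.9, 0.1, 0.7]
segments = minimal_segments(p_a, p_b, threshold=0.5)
print(segments)
```

With these made-up values, frames 1 to 2 and frames 3 to 4 are the only segments that exceed the threshold without containing a shorter qualifying segment.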
Introduction: The application of artificial intelligence to facial aesthetics has been limited by the inability to discern facial zones of interest, as defined by complex facial musculature and underlying structures. Although semantic segmentation models (SSMs) could potentially overcome this limitation, existing facial SSMs distinguish only three to nine facial zones of interest. We developed a new supervised SSM, trained on 669 high-resolution clinical-grade facial images; a subset of these images was used in an iterative process between facial aesthetics experts and manual annotators that defined and labeled 33 facial zones of interest. Because some zones overlap, some pixels are included in multiple zones, violating the one-to-one relationship between a given pixel and a specific class (zone) required for SSMs. The full facial zone model was therefore used to create three sub-models, each with completely non-overlapping zones, generating three outputs for each input image that can be treated as standalone models. For each facial zone, the output demonstrating the best Intersection Over Union (IOU) value was selected as the winning *** new SSM demonstrates mean IOU values superior to manual annotation and landmark analyses, and it is more robust than landmark methods in handling variances in facial shape and structure.
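The per-zone selection step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each candidate output and the reference annotation are binary masks stored as nested lists, and the function names `iou` and `pick_best_output` are hypothetical.

```python
def iou(pred, truth):
    """Intersection over Union of two same-sized binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0


def pick_best_output(outputs, truth):
    """Return (index, score) of the candidate mask with the highest IoU."""
    scores = [iou(mask, truth) for mask in outputs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Toy 2x3 reference mask for one zone, and three candidate outputs (made-up values).
truth = [[0, 1, 1],
         [0, 1, 1]]
outputs = [
    [[1, 1, 0], [1, 1, 0]],  # candidate from sub-model 0
    [[0, 1, 1], [0, 1, 0]],  # candidate from sub-model 1
    [[0, 0, 1], [0, 0, 1]],  # candidate from sub-model 2
]
best_index, best_iou = pick_best_output(outputs, truth)
print(best_index, best_iou)
```

In this toy case the second candidate overlaps the reference on three of four pixels, so it is the one selected for the zone.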
Satellite imagery is changing the way we understand and predict economic activity in the world. Advancements in satellite hardware and low-cost rocket launches have enabled near-real-time, high-resolution images covering the entire Earth. It is too labour-intensive, time-consuming and expensive for human annotators to analyse petabytes of satellite imagery manually. Current computer vision research exploring this problem still lacks accuracy and prediction speed, both important metrics for latency-sensitive automated industrial applications. Here we address both of these challenges by proposing a set of improvements to the object recognition model design, training and complexity regularisation, applicable to a range of neural networks. Furthermore, we propose a fully convolutional neural network (FCN) architecture optimised for accurate and accelerated object recognition in multispectral satellite imagery. We show that our FCN exceeds human-level performance with state-of-the-art 97.67% accuracy over multiple sensors, generalizes across dispersed scenery, and outperforms other methods proposed to date. Its computationally light architecture delivers a fivefold improvement in training time and rapid prediction, which is essential to real-time applications. To illustrate the model's practical effectiveness, we analyse it in an algorithmic trading environment. Additionally, we publish a proprietary annotated satellite imagery dataset for further development in this research field. Our findings can be readily applied to other real-time applications as well.
Chart data extraction is a crucial research field in recovering information from chart images. With the recent rise in image processing and computer vision algorithms, researchers have presented various approaches to tackle this problem. Nevertheless, most of them use different datasets, often not publicly available to the research community. Therefore, the main focus of this research was to create a chart data extraction algorithm for circular-shaped and grid-like chart types, which will accelerate research in this field and allow uniform result comparison. A large-scale dataset is provided containing 120,000 chart images organized into 20 categories, with corresponding ground truth for each image. After extensive research, and to the best of our knowledge, no other authors report chart data extraction for sunburst diagrams, heatmaps, and waffle charts. In this research, a new, fully automatic low-level algorithm is also presented that takes a raster image as input and generates an object-oriented structure of the chart in that image. The main novelty of the proposed approach is chart processing on binary images instead of the commonly used pixel counting techniques. The experiments were performed with a synthetic dataset and with real-world chart images. The obtained results demonstrate two things. First, a low-level bottom-up approach can be shared among different chart types. Second, the proposed algorithm achieves superior results on a synthetic dataset. The achieved average data extraction accuracy on the synthetic dataset can be considered state-of-the-art within multiple error rate groups.
Binarization of degraded complex documents is highly challenging because of the various forms of noise often present in these documents. While current state-of-the-art deep learning approaches can remove various noise types from documents with high accuracy, they employ a supervised learning scheme that requires matching pairs of clean and noisy document images, which are difficult and costly to obtain for complex documents such as engineering drawings. In this paper, we propose a method for document binarization of engineering drawings using 'Multi Noise CycleGAN'. The method uses unsupervised learning with adversarial and cycle-consistency losses and is trained on unpaired noisy document images covering various noise and image conditions. Experimental results for the removal of various noise types demonstrate that the method reliably produces a clean image for any given noisy image and, for certain noisy images, achieves significant improvements over existing methods.
ISBN (digital): 9781484218662
ISBN (print): 9781484218655
Learn the concepts central to digital video using the affordable Corel Video Studio Ultimate X9 software package as well as the open source digital video editing package EditShare Lightworks 12. This compact visual guide builds on the essential concepts of digital imaging, audio, illustration, and painting, and gets more advanced as the chapters progress, covering which digital video new media formats are best for use with Android Studio, Java and JavaFX, iOS, and HTML5. Furthermore, the book covers the key factors of the data footprint optimization work process, streaming versus captive assets, and why these are important. Intended audience: website developers, musicians, digital signage and e-learning content creators, Android developers, and iOS developers.
ISBN (digital): 9781402080357
ISBN (print): 9781402080340; 9781475779301
In the early 1990s, the establishment of the Internet brought forth a revolutionary viewpoint of information storage, distribution, and processing: the World Wide Web is becoming an enormous and expanding distributed digital library. Along with the development of the Web, image indexing and retrieval have grown into research areas sharing a vision of intelligent agents. Far beyond Web searching, image indexing and retrieval can potentially be applied to many other areas, including biomedicine, space science, biometric identification, digital libraries, the military, education, commerce, culture and entertainment. This book describes several approaches to integrating machine learning and statistical modeling into an image retrieval and indexing system that demonstrates promising results. Its topics reflect the authors' experience with image indexing and retrieval based on machine learning and statistical modeling. The book also contains detailed references for further reading and research in this field.
In recovering information from a chart image, the first step should be chart type classification. Many approaches have been used over the years, and some achieve better results than others. The latest articles use a Support Vector Machine (SVM) in combination with a Convolutional Neural Network (CNN), which achieves almost perfect results on datasets of a few thousand images per class. The datasets containing chart images are primarily synthetic and lack real-world examples. To overcome the problem of small datasets, we use a Siamese CNN architecture for chart type classification; to our knowledge, this is the first report of doing so. Multiple network architectures are tested, and the results for different dataset sizes are compared. The network verification is conducted using few-shot learning (FSL). Many of the described advantages of Siamese CNNs are shown in examples. Finally, we show that the Siamese CNN can work with one image per class, and that a 100% average classification accuracy is achieved with 50 images per class, where the CNN achieves an average classification accuracy of only 43% on the same dataset.