ISBN (print): 9780819481726
Camera phones are ubiquitous, and consumers have been adopting them faster than any other technology in modern history. When connected to a network, though, they are capable of more than just picture taking: they gain access to the power of the cloud. We exploit this capability by providing a series of image-based personal advisory services. These are designed to work with any handset over any cellular carrier using commonly available Multimedia Messaging Service (MMS) and Short Message Service (SMS) features. Targeted at the unsophisticated consumer, these applications must be quick and easy to use, not requiring download capabilities or preplanning. Thus, all application processing occurs in the back-end system (i.e., as a cloud service) and not on the handset itself. Presenting an image to an advisory service in the cloud, a user receives information that can be acted upon immediately. Two of our examples involve color assessment (selecting cosmetics and home decor paint palettes); the third provides the ability to extract text from a scene. In the case of the color imaging applications, we have shown that our service rivals the advice quality of experts. The result of this capability is a new paradigm for mobile interactions: image-based information services exploiting the ubiquity of camera phones.
ISBN (print): 9781728195742
This paper investigates the application of deep convolutional neural networks with prohibitively small datasets to the problem of macular edema segmentation. In particular, we investigate several different heavily regularized architectures. We find that, contrary to popular belief, neural architectures within this application setting are able to achieve close to human-level performance on unseen test images without requiring large numbers of training examples. Annotating these 3D datasets is difficult, with multiple criteria required. It takes an experienced clinician two days to annotate a single 3D image, whereas our trained model achieves similar performance in less than a second. We found that an approach which uses targeted dataset augmentation, alongside architectural simplification with an emphasis on residual design, has acceptable generalization performance - despite relying on fewer than 15 training examples.
ISBN (print): 9781424474936
Automatically synthesizing the facial sketches of a facial image is highly challenging since facial images typically exhibit a wide range of poses, expressions and scales, and have differing degrees of illumination and/or occlusion. When the facial sketches are to be synthesized in the unique sketching style of a particular artist, the problem becomes even more complex. This study develops an automatic facial sketch synthesis system based on a novel direct combined model (DCM) algorithm with three major advantages. First, the DCM approach takes account of both the local details of each facial feature and the global geometric structure of the face, and thus the synthesized sketches more accurately mimic the caricatures drawn by the artist. Second, although the training database contains only full-frontal facial images with a neutral expression, sketches with a wide variety of facial poses, gaze directions and facial expressions can be successfully synthesized. Third, previous synthesis proposals are heavily reliant on the quality of the texture reconstruction results, which in turn are highly sensitive to occlusion and lighting effects in the input image. The DCM approach accurately produces lifelike synthesized facial sketches without the need to restore the texture information lost as a result of such unfavorable conditions.
ISBN (digital): 9783030139698
ISBN (print): 9783030139681; 9783030139711
This book reviews the state of the art in deep learning approaches to high-performance robust disease detection, robust and accurate organ segmentation in medical image computing (radiological and pathological imaging modalities), and the construction and mining of large-scale radiology databases. It particularly focuses on the application of convolutional neural networks, and on recurrent neural networks like LSTM, using numerous practical examples to complement the theory. The book’s chief features are as follows: It highlights how deep neural networks can be used to address new questions and protocols, and to tackle current challenges in medical image computing; presents a comprehensive review of the latest research and literature; and describes a range of different methods that employ deep learning for object or landmark detection tasks in 2D and 3D medical imaging. In addition, the book examines a broad selection of techniques for semantic segmentation using deep learning principles in medical imaging; introduces a novel approach to text and image deep embedding for a large-scale chest x-ray image database; and discusses how deep learning relational graphs can be used to organize a sizable collection of radiology findings from real clinical practice, allowing semantic similarity-based retrieval.
ISBN (print): 9781728156064
Manga, or Japanese comics, are a popular medium, and their images comprise line drawings and screentones. This study investigates the screentone synthesis task, which involves translating line drawings into manga images. Screentones have regular patterns that are difficult to synthesize. To address this problem, we propose a method to translate line drawings into manga images by generating pixel-wise screentone class labels instead of generating manga images directly. To train a screentone label generator, we create paired data of line drawings and pixel-wise screentone class labels, which we obtain by applying a screentone removal method and a screentone classifier, respectively, to manga images. We train the screentone classifier using paired data of simulated manga images and pixel-wise screentone class labels. In tests, we conduct post-processing to reduce noise in the generated pixel-wise screentone labels. Experiments show that our proposed method produces reasonable screentone patterns. Compared with results obtained using a baseline image-to-image translation method, our results are comparable or more visually appealing.
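To make the idea of predicting pixel-wise screentone class labels (rather than pixels) more concrete, the following is a minimal sketch of how a label map could be turned back into a manga-like image by tiling a fixed pattern per class. The abstract does not describe the rendering step, so everything here is an assumption for illustration: the `PATTERNS` table, the class ids, and the functions `render_screentones` and `overlay_lines` are hypothetical and are not the authors' implementation.

```python
# Hypothetical binary tile per screentone class (0 = black ink, 1 = white paper).
# These patterns and class ids are illustrative assumptions, not taken from the paper.
PATTERNS = {
    0: [[1]],              # class 0: no screentone (blank paper)
    1: [[0, 1], [1, 1]],   # class 1: sparse dots
    2: [[0, 1], [1, 0]],   # class 2: checkerboard
}


def render_screentones(labels, patterns=PATTERNS):
    """Fill each pixel with its class pattern, tiled over the global image grid."""
    out = []
    for y, row in enumerate(labels):
        out_row = []
        for x, cls in enumerate(row):
            pat = patterns[cls]
            # Index the tile by absolute position so the pattern stays regular
            # across the whole region, whatever the region's shape.
            out_row.append(pat[y % len(pat)][x % len(pat[0])])
        out.append(out_row)
    return out


def overlay_lines(tone, line_mask):
    """Draw line-art pixels (mask value 1) in black on top of the screentone layer."""
    return [
        [0 if m else t for t, m in zip(tone_row, mask_row)]
        for tone_row, mask_row in zip(tone, line_mask)
    ]


if __name__ == "__main__":
    labels = [[0, 0, 2, 2],
              [0, 0, 2, 2]]
    lines = [[0, 1, 0, 0],
             [0, 1, 0, 0]]
    for row in overlay_lines(render_screentones(labels), lines):
        print(row)
```

Because the pattern is looked up by absolute pixel position, a generator only has to get the class of each pixel right, and the regularity of the screentone follows from the tiling. This is one plausible reason a label-based formulation can be easier than synthesizing the regular patterns directly.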
ISBN (print): 9781538662496
Dilated convolution is used to achieve wide receptive fields in computer vision algorithms such as image segmentation and denoising. Unlike strided convolution, dilated convolution keeps the output feature map at the same resolution as the input feature map. Thus, configuring a convolutional neural network (CNN) architecture with dilated convolutional layers can increase the computational complexity. This complexity in turn introduces additional computation delay, so a proper way to reduce the computation delay of dilated convolution is strongly needed. In this paper, we propose the dilated-Winograd convolution to reduce the computational complexity of the dilated convolution. By using the Winograd transform with a dilation rate, the number of pixels in the tile is effectively reduced. The proposed acceleration methods result in average speedups of 2.043 and 1.456 for dilation rates of 2 and 4, respectively, compared to the state-of-the-art implementation.
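As a small illustration of the two properties mentioned above, the sketch below implements a 1D dilated convolution that preserves the input length, and then checks a standard identity: a convolution with dilation rate `d` equals `d` ordinary dense convolutions applied to the interleaved sub-signals `x[p::d]`. Dense convolutions are the form that fast algorithms such as Winograd operate on. This is a pure-Python sketch under stated assumptions (1D signal, odd kernel size, zero padding); the function names are hypothetical, and it is not the paper's dilated-Winograd algorithm, whose details the abstract does not give.

```python
def conv1d_valid(x, w):
    """Dense 1D correlation without padding ('valid' mode)."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def zero_pad(x, w, d):
    """Zero-pad x so a dilated correlation with odd-length w keeps len(x) outputs."""
    pad = d * (len(w) - 1) // 2
    return [0] * pad + list(x) + [0] * pad


def dilated_conv1d_same(x, w, d):
    """Direct dilated correlation: kernel taps are spaced d samples apart."""
    xp = zero_pad(x, w, d)
    return [sum(w[j] * xp[i + j * d] for j in range(len(w))) for i in range(len(x))]


def dilated_via_phases(x, w, d):
    """Same result, computed as d dense correlations on the sub-signals xp[p::d]."""
    xp = zero_pad(x, w, d)
    y = [0] * len(x)
    for p in range(d):
        # Outputs at positions p, p+d, p+2d, ... only ever read inputs at
        # positions p, p+d, p+2d, ..., so each phase is an independent dense problem.
        y[p::d] = conv1d_valid(xp[p::d], w)
    return y


if __name__ == "__main__":
    x = list(range(1, 11))
    w = [1, 2, 3]
    for d in (1, 2, 4):
        direct = dilated_conv1d_same(x, w, d)
        phased = dilated_via_phases(x, w, d)
        print(d, len(direct) == len(x), direct == phased)
```

The output length equals the input length for every dilation rate, which is the resolution-preserving behavior that distinguishes dilated from strided convolution.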
ISBN (print): 9783319208046; 9783319208039
Nowadays an increasingly wide variety of multimedia devices can be networked together in ever-growing smart environments. Although these networks, thanks to mobile technology and Wi-Fi, are almost ubiquitous by now, the devices within them still operate largely in isolation from one another. Simply playing a file that is stored on device B on playback device A is therefore a complicated task, despite the theoretical possibility provided by existing networking. In particular, playing and viewing files on multimedia devices under varying circumstances and with limited reproduction capabilities is a non-trivial problem. Current industry solutions still offer few interoperable approaches and rely on proprietary systems. Individual multimedia devices from the same manufacturer can be combined intelligently, but in terms of usability the system scales poorly: the distribution of devices (including their physical distribution) makes functions harder to access, and control is largely independent of the user's context. In this work, a solution is developed that focuses in particular on the context-based playback of files: sending video, music, image and text files to output devices with different display options, as well as distributing these multimedia files between devices. Activities are centered on a mobile device that visualizes the spatial distribution of all devices, including the user's position, and supports the intuitive movement of files of various types between them.