ISBN (print): 9781538679104
Convolutional Neural Networks (CNNs) have shown great potential in application domains including object detection, image classification, natural language processing, and speech recognition. Because neural network architectures keep growing deeper and require large-scale datasets, designing high-performance computing hardware for training CNNs is necessary. In this paper, we measure the performance of different configurations on a GPU platform and learn the patterns by training two CNN architectures, LeNet and MiniNet, both of which perform image classification. From the measurements, we identify a correlation between the L1D cache and GPU performance during training. We also show that the L2D cache only slightly influences performance. The network traffic intensity for both CNN models shows that each layer has a distinct traffic pattern.
ISBN (digital): 9781728192741
ISBN (print): 9781728192758
Shape, number, and position of teeth are the main targets of a dentist when screening X-rays for a patient's problems. Rather than relying solely on the trained eyes of dentists, computational tools have been proposed to support specialists' decisions and improve diagnoses. When applied to X-rays, these tools are grounded mainly in object segmentation and detection. In fact, the primary goal of segmenting and detecting the teeth in the images is to facilitate other automatic methods in further processing steps. Although research on tooth segmentation and detection is not recent, the application of deep learning techniques in the field is new and has not yet reached maturity. To fill some gaps in the area of dental image analysis, we present a thorough study of tooth segmentation and numbering on panoramic X-ray images by means of end-to-end deep neural networks. To that end, we analyze the performance of four network architectures, namely Mask R-CNN, PANet, HTC, and ResNeSt, on a challenging data set. These networks were chosen for their high performance on other data sets for instance segmentation and detection. To the best of our knowledge, this is the first study on instance segmentation, detection, and numbering of teeth on panoramic dental X-rays. We found that (i) it is entirely feasible to detect, segment, and number teeth with any of the analyzed architectures, (ii) performance can be significantly boosted with the proper choice of neural network architecture, and (iii) PANet had the best results in our evaluations, with an mAP of 71.3% on segmentation and 74.0% on numbering, improving on the results obtained with Mask R-CNN by 4.9 and 3.5 percentage points, respectively.
ISBN (digital): 9781728192741
ISBN (print): 9781728192758
Pose estimation is a challenging task in computer vision with many applications, for example in motion capture, medical analysis, human posture monitoring, and robotics. In other words, it is a key tool for enabling machines to understand human patterns in videos or images. Performing this task in real time while maintaining accuracy and precision is critical for many of these applications. Several papers propose real-time approaches to pose estimation based on deep neural networks. However, in most cases they either fall short in run-time performance or do not achieve the required precision. In this work, we propose a new model for real-time pose estimation that uses attention modules for convolutional neural networks (CNNs). We introduce a two-dimensional relative attention mechanism for feature extraction in pose machines, leading to improvements in accuracy. We create a single-shot architecture in which the operations that infer key points and part affinity fields share information. In addition, for each stage, we use tensor decompositions not only to reduce dimensionality but also to improve performance. This allows us to factorize each convolution and drastically reduce the number of parameters in our network. Our experiments show that, with this factorized approach, it is possible to achieve state-of-the-art run-time performance with only a small reduction in accuracy.
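To make the parameter saving from factorizing a convolution concrete, the following is a minimal sketch that counts weights before and after a low-rank split. The abstract does not say which tensor decomposition is used; the Tucker-2 style layout assumed here (a 1x1 convolution, a small k x k convolution, and another 1x1 convolution), the function names, and the channel and rank values are all illustrative assumptions, not the authors' configuration.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return c_in * c_out * k * k


def factorized_conv_params(c_in: int, c_out: int, k: int, rank: int) -> int:
    """Weights after an assumed Tucker-2 style split into three convolutions:
    1x1 (c_in -> rank), k x k (rank -> rank), 1x1 (rank -> c_out)."""
    reduce_channels = c_in * rank          # 1x1 convolution
    core = rank * rank * k * k             # small k x k convolution
    restore_channels = rank * c_out        # 1x1 convolution
    return reduce_channels + core + restore_channels


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    full = conv_params(256, 256, 3)
    low_rank = factorized_conv_params(256, 256, 3, rank=64)
    print(f"standard: {full}, factorized: {low_rank}, ratio: {full / low_rank:.2f}")
```

The saving grows as the rank shrinks relative to the channel counts, which is also where the loss in accuracy mentioned in the abstract would be expected to come from.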
The following topics are dealt with: image segmentation; learning (artificial intelligence); convolutional neural nets; medical image processing; data visualisation; image classification; feature extraction; object detection; image representation; image recognition.
ISBN (print): 9781538692646
The large variety of medical image modalities (e.g. Computed Tomography, Magnetic Resonance Imaging, and Positron Emission Tomography) acquired from the same body region of a patient, together with recent advances in computer architectures with faster and larger CPUs and GPUs, opens a new, exciting, and unexplored world for the image registration area. Precise and accurate registration of images makes it possible to understand the etiology of diseases, improve surgery planning and execution, detect otherwise unnoticed health problem signals, and map functionalities of the brain. The goal of this paper is to present a review of the state of the art in medical image registration, starting from the preprocessing steps, covering the most popular methodologies in the literature, and finishing with the more recent advances and perspectives from the application of Deep Learning architectures.
ISBN (print): 9781728132273
This paper proposes a convolutional neural network model for the detection and classification of vehicles in digital images into six categories, namely Bus, Microbus, Minivan, Sedan, SUV, and Truck. Experimental results on the BIT-Vehicle Dataset report a mean average precision (mAP) of 92.40% and an average intersection over union (IoU) of 81.26%, with an average inference latency under 70 milliseconds in an environment equipped with a graphics processing unit. We conclude that the model is discriminative and capable of generalizing the patterns of the vehicle type classification task without requiring expensive computational resources. These features suggest that the model can be useful in the development of embedded intelligent traffic systems, improving accuracy and decision latency.
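For readers unfamiliar with the IoU figure quoted above, here is a minimal sketch of how intersection over union is commonly computed for two axis-aligned bounding boxes. The `(x_min, y_min, x_max, y_max)` box layout and the function name `box_iou` are assumptions made for illustration; the abstract does not describe the authors' evaluation code.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)
    ground_truth = (1.0, 1.0, 3.0, 3.0)
    print(box_iou(predicted, ground_truth))  # overlap area 1, union area 7
```

An average IoU such as the one reported would then be the mean of this value over matched predicted and ground-truth boxes, under whatever matching rule the evaluation uses.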
ISBN (print): 9781538692646
Machine vision plays an important role in many applications, including robotics. Combined with classical instrumentation, welding robots can use a camera to perceive the scene and make decisions. A camera attached to the robot body and a machine vision system work as the eyes of the robot during the welding process. However, image-based systems are susceptible to interference from fumes, sparks, dust, and artifacts generated as a side effect of the welding process. Fumes can adhere to the lens and degrade the image, negatively affecting the processing pipeline. This paper proposes a novel image-fusion-based algorithm that minimizes the effects of fumes adhered to the camera lens. Results show that the proposed method enhances the overall image quality, outperforming classical alternatives for similar problems.
ISBN (digital): 9781728180588
ISBN (print): 9781728180595
With rapid advancements in technology, recent times have seen tremendous strides in camera and imaging sciences. Digital processing has also added exceptional detail to photography. The advancements in imaging have created a gap between capturing a moment centered around aesthetics and random captures without meaning. In this work, we aim to give users insightful feedback on the aesthetics of the images they capture. The feedback concentrates on the rule of thirds, high dynamic range, color, content, and various quality aspects of each captured image. To achieve this, we propose a hierarchical real-time image aesthetic score prediction technique. We use a modified YOLO-CNN and MobileNet for object detection and style prediction, followed by a multi-threaded aesthetic predictor that analyzes and scores the images on portable devices.
ISBN (print): 9781728147659
We present a novel strategy for image-level interaction that is applicable to a single image without any prior structural knowledge, such as object status or reconstructed 3D models. By training on sets of input image and interaction pairs using a target image, our model can generate result images by applying the desired interaction to new, unseen images. The proposed method differs from previous approaches for changing poses, which require absolute statuses for training images or an estimated 3D model with reconstruction errors. Based on a conceptual analysis of encoder-decoder networks, we propose a novel generator network architecture containing a feature converter network, which is suitable for applying interactions to images. We also implement a discriminator network for training, which is a well-known technique for generative adversarial networks. Experimental results demonstrate that the proposed method successfully generates result images with applied interactions without any prior knowledge. We expect that our method will provide insights into novel interaction schemes for augmented reality by reflecting interactions onto real scenes and providing more realistic user experiences.
ISBN (print): 9781538692646
Poor visibility is a common problem when capturing images in participating media such as mist or water. The problem of generating a haze-free image from a hazy one is known as image dehazing. Previous approaches dealt with this problem using physical models based on priors and simplifications. In this paper, we demonstrate that an end-to-end convolutional neural network is able to learn the dehazing process with no parameters or priors required, resulting in a more generic method. Even though our model is trained entirely with hazy indoor images, we are able to fully restore outdoor images with real haze. We also propose an architecture containing the novel Guided Layers, introduced to reduce the loss of spatial information while restoring the images. Our method outperforms other machine-learning-based models, yielding superior results both qualitatively and quantitatively.
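The abstract does not name the physical model used by the earlier approaches. As an assumption for illustration, the model most commonly used in the dehazing literature is the atmospheric scattering model, in which the observed hazy image $I$ is a blend of the haze-free scene radiance $J$ and a global atmospheric light $A$, weighted by a transmission map $t$ that decays with scene depth $d$ and scattering coefficient $\beta$:

```latex
\begin{align}
  I(x) &= J(x)\, t(x) + A \bigl(1 - t(x)\bigr), \\
  t(x) &= e^{-\beta\, d(x)}.
\end{align}
```

Prior-based methods typically estimate $t$ and $A$ from hand-crafted assumptions and then invert the first equation for $J$; an end-to-end network, as described above, instead learns the mapping from $I$ to $J$ directly.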