ISBN (Digital): 9798350376036
ISBN (Print): 9798350376043
Despite the success of Deep-Learning-based (DL) methods on Salient Object Detection (SOD), the need for abundant labeled data and the high complexity of the network architectures limit their applications. Feature Learning from Image Markers (FLIM) is a recent methodology to build convolutional encoders with minimal human effort in data annotation. More recently, a FLIM encoder has been combined with an adaptive decoder to build flyweight FLIM networks for SOD, requiring only user-drawn markers in discriminative regions of a few (e.g., four) images to train the entire model with no backpropagation. Furthermore, given the data scarcity in some applications, Cellular Automata (CA) may help compute better saliency maps. However, CA initialization can be a problem, since it relies on user input, priors, or randomness. Here, we propose a new strategy for CA initialization via a FLIM-based SOD network. In summary, a CA interprets the pixels of an initial saliency map as cells and applies carefully designed transition rules so that the evolution and interaction of each cell with its neighbors, guided by the original pixel properties, produce an improved saliency map. A CA requires initializing the cells' states, which is where methods diverge. By exploiting the saliency map of a FLIM network, we circumvent the CA initialization problem and improve FLIM saliency maps. Experiments on two challenging medical datasets demonstrate improvements in FLIM-based SOD, with results comparable to two state-of-the-art DL methods fine-tuned under data scarcity.
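As a rough illustration of the CA refinement step described above, the sketch below initializes the cell states directly from a saliency map (e.g., one produced by a FLIM network) and updates each cell from color-similar neighbors. The 4-neighborhood, the Gaussian impact factor, and the blending rule are simplified assumptions for illustration, not the paper's exact transition rules.

```python
import numpy as np

def ca_refine_saliency(saliency, image, iters=10, sigma_c=0.1, lam=0.8):
    """Refine an initial saliency map with a simple cellular automaton:
    each pixel (cell) is pulled toward the saliency of color-similar
    neighbors. Illustrative rule only."""
    s = saliency.astype(np.float64).copy()
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-neighborhood
    for _ in range(iters):
        neigh_sum = np.zeros_like(s)
        weight_sum = np.zeros_like(s)
        for dy, dx in offsets:
            shifted_s = np.roll(np.roll(s, dy, axis=0), dx, axis=1)
            shifted_img = np.roll(np.roll(image, dy, axis=0), dx, axis=1)
            # impact factor: color similarity between a cell and this neighbor
            dist = np.linalg.norm(image - shifted_img, axis=-1)
            w = np.exp(-dist**2 / sigma_c**2)
            neigh_sum += w * shifted_s
            weight_sum += w
        neigh_avg = neigh_sum / (weight_sum + 1e-8)
        # synchronous update: blend current state with neighborhood consensus
        s = lam * s + (1.0 - lam) * neigh_avg
    return np.clip(s, 0.0, 1.0)

# usage sketch: cell states come straight from the FLIM saliency map
# refined = ca_refine_saliency(flim_saliency, rgb_image / 255.0)
```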
ISBN (Print): 9781728192741
We present a novel approach for image deformation and video time warping. Our technique inverts the nonlinear regularized Kelvinlet equations, leading to higher-quality results and better time/space efficiency than naive solutions. Inversion is performed by a per-pixel optimization process that is inherently parallel and achieves real-time performance on Full HD videos (over 300 fps). We demonstrate our method on a variety of images and videos and discuss important technical and theoretical details.
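To make the per-pixel inversion idea concrete, the sketch below solves y = x + u(x) for each output pixel with a fixed-point iteration. The forward field here is a generic smooth radial brush standing in for the regularized Kelvinlet displacement, and all function names and parameters are hypothetical.

```python
import numpy as np

def brush_displacement(p, center, force, eps=30.0):
    """Stand-in forward displacement field (a smooth radial brush); the paper
    inverts regularized Kelvinlet displacements instead."""
    r2 = np.sum((p - center) ** 2, axis=-1, keepdims=True)
    return force * (eps**2 / (r2 + eps**2))

def invert_warp(height, width, center, force, iters=8):
    """Per-pixel fixed-point inversion of y = x + u(x): for each output pixel
    y, solve for the source position x = y - u(x). Fully data-parallel."""
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    y = np.stack([xs, ys], axis=-1).astype(np.float64)  # output pixel grid
    x = y.copy()                                         # initial guess x0 = y
    for _ in range(iters):
        x = y - brush_displacement(x, center, force)
    return x  # backward map: sample the input image at these coordinates

# usage sketch (hypothetical values):
# src = invert_warp(1080, 1920, center=np.array([960.0, 540.0]),
#                   force=np.array([40.0, 0.0]))
```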
Open-set face recognition is a scenario in which biometric systems have incomplete knowledge of all existing subjects. Under this demanding requirement, a system must dismiss irrelevant faces and focus only on subjects of interest. For this reason, this work introduces a novel method that associates an ensemble of compact neural networks with data augmentation at the feature level and an entropy-based cost function. Deep neural networks pre-trained on large face datasets serve as the preliminary feature-extraction module. The neural-adapter ensemble consists of binary models trained on original feature representations along with negative synthetic mix-up embeddings, which the designed open-set loss handles adequately since they do not belong to any known identity. We carry out experiments on the well-known LFW and IJB-C datasets, where results show that the approach boosts closed- and open-set identification accuracy.
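The sketch below illustrates the ingredients named above under simplifying assumptions: a compact binary adapter on top of frozen face embeddings, feature-level mix-up to synthesize negatives, and one plausible entropy-based treatment of those negatives. Layer sizes, the loss formulation, and all names are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryAdapter(nn.Module):
    """One compact adapter per known identity, trained on top of frozen deep
    face embeddings (illustrative layer sizes)."""
    def __init__(self, dim=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, 1))
    def forward(self, x):
        return self.net(x).squeeze(-1)  # one logit: "is this the identity?"

def mixup_negatives(feats_a, feats_b, alpha=0.5):
    """Feature-level mix-up: blend embeddings of two different identities to
    synthesize negatives that belong to no known subject."""
    lam = torch.distributions.Beta(alpha, alpha).sample((feats_a.size(0), 1))
    return lam * feats_a + (1.0 - lam) * feats_b

def open_set_loss(logits, target, is_synthetic):
    """BCE on genuine samples; synthetic mix-up negatives are pushed toward
    maximum binary entropy (p = 0.5), i.e., 'belongs to nobody'. One plausible
    entropic formulation, assumed for illustration."""
    p = torch.sigmoid(logits)
    bce = F.binary_cross_entropy_with_logits(logits[~is_synthetic], target[~is_synthetic])
    entropy = -(p * p.clamp_min(1e-8).log() + (1 - p) * (1 - p).clamp_min(1e-8).log())
    return bce - entropy[is_synthetic].mean()  # assumes a mixed batch
```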
ISBN (Print): 9781728192741
This article introduces MGLP, a multi-level automatic image segmentation method based on graphs and Label Propagation (LP), a technique originally proposed for detecting communities in complex networks. To reduce the number of graph nodes, a superpixel strategy is employed, followed by the computation of color descriptors. Segmentation is achieved by a deterministic propagation of vertex labels at each level. Several experiments with real color images from the BSDS500 dataset were performed to evaluate the method. Our method outperforms related strategies in terms of segmentation quality and processing time. Considering the Covering metric for segmentation quality, for example, MGLP outperforms LPCI-SP, its most similar counterpart, by 38.99%. In terms of processing time, MGLP is 1.07 times faster than LPCI-SP.
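A minimal sketch of the superpixel-graph pipeline is given below: SLIC superpixels, a hand-built region adjacency graph with mean-color descriptors, and a deterministic, color-weighted label propagation. The similarity kernel and iteration count are simplified assumptions; the multi-level behavior would come from merging same-label nodes and repeating on the coarser graph.

```python
import numpy as np
from skimage.segmentation import slic  # used in the usage sketch below

def build_rag(labels, image):
    """Region adjacency graph from a superpixel label map: nodes are
    superpixels (mean-color descriptors), edges link touching superpixels."""
    n = labels.max() + 1
    means = np.array([image[labels == i].mean(axis=0) for i in range(n)])
    edges = set()
    for a, b in [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]:
        diff = a != b
        for u, v in zip(a[diff], b[diff]):
            edges.add((min(u, v), max(u, v)))
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return means, adj

def propagate_labels(means, adj, iters=20, sigma=0.1):
    """Deterministic label propagation: each node starts with its own label
    and repeatedly adopts the label with the largest color-similarity vote
    among its neighbors."""
    labels = np.arange(len(adj))
    for _ in range(iters):
        new = labels.copy()
        for u, neigh in enumerate(adj):
            votes = {}
            for v in neigh:
                w = np.exp(-np.sum((means[u] - means[v]) ** 2) / sigma**2)
                votes[labels[v]] = votes.get(labels[v], 0.0) + w
            if votes:
                new[u] = max(votes, key=votes.get)
        labels = new
    return labels

# usage sketch: sp = slic(img, n_segments=400, start_label=0)
# means, adj = build_rag(sp, img.astype(float) / 255.0)
# node_labels = propagate_labels(means, adj)
```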
ISBN (Print): 9781728192741
We present a novel way of using one's eye to capture an image of what it "sees" through the use of steady-state visually evoked potentials (SSVEP). Existing methods leveraging response patterns for SSVEP visual image reconstruction yield lossy reconstructions and require a lengthy scanning process. With our signal acquisition procedure, data collection requirements are significantly decreased while signal clarity is still improved. The data for image reconstruction were collected from an electrode at the Oz position using a low-cost, wearable electroencephalography (EEG) device. For image reconstruction, software-defined lock-in amplifier (LIA) and discrete Fourier transform (DFT) signal processing methods are analyzed and compared.
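The two signal processing paths compared above can be sketched as follows, assuming a single-channel recording and a known stimulus frequency: a software lock-in amplifier that demodulates with sine/cosine references, and a DFT readout at the nearest frequency bin. Sampling parameters and normalization are illustrative.

```python
import numpy as np

def lockin_amplitude(signal, fs, f_stim):
    """Software lock-in amplifier: demodulate the Oz-channel EEG with sine and
    cosine references at the stimulus frequency and average (acting as a
    low-pass), yielding the SSVEP response amplitude at that frequency."""
    t = np.arange(len(signal)) / fs
    i = np.mean(signal * np.cos(2 * np.pi * f_stim * t))
    q = np.mean(signal * np.sin(2 * np.pi * f_stim * t))
    return 2.0 * np.hypot(i, q)

def dft_amplitude(signal, fs, f_stim):
    """DFT alternative: read the magnitude of the bin closest to the stimulus
    frequency from the discrete Fourier transform of the same window."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    k = np.argmin(np.abs(freqs - f_stim))
    return 2.0 * np.abs(spectrum[k]) / len(signal)

# usage sketch: comparing amplitudes across the flicker frequencies assigned to
# stimulus regions is one way to recover what the eye responded to (illustrative).
```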
ISBN (Digital): 9798350354119
ISBN (Print): 9798350354126
Numerous investigations have delved into image segmentation methodologies, encompassing techniques such as thresholding, convolution, deep neural networks, and, most recently, vision transformers. However, many of these studies overlook the need to optimize image segmentation for various processing cores or specialized hardware such as ASICs or FPGAs. This paper presents our implementation of both a UNET-based neural network model and a vision-transformer model (UNETR) for image segmentation tasks on the LaPa dataset. We meticulously compare their training and prediction accuracies. Additionally, we introduce mixed-precision architectures for both models, aiming to enhance real-time image segmentation performance on both CPU and GPU platforms. Furthermore, we propose a novel mixed-precision FPGA architecture developed in the Vivado software, specifically tailored to optimize inference and streaming on FPGA devices. The hallmark contribution of our research lies in the development of an FPGA accelerator designed for the UNETR model. Our findings reveal substantial enhancements in the training speed of our model on GPU platforms with the application of mixed precision, as well as notable improvements in latency during real-time inference on FPGA. Notably, we achieved an F1-score and training accuracy of approximately 90% over 60 training epochs.
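A minimal mixed-precision training step in PyTorch is sketched below; it shows the general AMP pattern (autocast plus gradient scaling) rather than the paper's exact UNETR/FPGA setup, and the model, data loader, and loss are placeholders.

```python
import torch

def train_epoch(model, loader, optimizer, device="cuda"):
    """Mixed-precision training (PyTorch AMP): run the forward/backward pass
    in float16 where safe, with a gradient scaler to avoid underflow. The same
    reduced-precision idea carries over to CPU/GPU/FPGA inference pipelines."""
    scaler = torch.cuda.amp.GradScaler()   # in practice, keep one scaler across epochs
    criterion = torch.nn.CrossEntropyLoss()
    model.train()
    for images, masks in loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast():
            logits = model(images)          # e.g., a UNET/UNETR segmentation head
            loss = criterion(logits, masks)
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
```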
ISBN (Print): 9781665407618
The oceans occupy a considerable part of our planet and are underexplored relative to their significance. The bottom of the sea has been reached, yet extracting information from it still involves several difficulties. In the context of robotics and computer vision, underwater images are particularly challenging due to the vast range of existing aquatic environments, which present diverse characteristics. A robot working underwater needs to understand the environment to which it is exposed. Thus, the proposed work aims to strengthen the capabilities of its vision system by addressing the problem of classifying underwater images according to their water type. The proposed approach takes the color channels into account and builds on a classification tool widely accepted in the scientific community. The method is developed by recognizing patterns observed in several underwater images of varying depths and types. We achieve good results when compared to state-of-the-art methods, opening several opportunities for underwater image processing.
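As a hedged illustration of a color-channel-based water-type classifier, the sketch below extracts simple per-channel statistics and histograms and feeds them to an SVM, assumed here to be the "widely accepted" classifier; the paper's actual descriptor and classifier may differ.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def color_channel_features(image):
    """Per-channel descriptor: mean, std, and a coarse histogram of each RGB
    channel (illustrative feature choice)."""
    feats = []
    for c in range(3):
        chan = image[..., c].astype(np.float64) / 255.0
        hist, _ = np.histogram(chan, bins=8, range=(0.0, 1.0), density=True)
        feats.extend([chan.mean(), chan.std(), *hist])
    return np.array(feats)

def train_water_type_classifier(X, y):
    """Fit an RBF-kernel SVM on the color features; X is (n_images, n_feats),
    y holds water-type labels. Illustrative classifier choice."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return clf.fit(X, y)
```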
Amazon rainforest deforestation severely impacts the environment in many ways, including biodiversity reduction and climate change. A key indicator of deforestation is the sudden appearance of rural/unofficial roads, usually exploited to transport raw materials extracted from the forest. To detect such roads early and prevent deforestation, remote sensing images have been widely employed. Specifically, some researchers have focused on tackling this task using low-resolution imagery, mainly due to its public availability and long time series. However, performing road extraction on low-resolution images poses several challenges, most of which are not addressed by existing works, including high inter-class similarity and complex road structure. Motivated by this, in this paper, we propose a novel approach to perform road extraction on low-resolution satellite images based on contextual and pixel-level decision fusion. We conducted a systematic evaluation of the proposed method using a new dataset introduced in this work. The experiments show that the proposed method outperforms state-of-the-art algorithms in terms of intersection over union and F1-score.
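The decision-fusion idea can be sketched as below, assuming a pixel-level road probability map and a coarser contextual score upsampled to the same grid; the weighted-average rule and all variable names are illustrative, not the paper's exact fusion strategy.

```python
import numpy as np

def fuse_decisions(pixel_prob, context_prob, w=0.5, threshold=0.5):
    """Decision-level fusion sketch: combine a pixel-wise road probability map
    with a contextual (patch/scene-level) probability already resampled to the
    same grid, then threshold to obtain the road mask."""
    assert pixel_prob.shape == context_prob.shape
    fused = w * pixel_prob + (1.0 - w) * context_prob
    return (fused >= threshold).astype(np.uint8)

# usage sketch (hypothetical arrays): upsample 16x16 patch scores to pixel grid
# road_mask = fuse_decisions(pixel_model_output,
#                            np.kron(patch_scores, np.ones((16, 16))))
```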
With the rapid advancement of technology, the design of virtual humans has led to very realistic user experiences, such as in movies, video games, and simulations. As a result, virtual humans are becoming increasingly similar to real humans. However, according to the Uncanny Valley (UV) theory, users tend to feel discomfort when watching entities with anthropomorphic traits that differ from real humans. This phenomenon is related to social identity theory, where the observer looks for something familiar. In computer graphics (CG), techniques used to create virtual humans with dark skin tones often rely on approaches initially developed for rendering characters with white skin tones. Furthermore, most CG characters portrayed in various media, including movies and games, predominantly exhibit white skin tones. Consequently, it is pertinent to explore people's perceptions of different groups of virtual humans. Thus, this paper aims to examine and evaluate human perception of CG characters from different media, comparing two types of skin colors. The findings indicate that individuals felt more comfortable and perceived less realism when watching characters with dark-colored skin than with white-colored skin. Our central hypothesis is that dark-colored characters, rendered with classically developed algorithms, are perceived as more cartoonish than realistic and are therefore placed to the left of the valley in the UV chart.
ISBN (Print): 9781665497961
This article uses Graph Neural Network (GNN) models on histology images to classify tissue phenotypes. The majority of tissue phenotyping approaches are confined to tumor and stroma classification and necessitate a significant number of histology images. In this study, a Graph Convolutional Network (GCN) is applied to the CRC Tissue Phenotyping dataset, which consists of seven tissue phenotypes, namely Benign, Complex Stroma, Debris, Inflammatory, Muscle, Stroma, and Tumor. First, the input images are converted into superpixels using the SLIC algorithm and then into region adjacency graphs (RAGs), where each superpixel is a node and edges connect neighboring superpixels. Finally, graph classification is performed on the resulting graph dataset using the GCN.
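A minimal sketch of the graph classification stage is shown below using torch_geometric, assuming superpixel node features and RAG edges have already been extracted; the layer sizes and pooling choice are illustrative.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class TissueGCN(torch.nn.Module):
    """Graph classifier sketch: two GCN layers over superpixel nodes (e.g.,
    mean-color features of SLIC regions), global mean pooling, and a 7-way
    head for the CRC tissue phenotypes. Layer sizes are illustrative."""
    def __init__(self, in_dim=3, hidden=64, n_classes=7):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.head = torch.nn.Linear(hidden, n_classes)

    def forward(self, x, edge_index, batch):
        # x: [num_superpixels, in_dim]; edge_index: RAG edges; batch: graph ids
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        return self.head(global_mean_pool(h, batch))

# usage sketch: each histology image -> SLIC superpixels -> region adjacency
# graph -> torch_geometric.data.Data(x=node_feats, edge_index=edges, y=label)
```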