Convolutional neural networks (CNNs) are widely used in machine learning (ML) applications such as image processing. CNNs require heavy computation to achieve high accuracy on many ML tasks. Therefore, implementing CNNs efficiently, improving performance with limited resources and without reducing accuracy, is a challenge for ML systems. One architecture for the efficient execution of CNNs is the array-based accelerator, which consists of an array of similar processing elements (PEs). Array accelerators are popular as high-performance architectures because they exploit parallel computing and data reuse. These accelerators are optimized for a set of CNN layers, not for individual layers. Using the same accelerator dimensions to compute all CNN layers, whose shapes and sizes vary, leads to resource underutilization. We propose a flexible and scalable architecture for array-based accelerators that increases resource utilization by resizing PEs to better match the different shapes of CNN layers. The low-cost partial reconfiguration improves resource utilization and performance, resulting in a 23.2% reduction in the computation time of GoogLeNet compared to state-of-the-art accelerators. The proposed architecture decreases the on-chip memory access rate by 26.5% with no accuracy loss.
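The underutilization argument in the abstract above can be illustrated with a little arithmetic. The sketch below is not the authors' architecture or cost model. It assumes a simplified tiling model in which a layer's workload is a rectangle of work items mapped onto a rectangular PE array in whole passes; the layer shapes, the candidate array shapes, and the function names `utilization` and `best_shape` are all hypothetical.

```python
from math import ceil

def utilization(layer_rows, layer_cols, array_rows, array_cols):
    """Fraction of PE slots doing useful work when a layer_rows x layer_cols
    workload is tiled onto an array_rows x array_cols PE array.
    Assumption: every pass occupies the whole array, so partially filled
    passes waste the PEs that receive no work."""
    passes = ceil(layer_rows / array_rows) * ceil(layer_cols / array_cols)
    return (layer_rows * layer_cols) / (passes * array_rows * array_cols)

def best_shape(layer_rows, layer_cols, shapes):
    """Pick the candidate array shape with the highest utilization
    (the first one wins on ties)."""
    return max(shapes, key=lambda s: utilization(layer_rows, layer_cols, *s))

# Hypothetical layer workloads (rows x cols of work items), not taken from the paper.
LAYERS = {"conv_a": (3, 56), "conv_b": (5, 28), "conv_c": (7, 14)}

# One fixed shape versus several shapes using the same 168 PEs.
FIXED = (12, 14)
SHAPES = [(12, 14), (6, 28), (3, 56), (7, 24)]

if __name__ == "__main__":
    for name, (rows, cols) in LAYERS.items():
        fixed_u = utilization(rows, cols, *FIXED)
        shape = best_shape(rows, cols, SHAPES)
        flex_u = utilization(rows, cols, *shape)
        print(f"{name}: fixed {FIXED} -> {fixed_u:.2f}, best {shape} -> {flex_u:.2f}")
```

Under this toy model, a 3 x 56 workload uses only a quarter of a fixed 12 x 14 array but fills a 3 x 56 arrangement of the same 168 PEs completely. This is the kind of mismatch that a reconfigurable array aims to reduce.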
Image categorization is a fundamental task in computer vision, with applications in domains such as object recognition, medical imaging, and autonomous systems. Traditional approaches frequently fail to balance accura...
Assessing the quality of pansharpened images is a critical issue in order to obtain a quantitative score to represent the quality and compare the performance of different fusion methods. Most of the introduced metrics...
Image-to-image translation is the process of transforming an image from one domain to another, where the goal is to learn the mapping between an input image and an output image. This task has been generally performed ...
In robot application systems incorporating a dexterous hand, a vision-based robot grasping system is proposed to address the dexterous hand's lack of robustness in grasping fixed-attitude objects. First, a 6DOF robo...
In this digital era, social media is one of the key platforms for collecting customer feedback and reflecting customers' views on various aspects, including products, services, brands, events, and other topics of interest. However, sarcastic memes are on the rise on social media; they often convey a meaning contrary to the implied sentiment and challenge traditional machine learning identification techniques. Memes, which blend text and visuals, are difficult to interpret from the captions or images alone, as their humor often relies on subtle contextual cues that require a nuanced understanding for accurate interpretation. Our study introduces Offensive Images and Sarcastic Memes Detection to address this problem. Our model employs various techniques to identify sarcastic memes and offensive images. For sarcastic meme detection, the model uses Optical Character Recognition (OCR) and bidirectional long short-term memory (Bi-LSTM). For offensive image detection, the model employs an Autoencoder LSTM, deep learning models such as DenseNet and MobileNet, and computer vision techniques such as a Feature Fusion Process (FFP) based on Transfer Learning (TL) with image augmentation. The study showcases the effectiveness of the proposed methods in achieving high accuracy in detecting offensive content across different modalities, such as text, memes, and images. In tests conducted on real-world datasets, our model achieved an accuracy of 92% on the Hateful Memes Challenge dataset. The proposed methodology also achieved a Testing Accuracy (TA) of 95.7% for DenseNet with transfer learning on the NPDI dataset and 95.12% on the Pornography dataset. Moreover, implementing Transfer Learning with a Feature Fusion Process (FFP) resulted in a TA of 99.45% on the NPDI dataset and 98.5% on the Pornography dataset.
The problem of producing a natural language description of an image for describing the visual content has gained more attention in natural language processing (NLP) and computer vision (CV). It can be driven by applications like image retrieval or indexing, virtual assistants, image understanding, and support of visually impaired people (VIP). Though the VIP uses other senses, touch and hearing, for recognizing objects and events, the quality of life of those persons is lower than the standard *** image captioning generates captions that will be read aloud to the VIP, thereby realizing matters happening around *** article introduces a Red Deer Optimization with Artificial Intelligence Enabled Image Captioning System (RDOAI-ICS) for Visually Impaired *** presented RDOAI-ICS technique aids in generating image captions for *** presented RDOAI-ICS technique utilizes a neural architectural search network (NASNet) model to produce image ***, the RDOAI-ICS technique uses the radial basis function neural network (RBFNN) method to generate a textual *** enhance the performance of the RDOAI-ICS method, the parameter optimization process takes place using the RDO algorithm for NASNet and the butterfly optimization algorithm (BOA) for the RBFNN model, showing the novelty of the *** experimental evaluation of the RDOAI-ICS method can be tested using a benchmark *** outcomes show the enhancements of the RDOAI-ICS method over other recent image captioning approaches.
ISBN: (print) 9783031133244; 9783031133237
This review article on Few-Shot Learning (FSL) techniques focuses on computer vision applications based on deep convolutional neural networks. A general discussion of Few-Shot Learning is given, featuring a context-constrained description, a short list of applications, a description of a couple of commonly used techniques, and a discussion of the most used benchmarks for FSL computer vision applications. In addition, the paper features a few examples of recent publications in which FSL techniques are used for training models in the context of Human Behaviour Analysis and Smart City Environment Safety. These examples give some insight into the performance of state-of-the-art FSL algorithms, what metrics they achieve, and how many samples are needed to accomplish that.
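The abstract does not say how its benchmarks are set up. FSL benchmarks are commonly described in terms of N-way K-shot episodes, so the sketch below shows how such an episode could be sampled. The function name `sample_episode`, the toy dataset, and all parameter values are hypothetical and are not taken from the review.

```python
import random

def sample_episode(labels_to_items, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode.

    Returns (support, query) as lists of (item, label) pairs: k_shot support
    items and n_query query items for each of n_way randomly chosen classes.
    Support and query items never overlap."""
    classes = rng.sample(sorted(labels_to_items), n_way)
    support, query = [], []
    for label in classes:
        picked = rng.sample(labels_to_items[label], k_shot + n_query)
        support += [(item, label) for item in picked[:k_shot]]
        query += [(item, label) for item in picked[k_shot:]]
    return support, query

# Toy stand-in for an image dataset: 10 classes with 20 item ids each.
DATA = {f"class_{c}": [f"img_{c}_{i}" for i in range(20)] for c in range(10)}

# A 5-way 1-shot episode with 3 query items per class.
support, query = sample_episode(DATA, n_way=5, k_shot=1, n_query=3,
                                rng=random.Random(0))

if __name__ == "__main__":
    print(len(support), "support items,", len(query), "query items")
```

In this framing, the question of how many samples are needed corresponds to the value of `k_shot`, the number of labelled examples the model sees per class.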
Artificial intelligence, machine learning, and deep learning are now used extensively, and researchers are considering applying them almost everywhere. At the same time, the second wave of coronavirus has wreaked havoc in India, with more than 4 lakh (400,000) cases reported every 24 h. Meanwhile, a new deadly fungal infection emerged, which doctors have named Mucormycosis (black fungus). The fungus spread rapidly in many states, leading those states to declare the disease an epidemic. It has become very important to find a cure for this life-threatening fungus with the help of today's devices and technologies, such as artificial intelligence and data learning. It was found that the CT scan carries more adequate information and delivers greater evaluation validity than the chest X-ray. The image processing steps, such as pre-processing and segmentation, were then surveyed; the accuracy score for deep features retrieved from the ResNet50 model with an SVM classifier using the linear kernel function was 94.7%, the highest of all the findings. The Deep Belief Network (DBN) was also studied to see how easily it can help diagnose a life-threatening infection such as this fungus. Finally, a survey explains how computer vision helped in the coronavirus era and how, in the same way, it could help in epidemics such as Mucormycosis.
Advances in multimodal machine learning help artificial intelligence more closely resemble human intellect, which perceives the world through multiple modalities. We surveyed state-of-the-art research on bidirectional machine learning translation between the image and natural language processing (NLP) modalities, which address a considerable proportion of human life. Recently, with advances in deep learning model architectures and learning methods in the image and NLP fields, considerable progress has been made in multimodal machine learning translation built by integrating image and NLP. Our goal is to explore and summarize state-of-the-art research on multimodal machine learning translation and to present a taxonomy for the multimodal bidirectional machine learning translation of image and NLP. Furthermore, we review the evaluation metrics and compare the state-of-the-art approaches that influence this field. We believe that this survey will become a cornerstone of future research by discussing the challenges in multimodal machine learning translation and the direction of future research, based on an understanding of state-of-the-art research in the field.