This study examines the necessity of employing BERT2GPT for single-document summarization in the current age of escalating digital data. The primary focus of this work is on the abstractive technique, which tries to g...
The ability to differentiate products in a retail store is essential for serving customers efficiently and for reducing, or even eliminating, long queues. However, traditional machine learning algorithm...
The purpose of this study is to identify and analyze what previous studies have found about the problems faced by SMEs today, and how Industry 4.0 technology helps address problems in Smal...
In real life, many activities must be performed in a fixed order, such as the assembly steps in a manufacturing production process. This series of activities cannot be re...
ISBN (digital): 9798331506490
ISBN (print): 9798331506506
This study investigates the performance of Vision Transformer (ViT) variants—the Shifted Window Transformers (SWIN), Distillation with No Labels (DINO), and Data-efficient Image Transformers (DeIT)—in image captioning tasks using the Flickr8K dataset. While ViT architectures have shown promise in image classification, their effectiveness for image captioning, particularly with smaller datasets, remains unclear. The models' performance was evaluated using BLEU metrics, while training efficiency was analyzed through Pareto front analysis of computational time and accuracy. Among the tested variants, SWIN Transformers demonstrated superior performance (BLEU-1: 64.4, BLEU-2: 33.9, BLEU-3: 17.1, BLEU-4: 8.4), followed by DINO (BLEU-1: 63.1, BLEU-2: 32.7, BLEU-3: 16.4, BLEU-4: 7.5), while DeIT showed the weakest performance (BLEU-1: 61.6, BLEU-2: 31.1, BLEU-3: 14.7, BLEU-4: 6.5). SWIN Transformers achieved the shortest training time at 3 minutes 31 seconds per epoch, making it the most efficient model among ViT variants based on Pareto front analysis. While ViT variants achieved competitive BLEU-1 scores comparable to previous top models, they struggled with generating coherent, longer sentences, as evidenced by suboptimal BLEU-4 scores. These findings provide empirical evidence of how the lack of inductive bias in transformer architectures affects their ability to capture complex scene relationships, despite their strong feature detection capabilities, contributing to the understanding of transformer models' limitations in vision-language tasks, especially with limited data.
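The abstract above reports BLEU-1 through BLEU-4 scores for each ViT variant. As a rough illustration of how such scores are computed, here is a minimal sentence-level BLEU sketch (modified n-gram precision with a brevity penalty); the tokenized sentences are invented for the example, and the paper's actual evaluation pipeline is not specified here, so a standard implementation (e.g. NLTK's `sentence_bleu`) would normally be used instead.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU-N: clipped n-gram precision + brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, cnt in Counter(ngrams(ref, n)).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
        precisions.append(clipped / max(sum(cand_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty against the closest reference length.
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / max(len(candidate), 1))
    return bp * math.exp(log_avg)

# Hypothetical caption pair, purely for illustration.
hyp = "a dog runs on the grass".split()
ref = "a dog is running on the grass".split()
print(round(bleu(hyp, [ref]) * 100, 1))
```

The gap the paper observes between BLEU-1 and BLEU-4 follows directly from this definition: BLEU-4 requires matching 4-grams, so fluent long-range phrasing, not just correct individual words, is needed for a high score.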
Students' attendance in class is an important success parameter in face-to-face learning. Conventional attendance systems, such as paper-based attendance sheets or identity card systems, require a long ...
In hybrid cloud-edge systems, data processing is distributed between cloud servers and edge servers, which makes data serialization critical for efficient data transfer between them. This study evaluates the efficie...
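The abstract is truncated before it names the serialization formats under study, so as a generic illustration of the kind of measurement such a comparison involves, here is a sketch benchmarking two Python standard-library formats (JSON and pickle) on an invented sensor payload. The payload shape and field names are assumptions, not taken from the paper.

```python
import json
import pickle
import time

# Hypothetical edge-device payload, for illustration only.
payload = {"sensor_id": "edge-01", "readings": list(range(1000)), "ts": 1700000000}

def measure(name, dumps, loads):
    """Return (format, serialized size in bytes, serialize time, deserialize time)."""
    t0 = time.perf_counter()
    blob = dumps(payload)
    t1 = time.perf_counter()
    loads(blob)
    t2 = time.perf_counter()
    return name, len(blob), t1 - t0, t2 - t1

rows = [
    measure("json", lambda o: json.dumps(o).encode("utf-8"), lambda b: json.loads(b)),
    measure("pickle", pickle.dumps, pickle.loads),
]
for name, size, ser, de in rows:
    print(f"{name:6s} size={size}B ser={ser * 1e6:.0f}us de={de * 1e6:.0f}us")
```

A real cloud-edge evaluation would also weigh cross-language interoperability: pickle is Python-only, while JSON (and binary formats such as Protocol Buffers) can be read by heterogeneous cloud and edge services.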
Several studies suggest that sleep quality is associated with physical activity. Moreover, deep sleep time can be used to determine the sleep quality of an individual. In this work, we aim to find the association be...
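Quantifying an association like the one this abstract describes typically starts with a correlation coefficient. As a minimal sketch, the Pearson correlation between daily activity and deep sleep time can be computed with the standard library; the step counts and deep-sleep minutes below are invented, not data from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical daily records: step count vs. minutes of deep sleep.
steps = [3000, 5000, 7000, 9000, 11000]
deep_sleep_min = [55, 60, 72, 80, 88]
print(round(pearson_r(steps, deep_sleep_min), 3))  # → 0.994
```

A coefficient near +1 would suggest more active days tend to coincide with longer deep sleep, though correlation alone does not establish causation.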
Workplace fatigue is common among employees, and continuous exposure to fatigue can reduce a company's productivity. Current research on fatigue detection has mostly focused on detecting fatigues...
Social media has evolved into a vast and multifaceted data repository, presenting valuable opportunities to investigate gender-based variations in writing styles. This study aims to enhance the precision of gender cla...
详细信息