Memes have become a fundamental part of online communication and humour, reflecting and shaping the culture of today’s digital age. The amplified meme culture is inadvertently endorsing and propagating casual misogyny. This study proposes V-LTCS (Vision-Language Transformer Combination Search), a framework that encompasses all possible combinations of the most fitting text (i.e., BERT, ALBERT, and XLM-R) and vision (i.e., Swin, ConvNeXt, and ViT) transformer models to determine the backbone architecture for identifying memes that contain misogynistic content. All feasible vision-language transformer model combinations obtained from the recognized optimal text and vision transformer models are evaluated on two datasets (one smaller, one larger) using several standard metrics (viz., Accuracy, Precision, Recall, and F1-Score). The BERT-ViT combination demonstrated its efficiency on both datasets, validating its ability to serve as a backbone architecture for subsequent efforts to recognize multimodal misogynous memes.
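The combination search described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the model names come from the abstract, while the function names (`candidate_pairs`, `binary_metrics`, `select_backbone`) and the use of F1-Score as the selection criterion are assumptions made for the example. Model training and inference are omitted.

```python
from itertools import product

# Model families named in the abstract.
TEXT_MODELS = ["BERT", "ALBERT", "XLM-R"]
VISION_MODELS = ["Swin", "ConvNeXt", "ViT"]


def candidate_pairs():
    """Return every (text, vision) backbone pairing as a list of tuples."""
    return list(product(TEXT_MODELS, VISION_MODELS))


def binary_metrics(y_true, y_pred):
    """Compute Accuracy, Precision, Recall, and F1-Score.

    Labels are assumed to be 1 (misogynous) and 0 (not misogynous).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def select_backbone(scores):
    """Pick the pairing with the highest F1-Score.

    `scores` maps a (text, vision) tuple to a metrics dict.
    Ranking by F1 is an assumption of this sketch; the abstract
    does not state which metric decides the final choice.
    """
    return max(scores, key=lambda pair: scores[pair]["f1"])
```

With three text and three vision models, `candidate_pairs()` yields nine pairings. Each would be fine-tuned and scored on both datasets before `select_backbone` is applied to the collected metrics.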