Existing work has shown that fine-tuned textual transformer models achieve state-of-the-art prediction performance but are also vulnerable to adversarial text perturbations. Traditional adversarial evaluation is of...
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements. (1) Strong vision encoder: we explored a continuous learning strategy for the large-scale vision foundation model InternViT-6B, boosting its visual understanding capabilities and allowing it to be transferred and reused across different LLMs. (2) Dynamic high-resolution: we divide images into 1 to 40 tiles of 448×448 pixels according to the aspect ratio and resolution of the input images, which supports up to 4K resolution input. (3) High-quality bilingual dataset: we carefully collected a high-quality bilingual dataset covering common scenes and document images, and annotated it with English and Chinese question-answer pairs, significantly enhancing performance in optical character recognition (OCR) and Chinese-related tasks. We evaluate InternVL 1.5 through a series of benchmarks and comparative studies. Compared to both open-source and proprietary commercial models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 multimodal benchmarks. Code and models are available at https://***/OpenGVLab/InternVL.
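The dynamic high-resolution step in the abstract above can be illustrated with a minimal Python sketch. The abstract states only the tile size (448×448), the tile-count range (1 to 40), and that the choice depends on aspect ratio and resolution; the selection rule below (closest aspect ratio, ties broken by closest pixel area) and the names `choose_grid` and `tile_boxes` are assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

TILE = 448  # tile side in pixels, as stated in the abstract


def choose_grid(width: int, height: int, max_tiles: int = 40) -> Tuple[int, int]:
    """Pick a (cols, rows) tile grid with 1 <= cols * rows <= max_tiles.

    Assumed rule (not from the paper): minimise the aspect-ratio mismatch,
    then break ties by how closely the grid's pixel area matches the image.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target_ratio = width / height
    target_area = width * height
    best, best_key = (1, 1), None
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles // cols + 1):
            ratio_err = abs(cols / rows - target_ratio)
            area_err = abs(cols * rows * TILE * TILE - target_area)
            key = (ratio_err, area_err)
            if best_key is None or key < best_key:
                best, best_key = (cols, rows), key
    return best


def tile_boxes(cols: int, rows: int) -> List[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom) crop boxes, row by row, for an image
    that has already been resized to (cols * TILE, rows * TILE)."""
    return [
        (c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    cols, rows = choose_grid(3840, 2160)  # a 4K frame
    print(cols, rows, len(tile_boxes(cols, rows)))
```

Under this assumed rule, an image would be resized to the chosen grid size and then cropped with the returned boxes, so every tile reaches the vision encoder at the fixed 448×448 input size.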
Facial Expression Recognition (FER) aims to detect emotional states from facial images. It is playing an increasingly important role in several application areas, including human–computer interaction (HCI), video tr...
The automated classification of immune cells plays a vital role in advancing immunological research, diagnostics, and therapeutic monitoring. This paper leverages machine learning and image processing techniques to ac...
Early detection of Alzheimer's disease (AD) is crucial for timely intervention and slowing its progression. This research leverages neuroimaging-based machine learning to classify cognitive impairment levels using...
Diabetic Retinopathy (DR) is one of the most serious diseases causing irreversible blindness. Early detection of DR can therefore help to preserve vision. The study proposes a hybrid model to clas...
This study explores the transformative potential of image classification algorithms such as VGG16, ResNet, and DenseNet for the early detection of pancreatic tumors using medical imaging. One of the main causes of cance...
This paper considers the security of non-minimum phase systems, a typical kind of cyber-physical systems. Non-minimum phase systems are characterized by unstable zeros in their transfer functions, making them particul...
Accidents on the road continue to pose a significant threat to life and safety, necessitating innovative solutions to improve emergency response and minimize injuries. The proposed approach introduces an IoT-based Acc...
Disaster-resilient dams require accurate crack detection, but machine learning methods cannot capture dam structural reaction temporal patterns and *** research uses deep learning, convolutional neural networks, and transfer learning to improve dam crack *** deep-learning models are trained on 192 crack *** research aims to provide up-to-date detection techniques to solve dam crack *** finding shows that the EfficientNetB0 model performed better than others in classifying borehole concrete crack surface tiles and normal (undamaged) surface tiles with 91% *** study's pre-trained designs help to identify and determine the specific locations of cracks.