People who have trouble communicating verbally are often dependent on sign language, which can be difficult for most people to understand, making interaction with them a difficult *** Sign Language Recognition (SLR) system takes an input expression from a hearing or speaking-impaired person and outputs it in the form of text or voice to a normal *** existing study related to the Sign Language Recognition system has some drawbacks, such as a lack of large datasets and datasets with a range of backgrounds, skin tones, and *** research efficiently focuses on Sign Language Recognition to overcome previous *** importantly, we use our proposed Convolutional Neural Network (CNN) model, “ConvNeural”, in order to train our ***, we develop our own datasets, “BdSL_OPSA22_STATIC1” and “BdSL_OPSA22_STATIC2”, both of which have ambiguous backgrounds. “BdSL_OPSA22_STATIC1” and “BdSL_OPSA22_STATIC2” both include images of Bangla characters and numerals, a total of 24,615 and 8,437 images, *** “ConvNeural” model outperforms the pre-trained models with accuracy of 98.38% for “BdSL_OPSA22_STATIC1” and 92.78% for “BdSL_OPSA22_STATIC2”. For the “BdSL_OPSA22_STATIC1” dataset, we get precision, recall, F1-score, sensitivity and specificity of 96%, 95%, 95%, 99.31%, and 95.78% ***, in the case of the “BdSL_OPSA22_STATIC2” dataset, we achieve precision, recall, F1-score, sensitivity and specificity of 90%, 88%, 88%, 100%, and 100% respectively.
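For readers unfamiliar with the metrics listed in the abstract above, the following is a minimal sketch of how precision, recall (sensitivity), specificity, and F1-score can be derived per class from a multi-class confusion matrix and then macro-averaged. The abstract does not say how the reported figures were averaged, so the one-vs-rest treatment, the macro average, and the names `per_class_metrics` and `macro_average` are assumptions made for illustration only.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """One-vs-rest metrics per class; cm[i][j] = samples of true class i predicted as j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp   # predicted k, true class differs
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0         # also called sensitivity
        specificity = tn / (tn + fp) if tn + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append({"precision": precision, "recall": recall,
                        "specificity": specificity, "f1": f1})
    return results


def macro_average(metrics: List[Dict[str, float]], key: str) -> float:
    """Unweighted mean of one metric over all classes (an assumed averaging choice)."""
    return sum(m[key] for m in metrics) / len(metrics)


# Hypothetical two-class confusion matrix, not data from the paper.
example_cm = [[8, 2],
              [1, 9]]
example_metrics = per_class_metrics(example_cm)
```

With a per-class breakdown like this, a weak class shows up directly even when overall accuracy is high, which matters for datasets with many visually similar signs.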
In this paper, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements. (1) Strong vision encoder: we explored a continuous learning strategy for the large-scale vision foundation model, InternViT-6B, boosting its visual understanding capabilities so that it can be transferred and reused in different LLMs. (2) Dynamic high-resolution: we divide images into 1 to 40 tiles of 448×448 pixels according to the aspect ratio and resolution of the input images, which supports up to 4K resolution input. (3) High-quality bilingual dataset: we carefully collected a high-quality bilingual dataset that covers common scenes and document images, and annotated it with English and Chinese question-answer pairs, significantly enhancing performance in optical character recognition (OCR) and Chinese-related tasks. We evaluate InternVL 1.5 through a series of benchmarks and comparative studies. Compared to both open-source and proprietary commercial models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 multimodal benchmarks. Code and models are available at https://***/OpenGVLab/InternVL.
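The dynamic high-resolution step described in the abstract above can be illustrated with a small sketch. It picks a grid of 448×448 tiles (between 1 and 40 tiles in total) whose aspect ratio is closest to that of the input image. This is not the released InternVL code: the tie-breaking rule (prefer the grid whose pixel area is closest to the image area) and the function names `choose_grid` and `tile_boxes` are assumptions made for illustration.

```python
from typing import List, Tuple

TILE = 448  # tile side length in pixels, as stated in the abstract


def choose_grid(width: int, height: int, max_tiles: int = 40) -> Tuple[int, int]:
    """Return (cols, rows) with 1 <= cols * rows <= max_tiles.

    The grid whose aspect ratio is closest to the image's is chosen; ties are
    broken by how close the grid's pixel area is to the image area (assumed rule).
    """
    target_ratio = width / height
    image_area = width * height
    candidates = [(c, r)
                  for c in range(1, max_tiles + 1)
                  for r in range(1, max_tiles + 1)
                  if c * r <= max_tiles]

    def cost(grid: Tuple[int, int]) -> Tuple[float, int]:
        cols, rows = grid
        ratio_error = abs(cols / rows - target_ratio)
        area_error = abs(cols * rows * TILE * TILE - image_area)
        return (ratio_error, area_error)

    return min(candidates, key=cost)


def tile_boxes(width: int, height: int, max_tiles: int = 40) -> List[Tuple[int, int, int, int]]:
    """Crop boxes (left, top, right, bottom) in the coordinate space of the image
    after it has been resized to (cols * TILE) x (rows * TILE)."""
    cols, rows = choose_grid(width, height, max_tiles)
    return [(c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE)
            for r in range(rows)
            for c in range(cols)]
```

Under these assumptions, a 3840×2160 input maps to a 7×4 grid (28 tiles), since 7/4 is the closest ratio to 16/9 that fits within 40 tiles.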
Aspect-based sentiment analysis (ABSA) is a natural language processing (NLP) technique to determine the various sentiments of a customer in a single comment regarding different aspects. The increasing online data con...
Named in-network computing service (NICS) is a potential computing paradigm that has emerged recently. Benefiting from the characteristics of named addressing and routing, NICS can be flexibly deployed on the NDN router side and p...
This systematic review gave special attention to diabetes and the advancements in food and nutrition needed to prevent or manage diabetes in all its forms. There are two main forms of diabetes mellitus: Type 1 (T1D) a...
This article introduces a novel approach to bolster the robustness of Deep Neural Network (DNN) models against adversarial attacks named "Targeted Adversarial Resilience Learning (TARL)". The initial ev...
Brain tumors are ranked highly among the leading causes of cancer-related fatalities. Precise segmentation and quantitative assessment of brain tumors are crucial for effective diagnosis and treatment planning. Howeve...
The safeguarding of critical data stored on devices such as phones, computers, and tablets against unauthorized access has emerged as a central concern in modern society. Along with the increasing reliance on these de...
In this paper, a robust and consistent COVID-19 emergency decision-making approach is proposed based on q-rung linear Diophantine fuzzy set (q-RLDFS), differential evolutionary (DE) optimization principles, and evidential reasoning (ER) *** proposed approach uses q-RLDFS in order to represent the evaluating values of the alternatives corresponding to the *** optimization is used to obtain the optimal weights of the attributes, and ER methodology is used to compute the aggregated q-rung linear Diophantine fuzzy values (q-RLDFVs) of each *** the score values of alternatives are computed based on the aggregated *** alternative with the maximum score value is selected as a better *** applicability of the proposed approach has been illustrated in COVID-19 emergency decision-making system and sustainable energy planning ***, we have validated the proposed approach with a numerical ***, a comparative study is provided with the existing models, where the proposed approach is found to be robust, to perform better, and to be consistent in uncertain environments.
We present a novel framework for the multidomain synthesis of artworks from semantic *** of the main limitations of this challenging task is the lack of publicly available segmentation datasets for art *** address this problem, we propose a dataset called ArtSem that contains 40,000 images of artwork from four different domains, with their corresponding semantic label *** first extracted semantic maps from landscape photography and used a conditional generative adversarial network (GAN)-based approach for generating high-quality artwork from semantic maps without requiring paired training ***, we propose an artwork-synthesis model using domain-dependent variational encoders for high-quality multi-domain ***, the model was improved and complemented with a simple but effective normalization method based on jointly normalizing semantics and style, which we call spatially style-adaptive normalization (SSTAN). Compared to the previous methods, which only take semantic layout as the input, our model jointly learns style and semantic information representation, improving the generation quality of artistic *** results indicate that our model learned to separate the domains in the latent ***, we can perform fine-grained control of the synthesized artwork by identifying hyperplanes that separate the different ***, by combining the proposed dataset and approach, we generated user-controllable artworks of higher quality than that of existing approaches, as corroborated by quantitative metrics and a user study.