Predicting crop disease from images of the affected crop has been a promising research topic. In this research, the Localise Search Optimisation Algorithm (LSOA) enabled deep Convolutional Neural Network (...
Even though every individual is entitled to freedom of speech, some limitations exist when this freedom is used to target and harm another individual or a group of people, as it translates to hate speech. In this stud...
Video forgery detection has become necessary with the recent spurt in fake videos such as Deepfakes and doctored videos from multiple video capturing devices. In this paper, we provide a novel technique of detecting fake video...
Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive evaluation of large multimodal models, such as GPT4V and Gemini, in various text-related visual tasks including text recognition, scene text-centric visual question answering (VQA), document-oriented VQA, key information extraction (KIE), and handwritten mathematical expression recognition (HMER). To facilitate the assessment of optical character recognition (OCR) capabilities in large multimodal models, we propose OCRBench, a comprehensive evaluation benchmark. OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available. Furthermore, our study reveals both the strengths and weaknesses of these models, particularly in handling multilingual text, handwritten text, non-semantic text, and mathematical expression *** importantly, the baseline results presented in this study could provide a foundational framework for the conception and assessment of innovative strategies targeted at enhancing zero-shot multimodal *** evaluation pipeline and benchmark are available at https://***/Yuliang-Liu/Multimodal OCR.
Ethereum has received increasing attention as the first blockchain platform to support smart *** mining has become an important tool for analyzing Ethereum ***, existing methods have the disadvantage of covering partial transactions and being vulnerable to privacy-enhancing *** this paper, we propose a scheme for transaction correlation with the node as an entity, which can cover all transactions while being resistant to privacy-enhancing *** timestamps relayed from N fixed nodes to describe the network properties of transactions, we cluster transactions that enter the network from the same source *** results show that our method can determine with 97% precision whether two transactions enter the network from the same source node.
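The idea of describing a transaction by the timestamps at which N fixed nodes relay it, and then clustering on that description, can be illustrated with a minimal sketch. This is not the authors' scheme: the relative-delay normalisation, the Euclidean distance, the single-linkage grouping, the threshold value, and all names and sample timestamps below are assumptions chosen only for illustration.

```python
from itertools import combinations


def relative_delays(timestamps):
    """Subtract the earliest arrival so only the relative delay pattern remains."""
    earliest = min(timestamps)
    return [t - earliest for t in timestamps]


def distance(a, b):
    """Euclidean distance between two relative-delay vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster_by_source(observations, threshold):
    """Group transactions whose delay patterns are within `threshold`.

    `observations` maps a transaction id to the timestamps (in seconds) at
    which each of the N observer nodes relayed it. Grouping is single-linkage,
    implemented with a small union-find structure.
    """
    ids = list(observations)
    vectors = {tx: relative_delays(ts) for tx, ts in observations.items()}
    parent = {tx: tx for tx in ids}

    def find(tx):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[tx] != tx:
            parent[tx] = parent[parent[tx]]
            tx = parent[tx]
        return tx

    for a, b in combinations(ids, 2):
        if distance(vectors[a], vectors[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for tx in ids:
        clusters.setdefault(find(tx), []).append(tx)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical observations from N = 3 observer nodes (made-up values).
observations = {
    "tx_a": [100.00, 100.05, 100.12],
    "tx_b": [250.00, 250.06, 250.11],
    "tx_c": [300.10, 300.00, 300.04],
}
clusters = cluster_by_source(observations, threshold=0.05)
print(clusters)
```

In this toy input, tx_a and tx_b reach the three observers in the same order with similar gaps, so they fall into one group, while tx_c reaches the second observer first and is kept apart. A real system would need a distance and threshold calibrated against measured network latencies.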
ASR is an effectual approach, which converts human speech into computer actions or text format. It involves extracting and determining the noise feature, the audio model, and the language model. The extraction and det...
Free speech is essential, but it can conflict with protecting marginalized groups from harm caused by hate speech. Social media platforms have become breeding grounds for this harmful content. While studies exist to detect hate speech, there are significant research gaps. First, most studies used text data instead of other modalities such as videos or audio. Second, most studies explored traditional machine learning algorithms. However, due to the increasing complexity of computational tasks, there is a need to employ complex techniques and methodologies. Third, the majority of research studies have either been evaluated using very few evaluation metrics or not been statistically evaluated at all. Lastly, due to the opaque, black-box nature of the complex classifiers, there is a need to use explainability techniques. This research aims to address these gaps by detecting hate speech in the English and Kiswahili languages using videos manually collected from YouTube. The videos were converted to text and used to train various classifiers. The performance of these classifiers was evaluated using various evaluation and statistical measurements. The experimental results suggest that the random forest classifier achieved the highest results for both languages across all evaluation measurements compared to all other classifiers used. The results for the English language were: accuracy 98%, AUC 96%, precision 99%, recall 97%, F1 98%, specificity 98% and MCC 96%, while the results for the Kiswahili language were: accuracy 90%, AUC 94%, precision 93%, recall 92%, F1 94%, specificity 87% and MCC 75%. These results suggest that the random forest classifier is robust, effective and efficient in detecting hate speech in any language. This also implies that the classifier is reliable in detecting hate speech and other related problems in social media. However, to understand the classifiers’ decision-making process, we used the Local Interpretable Model-agnostic Explanations (LIME) technique to explain the
Inverse Reinforcement Learning (IRL) and Reinforcement Learning from Human Feedback (RLHF) are pivotal methodologies in reward learning, which involve inferring and shaping the underlying reward function of sequential decision-making problems based on observed human demonstrations and feedback. Most prior work in reward learning has relied on prior knowledge or assumptions about decision or preference models, potentially leading to robustness issues. In response, this paper introduces a novel linear programming (LP) framework tailored for offline reward learning. Utilizing pre-collected trajectories without online exploration, this framework estimates a feasible reward set from the primal-dual optimality conditions of a suitably designed LP, and offers an optimality guarantee with provable sample efficiency. Our LP framework also enables aligning the reward functions with human feedback, such as pairwise trajectory comparison data, while maintaining computational tractability and sample efficiency. We demonstrate that our framework potentially achieves better performance compared to the conventional maximum likelihood estimation (MLE) approach through analytical examples and numerical experiments. Copyright 2024 by the author(s)
Second language learners often experience language anxiety when speaking with others in their target language. As the generative capabilities of Large Language Models (LLMs) continue to improve, we investigate the pos...
The Quantum Internet of Things (QIoT) in the healthcare industry holds the promise of transforming patient care, diagnostics, and medical research. Quantum-enhanced sensors, communication, and computation offer unprecedented capabilities that can revolutionize how healthcare services are delivered and experienced. This paper explores the potential of QIoT in the context of smart healthcare, where interconnected quantum-enabled devices and systems create an ecosystem that enhances data security, enables real-time monitoring, and advances medical knowledge. We delve into the applications of quantum sensors in precise health monitoring, the role of quantum communication in secure telemedicine, and the computational power of quantum computing in drug discovery and personalized medicine. We discuss challenges such as technical feasibility, scalability, and regulatory considerations, along with the emerging trends and opportunities in this transformative field. By examining the intersection of quantum technologies and smart healthcare, this paper aims to shed light on the novel approaches and breakthroughs that could redefine the future of healthcare delivery and patient outcomes. IEEE