作者:
Patel, PrachiBhushanwar, KushPatel, HemlataPiet
Parul University Computer Engineering Department Gujarat Vadodara India Piet
Parul University Computer Science & Engineering Department Gujarat Vadodara India
Social media websites have provided rich contextual data but its misuse for criminal activities creates lot of problems. This paper seeks to solve the problem of criminal behavioral analysis on social media platforms ...
详细信息
Large language models (LLMs) like ChatGPT and Gemini have significantly advanced naturallanguageprocessing, enabling various applications such as chatbots and automated content generation. However, these models can ...
详细信息
ISBN:
(数字)9798400712487
ISBN:
(纸本)9798400712487
Large language models (LLMs) like ChatGPT and Gemini have significantly advanced naturallanguageprocessing, enabling various applications such as chatbots and automated content generation. However, these models can be exploited by malicious individuals who craft toxic prompts to elicit harmful or unethical responses. These individuals often employ jailbreaking techniques to bypass safety mechanisms, highlighting the need for robust toxic prompt detection methods. Existing detection techniques, both blackbox and whitebox, face challenges related to the diversity of toxic prompts, scalability, and computational efficiency. In response, we propose TOXICDETECTOR, a lightweight greybox method designed to efficiently detect toxic prompts in LLMs. TOXICDETECTOR leverages LLMs to create toxic concept prompts, uses embedding vectors to form feature vectors, and employs a Multi-Layer Perceptron (MLP) classifier for prompt classification. Our evaluation on various versions of the LLama models, Gemma-2, and multiple datasets demonstrates that TOXICDETECTOR achieves a high accuracy of 96.39% and a low false positive rate of 2.00%, outperforming state-of-the-art methods. Additionally, ToxicDetector's processing time of 0.0780 seconds per prompt makes it highly suitable for real-time applications. TOXICDETECTOR achieves high accuracy, efficiency, and scalability, making it a practical method for toxic prompt detection in LLMs.
This research paper presents an Exploratory Data Analysis (EDA) technique, Artificial Intelligence (AI), and naturallanguageprocessing (NLP) based approach for the analysis of textual content, and opinion analysis. ...
详细信息
The Artificial Intelligence (AI) and naturallanguageprocessing (NLP) software or application helps to interact with people in a human-like fashion by delivering information, answering questions, completing tasks, an...
详细信息
With the rapid development of information technology, intelligent question answering system (QAS) has become an important tool for users to obtain information. This paper aims to design an intelligent QAS which integr...
详细信息
This special issue features the selected works of authors who have presented papers at the 2022 iteration of the Joint conference on Digital Libraries (JCDL) in Cologne, Germany. The motto of the conference was "...
详细信息
This special issue features the selected works of authors who have presented papers at the 2022 iteration of the Joint conference on Digital Libraries (JCDL) in Cologne, Germany. The motto of the conference was "Bridging Worlds" and was run as a fully hybrid event. Ten papers covering all aspects of Digital Libraries, namely naturallanguageprocessing, Information Retrieval, User Behavior, Scholarly Communication, Classification, Information Extraction are included in this issue.
Pre-trained language models (PLMs) have established the new paradigm in the field of NLP. For more powerful PLMs, one of the most popular and successful ways is to continuously scale up sizes of the models and the pre...
详细信息
ISBN:
(纸本)9798400704369
Pre-trained language models (PLMs) have established the new paradigm in the field of NLP. For more powerful PLMs, one of the most popular and successful ways is to continuously scale up sizes of the models and the pre-training corpora. These large corpora, typically obtained by converging smaller ones from multiple sources, are thus growing increasingly diverse. However, colossal converged corpora don't always enhance PLMs' performance. In this paper, we identify the disadvantage of heterogeneous corpora from multiple sources for pre-training PLMs. Towards coordinated pre-training on diverse corpora, we further propose Source Prompt (SP), which explicitly prompt the model with the source of data at the pre-training and fine-tuning stages. Extensive experimental results show that pre-training PLMs with SP on diverse corpora significantly improves performance in various downstream tasks.
This paper presents two significant contributions: First, it introduces a novel dataset of 19th-century Latin American newspaper texts, addressing a critical gap in specialized corpora for historical and linguistic an...
详细信息
This research presents a novel method of automatic video summarising and note-generating using naturallanguageprocessing (NLP) and audio recognition techniques. The exponential rise in internet video footage has inc...
详细信息
Communication is a critical aspect of every individual's interaction, and individuals typically exchange information in a variety of languages. However, individuals with hearing and speech impairments may encounte...
详细信息
暂无评论