With the vigorous development of the Internet advertising industry, the increasing number of illegal advertisements has brought misleading information and risks to consumers, while also posing challenges to market ord...
Sentiment analysis, the process of gauging user attitudes and emotions through their textual data, including social media posts and other forms of communication, is a valuable tool for informed decision-making. In oth...
Achieving carbon neutrality by 2050 requires unprecedented technological, economic, and sociological changes. With time as a scarce resource, it is crucial to base decisions on relevant facts and information to avoid misdirection. This study aims to help decision makers quickly find relevant information related to companies and organizations in the renewable energy sector. Over the course of this PhD program, we will propose several text-mining methods applied to the renewable energy sector in order to detect technological breakthroughs and new, innovative companies. These techniques include specialized Named Entity Recognition (NER) models, news summarization, and trend analysis of scientific articles. Further steps in this project will include a TRIZ-based analysis of scientific articles in order to assign a multi-factor score to the innovative potential of novel technologies.
Open Information Extraction (OIE) aims at extracting the relational triplets from open-domain texts. Existing methods, unfortunately, mostly fall prey to the complex OIE setting, due to the failure to extract unseen w...
With the rapid development of China’s economy and the advancement of urbanization, China’s per capita automobile ownership rate is also increasing year by year. But with the increasing number of cars, the problem of...
ISBN: (print) 9789819600540; 9789819600557
With the increasing maturity of deep learning technology, large language models have shown excellent performance in natural language processing, but their information extraction performance still has room for improvement. In this paper, we train the 7B Llama-2 base model to obtain a large language model that can be guided by natural language to perform information extraction tasks, which addresses the challenges faced by traditional information extraction methods. We also apply multi-task learning optimization to improve the model's performance. First, we performed large-scale pretraining and instruction tuning on Llama-2: based on high-quality, richly typed training data automatically constructed with ChatGPT, remote supervision, and other algorithms, containing a total of 1 million entities, relations, and events, we designed corresponding English templates for instruction tuning. Second, we performed supervised fine-tuning of the model using manually labeled high-quality training sets. To further improve model performance, we adopt the multi-task learning optimization strategy GradNorm, which dynamically adjusts the weights of different tasks, thus balancing the losses among tasks during training and reducing the overall training loss. In information extraction experiments, we compare our model with other models to test the performance of large models on uniform information extraction, and the results show that our model performs well.
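The abstract above does not show how the task weights are adjusted, so the following is only a minimal sketch of a GradNorm-style weight update, not the authors' implementation. It assumes the per-task gradient norms (with respect to shared parameters) have already been computed elsewhere, treats the target norms as constants during the update, and uses hypothetical names (`gradnorm_step`, `alpha`, `lr`) chosen for illustration.

```python
def gradnorm_step(weights, grad_norms, losses, initial_losses, alpha=1.5, lr=0.025):
    """One simplified GradNorm-style update of the task weights.

    weights        -- current loss weight w_i for each task
    grad_norms     -- norm of the unweighted gradient of each task loss
                      (assumed to be computed elsewhere, e.g. by autograd)
    losses         -- current loss value of each task
    initial_losses -- loss value of each task at the start of training
    alpha          -- strength of the balancing force (hyperparameter)
    lr             -- step size for the weight update (hypothetical default)
    """
    n = len(weights)

    # Weighted gradient norm of each task: G_i = w_i * ||grad L_i||.
    weighted_norms = [w * g for w, g in zip(weights, grad_norms)]
    mean_norm = sum(weighted_norms) / n

    # Relative inverse training rate: tasks whose loss has dropped less
    # (ratio closer to 1) get a value above 1 and thus a larger target.
    ratios = [cur / init for cur, init in zip(losses, initial_losses)]
    mean_ratio = sum(ratios) / n
    rates = [r / mean_ratio for r in ratios]

    # Target gradient norm per task, held constant for this update.
    targets = [mean_norm * (r ** alpha) for r in rates]

    # Gradient of sum_i |G_i - target_i| with respect to w_i.
    def sign(x):
        return (x > 0) - (x < 0)

    grads = [sign(G - t) * g for G, t, g in zip(weighted_norms, targets, grad_norms)]

    # Gradient step, keep weights positive, then renormalize to sum to n.
    updated = [max(w - lr * d, 1e-8) for w, d in zip(weights, grads)]
    total = sum(updated)
    return [n * w / total for w in updated]


# Example: the first task has the larger gradient norm, so its weight shrinks.
new_weights = gradnorm_step(
    weights=[1.0, 1.0],
    grad_norms=[2.0, 1.0],
    losses=[1.0, 1.0],
    initial_losses=[1.0, 1.0],
)
```

In a full training loop, the gradient norms would be recomputed from the shared layer at every step and the weights would scale each task loss before the backward pass; this sketch isolates only the weight-balancing rule.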
In recent years, large language models (LLMs) have shown exceptional capabilities across various natural language processing (NLP) tasks. However, such impressive performance often comes at the cost of an increased pa...
Emotion classification in social media texts has several challenges, such as the characteristics of social media texts that tend to use informal language, unbalanced data distribution, overlapping vocabulary between e...
Code-mixing is a common phenomenon in multilingual communities, where speakers use more than one language within a single conversation. A transliteration framework is necessary to accommodate individuals who feel more...
The Disaster Response Headquarters receives various requests and reports from affiliated organizations, such as hospitals. Staff at the headquarters need to prioritize incoming information to effectively allocate lim...