Structure and design of multimodal dataset for automatic regex synthesis methods in Roman Urdu

Authors: Tariq, Sadia; Rana, Toqir A.

Affiliation: Univ Lahore, Dept CS & IT, Lahore, Pakistan

Publication: International Journal of Data Science and Analytics (Int. J. Data Sci. Anal.)

Year, volume, issue: 2024

Pages: 1-15


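Subject classification: 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (engineering or science degrees may be conferred)]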

Keywords: Multimodal dataset; Auto regex synthesis; Dataset for Roman Urdu auto regex synthesis; Roman Urdu dataset; Query to regex; Strings to regex

Abstract: Automatic regex synthesis involves generating regular expressions from user-written natural language descriptions, example strings, or both. Daily, countless regex generation queries are posted on online Q&A platforms such as StackOverflow (https://***) and Quora (https://***). Existing automatic regex synthesis methods demand concretely designed, multimodal datasets for optimal performance. Unfortunately, publicly available datasets, even for resource-rich languages like English, are often model-specific and incomplete, potentially hindering the efficiency and accuracy of regex synthesis methods. This issue is worse for resource-poor languages such as Standard Urdu and Roman Urdu. In this paper, we present a novel benchmark Roman Urdu dataset with 900 words and a novel Roman Urdu lexicon of 4225 words, annotated and labeled, to address the unmet needs of regex synthesis methods. Equipping these methods with a proficient dataset can lead to more fruitful regex generation.
