This paper introduces DREIFLUSS, an innovative, minimalist approach designed to tackle the Column Type Annotation (CTA) and Column Property Annotation (CPA) tasks in the SemTab challenge. DREIFLUSS efficiently employs...
ISBN (print): 9781450399067
The domain-specific, complex nature of patent text, the unique drafting styles of patent applicants, and the mammoth volume of patent data make classification a challenging task. To help, Google recently released a pre-trained BERT model trained on over 100 million patent documents. However, to the best of our knowledge, the prediction capabilities and performance of this BERT-for-Patents model have not been evaluated on any patent task using standard benchmarks. Our work addresses this gap by investigating BERT-for-Patents in multi-label patent classification at both the CPC and IPC subclass levels. Our experiments show that this approach outperforms the state of the art (SOTA) by an absolute 2% in micro-F1 on a newly proposed USPTO 2.8M dataset. To make the classification process more robust, our collaborative machine learning models, including Naive Bayes (NB) and Support Vector Machines (SVM), raised micro-F1 to 70%. This work supports the development of patent-specific language models and shows that robustness in patent analysis tasks can be achieved by not forgetting plain old machine learning models. The code, models, and a novel dataset of size 2.8M that includes patent claims are released to the public¹ to help the patent community develop AI solutions.
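Micro-F1, the metric reported above, pools true positives (TP), false positives (FP), and false negatives (FN) over every document and every label before computing a single F1 score. The Python sketch below illustrates that computation for multi-label predictions represented as label sets; it is not the authors' evaluation code, and the example CPC/IPC subclass labels are hypothetical. For real experiments, scikit-learn's `f1_score(..., average="micro")` on a binarized label matrix computes the same quantity.

```python
from typing import Sequence, Set


def micro_f1(gold: Sequence[Set[str]], pred: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label classification.

    TP, FP, and FN are summed over all documents and labels, and a single
    F1 is computed from the pooled counts, i.e. 2*TP / (2*TP + FP + FN).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of documents")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)  # labels predicted and correct
        fp += len(p - g)  # labels predicted but wrong
        fn += len(g - p)  # correct labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical subclass labels for two patents (illustration only).
gold = [{"G06F", "H04L"}, {"A61K"}]
pred = [{"G06F"}, {"A61K", "C07D"}]
print(round(micro_f1(gold, pred), 4))  # TP=2, FP=1, FN=1 -> 0.6667
```

Because the counts are pooled before averaging, frequent subclasses weigh more heavily in micro-F1 than rare ones, which is why it is a common headline metric for large, imbalanced label sets such as CPC and IPC.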
ISBN (print): 9781509028474
This paper focuses on the analysis of socio-spatial data, i.e., user-performance relations at a distributed event. We treat the data as a bimodal network (i.e., we model it as a bipartite graph) and investigate its structural characteristics as a social network. We focus on the participants' plans (expressed as preferences) and their fulfilment, and propose measures for matching preference and reality. We specifically analyse behavioural patterns with respect to distinct user and performance groups. We use real-world data collected at the Lange Nacht der Musik (Long Night of Music) 2013 in Munich.
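One way to picture the bimodal network described above is as a mapping from users to the performances they are linked to; projecting it onto the user side yields a weighted user-user network, a standard step when studying a bipartite graph as a social network. The Python sketch below shows that projection together with a simple fulfilment ratio (the share of a user's planned performances actually attended). The data, the function names, and the ratio are hypothetical illustrations, not the measures proposed in the paper.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_bipartite(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each user to the set of performances it is linked to."""
    user_to_perf: Dict[str, Set[str]] = defaultdict(set)
    for user, perf in edges:
        user_to_perf[user].add(perf)
    return dict(user_to_perf)


def project_users(user_to_perf: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """One-mode projection: link two users, weighted by shared performances."""
    weights: Dict[Tuple[str, str], int] = {}
    for u, v in combinations(sorted(user_to_perf), 2):
        shared = len(user_to_perf[u] & user_to_perf[v])
        if shared:
            weights[(u, v)] = shared
    return weights


def fulfilment_ratio(planned: Set[str], attended: Set[str]) -> float:
    """Hypothetical measure: fraction of planned performances actually attended."""
    return len(planned & attended) / len(planned) if planned else 0.0


# Hypothetical attendance data as (user, performance) pairs.
attended = build_bipartite([
    ("alice", "P1"), ("alice", "P2"),
    ("bob", "P2"), ("bob", "P3"),
    ("carol", "P3"),
])
print(project_users(attended))  # {('alice', 'bob'): 1, ('bob', 'carol'): 1}
print(fulfilment_ratio({"P1", "P4"}, attended["alice"]))  # 0.5
```

The projected edge weights can then be analysed with ordinary social-network measures such as degree or connected components; NetworkX's `bipartite.weighted_projected_graph` provides a library equivalent of the projection step.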
The talk briefly discusses current challenges in artificial intelligence (AI), including: efficient learning from data (interactive, adaptive, life-long; transfer); interpretability and explainability; personalised predict...
Small- to medium-scale data science experiments often rely on research software developed ad hoc by individual scientists or small teams. Often there is no time to make the research software fast, reusable, and open ac...
We revisit the notion of probably approximately correct implication bases from the literature and present a first formulation in the language of formal concept analysis, with the goal of investigating whether such bases...
In the emerging information economy, data is becoming an essential asset, and personal data in particular is used for data-driven business models. However, companies frequently leverage personal data without considering...
The talk briefly discusses current challenges in artificial intelligence (AI), including: efficient learning from data (interactive, adaptive, life-long; transfer); interpretability and explainability; personalised predictive modelling and profiling; multiple data modalities (e.g. genetic, clinical, behavioural, cognitive, static, temporal, longitudinal); computational complexity; energy consumption; and human-machine interaction.