Information Extraction (IE) processes are often business-critical, but very hard to automate due to a heterogeneous data basis. Specific document characteristics, also called features, influence the optimal way of pro...
详细信息
ISBN:
(纸本)9789897585883
Information Extraction (IE) processes are often business-critical, but very hard to automate due to a heterogeneous data basis. Specific document characteristics, also called features, influence the optimal way of processing. Architecture for Automated Generation of Distributed Information Extraction Pipelines (ARTIFACT) supports businesses in successively automating their IE processes by finding optimal IE pipelines. However, ARTIFACT treats each document the same way, and does not enable document-specific processing. Single solution strategies can perform extraordinarily well for documents with particular traits. While manual approvals are superfluous for these documents, ARTIFACT does not provide the opportunity for Fully Automatic Processing (FAP). Therefore, we introduce an enhanced pattern that integrates an extensible and domain-independent concept of feature detection based on microservices. Due to this, we create two fundamental benefits. First, the document-specific processing increases the quality of automated generated IE pipelines. Second, the system enables FAP to eliminate superfluous approval efforts.
暂无评论