Processing large amounts of natural language text with machine learning-based models is becoming important in many disciplines. This demand is being met by a variety of approaches, resulting in the heterogeneous deployment of separate, partly incompatible applications that do not scale natively. To overcome this technological bottleneck, we have developed the Docker Unified UIMA Interface (DUUI), a standardized, parallel, platform-independent, distributed, microservices-based solution for processing large text corpora with any NLP method. We present DUUI as a framework that enables the automated orchestration of GPU-based NLP processes beyond the existing Docker Swarm cluster variant and that adapts to new runtime environments such as Kubernetes. To this end, we introduce a new DUUI driver that enables lightweight, scalable orchestration of DUUI processes within a Kubernetes environment. In this way, the paper opens up new text-technological perspectives for existing practices in disciplines that analyze large amounts of data scientifically using NLP.
ISBN: (print) 9781665494663
Computational storage drives (CSDs) are solid-state drives (SSDs) equipped with general-purpose processors that can perform in-storage processing. By bringing compute to data, they have the potential to significantly improve both performance and energy efficiency for big-data analytics, eliminating costly data transfers while offering better privacy. In this work, we introduce Solana, the first high-capacity (12 TB) CSD in the E1.S form factor, and present a working prototype for evaluation. To demonstrate the benefits of in-storage processing on CSDs, we deploy several natural language processing (NLP) applications on datacenter-grade storage servers built from clusters of Solana drives. Experimental results show up to a 3.1x processing speedup while reducing energy consumption and data transfer by 67% and 68%, respectively, compared to regular enterprise SSDs.