Search Results - Inner Mongolia University Library

IEEE 6th International Congress on Big Data (BigData Congress)

Authors: Orhean, Alexandru Iulian; Ijagbone, Itua; Raicu, Ioan; Chard, Kyle; Zhao, Dongfang. Affiliations: IIT, Dept Comp Sci, Chicago, IL 60616, USA; Univ Chicago, Computat Inst, Chicago, IL 60637, USA; Univ Nevada, CSE Dept, Reno, NV 89557, USA

ISBN: (print) 9781538619964

The ubiquity of Big Data has greatly influenced the direction and the development of storage technologies. To meet the needs of storing and analyzing Big Data, researchers and administrators have turned to parallel and distributed storage and compute architectures. While the problems of securely and consistently storing and accessing data in large parallel and distributed file systems have been largely addressed in both research and production systems, efficiently searching across large unstructured data and metadata has largely been overlooked. According to the International Data Corporation, more than 90% of data found in the digital universe is unstructured, emphasizing the importance of developing efficient solutions for querying distributed data. This paper proposes a novel indexing solution, called FusionDex, that provides an efficient model for querying across distributed file systems. FusionDex leverages state-of-the-art, open-source indexing modules as its building blocks to deliver an integrated system for enabling efficient user-specified queries over distributed and unstructured data. FusionDex has been evaluated on a cluster of 64 nodes, and results show that it outperforms existing tools such as Hadoop Grep and Cloudera Search, in some cases by several orders of magnitude.

Keywords: indexing methods; distributed file systems; unstructured data search
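
To illustrate why an index can beat a grep-style scan for the kind of query the abstract above describes, here is a minimal sketch of an inverted index kept per storage node, with a query fanned out to all nodes and the hits merged. This is not FusionDex's implementation: the abstract does not name the indexing modules it builds on, and `NodeIndex`, `search_cluster`, the node names, and the file paths are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
import re

TOKEN = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN.findall(text.lower())


class NodeIndex:
    """Inverted index for the files held by one (hypothetical) storage node."""

    def __init__(self, name):
        self.name = name
        self.postings = defaultdict(set)  # term -> set of file paths

    def add_file(self, path, text):
        """Record every term of the file in the postings lists."""
        for term in tokenize(text):
            self.postings[term].add(path)

    def search(self, query):
        """Return the paths on this node that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


def search_cluster(nodes, query):
    """Fan the query out to every node index and merge the hits, sorted."""
    hits = []
    for node in nodes:
        hits.extend((node.name, path) for path in node.search(query))
    return sorted(hits)


if __name__ == "__main__":
    n1 = NodeIndex("node-1")
    n1.add_file("/data/a.txt", "Distributed file systems store unstructured data")
    n2 = NodeIndex("node-2")
    n2.add_file("/data/b.txt", "Searching unstructured data with an index")
    n2.add_file("/data/c.txt", "Structured tables only")
    print(search_cluster([n1, n2], "unstructured data"))
```

Each lookup touches only the postings lists for the query terms, whereas a scan has to read every file on every node. A real system would also have to handle index updates, persistence, and network transport, none of which this sketch attempts.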


An Industrial Approach to Using Artificial Intelligence and Natural Language Processing for Accelerated Document Preparation in Drug Development


JOURNAL OF PHARMACEUTICAL INNOVATION, 2021, Vol. 16, No. 2, pp. 302-316

Authors: Viswanath, Shekhar; Fennell, Jared W.; Balar, Kalpesh; Krishna, Praful. Affiliations: Eli Lilly & Co, Lilly Corp Ctr, Indianapolis, IN 46285, USA; Arbot Solut Inc, Dba Coseer, 301 Mission St, Suite 9F, San Francisco, CA 94105, USA

Purpose: Due to the exceptionally high standards for accuracy and data integrity in scientific regulatory reporting, it is vital that any tool that aims to streamline this process be at least as efficient at gathering data as a team of scientists, without higher cost in terms of time or resources. For this reason, an artificial intelligence-based tool with parallel search, document creation, and data integrity review capabilities is being investigated as a potential solution. This paper describes a proof-of-concept project to develop an AI-based tool to rapidly assemble an end-of-phase 2 (EOP2) briefing document for a potential medicine. We have called the tool an Intelligent Machine for Document Preparation, or IMDP. Methods: A training corpus of approximately 65,000 PDF documents derived from electronic lab notebooks and technical reports related to five molecules (including Merestinib) was ingested, and prior EOP2 documents from the remaining four molecules were used to generate training questions and answers. Then, an annotation-light natural language processing algorithm analyzed a set of structured and unstructured data regarding Merestinib. A simple user interface was created allowing scientists to query the system in natural language, and a table builder, an image/plot finder, and free-text addition features were added to allow for advanced search without dependence on keywords. Results: Three significant innovations were designed in to improve overall performance as compared to our benchmark solution without sacrificing usability. First, the AI-based IMDP was built to improve accuracy and accelerate document creation with a remarkably low amount of training. Second, image search capability was added to enrich the knowledge base, and third, the IMDP was integrated with the existing process rather than adding a step in the workflow. Finally, accuracy and total document creation time were compared with the existing tool (benchmark tool). Our experiments show that the AI-

Keywords: Artificial intelligence; Natural language processing; Pharmaceutical development; unstructured data search; Image search; Documentation preparation; Image analysis


ICE: An Intelligent Cognition Engine with 3D NAND-based In-Memory Computing for Vector Similarity Search Acceleration


55th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO)

Authors: Hu, Han-Wen; Wang, Wei-Chen; Chang, Yuan-Hao; Lee, Yung-Chun; Lin, Bo-Rong; Wang, Huai-Mu; Lin, Yen-Po; Huang, Yu-Ming; Lee, Chong-Ying; Su, Tzu-Hsiang; Hsieh, Chih-Chang; Hu, Chia-Ming; Lai, Yi-Ting; Chen, Chung-Kuang; Chen, Han-Sung; Li, Hsiang-Pang; Kuo, Tei-Wei; Chang, Meng-Fan; Wang, Keh-Chung; Hung, Chun-Hsiung; Lu, Chih-Yuan. Affiliations: Macronix Int Co Ltd, Hsinchu, Taiwan; Natl Tsing Hua Univ, Dept Elect Engn, Hsinchu, Taiwan; MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139, USA; Natl Taiwan Univ, Dept Comp Sci & Informat Engn, New Taipei, Taiwan; Acad Sinica, Inst Informat Sci, New Taipei, Taiwan; Natl Taiwan Univ, Grad Inst Elect Engn, New Taipei, Taiwan; Natl Taiwan Univ, Grad Inst Networking & Multimedia, New Taipei, Taiwan; Natl Taiwan Univ, High Performance & Sci Comp Ctr, New Taipei, Taiwan

ISBN: (digital) 9781665462723

ISBN: (print) 9781665462723

Vector similarity search (VSS) for unstructured vectors generated via machine learning methods is a promising solution for many applications, such as face search. With increasing awareness and concern about data security requirements, there is a compelling need to store data and process VSS applications locally on edge devices rather than send data to servers for computation. However, the explosive amount of data movement from NAND storage to DRAM across memory hierarchy and data processing of the entire dataset consume enormous energy and require long latency for VSS applications. Specifically, edge devices with insufficient DRAM capacity will trigger data swap and deteriorate the execution performance. To overcome this crucial hurdle, we propose an intelligent cognition engine (ICE) with cognitive 3D NAND, featuring non-volatile in-memory computing (nvIMC) to accelerate the processing, suppress the data movement, and reduce data swap between the processor and storage. This cognitive 3D NAND features digital nvIMC techniques (i.e., ADC/DAC-free approach), high-density 3D NAND, and compatibility with standard 3D NAND products with minor modifications. To facilitate parallel INT8/INT4 vector-vector multiplication (VVM) and mitigate the reliability issue of 3D NAND, we develop a bit-error-tolerance data encoding and a two's complement-based digital accumulator. VVM can support similarity computations (e.g., cosine similarity and Euclidean distance), which are required to search "the most similar data" right where they are stored. In addition, the proposed solution can be realized on edge storage products, e.g., embedded MultiMedia Card (eMMC). The measured and simulated results on real 3D NAND chips show that ICE enhances the system execution time by 17× to 95× and energy efficiency by 11× to 140×, compared to traditional von Neumann approaches using state-of-the-art edge systems with MobileFaceNet on CASIA-WebFace dataset. To the best of our knowledge, this work

Keywords: 3D NAND; In-Memory Computing; Vector Similarity search; unstructured data search
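
The abstract above notes that vector-vector multiplication (VVM) on INT8 data is enough to support cosine similarity and Euclidean distance. The following software sketch shows that relationship: both measures are computed from integer dot products alone. It says nothing about the paper's hardware, bit-error-tolerant encoding, or accumulator design, and all function names here are hypothetical.

```python
import math


def quantize_int8(vec):
    """Scale a float vector symmetrically into the INT8 range [-127, 127]."""
    peak = max(abs(x) for x in vec)
    if peak == 0:
        return [0] * len(vec)
    return [round(x / peak * 127) for x in vec]


def dot(a, b):
    """Integer VVM: multiply element-wise, then accumulate."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity built from three dot products."""
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def squared_euclidean(a, b):
    """||a - b||^2 = a.a - 2 a.b + b.b, so only dot products are needed.

    Assumes both vectors were quantized with the same scale; the per-vector
    scaling in quantize_int8 is only safe for the scale-invariant cosine.
    """
    return dot(a, a) - 2 * dot(a, b) + dot(b, b)


def most_similar(query, database, k=1):
    """Return indices of the k database vectors with highest cosine similarity."""
    ranked = sorted(
        range(len(database)),
        key=lambda i: cosine_similarity(query, database[i]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    database = [quantize_int8(v) for v in [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]]
    query = quantize_int8([0.9, 0.1])
    print(most_similar(query, database, k=2))
```

This brute-force search scans the whole database for every query, which is the data-movement cost the abstract says in-memory computing is meant to avoid.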


ICE: An Intelligent Cognition Engine with 3D NAND-Based In-Memory Computing for Vector Similarity Search Acceleration


Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture

Authors: Han-Wen Hu; Wei-Chen Wang; Yuan-Hao Chang; Yung-Chun Lee; Bo-Rong Lin; Huai-Mu Wang; Yen-Po Lin; Yu-Ming Huang; Chong-Ying Lee; Tzu-Hsiang Su; Chih-Chang Hsieh; Chia-Ming Hu; Yi-Ting Lai; Chung-Kuang Chen; Han-Sung Chen; Hsiang-Pang Li; Tei-Wei Kuo; Meng-Fan Chang; Keh-Chung Wang; Chun-Hsiung Hung; Chih-Yuan Lu. Affiliations: Macronix International Co. Ltd. and Department of Electrical Engineering, National Tsing Hua University; Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology and Macronix International Co. Ltd.; Institute of Information Science, Academia Sinica; Macronix International Co. Ltd.; Macronix International Co. Ltd. and Graduate Institute of Electronics Engineering, National Taiwan University; Department of Computer Science and Information Engineering, National Taiwan University and Graduate Institute of Electronics Engineering, National Taiwan University; Department of Electrical Engineering, National Tsing Hua University

ISBN: (print) 9781665462723

Vector similarity search (VSS) for unstructured vectors generated via machine learning methods is a promising solution for many applications, such as face search. With increasing awareness and concern about data security requirements, there is a compelling need to store data and process VSS applications locally on edge devices rather than send data to servers for computation. However, the explosive amount of data movement from NAND storage to DRAM across memory hierarchy and data processing of the entire dataset consume enormous energy and require long latency for VSS applications. Specifically, edge devices with insufficient DRAM capacity will trigger data swap and deteriorate the execution performance. To overcome this crucial hurdle, we propose an intelligent cognition engine (ICE) with cognitive 3D NAND, featuring non-volatile in-memory computing (nvIMC) to accelerate the processing, suppress the data movement, and reduce data swap between the processor and storage. This cognitive 3D NAND features digital nvIMC techniques (i.e., ADC/DAC-free approach), high-density 3D NAND, and compatibility with standard 3D NAND products with minor modifications. To facilitate parallel INT8/INT4 vector-vector multiplication (VVM) and mitigate the reliability issue of 3D NAND, we develop a bit-error-tolerance data encoding and a two's complement-based digital accumulator. VVM can support similarity computations (e.g., cosine similarity and Euclidean distance), which are required to search "the most similar data" right where they are stored. In addition, the proposed solution can be realized on edge storage products, e.g., embedded MultiMedia Card (eMMC). The measured and simulated results on real 3D NAND chips show that ICE enhances the system execution time by 17× to 95× and energy efficiency by 11× to 140×, compared to traditional von Neumann approaches using state-of-the-art edge systems with MobileFaceNet on CASIA-WebFace dataset. To the best of our knowledge, this work demo

Keywords: 3D NAND; in-memory computing; vector similarity search; unstructured data search
