ISBN: 9781665423694 (print)
One of the most promising applications of machine learning is speeding up expensive computational processes, i.e. learning to think. To do so, one first generates a large-scale dataset with a compute-intensive process and then trains a model to approximate its distribution. High performance computing (HPC) is a natural fit for this workflow: large amounts of computation can be deployed efficiently to generate a dataset in a reasonable amount of time, and the dataset is then used to learn a computationally efficient solution. Here, we focus on generating a program synthesis dataset. Finding the program that fits a given input-output specification is very expensive, but generating the input-output pairs for a given program is a well-defined process. In this work, we show how we efficiently ran hundreds of thousands of C++ programs line by line and used their intermediate variable states to generate a large-scale program synthesis dataset.
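The idea of running a program line by line and keeping its intermediate variable states can be sketched in a few lines. The example below is only an illustration under stated assumptions: it traces a Python function with the standard `sys.settrace` hook rather than C++ programs, and the names `trace_states`, `make_example`, and `running_sum` are hypothetical, not taken from the paper's tooling.

```python
import sys
from typing import Any, Callable, Dict, List, Tuple

# One recorded step: (line offset inside the function, snapshot of its locals).
State = Tuple[int, Dict[str, Any]]


def trace_states(func: Callable[..., Any], *args: Any) -> Tuple[Any, List[State]]:
    """Run func(*args) and snapshot its local variables before each executed line."""
    states: List[State] = []
    target = func.__code__

    def tracer(frame, event, arg):
        # Ignore every frame except the function being traced.
        if frame.f_code is not target:
            return None
        if event in ("line", "return"):
            offset = frame.f_lineno - target.co_firstlineno
            # Shallow copy: sufficient here because the values are immutable ints.
            states.append((offset, dict(frame.f_locals)))
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous trace hook
    return result, states


def make_example(func: Callable[..., Any], *args: Any) -> Dict[str, Any]:
    """Bundle the input, the output, and the intermediate states into one record."""
    output, trace = trace_states(func, *args)
    return {"input": args, "output": output, "trace": trace}


def running_sum(n):
    # Hypothetical toy program standing in for one program in the corpus.
    total = 0
    for i in range(n):
        total += i
    return total


example = make_example(running_sum, 3)
```

Each record pairs a program input with its output and the sequence of variable snapshots in between. Producing such records is cheap and deterministic for a known program, which is the asymmetry the abstract relies on: going from a specification back to the program is the expensive direction.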
ISBN: 9781509035250 (print)
This paper describes a multi-institution effort to develop a "data science as a service" platform. The platform integrates advanced federated data management for small to large datasets with access to high performance computing, distributed computing, and advanced networking. The goal is to develop a platform that is flexible and extensible while still supporting domain research and avoiding the walled garden problem. Some preliminary lessons learned and next steps are also outlined.