Data exchange functions provide methods for transferring data from one storage system to another. HDFS, the distributed file system implemented by Hadoop, is highly fault-tolerant, offers high-throughput access to application data, and is well suited to applications with large data sets. Traditionally, large data sets are stored on FTP servers or in SQL databases. Using the Hadoop distributed framework for large-scale computation therefore requires transferring data from FTP servers or SQL databases to HDFS. This paper discusses the problem of parallel data exchange between SQL databases and HDFS, examines the performance of Hadoop's data exchange classes, DBInputFormat and DBOutputFormat, and proposes strategies to improve that performance.
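
To make the idea of parallel exchange concrete, the sketch below shows one plausible way a SQL table could be divided among several parallel readers, with each reader fetching its own contiguous range of rows. It is an illustration in Python, not Hadoop's Java code and not the method evaluated in the paper. The function names `plan_splits` and `split_query`, the table name, and the LIMIT/OFFSET query shape are all assumptions made for the example.

```python
def plan_splits(row_count, num_tasks):
    """Divide row_count rows into num_tasks contiguous (offset, length) ranges.

    Hypothetical helper: every task gets row_count // num_tasks rows,
    and the last task also takes any remainder.
    """
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    chunk = row_count // num_tasks
    splits = []
    for i in range(num_tasks):
        start = i * chunk
        # The last task reads through to the end of the table.
        end = row_count if i == num_tasks - 1 else start + chunk
        splits.append((start, end - start))
    return splits


def split_query(table, order_by, offset, length):
    """Build the query one reader would run for its range (assumed SQL dialect).

    A stable ORDER BY is needed so that ranges do not overlap between readers.
    """
    return (
        f"SELECT * FROM {table} ORDER BY {order_by} "
        f"LIMIT {length} OFFSET {offset}"
    )


if __name__ == "__main__":
    for offset, length in plan_splits(10, 3):
        print(split_query("orders", "id", offset, length))
```

A known weakness of this kind of scheme is that large OFFSET values can force the database to scan and discard the skipped rows, so later ranges may cost more than earlier ones; partitioning on an indexed key range is a common alternative.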