Research on Small File Processing Technology Based on HDFS
- 10.2991/ecae-17.2018.61
- HDFS; cloud storage; small files; file merge; insert
With the rapid development of the Internet and the rapid growth in the number of Internet users, the volume of Internet data has expanded sharply. Cloud computing offers a good solution to large-scale data computing and storage problems, and massive data storage and analysis has become a very popular research field. HDFS uses a single NameNode to manage the metadata of the entire system and keeps that metadata in memory to improve access efficiency. When the system stores a large number of small files, however, it generates a large amount of metadata, which occupies a large share of NameNode memory. In addition, accessing a large number of small files requires frequent requests to the NameNode, which overloads it. To address this problem, this paper analyzes previous research and improvement schemes and proposes a corresponding improvement on that basis. An independent small file processing module is added to the original distributed file system. The module merges the small files, creates an index for them, and passes the file cache to HDFS for data processing.
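The merge-and-index idea can be illustrated with a minimal sketch. It is not the paper's module: the abstract does not specify the index layout or the merge format, so the function names `merge_small_files` and `read_small_file` and the `(offset, length)` index entries are assumptions made for illustration, and the merged data is held in memory rather than written to HDFS.

```python
import io
from typing import Dict, Iterable, Tuple

# Hypothetical layout: each index entry maps a file name to the
# (offset, length) of its bytes inside the merged blob.
Index = Dict[str, Tuple[int, int]]


def merge_small_files(files: Iterable[Tuple[str, bytes]]) -> Tuple[bytes, Index]:
    """Concatenate small files into one blob and record where each one lives."""
    buffer = io.BytesIO()
    index: Index = {}
    for name, data in files:
        if name in index:
            raise ValueError(f"duplicate file name: {name}")
        # The current write position is the start offset of this file.
        index[name] = (buffer.tell(), len(data))
        buffer.write(data)
    return buffer.getvalue(), index


def read_small_file(merged: bytes, index: Index, name: str) -> bytes:
    """Recover one original file from the merged blob using the index."""
    offset, length = index[name]
    return merged[offset:offset + length]


# Three small files become one merged object plus a small index.
merged, index = merge_small_files([
    ("a.txt", b"hello"),
    ("b.txt", b"small files"),
    ("c.txt", b""),
])
```

Under this scheme the file system would track one large merged file instead of many small ones, and a lookup such as `read_small_file(merged, index, "b.txt")` resolves a small file through the index without a separate metadata entry per file.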
- © 2018, the Authors. Published by Atlantis Press.
- Open Access
- This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).
Cite this article
TY - CONF
AU - Rui Gu
PY - 2017/12
DA - 2017/12
TI - Research on Small File Processing Technology Based on HDFS
BT - Proceedings of the 2017 2nd International Conference on Electrical, Control and Automation Engineering (ECAE 2017)
PB - Atlantis Press
SP - 286
EP - 289
SN - 2352-5401
UR - https://doi.org/10.2991/ecae-17.2018.61
DO - 10.2991/ecae-17.2018.61
ID - Gu2017/12
ER -