Proceedings of the 2017 2nd International Conference on Electrical, Control and Automation Engineering (ECAE 2017)

Research on Small File Processing Technology Based on HDFS

Authors
Rui Gu
Corresponding Author
Rui Gu
Available Online December 2017.
DOI
10.2991/ecae-17.2018.61
Keywords
HDFS; cloud storage; small files; file merge; insert
Abstract

With the rapid development of the Internet and the rapid growth in the number of Internet users, the volume of Internet data is expanding sharply. Cloud computing offers a good solution to the problems of large-scale data computing and storage, and massive data storage and analysis has become a very popular research field. HDFS uses a single NameNode to manage the metadata of the entire system and keeps that metadata in memory to improve access efficiency. When the system stores a large number of small files, however, it generates a large amount of metadata, which occupies a large share of NameNode memory. In addition, accessing a large number of small files requires frequent requests to the NameNode, which overloads it. To address this problem, this paper analyzes previous research and improvement schemes and proposes a corresponding improvement on that basis. An independent small file processing module is added to the original distributed file system. The module merges small files, creates an index for them, and passes the file cache to HDFS for data processing.
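
The merge-and-index idea described in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's module: the names `IndexEntry`, `merge_small_files`, and `read_small_file` are hypothetical, the index layout (offset and length per file name) is an assumption, and no HDFS API is called. It only shows why a single merged file plus an index lets many small files be stored as one object and still be read back individually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Location of one small file inside the merged blob (assumed layout)."""
    offset: int  # byte position where the file starts
    length: int  # size of the file in bytes


def merge_small_files(files: dict[str, bytes]) -> tuple[bytes, dict[str, IndexEntry]]:
    """Concatenate small files into one blob and record where each one lives."""
    merged = bytearray()
    index: dict[str, IndexEntry] = {}
    for name, data in files.items():
        # The current blob length is the offset at which this file will start.
        index[name] = IndexEntry(offset=len(merged), length=len(data))
        merged.extend(data)
    return bytes(merged), index


def read_small_file(merged: bytes, index: dict[str, IndexEntry], name: str) -> bytes:
    """Recover one small file from the merged blob using only the index."""
    entry = index[name]
    return merged[entry.offset:entry.offset + entry.length]


if __name__ == "__main__":
    small_files = {"a.txt": b"hello", "b.txt": b"", "c.txt": b"world!"}
    blob, idx = merge_small_files(small_files)
    print(len(blob), idx["c.txt"])
    print(read_small_file(blob, idx, "c.txt"))
```

In a real deployment the merged blob would be written to HDFS as one file, so the NameNode would track one entry instead of one per small file; where the index itself is kept is a design choice this sketch leaves open.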

Copyright
© 2018, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the 2017 2nd International Conference on Electrical, Control and Automation Engineering (ECAE 2017)
Series
Advances in Engineering Research
Publication Date
December 2017
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Rui Gu
PY  - 2017/12
DA  - 2017/12
TI  - Research on Small File Processing Technology Based on HDFS
BT  - Proceedings of the 2017 2nd International Conference on Electrical, Control and Automation Engineering (ECAE 2017)
PB  - Atlantis Press
SP  - 286
EP  - 289
SN  - 2352-5401
UR  - https://doi.org/10.2991/ecae-17.2018.61
DO  - 10.2991/ecae-17.2018.61
ID  - Gu2017/12
ER  -