Research on Small File Processing Technology Based on HDFS

Rui Gu

doi:10.2991/ecae-17.2018.61

Authors

Rui Gu

Corresponding Author

Rui Gu

Available Online December 2017.

DOI: 10.2991/ecae-17.2018.61
Keywords: HDFS; cloud storage; small files; file merge; insert
Abstract: With the rapid development of the Internet and the rapid growth in the number of Internet users, the volume of Internet data has expanded sharply. Cloud computing offers a good solution to large-scale data computing and storage problems, and massive data storage and analysis has become a very popular research field. HDFS uses a single NameNode to manage the metadata of the entire system and keeps that metadata in memory to improve access efficiency. When the system stores a large number of small files, however, it generates a large amount of metadata, which occupies a large share of NameNode memory. In addition, accessing a large number of small files requires frequent requests to the NameNode, which overloads it. To address this problem, this paper analyzes previous research and improvement schemes and proposes a corresponding improvement on that basis. An independent small file processing module is added to the original distributed file system. The module merges small files, creates an index for them, and passes the file cache to HDFS for data processing.
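The merge-and-index idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `merge_small_files` and `read_small_file`, the in-memory index layout (file name mapped to offset and length), and the use of plain byte strings in place of HDFS blocks are all assumptions made for illustration.

```python
import io
from typing import Dict, Iterable, Tuple

# Hypothetical index layout: file name -> (offset, length) inside the merged blob.
Index = Dict[str, Tuple[int, int]]


def merge_small_files(files: Iterable[Tuple[str, bytes]]) -> Tuple[bytes, Index]:
    """Concatenate small files into one blob and record where each one lives."""
    buffer = io.BytesIO()
    index: Index = {}
    for name, data in files:
        if name in index:
            raise ValueError(f"duplicate file name: {name}")
        # The current write position is the offset of this file in the blob.
        index[name] = (buffer.tell(), len(data))
        buffer.write(data)
    return buffer.getvalue(), index


def read_small_file(merged: bytes, index: Index, name: str) -> bytes:
    """Recover one original file from the merged blob using the index."""
    offset, length = index[name]
    return merged[offset:offset + length]


if __name__ == "__main__":
    merged, index = merge_small_files([("a.txt", b"hello"), ("b.txt", b"hdfs!")])
    print(index)
    print(read_small_file(merged, index, "b.txt"))
```

In a sketch like this, only the merged blob would be stored as a single large file, so the NameNode would track one file's metadata rather than one entry per small file, while the index resolves individual lookups.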
Copyright: © 2018, the Authors. Published by Atlantis Press.
Open Access: This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title: Proceedings of the 2017 2nd International Conference on Electrical, Control and Automation Engineering (ECAE 2017)
Series: Advances in Engineering Research
Publication Date: December 2017
ISBN: 978-94-6252-458-3
ISSN: 2352-5401

Cite this article

TY  - CONF
AU  - Rui Gu
PY  - 2017/12
DA  - 2017/12
TI  - Research on Small File Processing Technology Based on HDFS
BT  - Proceedings of the 2017 2nd International Conference on Electrical, Control and Automation Engineering (ECAE 2017)
PB  - Atlantis Press
SP  - 286
EP  - 289
SN  - 2352-5401
UR  - https://doi.org/10.2991/ecae-17.2018.61
DO  - 10.2991/ecae-17.2018.61
ID  - Gu2017/12
ER  -
