Proceedings of the 2018 Joint International Advanced Engineering and Technology Research Conference (JIAET 2018)

An optimization strategy of massive small files storage based on HDFS

Authors
Xun Cai, Cai Chen, Yi Liang
Corresponding Author
Xun Cai
Available Online March 2018.
DOI
10.2991/jiaet-18.2018.40
Keywords
Storage of Small Files, Distribution of Small Files, Merge, Relationship between files.
Abstract

The Hadoop Distributed File System (HDFS) is a distributed storage system that handles large files well. However, it has an inherent weakness when storing small files: a large number of small files produces excessive metadata, which creates a memory bottleneck on the NameNode, and the frequent RPC communication they require adds time overhead. To address these problems, this paper presents a merging algorithm based on two factors: the distribution of files and the correlation between files. The algorithm not only reduces the number of HDFS blocks but also places related files close together. Experimental results show that the algorithm effectively improves the storage efficiency of HDFS for small files and helps to optimize small-file access.
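The general idea of merging small files can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes a simple first-fit-decreasing packing by size only, ignores file correlation entirely, and uses hypothetical names (`MergedFile`, `merge_small_files`). The 128 MiB block size is the Hadoop 2.x+ default and may differ on a given cluster. Each merged file keeps an index from original file name to (offset, length) so that a small file can still be located after merging.

```python
from dataclasses import dataclass, field

# Assumption: default HDFS block size in Hadoop 2.x and later (128 MiB).
BLOCK_SIZE = 128 * 1024 * 1024


@dataclass
class MergedFile:
    """Hypothetical container: one merged file that fits in one block."""
    name: str
    used: int = 0
    # Original file name -> (offset, length) inside the merged file.
    index: dict = field(default_factory=dict)


def merge_small_files(sizes, block_size=BLOCK_SIZE):
    """Pack files (name -> size in bytes) into merged files of at most
    block_size bytes, using first-fit decreasing.

    Illustrative stand-in only: the paper's algorithm also considers
    file distribution and correlation, which this sketch does not model.
    """
    merged = []
    # Largest files first; ties broken by name for a deterministic result.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        if size > block_size:
            raise ValueError(f"{name} is larger than one block")
        # First existing merged file with enough free space, if any.
        target = next((m for m in merged if m.used + size <= block_size), None)
        if target is None:
            target = MergedFile(name=f"merged-{len(merged):05d}")
            merged.append(target)
        target.index[name] = (target.used, size)
        target.used += size
    return merged


if __name__ == "__main__":
    demo = merge_small_files({"a": 60, "b": 50, "c": 40, "d": 30}, block_size=100)
    for m in demo:
        print(m.name, m.used, m.index)
```

In this toy run, four files collapse into two merged files, so the NameNode would track two file entries instead of four. A correlation-aware variant would additionally choose the target so that files accessed together land in the same merged file.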

Copyright
© 2018, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).


Volume Title
Proceedings of the 2018 Joint International Advanced Engineering and Technology Research Conference (JIAET 2018)
Series
Advances in Engineering Research
Publication Date
March 2018
ISSN
2352-5401

Cite this article

TY  - CONF
AU  - Xun Cai
AU  - Cai Chen
AU  - Yi Liang
PY  - 2018/03
DA  - 2018/03
TI  - An optimization strategy of massive small files storage based on HDFS
BT  - Proceedings of the 2018 Joint International Advanced Engineering and Technology Research Conference (JIAET 2018)
PB  - Atlantis Press
SP  - 225
EP  - 230
SN  - 2352-5401
UR  - https://doi.org/10.2991/jiaet-18.2018.40
DO  - 10.2991/jiaet-18.2018.40
ID  - Cai2018/03
ER  -