Proceedings of the 2017 2nd Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2017)

A Novel Processing Model for SCDs in ETL

Authors
Li Sun, Jiaoyan Zhang, Jiyun Li
Corresponding Author
Li Sun
Available Online October 2017.
DOI
10.2991/jimec-17.2017.29
Keywords
ETL, MapReduce, map-only
Abstract

ETL (Extract-Transform-Load), which moves data from various source systems into data warehouses (DWs), is an important part of building a data warehouse. As data volumes grow rapidly, processing such large amounts of data quickly has become a major challenge for ETL. MapReduce is a programming model for large-scale, data-intensive processing. It is built from two functions, map and reduce, which makes it easy to run many tasks in parallel. However, the model has drawbacks. For example, it is inefficient when the mappers produce large amounts of intermediate data, because moving that data to the reducers incurs substantial network cost. In this paper, we present a new method called map-only. With this method, the reduce step is performed locally, so the intermediate data does not need to be transferred to reducers over the network. The results show that the proposed method speeds up the processing of both Type-1 and Type-2 SCDs (slowly changing dimensions). For example, when the incremental data is 5 GB, the map-only method processes the Type-2 SCDs in only 20 minutes, whereas processing the same data without it takes 28 minutes.
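The abstract does not spell out how the map-only method keeps the reduce step local, so the following is only a minimal sketch under a stated assumption (ours, not the paper's): the dimension table and the incremental data are co-partitioned by business key, so each map task already holds every row for its keys and can apply the SCD logic without a shuffle. It contrasts Type-1 handling (overwrite the attribute, keep no history) with Type-2 handling (expire the current version and append a new one). All names here (`DimRow`, `partition_of`, `map_task_type1`, `map_task_type2`, `run_map_only`, the `city` column) are hypothetical, and the loop over partitions merely stands in for map tasks that would run in parallel.

```python
import zlib
from dataclasses import dataclass, replace
from datetime import date
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    key: str                  # business (natural) key, e.g. a customer id
    city: str                 # tracked attribute (hypothetical example column)
    valid_from: date
    valid_to: Optional[date]  # None means "still valid"
    current: bool


def partition_of(key: str, n: int) -> int:
    # Deterministic hash, so dimension rows and incremental rows with the
    # same key always land in the same partition (the co-partitioning assumption).
    return zlib.crc32(key.encode("utf-8")) % n


def map_task_type1(dim: List[DimRow], changes: Dict[str, str], load_date: date) -> List[DimRow]:
    """Type-1 SCD: overwrite the attribute in place; no history is kept."""
    out, seen = [], set()
    for row in dim:
        seen.add(row.key)
        new_city = changes.get(row.key)
        out.append(replace(row, city=new_city) if new_city is not None else row)
    for key, city in changes.items():
        if key not in seen:  # new dimension member
            out.append(DimRow(key, city, load_date, None, True))
    return out


def map_task_type2(dim: List[DimRow], changes: Dict[str, str], load_date: date) -> List[DimRow]:
    """Type-2 SCD: expire the current version and append a new current version."""
    out, seen = [], set()
    for row in dim:
        seen.add(row.key)
        new_city = changes.get(row.key)
        if row.current and new_city is not None and new_city != row.city:
            out.append(replace(row, valid_to=load_date, current=False))
            out.append(DimRow(row.key, new_city, load_date, None, True))
        else:
            out.append(row)
    for key, city in changes.items():
        if key not in seen:  # new dimension member
            out.append(DimRow(key, city, load_date, None, True))
    return out


def run_map_only(dim: List[DimRow], changes: Dict[str, str], load_date: date,
                 task: Callable[[List[DimRow], Dict[str, str], date], List[DimRow]],
                 n: int = 4) -> List[DimRow]:
    # Split both inputs by key; assumes at most one change per key per batch.
    dim_parts: List[List[DimRow]] = [[] for _ in range(n)]
    chg_parts: List[Dict[str, str]] = [{} for _ in range(n)]
    for row in dim:
        dim_parts[partition_of(row.key, n)].append(row)
    for key, city in changes.items():
        chg_parts[partition_of(key, n)][key] = city
    out: List[DimRow] = []
    for i in range(n):  # each iteration stands in for one independent map task
        out.extend(task(dim_parts[i], chg_parts[i], load_date))
    return out  # no shuffle and no reduce phase: outputs are simply concatenated
```

For example, with `dim = [DimRow("c1", "Beijing", date(2017, 1, 1), None, True)]` and `changes = {"c1": "Tianjin"}`, `run_map_only(dim, changes, date(2017, 10, 1), map_task_type2)` returns two rows for `c1`: the expired Beijing version and a new current Tianjin version. Because every partition is processed independently, the result does not depend on the number of partitions, which is what lets this sketch skip the network transfer to reducers.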

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Series
Advances in Computer Science Research
Publication Date
October 2017
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Li Sun
AU  - Jiaoyan Zhang
AU  - Jiyun Li
PY  - 2017/10
DA  - 2017/10
TI  - A Novel Processing Model for SCDs in ETL
BT  - Proceedings of the 2017 2nd Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2017)
PB  - Atlantis Press
SP  - 133
EP  - 136
SN  - 2352-538X
UR  - https://doi.org/10.2991/jimec-17.2017.29
DO  - 10.2991/jimec-17.2017.29
ID  - Sun2017/10
ER  -