Proceedings of the International Conference on Computer Networks and Communication Technology (CNCT 2016)

Authoring of Personalized Web Page from Heterogeneous Web Pages by Content Extraction and Integration

Authors
Wei-gang LI, Ke SUN, Shuo-chen WANG
Corresponding Author
Wei-gang LI
Available Online December 2016.
DOI
10.2991/cnct-16.2017.102
Keywords
Authoring of Web pages, Content extraction, Element similarity, CS-DOM tree.
Abstract

Authoring a personalized Web page by integrating heterogeneous page elements from different sites is a challenging task in Web 2.0 applications. This paper proposes an approach that extracts various partitions or elements (basic HTML elements, CSS definitions, JavaScript source code, etc.) from different Web sites and uses them to author a new page from heterogeneous Web pages. A novel DOM tree model, the CS-DOM tree, is introduced to retrieve the CSS definitions. To ensure that the new Web page stays synchronized with updates to the source pages, a method is presented that relocates previously retrieved elements based on the structure of the DOM and the context of the elements. Finally, a similarity calculation algorithm is developed to judge whether a relocated element and the previously retrieved element come from the same position. The proposed method has been applied to develop a personalized portal.
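The abstract does not give the similarity formula, so the following is only a minimal sketch of the general idea of relocating a previously extracted element by comparing DOM structure and context. The `ElementSnapshot` record, the three features (root-to-element tag path, attribute set, visible text), the weights, and the threshold are all illustrative assumptions, not the authors' CS-DOM model or algorithm.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class ElementSnapshot:
    """Hypothetical record of an extracted element (not the paper's CS-DOM node)."""
    tag: str
    path: tuple        # tag names from the document root down to the element
    attrs: frozenset   # (name, value) pairs, e.g. {("class", "news")}
    text: str          # visible text, a rough stand-in for element context


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(old: ElementSnapshot, new: ElementSnapshot) -> float:
    """Weighted score in [0, 1]; the weights are assumptions for illustration."""
    if old.tag != new.tag:
        return 0.0
    path_score = SequenceMatcher(None, old.path, new.path).ratio()
    attr_score = jaccard(old.attrs, new.attrs)
    text_score = SequenceMatcher(None, old.text, new.text).ratio()
    return 0.4 * path_score + 0.3 * attr_score + 0.3 * text_score


def relocate(old, candidates, threshold=0.6):
    """Return the best-matching candidate, or None if none reaches the threshold."""
    best = max(candidates, key=lambda c: similarity(old, c), default=None)
    if best is None or similarity(old, best) < threshold:
        return None
    return best


# Example: the source page gained a <main> wrapper and the headline text changed.
old = ElementSnapshot("div", ("html", "body", "div", "div"),
                      frozenset({("class", "news")}), "Top stories today")
moved = ElementSnapshot("div", ("html", "body", "main", "div", "div"),
                        frozenset({("class", "news")}), "Top stories this evening")
other = ElementSnapshot("div", ("html", "body", "footer", "div"),
                        frozenset({("class", "ad")}), "Buy now")
match = relocate(old, [other, moved])
```

In this sketch a tag mismatch short-circuits to zero, and the threshold decides whether the best candidate is treated as the same position or the element is reported as lost. A real system would need to tune both against actual page changes.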

Copyright
© 2017, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of the International Conference on Computer Networks and Communication Technology (CNCT 2016)
Series
Advances in Computer Science Research
Publication Date
December 2016
ISSN
2352-538X

Cite this article

TY  - CONF
AU  - Wei-gang LI
AU  - Ke SUN
AU  - Shuo-chen WANG
PY  - 2016/12
DA  - 2016/12
TI  - Authoring of Personalized Web Page from Heterogeneous Web Pages by Content Extraction and Integration
BT  - Proceedings of the International Conference on Computer Networks and Communication Technology (CNCT 2016)
PB  - Atlantis Press
SP  - 734
EP  - 740
SN  - 2352-538X
UR  - https://doi.org/10.2991/cnct-16.2017.102
DO  - 10.2991/cnct-16.2017.102
ID  - LI2016/12
ER  -