Proceedings of 3rd International Conference on Multimedia Technology (ICMT-13)

A Novel Framework for Web Pages Classification

Authors
Hu Ruiguang, Hu Weiming
Corresponding Author
Hu Ruiguang
Available Online November 2013.
DOI
10.2991/icmt-13.2013.130
Keywords
Score fusion • Multi-instance learning • Bag-of-features
Abstract

In this paper, we propose a novel framework for classifying web pages that contain both images and text. Valid images are first selected by the FOrward CompArison of Relative Sizes Sorting (FOCARSS) algorithm, and each valid image is represented by a mid-level feature vector generated by the Bag-Of-Features model. Treating the feature vectors of the valid images in a web page as the instances of a bag, Multi-Instance Learning is used to perform image-based web page classification. For the text information, the Bag-Of-Words model is used to perform text-based web page classification. Score-level fusion schemes are then used to combine these two kinds of heterogeneous information. Experimental results on a representative dataset demonstrate that our framework exploits both image and text information and improves the final classification performance.
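The score-level fusion step can be illustrated with a minimal sketch. The abstract does not say which fusion schemes are used, so the min-max normalization and weighted-sum rule below are assumptions chosen for illustration, and the names min_max_normalize, fuse_scores, and image_weight are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1] so that two classifiers become comparable.

    Assumption: min-max normalization; the paper may normalize differently.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores identical: no ranking information, map to the midpoint.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(
    image_scores: Sequence[float],
    text_scores: Sequence[float],
    image_weight: float = 0.5,
) -> List[float]:
    """Weighted-sum fusion of per-page scores from two classifiers.

    image_scores[i] and text_scores[i] are the scores that the image-based
    and text-based classifiers assign to the same web page i.
    """
    if len(image_scores) != len(text_scores):
        raise ValueError("both classifiers must score the same pages")
    if not 0.0 <= image_weight <= 1.0:
        raise ValueError("image_weight must lie in [0, 1]")
    img = min_max_normalize(image_scores)
    txt = min_max_normalize(text_scores)
    return [
        image_weight * i + (1.0 - image_weight) * t
        for i, t in zip(img, txt)
    ]


if __name__ == "__main__":
    # Hypothetical scores for three web pages (not data from the paper).
    fused = fuse_scores([-1.0, 0.0, 1.0], [10.0, 30.0, 20.0])
    print(fused)
```

Normalizing before combining matters because the two classifiers work on heterogeneous inputs and their raw scores need not share a scale. The fused score would then be thresholded or ranked to produce the final decision.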

Copyright
© 2013, the Authors. Published by Atlantis Press.
Open Access
This is an open access article distributed under the CC BY-NC license (http://creativecommons.org/licenses/by-nc/4.0/).

Volume Title
Proceedings of 3rd International Conference on Multimedia Technology (ICMT-13)
Series
Advances in Intelligent Systems Research
Publication Date
November 2013
ISBN
10.2991/icmt-13.2013.130
ISSN
1951-6851

Cite this article

TY  - CONF
AU  - Hu Ruiguang
AU  - Hu Weiming
PY  - 2013/11
DA  - 2013/11
TI  - A Novel Framework for Web Pages Classification
BT  - Proceedings of 3rd International Conference on Multimedia Technology(ICMT-13)
PB  - Atlantis Press
SP  - 1054
EP  - 1061
SN  - 1951-6851
UR  - https://doi.org/10.2991/icmt-13.2013.130
DO  - 10.2991/icmt-13.2013.130
ID  - Ruiguang2013/11
ER  -