Proceedings of the 5th International Conference on Advanced Design and Manufacturing Engineering

Research on Bayes-based Text Automatic Classification

Xuan Zhang (corresponding author)
Keywords: text automatic classification; Bayes; classification algorithms; feature extraction
The Internet holds an enormous amount of varied and complex information. Information retrieval is often undirected, and search results contain too much redundant information. To help users obtain the information they need more effectively, this paper studies a method for the automatic classification of web page text based on the Naive Bayes classification algorithm. Taking the structure of web pages into account, the paper analyses in detail which structural components of the page tags are useful for classification, and then applies the Naive Bayes algorithm to classify pages using these effective HTML tag features. By reducing the difficulty of Internet information retrieval, the method makes it easier for users to locate information on the Internet more precisely.
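The general idea of classifying pages from tag-level features with Naive Bayes can be illustrated with a minimal Python sketch. It is not the paper's implementation: the tag set and weights in `TAG_WEIGHTS`, the whitespace tokenizer, the class names `TagTextExtractor` and `NaiveBayes`, and the two toy training pages are all assumptions made for illustration. The classifier is a standard multinomial Naive Bayes with Laplace (add-one) smoothing.

```python
import math
from collections import Counter, defaultdict
from html.parser import HTMLParser

# Hypothetical tag weights: the abstract does not state which tags
# are used or how strongly each one counts.
TAG_WEIGHTS = {"title": 3, "h1": 2, "h2": 2, "a": 1, "p": 1}


class TagTextExtractor(HTMLParser):
    """Collects lowercase tokens, repeated by the weight of the enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags that carry a weight
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to (and including) the matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if not self.stack:
            return  # text outside the weighted tags is ignored
        weight = TAG_WEIGHTS[self.stack[-1]]
        for word in data.lower().split():
            word = word.strip(".,;:!?()\"'")
            if word:
                self.tokens.extend([word] * weight)


def extract_tokens(html):
    parser = TagTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.tokens


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokens:
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this example.
train_pages = [
    ("<html><head><title>Football scores</title></head><body>"
     "<h1>Match report</h1><p>The team won the match.</p></body></html>",
     "sports"),
    ("<html><head><title>Stock market news</title></head><body>"
     "<h1>Shares fall</h1><p>Investors sold shares today.</p></body></html>",
     "finance"),
]
model = NaiveBayes().fit(
    [extract_tokens(html) for html, _ in train_pages],
    [label for _, label in train_pages],
)
predicted = model.predict(
    extract_tokens("<title>Match scores</title><p>The team played well.</p>")
)
print(predicted)
```

Repeating a token according to its tag weight is only one simple way to let title and heading text count for more than body text; a real system would choose the tags and weights from an analysis of which page components actually help classification.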
