Hi everybody
I have a file (newfolder.html) and I want to preprocess its content: tokenization, removing stop words, and counting the number of words. I know how to do these operations on a text file (.txt), but now I have to do them on an HTML file.
How can I do it?
Thanks
aseeman 0 Newbie Poster
Recommended Answers
"I know how to do these operations if I have a text file(.txt)"
An HTML file is text as well, so have you tried using the same approach as if it were a text file? HTML is plain text, just saved with a .html extension so the computer …
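As a rough illustration of the "same approach as a text file" idea, here is a minimal Python sketch. The stop-word list and the `preprocess` helper are hypothetical names made up for this example, and the sample string stands in for the contents of newfolder.html. It also shows the catch: tag names end up in the token list unless the markup is removed first.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; substitute your own.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in"}

def preprocess(text):
    """Lowercase, tokenize on runs of letters, drop stop words, count the rest."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return kept, Counter(kept)

# An .html file can be read exactly like a .txt file, for example
# (assuming the file is UTF-8 encoded):
#     with open("newfolder.html", encoding="utf-8") as f:
#         raw = f.read()
raw = "<p>The cat and the hat</p>"  # stand-in for the file contents

tokens, counts = preprocess(raw)
print(tokens)   # the tag name "p" shows up as a token
print(counts)
```

So the text-file pipeline runs unchanged, but the markup has to be stripped before tokenizing if tag names should not be counted as words.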
Some of the markup is clearly being left behind when you strip the HTML tags. The last part of the random-looking string is
"textdecorationnone", which I presume was "text-decoration:none", and that is, of course, CSS.
So have a look at the actual source of your HTML and …
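One possible way to avoid leftover CSS like this is to extract only the visible text with a real HTML parser instead of deleting tags by pattern. The sketch below uses Python's standard `html.parser` module; the `VisibleTextExtractor` class, the `html_to_text` helper, and the sample markup are assumptions made up for illustration, not the code from this thread.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Attributes (including inline style="...") are never added to chunks.
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)

def html_to_text(html):
    """Return the visible text of an HTML string with whitespace collapsed."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())

# Made-up sample containing both a <style> block and an inline style attribute.
sample = (
    "<html><head><style>a { text-decoration:none }</style></head>"
    '<body><p>Hello <a href="#" style="text-decoration:none">world</a></p>'
    "</body></html>"
)
print(html_to_text(sample))
```

The resulting string can then go through the usual tokenization, stop-word removal, and word counting without CSS fragments turning into tokens.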