mwparserfromhtml

A Python library for parsing and mining metadata from the Enterprise HTML Dumps.

It is useful for building a lossless plaintext dataset from the HTML dumps with little resource overhead. Read more about the project here
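To make the "little resource overhead" idea concrete, here is a minimal standard-library sketch of streaming articles out of a dump one line at a time and reducing their HTML to text. It does not use this library's API. The helper names (`iter_articles`, `html_to_text`) are hypothetical, and the sketch assumes the dump is a `.tar.gz` archive of NDJSON files whose records carry a `name` field and an `article_body.html` field. Plain tag stripping like this is lossy; it only illustrates the streaming pattern.

```python
import json
import tarfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Hypothetical helper: return the concatenated text nodes of an HTML string."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


def iter_articles(dump_path):
    """Hypothetical helper: yield (title, html) pairs from a tar.gz of NDJSON files.

    Assumed record layout: {"name": ..., "article_body": {"html": ...}}.
    Records are decoded one line at a time, so memory use stays small.
    """
    with tarfile.open(dump_path, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            fileobj = tar.extractfile(member)
            for line in fileobj:
                if not line.strip():
                    continue
                record = json.loads(line)
                yield record["name"], record["article_body"]["html"]
```

A caller would loop over `iter_articles(path)` and pass each HTML string to `html_to_text`, writing results out as it goes instead of collecting the whole dump in memory.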