Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project. WebSPHINX (Website-Specific Processors for HTML INformation eXtraction) is a Java class library and interactive development environment for Web crawlers that browse and process Web pages automatically.