HTML parser written in Java that can be used as a command-line tool, a library, or an Ant task
http://htmlcleaner.sourceforge.net/