A simple HTML scanner and tag balancer using standard XML interfaces
http://nekohtml.sourceforge.net/