Next up in my review of HTMLDiff projects is a Python variant, which can be found on GitHub as cygri/htmldiff.
Advantages:
- Much faster than the PHP library I tried, Daisy Diff. Not as quick as the C HTMLDiff library I used, but quick enough to use
- Doesn't index non-essential data, such as attribute values in the <head>
- Easy API to use: simply using shell_exec from PHP gets me pretty far
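For anyone calling the tool from Python rather than PHP, the same shell-out idea can be sketched with the standard subprocess module. This is only a rough sketch: the function name run_htmldiff and the default command name "htmldiff" are my own placeholders, and it assumes the script takes the old and new file paths as two positional arguments and prints the diffed HTML to stdout. Check how the script is installed and invoked on your system before relying on it.

```python
import subprocess


def run_htmldiff(old_path, new_path, command=("htmldiff",)):
    """Run an HTML diff command on two files and return its stdout as text.

    Assumption: the command takes the old and new file paths as its two
    positional arguments and prints the diffed HTML to stdout. The default
    command name "htmldiff" is a placeholder for however the script is
    installed locally.
    """
    result = subprocess.run(
        [*command, old_path, new_path],
        capture_output=True,  # collect stdout and stderr instead of printing them
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Calling run_htmldiff("old.html", "new.html") would then return the diff as a string, much like capturing the return value of shell_exec in PHP.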
Disadvantages:
- It isn't yet as fast as the C library I used
- It strips HTML comments: this is especially problematic, because it strips out cases such as:
<style type="text/css">
<!--
span {
color: red;
}
-->
</style>
As you may notice, this is valid styling that ought to stay in the document, but is instead stripped out.
To its credit, the library does a really good job of detecting changes, and it can detect and highlight added and removed tags.
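If you control the markup, one way around the comment stripping is to unwrap comment-wrapped CSS before handing the document to the diff. The sketch below is a regex-based pre-processing step of my own, not something the library provides. The name unwrap_style_comments is hypothetical, and it assumes the library leaves uncommented <style> content alone, which is worth checking against your own documents.

```python
import re

# Matches a <style> element whose whole body is wrapped in <!-- ... -->.
STYLE_COMMENT = re.compile(
    r"(<style\b[^>]*>)\s*<!--(.*?)-->\s*(</style>)",
    re.IGNORECASE | re.DOTALL,
)


def unwrap_style_comments(html):
    """Drop the <!-- --> wrapper inside <style> blocks and keep the CSS.

    Comments elsewhere in the document are left untouched.
    """
    return STYLE_COMMENT.sub(
        lambda m: m.group(1) + m.group(2) + m.group(3),
        html,
    )
```

Run on the snippet above, this keeps the span rule inside the <style> element and removes only the comment markers, so there is no comment left to strip.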
Endgame
I'm currently using this on my project. Because it strips comments from the document, there's a good chance I won't use it in the end, but if that's not an issue for you, or you can control your markup so that it has no HTML comments, this library has my blessing.