

2021-06-21, Dev, PHP, HTML5, Parser

How to parse HTML5 with PHP

HTML5 parsing in PHP sounds like an easy task, but it's not that simple. HTML in general, and HTML5 especially, allows a lot of fuzziness in how you can write things. That makes it hard to parse, and I guess that's the reason why there are not a lot of great libraries out there for that task.

The best combination I have found so far is the built-in PHP DOM library and the Masterminds HTML5 parser. The parser creates a DOMDocument out of HTML5, and then you can work with the built-in PHP methods.

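// Installed via Composer: composer require masterminds/html5
require "vendor/autoload.php";

use Masterminds\HTML5;

// $html holds the HTML5 markup as a string.
$html5 = new HTML5();
$dom = $html5->loadHTML($html);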
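To make "work with the built-in PHP methods" a bit more concrete, here is a minimal sketch that reads a heading and collects link targets from the parsed document. The sample markup and the variable names ($links, $heading) are made up for illustration. It assumes the library is installed via Composer.

```php
<?php
require "vendor/autoload.php";

use Masterminds\HTML5;

// Hypothetical sample document, used only for illustration.
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<head><title>Example page</title></head>
<body>
  <article>
    <h1>Hello</h1>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </article>
</body>
</html>
HTML;

$html5 = new HTML5();
$dom = $html5->loadHTML($html);

// getElementsByTagName() and getAttribute() are regular PHP DOM methods.
$links = [];
foreach ($dom->getElementsByTagName('a') as $anchor) {
    $links[] = $anchor->getAttribute('href');
}

// item(0) returns null when nothing matches, so guard before reading.
$firstHeading = $dom->getElementsByTagName('h1')->item(0);
$heading = $firstHeading !== null ? $firstHeading->textContent : null;
```

Nothing here is specific to the parser once the document is loaded. From that point on it is plain DOMDocument code.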

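If you want to query elements with XPath, disable the automatic HTML5 namespace with the disable_html_ns option. Otherwise the parsed elements live in a namespace, and an unprefixed expression such as //title will not match them.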

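// Parse without the default HTML namespace so plain XPath expressions match.
$html5 = new HTML5(['disable_html_ns' => true]);
$dom = $html5->loadHTML($html);

// query() returns a DOMNodeList, not a single node.
$xpath = new \DOMXPath($dom);
$title = $xpath->query("//title");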
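If you would rather keep the default namespace, a possible alternative is to register a prefix for it on the DOMXPath object and use that prefix in the query. This is only a sketch. It assumes the parser places elements in the XHTML namespace (http://www.w3.org/1999/xhtml), and the prefix "h" and the sample markup are arbitrary choices for illustration.

```php
<?php
require "vendor/autoload.php";

use Masterminds\HTML5;

// Hypothetical sample markup, used only for illustration.
$html = '<!DOCTYPE html><html><head><title>Example page</title></head>'
      . '<body><p>Some text</p></body></html>';

// Default options: the HTML namespace stays enabled.
$html5 = new HTML5();
$dom = $html5->loadHTML($html);

$xpath = new \DOMXPath($dom);

// Assumption: elements are in the XHTML namespace. "h" is an arbitrary prefix.
$xpath->registerNamespace('h', 'http://www.w3.org/1999/xhtml');

// Every element name in the expression needs the prefix.
$nodes = $xpath->query('//h:title');

// query() returns a DOMNodeList, so pick the first node before reading its text.
$title = $nodes->length > 0 ? $nodes->item(0)->textContent : null;
```

The trade-off is more verbose queries in exchange for a document that keeps its namespace information.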
