My own blog posts about development, tech, finance and other (interesting) stuff.
2021-06-21, Dev, PHP, HTML5, Parser
How to parse HTML5 with PHP
HTML5 parsing in PHP sounds like an easy task, but it's not that simple. HTML in general, and HTML5 in particular, allows a lot of fuzziness in how you can write things. That makes it hard to parse, and I guess that's the reason why there are not a lot of great libraries out there for that task.
The best combination I have found so far is the built-in PHP DOM library and the Masterminds HTML5 parser. The parser creates a DOMDocument out of HTML5, and then you can work with the built-in PHP methods.
require "vendor/autoload.php";

use Masterminds\HTML5;

// $html holds the HTML5 markup as a string
$html5 = new HTML5();
$dom = $html5->loadHTML($html);
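Once the document is parsed, the usual DOM methods apply. As a minimal sketch, assuming masterminds/html5 is installed through Composer, the following collects the link targets from a small sample page. The sample markup and the variable names are made up for illustration.

```php
<?php
require "vendor/autoload.php";

use Masterminds\HTML5;

// Sample markup (hypothetical), using an HTML5 element such as <nav>.
$html = '<!DOCTYPE html><html><head><title>Demo</title></head>'
      . '<body><nav><a href="/a">A</a><a href="/b">B</a></nav></body></html>';

$html5 = new HTML5();
$dom = $html5->loadHTML($html);

// Look up all <a> elements by tag name and read their href attribute.
$links = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}

print_r($links);
```

Nothing here is specific to the parser after loadHTML() returns: $dom is a regular DOMDocument, so anything from the PHP DOM extension can be used on it.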
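If you want to query elements with XPath, disable the automatic HTML5 namespace with the disable_html_ns option. Otherwise the elements end up in the XHTML namespace, and an unprefixed query like //title matches nothing.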
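$html5 = new HTML5(['disable_html_ns' => true]);
$dom = $html5->loadHTML($html);

$xpath = new \DOMXPath($dom);
// query() returns a DOMNodeList, so pick the first match before reading its text
$title = $xpath->query("//title");
var_dump($title->item(0)->textContent);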
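To see why the namespace matters for XPath, here is a small sketch that uses only the built-in DOM extension. It loads a hand-written document whose elements sit in the XHTML namespace, which is an assumption standing in for a namespaced parser result, and compares an unprefixed query with a prefixed one. The prefix "h" is an arbitrary choice.

```php
<?php
// A tiny document whose elements live in the XHTML namespace,
// similar to what an HTML5 parser produces when namespacing is on.
$xml = '<html xmlns="http://www.w3.org/1999/xhtml">'
     . '<head><title>Hello</title></head><body/></html>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);

// An unprefixed name in XPath 1.0 only matches elements in no namespace.
$plain = $xpath->query('//title');

// Binding a prefix to the namespace URI makes the element reachable.
$xpath->registerNamespace('h', 'http://www.w3.org/1999/xhtml');
$prefixed = $xpath->query('//h:title');

$plainCount = $plain->length;
$prefixedCount = $prefixed->length;
$titleText = $prefixed->item(0)->textContent;

var_dump($plainCount, $prefixedCount, $titleText);
```

Registering a prefix like this should also work as an alternative to disable_html_ns if you would rather keep the namespace, at the cost of prefixing every element name in your queries.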