BEST-WEB-TOOLS Blog
My own blog posts about development, tech, finance and other (interesting) stuff.
2021-06-21, Dev, PHP, HTML5, Parser
How to parse HTML5 with PHP
HTML5 parsing in PHP sounds like an easy task, but it's not that simple. HTML in general, and HTML5 especially, allows a lot of fuzziness in how you write things. That makes it hard to parse, and I guess that's why there are not many great libraries out there for the task.
The best combination I have found so far is the built-in PHP DOM library and the Masterminds HTML5 parser. The parser creates a DOMDocument out of HTML5, and then you can work with the built-in PHP DOM methods.
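require "vendor/autoload.php";
use Masterminds\HTML5;
// Example input; replace it with your own HTML5 markup.
$html = '<!DOCTYPE html><html><head><title>Hello HTML5</title></head><body><p>Hi</p></body></html>';
$html5 = new HTML5();
// loadHTML() returns a DOMDocument built from the HTML5 string.
$dom = $html5->loadHTML($html);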
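If you want to use XPath to query elements, disable the automatic HTML5 namespace with the disable_html_ns option. Otherwise the parser puts every element into the XHTML namespace, and an unprefixed query like //title matches nothing.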
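$html5 = new HTML5(['disable_html_ns' => true]);
// Parse again with the reconfigured parser so the elements carry no namespace.
$dom = $html5->loadHTML($html);
$xpath = new \DOMXPath($dom);
$title = $xpath->query("//title");
var_dump($title->item(0)->textContent);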
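If you would rather keep the namespace, another route should be to register a prefix for it on the DOMXPath object. The sketch below is only an illustration of why the unprefixed query fails: it does not call the Masterminds parser at all, but builds a small document by hand with the built-in DOM classes, assuming the elements sit in the http://www.w3.org/1999/xhtml namespace like the parser's default output. The prefix "h" and the sample title are made up for the example.

```php
<?php
// Illustration only: a hand-built namespaced document stands in for
// what a namespace-aware HTML5 parser would hand back.
$ns = 'http://www.w3.org/1999/xhtml';

$dom = new DOMDocument();
$root = $dom->appendChild($dom->createElementNS($ns, 'html'));
$head = $root->appendChild($dom->createElementNS($ns, 'head'));
$head->appendChild($dom->createElementNS($ns, 'title', 'Hello HTML5'));

$xpath = new DOMXPath($dom);

// In XPath 1.0 an unprefixed name only matches elements in no namespace.
$plain = $xpath->query('//title');

// Bind a prefix (the name "h" is arbitrary) to the namespace and query with it.
$xpath->registerNamespace('h', $ns);
$prefixed = $xpath->query('//h:title');

var_dump($plain->length);
var_dump($prefixed->length);
var_dump($prefixed->item(0)->textContent);
```

Here the first query finds no nodes and the prefixed one finds the title element. With disable_html_ns the prefix is not needed, which is why I prefer that option for simple queries.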