BEST-WEB-TOOLS Blog
My own blog posts about development, tech, finance and other (interesting) stuff.
2021-05-23, Dev, Puppeteer, Puphpeteer, PHP, Chromium
Create a full page screenshot with PHP
Screenshotting a website is easy, but screenshotting the full page can get complicated. There are some hidden browser features and shady extensions that could handle the job for you ... or you just use best-web-tools.com/fetch. Want to build such a feature yourself, or just want to know how it works? Read on!
What you need
To create a screenshot programmatically you need a browser and something that can control it. With Chromium and Puppeteer you get both from Google. Chromium is the open-source version of Google Chrome, and you can run it on a server in headless mode. That means it runs from the command line without opening a window. Puppeteer is a JavaScript library that lets you control Chromium in headless mode.
npm i puppeteer
# or "yarn add puppeteer"
The easy way
The easy way to create a screenshot in PHP is to run a JavaScript file on the command line. Just put the screenshot code in a .js file and call it with exec('node ...') from PHP. If the script exits without an error code, you can expect the screenshot at the destination you specified.
PHP
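$websiteUrl = 'https://best-web-tools.com';
$screenshotPath = './screenshot.png';
$command = sprintf('node screenshot.js %s %s',
    escapeshellarg($websiteUrl), escapeshellarg($screenshotPath));
// exec() only returns the last output line; the exit code arrives in the third argument
exec($command, $output, $exitCode);
if ($exitCode !== 0) {
    throw new RuntimeException('Screenshot failed: ' . implode("\n", $output));
}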
Javascript (screenshot.js)
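const puppeteer = require('puppeteer');
// argv[0] is the node binary and argv[1] the script path, so the real arguments start at index 2
const [url, filename] = process.argv.slice(2);
(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    await page.screenshot({ path: filename, fullPage: true });
  } finally {
    // close Chromium even if navigation or the screenshot fails
    await browser.close();
  }
})().catch((error) => {
  console.error(error);
  process.exit(1); // non-zero exit code so the PHP side can detect the failure
});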
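Because the URL and the file name arrive from PHP as plain command-line arguments, it can help to check them before Chromium is launched. The following is a minimal sketch, not part of Puppeteer: parseArgs is a hypothetical helper name, and restricting the URL to http and https is an assumption about what you want to allow.

```javascript
// Hypothetical helper (not a Puppeteer API): validates the two arguments
// that the PHP side passes to screenshot.js.
function parseArgs(argv) {
  // argv[0] is the node binary, argv[1] the script path
  const [url, filename] = argv.slice(2);
  if (!url || !filename) {
    throw new Error('Usage: node screenshot.js <url> <filename>');
  }
  // The URL constructor throws a TypeError if the string is not a valid URL
  const parsed = new URL(url);
  // Assumption: only web pages should be captured, so other schemes are rejected
  if (!['http:', 'https:'].includes(parsed.protocol)) {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  return { url: parsed.href, filename };
}
```

In screenshot.js this could replace the direct reads from process.argv, for example `const { url, filename } = parseArgs(process.argv);`. Any thrown error ends the script with a non-zero exit code, which PHP can then detect.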
That's the simple way, but maybe not the best solution if you want to do additional things, like getting information back from the Node script. Here is an alternative ...
Another approach with PuPHPeteer
There is nothing wrong with calling a Node.js script from PHP, but there is a better way. PuPHPeteer creates a bridge between PHP and Node.js, so you can call Puppeteer methods as if it were a PHP library. ... And yes, PuPHPeteer is a horrible name and I always misspell it.
composer require nesk/puphpeteer
npm install @nesk/puphpeteer
Here is the example from above, this time written in PHP with PuPHPeteer:
use Nesk\Puphpeteer\Puppeteer;

$websiteUrl = 'https://best-web-tools.com';
$screenshotPath = './screenshot.png';
$puppeteer = new Puppeteer;
$browser = $puppeteer->launch();
$page = $browser->newPage();
// 'networkidle0' waits until the network is quiet; 'timeout' => 0 disables the navigation timeout
$page->goto($websiteUrl,
    ['waitUntil' => 'networkidle0', 'timeout' => 0]);
$page->screenshot(['path' => $screenshotPath, 'fullPage' => true]);
$browser->close();
There are no await calls because everything in PuPHPeteer is synchronous by default. The rest of the code looks quite similar to the JS version. Are you happy now? No. Here are some improvements for your screenshot script ...
If fullPage is not a full page
Even if you pass the fullPage parameter to the screenshot method, you sometimes don't get a real full-page screenshot with Puppeteer. I never found out why that happens. I got the best results by fetching document.body.scrollHeight and setting it as the viewport height in Puppeteer (in addition to the fullPage switch).
// needs: use Nesk\Rialto\Data\JsFunction;
$bodyHeight = $page->evaluate(JsFunction::createWithBody("
    return document.body.scrollHeight;
"));
$page->setViewport(['width' => 1920, 'height' => $bodyHeight]);
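One possible explanation, offered as an assumption rather than a confirmed cause, is that on some layouts document.body.scrollHeight is smaller than the height of the root element. The sketch below shows a more defensive measurement in plain Puppeteer JavaScript. measurePageHeight is a hypothetical helper, not a Puppeteer function, and the default parameter is only there so it can be tried outside a browser.

```javascript
// Hypothetical helper: returns the largest height reported by <body> and <html>.
// Inside page.evaluate() no argument is passed, so `doc` falls back to the
// browser's global `document`.
function measurePageHeight(doc = document) {
  const body = doc.body;
  const root = doc.documentElement;
  return Math.max(
    body.scrollHeight,
    body.offsetHeight,
    root.scrollHeight,
    root.offsetHeight,
    root.clientHeight
  );
}

// Possible usage with a Puppeteer page (assumes `page` already exists):
//   const height = await page.evaluate(measurePageHeight);
//   await page.setViewport({ width: 1920, height });
```

Taking the maximum of several properties costs nothing and covers pages where the body element itself is shorter than the rendered document.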
The images in my screenshot are blank
Many websites today use lazy loading to reduce initial loading times. Lazy loading usually loads images only when they become visible to the user, and you can simulate that in a headless browser by scrolling down to the end of the page. Adding a short wait before taking the screenshot can also improve the results. Even a headless browser has to load the assets.
$page->evaluate(JsFunction::createWithBody("
window.scrollBy(0, document.body.scrollHeight);
"));
$page->waitForTimeout(1000);
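A single jump to the bottom can skip images in the middle of a long page, so scrolling in steps may work better on some sites. The following is a rough sketch in plain Puppeteer JavaScript. scrollOffsets and scrollThrough are hypothetical helpers, and the step size and pause length are arbitrary assumptions you would tune per site.

```javascript
// Hypothetical helper: y-offsets to visit so that every part of the page
// is scrolled into view once, moving `step` pixels at a time.
function scrollOffsets(pageHeight, step) {
  if (!(step > 0)) {
    throw new RangeError('step must be a positive number');
  }
  const offsets = [];
  for (let y = 0; y < pageHeight; y += step) {
    offsets.push(y);
  }
  return offsets;
}

// Assumes `page` is a Puppeteer Page. Pauses after each step so lazy
// loaders get a chance to start their requests.
async function scrollThrough(page, pageHeight, step = 800, pauseMs = 250) {
  for (const y of scrollOffsets(pageHeight, step)) {
    await page.evaluate((top) => window.scrollTo(0, top), y);
    await new Promise((resolve) => setTimeout(resolve, pauseMs));
  }
}
```

A call such as `await scrollThrough(page, bodyHeight)` would run before the screenshot, with bodyHeight being the page height measured earlier.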
Clicking stuff
If a cookie banner is nagging you, just click it away first. You can click on x/y coordinates, or find HTML elements by CSS selector and click on them.
$page->mouse->click(1, 1, ['button' => 'left']);
$page->click('button[name=submit]');
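In Puppeteer, page.click() fails when the selector matches nothing, which is awkward for banners that only show up sometimes. A small sketch of a more forgiving click in plain Puppeteer JavaScript follows. clickIfPresent is a hypothetical helper name, and the selector in the usage note is made up for illustration.

```javascript
// Hypothetical helper; assumes `page` is a Puppeteer Page.
// Returns true if an element was found and clicked, false otherwise.
async function clickIfPresent(page, selector) {
  // page.$() resolves to null when no element matches the selector
  const element = await page.$(selector);
  if (element === null) {
    return false;
  }
  await element.click();
  return true;
}
```

Used as `await clickIfPresent(page, '#cookie-accept')`, the script carries on whether or not the banner is shown.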
Getting html elements
You can even fetch HTML elements with PuPHPeteer and read their properties, for example the natural sizes of all the images on the page.
$images = [];
foreach ($page->querySelectorAll('img') as $img) {
    // jsonValue() converts the remote JavaScript value into a plain PHP value
    $src = $img->getProperty('src')->jsonValue();
    $images[$src]['width'] = $img->getProperty('naturalWidth')->jsonValue();
    $images[$src]['height'] = $img->getProperty('naturalHeight')->jsonValue();
}
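An image that has not finished loading reports a natural width and height of 0, so the collected sizes can double as a check for blank images before the screenshot is taken. Here is a minimal sketch of that idea in JavaScript. findUnloadedImages is a hypothetical helper, and its input is assumed to have the same shape as the array built above.

```javascript
// Hypothetical helper. `images` maps each image URL to its natural size, e.g.
// { 'https://example.com/a.png': { width: 640, height: 480 } }.
// Returns the URLs whose natural width or height is still 0 (or missing).
function findUnloadedImages(images) {
  return Object.entries(images)
    .filter(([, size]) => !size.width || !size.height)
    .map(([src]) => src);
}
```

If the returned list is not empty, scrolling again or waiting a little longer before the screenshot might help.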
Let me google that for you
Here is a full example of googling something. First click away the cookie modal: Google makes it hard to use CSS selectors because class names and IDs change with every request, so you have to iterate over the buttons and check their labels. Then type something into the search box and hit the Enter key.
$puppeteer = new Puppeteer;
$browser = $puppeteer->launch();
$page = $browser->newPage();
$page->goto('https://www.google.com',
    ['waitUntil' => 'networkidle0', 'timeout' => 0]);
$buttons = $page->querySelectorAll('button');
foreach ($buttons as $button) {
    $text = $button->getProperty('innerText')->jsonValue();
    if ($text === 'I agree' || $text === 'Ich stimme zu') {
        // click the element handle directly, so it also works when the button has no id
        $button->click();
        break;
    }
}
$page->type('input[name=q]', 'best-web-tools.com');
$page->keyboard->press('Enter');
$page->waitForTimeout(1000);
$page->screenshot(['path' => './google.png', 'fullPage' => true]);
$browser->close();
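Comparing button labels with strict equality breaks as soon as the label contains extra whitespace or different capitalization. A slightly more tolerant comparison could look like the sketch below. isConsentLabel and CONSENT_LABELS are hypothetical names, and the label list only contains the two strings from the example above, so it would need extending for other languages.

```javascript
// Labels taken from the example above; extend for other languages as needed.
const CONSENT_LABELS = ['I agree', 'Ich stimme zu'];

// Hypothetical helper: true if `text` matches one of the labels after
// collapsing whitespace and ignoring case.
function isConsentLabel(text, labels = CONSENT_LABELS) {
  const normalize = (value) => String(value).replace(/\s+/g, ' ').trim().toLowerCase();
  const wanted = labels.map(normalize);
  return wanted.includes(normalize(text));
}
```

The same normalization (collapse whitespace, trim, lowercase) could be applied on the PHP side before comparing the innerText value.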
Action now
Want to see all of the above in action? Try the fetch tool here on best-web-tools.com.