
Capturing Web Page Without Stylesheets


It is amazing to live in an environment where the Internet connection is ubiquitous and fast. But in case the tube is having a problem and the bits from the web server arrive in broken pieces, what does the web site look like? If the content degrades gracefully, the lack of style sheets may reduce the attractiveness of the page, but it should not significantly hamper the experience. Fortunately, there is a way to automatically check the appearance of a web page under that circumstance.

Some time ago, I demonstrated the use of PhantomJS, the headless WebKit, to capture web pages programmatically. The example was also extended to capture just a particular portion of the page via clipping. For CSS-less capture, we just need to extend it with a new feature in PhantomJS 1.9 (implemented by Vitaliy Slobodin): the ability to abort network requests.

There is an example, loadurlwithoutcss.js, which demonstrates this feature. In fact, combining this idea with the previous BBC News site capture, we can come up with the following screenshots. The left side shows the normal page (see my previous blog post on web clipping) while the right side demonstrates what happens when none of the CSS files are loaded.

[Image: decssify, the normal page (left) next to the same page without style sheets (right)]

The script which produces the above image is as follows:

// Create a page that identifies itself as a mobile WebKit browser.
var page = require('webpage').create();
page.settings.userAgent = 'WebKit/534.46 Mobile/9A405 Safari/7534.48.3';
// viewportSize is a property of the page itself, not of page.settings.
page.viewportSize = { width: 400, height: 600 };

// Abort every request whose URL ends with .css, so no style sheet is loaded.
page.onResourceRequested = function (requestData, request) {
    if ((/http:\/\/.+?\.css$/i).test(requestData['url'])) {
        console.log('Skipping', requestData['url']);
        request.abort();
    }
};

page.open('http://m.bbc.co.uk/news/health', function (status) {
    if (status !== 'success') {
        console.log('Unable to load BBC!');
        phantom.exit(1);
    } else {
        // Give the page a moment to settle before rendering the clipped area.
        window.setTimeout(function () {
            page.clipRect = { left: 0, top: 0, width: 400, height: 600 };
            page.render('bbc_unstyled.png');
            phantom.exit();
        }, 1000);
    }
});

It is pretty similar to the previous version. The new addition is a handler for onResourceRequested, where we detect the URL of a style sheet and abort its loading. When the script is executed, it displays the following messages:

Skipping http://static.bbci.co.uk/frameworks/barlesque/2.45.9/mobile/3.5/style/main.css
Skipping http://static.bbci.co.uk/bbcdotcom/0.3.184/style/mobile/bbccom.css
Skipping http://static.bbci.co.uk/news/1.7.1-259/stylesheets/core.css
Skipping http://static.bbci.co.uk/news/1.7.1-259/stylesheets/compact.css

which indicates that these four style sheets won’t be part of the rendered output.
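The pattern in the script only matches http URLs that end exactly in .css, so a style sheet served over https or with a cache-busting query string would slip through. The following is a minimal sketch of a slightly more forgiving check. The helper name isStylesheetUrl and the example.com URLs are made up for illustration and are not part of PhantomJS; it also assumes that style sheets can be recognized by their file extension alone, which will not hold for every site.

```javascript
// Hypothetical helper (not a PhantomJS API): decides whether a request URL
// looks like a style sheet, based only on the extension of its path.
function isStylesheetUrl(url) {
    // Accept http and https, and ignore any query string or fragment.
    return /^https?:\/\/[^?#]+\.css(?:[?#].*)?$/i.test(url);
}

// Inside a PhantomJS script, the handler could then be written as:
//
// page.onResourceRequested = function (requestData, request) {
//     if (isStylesheetUrl(requestData.url)) {
//         console.log('Skipping', requestData.url);
//         request.abort();
//     }
// };

// Quick check of the helper on its own (illustrative URLs).
console.log(isStylesheetUrl('http://static.bbci.co.uk/news/1.7.1-259/stylesheets/core.css'));
console.log(isStylesheetUrl('https://example.com/site.css?v=2'));
console.log(isStylesheetUrl('http://example.com/app.js'));
```

Keeping the URL test in a separate function makes it easy to try against a list of known URLs before wiring it into the capture script.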

The entire process is rather straightforward. Because PhantomJS is cloud-ready, you can even have it running on an instance of Amazon EC2. It should not be too difficult to include this type of spartan rendering of your web site as another layer in the defensive development workflow.

What do you plan to de-CSS-ify today?
