Fixing duplicate redirects in PWAs caused by Service Worker Cache

I recently noticed that a web application of mine had started recording a higher number of redirects to an affiliate program I am running. For a while I thought this was just due to improvements on the site.

Over time I noticed a pattern: some redirects were showing up twice in the access logs, which looked suspicious. It turns out the PWA (Progressive Web App) capabilities I had added at some point were the reason. Specifically, it was the Service Worker cache that caused the "random" behaviour.

Browsers can provide offline capabilities to web applications using a Service Worker. Currently the feature is available in Chrome, Firefox and Opera, but is in development for Edge and Safari. A service worker is essentially a proxy script and a cache running in your browser that can be used to store data.

Service Workers are written in JavaScript. There is quite a bit of boilerplate involved, so I like to start off with a ready-made script, usually one generated by PWA Builder. It is an easy way to provide offline capabilities to a web app. The default script goes a long way, but since it’s "just JavaScript" the code is easy to modify.

Scopes in Service Workers

Service workers operate within a defined scope of a website or web application. This scope is simply a path in the URL of the site, so a service worker set to the scope /real-estate would be limited to that path on the domain. In this case the Real Estate App would not have access to the resources of the Jobs App living under /jobs on the same domain name.

In most cases it is fine to set the scope of the application to the root level, because everything living under a single domain can usually be assumed not to be malicious. The scope is enforced by the location of the worker file itself, so the scope of /real-estate/service-worker.js cannot be set to /jobs/.
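As a rough illustration of that rule, the sketch below derives the default scope from the worker file's location and checks whether a requested scope falls inside it. The helper names defaultScope and isScopeAllowed are my own, and the check is a simplification of what browsers do. The registration call at the end only runs in a browser that supports service workers.

```javascript
// Path used in the example above.
const WORKER_URL = '/real-estate/service-worker.js';

// The default (and, without extra headers, the widest) scope is the
// directory the worker script lives in.
function defaultScope(workerUrl) {
  return workerUrl.slice(0, workerUrl.lastIndexOf('/') + 1);
}

// Simplified check: a requested scope must start with that directory.
function isScopeAllowed(workerUrl, requestedScope) {
  return requestedScope.startsWith(defaultScope(workerUrl));
}

console.log(defaultScope(WORKER_URL));                             // "/real-estate/"
console.log(isScopeAllowed(WORKER_URL, '/real-estate/listings/')); // true
console.log(isScopeAllowed(WORKER_URL, '/jobs/'));                 // false

// In a browser, the registration itself would look roughly like this.
if (typeof navigator !== 'undefined' && navigator.serviceWorker) {
  navigator.serviceWorker
    .register(WORKER_URL, { scope: '/real-estate/' })
    .catch(function (err) {
      console.log('Registration failed: ' + err);
    });
}
```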

This is fine, as the abovementioned limitation is good for security, but there is also an HTTP header, Service-Worker-Allowed, that allows bypassing the scope restriction for special cases as an opt-in. It works similarly to CORS headers, but as of August 2017 it seems support for Service-Worker-Allowed remains limited.
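For reference, the header is sent with the response that delivers the worker script itself. The following is a minimal sketch, assuming a Node-style response object with setHeader and end methods. The function name serveWorker and its parameters are hypothetical and not part of my setup.

```javascript
// Hypothetical handler for the worker script. `workerSource` stands in
// for the contents of the real service worker file.
function serveWorker(res, workerSource, maxScope) {
  res.setHeader('Content-Type', 'application/javascript');
  // Opt in to a scope wider than the directory the script is served from.
  res.setHeader('Service-Worker-Allowed', maxScope);
  res.end(workerSource);
}
```

With a header value of /, a worker served from /real-estate/service-worker.js could then be registered with { scope: '/' } in browsers that honour the header.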

Duplicated redirects due to Service Worker's Cache

Defining scopes (and purging caches) in PWAs continues to be very limited in my opinion: there is no mechanism similar to robots.txt or wildcards for applying a scope to multiple locations. Adding scopes for an admin panel under /admin/ might be just fine, but I'd like more flexibility without workarounds. This is essential for systems with different capabilities under a single domain.

It took me a while to figure out the reason for the duplicated redirects in the web server logs. I started with theories like “stupid users” double clicking on links, but eventually I noticed that iPhone users, for example, never showed this behaviour in the logs. Safari in iOS 10 does not support Service Workers, so I finally followed the Service Worker path. I learned a thing or two and, more importantly, figured it out.

I use a proxy script to redirect users, e.g. http://example.com/_redirect/insurance/1337. This is for convenience, since I want to be able to control the target in a central location while keeping the link the same. The script itself is simple. It does some checking of the incoming request and then forwards the user to wherever they should be redirected to according to the configuration at that point in time.
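To make the idea concrete, here is a minimal sketch of such a lookup in JavaScript. It is not my actual script: the targets map, the partner URL and the function name resolveRedirect are all made up for illustration.

```javascript
// Hypothetical central configuration: "<category>/<id>" -> current target.
const targets = new Map([
  ['insurance/1337', 'https://partner.example.net/offer?id=1337']
]);

// Decide how to answer a request to /_redirect/<category>/<id>.
function resolveRedirect(pathname) {
  const prefix = '/_redirect/';
  if (!pathname.startsWith(prefix)) {
    return { status: 404, headers: {} };
  }
  const target = targets.get(pathname.slice(prefix.length));
  if (!target) {
    return { status: 404, headers: {} };
  }
  // A temporary redirect, so the target can change later without
  // changing the link itself.
  return { status: 302, headers: { Location: target } };
}

console.log(resolveRedirect('/_redirect/insurance/1337').status); // 302
console.log(resolveRedirect('/_redirect/unknown/1').status);      // 404
```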

The links were recorded twice, because the service worker was doing a cache fetch for the link first, followed by the actual user visit. This resulted in the logs (and the receiving end) recording the requests twice, and since the 302 response is not cached it happened each time a user clicked the same link.
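The effect can be reproduced outside the browser with a small simulation. The helper names checkResponse and addToCache are borrowed from the worker script, but their bodies here are assumptions: I simply assume each one issues its own network request, and fakeFetch stands in for the network.

```javascript
// Stand-in for the network: counts how often the server would be hit.
let serverHits = 0;
function fakeFetch(url) {
  serverHits += 1;
  return Promise.resolve({ url: url, status: 302 });
}

// Assumed shape of the two helpers: each performs its own request.
function checkResponse(url) {
  return fakeFetch(url);
}
function addToCache(url) {
  return fakeFetch(url).then(function (response) {
    // A real worker would store the response in the cache here.
    return response;
  });
}

// What the fetch handler effectively does for a single click.
function handleClick(url) {
  return Promise.all([checkResponse(url), addToCache(url)]);
}

handleClick('/_redirect/insurance/1337').then(function () {
  console.log('server hits for one click: ' + serverHits); // 2
});
```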

The caching fetch happens in the background, so the user (including me) did not experience any screen flickering that would indicate the page was being loaded twice in succession. Cache debugging...

Preventing caching of requests in a Global Scope

I’m sure there are more elegant ways to handle this, but what I did was just modify the script generated by PWA Builder. I changed the fetch event listener to include some logic that would validate the clicked URL to see if it should be cached at all. This enables blocking caching of select assets from global scope.

The URL fragments that should not be cached are stored in an array, and a boolean value is then used to decide whether that specific URL should be stored in the cache or not. This is brute force, as every download is checked. See the full source on Gist and the relevant snippet below:

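self.addEventListener('fetch', function (event) {
    console.log('The service worker is serving the asset.');
    // Try the regular response first; if that fails, fall back to the cache.
    // checkResponse, returnFromCache and addToCache are helpers defined
    // elsewhere in the full script.
    event.respondWith(checkResponse(event.request).catch(function () {
        return returnFromCache(event.request);
    }));

    // URL fragments that must never be stored in the cache.
    var storeInCache = true;
    var dontCache = [
        'sw.js',
        '/_redirect'
    ];

    // Compare every fragment against the requested URL.
    dontCache.forEach(function (element) {
        if (event.request.url.includes(element)) {
            storeInCache = false;
        }
    });

    if (storeInCache) {
        event.waitUntil(addToCache(event.request));
    } else {
        console.log('do not cache ' + event.request.url);
    }
});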
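One caveat with matching fragments anywhere in the URL is that a query string containing /_redirect would also be excluded. A slightly stricter variant, sketched here as an alternative and not something I have deployed, compares only the path. The names dontCachePaths and shouldCache are hypothetical, and the sketch assumes sw.js lives at the site root.

```javascript
// Path prefixes that should never be stored in the cache.
var dontCachePaths = ['/sw.js', '/_redirect/'];

// Returns true when the request URL may be cached. Request URLs in a
// service worker are always absolute, so new URL() can parse them.
function shouldCache(requestUrl) {
  var path = new URL(requestUrl).pathname;
  return !dontCachePaths.some(function (prefix) {
    return path.startsWith(prefix);
  });
}

console.log(shouldCache('http://example.com/_redirect/insurance/1337')); // false
console.log(shouldCache('http://example.com/blog/?ref=/_redirect'));     // true
```

Inside the fetch listener the check would then replace the loop, with addToCache only being called when shouldCache(event.request.url) returns true.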

Conclusion

The above solution may not be the most elegant way to handle this, but in my experience it eliminated the redirects occurring twice. Service Workers seem to be an effective and flexible method for controlling caching on the client end, but adding any cache layer increases complexity for debugging.

It’s worth noting that the duplicate redirects would not be tracked if I were linking to another domain directly, since by default the Service Worker's scope would keep it from caching those requests. This is also not a very common problem, as in most cases the extra pre-fetch does not really make a difference.

In my case I wanted to solve the issue because the affiliate network tracks clicks directly, based on hits to their own links. The statistics were skewed and not in a predictable way due to varying levels of support for Service Workers in browsers. In addition I can imagine that from the networks’ point of view this could easily be interpreted as click fraud, clickjacking or other malicious activities.

-- Jani Tarvainen, 18/08/2017