Ways to model ecommerce customers without using cookies // INDEZ Blog

Definition & Background

Cookies are currently the main way that marketeers target customers. This is because cookies contain information about users that shows who they are, where they have been and what they have done online. Google’s latest announcement is that it will phase out third-party cookies in Chrome by 2023.

In response to pressure from both governments and the World Wide Web Consortium, Google is not alone in moving to restrict the use of cookies. Firefox already blocks third-party tracking cookies by default, and a swathe of browser extensions is available to block cookies in other browsers.

To keep their ability to model and target website customers, ecommerce businesses will need alternative ways of modelling them if they are to maintain sales conversions.

Conversion Modeling is an approach being taken by Google within its Privacy Sandbox project to help fill in the gaps between marketing (e.g. being shown an AdWords advert) and achieving a goal (e.g. making a sale). Google's approach makes use of FLoC (Federated Learning of Cohorts), which will be covered in more detail in a future post. Filling in the gaps in a cookie-free world will involve a variety of techniques, which are being researched (but rarely disclosed) by all those currently involved in ecommerce marketing.

Conversion Modeling is based on two related approaches:

Intrinsic information: what can be established about an individual website visitor, such as where they live, their gender, their wealth or even their buying interests (i.e. the type of things that may currently be available, or deducible, from a cookie).
Conformity: matching or comparing a user's behaviour with that of others who share similar personas and behaviour. Conformity in group behaviour makes use of AI and ML to establish ‘most likely behaviours’. This is the main approach being taken by Google.

Here we look at as many ways as possible of establishing intrinsic information about a potential ecommerce customer.

There are potential legal issues around conversion modelling that stem from GDPR. Although this is yet to be tested or regulated, the guidelines state that if the modelling definitively identifies a specific individual then it would be classed as a breach of GDPR. If, however, the modelling process provides a likely persona (e.g. possibly an overseas student, wealthy, living on campus, sociable and interested in fashion) then, under current GDPR rules, that’s fine. The problem is exactly how fine the line is between a specific individual and a slightly generalised model of that individual. A particular challenge for a company like Google is that it will almost certainly already have access to all the data necessary to uniquely identify individuals. Such privileged information will need to be (and be seen to be) firewalled from any conversion modelling process.

Possible ways to profile customers using Intrinsic factors

Customer modelling is important both for marketing and for website conversion. Some of these factors may already be in use, others are being developed and some are still being researched.

Geolocation

All the main ecommerce platforms have extensions that provide reasonably accurate geolocation data for UK website visitors. Visit iplocation.net to test this for yourself. One recent study suggests that UK IP geolocation has an accuracy of between 74% and 88%. One limitation is that many ISPs assign dynamic IP addresses, which limits accuracy. The increased use of VPN technology also masks geolocation data. However, IP anonymisation causes only a negligible discrepancy for UK-based visitors: 0.91% at the country level and 1.19% at the region level.

If visitors can be geolocated then this can be tied in with databases (e.g. from the ONS) that provide information on incomes in different areas. It can also tie in with house price databases, such as those provided by HM Land Registry and used by estate agents. It’s really not hard to see how this type of information could be used to help sell products designed for use in large gardens, such as lawn mowers, garden plants and outdoor furniture. Visitors from very rural areas might be more likely than city-based users to be interested in chainsaws, tractor spare parts or wellingtons.
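As a rough illustration of tying a geolocated region to an area-level dataset, the sketch below uses a plain lookup table. The region names, income bands and the function name income_band_for_region are all hypothetical placeholders; a real implementation would load published area statistics rather than hard-code them.

```python
from typing import Optional

# Hypothetical lookup: region key -> relative income band.
# These entries are placeholders for illustration, not real ONS figures.
REGION_INCOME_BAND = {
    "region-a": "higher",
    "region-b": "middle",
    "region-c": "lower",
}


def income_band_for_region(region: Optional[str]) -> str:
    """Return the income band for a geolocated region, or 'unknown'."""
    if not region:
        # Geolocation failed (e.g. the visitor is behind a VPN).
        return "unknown"
    return REGION_INCOME_BAND.get(region.strip().lower(), "unknown")
```

Returning 'unknown' rather than guessing keeps a failed or masked geolocation from feeding a misleading value into the wider customer model.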

Browsing Behaviour on the site

Once a user is already on a website, conversion modelling is more straightforward. If somebody is browsing, for example, men's watches and sorts the results by price from high to low, then it’s reasonable to expect the visitor to be wealthier than somebody looking at sale items. Site search can also help establish persona type in terms of likely preferences.
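The idea of reading price sensitivity from on-site behaviour could be sketched as a simple score. Everything here is an assumption made for illustration: the Session fields, the "price_desc" and "price_asc" sort keys, and the equal weighting of each signal.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Session:
    """Hypothetical record of what a visitor did during one visit."""
    sort_orders: List[str] = field(default_factory=list)  # e.g. "price_desc"
    viewed_sale_items: int = 0


def price_sensitivity_hint(session: Session) -> str:
    """Turn browsing signals into a coarse, non-identifying persona hint."""
    high_to_low = session.sort_orders.count("price_desc")
    low_to_high = session.sort_orders.count("price_asc")
    # Each signal counts equally; a real model would learn these weights.
    score = high_to_low - low_to_high - session.viewed_sale_items
    if score > 0:
        return "premium-leaning"
    if score < 0:
        return "price-sensitive"
    return "undetermined"
```

For example, a session that sorted once from high to low and viewed no sale items would be labelled "premium-leaning".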

Landscape or Portrait

Ofcom has provided some interesting insights into age and gender differences and how these affect the way different groups use the web. It’s claimed that younger users who make extensive use of messaging services are more likely to use portrait mode. Landscape mode is more common among older users.

ETags

When you first visit a webpage, load times can be slow. On a second visit, load times can be very much faster because the data has been cached. This process is mediated via ETags: the server labels each resource with an ETag header, and the browser sends that value back when it next requests the same resource. Hence, ETag status helps identify whether you have visited the site before or are a first-time visitor. If a user has previously visited then they are likely to be at a different point in the sales funnel.
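The following is a minimal sketch of how a server might notice a returning visitor from ETag revalidation, assuming a single strong ETag per page. The function names make_etag and handle_request are hypothetical, and real requests can carry several tags or weak validators, which this sketch ignores.

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag value from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_request(headers: Dict[str, str], body: bytes) -> Tuple[int, Dict[str, str], bool]:
    """Return (status, response headers, returning-visitor flag)."""
    # HTTP header names are case-insensitive, so normalise them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    etag = make_etag(body)
    client_tag = lowered.get("if-none-match")
    # Any If-None-Match header means the browser holds a cached copy.
    returning = client_tag is not None
    if client_tag == etag:
        return 304, {"ETag": etag}, returning  # cached copy is still valid
    return 200, {"ETag": etag}, returning
```

A request without If-None-Match is treated as a first visit. A request with a stale tag still counts as a returning visitor, even though the full page is sent again.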

Canvas Fingerprinting & User Behaviour

In principle, it is technically possible to take measurements of how a web user is interacting with their screen, for example mouse movements, acceleration and the use of scroll. Such usability patterns will be different for different user groups and are highly likely to change with age. The same would be true of typing speeds and the time taken to fill in forms. Whether your interest is in the ecommerce site or a third-party site that might display adverts, likely personas will affect the advertising you show and the products you sell. Canvas fingerprinting itself works differently: a page asks the browser to draw a hidden image and hashes the result, which varies subtly with the device's graphics hardware and software. GoLogin provides insights and technical details of how canvas fingerprinting is increasingly being used as a replacement technology for cookies.
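To make the idea of measuring pointer behaviour concrete, here is a small sketch that turns timestamped pointer positions into speed features. The (time, x, y) sample format and the function names are assumptions for illustration; collecting the samples in the browser is outside the scope of this sketch.

```python
import math
from typing import Dict, List, Tuple

Sample = Tuple[float, float, float]  # (time in seconds, x in px, y in px)


def pointer_speeds(samples: List[Sample]) -> List[float]:
    """Speed in pixels per second between consecutive samples."""
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return speeds


def summarise(samples: List[Sample]) -> Dict[str, float]:
    """Reduce a pointer trace to features a persona model could consume."""
    speeds = pointer_speeds(samples)
    if not speeds:
        return {"mean_speed": 0.0, "max_speed": 0.0}
    return {"mean_speed": sum(speeds) / len(speeds), "max_speed": max(speeds)}
```

Summary features like these describe a style of interaction rather than an individual, which is the kind of generalised signal the persona approach relies on.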

*Canvas Defender in Firefox displays a notification whenever it detects sites that may use Canvas fingerprinting on its visitors.*

Local Storage

HTML5 provides localStorage, which potentially offers a mechanism for communicating user identification data directly. Although this gets around a lack of cookie data, it is likely to be in breach of GDPR. Though currently untested, it’s likely to be categorised as black (or at least dark-grey) hat.

Operating Systems or Browser Types

If your website visitor is using the latest version of Linux and the browser is Firefox Developer Edition or Propane, then you immediately have a reasonably accurate idea of what type of user you have. Equally, Safari 11 on a second-generation iPad would indicate a different type of user.

This type of information, though statistically significant for some categories of user, will produce a relatively high rate of miscategorisation and would only be used as a refining adjunct to other methods.

Device Fingerprinting

Of all the approaches being researched, the one that appears to be showing the most promise is Device Fingerprinting. AmIUnique illustrates how a multivariate approach that combines multiple fingerprinting factors can provide a quick and easy way to uniquely tag a visitor. Most people visiting the site can check their own browser fingerprint and see how their own combination of factors shows itself to be unique. This technique appears to be sufficiently powerful to question why anyone found it necessary to create cookies for individual tagging in the first place.
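A minimal sketch of the multivariate idea is to combine whatever attributes are available into one canonical string and hash it. The attribute names in the usage example are illustrative assumptions, and this is not a description of how AmIUnique itself computes fingerprints.

```python
import hashlib
import json
from typing import Any, Dict


def device_fingerprint(attributes: Dict[str, Any]) -> str:
    """Hash a set of device attributes into a single stable identifier."""
    # Sorting keys makes the result independent of collection order.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attributes a page might collect from the browser.
example = device_fingerprint({
    "user_agent": "ExampleBrowser/1.0",
    "screen": "1920x1080",
    "timezone": "Europe/London",
    "language": "en-GB",
})
```

Each attribute on its own is shared by many visitors; it is the combination that tends towards uniqueness. Because such a hash can single out one device, the GDPR concern raised earlier about identifying a specific individual applies here with particular force.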

Points to note

While cookies are still in use, organisations that require user modelling are busy analysing cookie data and its relationship to non-cookie (intrinsic) factors. The general approach is to apply Machine Learning techniques to data sets for cookie users. This creates machine-learned behaviours that correlate with cookie data. Applying these learned generalisations to cookie-free users makes it possible to reverse engineer cookie data.
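The train-on-cookie-users, apply-to-cookie-free-users loop can be sketched with a deliberately simple stand-in for a real Machine Learning model: a table of the most common cookie-derived label for each combination of intrinsic features. The feature tuples and labels are hypothetical, and a production system would use a proper classifier.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Features = Tuple[str, ...]


def train(rows: Iterable[Tuple[Features, str]]) -> Dict[Features, Counter]:
    """Count cookie-derived labels seen for each intrinsic feature combination."""
    counts: Dict[Features, Counter] = defaultdict(Counter)
    for features, cookie_label in rows:
        counts[features][cookie_label] += 1
    return counts


def predict(model: Dict[Features, Counter], features: Features,
            default: str = "unknown") -> str:
    """Guess the label a cookie would have given, using intrinsic features only."""
    if features not in model:
        return default  # no cookie users shared this feature combination
    return model[features].most_common(1)[0][0]
```

Training uses visitors for whom both cookie data and intrinsic features are known; prediction then needs only the intrinsic features, which is what remains once cookies are gone.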

A final point to note is that while any one measure on its own is highly likely to produce a wildly inaccurate user model/persona, accuracy improves quickly once you start to combine multiple independent sources of information. A good rule of thumb is that combining two independent sources will improve accuracy by a factor of √2, provided that each carries a similar level of information. Repeating this over and over for many different measures has the potential to eventually create high-accuracy models. Google is clearly well on the path to achieving this and already claims to be 70% as accurate as having a cookie. Much of this has been achieved using conformity behaviour, which will be the basis of a future post.
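One way to read the √2 rule of thumb is in terms of measurement uncertainty: if two independent estimates have the same standard error, their average has a standard error smaller by a factor of √2. The sketch below assumes that interpretation and uses the standard inverse-variance formula; the function name combined_std_error is hypothetical.

```python
import math
from typing import Sequence


def combined_std_error(std_errors: Sequence[float]) -> float:
    """Standard error after optimally combining independent estimates."""
    # Precisions (1 / variance) of independent estimates add together.
    precision = sum(1.0 / (s * s) for s in std_errors)
    return 1.0 / math.sqrt(precision)


# Two equally informative sources shrink the error by a factor of sqrt(2).
improvement = 1.0 / combined_std_error([1.0, 1.0])
```

Under the same assumption, four equally informative independent sources halve the error, which is why gains slow down as more measures are added.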
