This is a transcript of our webinar from 28 September 2022
This blog is an edited transcript of the ‘Learn Client Hints: why is Google messing with the User-Agent header?’ webinar that we hosted on 28 September 2022. The webinar, and this transcript, serve as an extensive history of why Google are reducing the User-Agent in favor of their own creation, User-Agent Client Hints.
This webinar was hosted by James Rosewell and Kirstin.
Table of contents
- A brief history
- User-Agent header
- The overall digital market
- Privacy Sandbox
- User-Agent Client Hints
- The different headers
- How the information is received
- The data model
- UA-CH pros and cons
- What can be done?
- Code demo
- The future
- What’s changing with the User-Agent?
- Final thoughts
A brief history
51Degrees is a real-time data company specializing in device data keyed for the web, applications, and mobile networks. We identify the device, operating system and browser, along with the features of the product (price at launch, screen size, chipsets, etc.). This data is used for optimization and analytics.
We’re also actively involved in other projects: one is SWAN.community, which is looking at the future of digital consent; another is OneKey. We’re also part of the W3C, the web standards forum, and we are a founding member of Movement for an Open Web, which was set up to give businesses an opportunity to come together and deal with the competition issues affecting digital markets. We will touch on those a bit later.
User-Agent header
The User-Agent is a string of characters that is sent in every web request. It goes back pretty much to the beginning of the web, and it’s an unstructured string of data and characters that helps us identify things about the device.
Mozilla/5.0 (Linux; Android 12; Pixel 6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.62 Mobile Safari/537.36
In the first User-Agent, I’ve highlighted Android 12, Pixel 6 and Chrome 93; this is information that can be pulled out of that string of characters and used for real-time optimization, or for analytics to help diagnose problems.
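As a rough illustration of pulling those fields out, here is a naive Python sketch. It assumes the conventional Android Chrome layout (platform details in the first parenthesised group, the device model straight after the Android token) and is not how 51Degrees detection works; real User-Agents vary far too much for a regex like this to be reliable.

```python
import re

# Example Android Chrome User-Agent from the transcript.
UA = ("Mozilla/5.0 (Linux; Android 12; Pixel 6) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/93.0.4577.62 Mobile Safari/537.36")


def parse_android_chrome(ua: str) -> dict:
    """Naively extract OS version, model and Chrome version from a UA string."""
    result = {}
    # The first parenthesised group holds the platform details,
    # e.g. "Linux; Android 12; Pixel 6".
    platform = re.search(r"\(([^)]*)\)", ua)
    if platform:
        parts = [p.strip() for p in platform.group(1).split(";")]
        for i, part in enumerate(parts):
            if part.startswith("Android "):
                result["os_version"] = part[len("Android "):]
                # By convention the device model follows the Android token.
                if i + 1 < len(parts):
                    result["model"] = parts[i + 1]
    # The browser version follows the "Chrome/" product token.
    chrome = re.search(r"Chrome/([\d.]+)", ua)
    if chrome:
        result["browser_version"] = chrome.group(1)
    return result


print(parse_android_chrome(UA))
# {'os_version': '12', 'model': 'Pixel 6', 'browser_version': '93.0.4577.62'}
```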
Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 Mobile/15E148 Safari/604.1 (compatible; AdsBot-Google-Mobile; +http://www.google.com/mobile/adsbot.html)
The second User-Agent communicates other features of the request. In this case, it’s a Google Ads bot; if you were a publisher, you might not want to serve ads to this User-Agent, because you’re not actually communicating with a human.
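Spotting a crawler like this one can be as simple as looking for a known token in the string. The sketch below assumes a short, hypothetical token list (CRAWLER_TOKENS); production crawler detection relies on far larger, maintained data.

```python
# Hypothetical, deliberately short token list for illustration only.
CRAWLER_TOKENS = ("AdsBot-Google", "Googlebot", "bingbot")


def is_crawler(ua: str) -> bool:
    """Return True if the User-Agent contains a known crawler token."""
    lowered = ua.lower()
    return any(token.lower() in lowered for token in CRAWLER_TOKENS)


ADSBOT_UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 14_7_1 like Mac OS X) "
             "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.2 "
             "Mobile/15E148 Safari/604.1 (compatible; AdsBot-Google-Mobile; "
             "+http://www.google.com/mobile/adsbot.html)")

# A publisher might skip the ad call for non-human traffic.
serve_ads = not is_crawler(ADSBOT_UA)
print(serve_ads)  # False
```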
The User-Agent contains lots of useful information and has evolved by convention; things like Mozilla/5.0 at the beginning don’t really mean anything, but we can make sense of it. That’s what we do, and many businesses use this to handle fraud, detect crawlers, optimize their experience, run analytics and more. A lot of big businesses use these features, including Google themselves, to improve performance and delivery for customers.
So, why is the User-Agent changing? What I’m going to do is step back a little bit from the technology in that string of characters and look at what’s happening in the overall digital market.
The overall digital market
There was an article in The Economist looking at the changes that are taking place in advertising. In that article, there was a graph showing the digital ad revenue globally for seven companies.
We can see that since 2015, overall digital ad revenue has been increasing, and so has Google’s ad revenue. But if we look at this graph closely, we can see that the market as a whole is growing faster than Google’s revenue from it; in other words, Google’s share of the market is shrinking.
So, why does this relate to privacy? How does this relate to the humble User-Agent? For comparison, look at Apple: Apple implemented its App Tracking Transparency (ATT) mechanism for apps in the App Store, but not on the web.
This graph is from the Financial Times; they analyzed Apple’s share of installs, and you can see that the share of total installs for Apple Search Ads increased dramatically after ATT came into effect in April 2021. Something like ATT, which was a change to improve privacy, had a very beneficial effect on Apple’s revenues.
Privacy Sandbox
The Privacy Sandbox was set up by Google to explain the changes to the web that they’re making. There are 23 changes in total, including User-Agent Reduction and User-Agent Client Hints. All are being made in the name of privacy, but the changes are going to influence competition.
When we look at Google’s position on the User-Agent, there’s a sort of fingerprint in the background. This is Google using neuro-linguistic programming to make us think of crime or something that’s bad. Google are positioning the User-Agent as problematic because it could be used to form a fingerprint of a particular device, which may therefore represent a privacy issue.
But really, this is part of a set of changes that is ultimately moving us towards the logged-in web; the information that’s used to detect fraud and non-human traffic could be replaced with a login screen.
We often see sites reducing how often you have to type your email address in, generally by offering sign-in through the platform vendors: Google, Facebook, Apple. In the case of the New York Times, this is done to smooth access to the site. What we find is that this is interrelated with identity and access control, and it’s trying to get you to be logged in more often.
I would argue that being logged into Google’s services almost all the time, ubiquitously across the web, doesn’t improve privacy. And if there isn’t an option to access the web anonymously then that isn't necessarily good for society.
If we think of a stone arch leading into the walled gardens of big tech companies, that control over identity is the keystone – it's the central stone that keeps the arch up. That’s why a lot of these changes, subtle as they are, are about controlling access to identity and access to services – access to the walled garden. Those companies can get data that the rest of us would not be able to access.
When we come back to these characters in the User-Agent, what Google is saying is the fingerprint is like a game of Guess Who. If you know that someone’s on a particular device or operating system, Google’s argument would be that you could find out who they are. The reality is this just isn't the case.
At best, when you get that information, you would be put into a cohort of people who happen to use that device. Coincidentally, cohort is a name that Google uses in one of their other Privacy Sandbox products: Federated Learning of Cohorts (FLoC). They might say this could be used to create a probabilistic identifier (the fingerprint we mentioned earlier), but there’s nothing wrong with fingerprints. They’re not illegal, and in fact, they have a very valuable role when it comes to fraud detection.
But even if we said that a probabilistic identifier is something we wanted to remove from our technology, I find the argument a little bit disingenuous.
If we investigate the Chromium source code, we can see there’s an HTTP header that Google sends only to their own websites. That code exists because, for some activities in Chromium, Google treat their own websites differently; an example of that is sending the X-Client-Data header. X-Client-Data sends a pseudonymous identifier exclusively to Google domains, and users and website operators have no control over that.
The argument about pseudonymous identifiers and probabilistic identifiers is one where Google is effectively tilting the balance in their favor. As you saw earlier with the Apple example, these changes made in the name of privacy have a material impact on revenue.
What we would like to see is a refocus on technologies that help law enforcement. If someone were using probabilistic identifiers inappropriately (whether that’s a massive tech company, a gateway to the web, or a small company), why are we not developing technologies to identify that and help law enforcement?
We would like to see a more balanced and rational debate, with the narrative focusing on the harm to people and the overall needs of society: for example, the plurality of the media, or reducing the risk of harm when working online.
Just before we return to the User-Agent: last week, we had a splendid few days in Cologne for DMEXCO, and I was staggered at the number of B2B businesses (which by definition are third parties in relation to the users accessing the web browser) that were promoting this narrative around privacy and how technologies need to be restricted. It felt very much like turkeys voting for Christmas.
User-Agent Client Hints
What is changing about the string of characters in the User-Agent that we saw earlier? There’s no longer going to be a single string of characters that is kept accurate and up to date; the data it contains is going to move across to something called User-Agent Client Hints (UA-CH). The key thing is that, at least until 2028, no data is being removed; it’s just the way you request it and the way it is sent that are going to change.
The different headers
We're going to see new fields sent alongside the User-Agent for mobile, model, platform (Android, Linux, Windows, etc.), platform version, CPU architecture, and then the full version information:
- Sec-CH-UA-Mobile
- Sec-CH-UA-Model
- Sec-CH-UA-Platform
- Sec-CH-UA-Platform-Version
- Sec-CH-UA-Arch
- Sec-CH-UA-Full-Version (note: Google has since deprecated this field and recommends Sec-CH-UA-Full-Version-List instead)
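Each of these hints arrives as an HTTP Structured Field value (RFC 8941): Sec-CH-UA-Mobile is a boolean such as ?1, while the platform, model, version and architecture hints are quoted strings such as "Android". A minimal decoding sketch, assuming header names are matched case-insensitively and that any other headers can be ignored; DECODERS and decode_hints are hypothetical names and the example values are illustrative.

```python
def parse_sf_boolean(value: str) -> bool:
    """Decode an RFC 8941 boolean: ?1 is true, ?0 is false."""
    value = value.strip()
    if value == "?1":
        return True
    if value == "?0":
        return False
    raise ValueError(f"not a structured-field boolean: {value!r}")


def parse_sf_string(value: str) -> str:
    """Decode an RFC 8941 string: strip the quotes and undo backslash escapes."""
    value = value.strip()
    if len(value) < 2 or not (value.startswith('"') and value.endswith('"')):
        raise ValueError(f"not a structured-field string: {value!r}")
    body, out, i = value[1:-1], [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            i += 1  # keep the escaped character, drop the backslash
        out.append(body[i])
        i += 1
    return "".join(out)


# Lower-case header name -> (output key, decoder).
DECODERS = {
    "sec-ch-ua-mobile": ("mobile", parse_sf_boolean),
    "sec-ch-ua-model": ("model", parse_sf_string),
    "sec-ch-ua-platform": ("platform", parse_sf_string),
    "sec-ch-ua-platform-version": ("platform_version", parse_sf_string),
    "sec-ch-ua-arch": ("arch", parse_sf_string),
}


def decode_hints(headers: dict) -> dict:
    """Decode the single-value UA-CH headers that are present; ignore the rest."""
    decoded = {}
    for name, value in headers.items():
        entry = DECODERS.get(name.lower())
        if entry:
            key, decoder = entry
            decoded[key] = decoder(value)
    return decoded


example = {
    "Sec-CH-UA-Mobile": "?1",
    "Sec-CH-UA-Model": '"Pixel 6"',
    "Sec-CH-UA-Platform": '"Android"',
    "Sec-CH-UA-Platform-Version": '"12.0.0"',
}
print(decode_hints(example))
# {'mobile': True, 'model': 'Pixel 6', 'platform': 'Android', 'platform_version': '12.0.0'}
```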
This may seem quite sensible, because we now have a little more structure to the data that's coming through. At 51Degrees, we've taken the complexity out of these new headers, but there are a couple of concepts you need to understand first. One is that this information is not always easy to digest.
If we take the information returned from Sec-CH-UA using our Client Hints tester, we can see there's some information in there like "Not A;Brand";v="99". There's a whole load of distractors being added in just to make the data harder to consume. We take care of this complexity for you.
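Those distractors are what Chromium calls "GREASE" brands. A sketch of reading Sec-CH-UA and dropping them, assuming a simple regex is good enough for well-formed headers (escape sequences inside the quoted strings are not decoded) and using a heuristic that treats any brand containing both "Not" and "Brand" as GREASE, since the exact text varies between releases:

```python
import re

# A quoted brand, then ;v= and a quoted version, e.g. "Chromium";v="96".
BRAND_RE = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v\s*=\s*"((?:[^"\\]|\\.)*)"')


def is_grease(brand: str) -> bool:
    """Heuristic: GREASE brands look like "Not A;Brand", "Not_A Brand", ..."""
    lowered = brand.lower()
    return "not" in lowered and "brand" in lowered


def parse_sec_ch_ua(value: str) -> list:
    """Return (brand, version) pairs from Sec-CH-UA, minus GREASE entries."""
    return [(brand, version)
            for brand, version in BRAND_RE.findall(value)
            if not is_grease(brand)]


header = '"Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"'
print(parse_sec_ch_ua(header))
# [('Chromium', '96'), ('Google Chrome', '96')]
```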
How the information is received
On the left-hand side, we have the process for the User-Agent. A request comes from a web browser to your web server and the User-Agent is sent in that request. The web server immediately knows what type of device it's dealing with and can send a response; if it's a crawler, for example, we immediately know we shouldn't be serving any advertising because it's not a human. This is the way it's been for decades.
What we now see with User-Agent Client Hints is a more complex exchange. The browser sends its request with the default UA-CH headers, and the server looks at the request. The User-Agent is still there, but the information in it is being frozen. So, the server then requests additional UA-CH headers. It's only on the next request that that information starts to come across from the browser, so two requests are needed to get the information that used to be contained in the User-Agent.
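The server side of that exchange can be sketched as follows. It assumes a hypothetical set of wanted hints (WANTED_HINTS) and that Accept-CH is sent on every response; handle_request is an illustrative name, not a framework API. The point is that the first request can only be served with the default hints, and the extra ones arrive only on the following request.

```python
# Hypothetical selection of extra hints this server wants; the browser
# sends Sec-CH-UA, Sec-CH-UA-Mobile and Sec-CH-UA-Platform by default.
WANTED_HINTS = ("Sec-CH-UA-Model", "Sec-CH-UA-Platform-Version",
                "Sec-CH-UA-Full-Version-List")


def handle_request(request_headers: dict) -> tuple:
    """Return (hints_available, response_headers) for one request."""
    received = {name.lower() for name in request_headers}
    hints_available = all(h.lower() in received for h in WANTED_HINTS)
    # Advertise the wanted hints on every response; the browser only
    # includes them on requests made after it has seen Accept-CH.
    response_headers = {"Accept-CH": ", ".join(WANTED_HINTS)}
    return hints_available, response_headers


# First request: only the default hints are present.
first = {
    "User-Agent": ("Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/107.0.0.0 Mobile Safari/537.36"),
    "Sec-CH-UA": '"Chromium";v="107"',
    "Sec-CH-UA-Mobile": "?1",
    "Sec-CH-UA-Platform": '"Android"',
}
print(handle_request(first)[0])  # False: respond without the extra hints

# Second request: the browser now sends the hints asked for.
second = {
    **first,
    "Sec-CH-UA-Model": '"Pixel 6"',
    "Sec-CH-UA-Platform-Version": '"12.0.0"',
    "Sec-CH-UA-Full-Version-List": '"Chromium";v="107.0.5304.62"',
}
print(handle_request(second)[0])  # True
```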
That leads us to a problem around latency. I mentioned earlier Movement for an Open Web; we're a founding member and the only company that publicly acknowledges its membership, therefore we get to talk with the CMA (Competition and Markets Authority) about latency.
We put together a website to demonstrate the latency issue to the CMA. What it does is measure the latency associated with obtaining UA-CH information. Typically, there’s going to be a delay of around 100 milliseconds. Sometimes, we've seen figures over 200 milliseconds.
Let's say it's averaging out at about 100 milliseconds; performance really matters. We help our customers improve the performance of their sites through real-time optimization, and this performance is dependent on the time to first render. If you want to get access to the UA-CH information, there is going to be a delay – we find that concerning. We've done our best to work with this change that Google is putting forward, but the latency issue is sadly present.
The data model
The other change is to the data model. For the last 20 years, we've had a single string of characters: the User-Agent. It's not perfect, but it's proved remarkably successful. With UA-CH, we're now finding a variable number of fields. If you have a data model for storing this information that has a single field, you’ve now got to add more.
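A sketch of what the extra fields mean for a storage schema, assuming raw header values are stored as sent and every hint column is optional, because any of them may be missing depending on what the server asked for and which browser made the request. UserAgentRecord, HEADER_TO_FIELD and record_from_headers are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class UserAgentRecord:
    """Hypothetical storage record: one legacy field plus optional hint fields."""
    user_agent: str                       # all a legacy schema needed
    sec_ch_ua: Optional[str] = None       # raw header values, stored as sent
    mobile: Optional[str] = None
    platform: Optional[str] = None
    platform_version: Optional[str] = None
    model: Optional[str] = None
    full_version_list: Optional[str] = None


# Lower-case header name -> record field.
HEADER_TO_FIELD = {
    "sec-ch-ua": "sec_ch_ua",
    "sec-ch-ua-mobile": "mobile",
    "sec-ch-ua-platform": "platform",
    "sec-ch-ua-platform-version": "platform_version",
    "sec-ch-ua-model": "model",
    "sec-ch-ua-full-version-list": "full_version_list",
}


def record_from_headers(headers: Mapping[str, str]) -> UserAgentRecord:
    """Build a record; hints the browser did not send stay None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    hints = {attr: lowered[header]
             for header, attr in HEADER_TO_FIELD.items()
             if header in lowered}
    return UserAgentRecord(user_agent=lowered.get("user-agent", ""), **hints)


record = record_from_headers({
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"),
    "Sec-CH-UA-Platform": '"Windows"',
})
print(record.platform, record.model)  # "Windows" None
```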
When it comes to dependent interoperability standards, like OpenRTB, which is used for programmatic advertising, those data models need to change. All of this becomes a distraction tax on the rest of the industry, and many of those organizations are competitors to Google. It's easy for Google to make these changes, but perhaps not so easy for the rest of the world to come together and agree on how to adapt to Google’s proposals.
UA-CH pros and cons
From Google's perspective, there's a claim that UA-CH improves privacy. We think that isn’t quite as it seems: Google continues to have access to all this information and more, and Google knowing everything about people doesn't actually improve privacy. Additionally, the UA-CH format is slightly easier for developers to consume, as it deals with some of the structure problems. But of course, it creates several other issues around the data model and latency.
I should also say that no attempt has been made to standardize any of this: it's purely discussion of unofficial drafts and experimental ideas. No browser outside the Chromium family is showing any signs of supporting UA-CH, so Google is splintering the web and adding complexity for those working in the ecosystem.
One final observation is that UA-CH are only ever sent over HTTPS. Anyone who's looked at the energy use of the web would understand that sending everything over HTTPS, including large frameworks that are the same across multiple websites, increases the amount of data that is sent and received, but that's one for another day.
What can be done?
I mentioned the Competition and Markets Authority earlier. If you have any concerns about User-Agent Client Hints, you can write to the CMA directly; they're listening to what's going on in the industry.
UA-CH is not the headline change in the Privacy Sandbox (that, of course, is the removal of the third-party cookie), but you can also raise issues directly with Google. It’s byzantine in terms of how to get an issue in, but I would say their GitHub is as good a place as any to raise your concerns. Or, of course, you could join Movement for an Open Web.
Code demo
Let’s go to configure.51degrees.com; I've already gone through and selected some properties. We get a piece of code that we can just cut and paste into our development environment; I'm using .NET. I've commented out the User-Agent Client Hints section and left the User-Agent in there to show you what data we get back when we just have the User-Agent to work with.
We can see in this case that, with only the User-Agent, Windows 10 is returned. If we now enable some of the User-Agent Client Hints headers (in this case the platform and the platform version number) and rerun the demo, we see Windows 11 being returned. This is because the User-Agent reports Windows NT 10.0 on both Windows 10 and Windows 11, so only the Sec-CH-UA-Platform-Version Client Hint can tell the two apart.
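The Windows 11 check in the demo comes down to the platform version hint. The sketch below follows Microsoft's published guidance that Sec-CH-UA-Platform-Version values of 13.0.0 and above mean Windows 11 and values from 1.0.0 to 10.0.0 mean Windows 10; a major version of 0 is reported as earlier than Windows 10, and anything else as unknown. windows_version is a hypothetical helper, not the 51Degrees implementation.

```python
def windows_version(platform: str, platform_version: str) -> str:
    """Map Sec-CH-UA-Platform / Sec-CH-UA-Platform-Version to a Windows release."""
    # Both hints are structured-field strings, e.g. '"Windows"' and '"15.0.0"'.
    platform = platform.strip().strip('"')
    version = platform_version.strip().strip('"')
    if platform != "Windows":
        return "not Windows"
    try:
        major = int(version.split(".")[0])
    except ValueError:
        return "unknown"
    if major >= 13:
        return "Windows 11"
    if 1 <= major <= 10:
        return "Windows 10"
    if major == 0:
        return "earlier than Windows 10"
    return "unknown"


print(windows_version('"Windows"', '"15.0.0"'))  # Windows 11
print(windows_version('"Windows"', '"10.0.0"'))  # Windows 10
```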
This was a cloud demo, but it works the same way for our on-premise version. There's the concept of evidence that you pass into the device detection library; if you're only passing the User-Agent, you'll need to add the User-Agent Client Hints field names to make device detection as accurate as possible. There's a lot of information on the website about how to do this.
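The idea of evidence can be sketched without the 51Degrees library at all. The helper below is hypothetical: it gathers the User-Agent and, optionally, any Sec-CH-UA headers into one dictionary. The "header." key prefix mirrors the style of 51Degrees evidence keys, but treat it as an assumption and check the documentation for the exact names your version expects. Passing include_client_hints=False mimics commenting out the Client Hints section in the demo.

```python
from typing import Dict, Mapping


def build_evidence(headers: Mapping[str, str],
                   include_client_hints: bool = True) -> Dict[str, str]:
    """Collect device detection evidence from request headers.

    Hypothetical helper: the "header." key prefix is an assumption modelled
    on 51Degrees evidence naming; check the documentation for real key names.
    """
    evidence = {}
    for name, value in headers.items():
        key = name.lower()
        is_hint = key.startswith("sec-ch-ua")
        if key == "user-agent" or (include_client_hints and is_hint):
            evidence["header." + key] = value
    return evidence


request = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36"),
    "Sec-CH-UA-Platform": '"Windows"',
    "Sec-CH-UA-Platform-Version": '"15.0.0"',
    "Accept-Language": "en-GB",
}
print(sorted(build_evidence(request, include_client_hints=False)))
# ['header.user-agent']
print(sorted(build_evidence(request)))
# ['header.sec-ch-ua-platform', 'header.sec-ch-ua-platform-version', 'header.user-agent']
```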
You can go to configure.51degrees.com to have a play with this; it's a three-step process and you can use your existing License Keys if you’re a customer or use the trials if you’re new to 51Degrees.
The future
We are halfway through the rollout of User-Agent Client Hints and User-Agent Reduction. The next two big dates are October 2022 and February 2023.
In October, desktop User-Agents start to reduce the amount of information available. Then, from February, the model information is retired from Android mobile and tablet User-Agents. Finally, from April 2023, the User-Agent will be fully reduced.
Of course, we don't control this timeline. We'll continue to keep you up to date as to what's going on, but the key things we're concerned about are the latency and, particularly, what happens from February, which is quite a significant change.
What’s changing with the User-Agent?
Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/184.108.40.206 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/220.127.116.11 Safari/537.36
From October 2022, the platform version in the User-Agent becomes fixed, so every version of a platform reports the same value (Windows NT 10.0 on Windows, for example). You won’t be able to tell what the platform version is from the User-Agent.
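Because the reduced User-Agent also zeroes out the minor, build and patch parts of the Chrome version (it reports MAJOR.0.0.0), that pattern can serve as a signal that platform details in the string are frozen and that the Client Hints should be preferred. A heuristic sketch with a hypothetical function name, not a guarantee:

```python
import re

# Chrome/<major>.<minor>.<build>.<patch>
CHROME_VERSION = re.compile(r"Chrome/(\d+)\.(\d+)\.(\d+)\.(\d+)")


def has_reduced_chrome_version(ua: str) -> bool:
    """Heuristic: the reduced UA reports Chrome's version as MAJOR.0.0.0."""
    match = CHROME_VERSION.search(ua)
    if not match:
        return False
    return match.group(2, 3, 4) == ("0", "0", "0")


reduced = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36")
full = ("Mozilla/5.0 (Linux; Android 12; Pixel 6) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/93.0.4577.62 Mobile Safari/537.36")
print(has_reduced_chrome_version(reduced))  # True
print(has_reduced_chrome_version(full))     # False
```

When this returns True, a value such as Windows NT 10.0 in the string says little about the real platform version, so the Sec-CH-UA-Platform-Version hint is the better source.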
Mozilla/5.0 (Linux; Android 9; SM A205U) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/18.104.22.168 Mobile Safari/537.36
Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/22.214.171.124 Mobile Safari/537.36
From February, we will start to see the information about the model becoming “K”, which doesn't really help us determine whether it's a new and modern phone or an old and slow phone, for example.
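Once the model is reduced to "K", the only reliable source is the Sec-CH-UA-Model hint, which the server has to ask for with Accept-CH. A sketch that prefers the hint and falls back to the User-Agent, assuming the conventional "(Linux; Android <version>; <model>)" layout; device_model is a hypothetical name.

```python
import re
from typing import Mapping, Optional

# "(Linux; Android <version>; <model>)" at the start of an Android Chrome UA.
ANDROID_MODEL = re.compile(r"\(Linux; Android [^;)]*; ([^;)]+)\)")


def device_model(ua: str, hints: Mapping[str, str]) -> Optional[str]:
    """Prefer Sec-CH-UA-Model; fall back to the UA unless it says "K"."""
    lowered = {name.lower(): value for name, value in hints.items()}
    hint = lowered.get("sec-ch-ua-model")
    if hint is not None:
        model = hint.strip().strip('"')   # structured-field string
        if model:                         # desktops send an empty string
            return model
    match = ANDROID_MODEL.search(ua)
    if match and match.group(1).strip() != "K":
        return match.group(1).strip()
    return None  # no usable model in the hint or the User-Agent


reduced_ua = ("Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36")
print(device_model(reduced_ua, {}))                                 # None
print(device_model(reduced_ua, {"Sec-CH-UA-Model": '"SM-A205U"'}))  # SM-A205U
```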
Version 4 of 51Degrees was released in June 2020 with support for User-Agent Client Hints. We’ve made amendments to the product as Google made changes to UA-CH over the last two years. So, Version 4 has been out there for some time, but now is the time to look at upgrading.
Can you use the User-Agent at the same time as using User-Agent Client Hints, or do you have to choose one?
The User-Agent is going to continue to be transmitted by Chrome and other Chromium browsers. We haven't received any information from Google to say that the User-Agent is going to be removed, but the direction of travel is to remove it from the ecosystem. For now, the two will run alongside each other, but everyone should be starting to rely on User-Agent Client Hints.
Can you talk more about the CMA who are monitoring Google’s changes?
The Competition and Markets Authority in July 2019 started a year-long investigation into digital markets; they were the first to do such a thorough investigation into digital markets.
They published their report in July 2020 and concluded that the total expenditure on digital advertising costs every household in the United Kingdom around £500 a year. So, if digital advertising didn't take place, every household would save £500. Of course, digital advertising has a huge role to play, but I’m making a point about what the cost to society is.
The CMA worked with Google to get a voluntary set of commitments that bind Google globally until February 2028, or until the CMA is satisfied that Privacy Sandbox will not harm competition in digital markets. (I would say that deliberately producing proposals that give you control over identity, for example, or over access to content in a way that drives people to log in all the time, is not good for competition.)
These commitments were entered into in February 2022, and Google reports on a calendar quarter basis. We would expect the third report to be out by the end of October.
What impact will the introduction of User-Agent Client Hints have on a V3 customer?
The big change is that you don't just send User-Agents anymore; you can add lots of User-Agent Client Hints. UA-CH is a big technical change, which is why we created Version 4. We've worked to streamline the public interface across the different implementations for on-premise users, but for many V3 customers, moving to our cloud service would be a sensible thing to do (on-premise still makes a lot of sense, particularly in high-volume AdTech environments, for example).
Is there a reason why the browser would send default User-Agent Client Hints and only request more UA-CH headers in subsequent requests?
Google are effectively saying that some Client Hints are riskier than others: a low-risk Client Hint might be the operating system or the major version of the browser, while a higher-risk Client Hint might be the build number of the browser or the model of the device.
I’ve asked Google countless times what the justification for this is, but everything comes back to theoretical harm. Yes, it might be possible to identify someone by exposing this higher-risk information, but there's zero evidence that that's happening in practice. It's an arbitrary classification based on theoretical risk, decided by a very small number of browser engineers working in a bubble.
Is there a way to force the browser to send all the User-Agent Client Hints at once to lower the latency?
In the first request, you don’t get all the User-Agent Client Hints, so you send back a header called Accept-CH. With Accept-CH, you're saying which Client Hints you want, so you don't have to go back and forth for each individual Client Hint.
There's also a header called Critical-CH. It really is very difficult to understand the difference between the two, other than that the presence of Critical-CH supposedly gets the browser to start the second request sooner. But considering User-Agents, even on a slow device, that request exchange is going to happen very quickly.
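The behaviour described for Critical-CH in the Client Hints Reliability draft can be modelled roughly as below: if a response lists a hint in Critical-CH that it also accepts via Accept-CH, and the request did not carry it, the browser retries the request with that hint before using the response. This is a simplified, hypothetical model of browser logic for illustration (should_retry is not a real API); real browsers apply further conditions, such as retrying at most once.

```python
def parse_token_list(value: str) -> set:
    """Split a comma-separated header value into lower-case hint names."""
    return {token.strip().lower() for token in value.split(",") if token.strip()}


def should_retry(sent_hints, response_headers: dict) -> bool:
    """Simplified model: retry if a critical, accepted hint was not sent."""
    response = {name.lower(): value for name, value in response_headers.items()}
    accepted = parse_token_list(response.get("accept-ch", ""))
    critical = parse_token_list(response.get("critical-ch", ""))
    sent = {hint.lower() for hint in sent_hints}
    return bool((critical & accepted) - sent)


defaults = {"Sec-CH-UA", "Sec-CH-UA-Mobile", "Sec-CH-UA-Platform"}
response = {"Accept-CH": "Sec-CH-UA-Model", "Critical-CH": "Sec-CH-UA-Model"}
print(should_retry(defaults, response))                          # True
print(should_retry(defaults | {"Sec-CH-UA-Model"}, response))    # False
print(should_retry(defaults, {"Accept-CH": "Sec-CH-UA-Model"}))  # False
```

Without Critical-CH, the requested hints only arrive on the next request, which is the two-request exchange discussed earlier.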
If your question hasn't been answered, use the contact form on our website and we'll answer your question.
We've tried to put as much as we can in one place covering all the different angles of User-Agent Client Hints and the Privacy Sandbox. We ask you to keep talking about your concerns on the Privacy Sandbox, and let your voice be heard.