As stakeholders in the web and following discussions with colleagues, clients and industry professionals I've prepared a summary of questions and observations concerning your proposed changes to the established usage of the HTTP User-Agent header field. This usage has been established since the beginning of the web and the purpose of the field is noted in the documentation of HTTP from 1990s.
While anybody must be concerned about possibly compromising the privacy of the users of the web we feel that the nature and extent of those compromises needs to be more clearly established. The effect of the proposed changes is to damage quite seriously the user experience enjoyed by users on a significant number of websites that require the User-Agent to be in its present form.
Further we are most concerned about the possible damaging effects on the accessibility of those sites and the availability of those sites in developing nations as a result of the proposed changes.
Here follows a number of more detailed comments and we'd be most grateful for your feedback on these comments.
- Where is the evidence to support the primary assertion that the User-Agent is an 'abundant source of passive fingerprinting information'?
Suggestion: A very large web site or network provider using session tracking - such as first party cookies -would be able to provide the analysis needed. Group a large cohort of web sessions by IP and User-Agent and count the number of sessions. Is the relation 1 to 1 or 1 to many? If 1 to many, how many? Does it vary by different formats of User-Agent? Is the analysis sufficient to validate the assertion?
- In response to your second assertion User-Agent field values are poorly structured. It is our experience that a high degree of accuracy can be achieved from the current state of these strings.
We are not the only practitioners. Experts in the field of device identification and therefore User-Agents strings include:
51Degrees (this authors company 'we')
Device Atlas (owned by Afilias)
ScientiaMobile via the WURFL solution.
Between us, support is provided to millions of web sites including blue chip businesses as varied as Google, Facebook, Oracle, eBay, Amazon, Disney, Tencent, IBM, Naspers, Verizon and countless others many of which operate outside North America and Europe.
We provide free and open source solutions. Access to first request device information is essential to the services they support. We have not been consulted about this change.
- Has thought been given to the impact on non-technical web professionals such as marketing professionals who use analytics tools? As a result of the proposed changes their website may break. As a minimum their web site analytics will be incorrect. They will report incorrect information and form invalid conclusions. This will be frustrating to them and waste countless days head scratching.
- Have network operators been consulted about the deployment of IPv6 and networks in general? The fingerprinting argument depends at least 50% on the IP address. Depending on deployment a client IP address could persist with a device forever; effectively forming a unique identifier. Frequently changing the client IP address will lead to inefficiencies for network operators.
- Has the change been validated with web site owners large and small who operate in many different markets and geographies, many of whom make extensive use of the User-Agent to deliver their services? How do developing countries differ to north America and Europe?
Consider news services from Naspers and Tencent targeting Asia and Africa. These services use the information obtained from the User-Agent in the very first request to optimise content ensuring feature phones, powerful smartphones and budget smartphones all receive optimised content cost effectively. Many consumers outside north America and Western Europe do not enjoy 'all you can eat' data plans or high-end electronics. They need this feature to receive cost effective and optimised news.
Facebook Zero inspired many businesses to operate this 'stripped down' approach. They tend not to be covered in the English language technology press or outside their home markets.
From an engineering perspective optimising content on the web server at first request to reduce the amount of data transmitted and the computation the user agent needs to perform is very sensible. The client and web server work more harmoniously together rather than the server being a dumb file server. Making the information needed available in the first request is essential to enable this performance improvement. The User-Agent isn't ideally formatted, but it works.
- Does the process of agreeing this include a sufficiently broad cross section of engineers and web professionals? For example the User-Agent is used extensively within the HTTP protocol by CDNs and within edge networks both with and without SSL. Such engineers are key stakeholders in such a change. Should the IEFT be consulted? Is this not part of HTTP 3?
- Is Safari a good comparator for this performance degrading change? Users are skewed towards higher socio-economic groups, high end chipsets and specific geographic markets. Chromium is the ubiquitous foundation of the web browser globally. It's used with $10 MediaTek A53 chips to the most powerful Qualcomm SoCs.
- Is Firefox a good comparator? In many cases Firefox is not bundled with Android by default. It is installed by a user who wishes to benefit from its features. As such a typical user is more privacy aware. As Chromium is the default for most Android devices it should support the widest set of requirements out of the box. Perhaps Google will offer a choice of web browser when Android is first setup? Microsoft did a similar thing in the past with Windows.
- Is this a feature that should be up to the browser vendor to form a view on rather than sit within the Chromium project? Brave already take one approach, Microsoft another. Privacy conscious users can easily switch between browsers.
- Public support seems to be in the form of other web browser engineers via a series of tweets and git issues posts operating within a 'bubble' or 'echo chamber'. This tweet is cited as support (https://twitter.com/_scottlow/status/1206831008261132289). Is this sufficient support for such a wide-reaching change impacting almost every digital citizen and business?
- What is the view of 'minority' browser vendors who benefit extensively via marketing and analytics by appending their browser details to the User-Agent string? What is the view of Android device vendors who wish to be able to gain insight into device usage?
- Even if a large company has the resources to make the necessary changes within the timescales proposed what are the changes required to maintain the current fine grain of optimisation enjoyed by their users? The majority of small and medium sized businesses and other non-profit web site operators do not. They are all impacted.
They are not of the size and scale of the relatively small number of AdTech / tracking businesses who need to adapt to cookie changes. These companies were given 2 years notice.
This change impacts every website operator and owner. It is impractical to assume the vast majority can or want to make changes to their sites.
- The AdTech business, of which Google is a major player, have spent a long time producing the standards used to communicate between each other in real time. Has an assessment been performed on the impact to OpenRTB and AdCom where User-Agent is the primary method of validating and unifying device information?
- Where a change is needed there is an obvious solution? Consider the Y2K problem. Like this proposed change it impacted every business. The solution was clear; add two digits to the year and test everything.
In this case there are many use cases where a clear alternative does not exist. Consider the previously mentioned web site adapting on first request to the device with a granularity beyond just desktop or mobile. This change will remove this possibility and new designs will be required.
Or the web professionals wishing to understand why performance is better or worse by different users or web sessions. The IETF HTTP 1.1 specification states 'This is for statistical purposes, the tracing of protocol violations, and automated recognition of user agents for the sake of tailoring responses to avoid particular user agent limitations.'
21 years later this is exactly how it's being used. So widely the Chromium.org web site uses the User-Agent to adapt its content.
Have the IETF been consulted? Should this change be made by the IETF within the HTTP 3 specification?
- Is feature detection really a good solution? If it were why does Chromium.org make use of the User-Agent? It's not a very complex web site.
- If 90% of web sites are making use of the User-Agent, as cited in the post - why change anything? It clearly works.
- Given the dominance of Google services, and Google's control over Chromium, the change will benefit Google economically by removing information from the wider web. Only Google via its dominance in Search, Android and the Play Store will understand device and browser usage. Will Google be making this data available in aggregate as open data? If not, then there exists an argument concerning abuse of dominate market position that Google would be left open to. Has this been considered?
There are a lot of implications to this proposal which do not as yet appear to have been thought through. We sincerely hope the governance model for such a change considers all stakeholders, particularly those who benefit from optimised content in developing countries and do not engage in these discussions.
James Rosewell - For self and 51Degrees
Please find the response here