51Degrees IP Intelligence now includes Diversity properties. These are a measure of the diversity of the data that is returned for a given IP address. What a high or low value indicates depends on which data is being measured.
For example, a high diversity of geographic locations for an IP address could indicate that the address is part of a large pool of dynamic IP addresses. A high diversity of Device types could indicate that the IP address is being used by a cellular access point, or some variety of VPN or proxy.
Diversity properties include HardwareDiversity, PlatformDiversity, and BrowserDiversity. Each indicates the relative diversity of the profiles observed for an IP address for the Hardware, Platform, and Browser components respectively. The device profiles are determined using 51Degrees Device Detection.
Some examples
Below are some example IPs illustrating different diversity values which are seen in the real world at the time of writing. Tap the linked IP address to see the latest values.
| IP Address | Browser Diversity | Hardware Diversity | Platform Diversity |
|---|---|---|---|
| | 2 | 2 | 2 |
| | 7 | 4 | 5 |
| | 10 | 4 | 10 |
Background
Many different formulas were considered before arriving at the final implementation. A few factors are important in the choice of formula:
- The total number of web events observed for an IP range
- The number of unique profiles observed for an IP range
- The distribution of web events across the unique profiles
- Values for individual IPs vs a whole IP range
- Change in profiles over time
The implementation takes into account all of the above in a way that means a value is not influenced too heavily by any one factor. For example, a large total count, or unique profiles which are only observed a few times, will not result in a high diversity value.
Formula
The starting point is the Diversity for a single IP, which is:
where each p is a profile observed from the IP (e.g. a Platform profile for Windows 10), Cₚ is the count of events observed for the profile, and C̅ₚ is the average count across the profiles observed for the IP.
The min function ensures that minor outliers do not have too much influence on the final value. Profiles with an above-average count contribute 1 to the final value, while profiles with a below-average count contribute less than 1, and the contribution shrinks the further the count falls below the average.
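Reading the definitions above literally, the per-IP expression can be reconstructed as a sum over profiles of a ratio capped at 1. This is an assumption drawn from the surrounding text, not the published equation, and the label D_IP is introduced here for convenience.

```latex
D_{\mathrm{IP}} = \sum_{p} \min\!\left(1,\; \frac{C_p}{\bar{C}_p}\right)
```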
As an example of this effect:
- 3 profiles with a count of 100 each result in a Diversity of 3
- 1 profile with a count of 100 results in a Diversity of 1
- 1 profile with a count of 100 and 2 profiles with a count of 10 each results in a Diversity of around 1.2, not 3, because the two profiles with a count of 10 have a contribution of around 0.1 each.
This gives a better representation of the actual source data, and separates the first case from the third.
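The three cases above can be checked with a short Python sketch. It assumes the per-IP Diversity is the sum over profiles of min(1, count / mean count), with the mean taken as the plain arithmetic mean of the profile counts. The function name `ip_diversity` is illustrative and is not part of any 51Degrees API.

```python
def ip_diversity(counts):
    """Sum of per-profile contributions, each capped at 1.

    counts: event counts, one per profile observed from a single IP.
    Assumption: a profile contributes min(1, count / mean count).
    """
    if not counts:
        return 0.0
    mean = sum(counts) / len(counts)
    if mean == 0:
        return 0.0  # no events at all, so nothing to measure
    return sum(min(1.0, c / mean) for c in counts)


print(ip_diversity([100, 100, 100]))  # 3.0: three equally common profiles
print(ip_diversity([100]))            # 1.0: a single profile
print(ip_diversity([100, 10, 10]))    # 1.5: one dominant profile, two rare ones
```

Under this plain-mean reading the first two cases give 3 and 1, as stated. The third gives 1.5 rather than roughly 1.2, so the production formula may discount rare profiles more strongly than this sketch does. Either way, the third case lands far closer to a single profile than to three.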
The Diversity for an IP range is then calculated as the average of Diversity values for the individual observed IPs in the range:
or, more completely:
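Taking the description literally, the range value and its expanded form can be reconstructed as follows. N_IP (the number of observed IPs in the range) and the labels D_range and D_IP are symbols introduced here, and the inner sum assumes the per-IP Diversity is the capped sum over profiles described earlier.

```latex
\begin{aligned}
D_{\mathrm{range}} &= \frac{1}{N_{\mathrm{IP}}} \sum_{\mathrm{IP} \in \mathrm{range}} D_{\mathrm{IP}} \\
                   &= \frac{1}{N_{\mathrm{IP}}} \sum_{\mathrm{IP} \in \mathrm{range}} \; \sum_{p \in \mathrm{IP}} \min\!\left(1,\; \frac{C_p}{\bar{C}_p}\right)
\end{aligned}
```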
To include a more complete set of data, the values of Cₚ are actually sampled over a number of days. To avoid any change in profiles over time becoming more significant than the distribution at a fixed point in time, single web events are weighted linearly with age. So a web event observed 10 days ago has a weight of 0.1, while a web event observed today has a weight of 1. This means that the Diversity values are more stable over time, and not unduly influenced by legitimate changes.
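A minimal sketch of the age weighting, assuming a linear weight of (M - n) / M for an event that is n days old in an M-day window. The exact weight used in production is not stated here, so `event_weight`, `weighted_count`, and the 10-day window are illustrative only.

```python
def event_weight(age_days, window_days):
    """Linear weight: 1 for an event seen today, falling towards 0 at the window edge."""
    if not 0 <= age_days < window_days:
        return 0.0  # outside the sampling window (assumed to be ignored)
    return (window_days - age_days) / window_days


def weighted_count(event_ages, window_days):
    """Age-weighted event count for one profile, given the age in days of each event."""
    return sum(event_weight(age, window_days) for age in event_ages)


# Hypothetical profile: two events today, one 5 days ago, one 9 days ago.
print(weighted_count([0, 0, 5, 9], window_days=10))
```

With this assumed form and a 10-day window, the oldest day still counted (age 9) has weight 0.1, so an old burst of traffic moves the count far less than the same burst seen today.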
So Cₚ can be expressed as:
and C̅ₚ as:
where M is the number of days over which the data is sampled, n is the age in days of an event, i is a single web event, and Nₚ is the number of profiles.
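Using the symbols just defined, the two expressions can be reconstructed as below. The linear weight (M - n_i) / M is an assumption: it is one form that gives weight 1 to an event seen today and decreases linearly with age, but the exact weight is not stated in this text.

```latex
C_p = \sum_{i \in p} \frac{M - n_i}{M},
\qquad
\bar{C}_p = \frac{1}{N_p} \sum_{p} C_p
```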
With this, we can write the complete formula for Diversity for an IP range as:
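Substituting an age-weighted count into the range average gives the reconstruction below. It is a sketch under stated assumptions rather than the published formula: N_IP (the number of observed IPs in the range), the linear weight (M - n_i) / M, and the primed index p' are introduced here for illustration.

```latex
D_{\mathrm{range}} = \frac{1}{N_{\mathrm{IP}}}
  \sum_{\mathrm{IP} \in \mathrm{range}} \; \sum_{p \in \mathrm{IP}}
  \min\!\left(1,\;
    \frac{\displaystyle \sum_{i \in p} \frac{M - n_i}{M}}
         {\displaystyle \frac{1}{N_p} \sum_{p' \in \mathrm{IP}} \; \sum_{i \in p'} \frac{M - n_i}{M}}
  \right)
```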
Distribution
Most IP ranges have a low diversity, while the high diversity ranges are in the minority. This is what we would expect, as most individual IP ranges are used by a small number of devices, while a small number of IP ranges are used by a large number of devices.