
51Degrees Device Detection Python 4.4

Device Detection services for 51Degrees Pipeline

onpremise/offlineprocessing.py

Provides an example of processing a YAML file containing evidence for device detection. The supplied file contains 20,000 evidence records, each representing a set of HTTP headers. For example:

```yaml
header.user-agent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
header.sec-ch-ua: '" Not A;Brand";v="99", "Chromium";v="98", "Google Chrome";v="98"'
header.sec-ch-ua-full-version: '"98.0.4758.87"'
header.sec-ch-ua-mobile: '?0'
header.sec-ch-ua-platform: '"Android"'
```
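Loaded into Python, a record like this becomes a plain dictionary mapping evidence keys to header values. The sketch below shows that shape for the sample record, plus a hypothetical helper, `to_evidence`, that adds the `header.` prefix only when a key lacks it (the example code builds its evidence dictionary in a similar loop). The helper name is illustrative only and is not part of the 51Degrees API.

```python
# The sample record above, as the dictionary a YAML loader would produce.
record = {
    "header.user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                         "(KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36",
    "header.sec-ch-ua": '" Not A;Brand";v="99", "Chromium";v="98", "Google Chrome";v="98"',
    "header.sec-ch-ua-full-version": '"98.0.4758.87"',
    "header.sec-ch-ua-mobile": "?0",
    "header.sec-ch-ua-platform": '"Android"',
}


def to_evidence(record: dict) -> dict:
    """Hypothetical helper: map a record to pipeline evidence keys.

    Keys that already start with 'header.' are kept as they are; bare
    header names (e.g. 'user-agent') get the 'header.' prefix added.
    """
    return {
        # str() guards against the YAML loader typing a value as a number.
        (key if key.startswith("header.") else f"header.{key}"): str(value)
        for key, value in record.items()
    }


evidence = to_evidence(record)
print(evidence["header.sec-ch-ua-mobile"])      # ?0
print(to_evidence({"user-agent": "curl/8.0"}))  # {'header.user-agent': 'curl/8.0'}
```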

We create a device detection pipeline to read the data and find out about the associated device, then write the results to a YAML-formatted output stream.
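The output is a multi-document YAML stream: the example prints a `---` separator before each record and a single `...` end marker after the last one, and ruamel.yaml serializes each record's mapping in between. The stdlib-only sketch below reproduces that layout so the structure is easy to see. It is a simplified stand-in, not what ruamel.yaml emits byte for byte: `write_yaml_stream` is a hypothetical name, it assumes keys are simple names that need no quoting, and it quotes values with `json.dumps`, which works for the simple strings and booleans used here because JSON scalars are also valid YAML scalars.

```python
import io
import json


def write_yaml_stream(records, output):
    """Hypothetical sketch: write each record as one YAML document.

    Assumes keys are simple names (e.g. 'device.ismobile') that need no
    quoting, and that values are strings, booleans or numbers.
    """
    for record in records:
        print("---", file=output)  # document separator
        for key, value in record.items():
            # json.dumps gives "quoted" strings and true/false for booleans,
            # both of which are valid YAML scalars.
            print(f"{key}: {json.dumps(value)}", file=output)
    print("...", file=output)  # end-of-stream marker


buffer = io.StringIO()
write_yaml_stream(
    [{"header.user-agent": "curl/8.0", "device.ismobile": False},
     {"header.user-agent": "Mozilla/5.0", "device.ismobile": "Unknown"}],
    buffer,
)
print(buffer.getvalue())
```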

As well as explaining the basic operation of offline processing using the default settings, this example can be used for more advanced experiments in tuning device detection for performance and predictive power, using the Performance Profile, Graph, and Difference and Drift settings.
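When experimenting with these settings, it helps to measure each configuration the same way: how fast records are processed, and how often a property comes back with a value rather than "Unknown" (a rough proxy for predictive power). The sketch below is a hypothetical harness, not part of the 51Degrees API: `detect` stands for any callable that takes an evidence dictionary and returns a dictionary of results, such as a wrapper around a pipeline built with a particular `performance_profile`. The placeholder detector in the usage lines is invented for illustration.

```python
import time


def evaluate(detect, records, prop, unknown="Unknown"):
    """Hypothetical harness: time `detect` over `records` and report how
    often `prop` has a known value.

    `detect` is any callable mapping an evidence dict to a results dict.
    """
    known = 0
    start = time.perf_counter()
    for evidence in records:
        result = detect(evidence)
        if result.get(prop, unknown) != unknown:
            known += 1
    elapsed = time.perf_counter() - start
    return {
        "records": len(records),
        "records_per_second": len(records) / elapsed if elapsed > 0 else float("inf"),
        "known_ratio": known / len(records) if records else 0.0,
    }


# Placeholder detector: only "recognises" evidence that has a user-agent.
def fake_detect(evidence):
    has_ua = "header.user-agent" in evidence
    return {"device.ismobile": False if has_ua else "Unknown"}


records = [{"header.user-agent": "curl/8.0"}, {"header.sec-ch-ua-mobile": "?0"}]
print(evaluate(fake_detect, records, "device.ismobile")["known_ratio"])  # 0.5
```

Running the same records through two differently configured pipelines and comparing the two result dictionaries gives a like-for-like view of the trade-off.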

This example is available in full on GitHub.

This example requires a local data file. The free 'Lite' data file can be acquired by pulling the git submodules under this repository (run `git submodule update --recursive`) or from the device-detection-data GitHub repository.
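When no path is given on the command line, the example's `main` function falls back to `ExampleUtils.find_file` to locate the Lite data file somewhere in the project. The sketch below shows one way such a search can work, using `pathlib.Path.rglob` to walk a directory tree. `find_data_file` is a hypothetical name, the real helper's search strategy may differ, and the file name in the usage lines is a placeholder rather than the actual Lite file name.

```python
from pathlib import Path
from typing import Optional


def find_data_file(name: str, root: str = ".") -> Optional[Path]:
    """Hypothetical helper: return the first file called `name` found
    under `root` (searched recursively), or None if there is none."""
    for candidate in sorted(Path(root).rglob(name)):
        if candidate.is_file():
            return candidate.resolve()
    return None


data_file = find_data_file("example-lite-file.hash")  # placeholder name
if data_file is None:
    print("Data file not found; try `git submodule update --recursive`.")
else:
    print(f"Using {data_file}")
```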

The Lite data file is only used for illustration, and has limited accuracy and capabilities. Find out about the more capable data files that are available on our pricing page.

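Required PyPi Dependencies (the packages providing the modules the example imports):

- fiftyone_devicedetection
- fiftyone_pipeline_core
- fiftyone_devicedetection_shared
- ruamel.yaml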
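Before running the example, you can check that the modules it imports are installed. The sketch below uses `importlib.util.find_spec`, which locates a module without importing it; for a dotted name such as `ruamel.yaml` it does import the parent package, and raises `ModuleNotFoundError` if that parent is missing, hence the `try`. `missing_modules` is a hypothetical helper, and the module list is taken from the example's import statements.

```python
import importlib.util


def missing_modules(names):
    """Hypothetical helper: return the module names in `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ModuleNotFoundError:
            # Raised when the parent of a dotted name (e.g. 'ruamel') is absent.
            missing.append(name)
    return missing


# Module names taken from the example's import statements.
required = [
    "fiftyone_devicedetection",
    "fiftyone_pipeline_core",
    "fiftyone_devicedetection_shared",
    "ruamel.yaml",
]
print("Missing:", missing_modules(required) or "none")
```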

```python
# *********************************************************************
# This Original Work is copyright of 51 Degrees Mobile Experts Limited.
# Copyright 2023 51 Degrees Mobile Experts Limited, Davidson House,
# Forbury Square, Reading, Berkshire, United Kingdom RG1 3EU.
#
# This Original Work is licensed under the European Union Public Licence
# (EUPL) v.1.2 and is subject to its terms as set out below.
#
# If a copy of the EUPL was not distributed with this file, You can obtain
# one at https://opensource.org/licenses/EUPL-1.2.
#
# The 'Compatible Licences' set out in the Appendix to the EUPL (as may be
# amended by the European Commission) shall be deemed incompatible for
# the purposes of the Work and the provisions of the compatibility
# clause in Article 5 of the EUPL shall not apply.
#
# If using the Work as, or as part of, a network application, by
# including the attribution notice(s) required under Article 5 of the EUPL
# in the end user terms of the application under an appropriate heading,
# such notice(s) shall fulfill the requirements of that article.
# *********************************************************************

from pathlib import Path
import sys
from fiftyone_devicedetection.devicedetection_pipelinebuilder import DeviceDetectionPipelineBuilder
# Helper functions shared by the examples (find_file, get_human_readable, check_data_file).
from fiftyone_devicedetection_examples.example_utils import ExampleUtils
from fiftyone_pipeline_core.logger import Logger
from fiftyone_devicedetection_shared.example_constants import LITE_DATAFILE_NAME
from fiftyone_devicedetection_shared.example_constants import EVIDENCE_FILE_NAME
from ruamel.yaml import YAML


class OfflineProcessing():
    def run(self, data_file, evidence_yaml, logger, output):
        """!
        Process a YAML representation of evidence and create a YAML output
        containing the processed evidence.
        @param data_file: The path to the device detection data file
        @param evidence_yaml: File containing the YAML representation of the evidence to process
        @param logger: Logger to use within the pipeline
        @param output: Output file to write results to
        """

        # In this example, we use the DeviceDetectionPipelineBuilder
        # and configure it in code. For more information about
        # pipelines in general see the documentation at
        # http://51degrees.com/documentation/4.3/_concepts__configuration__builders__index.html
        pipeline = DeviceDetectionPipelineBuilder(
            data_file_path=data_file,
            # We use the low memory profile as its performance is
            # sufficient for this example. See the documentation for
            # more detail on this and other configuration options:
            # http://51degrees.com/documentation/4.3/_device_detection__features__performance_options.html
            # http://51degrees.com/documentation/4.3/_features__automatic_datafile_updates.html
            # http://51degrees.com/documentation/4.3/_features__usage_sharing.html
            performance_profile="LowMemory",
            # Usage sharing is normally enabled (True), but it is disabled here.
            # In general, offline processing usage should NOT be shared back to 51Degrees.
            # This is because it will not contain the full set of information that is
            # required by our data processing back-end and will be discarded.
            # If you specifically want to share data that is being processed offline
            # in order to help us improve detection of new devices/browsers/etc, then
            # this additional data will need to be collected and included as evidence
            # to the Pipeline. See
            # https://51degrees.com/documentation/_features__usage_sharing.html#Low_Level_Usage_Sharing
            # for more details on this.
            usage_sharing=False,
            # Inhibit auto-update of the data file for this example.
            auto_update=False,
            licence_keys="").add_logger(logger).build()

        records = 0
        yaml = YAML()
        # load_all is a generator, so records are read one document at a time.
        yaml_data = yaml.load_all(evidence_yaml)

        try:
            # Keep going as long as we have more document records.
            for evidence in yaml_data:
                # Output progress.
                records += 1
                if records % 100 == 0:
                    logger.log("info", f"Processed {records} records")

                # Write the YAML document separator.
                print("---", file=output)
                # Pass the record to the pipeline as evidence so that it can be analyzed.
                # Keys that already carry the 'header.' prefix (for example
                # 'header.user-agent') are used as-is; bare header names get it added.
                headers = {}
                for key in evidence:
                    name = key if key.startswith("header.") else f"header.{key}"
                    headers[name] = evidence[key]

                self.analyseEvidence(headers, pipeline, output, yaml)
        except Exception as err:
            # The evidence could not be read or processed, so it can't be written
            # to the output. Log the error; the remaining records are not processed.
            logger.log("error", str(err))

        # Write the YAML document end marker.
        print("...", file=output)

        ExampleUtils.check_data_file(pipeline, logger)

    def analyseEvidence(self, evidence, pipeline, output, yaml):
        # FlowData is a data structure that is used to convey information required for
        # detection and the results of the detection through the pipeline.
        # Information required for detection is called "evidence" and usually consists
        # of a number of HTTP Header field values, in this case represented by a
        # dictionary of header name/value entries.
        data = pipeline.create_flowdata()
        # Add the evidence values to the flow data.
        data.evidence.add_from_dict(evidence)
        # Process the flow data.
        data.process()

        device = data.device

        # Start the output record with the evidence values.
        values = dict(evidence)
        # Now add the values that we want to store against the record.
        values["device.ismobile"] = device.ismobile.value() if device.ismobile.has_value() else "Unknown"
        values["device.platformname"] = ExampleUtils.get_human_readable(device, "platformname")
        values["device.platformversion"] = ExampleUtils.get_human_readable(device, "platformversion")
        values["device.browsername"] = ExampleUtils.get_human_readable(device, "browsername")
        values["device.browserversion"] = ExampleUtils.get_human_readable(device, "browserversion")
        # DeviceId is a unique identifier for the combination of hardware, operating
        # system, browser and crawler that has been detected.
        # Our device detection solution uses machine learning to find the optimal
        # way to identify devices based on the real-world evidence values that we
        # observe each day.
        # As this changes over time, the result of detection can potentially change
        # as well. By storing the device id, we can use this as a lookup in future
        # rather than performing detection with the original evidence again.
        # Do this by passing an evidence entry with:
        # key = query.51D_ProfileIds
        # value = [the device id]
        # This is much faster and avoids the potential for getting a different
        # result.
        values["device.deviceid"] = ExampleUtils.get_human_readable(device, "deviceid")
        yaml.dump(values, output)


def main(argv):
    # In this example, by default, the 51degrees "Lite" file needs to be
    # somewhere in the project space, or you may specify another file as
    # a command line parameter.
    #
    # Note that the Lite data file is only used for illustration, and has
    # limited accuracy and capabilities.
    # Find out about the Enterprise data file on our pricing page:
    # https://51degrees.com/pricing
    data_file = argv[0] if len(argv) > 0 else ExampleUtils.find_file(LITE_DATAFILE_NAME)
    # This file contains the 20,000 most commonly seen combinations of header values
    # that are relevant to device detection. For example, User-Agent and UA-CH headers.
    evidence_file = argv[1] if len(argv) > 1 else ExampleUtils.find_file(EVIDENCE_FILE_NAME)
    # Finally, get the location for the output file. Use the same location as the
    # evidence if a path is not supplied on the command line.
    output_file = argv[2] if len(argv) > 2 else Path(evidence_file).absolute().parent / "offline-processing-output.yml"

    # Configure a logger to output to the console.
    logger = Logger(min_level="info")

    if data_file is not None:
        with open(output_file, "w") as output:
            with open(evidence_file, "r") as evidence_input:
                OfflineProcessing().run(data_file, evidence_input, logger, output)
        logger.log("info",
            f"Processing complete. See results in: '{output_file}'")
    else:
        logger.log("error",
            "Failed to find a device detection data file. Make sure the " +
            "device-detection-data submodule has been updated by running " +
            "`git submodule update --recursive`.")


if __name__ == "__main__":
    main(sys.argv[1:])
```
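The `run` method wraps the whole loop in a single `try` block, so the first failure ends processing of the remaining records. If you would rather skip a bad record and continue, catch exceptions around each record instead. The sketch below shows that pattern with plain callables; `process_records`, `handle` and `log` are hypothetical names. This only helps with errors raised while handling a record: an error raised by the YAML document iterator itself cannot be skipped this way, because a generator stops for good once it raises.

```python
def process_records(records, handle, log):
    """Hypothetical sketch: call `handle` on each record, logging and
    skipping records that raise instead of stopping at the first failure.
    Returns (processed, failed) counts."""
    processed = failed = 0
    for number, record in enumerate(records, start=1):
        try:
            handle(record)
            processed += 1
        except Exception as err:
            failed += 1
            log(f"Skipping record {number}: {err}")
    return processed, failed


def handle(record):
    if "header.user-agent" not in record:
        raise KeyError("header.user-agent")
    # A real handler would build evidence and run the pipeline here.


messages = []
result = process_records(
    [{"header.user-agent": "a"}, {}, {"header.user-agent": "b"}], handle, messages.append
)
print(result)    # (2, 1)
print(messages)  # ["Skipping record 2: 'header.user-agent'"]
```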
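For `device.ismobile`, the example writes the property's value when `has_value()` is true and the string "Unknown" otherwise. The same pattern can be factored into a small function. The sketch below uses a stand-in class exposing the two methods the example calls (`has_value()` and `value()`); `StubProperty` and `value_or_unknown` are hypothetical names, and with a real pipeline you would pass a property such as `device.ismobile` instead of the stub.

```python
class StubProperty:
    """Stand-in for a pipeline property value: exposes has_value() and value()."""

    def __init__(self, value=None):
        self._value = value

    def has_value(self):
        return self._value is not None

    def value(self):
        if self._value is None:
            raise ValueError("no value")
        return self._value


def value_or_unknown(prop, default="Unknown"):
    """Return the property's value, or `default` when it has none."""
    return prop.value() if prop.has_value() else default


print(value_or_unknown(StubProperty(True)))  # True
print(value_or_unknown(StubProperty()))      # Unknown
```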
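The comments in `analyseEvidence` suggest storing the device ID so that later lookups can skip full detection by passing it as `query.51D_ProfileIds` evidence. A minimal sketch of that idea follows, assuming an in-memory store keyed by the exact evidence values; `DeviceIdStore` is a hypothetical class, not part of the 51Degrees API. It only builds the evidence dictionary: with a real pipeline you would add it to a new FlowData, for example with `data.evidence.add_from_dict(...)` as the example does, before calling `process()`.

```python
class DeviceIdStore:
    """Hypothetical in-memory cache of device IDs keyed by evidence values."""

    def __init__(self):
        self._ids = {}

    @staticmethod
    def _key(evidence):
        # Sort the items so that the key doesn't depend on dictionary order.
        return tuple(sorted(evidence.items()))

    def remember(self, evidence, device_id):
        self._ids[self._key(evidence)] = device_id

    def lookup_evidence(self, evidence):
        """Return evidence that identifies the device by its stored ID,
        or None if this evidence has not been seen before."""
        device_id = self._ids.get(self._key(evidence))
        if device_id is None:
            return None
        return {"query.51D_ProfileIds": device_id}


store = DeviceIdStore()
seen = {"header.user-agent": "curl/8.0"}
store.remember(seen, "example-device-id")  # placeholder, not a real ID
print(store.lookup_evidence(seen))  # {'query.51D_ProfileIds': 'example-device-id'}
print(store.lookup_evidence({"header.user-agent": "other"}))  # None
```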