
51Degrees Device Detection Python  4.4

Device Detection services for 51Degrees Pipeline

onpremise/offlineprocessing.py

This example processes a YAML file containing evidence for device detection. The supplied file contains 20,000 evidence records, each representing a set of HTTP headers. For example:

header.user-agent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
header.sec-ch-ua: '" Not A;Brand";v="99", "Chromium";v="98", "Google Chrome";v="98"'
header.sec-ch-ua-full-version: '"98.0.4758.87"'
header.sec-ch-ua-mobile: '?0'
header.sec-ch-ua-platform: '"Android"'
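
The example code further down this page adds a `header.` prefix to each key of a record before passing it to the pipeline as evidence. A minimal sketch of that mapping follows. The helper `to_evidence` is hypothetical (it is not part of the 51Degrees API), and it assumes that keys which already carry the prefix should be left unchanged:

```python
def to_evidence(record, prefix="header."):
    """Return a copy of `record` in which every key starts with `prefix`."""
    evidence = {}
    for name, value in record.items():
        # Only add the prefix when it is missing, so the helper is idempotent.
        key = name if name.startswith(prefix) else prefix + name
        evidence[key] = value
    return evidence


# Illustrative record mixing a bare header name and an already prefixed one.
record = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "header.sec-ch-ua-mobile": "?0",
}
evidence = to_evidence(record)
```

Making the helper idempotent means the same function can be applied whether or not the keys in the evidence file already include the prefix.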

We create a device detection pipeline to read the data and determine the device associated with each record, then write the results to a YAML-formatted output stream.
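
The read, detect, write flow described above can be sketched without the 51Degrees library. In this sketch `fake_detect` is a hypothetical stand-in for the pipeline, and values are written with `json.dumps` (JSON scalars are valid YAML scalars) so that only the standard library is needed; the real example uses `yaml.dump` instead. The sketch writes a `---` separator before each record and a single `...` end marker after the last one:

```python
import io
import json


def process_records(records, detect, output):
    """Write one YAML document per record to `output` and return the record count."""
    count = 0
    for record in records:
        count += 1
        print("---", file=output)  # YAML document separator
        values = dict(record)  # keep the original evidence in the output
        values.update(detect(record))  # add the detection results
        for key, value in values.items():
            print(f"{key}: {json.dumps(value)}", file=output)
    print("...", file=output)  # YAML document end marker
    return count


def fake_detect(record):
    # Hypothetical stand-in for the pipeline: treats a record as mobile
    # when its User-Agent contains the word "Mobile".
    user_agent = record.get("header.user-agent", "")
    return {"device.ismobile": "Mobile" in user_agent}


records = [
    {"header.user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    {"header.user-agent": "Mozilla/5.0 (Linux; Android 12) Mobile"},
]
buffer = io.StringIO()
processed = process_records(records, fake_detect, buffer)
```

Passing the detection step in as a callable keeps the stream handling separate from the detection logic, which mirrors how the full example separates `run` from `analyseEvidence`.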

As well as explaining the basic operation of offline processing using the defaults, this example can be used for more advanced experiments: tuning device detection for performance and predictive power using the Performance Profile, Graph, and Difference and Drift settings.

This example is available in full on GitHub.

This example requires a local data file. The free 'Lite' data file can be acquired by pulling the git submodules under this repository (run `git submodule update --recursive`) or from the device-detection-data GitHub repository.

The Lite data file is only used for illustration, and has limited accuracy and capabilities. Find out about the more capable data files that are available on our pricing page.

Required PyPI Dependencies:

  • fiftyone_devicedetection
  • pyyaml

# *********************************************************************
# This Original Work is copyright of 51 Degrees Mobile Experts Limited.
# Copyright 2019 51 Degrees Mobile Experts Limited, 5 Charlotte Close,
# Caversham, Reading, Berkshire, United Kingdom RG4 7BY.
#
# This Original Work is licensed under the European Union Public Licence (EUPL)
# v.1.2 and is subject to its terms as set out below.
#
# If a copy of the EUPL was not distributed with this file, You can obtain
# one at https://opensource.org/licenses/EUPL-1.2.
#
# The 'Compatible Licences' set out in the Appendix to the EUPL (as may be
# amended by the European Commission) shall be deemed incompatible for
# the purposes of the Work and the provisions of the compatibility
# clause in Article 5 of the EUPL shall not apply.
#
# If using the Work as, or as part of, a network application, by
# including the attribution notice(s) required under Article 5 of the EUPL
# in the end user terms of the application under an appropriate heading,
# such notice(s) shall fulfill the requirements of that article.
# ********************************************************************

from pathlib import Path
import sys
import yaml
from fiftyone_devicedetection.devicedetection_pipelinebuilder import DeviceDetectionPipelineBuilder
from fiftyone_devicedetection_examples.example_utils import ExampleUtils
from fiftyone_pipeline_core.logger import Logger

# In this example, by default, the 51degrees "Lite" file needs to be
# somewhere in the project space, or you may specify another file as
# a command line parameter.
#
# Note that the Lite data file is only used for illustration, and has
# limited accuracy and capabilities.
# Find out about the Enterprise data file on our pricing page:
# https://51degrees.com/pricing
LITE_V_4_1_HASH = "51Degrees-LiteV4.1.hash"

# This file contains the 20,000 most commonly seen combinations of header values
# that are relevant to device detection. For example, User-Agent and UA-CH headers.
EVIDENCE = "20000 Evidence Records.yml"

class OfflineProcessing():
    def run(self, data_file, evidence_yaml, logger, output):
        """!
        Process a YAML representation of evidence - and create a YAML output containing
        the processed evidence.
        @param data_file: The path to the device detection data file
        @param evidence_yaml: File containing the yaml representation of the evidence to process
        @param logger: Logger to use within the pipeline
        @param output: Output file to write results to
        """

        # In this example, we use the DeviceDetectionPipelineBuilder
        # and configure it in code. For more information about
        # pipelines in general see the documentation at
        # http://51degrees.com/documentation/4.3/_concepts__configuration__builders__index.html
        pipeline = DeviceDetectionPipelineBuilder(
            data_file_path = data_file,
            # We use the low memory profile as its performance is
            # sufficient for this example. See the documentation for
            # more detail on this and other configuration options:
            # http://51degrees.com/documentation/4.3/_device_detection__features__performance_options.html
            # http://51degrees.com/documentation/4.3/_features__automatic_datafile_updates.html
            # http://51degrees.com/documentation/4.3/_features__usage_sharing.html
            performance_profile = "LowMemory",
            # inhibit sharing usage for this test, usually this
            # should be set "true"
            # In general, off line processing usage should NOT be shared back to 51Degrees.
            # This is because it will not contain the full set of information that is
            # required by our data processing back-end and will be discarded.
            # If you specifically want to share data that is being processed off line
            # in order to help us improve detection of new devices/browsers/etc, then
            # this additional data will need to be collected and included as evidence
            # to the Pipeline. See
            # https://51degrees.com/documentation/_features__usage_sharing.html#Low_Level_Usage_Sharing
            # for more details on this.
            usage_sharing = False,
            # Inhibit auto-update of the data file for this example
            auto_update = False,
            licence_keys = "").add_logger(logger).build()

        records = 0
        yaml_data = yaml.safe_load_all(evidence_yaml)
        # Keep going as long as we have more document records.
        for evidence in yaml_data:
            # Output progress.
            records = records + 1
            if (records % 100 == 0):
                logger.log("info", f"Processed {records} records")

            # write the yaml document separator
            print("---", file = output)
            # Pass the record to the pipeline as evidence so that it can be analyzed
            headers = {}
            for key in evidence:
                headers[f"header.{key}"] = evidence[key]

            self.analyseEvidence(headers, pipeline, output)
        # write the yaml document end marker
        print("...", file = output)

        ExampleUtils.check_data_file(pipeline, logger)

    def analyseEvidence(self, evidence, pipeline, output):
        # FlowData is a data structure that is used to convey information required for
        # detection and the results of the detection through the pipeline.
        # Information required for detection is called "evidence" and usually consists
        # of a number of HTTP Header field values, in this case represented by a
        # dictionary of header name/value entries.
        data = pipeline.create_flowdata()
        # Add the evidence values to the flow data
        data.evidence.add_from_dict(evidence)
        # Process the flow data.
        data.process()

        device = data.device

        values = {}
        # Add the evidence values to the output
        for key in evidence:
            values[key] = evidence[key]
        # Now add the values that we want to store against the record.
        values["device.ismobile"] = device.ismobile.value() if device.ismobile.has_value() else "Unknown"
        values["device.platformname"] = ExampleUtils.get_human_readable(device, "platformname")
        values["device.platformversion"] = ExampleUtils.get_human_readable(device, "platformversion")
        values["device.browsername"] = ExampleUtils.get_human_readable(device, "browsername")
        values["device.browserversion"] = ExampleUtils.get_human_readable(device, "browserversion")
        # DeviceId is a unique identifier for the combination of hardware, operating
        # system, browser and crawler that has been detected.
        # Our device detection solution uses machine learning to find the optimal
        # way to identify devices based on the real-world evidence values that we
        # observe each day.
        # As this changes over time, the result of detection can potentially change
        # as well. By storing the device id, we can use this as a lookup in future
        # rather than performing detection with the original evidence again.
        # Do this by passing an evidence entry with:
        # key = query.51D_ProfileIds
        # value = [the device id]
        # This is much faster and avoids the potential for getting a different
        # result.
        values["device.deviceid"] = ExampleUtils.get_human_readable(device, "deviceid")
        yaml.dump(values, output)

def main(argv):
    # Use the supplied path for the data file or find the lite
    # file that is included in the repository.
    data_file = argv[0] if len(argv) > 0 else ExampleUtils.find_file(LITE_V_4_1_HASH)
    # Do the same for the yaml evidence file.
    evidence_file = argv[1] if len(argv) > 1 else ExampleUtils.find_file(EVIDENCE)
    # Finally, get the location for the output file. Use the same location as the
    # evidence if a path is not supplied on the command line.
    output_file = argv[2] if len(argv) > 2 else Path.joinpath(Path(evidence_file).absolute().parent, "offline-processing-output.yml")

    # Configure a logger to output to the console.
    logger = Logger(min_level="info")

    if data_file is not None:
        with open(output_file, "w") as output:
            # Named evidence_stream to avoid shadowing the built-in input().
            with open(evidence_file, "r") as evidence_stream:
                OfflineProcessing().run(data_file, evidence_stream, logger, output)
        logger.log("info",
            f"Processing complete. See results in: '{output_file}'")
    else:
        logger.log("error",
            "Failed to find a device detection data file. Make sure the " +
            "device-detection-data submodule has been updated by running " +
            "`git submodule update --recursive`.")

if __name__ == "__main__":
    main(sys.argv[1:])
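
The comments in `analyseEvidence` note that a stored device id can later be passed back as evidence under the key `query.51D_ProfileIds` instead of repeating detection with the original headers. A minimal sketch of building that evidence from a previously written output record follows. The helper name `evidence_for_stored_device_id` and the device id value are illustrative assumptions, not part of the 51Degrees API or taken from real output:

```python
# Evidence key named in the example's comments for looking up a stored device id.
PROFILE_IDS_KEY = "query.51D_ProfileIds"


def evidence_for_stored_device_id(stored_values):
    """Build lookup evidence from a previously stored output record.

    Returns None when the record has no usable device id, in which case
    detection would have to be repeated with the original evidence.
    """
    device_id = stored_values.get("device.deviceid")
    if not device_id:
        return None
    return {PROFILE_IDS_KEY: device_id}


# Illustrative stored record; the device id value is made up for this sketch.
stored = {
    "header.user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "device.deviceid": "12280-81243-82102-0",
}
lookup_evidence = evidence_for_stored_device_id(stored)
```

The resulting dictionary could then be added to a new flow data object in the same way as the header evidence in the example, using `data.evidence.add_from_dict(lookup_evidence)` before calling `data.process()`.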