This example shows how to process a YAML file containing evidence for device detection. The supplied file contains 20,000 evidence records, each representing a set of HTTP headers. For example:

header.user-agent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
header.sec-ch-ua: '" Not A;Brand";v="99", "Chromium";v="98", "Google Chrome";v="98"'
header.sec-ch-ua-full-version: '"98.0.4758.87"'
header.sec-ch-ua-mobile: '?0'
header.sec-ch-ua-platform: '"Android"'
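Each key in the sample carries a `header.` prefix that marks the value as an HTTP header. As a minimal sketch, raw header names could be normalised into that form as follows. The helper name `to_header_evidence` is hypothetical and not part of the 51Degrees API, and lower-casing the names is a choice made here only to match the sample.

```python
def to_header_evidence(raw_headers):
    """Return a dict whose keys are lower-case and start with 'header.'.

    Hypothetical helper for illustration; not part of the 51Degrees API.
    """
    evidence = {}
    for name, value in raw_headers.items():
        key = name.strip().lower()
        # Add the prefix only when it is missing, so existing keys are kept.
        if not key.startswith("header."):
            key = f"header.{key}"
        evidence[key] = value
    return evidence


example = to_header_evidence({
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "header.sec-ch-ua-mobile": "?0",
})
print(example)
```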

We create a device detection pipeline to read the data and find out about the associated device, then write the results to a YAML-formatted output stream.
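The read, detect, and write loop can be sketched without the 51Degrees library. In the sketch below, `process_records` and `fake_detect` are hypothetical names, and `fake_detect` is only a placeholder for the real pipeline. Values are serialised with `json.dumps` because JSON scalars are also valid YAML.

```python
import io
import json


def process_records(records, detect, output):
    """Write one YAML document per record: the evidence plus detection results.

    `detect` is a hypothetical stand-in for the device detection pipeline.
    """
    count = 0
    for evidence in records:
        print("---", file=output)  # YAML document separator
        merged = dict(evidence)
        merged.update(detect(evidence))
        for key, value in merged.items():
            # json.dumps yields a scalar that is valid YAML as well as JSON.
            print(f"{key}: {json.dumps(value)}", file=output)
        count += 1
    print("...", file=output)  # YAML document end marker
    return count


def fake_detect(evidence):
    # Placeholder logic only; real detection is done by the pipeline.
    user_agent = evidence.get("header.user-agent", "")
    return {"device.ismobile": "Mobile" in user_agent}


records = [
    {"header.user-agent": "Mozilla/5.0 (Linux; Android 12) Mobile Safari/537.36"},
    {"header.user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
]
buffer = io.StringIO()
processed = process_records(records, fake_detect, buffer)
print(buffer.getvalue())
```

Because each record is written as soon as it has been processed, memory use stays flat regardless of how many records the input contains.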

As well as explaining the basic operation of offline processing using the defaults, this example can be used for more advanced experiments: tuning device detection for performance and predictive power using the Performance Profile, Graph, and Difference and Drift settings.
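When experimenting with these settings, it helps to measure throughput the same way for every configuration. The following is a minimal sketch of such a timing harness. `measure_throughput` and `dummy_process` are hypothetical names, and `dummy_process` merely stands in for creating flow data and processing it with a configured pipeline.

```python
import time


def measure_throughput(process, records):
    """Return (records_processed, records_per_second) for one pass."""
    start = time.perf_counter()
    count = 0
    for record in records:
        process(record)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero interval on very fast runs.
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate


def dummy_process(record):
    # Hypothetical stand-in for the real detection call.
    return len(record)


sample = [{"header.user-agent": f"agent-{i}"} for i in range(1000)]
count, rate = measure_throughput(dummy_process, sample)
print(f"{count} records at {rate:,.0f} records/second")
```

Running the same harness once per setting, with identical input, gives figures that can be compared directly.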

This example is available in full on GitHub.

This example requires a local data file. The free 'Lite' data file can be acquired by pulling the git submodules under this repository (run `git submodule update --recursive`) or from the device-detection-data GitHub repository.
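The example locates the data file with `ExampleUtils.find_file`. The sketch below is not that implementation; it is a simplified, hypothetical equivalent (`find_in_parents`) that searches the starting directory and a limited number of parent directories. The file name used in the demonstration is an assumption; the example itself takes the name from `LITE_DATAFILE_NAME`.

```python
from pathlib import Path
import tempfile


def find_in_parents(file_name, start_dir, max_levels=3):
    """Search start_dir, then up to max_levels parents, including subfolders.

    Returns the first matching Path, or None if nothing is found.
    """
    start = Path(start_dir).absolute()
    candidates = [start, *start.parents][: max_levels + 1]
    for directory in candidates:
        match = next(directory.rglob(file_name), None)
        if match is not None:
            return match
    return None


# Demonstration in a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    data_dir = root / "device-detection-data"
    start = root / "examples" / "onpremise"
    data_dir.mkdir()
    start.mkdir(parents=True)
    (data_dir / "51Degrees-LiteV4.1.hash").touch()  # assumed file name
    found = find_in_parents("51Degrees-LiteV4.1.hash", start, max_levels=2)
    missing = find_in_parents("no-such-file.hash", start, max_levels=2)
    print(found, missing)
```

Limiting the number of parent levels keeps the recursive search from scanning large parts of the file system.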

The Lite data file is only used for illustration, and has limited accuracy and capabilities. Find out about the more capable data files that are available on our pricing page.

Required PyPI dependencies:

fiftyone_devicedetection
ruamel.yaml

# *********************************************************************
# This Original Work is copyright of 51 Degrees Mobile Experts Limited.
# Forbury Square, Reading, Berkshire, United Kingdom RG1 3EU.
#
# This Original Work is licensed under the European Union Public Licence
# (EUPL) v.1.2 and is subject to its terms as set out below.
#
# If a copy of the EUPL was not distributed with this file, You can obtain
# one at https://opensource.org/licenses/EUPL-1.2.
#
# The 'Compatible Licences' set out in the Appendix to the EUPL (as may be
# amended by the European Commission) shall be deemed incompatible for
# the purposes of the Work and the provisions of the compatibility
# clause in Article 5 of the EUPL shall not apply.
#
# If using the Work as, or as part of, a network application, by
# including the attribution notice(s) required under Article 5 of the EUPL
# in the end user terms of the application under an appropriate heading,
# such notice(s) shall fulfill the requirements of that article.
# *********************************************************************

from pathlib import Path
import sys
from fiftyone_devicedetection.devicedetection_pipelinebuilder import DeviceDetectionPipelineBuilder
from fiftyone_devicedetection_examples.example_utils import ExampleUtils
from fiftyone_pipeline_core.logger import Logger
from fiftyone_devicedetection_shared.example_constants import LITE_DATAFILE_NAME
from fiftyone_devicedetection_shared.example_constants import EVIDENCE_FILE_NAME
from ruamel.yaml import YAML

class OfflineProcessing():
    def run(self, data_file, evidence_yaml, logger, output):
        """!
        Process a YAML representation of evidence - and create a YAML output containing
        the processed evidence.
        @param data_file: The path to the device detection data file
        @param evidence_yaml: File containing the yaml representation of the evidence to process
        @param logger: Logger to use within the pipeline
        @param output: Output file to write results to
        """

        # In this example, we use the DeviceDetectionPipelineBuilder
        # and configure it in code. For more information about
        # pipelines in general see the documentation at
        # https://51degrees.com/documentation/_concepts__configuration__builders__index.html
        pipeline = DeviceDetectionPipelineBuilder(
            data_file_path = data_file,
            # We use the low memory profile as its performance is
            # sufficient for this example. See the documentation for
            # more detail on this and other configuration options:
            # https://51degrees.com/documentation/_device_detection__features__performance_options.html
            # https://51degrees.com/documentation/_features__automatic_datafile_updates.html
            # https://51degrees.com/documentation/_features__usage_sharing.html
            performance_profile = "LowMemory",
            # Inhibit usage sharing for this example; usually this
            # should be set to True.
            # In general, offline processing usage should NOT be shared back to 51Degrees.
            # This is because it will not contain the full set of information that is
            # required by our data processing back-end and will be discarded.
            # If you specifically want to share data that is being processed offline
            # in order to help us improve detection of new devices/browsers/etc, then
            # this additional data will need to be collected and included as evidence
            # to the Pipeline. See
            # https://51degrees.com/documentation/_features__usage_sharing.html#Low_Level_Usage_Sharing
            # for more details on this.
            usage_sharing = False,
            # Inhibit auto-update of the data file for this example
            auto_update = False,
            licence_keys = "").add_logger(logger).build()

        records = 0
        yaml = YAML()
        yaml_data = yaml.load_all(evidence_yaml)

        try:
            # Keep going as long as we have more document records.
            for evidence in yaml_data:
                # Output progress.
                records = records + 1
                if (records % 100 == 0):
                    logger.log("info", f"Processed {records} records")

                # Write the YAML document separator
                print("---", file = output)
                # Pass the record to the pipeline as evidence so that it can be analyzed
                headers = {}
                for key in evidence:
                    # Add the "header." prefix only when the key does not already have it
                    name = key if key.startswith("header.") else f"header.{key}"
                    headers[name] = evidence[key]

                self.analyseEvidence(headers, pipeline, output, yaml)
        except BaseException as err:
            # We can't read the evidence values, so can't write them to the output. Will just
            # have to skip this entry.
            logger.log("error", err)

        # Write the YAML document end marker
        print("...", file = output)

        ExampleUtils.check_data_file(pipeline, logger)

    def analyseEvidence(self, evidence, pipeline, output, yaml):
        # FlowData is a data structure that is used to convey information required for
        # detection and the results of the detection through the pipeline.
        # Information required for detection is called "evidence" and usually consists
        # of a number of HTTP Header field values, in this case represented by a
        # dictionary of header name/value entries.
        data = pipeline.create_flowdata()
        # Add the evidence values to the flow data
        data.evidence.add_from_dict(evidence)
        # Process the flow data.
        data.process()

        device = data.device

        values = {}
        # Add the evidence values to the output
        for key in evidence:
            values[key] = evidence[key]
        # Now add the values that we want to store against the record.
        values["device.ismobile"] = device.ismobile.value() if device.ismobile.has_value() else "Unknown"
        values["device.platformname"] = ExampleUtils.get_human_readable(device, "platformname")
        values["device.platformversion"] = ExampleUtils.get_human_readable(device, "platformversion")
        values["device.browsername"] = ExampleUtils.get_human_readable(device, "browsername")
        values["device.browserversion"] = ExampleUtils.get_human_readable(device, "browserversion")
        # DeviceId is a unique identifier for the combination of hardware, operating
        # system, browser and crawler that has been detected.
        # Our device detection solution uses machine learning to find the optimal
        # way to identify devices based on the real-world evidence values that we
        # observe each day.
        # As this changes over time, the result of detection can potentially change
        # as well. By storing the device id, we can use this as a lookup in future
        # rather than performing detection with the original evidence again.
        # Do this by passing an evidence entry with:
        # key = query.51D_ProfileIds
        # value = [the device id]
        # This is much faster and avoids the potential for getting a different
        # result.
        values["device.deviceid"] = ExampleUtils.get_human_readable(device, "deviceid")
        yaml.dump(values, output)

def main(argv):
    # In this example, by default, the 51degrees "Lite" file needs to be
    # somewhere in the project space, or you may specify another file as
    # a command line parameter.
    #
    # Note that the Lite data file is only used for illustration, and has
    # limited accuracy and capabilities.
    # Find out about the Enterprise data file on our pricing page:
    # https://51degrees.com/pricing
    data_file = argv[0] if len(argv) > 0 else ExampleUtils.find_file(LITE_DATAFILE_NAME)
    # This file contains the 20,000 most commonly seen combinations of header values
    # that are relevant to device detection. For example, User-Agent and UA-CH headers.
    evidence_file = argv[1] if len(argv) > 1 else ExampleUtils.find_file(EVIDENCE_FILE_NAME)
    # Finally, get the location for the output file. Use the same location as the
    # evidence if a path is not supplied on the command line.
    output_file = argv[2] if len(argv) > 2 else Path.joinpath(Path(evidence_file).absolute().parent, "offline-processing-output.yml")

    # Configure a logger to output to the console.
    logger = Logger(min_level="info")

    if data_file is not None:
        with open(output_file, "w") as output:
            with open(evidence_file, "r") as evidence_input:
                OfflineProcessing().run(data_file, evidence_input, logger, output)
        logger.log("info",
            f"Processing complete. See results in: '{output_file}'")
    else:
        logger.log("error",
            "Failed to find a device detection data file. Make sure the " +
            "device-detection-data submodule has been updated by running " +
            "`git submodule update --recursive`.")

if __name__ == "__main__":
    main(sys.argv[1:])
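The comments in `analyseEvidence` describe reusing a stored device id as evidence under the key `query.51D_ProfileIds`. A minimal sketch of building that evidence from a stored output record follows. The helper name `evidence_from_device_id` is hypothetical, and the device id shown is made up for illustration.

```python
def evidence_from_device_id(device_id):
    """Build evidence that asks the pipeline to look up a stored device id.

    Hypothetical helper; only the evidence key comes from the example above.
    """
    if not device_id:
        raise ValueError("a stored device id is required")
    return {"query.51D_ProfileIds": device_id}


# A record as it might appear in the output file (made-up id).
stored = {"device.deviceid": "12280-81243-82102-0"}
lookup_evidence = evidence_from_device_id(stored["device.deviceid"])
print(lookup_evidence)

# The resulting dictionary would then be passed to the flow data in the same
# way as the header evidence: data.evidence.add_from_dict(lookup_evidence)
```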

51Degrees Device Detection Python 4.4

onpremise/offlineprocessing.py