51Degrees Pipeline Documentation 4.1

Automatic Data File Updates

Introduction

On-premise engines added to a Pipeline can have their data files registered for automatic updates. When a new version of a data file becomes available, it is downloaded and the on-premise engine is refreshed with it. The location from which the data file was loaded can also be monitored for changes, so that when a data file is replaced manually, the on-premise engine is refreshed with the new file.

Registering for Updates

All data files added to an on-premise engine have the option to enable automatic updates. When this option is enabled, the data file is registered automatically when the on-premise engine is added to a Pipeline.

A data file can also be registered for automatic updates manually, by registering it with the data update service. This works in exactly the same way as registration by the pipeline builder, and in most cases it is unnecessary because the pipeline builder does it automatically.

Configuration

There are a number of configuration options available when registering a data file for automatic updates, which specify when and how the data file is updated.

Data Update URL

To download a new data file when one becomes available, the data update service must have a URL to download it from. This can be a constant URL, or a URL formatter can be used to dynamically generate the URL based on other options.

License Keys

A license key may be required when downloading certain types of data file. The data update service uses the license key, in combination with a URL formatter, to ensure the data file is only made available to licensed users.

File Watcher

The location of the data file in the file system can be monitored by enabling the file system watcher. If the data file changes, then the data update service will be called to refresh the on-premise engine using that file. This can be useful when distributing data files to a local cluster.
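The idea behind the file watcher can be sketched as a simple change detector. This is only an illustration in Python using the standard library, not the Pipeline's own implementation; the class name `FileChangeDetector` and its method are hypothetical. It assumes a change is visible as a different modification time or file size.

```python
import os


class FileChangeDetector:
    """Hypothetical helper: reports when a data file on disk has been replaced."""

    def __init__(self, path):
        self.path = path
        self._last_seen = self._signature()

    def _signature(self):
        # Modification time plus size is a cheap fingerprint of the file.
        info = os.stat(self.path)
        return (info.st_mtime_ns, info.st_size)

    def has_changed(self):
        """Return True once for each observed change to the file."""
        current = self._signature()
        if current != self._last_seen:
            self._last_seen = current
            return True
        return False
```

A caller would check `has_changed()` periodically and, when it returns True, ask the engine to refresh from the file. A real watcher would more likely rely on operating system file notifications than on polling.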

Polling Interval

The polling interval tells the data update service how often to check for a new data file when the expected date of the next update is not known. If the data file itself provides the date when the next update is expected, the data update service will not check for updates at all until that date has passed.
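The scheduling rule described above could be expressed roughly as follows. This is a minimal sketch: the function name and parameters are hypothetical, and it assumes that once the expected date has passed without a new file, the service falls back to the polling interval.

```python
from datetime import datetime, timedelta, timezone


def next_check_time(now, polling_interval, next_update_expected=None):
    """Hypothetical helper: decide when to next look for a new data file."""
    # If the data file publishes a future update date, wait until then.
    if next_update_expected is not None and next_update_expected > now:
        return next_update_expected
    # Otherwise the date is unknown (or already past), so poll at the interval.
    return now + polling_interval
```

For example, with a 30 minute polling interval and no published date, the next check is 30 minutes from now; with a published date two days ahead, no check happens until that date.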

Randomization

In large clusters of servers, it is beneficial to stagger an update. If all servers download a new data file and refresh at the same time, the service's overall performance can be affected. To prevent this, the randomization option adds a random time interval to the time at which the new data file is downloaded. For example, if there are 10 servers, and a full download and refresh takes around 10 seconds, it is sensible to set the randomization to well above 10 seconds. The larger the randomization window is relative to the time a single update takes, the fewer servers will be updating at any one time.
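As an illustration of the randomization option, the sketch below adds a uniformly distributed delay to a scheduled update time. The function name and the choice of a uniform distribution are assumptions made for this example, not a description of how the data update service draws its delay.

```python
import random
from datetime import datetime, timedelta


def randomized_update_time(scheduled, max_randomization, rng=random):
    """Hypothetical helper: delay an update by a random amount up to max_randomization."""
    # Each server draws its own offset, so downloads are spread over the window.
    offset_seconds = rng.uniform(0, max_randomization.total_seconds())
    return scheduled + timedelta(seconds=offset_seconds)
```

With 10 servers and a 10 second update, a window of a few minutes spreads the updates far enough apart that overlaps become uncommon, although a random delay cannot rule them out entirely.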

URL Formatter

Where an on-premise engine needs to download a data file from a URL which is not constant, a URL formatter is used. On-premise engines generally provide the correct URL formatter automatically, but the option to override this is available.

URL formatters are necessary in many cases where multiple data files are available for an on-premise engine. For example, the required format or version of the data file may need to be specified as a parameter in the URL. The URL formatter handles this by inspecting the current data file to determine what is needed.
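To make the role of a URL formatter concrete, here is a minimal sketch that builds a download URL from a license key and details of the current data file. The base URL, the query parameter names (`LicenseKeys`, `Format`, `Version`) and the function itself are placeholders invented for this example; the real parameters depend on the engine and the download server.

```python
from urllib.parse import urlencode


def format_data_update_url(base_url, license_key, data_format, version):
    """Hypothetical URL formatter: append the details the server needs as query parameters."""
    query = urlencode({
        "LicenseKeys": license_key,  # placeholder parameter name
        "Format": data_format,       # placeholder parameter name
        "Version": version,          # placeholder parameter name
    })
    return f"{base_url}?{query}"


url = format_data_update_url("https://example.com/download", "ABC123", "hash", "4.1")
# url == "https://example.com/download?LicenseKeys=ABC123&Format=hash&Version=4.1"
```

Using `urlencode` keeps the license key and other values correctly escaped, which matters if a key contains characters that are not safe in a URL.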

Temporary File

It is good practice to set a data file to be copied to a temporary location for use by an on-premise engine. This means that whatever mode the file is being used in (e.g. in memory or streamed from file) an update can occur smoothly.

When the on-premise engine is set to use a temporary file location, the original data file is free to be changed by the data update service. Once the file has been replaced, the on-premise engine is informed, and it manages the removal of the old temporary file and the creation of a new one.
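The temporary file approach can be sketched as two steps: copy the data file to a temporary location for the engine to use, then swap in a fresh copy after the original has been replaced. The function names below are hypothetical, and the sketch assumes the engine has already stopped using the old temporary copy by the time it is removed.

```python
import os
import shutil
import tempfile


def copy_to_temp(data_file_path, temp_dir=None):
    """Hypothetical helper: copy the data file to a temporary file and return its path."""
    suffix = os.path.splitext(data_file_path)[1]
    fd, temp_path = tempfile.mkstemp(suffix=suffix, dir=temp_dir)
    os.close(fd)  # mkstemp opens the file; close it before copying over it
    shutil.copyfile(data_file_path, temp_path)
    return temp_path


def refresh_temp_copy(data_file_path, old_temp_path, temp_dir=None):
    """Hypothetical helper: create a new temporary copy, then remove the old one."""
    new_temp_path = copy_to_temp(data_file_path, temp_dir)
    # Assumes nothing still holds the old copy open when it is removed.
    os.remove(old_temp_path)
    return new_temp_path
```

Because the engine only ever reads the temporary copy, the original file can be overwritten at any time without disturbing requests that are in progress.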

Decompression

Data files are often served as GZipped content from their download URL to minimize the amount of data which needs to be downloaded. When this is the case, the data update service decompresses the data file before continuing with the update.

Usually an on-premise engine will set this option, along with the URL or URL formatter. If an alternative URL has been set, however, this option may need to be overridden too.
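A minimal sketch of the decompression step, using Python's standard `gzip` module, is shown below. The function name and the `decompress` flag are hypothetical stand-ins for the option described above.

```python
import gzip
import shutil


def store_download(downloaded_path, output_path, decompress):
    """Hypothetical helper: write the downloaded file to output_path, unpacking GZip if asked."""
    if decompress:
        # Stream the content so large data files are never held fully in memory.
        with gzip.open(downloaded_path, "rb") as src, open(output_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    else:
        shutil.copyfile(downloaded_path, output_path)
```

If the flag is enabled for a URL that serves uncompressed files, `gzip.open` raises an error on read, which is why the option may need overriding when an alternative URL is used.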

Verify MD5

A server will often provide the MD5 hash of the data file it has served in the 'Content-MD5' response header. The hash of the file that was actually downloaded can then be checked against this value to ensure the integrity of the data file. This option is usually enabled by default, but not all download servers support it.
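The check can be sketched as follows. The function names are hypothetical. One detail is an assumption: RFC 1864 defines 'Content-MD5' as the base64 encoding of the digest, but some servers send a hexadecimal string instead, so this sketch accepts either form rather than assuming which one a given download server uses.

```python
import base64
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest


def matches_content_md5(path, header_value):
    """Hypothetical helper: compare a downloaded file against a Content-MD5 header value."""
    digest = file_md5(path)
    expected = header_value.strip()
    hex_form = digest.hexdigest()
    b64_form = base64.b64encode(digest.digest()).decode("ascii")
    # Hex is case-insensitive; base64 is not.
    return expected.lower() == hex_form or expected == b64_form
```

If the comparison fails, the downloaded file should be discarded rather than used to refresh the engine.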

Verify 'If-Modified-Since'

Unnecessary downloads can be prevented by providing the download server with an 'If-Modified-Since' HTTP header. If this option is enabled (which it is by default for most on-premise engines), the 'If-Modified-Since' header is set to the date on which the current data file was last modified. If there is no newer data file on the server, the server reports that nothing has changed and no file is downloaded.
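As a rough illustration, the header can be built from the current data file's modification time, and a 304 (Not Modified) response treated as "nothing to download". The function names are hypothetical, and the sketch assumes the file's modification time on disk is the date the service reports.

```python
import os
from email.utils import formatdate


def conditional_headers(data_file_path):
    """Hypothetical helper: build the If-Modified-Since header for the current data file."""
    mtime = os.path.getmtime(data_file_path)
    # HTTP dates are always expressed in GMT.
    return {"If-Modified-Since": formatdate(mtime, usegmt=True)}


def has_newer_file(status_code):
    """A 304 response means the server holds nothing newer than the given date."""
    return status_code != 304
```

The resulting dictionary can be passed as request headers to whichever HTTP client performs the download.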