
Automatic Data Updates With 51Degrees

Keep your detection engine accurate, synchronized, and maintenance-free with intelligent automatic data file updates.

51Degrees Device Detection

In dynamic, real-world environments, device characteristics and detection rules constantly evolve. Browser versions change, new devices enter the market, and User-Agents shift. If your device detection system does not update its underlying data files regularly, it will become stale, and misclassifications or missed detections will creep in.

That’s where automatic data file updates in the 51Degrees Pipeline API come in. This feature ensures your On-premise engines stay current with the latest detection data with minimal operational overhead.

What are automatic data file updates?

When you deploy an On-premise Engine inside a Pipeline, each engine depends on one or more data files (e.g. for hardware, browser, capabilities). With automatic updates enabled:

  • The engine is registered to receive updates when a new version of the data file becomes available.
  • When an update is published, the system downloads the new file and refreshes the engine transparently.
  • Optionally, a file system watcher can monitor the data file location; if someone manually replaces the file, the engine is refreshed accordingly.

So, whether an update comes from an authoritative source or someone swaps the file manually, your engine can adapt and stay synchronized.

Key configuration options

When registering a data file for automatic updates, there are a number of configuration parameters you need to think about. Below are the principal ones:

Data update URL / URL formatter

The system must know where to fetch new data files. You can either:

  • Use a static URL, or
  • Use a URL formatter (a pattern) that dynamically produces the correct URL (for example, varying by version or format).

Engines often provide a default formatter; overriding is only needed in special scenarios.
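To make the distinction concrete, here is a small Python sketch (not the 51Degrees API; the host names, query parameters, and values are invented for illustration) showing a static URL alongside a formatter that builds the URL from properties of the engine:

# Option 1: a fixed location that always serves the latest file.
STATIC_UPDATE_URL = "https://myhost.net/51Ddatafile"

# Option 2: a formatter that builds the URL from properties of the engine,
# e.g. the data type and file format it needs (names here are illustrative).
def format_update_url(base_url: str, data_type: str, file_format: str) -> str:
    return f"{base_url}/download?Type={data_type}&Format={file_format}"

url = format_update_url("https://distributor.example.net", "HashV41", "Gzip")
# url == "https://distributor.example.net/download?Type=HashV41&Format=Gzip"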

License Key

Some data files are license-protected. In those cases, the update service will include the License Key (often via query parameters) to authenticate the request.
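As a minimal sketch of that idea (the endpoint and the "LicenseKeys" parameter name are assumptions, not the documented contract), the key can simply be appended as a query parameter:

from urllib.parse import urlencode

def with_license_key(update_url: str, license_key: str) -> str:
    # Attach the key as a query parameter so the server can authorise the download.
    # The parameter name "LicenseKeys" is only an assumption for this sketch.
    return f"{update_url}?{urlencode({'LicenseKeys': license_key})}"

print(with_license_key("https://distributor.example.net/download", "KEY"))
# https://distributor.example.net/download?LicenseKeys=KEY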

File system watcher

If enabled, the system monitors the local filesystem location of the data file. When the file is replaced (e.g. by an external deployment tool), the engine is refreshed automatically. This is particularly useful in cluster scenarios or when distributing data via network shares.
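A bare-bones stand-in for such a watcher, written in plain Python against the file's modification time (production implementations typically use native file-system notifications instead), might look like this:

import os
import time

def watch_data_file(path: str, refresh_engine, poll_seconds: int = 5):
    # Refresh the engine whenever the data file on disk is replaced.
    last_mtime = os.path.getmtime(path)
    while True:
        time.sleep(poll_seconds)
        mtime = os.path.getmtime(path)
        if mtime != last_mtime:
            last_mtime = mtime
            refresh_engine()  # e.g. tell the engine to reload the file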

Polling interval

If the next update date is unknown, the system can periodically poll the update server to check for new releases. This interval is configurable.

If the data file itself states when the next update is expected, polling may be suppressed until that date arrives.
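In scheduling terms, the next check is driven either by the next expected update date in the current file's metadata or, failing that, by the configured polling interval. A hedged sketch of that decision in Python (field and function names are invented):

from datetime import datetime, timedelta

POLLING_INTERVAL = timedelta(seconds=14400)  # e.g. check roughly every four hours

def next_check_time(next_update_expected: datetime | None) -> datetime:
    now = datetime.utcnow()
    if next_update_expected is not None and next_update_expected > now:
        # The current data file says when the next release is due; wait until then.
        return next_update_expected
    # Otherwise fall back to plain periodic polling.
    return now + POLLING_INTERVAL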

Randomization / Staggering

In a cluster of many nodes, you don’t want all servers to download and refresh simultaneously — that could spike load or contention. The randomization parameter adds a random delay to each node’s update schedule, spreading the work over time.
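Continuing the scheduling sketch above, the randomised offset simply pushes each node's check time out by a different amount; the maximum below mirrors the UpdateRandomisationMax value used in the configuration example later on:

import random
from datetime import timedelta

RANDOMISATION_MAX = 600  # seconds; mirrors UpdateRandomisationMax in the example below

def staggered(check_time):
    # Each node adds its own random delay so a cluster does not update in lockstep.
    return check_time + timedelta(seconds=random.uniform(0, RANDOMISATION_MAX))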

Temporary file copy & safe replacement

Best practice dictates using a temporary file copy. The engine continues to read from its current file while the new version is downloaded to a temporary location; when the download completes, the engine is notified to switch. That way, you avoid torn or partial file reads mid-update.
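One straightforward way to realise this pattern in plain Python (a sketch, not the Pipeline's own implementation) is to download into a temporary file in the same directory and swap it in atomically once complete:

import os
import shutil
import tempfile
import urllib.request

def download_and_swap(url: str, data_file: str, refresh_engine):
    target_dir = os.path.dirname(os.path.abspath(data_file))
    # Download to a temp file alongside the live file; the engine keeps
    # reading the old copy until the new one is complete.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as response:
        shutil.copyfileobj(response, out)
    # os.replace is atomic on the same filesystem, so readers never see a partial file.
    os.replace(tmp_path, data_file)
    refresh_engine()  # notify the engine that new data is in place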

Data decompression

Many distributions serve data files in compressed formats (e.g. GZipped). The update service can transparently decompress the file after download before handing it off to the engine.
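For instance, if the server delivers a GZipped payload, a standard-library sketch of the decompression step (assuming the download has already been written to disk) is only a few lines:

import gzip
import shutil

def decompress_data_file(compressed_path: str, output_path: str) -> None:
    # Stream-decompress the downloaded .gz payload into the file the engine expects.
    with gzip.open(compressed_path, "rb") as src, open(output_path, "wb") as dst:
        shutil.copyfileobj(src, dst)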

Integrity checks (MD5) & conditional downloads

To prevent corruption or unnecessary traffic:

  • MD5 verification: The service can compare the downloaded file’s MD5 hash with the Content-MD5 header in HTTP responses. By default, this is enabled (if the server supports it).
  • If-Modified-Since: To avoid re-downloading an unchanged file, the HTTP If-Modified-Since header can be sent. If the server responds with “not modified,” no new download occurs. This is often enabled by default.
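The Python sketch below illustrates both mechanisms with the standard library. It assumes the server sends Content-MD5 as a hex digest and honours If-Modified-Since; some servers send the header base64-encoded per RFC 1864 or omit it entirely, so adjust the comparison to match your server:

import hashlib
import urllib.error
import urllib.request
from email.utils import formatdate

def conditional_download(url: str, last_modified_ts: float | None) -> bytes | None:
    request = urllib.request.Request(url)
    if last_modified_ts is not None:
        # Ask the server to skip the body if nothing has changed since our copy.
        request.add_header("If-Modified-Since", formatdate(last_modified_ts, usegmt=True))
    try:
        with urllib.request.urlopen(request) as response:
            body = response.read()
            expected_md5 = response.headers.get("Content-MD5")
            # Assumes a hex digest; adapt if your server sends base64 (RFC 1864).
            if expected_md5 and hashlib.md5(body).hexdigest() != expected_md5.lower():
                raise ValueError("MD5 mismatch: the download may be corrupt")
            return body
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep using the existing file
            return None
        raise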

Considerations & recommendations for large clusters

Automatic updates are powerful, but in environments with many nodes, naive usage can lead to unintended load or exceeded service quotas. Here are some best practices:

  • The Distributor API (which hosts official data updates) imposes rate or quota limits (e.g. 100 requests per License Key per 30 minutes).
  • If many nodes independently poll, you may exceed those limits. A better pattern is to have one designated updater node (or a small cluster) pull the file, then distribute it internally via shared storage, push tools, and so on (a sketch of this pattern follows this list).
  • Use shared file locations (a network mount) with file system watching enabled. Nodes then refresh as the shared file is replaced. But beware: if many nodes see the change at the same instant, they may all refresh simultaneously, causing a spike.
  • To mitigate that, you can:
    • Use staggered deployment (update subsets or alternate locations gradually), or
    • Use multiple shared endpoints to spread out update times.
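Putting the single-updater recommendation into code terms, here is a sketch (paths, timings, and function names are invented) of the updater node publishing to a shared mount, with each consuming node staggering its reload when its watcher fires:

import os
import random
import shutil
import time

SHARED_DATA_FILE = "/mnt/shared/51degrees/TAC-HashV41.hash"  # network mount (example path)

def publish_update(downloaded_path: str) -> None:
    # Run on the single designated updater node after it fetches a new file.
    tmp = SHARED_DATA_FILE + ".tmp"
    shutil.copyfile(downloaded_path, tmp)
    os.replace(tmp, SHARED_DATA_FILE)  # atomic swap on the shared location

def on_shared_file_changed(refresh_engine, max_stagger_seconds: int = 600) -> None:
    # Run on every consuming node when its file watcher notices the change.
    # Sleep a random amount so the whole cluster does not reload at once.
    time.sleep(random.uniform(0, max_stagger_seconds))
    refresh_engine()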

Enabling the “MaxPerformance” mode is beneficial when using shared files across nodes (so that data is fully loaded into memory rather than streaming).

Example configuration snippet

Here’s a JSON-style example showing how one might configure a device detection engine to auto-update from a self-hosted URL:

{ "BuilderName": "DeviceDetectionHashEngine", "BuildParameters": { "DataFile": "data/TAC-HashV41.hash", "TempDirPath": "data/tmp", "AutoUpdate": true, "DataUpdateOnStartup": true, "UpdatePollingInterval": 14400, "UpdateRandomisationMax": 600, "CreateTempDataCopy": true, "DataUpdateUrl": "https://myhost.net/51Ddatafile", "DataUpdateVerifyMd5": false, "DataUpdateUseUrlFormatter": false, "DataUpdateLicenseKey": "KEY" } }

Key points

  • AutoUpdate: Enables automatic updates
  • DataUpdateUrl: Provides the location to check
  • UpdatePollingInterval & UpdateRandomisationMax: govern timing
  • CreateTempDataCopy: ensures safe replacement
  • DataUpdateVerifyMd5 is disabled here, as appropriate when your server cannot provide an MD5 hash
  • DataUpdateUseUrlFormatter is false because the URL is static in this example

Also note: once the “next published date” is reached (tracked within the data file metadata), nodes will start checking, even if your URL is static, so supporting If-Modified-Since is wise.

Summary & best practices

  • Auto updates: keep your engine up to date without manual intervention. Recommendation: enable by default unless your environment disallows it.
  • URL / URL formatting: allows flexibility in where and how you fetch files. Recommendation: use a static URL when simple; use a formatter when multiple versions are involved.
  • MD5 & conditional headers: prevent corruption and redundant downloads. Recommendation: enable these checks when your server supports them.
  • Randomization: prevents load spikes in clusters. Recommendation: use non-zero randomization bounds.
  • Shared file / push model: gives you scalability and rate-limiting control. Recommendation: use a single updater plus internal distribution for large clusters.
  • File watcher: supports manual or push-based file changes. Recommendation: enable it when filesystem changes should trigger refreshes.

When designed thoughtfully, automatic updates remove a heavy operational burden and ensure your detection logic remains sharp and current — without downtime or stale data.

Need help or have questions?

We understand that every implementation is unique, and setting up automatic data file updates can raise questions. Whether you’re troubleshooting configuration, optimizing for large clusters, or simply exploring best practices, our team is here to help.

Stay ahead of the curve: let 51Degrees handle updates so you can focus on what matters. Switch On Auto-Updates.