In dynamic, real-world environments, device characteristics and detection rules constantly evolve. Browser versions change, new devices enter the market, and User-Agents shift. If your device detection system does not update its underlying data files regularly, its data goes stale, and misclassifications or missed detections creep in.
That’s where automatic data file updates in the 51Degrees Pipeline API come in. This feature ensures your On-premise engines stay current with the latest detection data with minimal operational overhead.
What are automatic data file updates?
When you deploy an On-premise Engine inside a Pipeline, each engine depends on one or more data files (e.g. for hardware, browser, capabilities). With automatic updates enabled:
- The engine is registered to receive updates when a new version of the data file becomes available.
- When an update is published, the system downloads the new file and refreshes the engine transparently.
- Optionally, a file system watcher can monitor the data file location; if someone manually replaces the file, the engine is refreshed accordingly.
So, whether an update comes from an authoritative source or someone swaps the file manually, your engine can adapt and stay synchronized.
Key configuration options
When registering a data file for automatic updates, several configuration parameters matter. The principal ones are described below.
Data update URL / URL formatter
The system must know where to fetch new data files. You can either:
- Use a static URL, or
- Use a URL formatter (a pattern) that dynamically produces the correct URL (for example, varying by version or format).
Engines often provide a default formatter; overriding is only needed in special scenarios.
License Key
Some data files are license-protected. In those cases, the update service will include the License Key (often via query parameters) to authenticate the request.
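As a rough illustration of the two options above, here is a minimal Python sketch of a URL formatter that appends a License Key and other parameters to a base URL. The function name `format_update_url` and the query parameter names `LicenseKeys` and `Type` are assumptions made for this example; check your engine's default formatter for the names it really uses.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def format_update_url(base_url: str, license_key: Optional[str] = None, **params: str) -> str:
    """Return base_url with extra query parameters and, if given, a license key."""
    parts = urlsplit(base_url)
    # Keep any query parameters already present on the base URL.
    query = dict(parse_qsl(parts.query))
    query.update(params)
    if license_key:
        # "LicenseKeys" is a hypothetical parameter name used for illustration.
        query["LicenseKeys"] = license_key
    return urlunsplit(parts._replace(query=urlencode(query)))
```

With no key and no extra parameters the function returns the base URL unchanged, which corresponds to the static URL case.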
File system watcher
If enabled, the system monitors the local filesystem location of the data file. When the file is replaced (e.g. by an external deployment tool), the engine is refreshed automatically. This is particularly useful in cluster scenarios or when distributing data via network shares.
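The idea of reacting to a replaced file can be sketched in a few lines of Python. This is a simple polling stand-in, not how the Pipeline API implements its watcher: it assumes that a change in modification time or size means the file was replaced, and the helper names `file_signature` and `changed_since` are hypothetical.

```python
import os
from typing import Optional, Tuple

# (modification time in nanoseconds, size in bytes)
Signature = Tuple[int, int]


def file_signature(path: str) -> Signature:
    """Return a cheap fingerprint of the file at path."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def changed_since(path: str, last: Optional[Signature]) -> Tuple[bool, Signature]:
    """Report whether the file differs from the last seen signature."""
    current = file_signature(path)
    return (current != last, current)
```

A caller would store the returned signature and trigger an engine refresh whenever the first element of the result is true.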
Polling interval
If the next update date is unknown, the system can periodically poll the update server to check for new releases. This interval is configurable.
If the data file itself states when the next update is expected, polling may be suppressed until that date arrives.
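The scheduling rule described above can be sketched as follows. This is an illustrative Python function under stated assumptions, not the update service's actual logic: `next_check_time` is a hypothetical name, and the polling interval is assumed to be given in seconds.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def next_check_time(
    now: datetime,
    polling_interval_s: int,
    next_expected: Optional[datetime],
) -> datetime:
    """Decide when to contact the update server next."""
    # If the data file announces a future update date, wait until then.
    if next_expected is not None and next_expected > now:
        return next_expected
    # Otherwise (date unknown or already passed), fall back to polling.
    return now + timedelta(seconds=polling_interval_s)
```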
Randomization / Staggering
In a cluster of many nodes, you don’t want all servers to download and refresh simultaneously, since that could cause a spike in load or contention. The randomization parameter adds a random delay to each node’s update schedule, spreading the work over time.
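A minimal Python sketch of the staggering idea: each node adds a random delay, bounded by a maximum, to its scheduled check time. The function name `staggered` is hypothetical, and a uniform distribution over seconds is an assumption made for the example.

```python
import random
from datetime import datetime, timedelta
from typing import Optional


def staggered(
    check_at: datetime,
    randomisation_max_s: float,
    rng: Optional[random.Random] = None,
) -> datetime:
    """Shift check_at by a random delay in [0, randomisation_max_s] seconds."""
    rng = rng if rng is not None else random.Random()
    delay_s = rng.uniform(0, randomisation_max_s)
    return check_at + timedelta(seconds=delay_s)
```

Because every node draws its own delay, checks that would otherwise land on the same instant are spread across the randomization window.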
Temporary file copy & safe replacement
Best practice is to use a temporary file copy. The engine keeps reading from its current file while a new version is downloaded to a temporary location. When the download completes, the engine is notified to switch, which avoids torn or partial file reads mid-update.
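The general write-then-swap pattern behind this advice can be sketched in Python. This is a generic illustration using an atomic rename, not the Pipeline API's own implementation; `replace_data_file` is a hypothetical helper, and it assumes the temporary file and the target live on the same filesystem.

```python
import os
import tempfile


def replace_data_file(new_bytes: bytes, target_path: str) -> None:
    """Write new_bytes to a temporary file, then swap it into place atomically."""
    directory = os.path.dirname(os.path.abspath(target_path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(new_bytes)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes reach the disk first
        # Readers see either the old file or the new one, never a partial write.
        os.replace(tmp_path, target_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```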
Data decompression
Many distributions serve data files in compressed formats (e.g. GZipped). The update service can transparently decompress the file after download before handing it off to the engine.
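For reference, decompressing a GZipped download takes only a few lines with Python's standard library. This sketch shows the general step rather than the update service's internals; `decompress_gzip` and the file paths passed to it are hypothetical.

```python
import gzip
import shutil


def decompress_gzip(src_path: str, dest_path: str) -> None:
    """Stream-decompress a .gz file to dest_path without loading it all into memory."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
```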
Integrity checks (MD5) & conditional downloads
To prevent corruption or unnecessary traffic:
- MD5 verification: The service can compare the downloaded file’s MD5 hash with the Content-MD5 header in HTTP responses. By default, this is enabled (if the server supports it).
- If-Modified-Since: To avoid re-downloading an unchanged file, the HTTP If-Modified-Since header can be sent. If the server responds with “not modified,” no new download occurs. This is often enabled by default.
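Both checks can be sketched in Python with the standard library. The helper names `md5_matches` and `conditional_headers` are hypothetical. The sketch accepts the Content-MD5 value in either base64 (the encoding defined for that header) or hexadecimal, since which one a given server sends is an assumption you should verify.

```python
import base64
import hashlib
from email.utils import formatdate


def md5_matches(payload: bytes, header_value: str) -> bool:
    """Compare the MD5 of payload with a Content-MD5 style header value."""
    digest = hashlib.md5(payload)
    expected = header_value.strip()
    hex_form = digest.hexdigest()
    b64_form = base64.b64encode(digest.digest()).decode("ascii")
    # Accept either encoding; hex is compared case-insensitively.
    return expected.lower() == hex_form or expected == b64_form


def conditional_headers(local_mtime: float) -> dict:
    """Build an If-Modified-Since header from a file's modification time (epoch seconds)."""
    return {"If-Modified-Since": formatdate(local_mtime, usegmt=True)}
```

A server that honors the header answers with HTTP status 304 when the file is unchanged, so the client can skip the download.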
Considerations & recommendations for large clusters
Automatic updates are powerful, but in environments with many nodes, naive usage can cause unintended load or exceed service quotas. Keep the following in mind:
- The Distributor API (which hosts official data updates) imposes rate or quota limits (e.g. 100 requests per License Key per 30 minutes).
- If many nodes independently poll, you may exceed those limits. A better pattern is to have one designated updater node (or a small cluster) pull the file, then distribute it internally (via shared storage, push tools, etc.).
- Use shared file locations (network mount) with file system watching enabled. Nodes then refresh as the shared file is replaced. But beware: if many nodes see the change at the same moment, they may all refresh simultaneously, causing a load spike.
- To mitigate that, you can:
  - Use staggered deployment (update subsets or alternate locations gradually), or
  - Use multiple shared endpoints to spread out update times.
Enabling “MaxPerformance” mode is also beneficial when nodes share a data file, because the data is then loaded fully into memory rather than streamed from the file.
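To make the quota concern concrete, here is a rough back-of-the-envelope estimate in Python. It assumes every node checks the server once, at a time spread uniformly over a stagger period, and it uses the example limit of 100 requests per 30 minutes mentioned above. `peak_requests` is a hypothetical helper, not part of the Pipeline API.

```python
def peak_requests(nodes: int, spread_s: float, window_s: float = 1800) -> float:
    """Estimate how many requests fall into the busiest quota window.

    nodes:    number of nodes that each make one request
    spread_s: period over which the requests are spread uniformly (seconds)
    window_s: length of the quota window (seconds), 30 minutes by default
    """
    if spread_s <= window_s:
        # All requests land inside a single quota window.
        return float(nodes)
    # Only the fraction of requests that fits in one window counts.
    return nodes * window_s / spread_s
```

Under these assumptions, 250 nodes staggered over 10 minutes produce 250 requests in one window, well above a limit of 100, while a single updater node produces one.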
Example configuration snippet
Here’s a JSON-style example showing how one might configure a device detection engine to auto-update from a self-hosted URL:
{
  "BuilderName": "DeviceDetectionHashEngine",
  "BuildParameters": {
    "DataFile": "data/TAC-HashV41.hash",
    "TempDirPath": "data/tmp",
    "AutoUpdate": true,
    "DataUpdateOnStartup": true,
    "UpdatePollingInterval": 14400,
    "UpdateRandomisationMax": 600,
    "CreateTempDataCopy": true,
    "DataUpdateUrl": "https://myhost.net/51Ddatafile",
    "DataUpdateVerifyMd5": false,
    "DataUpdateUseUrlFormatter": false,
    "DataUpdateLicenseKey": "KEY"
  }
}
Key points
- AutoUpdate: enables automatic updates.
- DataUpdateUrl: provides the location to check for new data files.
- UpdatePollingInterval and UpdateRandomisationMax: govern update timing.
- CreateTempDataCopy: ensures safe replacement.
- DataUpdateVerifyMd5: disabled here, as you would do if your server cannot provide an MD5 hash.
- DataUpdateUseUrlFormatter: false because the URL is static in this example.
Also note: once the “next published date” (tracked within the data file metadata) is reached, nodes will start checking for updates, even if your URL is static, so supporting If-Modified-Since on your server is wise.
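As a quick sanity check, a configuration like the one above can be loaded and its timing values converted into friendlier units. This Python sketch assumes that UpdatePollingInterval and UpdateRandomisationMax are expressed in seconds; confirm this against the documentation for your engine. `summarise_timing` is a hypothetical helper, and the configuration text is trimmed to the timing-related keys.

```python
import json

# A trimmed copy of the example configuration, kept to the timing-related keys.
CONFIG_TEXT = """
{
  "BuilderName": "DeviceDetectionHashEngine",
  "BuildParameters": {
    "AutoUpdate": true,
    "UpdatePollingInterval": 14400,
    "UpdateRandomisationMax": 600
  }
}
"""


def summarise_timing(element: dict) -> dict:
    """Convert the timing parameters (assumed to be seconds) into hours and minutes."""
    params = element["BuildParameters"]
    return {
        "auto_update": params["AutoUpdate"],
        "polling_hours": params["UpdatePollingInterval"] / 3600,
        "max_stagger_minutes": params["UpdateRandomisationMax"] / 60,
    }


summary = summarise_timing(json.loads(CONFIG_TEXT))
```

If the seconds assumption holds, the example polls every 4 hours with up to 10 minutes of random stagger.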
Summary & best practices
| Feature | Why it matters | Recommendation |
|---|---|---|
| Auto updates | Keeps your engine up to date without manual intervention | Enable by default unless your environment disallows it |
| URL / URL formatting | Allows flexibility in where and how you fetch files | Use a static URL when simple; use a formatter when multiple versions are involved |
| MD5 & conditional headers | Prevent corruption or redundant downloads | Enable checks when supported |
| Randomization | Prevents load spikes in clusters | Use nonzero randomization bounds |
| Shared file / push model | Scalability and rate-limiting control | Use a single updater + internal distribution for large clusters |
| File watcher | Supports manual or push-based file changes | Enable when filesystem changes should trigger refreshes |
When designed thoughtfully, automatic updates remove a heavy operational burden and keep your detection logic sharp and current, without downtime or stale data.
Need help or have questions?
We understand that every implementation is unique, and setting up automatic data file updates can raise questions. Whether you’re troubleshooting configuration, optimizing for large clusters, or simply exploring best practices, our team is here to help.
Stay ahead of the curve: let 51Degrees handle updates so you can focus on what matters. Switch on auto-updates.