Hardware × Software · Notes From Production
Talking to the Gate: Syncing ZKTeco Biometric Hardware With a Django Backend
Cloud software lives in a clean world of JSON and retries. Biometric gate hardware lives in a world of fixed-length packets, flaky network switches, and devices that power-cycle at the worst moment. The Centralized Gate Pass System had to make those two worlds agree on a single question: is this person allowed through this gate, right now?
## Two sources of truth, one verification table
The ZKTeco devices hold their own user and fingerprint records locally so a gate keeps working if the network drops. The cloud holds the authoritative roster. The danger is drift: a revoked pass that still opens a gate because the device never got the memo. The whole design centers on keeping the `dop_verification` table and the on-device records converging, not diverging.
## Push, don't hope
Early on I leaned on the device's own polling. It was unpredictable. The reliable pattern was to treat the backend as the source of intent and push changes to devices, then confirm:
```
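from django.utils.timezone import now

def sync_user(device, user):
    # Push the user record and fingerprint template to the device.
    device.set_user(user.pin, user.name, privilege=user.role)
    device.set_fingerprint(user.pin, user.template)
    # Read back to confirm; assumes read_user returns a record comparable to user.
    ack = device.read_user(user.pin)
    Verification.objects.update_or_create(
        pin=user.pin, device=device,
        defaults={"synced": ack == user, "last_sync": now()},
    )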
```
The read-back is the part that turns "I sent it" into "it is actually there."
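The read-back check can be exercised without hardware. The sketch below is a minimal illustration, assuming a hypothetical in-memory `FakeDevice` in place of the real device wrapper; `UserRecord`, `FakeDevice`, and `confirm_synced` are names invented here, not part of any ZKTeco SDK.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class UserRecord:
    """The fields pushed to a device (hypothetical shape)."""
    pin: str
    name: str
    privilege: int


class FakeDevice:
    """In-memory stand-in for a device wrapper; not a real SDK class."""

    def __init__(self, drop_writes: bool = False) -> None:
        self._users: Dict[str, UserRecord] = {}
        self._drop_writes = drop_writes  # simulate a write lost in transit

    def set_user(self, record: UserRecord) -> None:
        if not self._drop_writes:
            self._users[record.pin] = record  # upsert keyed by PIN

    def read_user(self, pin: str) -> Optional[UserRecord]:
        return self._users.get(pin)


def confirm_synced(device: FakeDevice, intended: UserRecord) -> bool:
    """Push the record, then read it back and compare it to what was sent."""
    device.set_user(intended)
    stored = device.read_user(intended.pin)
    return stored == intended  # dataclass equality compares every field


alice = UserRecord(pin="1001", name="Alice", privilege=0)
print(confirm_synced(FakeDevice(), alice))                  # True
print(confirm_synced(FakeDevice(drop_writes=True), alice))  # False
```

A write that silently disappears leaves `read_user` returning `None`, so the comparison fails and the record is marked unsynced rather than assumed good.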
## Idempotency and retries, because devices disappear
Network calls to hardware fail constantly: a switch reboots, a cable is bumped. Every sync operation is idempotent so it can be retried safely, and failures go into a queue rather than crashing the request:
```
from celery import shared_task

@shared_task(bind=True, max_retries=5)
def sync_user_task(self, device_id, user_id):
    try:
        sync_user(get_device(device_id), get_user(user_id))
    except DeviceUnreachable as exc:
        # Back off exponentially between retries: 1s, 2s, 4s, 8s, 16s.
        raise self.retry(exc=exc, countdown=2 ** self.request.retries)
```
Exponential backoff means a device that's down for thirty seconds doesn't generate a thousand failed calls.
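To see what that retry policy amounts to, the delays can be tabulated. This is a minimal sketch, assuming `countdown=2 ** retries` and `max_retries=5` as in the task above; `backoff_schedule` is a hypothetical helper written for illustration, not a Celery API.

```python
from typing import List


def backoff_schedule(max_retries: int, base: int = 2) -> List[int]:
    """Delay in seconds before each retry: base ** n for n in 0..max_retries-1."""
    return [base ** attempt for attempt in range(max_retries)]


delays = backoff_schedule(max_retries=5)
print(delays)       # [1, 2, 4, 8, 16]
print(sum(delays))  # 31
```

Under these assumptions, a device sees at most six attempts (the original call plus five retries) spread over roughly 31 seconds of waiting, rather than a tight loop of failures.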
## Keeping the verification database honest
Optimizing `dop_verification` was half indexing and half discipline: index the columns every gate-check filters on (pin, device, active), and run a reconciliation job that compares device records against the table and flags mismatches for review. The gate's job is to be fast and correct; the reconciliation job's job is to make sure "correct" stays true overnight.
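The comparison at the heart of a reconciliation job might look like the sketch below. It assumes the table can be reduced to a PIN-to-active mapping and the device to a set of enrolled PINs; `reconcile` and the reason labels are hypothetical names chosen for illustration, not the system's actual schema.

```python
from typing import Dict, List, Set, Tuple


def reconcile(
    table_rows: Dict[str, bool],
    device_pins: Set[str],
) -> List[Tuple[str, str]]:
    """Compare the table's view with what a device actually holds.

    table_rows maps pin -> active flag; device_pins is the set of PINs
    read from the device. Returns (pin, reason) pairs to flag for review.
    """
    mismatches: List[Tuple[str, str]] = []
    for pin, active in sorted(table_rows.items()):
        if active and pin not in device_pins:
            mismatches.append((pin, "missing_on_device"))
        elif not active and pin in device_pins:
            # The dangerous case: a revoked pass the device still honors.
            mismatches.append((pin, "revoked_but_on_device"))
    # PINs the device holds that the backend has never heard of.
    for pin in sorted(device_pins - set(table_rows)):
        mismatches.append((pin, "unknown_to_backend"))
    return mismatches


table = {"1001": True, "1002": False, "1003": True}
on_device = {"1001", "1002", "1004"}
for pin, reason in reconcile(table, on_device):
    print(pin, reason)
# 1002 revoked_but_on_device
# 1003 missing_on_device
# 1004 unknown_to_backend
```

The function only reports mismatches; whether to repair them automatically or leave them for review is a separate policy decision.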
## The takeaway
Integrating physical hardware taught me to design for the failure, not the demo. The happy path (device online, network up) writes itself. The system earns its keep in how it behaves when the gate's switch reboots mid-sync.