IoT Edge Inference: Lessons from Building FaceCheckIn
What I learned designing an edge-to-cloud attendance system on Raspberry Pi — covering inference latency, API contracts, and the realities of running ML on constrained hardware.
Background
FaceCheckIn was a university course project that turned into one of the most technically demanding systems I have built. The goal was straightforward: use a Raspberry Pi to automate attendance and capture anonymous emotional state data. The execution involved coordinating four engineers across hardware, backend, mobile, and AI — with all layers meeting at a shared REST API contract.
This post covers what I learned about running inference on constrained hardware and designing systems where latency variability is a first-class concern.
The Architecture
The system has four components:
- Raspberry Pi (Edge) — captures images, calls the cloud API
- Django Backend (Cloud) — receives images, runs inference, stores results in PostgreSQL
- Flutter Mobile App — users see real-time check-in status
- Admin Web Dashboard — attendance reports and analytics
The edge device does not run inference locally. Instead, it captures a 6-frame burst, sends the images to the cloud API, and waits for the result. This was a deliberate tradeoff: the Raspberry Pi's limited compute makes local model execution too slow for an interactive system. Cloud inference spends part of the latency budget on the network round trip, but the models run on far more capable hardware.
Inference Workflow
For each check-in:
- Raspberry Pi captures 6 images in quick succession
- A primary frame is selected and sent to the identity recognition endpoint (face matching)
- All 6 frames are sent to the emotion recognition endpoint
- The backend returns: identity match + aggregated emotion probabilities
- Results are stored in PostgreSQL for later analysis
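The steps above can be sketched as a small client-side routine. This is a minimal illustration, not the project's actual client: the `post` callable stands in for whatever HTTP helper the Pi uses, the endpoint paths are the ones from the API contract section, and two details are assumptions made only for the sketch (the first frame is treated as the primary frame, and the identify response is assumed to carry a `checkin_id` field).

```python
from typing import Callable, Dict, Sequence

# Hypothetical transport: takes (path, JSON payload), returns the decoded JSON body.
PostFn = Callable[[str, dict], dict]


def run_checkin(frames: Sequence[str], user_id: str, post: PostFn) -> Dict[str, dict]:
    """Run one check-in for a burst of six base64-encoded frames."""
    if len(frames) != 6:
        raise ValueError(f"expected a 6-frame burst, got {len(frames)}")

    # Assumption: the first frame is the primary one. The real selection
    # rule is not described in this post.
    primary = frames[0]

    # Step 1: identity recognition on the primary frame only.
    identity = post(
        "/api/checkin/identify/",
        {"image": primary, "user_id": user_id},
    )

    # Step 2: emotion recognition on the whole burst.
    # Assumption: the identify response carries a "checkin_id" field.
    emotion = post(
        "/api/checkin/emotion/",
        {"images": list(frames), "checkin_id": identity.get("checkin_id")},
    )

    return {"identity": identity, "emotion": emotion}
```

Passing the transport in as a function keeps the routine testable on a laptop without a Pi or a running backend.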
Identity Recognition
Identity recognition uses a primary frame because a single clear image is sufficient for face matching. Running matching on 6 frames would be slower and add no meaningful accuracy improvement for this use case. The model uses PyTorch, and inference in a local prototype setup measured approximately 1 second end-to-end.
Emotion Recognition
Emotion recognition uses all 6 frames for a different reason: micro-expressions are brief and a single frame might capture a transitional expression that does not represent the subject's actual emotional state. Averaging probabilities across 6 frames produces a more stable result. This adds latency — approximately 2–4 seconds in prototype testing.
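As a rough illustration of the averaging step, here is a minimal sketch that takes one probability dictionary per frame and returns the mean distribution plus the dominant label. The function name and the emotion labels in the usage example are hypothetical, and it assumes every frame reports the same set of labels.

```python
from typing import Dict, List, Tuple


def aggregate_emotions(
    frame_probs: List[Dict[str, float]],
) -> Tuple[str, Dict[str, float]]:
    """Average per-frame emotion probabilities and pick the dominant label."""
    if not frame_probs:
        raise ValueError("need at least one frame")

    labels = list(frame_probs[0].keys())
    n = len(frame_probs)

    # Mean probability per label across all frames.
    mean = {label: sum(p[label] for p in frame_probs) / n for label in labels}

    # The dominant emotion is the label with the highest mean probability.
    dominant = max(mean, key=mean.get)
    return dominant, mean


# Five steady frames and one transitional frame: the outlier is damped.
burst = [{"neutral": 0.8, "happy": 0.2}] * 5 + [{"neutral": 0.1, "happy": 0.9}]
dominant, mean = aggregate_emotions(burst)
```

In this example the single "happy" frame shifts the averages but does not change the dominant label, which is the stabilizing effect described above.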
The Latency Problem
Here is the core challenge with IoT edge systems: latency is highly variable. A system that feels instant in a controlled lab environment can feel broken in a real deployment.
Our prototype ran the backend on a local laptop, which gave us:
- Identity recognition: ~1s
- Emotion recognition: ~2–4s
- Total check-in: ~3–5s
In production with a cloud-hosted backend, add:
- Network round trips
- Load variance
- Cold start time for the model (if not pre-loaded)
We mitigated this by designing the system to handle waiting gracefully. The mobile app shows a "processing" spinner rather than pretending the result is instant. The user interface acknowledges the latency rather than hiding it.
API Contract Design
The trickiest engineering decision was designing the API contract between the Raspberry Pi and the backend. The Pi runs a simple Python client, which cannot handle server failures gracefully when the contract is implicit.
We defined explicit API contracts upfront:
POST /api/checkin/identify/
{
    "image": "<base64-encoded primary frame>",
    "user_id": "<candidate user ID>"
}
→ { "match": true, "confidence": 0.94, "identity_id": "..." }

POST /api/checkin/emotion/
{
    "images": ["<frame1>", ..., "<frame6>"],
    "checkin_id": "<from identify response>"
}
→ { "dominant_emotion": "neutral", "probabilities": {...} }
By defining the contract before writing any code, the hardware engineer and backend engineer could work in parallel. Integration at the end required no renegotiation.
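One way to keep both sides honest about the contract is to put the request shapes and response checks into small pure functions. The sketch below is illustrative only: the helper names are hypothetical, the field names come from the contract above, and the validation rules (for example, confidence between 0 and 1) are assumptions rather than documented behavior.

```python
import base64
from typing import List, Tuple


def build_identify_payload(primary_frame: bytes, user_id: str) -> dict:
    """Request body for POST /api/checkin/identify/."""
    return {
        "image": base64.b64encode(primary_frame).decode("ascii"),
        "user_id": user_id,
    }


def build_emotion_payload(frames: List[bytes], checkin_id: str) -> dict:
    """Request body for POST /api/checkin/emotion/."""
    if len(frames) != 6:
        raise ValueError(f"expected 6 frames, got {len(frames)}")
    return {
        "images": [base64.b64encode(f).decode("ascii") for f in frames],
        "checkin_id": checkin_id,
    }


def parse_identify_response(body: dict) -> Tuple[bool, float, str]:
    """Check the identify response shape before the client trusts it."""
    match = body.get("match")
    confidence = body.get("confidence")
    identity_id = body.get("identity_id")

    if not isinstance(match, bool):
        raise ValueError("'match' must be a boolean")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("'confidence' must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("'confidence' must be between 0 and 1")
    if not isinstance(identity_id, str):
        raise ValueError("'identity_id' must be a string")

    return match, float(confidence), identity_id
```

Because these functions do no I/O, the hardware and backend engineers can both run them in unit tests long before the two halves are integrated.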
Docker for Reproducibility
The backend is fully containerized with Docker. This turned out to be critical for the project. During development, each team member ran the backend locally, and the consistent environment meant we never hit "works on my machine" failures.
For integration testing, we ran the full backend stack (Django, PostgreSQL, and the PyTorch inference module) with a single docker compose up. The hardware team could test the Pi against a local backend without depending on a shared staging environment.
What I Would Do Differently
1. Plan for model warm-up. Cold model inference — loading weights from disk on the first request — added several seconds of latency that surprised us. Pre-loading the model at server startup eliminated this.
2. Measure latency continuously. We measured latency once during early development. In practice, it varied more than expected. Continuous latency logging with a simple middleware would have caught regressions earlier.
3. Use WebSockets for status updates. The mobile app currently polls for check-in status. WebSockets would have been cleaner and given better perceived responsiveness. We planned this as the next iteration.
4. Define error responses in the API contract. We defined the happy-path contract carefully but underspecified error responses. When the Pi sent a blurry or over-exposed image, the API returned a generic 500 that the Pi client could not interpret meaningfully.
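For point 1, the difference between lazy loading and pre-loading can be sketched with a small holder object. This is a generic illustration, not the project's code: `ModelHolder` and the `loader` callable are hypothetical names, and in a real PyTorch backend the loader would be whatever function reads the weights from disk. In Django, one common place to call `preload()` is an app's `AppConfig.ready()` method.

```python
import threading
from typing import Any, Callable


class ModelHolder:
    """Load a model once and share it across requests."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._model: Any = None
        self._loaded = False
        self._lock = threading.Lock()

    def preload(self) -> None:
        """Call at server startup so the first request does not pay the load cost."""
        self.get()

    def get(self) -> Any:
        """Return the model, loading it on first use if preload() was skipped."""
        if not self._loaded:
            # Double-checked locking: only one thread runs the slow loader.
            with self._lock:
                if not self._loaded:
                    self._model = self._loader()
                    self._loaded = True
        return self._model
```

Without the `preload()` call the same class still works, but the first check-in after a restart absorbs the full load time.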
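For point 2, a latency-logging middleware can be very small. The sketch below follows the shape of a Django middleware (a class constructed with `get_response` and called once per request), but the class name, logger name, and log format are assumptions made for illustration.

```python
import logging
import time

# Hypothetical logger name; route it to a file or metrics pipeline as needed.
logger = logging.getLogger("checkin.latency")


class LatencyLoggingMiddleware:
    """Log the wall-clock time spent handling each request."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        start = time.perf_counter()
        response = self.get_response(request)
        elapsed_ms = (time.perf_counter() - start) * 1000.0

        # One line per request, in a format that is easy to grep or parse.
        logger.info(
            "path=%s status=%s latency_ms=%.1f",
            request.path,
            response.status_code,
            elapsed_ms,
        )
        return response
```

This only measures time spent inside the backend. Network round trips between the Pi and the server would need a matching timer on the client side.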
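For point 4, a structured error body would have let the Pi client decide what to do next instead of treating every failure the same way. The sketch below is one possible design, not what the project shipped: the error codes, the use of HTTP 422 for unusable images, and the action names are all hypothetical.

```python
from typing import Optional

# Hypothetical error body the backend could return with a 422:
#   {"error": {"code": "image_blurry", "message": "Primary frame is out of focus"}}

# Hypothetical codes for problems a fresh capture can fix.
RECAPTURE_CODES = {"image_blurry", "image_overexposed", "no_face_detected"}


def next_action(status: int, body: Optional[dict]) -> str:
    """Map an API response to 'ok', 'recapture', 'retry_later', or 'give_up'."""
    if 200 <= status < 300:
        return "ok"

    # Extract the error code defensively; the body may be missing or malformed.
    error = body.get("error") if isinstance(body, dict) else None
    code = error.get("code") if isinstance(error, dict) else None

    if status == 422 and code in RECAPTURE_CODES:
        return "recapture"    # bad image: take a new burst and try again
    if status == 429 or status >= 500:
        return "retry_later"  # server-side trouble: back off, then retry
    return "give_up"          # anything else: surface the failure to the user
```

With a contract like this, a blurry frame becomes a recoverable event on the Pi rather than an opaque server error.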
Conclusion
Building FaceCheckIn taught me that IoT systems are as much about failure modes and latency expectations as they are about the core feature. The interesting engineering is in the seams between layers: the contract between edge and cloud, the tradeoff between local and remote inference, and a user experience that is honest about waiting instead of hiding it.
These are the problems that interest me most in my graduate research: distributed systems where correctness and reliability are harder than the feature itself.