Part 3 | Seeing the World | Chapter 14

Face Detection on the Pi

The assumption that machine learning on a $35 computer is too slow to be useful tells you more about the assumption than about the computer.

Reading Time: 12 mins

The trap is reaching for the heaviest model first. You read about deep neural networks, YOLO, and transformer-based detectors, and you assume that's what "real" face detection looks like. So you download a 200 MB model, load it onto a Pi with 2 GB of RAM, wait eight seconds for inference on a single frame, and conclude that the Pi isn't powerful enough for computer vision.

It is. You just started at the wrong end of the toolbox.

Haar cascades, the Viola-Jones face detection method (published in 2001) that OpenCV has shipped since its early releases, run at 15–25 FPS on a Pi 4 at 640x480 with no GPU, no neural network runtime, and no model file larger than 1 MB. They're not state-of-the-art. They don't handle extreme angles or heavy occlusion. But for the cases that matter most (a face looking roughly toward the camera, in reasonable lighting) they work. And they work fast enough to be useful in real time on hardware that costs less than a restaurant meal.

The best model for your Pi project is not the most accurate model. It's the most accurate model that runs within your frame budget.

What Haar Cascades Actually Are

A Haar cascade is a trained classifier that scans an image with a sliding window at multiple scales, applying a sequence of increasingly complex feature tests to each window position. If a region fails any test in the cascade, it's immediately rejected. Most of the image is rejected in the first few tests, which is why the algorithm is fast — it spends almost no time on background regions.
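The early-rejection idea can be sketched in a few lines of plain Python. This is a toy illustration, not OpenCV's implementation: the function name `run_cascade`, the two stage tests, and their thresholds are all hypothetical, chosen only to show that a window failing a cheap first stage never reaches the later ones.

```python
def run_cascade(window, stages):
    """Return (accepted, stages_evaluated) for one window.

    `stages` is a list of (score_fn, threshold) pairs, cheapest first.
    The window is rejected as soon as one stage scores below its threshold.
    """
    for index, (score_fn, threshold) in enumerate(stages, start=1):
        if score_fn(window) < threshold:
            return False, index
    return True, len(stages)


# Hypothetical toy stages operating on a flat list of pixel intensities
def mean_brightness(window):
    return sum(window) / len(window)


def contrast(window):
    return max(window) - min(window)


stages = [(mean_brightness, 40), (contrast, 60)]

flat_background = [10] * 16      # dark, uniform patch
textured_patch = [30, 200] * 8   # bright enough, high contrast

print(run_cascade(flat_background, stages))  # rejected at stage 1
print(run_cascade(textured_patch, stages))   # passes both stages
```

The second value in each result is the number of stages that actually ran. In a real cascade, most background windows stop after the first stage or two, which is where the speed comes from.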

The "Haar" refers to Haar-like features — simple rectangular patterns that measure contrast differences. A horizontal two-rectangle feature, for instance, measures whether the top half of a region is brighter than the bottom half (useful for detecting foreheads above eye sockets). A vertical two-rectangle feature checks left-right contrast. The cascade is a sequence of these tests, organized so that cheap, coarse tests run first and expensive, precise tests only run on regions that pass the early filters.
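To make the horizontal two-rectangle feature concrete, here is a minimal pure-Python sketch. It uses an integral image (summed-area table), the standard Viola-Jones trick that lets any rectangle sum be read with four lookups. The function names and the 4x4 toy patch are illustrative assumptions, not OpenCV internals.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def horizontal_two_rect(ii, x, y, w, h):
    """Top-half sum minus bottom-half sum (h must be even)."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)


# 4x4 toy patch: bright top two rows, dark bottom two rows
patch = [
    [200, 200, 200, 200],
    [200, 200, 200, 200],
    [50, 50, 50, 50],
    [50, 50, 50, 50],
]
ii = integral_image(patch)
print(horizontal_two_rect(ii, 0, 0, 4, 4))  # 1600 - 400 = 1200
```

A large positive value means the top half is much brighter than the bottom half. A trained cascade compares values like this against learned thresholds.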

The training process examines thousands of positive samples (images containing faces) and negative samples (images without faces) to learn which Haar features, at which positions and scales, best discriminate faces from background. The result is an XML file — a trained cascade — that OpenCV can load and apply.

You don't train Haar cascades — you use pretrained ones

OpenCV ships with pretrained cascades for faces, eyes, full bodies, upper bodies, and profile faces. Training your own requires thousands of annotated samples and hours of computation. For face detection, the pretrained haarcascade_frontalface_default.xml handles the vast majority of use cases. Train a custom cascade only if you're detecting non-standard objects (specific industrial parts, custom markers).

Framework · The Cascade Ladder · Start light, graduate when you must

Start with Haar cascades for every detection task on the Pi. They're fast, lightweight, and good enough for 80% of face detection scenarios. Graduate to DNN-based models only when you hit the accuracy ceiling — profile faces, extreme angles, very small faces in crowds. Don't start with the heaviest model because a tutorial told you Haar cascades are "old."

Your First Face Detector

The complete face detection pipeline in OpenCV is about 20 lines of code. Every line matters:

import cv2

# Step 1: Load the pretrained cascade
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

# Verify it loaded (empty classifier if path is wrong)
if face_cascade.empty():
    print("Error: cascade file not found")
    exit(1)

# Step 2: Load and prepare the image
img = cv2.imread('/home/pi/test_photo.jpg')
if img is None:
    print("Error: image not found")
    exit(1)

# Step 3: Convert to grayscale (cascades require single-channel input)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Step 4: Detect faces
faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,
    minNeighbors=5,
    minSize=(30, 30)
)

print(f"Found {len(faces)} face(s)")

# Step 5: Annotate results on a copy
display = img.copy()
for (x, y, w, h) in faces:
    cv2.rectangle(display, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(display, "Face", (x, y - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

# Step 6: Save the annotated result
cv2.imwrite('/home/pi/detected_faces.jpg', display)

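The cv2.data.haarcascades path points to wherever the OpenCV Python package installed its pretrained cascade files, so you don't need to hardcode /usr/share/opencv4/ or any other system-specific location. It is provided by the pip-installed opencv-python packages. If your build lacks the cv2.data module (some distribution packages do), locate the haarcascades directory on your system and pass the full path instead.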
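If you want the script to survive an OpenCV build where cv2.data is unavailable, one option is a small lookup helper. The sketch below is an assumption-laden example: `find_cascade` is a hypothetical helper, and the fallback directory is only a guess at where a system package might place the files, so check your own installation.

```python
import os


def find_cascade(filename, candidate_dirs):
    """Return the first existing path to `filename` in candidate_dirs, else None."""
    for directory in candidate_dirs:
        path = os.path.join(directory, filename)
        if os.path.isfile(path):
            return path
    return None


if __name__ == "__main__":
    candidates = []
    try:
        import cv2
        candidates.append(cv2.data.haarcascades)
    except (ImportError, AttributeError):
        pass  # cv2 is missing, or this build has no cv2.data module
    # Hypothetical fallback location; adjust for your own system
    candidates.append("/usr/share/opencv4/haarcascades")
    print(find_cascade("haarcascade_frontalface_default.xml", candidates))
```

The helper returns None rather than raising, so the caller can print a clear error before constructing the classifier.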

Key takeaway

The face detection pipeline is five operations: load cascade, load image, convert to grayscale, run detectMultiScale, draw results on a copy. Master this sequence — it's the same for every cascade-based detector.

Understanding detectMultiScale Parameters

detectMultiScale is where the engineering happens. Three parameters control the speed-accuracy tradeoff, and understanding them is the difference between a detector that works and one that fires on every shadow.

faces = face_cascade.detectMultiScale(
    gray,
    scaleFactor=1.1,    # How much to shrink the image at each scale
    minNeighbors=5,     # How many neighbors each candidate needs to survive
    minSize=(30, 30)    # Minimum face size in pixels
)

scaleFactor — the image is resized by this factor at each step of the multi-scale scan. A value of 1.1 means the image shrinks by 10% at each step. Smaller values (1.05) scan more scales, catching more faces but running slower. Larger values (1.3) scan fewer scales, running faster but missing faces between scales. On a Pi, 1.1 is the sweet spot for 640x480 input.
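To get a feel for how scaleFactor changes the amount of work, the sketch below counts the window sizes a multi-scale scan would visit. It is an approximation under stated assumptions: `scan_scales` is a hypothetical helper, the 24-pixel base window matches the default frontal face cascade, and OpenCV's real scale enumeration differs in detail.

```python
def scan_scales(image_size, scale_factor, min_size, base_window=24):
    """Approximate the window sizes (in pixels) a multi-scale scan would visit."""
    limit = min(image_size)  # the window cannot exceed the shorter image side
    sizes = []
    size = float(base_window)
    while size <= limit:
        if size >= min_size:
            sizes.append(round(size))
        size *= scale_factor
    return sizes


for sf in (1.05, 1.1, 1.3):
    sizes = scan_scales((640, 480), sf, min_size=30)
    print(f"scaleFactor={sf}: {len(sizes)} scales, from {sizes[0]} to {sizes[-1]} px")
```

Smaller factors visit many more scales, which is the cost behind the extra detections. Raising min_size trims the small end of the list, where the number of window positions per scale is largest.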

minNeighbors — the minimum number of overlapping detections a region needs before it's reported as a face. At each scale, the sliding window generates many overlapping positive detections around a real face. minNeighbors=5 means at least 5 of these overlapping windows must agree. Lower values (1–3) catch more faces but produce more false positives. Higher values (6–8) are more conservative.
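The neighbor-counting idea can be sketched as follows. This is a simplified stand-in for OpenCV's rectangle grouping, which uses a different similarity test: here, `iou`, `group_detections`, and the 0.5 overlap threshold are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def group_detections(rects, min_neighbors, overlap=0.5):
    """Cluster overlapping rectangles; keep clusters with enough members."""
    clusters = []
    for rect in rects:
        for cluster in clusters:
            if iou(rect, cluster[0]) >= overlap:
                cluster.append(rect)
                break
        else:
            clusters.append([rect])
    results = []
    for cluster in clusters:
        if len(cluster) >= min_neighbors:
            n = len(cluster)
            # Report the average rectangle of the cluster
            results.append(tuple(sum(r[i] for r in cluster) // n for i in range(4)))
    return results


raw = [(100, 100, 50, 50), (102, 101, 50, 50), (98, 99, 52, 52),
       (101, 103, 49, 49), (99, 100, 51, 51),  # five hits on one face
       (300, 200, 40, 40)]                      # one isolated hit
print(group_detections(raw, min_neighbors=5))  # only the five-hit cluster survives
print(group_detections(raw, min_neighbors=1))  # the isolated hit is reported too
```

A real face tends to produce a dense cluster of raw hits, while a false positive on texture usually produces one or two, which is why raising the threshold filters them out.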

minSize — the smallest face the detector will look for, in pixels. Setting this eliminates false positives from tiny image regions and speeds up detection by skipping small scales. If you know faces in your frame will be at least 80x80 pixels, set minSize=(80, 80) and save significant processing time.

minNeighbors is your false-positive knob. Turn it up when you're getting phantom faces in textured backgrounds. Turn it down when you're missing real faces in challenging lighting.

Here's a practical tuning script that lets you see the effect of each parameter:

import cv2
import time

cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

img = cv2.imread('/home/pi/group_photo.jpg')
if img is None:
    print("Error: image not found")
    exit(1)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Test different parameter combinations
configs = [
    {"scaleFactor": 1.05, "minNeighbors": 3, "label": "Sensitive"},
    {"scaleFactor": 1.1,  "minNeighbors": 5, "label": "Balanced"},
    {"scaleFactor": 1.2,  "minNeighbors": 7, "label": "Conservative"},
]

for cfg in configs:
    start = time.perf_counter()
    faces = face_cascade.detectMultiScale(
        gray,
        scaleFactor=cfg["scaleFactor"],
        minNeighbors=cfg["minNeighbors"],
        minSize=(30, 30)
    )
    elapsed = (time.perf_counter() - start) * 1000

    display = img.copy()
    for (x, y, w, h) in faces:
        cv2.rectangle(display, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imwrite(f'/home/pi/faces_{cfg["label"].lower()}.jpg', display)
    print(f'{cfg["label"]}: {len(faces)} faces, {elapsed:.1f} ms')

Start balanced, then tune for your environment

scaleFactor=1.1, minNeighbors=5 is the starting point, not the endpoint. Every deployment environment — lighting, camera angle, typical face distance — needs its own calibration. Run the tuning script in your actual environment and adjust until the false positive rate and the miss rate are both acceptable.

Available Pretrained Cascades

OpenCV ships with cascades for more than just frontal faces:

import cv2
import os

# List all available cascades
cascade_dir = cv2.data.haarcascades
for f in sorted(os.listdir(cascade_dir)):
    if f.endswith('.xml'):
        print(f)

The ones you'll actually use:

  • haarcascade_frontalface_default.xml — general-purpose frontal face detection. Start here.
  • haarcascade_frontalface_alt2.xml — alternative training with slightly different accuracy characteristics. Try it if the default misses faces in your specific setup.
  • haarcascade_eye.xml — eye detection. Run it inside a detected face region, not on the full frame.
  • haarcascade_profileface.xml — side-profile face detection. Much less accurate than frontal, but useful for counting people walking past a camera.
  • haarcascade_fullbody.xml — full-body pedestrian detection. Works best when people are at least 200 pixels tall in the frame.

You can stack cascades. A common pattern: detect faces first, then within each face region, detect eyes to confirm it's actually a face (and not a pattern in a textured wall that happens to look face-like):

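import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
)
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_eye.xml'
)

# Load the image to check (same test photo as the first example)
img = cv2.imread('/home/pi/test_photo.jpg')
if img is None:
    print("Error: image not found")
    exit(1)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, 1.1, 5)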

confirmed_faces = []
for (x, y, w, h) in faces:
    face_roi = gray[y:y+h, x:x+w]
    eyes = eye_cascade.detectMultiScale(face_roi, 1.1, 3)

    if len(eyes) >= 2:
        confirmed_faces.append((x, y, w, h))

print(f"Confirmed faces (with eyes): {len(confirmed_faces)}")

Real-Time Face Detection from Camera

The culmination of everything in Part 3 is a live face detection system running on a Pi:

import cv2
import time
from picamera2 import Picamera2

# Initialize camera
picam2 = Picamera2()
config = picam2.create_video_configuration(
    main={"size": (640, 480), "format": "RGB888"}
)
picam2.configure(config)
picam2.start()
time.sleep(2)  # Let auto-exposure settle

# Load cascade
cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
face_cascade = cv2.CascadeClassifier(cascade_path)

# FPS tracking
frame_count = 0
fps_start = time.perf_counter()
current_fps = 0.0

try:
    while True:
        # Capture. Picamera2's "RGB888" format stores each pixel in
        # B, G, R order, which is what OpenCV expects, so no channel
        # swap is needed here.
        bgr = picam2.capture_array()

        # Process (on clean frame)
        gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(gray, 1.1, 5, minSize=(60, 60))

        # Annotate (on copy)
        display = bgr.copy()
        for (x, y, w, h) in faces:
            cv2.rectangle(display, (x, y), (x + w, y + h), (0, 255, 0), 2)

            # Filled label background sized to the text
            label = f"Face ({w}x{h})"
            (tw, th), _ = cv2.getTextSize(
                label, cv2.FONT_HERSHEY_SIMPLEX, 0.5, 1
            )
            cv2.rectangle(
                display, (x, y - th - 8), (x + tw + 4, y), (0, 255, 0), -1
            )
            cv2.putText(
                display, label, (x + 2, y - 4),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 0), 1, cv2.LINE_AA
            )

        # FPS overlay (recomputed about once per second)
        frame_count += 1
        elapsed = time.perf_counter() - fps_start
        if elapsed >= 1.0:
            current_fps = frame_count / elapsed
            frame_count = 0
            fps_start = time.perf_counter()

        cv2.putText(
            display, f"FPS: {current_fps:.1f} | Faces: {len(faces)}",
            (10, 25), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2
        )

        # Save latest annotated frame (for headless monitoring)
        cv2.imwrite('/home/pi/latest_detection.jpg', display)

except KeyboardInterrupt:
    print(f"\nStopped. Last FPS: {current_fps:.1f}")

finally:
    picam2.stop()
    picam2.close()
      

On a Pi 4 (4 GB) at 640x480, this pipeline runs at 18–25 FPS depending on how many faces are in the frame. That's real-time performance: fast enough for a doorbell camera, a visitor counter, an access control system, or a privacy filter that blurs faces in a video stream.
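A frame rate target translates directly into a per-frame time budget, and it can help to check your measured stage timings against it. The sketch below shows the arithmetic. The helper names are hypothetical, and the stage timings are made-up placeholders, not measurements. Replace them with your own time.perf_counter() readings.

```python
def frame_budget_ms(target_fps):
    """Milliseconds available per frame at the target frame rate."""
    return 1000.0 / target_fps


def fits_budget(stage_times_ms, target_fps):
    """Return (fits, total_ms) for a dict of per-stage timings."""
    total = sum(stage_times_ms.values())
    return total <= frame_budget_ms(target_fps), total


# Hypothetical per-stage timings in milliseconds (placeholders, not measured)
stage_times = {"capture": 8.0, "grayscale": 1.5, "detect": 35.0, "annotate": 2.0}

for fps in (20, 25):
    ok, total = fits_budget(stage_times, fps)
    print(f"{fps} FPS: budget {frame_budget_ms(fps):.1f} ms, "
          f"pipeline {total:.1f} ms, fits: {ok}")
```

If the total exceeds the budget, the detect stage is usually the first place to look: a larger minSize or a larger scaleFactor both reduce its cost.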

Key takeaway

Haar cascades running on a Pi 4 at 640x480 deliver 18–25 FPS face detection, fast enough for production use cases that don't require sub-frame latency. Start here. Graduate to heavier models only when this accuracy ceiling isn't sufficient.

Limitations and When to Graduate

Haar cascades are fast and lightweight, but they have clear boundaries:

  • Angle sensitivity: frontal cascades work best within roughly ±30 degrees of head rotation. Beyond that, detection drops off sharply. Profile cascades extend coverage but don't handle arbitrary angles.
  • Lighting dependence: cascades rely on contrast patterns. In very low light or harsh directional lighting, the contrast features that define "face" break down. Histogram equalization (cv2.equalizeHist()) can help:

    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    equalized = cv2.equalizeHist(gray)  # Normalize contrast
    faces = face_cascade.detectMultiScale(equalized, 1.1, 5)

  • False positives on texture: complex textures (brick walls, bookshelves, patterned fabrics) can trigger false detections. Increase minNeighbors or add eye-confirmation as a second pass.
  • No identity: cascades detect faces, not people. They can't tell you whose face it is. For recognition, you need a different pipeline entirely (face embeddings, not cascades).
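To see what the histogram equalization mentioned in the lighting bullet does to a dim image, here is a pure-Python sketch of the standard CDF-based method on a flat list of 8-bit values. The `equalize` function is a hypothetical illustration. cv2.equalizeHist follows the same idea but may differ in rounding details.

```python
def equalize(pixels, levels=256):
    """Histogram-equalize a flat list of 8-bit grayscale values."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of intensities
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        return list(pixels)  # single intensity: nothing to stretch

    # Map each intensity so the occupied range spreads over 0..levels-1
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [lut[p] for p in pixels]


dim = [10, 12, 12, 14, 14, 14, 16, 20]  # values crowded into a narrow dark band
print(equalize(dim))
```

The input spans only 10 intensity levels, while the output spans the full 0 to 255 range. That restored contrast is what the Haar features need in order to respond.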

The 80/20 point

Haar cascades handle the first 80% of face detection use cases: frontal faces, reasonable lighting, known distance range. If your project falls in that 80%, you're done. If you need the other 20% (extreme angles, tiny faces in crowds, face recognition), the next step is a DNN-based detector, such as a pretrained SSD face model run through OpenCV's DNN module, or MTCNN. That's a different chapter and a different performance budget.

I've seen teams skip cascades entirely and go straight to a deep learning face detector because it scores higher on academic benchmarks. On a Pi, that deep model runs at 2–3 FPS. The cascade runs at 20 FPS. For a doorbell camera that needs to decide "is someone at the door" within a fraction of a second, the cascade wins every time. That's not because it's more accurate per frame, but because it processes up to ten times more frames. More frames means more chances to detect, which often translates to higher effective accuracy in real-world conditions.
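The "more chances to detect" argument is simple probability. If each frame detects a visible face with probability p, the chance of at least one detection in n frames is 1 - (1 - p)^n. The sketch below works through it. The per-frame rates are hypothetical numbers chosen only for illustration, and the calculation assumes frames are independent trials. Consecutive frames are correlated in practice, so the real gain is smaller.

```python
def p_detect_within(per_frame_rate, fps, window_s):
    """Probability of at least one detection within `window_s` seconds,
    assuming each frame is an independent trial (a simplification)."""
    frames = int(fps * window_s)
    return 1.0 - (1.0 - per_frame_rate) ** frames


# Hypothetical per-frame detection rates, for illustration only
cascade = p_detect_within(per_frame_rate=0.70, fps=20, window_s=0.5)
deep = p_detect_within(per_frame_rate=0.90, fps=2, window_s=0.5)

print(f"cascade (20 FPS): {cascade:.5f}")
print(f"deep model (2 FPS): {deep:.5f}")
```

With these made-up rates, the cascade gets ten attempts in half a second while the deep model gets one, so the weaker per-frame detector comes out ahead over the window.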

On constrained hardware, the model that runs ten times faster often delivers better real-world results than the model that's ten percent more accurate per frame.

What to Do Monday Morning

Run the basic face detector on a photo

Save a photo with 1–3 visible faces to your Pi. Run the face detection script from this chapter. Verify it detects the correct number of faces. If it misses one, lower minNeighbors to 3. If it finds phantom faces, raise it to 7.

Tune parameters with the benchmarking script

Run the parameter tuning script on a group photo with at least 5 faces. Compare the "Sensitive," "Balanced," and "Conservative" configurations. Note the detection count, the false positive count, and the processing time for each. Choose the configuration that fits your use case.

Build the eye-confirmation pipeline

Implement the face + eye stacking pattern. Run it on the same photo and compare the confirmed face count to the face-only count. This is your false-positive filter for environments with complex backgrounds.

Run real-time detection from the camera

Deploy the full real-time detection script with your Pi Camera. The script as written uses picamera2, so for a USB webcam you would need to swap the capture code for cv2.VideoCapture. Walk in and out of frame. Note the detection latency: how quickly the green box appears when you enter the frame and disappears when you leave. Measure the FPS with the counter.

Test the limits

Try to break the detector. Turn your head sideways. Cover half your face. Move to the far end of the room. Dim the lights. Document where it fails. That boundary is the line between "Haar cascade is enough" and "I need a heavier model." For most Pi projects, you won't cross that line.

Add histogram equalization for low light

Run the detector in a dimly lit room with and without cv2.equalizeHist(). Compare detection rates. Equalization is a single-line addition that can rescue your pipeline in challenging lighting without switching to a heavier model.