[Image: Inside of a fridge with various condiments and foods; green labels show the possible detection of kangaroos.]

Kangaroos in my fridge! A Machine Learning Adventure

In 2018, as I embarked on my journey to develop machine vision models, I found myself working with a unique dataset – segmented images of kangaroos. My goal was to create a model capable of detecting and subsequently counting the number of kangaroos in a given region. The images I used were snapshots of the Australian outback, captured while I was precariously hanging out of a helicopter.

After spending several days separating the kangaroos from the background in these images, I was ready to test a variety of new Deep Neural Network (DNN) architectures. My aim was to identify the most effective model for my specific needs. I set my TensorFlow pipeline in motion to train six different base models from scratch, along with a few variations of each.
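In spirit, that comparison loop looked something like the sketch below. It is a minimal sketch only: a simplified "kangaroo present?" classification head stands in for the full detection-and-counting pipeline, and the architecture names, dataset paths, and hyperparameters are illustrative assumptions rather than the original code.

```python
import tensorflow as tf

# Hypothetical data locations; the original segmented kangaroo images are not public.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/kangaroo_segments/train", label_mode="binary",
    image_size=(224, 224), batch_size=32)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/kangaroo_segments/val", label_mode="binary",
    image_size=(224, 224), batch_size=32)

# Candidate base architectures, each trained from scratch (weights=None).
candidates = {
    "resnet50": tf.keras.applications.ResNet50,
    "mobilenet_v2": tf.keras.applications.MobileNetV2,
    "densenet121": tf.keras.applications.DenseNet121,
}

results = {}
for name, build_backbone in candidates.items():
    backbone = build_backbone(weights=None, include_top=False,
                              input_shape=(224, 224, 3), pooling="avg")
    model = tf.keras.Sequential([
        tf.keras.layers.Rescaling(1.0 / 255),            # raw pixels -> [0, 1]
        backbone,
        tf.keras.layers.Dense(1, activation="sigmoid"),  # "kangaroo present?" head
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(train_ds, validation_data=val_ds, epochs=10, verbose=2)
    results[name] = max(history.history["val_accuracy"])

# Rank the candidate base models by their best validation accuracy.
for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```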

However, during the validation cycle, I made a significant error. Instead of pointing the pipeline at the set of images that had been set aside for validation, I inadvertently used a directory filled with photos of fridge interiors from the CNTK dataset¹. This mistake turned out to be a pivotal lesson in how DNNs behave when running inference on images outside of their training domain.

As a human, I can look at an image and immediately understand that kangaroos are highly unlikely to turn up in my fridge. The model, however, lacked this contextual understanding and produced a result indicating a low-“confidence” detection of kangaroos in the fridge. This incident sparked the development of several concepts that I later incorporated into our machine learning and DNN training systems. These concepts included:

  • Torture Dataset: A collection of unusual images that were either adjacent to or outside the domain.
  • Custom Metrics: Measures to evaluate the model’s performance on out-of-domain images.
  • Domain-specific Language and Jargon: Terminology that facilitated communication about what was considered “in domain” and “out of domain”.
  • Model Output “Escapes”: Essentially, the “I don’t know” option for the model (a minimal sketch follows this list).
  • Data Provenance or Lineage: Understanding the origin of data and how it related to training and evaluation loops.
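To make the “escape” idea concrete, here is a minimal, hypothetical sketch of post-processing detector output so the system can answer “I don’t know” instead of reporting weak kangaroo detections, together with a simple torture-set metric that counts out-of-domain images which still yield a confident kangaroo. The thresholds, class names, and helper functions are assumptions for illustration, not the original production code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Detection:
    label: str
    confidence: float  # model score in [0, 1]

# Hypothetical thresholds; real values would be tuned on held-out, in-domain data.
ACCEPT_THRESHOLD = 0.80  # detections at or above this are trusted
ESCAPE_THRESHOLD = 0.50  # detections in [0.50, 0.80) trigger the "I don't know" escape

def count_kangaroos(detections: List[Detection]) -> Optional[int]:
    """Return a kangaroo count, or None as the escape ("I don't know") answer."""
    kangaroos = [d for d in detections if d.label == "kangaroo"]
    confident = [d for d in kangaroos if d.confidence >= ACCEPT_THRESHOLD]
    uncertain = [d for d in kangaroos
                 if ESCAPE_THRESHOLD <= d.confidence < ACCEPT_THRESHOLD]
    if confident:
        return len(confident)
    if uncertain:
        return None  # plausible but unconvincing evidence: escape rather than guess
    return 0  # no meaningful evidence of kangaroos

def torture_false_positive_rate(torture_outputs: List[List[Detection]]) -> float:
    """Custom metric: fraction of out-of-domain ("torture") images for which the
    model still reports at least one confident kangaroo. Lower is better."""
    if not torture_outputs:
        return 0.0
    hits = sum(1 for dets in torture_outputs if count_kangaroos(dets) not in (0, None))
    return hits / len(torture_outputs)

# A fridge photo that yields a weak kangaroo detection should escape, not count.
fridge = [Detection("kangaroo", 0.55), Detection("condiment", 0.92)]
print(count_kangaroos(fridge))                 # None -> "I don't know"
print(torture_false_positive_rate([fridge]))   # 0.0 -> the escape prevented a false count
```

The design choice here is simply that an explicit “don’t know” output is cheaper than a confidently wrong count, which was the lesson the fridge images taught.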

This experience continues to serve as a reminder that, even in the realm of machine learning, the algorithm can be missing key context and is not as “smart” as it might seem.

  1. CNTK on github.com