How it was made

16,226 photos of 61 different monkeys were collected and used in this project. Each photo was classified by the primary monkey present in the picture.

The deep learning and computer vision platform FastAI (https://www.fast.ai/) was selected for its ease of use and powerful machine learning capabilities. The first attempts to train a classifier using the photos "as-is" with only minor manipulations were moderately successful - reaching an accuracy rate of over 80%. Further experimentation led the team to believe that focusing the training on the faces alone, as opposed to everything in the photo, would reduce errors.

Manually cropping each photo to restrict it to the face of the monkey was tedious. We needed a way to automatically find the location of the monkey's face within each picture. The team selected YOLO - You Only Look Once (https://pjreddie.com/darknet/yolo/) to train a real-time object-detection computer vision system. YOLO doesn't know what a monkey face looks like, so we had to teach it.

The process of creating an object detector starts by manually labeling hundreds of images. For this project, HyperLabel (https://hyperlabel.com/) was used to draw bounding boxes around 1,000 of the monkey faces. The YOLOv3 object detector was then trained for 4,000 iterations, reaching a final loss of 0.0411 (or 95.9% effective).
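To make the labeling step concrete, here is a minimal sketch of how one hand-drawn bounding box could be turned into the label line that Darknet's YOLO training expects: a class index followed by the box center, width, and height, each expressed as a fraction of the image size. The function name `to_yolo_label` and the pixel-box input are assumptions made for illustration. The format HyperLabel actually exports is not described here, so this is not necessarily the conversion the team ran.

```python
def to_yolo_label(class_id, box, image_size):
    """Convert a pixel box into a Darknet-style YOLO label line.

    box is (left, top, right, bottom) in pixels; this input layout is an
    assumption for illustration, not HyperLabel's documented export format.
    image_size is (width, height) in pixels.
    """
    left, top, right, bottom = box
    img_w, img_h = image_size

    # YOLO labels store the box center and size as fractions of the image.
    x_center = (left + right) / 2 / img_w
    y_center = (top + bottom) / 2 / img_h
    width = (right - left) / img_w
    height = (bottom - top) / img_h

    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A single class (index 0, "monkey face") on a hypothetical 800x600 photo.
print(to_yolo_label(0, (200, 100, 400, 300), (800, 600)))
# prints: 0 0.375000 0.333333 0.250000 0.333333
```

Each training image gets one text file with one such line per labeled face. With only one class to detect, every line starts with 0.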

The advantage of creating a computer-vision library capable of detecting monkey faces within a picture was twofold. First, the monkey faces from the remaining 15,000+ photos were extracted in minutes, not dozens of hours. Second, when end users upload a photo for facial recognition, the object detector can again be used to isolate the face and improve accuracy.
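The face-extraction step can be sketched as a small helper that turns one detection into a crop rectangle. This is an illustrative sketch, not the project's actual code: the name `face_crop_box`, the normalized (x_center, y_center, width, height) detection layout, and the optional `margin` padding are all assumptions. The returned tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts.

```python
def face_crop_box(detection, image_size, margin=0.10):
    """Turn a normalized detection into a pixel crop box.

    detection is (x_center, y_center, width, height), each in 0..1
    (an assumed layout mirroring YOLO's label format).
    image_size is (width, height) in pixels.
    margin pads the box by a fraction of its size; the 10% default is a
    hypothetical choice, not a value taken from the project.
    """
    xc, yc, w, h = detection
    img_w, img_h = image_size

    # Half-extents of the padded box, still in normalized units.
    half_w = w * (1 + margin) / 2
    half_h = h * (1 + margin) / 2

    # Scale to pixels and clamp so the crop never leaves the image.
    left = max(0, round((xc - half_w) * img_w))
    upper = max(0, round((yc - half_h) * img_h))
    right = min(img_w, round((xc + half_w) * img_w))
    lower = min(img_h, round((yc + half_h) * img_h))
    return left, upper, right, lower


# A face centered in a hypothetical 800x600 photo, cropped with no padding.
print(face_crop_box((0.5, 0.5, 0.25, 0.5), (800, 600), margin=0.0))
# prints: (300, 150, 500, 450)
```

The same helper would serve both uses described above: batch-cropping the training photos and cropping a photo uploaded by an end user before it is passed to the classifier.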

After much experimentation using FastAI with resnet34 and resnet50 models as starting points, a model with 99%+ accuracy was created. This notebook shows exactly the process used.

Training these models is extremely computationally expensive and requires GPUs (or TPUs), which are much faster than CPUs at machine learning - 200x faster (https://medium.com/syncedreview/harvard-researchers-benchmark-tpu-gpu-cpu-for-deep-learning-3034a452958d). To save cost, the final object detection weights and CNN (https://en.wikipedia.org/wiki/Convolutional_neural_network) were exported and deployed on a standard "web server."

Monkey      Photo Count
Addison     433
Banana      316
Bea         349
Best        601
Bora        571
Carson      83
Donda       150
Dove        111
Elsa        238
Figiri      360
Fiona       160
Flower      236
Gold        301
Happy       297
Hope        142
Ibuka       273
Ice         495
India       355
Ire         318
Jenny       544
Jib         298
Jinja       266
Joly        16
Juice       146
June        330
Kadi        163
Kau         369
Kenya       223
King        308
Krys        156
Likizo      381
Limao       272
Liza        279
Lucky       30
Mapera      562
Mercy       280
Msada       122
New         68
Okwi        230
Orange      255
Pices       207
Ramani      212
Rocket      528
Ruby        47
Savannah    115
Scarface    222
Supu        273
Tabu        372
Tamu        271
Tatu        268
Tisa        40
Twin        298
Uma         120
Venus       131
Vibe        280
Viola       112
Vumi        148
Wali        79
Zalia       326
Ziwa        360
Zoo         366
Zuri        364
Total       16,226