How it was made

A total of 16,226 photos of 61 different monkeys were collected and used in this project. Each photo was classified by the primary monkey present in the picture.

The deep learning and computer vision platform FastAI (https://www.fast.ai/) was selected for its ease of use and powerful machine learning capabilities. The first attempts to train a classifier on the photos "as-is", with only minor manipulations, were moderately successful, reaching an accuracy of over 80%. Further experimentation led the team to believe that focusing the training on the faces alone, as opposed to everything in the photo, would reduce errors.

Manually cropping each photo down to the monkey's face was tedious. We needed a way to automatically find the location of the monkey's face within the picture. The team selected YOLO, short for You Only Look Once (https://pjreddie.com/darknet/yolo/), to train a real-time object-detection computer vision system. YOLO doesn't know what a monkey face looks like, so we had to teach it.

The process of creating an object detector starts with manually labeling hundreds of images. For this project, HyperLabel (https://hyperlabel.com/) was used to draw bounding boxes around 1,000 of the monkey faces. The YOLOv3 object detector was then trained over 4,000 iterations, reaching a final loss of 0.0411 (informally, about 95.9% effective, although loss is not a direct measure of detection accuracy).
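To make the labeling step concrete, here is a minimal sketch of how one hand-drawn bounding box can be turned into a Darknet-style YOLO label line. Darknet expects one line per object in the form "class x_center y_center width height", with all four values given as fractions of the image size. The function name `to_yolo_label`, the assumption that boxes arrive as pixel coordinates (left, top, right, bottom), and the single class id 0 for "monkey face" are illustrative choices, not details taken from this project.

```python
def to_yolo_label(class_id, box, image_size):
    """Convert a pixel box to a Darknet YOLO label line.

    box        -- (left, top, right, bottom) in pixels (assumed input format)
    image_size -- (width, height) of the image in pixels
    """
    left, top, right, bottom = box
    img_w, img_h = image_size

    # YOLO stores the box centre and size, each normalised to the 0..1 range.
    x_center = (left + right) / 2 / img_w
    y_center = (top + bottom) / 2 / img_h
    width = (right - left) / img_w
    height = (bottom - top) / img_h

    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a 200x200 pixel face box in an 800x600 photo,
# with class id 0 standing for "monkey face".
label = to_yolo_label(0, (200, 100, 400, 300), (800, 600))
print(label)  # 0 0.375000 0.333333 0.250000 0.333333
```

Each labeled photo would get a text file of such lines, one per face, alongside the image.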

The advantage of creating a computer-vision library capable of detecting monkey faces within a picture was two-fold. First, the monkey faces from the remaining 15,000+ photos were extracted in minutes rather than dozens of hours. Second, when end users upload a photo for facial recognition, the object detector can be used again to crop the photo to the face alone, which improves accuracy.
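The face-extraction step described above can be sketched as follows. This is an illustrative example rather than the project's actual code: it assumes the detector reports each face as a normalised (x_center, y_center, width, height) box, and it converts that box into a pixel crop rectangle clamped to the image bounds. The function name `yolo_to_crop_box` and the optional `margin` parameter (extra padding around the face) are hypothetical.

```python
def yolo_to_crop_box(detection, image_size, margin=0.1):
    """Turn a normalised YOLO box into a pixel crop rectangle.

    detection  -- (x_center, y_center, width, height), each in the 0..1 range
    image_size -- (width, height) of the image in pixels
    margin     -- fraction by which to enlarge the box (assumed padding)
    """
    x_c, y_c, w, h = detection
    img_w, img_h = image_size

    # Enlarge the box slightly so the crop does not clip the edge of the face.
    w *= 1 + margin
    h *= 1 + margin

    # Convert to pixel corners and clamp them to the image bounds.
    left = max(0, round((x_c - w / 2) * img_w))
    top = max(0, round((y_c - h / 2) * img_h))
    right = min(img_w, round((x_c + w / 2) * img_w))
    bottom = min(img_h, round((y_c + h / 2) * img_h))
    return left, top, right, bottom


# Hypothetical detection centred in an 800x600 photo, no extra padding.
crop_box = yolo_to_crop_box((0.5, 0.5, 0.5, 0.5), (800, 600), margin=0.0)
print(crop_box)  # (200, 150, 600, 450)
```

The resulting (left, top, right, bottom) tuple is in the order that an image library such as Pillow expects for its `Image.crop` method, so the same helper could serve both the bulk extraction and the per-upload cropping.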

After much experimentation with FastAI, using resnet34 and resnet50 models as starting points, a model with 99%+ accuracy was created. This notebook shows exactly the process used.

Training these models is extremely computationally expensive and calls for GPUs (or TPUs), which are much faster than CPUs at machine learning, by some benchmarks up to 200x faster (https://medium.com/syncedreview/harvard-researchers-benchmark-tpu-gpu-cpu-for-deep-learning-3034a452958d). To save cost, the final object detection weights and the trained CNN (https://en.wikipedia.org/wiki/Convolutional_neural_network) were exported and deployed on a standard web server.

Monkey	Photo Count
Addison	433
Banana	316
Bea	349
Best	601
Bora	571
Carson	83
Donda	150
Dove	111
Elsa	238
Figiri	360
Fiona	160
Flower	236
Gold	301
Happy	297
Hope	142
Ibuka	273
Ice	495
India	355
Ire	318
Jenny	544
Jib	298
Jinja	266
Joly	16
Juice	146
June	330
Kadi	163
Kau	369
Kenya	223
King	308
Krys	156
Likizo	381
Limao	272
Liza	279
Lucky	30
Mapera	562
Mercy	280
Msada	122
New	68
Okwi	230
Orange	255
Pices	207
Ramani	212
Rocket	528
Ruby	47
Savannah	115
Scarface	222
Supu	273
Tabu	372
Tamu	271
Tatu	268
Tisa	40
Twin	298
Uma	120
Venus	131
Vibe	280
Viola	112
Vumi	148
Wali	79
Zalia	326
Ziwa	360
Zoo	366
Zuri	364
Total	16,226