Image recognition on the iPhone using machine learning with CoreML



My name is Matsuyama, and I work at Hacarus. Today I will write about how to use Vision and CoreML, which were hot topics at the last WWDC.

For this post, I created an application that displays the label and confidence level of the objects it identifies. It uses a model provided by Apple as well as a custom model that I converted from Caffe to the CoreML format using Python.

Here is what it looks like.

What is the Vision framework?

The Vision framework performs face detection, text detection, barcode recognition, and general feature tracking. It also lets you run custom CoreML models for tasks such as classification or object detection, so Vision and CoreML can be used together.

What is CoreML?

CoreML is a machine learning framework provided by Apple. Integrating it into an iOS app is fairly straightforward. Its main features are:

  • It is optimized for on-device performance, so no network connection is needed.
  • It uses the device's CPU and GPU to deliver the best performance.
  • It comes with coremltools, a Python package that converts a variety of model types into the Core ML format. A list of supported models can be found here.


A step-by-step guide to run machine learning with CoreML 

CoreML can use models provided by Apple or models you make yourself. The steps below show how to use a model with CoreML.

  1. Prepare the model (provided by Apple or custom-made).
  2. Add the CoreML model to your project.
  3. Run the model.

1. Prepare the model

Using a model provided by Apple

Download the CoreML model you want to use in your project from Apple. For this exercise, I used MobileNet.

Using a custom model made by yourself

I converted a Caffe model to the CoreML format (.mlmodel) so that it can be used with CoreML. Caffe models can be downloaded from the internet or trained yourself. This demo app uses SqueezeNet, a Caffe model downloaded from GitHub, which I then converted to a CoreML model with coremltools, Apple's Python package.

In the sample repository, create a folder under the CaffeModels directory (located below the Python directory). Put the .caffemodel file and its text files (.prototxt and .txt) in that folder. Finally, specify the paths of these three files in the call to coremltools.converters.caffe.convert() in src/ (find the Python code on GitHub here).

import coremltools

coreml_model = coremltools.converters.caffe.convert(
    ('./../CaffeModels/SqueezeNet/squeezenet_v1.1.caffemodel', './../CaffeModels/SqueezeNet/deploy.prototxt'),
    image_input_names='image',
    class_labels='./../CaffeModels/SqueezeNet/imagenet1000.txt')
coreml_model.save('./../MLModels/Squeeze.mlmodel')  # Save path for the model converted from the Caffe model.

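The image_input_names argument of convert() names the input of the Caffe model that CoreML should treat as an image, and class_labels points to a text file containing the class labels, one per line. './../MLModels/Squeeze.mlmodel' is the path where the converted CoreML model is saved; here, it goes into the MLModels directory. Then run the script with python3.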
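Because convert() needs three matching files from one model folder, a small pre-flight check can catch a missing or duplicated file before the conversion starts. The sketch below is a hypothetical helper, not part of the sample repository: find_caffe_files and read_class_labels are illustrative names, and it assumes each CaffeModels/<ModelName>/ folder holds exactly one .caffemodel, one .prototxt and one .txt label file, as described above. It uses only the standard library.

```python
from pathlib import Path


def find_caffe_files(model_dir):
    """Return (caffemodel, prototxt, labels) paths found in model_dir.

    Hypothetical helper: assumes the folder contains exactly one file of
    each type, following the CaffeModels/<ModelName>/ layout.
    """
    folder = Path(model_dir)
    found = []
    for suffix in (".caffemodel", ".prototxt", ".txt"):
        # "*.txt" does not match "deploy.prototxt", so the two stay separate.
        matches = sorted(folder.glob("*" + suffix))
        if len(matches) != 1:
            raise FileNotFoundError(
                f"expected exactly one {suffix} file in {folder}, found {len(matches)}")
        found.append(str(matches[0]))
    return tuple(found)


def read_class_labels(path):
    """Read a newline-separated label file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]
```

The returned caffemodel and prototxt paths could then be passed as the tuple argument of convert() and the label path as class_labels, and read_class_labels offers a quick sanity check of the label count (an ImageNet label file, for example, should list 1000 classes).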

2. Add CoreML model to your project

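Create a new Xcode project, then add the model prepared in step 1 to it. Xcode automatically generates a Swift class named after the .mlmodel file, so MobileNet.mlmodel becomes the MobileNet class.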

3. Execute the model

Follow these steps to execute the model.

  1. Create a VNCoreMLRequest
  2. Create a handler for the model.
  3. Set up AVFoundation
  4. Execute through VNCoreMLRequest

1. Create a VNCoreMLRequest

VNCoreMLRequest inherits from VNImageBasedRequest, which in turn inherits from VNRequest.

lazy var modelRequest: VNCoreMLRequest = {
    // Wrap the Xcode-generated MobileNet model so that Vision can run it.
    guard let model = try? VNCoreMLModel(for: MobileNet().model) else {
        fatalError("can't load Core ML model")
    }
    return .init(model: model, completionHandler: self.handleModel)
}()

The Vision framework works with a VNCoreMLModel, which wraps an MLModel. Pass the VNCoreMLModel and a completionHandler to the VNCoreMLRequest initializer; the completionHandler receives the results of the request.

2. Create a handler for the model

private func handleModel(request: VNRequest, error: Error?) {
    guard let results = request.results as? [VNClassificationObservation] else { return }

    if let classification = results.first {
        // classification.identifier: label (identifier) of the object.
        // classification.confidence: confidence of the classification, from 0.0 to 1.0.
        // Do whatever you want with the result here.
    }
}

This is the handler for the results of the VNCoreMLRequest. request.results contains the labels (identifiers) and confidences produced by CoreML. Every result of a VNCoreMLRequest is a subclass of VNObservation; because this app classifies images, the results are VNClassificationObservation objects. More information about the results of VNCoreMLRequest can be found here.
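The decision the handler makes (which result to show, and how to present its confidence) is independent of Swift, so here is a hedged sketch of it in Python, the language used for the conversion step. Classification is a hypothetical stand-in for VNClassificationObservation's identifier and confidence fields; top_classification, label_text, the min_confidence threshold and the "Unknown" placeholder are illustrative assumptions, not part of the sample app. Vision typically returns classification results ordered by confidence, which is why the Swift handler can take results.first; this sketch picks the maximum explicitly so it does not rely on that ordering.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Classification:
    # Hypothetical stand-in for VNClassificationObservation.
    identifier: str
    confidence: float  # 0.0 to 1.0, like VNConfidence


def top_classification(results: Sequence[Classification],
                       min_confidence: float = 0.0) -> Optional[Classification]:
    """Return the most confident result, or None if none reaches min_confidence."""
    if not results:
        return None
    best = max(results, key=lambda c: c.confidence)
    return best if best.confidence >= min_confidence else None


def label_text(result: Optional[Classification]) -> str:
    """Format the label and its confidence as a percentage for display."""
    if result is None:
        return "Unknown"  # hypothetical placeholder text
    return f"{result.identifier} ({result.confidence * 100:.1f}%)"
```

A threshold like min_confidence lets the UI avoid flashing low-confidence labels while the camera moves; 0.0 keeps the behavior of always showing the top result.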

3. Set up AVFoundation

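// Assumed property of the view controller (its declaration is not shown in the original).
private var captureSession: AVCaptureSession!

private lazy var previewLayer: AVCaptureVideoPreviewLayer = {
    let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
    previewLayer.frame = view.layer.bounds
    previewLayer.backgroundColor = UIColor.clear.cgColor
    previewLayer.videoGravity = .resizeAspectFill
    return previewLayer
}()

// setupCaptureSession is a placeholder name; the original method name is not shown.
// Call it before accessing previewLayer. self must adopt AVCaptureVideoDataOutputSampleBufferDelegate.
private func setupCaptureSession() {
    let captureSession = AVCaptureSession()
    guard let captureDevice = AVCaptureDevice.default(for: .video) else {
        fatalError("Could not set av capture device")
    }
    guard let input = try? AVCaptureDeviceInput(device: captureDevice) else {
        fatalError("Could not set av capture device input")
    }
    let output = AVCaptureVideoDataOutput()
    output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "video"))
    captureSession.sessionPreset = .photo
    // Connect the camera input and the frame output to the session.
    captureSession.addInput(input)
    captureSession.addOutput(output)
    self.captureSession = captureSession
    captureSession.startRunning()
}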

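The setup defines previewLayer as a lazy var, so it is created the first time it is accessed. captureSession is configured with the device's camera as input and an AVCaptureVideoDataOutput whose sample buffer delegate receives every frame. Because the app uses the camera, Info.plist must also contain an NSCameraUsageDescription entry; without it, iOS terminates the app when it tries to access the camera.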

4. Execute through VNCoreMLRequest

Pass the image analysis request to a VNImageRequestHandler and perform it.

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }  // CVImageBuffer of this frame
    DispatchQueue.global(qos: .userInteractive).async {
        do {
            // Run the VNCoreMLRequest created in step 1 on this frame.
            try VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([self.modelRequest])
        } catch let error {
            print("Failed to perform the request: \(error)")
        }
    }
}

captureOutput receives each camera frame as a CMSampleBuffer, which CMSampleBufferGetImageBuffer converts to a CVImageBuffer. A VNImageRequestHandler then performs the previously created VNCoreMLRequest on that image buffer.
That is all it takes to classify the frames coming from the camera.

How to use the sample application

Clone the sample repository from the Hacarus GitHub.

Add the MLModel you want to use to CoreMLSample/Models. Then register it in your project by adding a case to the MLModels enum in MLModelListViewModel.swift:

import CoreML

struct MLModelListViewModel {
    let sections: [Section] = [
        .init(title: "Provided by Apple Inc.", items: [.mobileNet]), // Add items provided by Apple.
        .init(title: "Provided by own", items: [.squeezeNet]), // Add items you created yourself.
    ]

    struct Section {
        let title: String
        let items: [MLModels]

        enum MLModels {
            case mobileNet
            case squeezeNet
            // Add the models you want here.

            var title: String {
                switch self {
                case .mobileNet:
                    return "Mobile Net"
                case .squeezeNet:
                    return "Squeeze Net"
                // Display names for the cases you added.
                }
            }

            var model: MLModel {
                switch self {
                case .mobileNet:
                    return MobileNet().model
                case .squeezeNet:
                    return SqueezeNet().model
                // MLModels for the cases you added.
                }
            }
        }
    }
}

The CoreML models you define are listed on the app's top screen, and you can run the analysis with any of them.

About CoreML 2

  • Create ML lets you create models on your Mac.
  • Processing speed improved by 30%, according to Apple.


With CoreML and Vision, this machine learning exercise was easy to implement. Because Apple makes it so easy, I expect more and more services to use machine learning in the future. I am really looking forward to seeing how CoreML evolves.

Hacarus is currently hiring data scientists and FPGA engineers. If you are interested in working at Hacarus, please visit the recruitment page on our website.

Takashi Someda

CTO of Hacarus. Has over 15 years of experience as a software and server engineer at several global SaaS startups. Currently working hard to build something new with machine learning. Holds a master's degree in Information Science from Kyoto University.
