
Hi,
my name is Matsuyama, and I work at Hacarus. Today I would like to write about how to use Vision and CoreML, which were a hot topic at the last WWDC.
This time, I created an application that displays a label and confidence level for the objects it identifies. I used a model provided by Apple as well as a custom model converted from Caffe to the CoreML format using Python.
Here is how it looks.
What is Vision Framework?
The Vision framework performs face detection, text detection, barcode recognition, and general feature tracking. It also allows the use of custom CoreML models for tasks like classification or object detection, so Core ML and Vision can be used together.
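For example, here is a minimal sketch (not part of the sample app) of using Vision on its own, with the built-in face detection request:
import UIKit
import Vision

// Detect faces in a CGImage using Vision only, no CoreML model involved.
func detectFaces(in cgImage: CGImage) {
    let request = VNDetectFaceRectanglesRequest { request, _ in
        guard let faces = request.results as? [VNFaceObservation] else { return }
        // Each observation carries a bounding box in normalized coordinates.
        for face in faces {
            print("Face at \(face.boundingBox)")
        }
    }
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try? handler.perform([request])
}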
What is CoreML
CoreML is a framework for machine learning provided by Apple. Implementing it on an iOS device is rather uncomplicated. Its general features are:
- Optimized for on-device performance, so no network connection is needed (a short prediction sketch follows this list).
- Uses the device's CPU and GPU to deliver the best performance.
- The coremltools Python package converts a variety of model types into the Core ML format. A list of supported models can be found here.
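As a minimal illustration of on-device prediction, here is a sketch that calls a CoreML model directly, without Vision. It assumes the MobileNet class that Xcode generates from Apple's MobileNet.mlmodel, whose input is a 224x224 pixel buffer named image and whose output includes a classLabel string; check the generated interface for your own model.
import CoreML
import CoreVideo

// Classify a frame with the auto-generated MobileNet class (assumed input/output names).
func classify(_ pixelBuffer: CVPixelBuffer) {
    guard let output = try? MobileNet().prediction(image: pixelBuffer) else { return }
    print("Predicted label: \(output.classLabel)")
}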
A step-by-step guide to run machine learning with CoreML
CoreML can use models provided by Apple or models you make yourself. The steps below show how to use them with CoreML.
- Prepare the model (provided by Apple or custom-made).
- Add the CoreML model to your project.
- Run the model.
1. Prepare the model
Using a model provided by Apple
Download the CoreML model from Apple that you want to use in your project. For this exercise, I used MobileNet.
Using a custom model made by yourself
I converted a Caffe model to an ML model so that it can be used with CoreML. Caffe models can be downloaded from the internet or trained yourself. This demonstration app uses SqueezeNet, downloaded from the Caffe GitHub repository, which I then converted to a CoreML model using the coremltools package for Python.
Create a folder within the CaffeModels path below the Python directory of the sample repository. Then add the .caffemodel file and the accompanying text files (.prototxt and .txt) to the same folder. Finally, pass the paths of the three files to coremltools.converters.caffe.convert() in src/convert_to_ml_model.py (find the Python code on GitHub here).
import coremltools

coreml_model = coremltools.converters.caffe.convert(
    ('./../CaffeModels/SqueezeNet/squeezenet_v1.1.caffemodel', './../CaffeModels/SqueezeNet/deploy.prototxt'),
    image_input_names='image',
    class_labels='./../CaffeModels/SqueezeNet/imagenet1000.txt')
coreml_model.save('./../MLModels/Squeeze.mlmodel')  # The file name for the converted Caffe model.
image_input_names in convert() defines the name of the input layer. coreml_model.save() takes the path where the CoreML model is saved; here, the save location is under the MLModels directory. Run python3 convert_to_ml_model.py to perform the conversion.
2. Add CoreML model to your project
Create a new Xcode project, then add the model prepared in step 1 to the project.
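As a side note, when the .mlmodel file is added, Xcode generates a Swift class with the same name. The short sketch below (my addition) just shows how that class exposes the underlying MLModel that Vision consumes in step 3.
import CoreML

let mobileNet = MobileNet()            // Auto-generated class for MobileNet.mlmodel.
let mlModel: MLModel = mobileNet.model // The MLModel instance passed to Vision later.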
3. Execute the model
Follow these steps to execute the model.
- Create a VNCoreMLRequest.
- Create a handler for the model.
- Set up AVFoundation.
- Execute through VNCoreMLRequest.
1. Create a VNCoreMLRequest
VNCoreMLRequest
inherits VNImageBasedRequest
, which inherits VNRequest
.
lazy var modelRequest: VNCoreMLRequest = {
    guard let model = try? VNCoreMLModel(for: MobileNet().model) else {
        fatalError("can't load Core ML model")
    }
    return .init(model: model, completionHandler: self.handleModel)
}()
The Vision framework can handle a VNCoreMLModel, which is created from an MLModel. Pass the VNCoreMLModel and a completionHandler as arguments; the completionHandler processes the results of the VNCoreMLRequest.
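The request also has an option worth knowing that is not used in the sample above: imageCropAndScaleOption controls how Vision fits incoming images to the model's input size. A minimal sketch, assuming a VNCoreMLModel named model as in the code above:
// Optional: choose how Vision crops/scales frames to the model's input size.
let request = VNCoreMLRequest(model: model, completionHandler: handleModel)
request.imageCropAndScaleOption = .centerCrop // .scaleFit and .scaleFill are also available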
2. Create a handler for the model
private func handleModel(request: VNRequest, error: Error?) {
    guard let results = request.results as? [VNClassificationObservation] else { return }
    if let classification = results.first {
        // classification.identifier: label (identifier) of the object.
        // classification.confidence: confidence of the classification.
        // Do something with the result here.
    }
}
This is the handler for the results of the VNCoreMLRequest. The request.results array carries the label (identifier) and the classification confidence computed by CoreML. Every result of a VNCoreMLRequest inherits from VNObservation; since this app classifies image data, the results here are VNClassificationObservation objects. More info about the results of VNCoreMLRequest can be found here.
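Where the comment above says "do something with the result", one possibility is to format the top results and push them to the UI. A minimal sketch (my addition, not from the sample; resultLabel is a hypothetical UILabel):
private func handleModel(request: VNRequest, error: Error?) {
    guard let results = request.results as? [VNClassificationObservation] else { return }
    // Build a short text summary of the top three classifications.
    let top3 = results.prefix(3)
        .map { String(format: "%@ %.1f%%", $0.identifier, $0.confidence * 100) }
        .joined(separator: "\n")
    // The request completes on a background queue, so hop back to main for UI work.
    DispatchQueue.main.async {
        self.resultLabel.text = top3 // resultLabel is a hypothetical UILabel outlet.
    }
}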
3. Set up AVFoundation
private lazy var previewLayer: AVCaptureVideoPreviewLayer = {
    let previewLayer = AVCaptureVideoPreviewLayer(session: captureSession)
    previewLayer.frame = view.layer.bounds
    previewLayer.backgroundColor = UIColor.clear.cgColor
    previewLayer.videoGravity = .resizeAspectFill
    return previewLayer
}()
let captureSession = AVCaptureSession()
guard let captureDevice = AVCaptureDevice.default(for: .video) else {
    fatalError("Could not set av capture device")
}
guard let input = try? AVCaptureDeviceInput(device: captureDevice) else {
    fatalError("Could not set av capture device input")
}
// Deliver camera frames to captureOutput(_:didOutput:from:) on a serial queue.
let output = AVCaptureVideoDataOutput()
output.setSampleBufferDelegate(self, queue: DispatchQueue(label: "video"))
captureSession.sessionPreset = .photo
captureSession.addInput(input)
captureSession.addOutput(output)
// Keep the configured session in the property used by previewLayer, show the preview, and start streaming.
self.captureSession = captureSession
view.layer.addSublayer(previewLayer)
captureSession.startRunning()
The setup defines previewLayer as a lazy var, and configures captureSession so that AVFoundation can deliver camera frames to the app.
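One thing this setup assumes is that the app already has camera permission. A sketch of requesting it before starting the session (setupCaptureSession is a hypothetical method wrapping the code above, and NSCameraUsageDescription must be added to Info.plist):
// Ask for camera access, then configure and start the session on the main queue.
AVCaptureDevice.requestAccess(for: .video) { granted in
    guard granted else { return }
    DispatchQueue.main.async {
        self.setupCaptureSession() // Hypothetical method containing the setup code above.
    }
}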
4. Execute through VNCoreMLRequest
Run the image analysis request by passing it to a VNImageRequestHandler.
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    DispatchQueue.global(qos: .userInteractive).async {
        do {
            try VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:]).perform([self.modelRequest])
        } catch let error {
            NSLog("%@", error.localizedDescription)
        }
    }
}
The sample buffer delivered to captureOutput is converted to a CVImageBuffer by CMSampleBufferGetImageBuffer. The CVImageBuffer is then processed by a VNImageRequestHandler together with the VNCoreMLRequest created earlier. That is all it takes to run the model on the camera feed.
How to use the sample application
Clone the repository from the Hacarus GitHub. Add the MLModel files you want to use under CoreMLSample/Models. Then define the models you want to use as cases of the MLModels enum in MLModelListViewModel.swift. Simply add them to the enum:
struct MLModelListViewModel {
    let sections: [Section] = [
        .init(title: "Provided by Apple Inc.", items: [.mobileNet]), // Models provided by Apple.
        .init(title: "Provided by own", items: [.squeezeNet]),       // Your own custom models.
    ]

    struct Section {
        let title: String
        let items: [MLModels]

        enum MLModels {
            case mobileNet
            case squeezeNet
            // Add the models you want here.

            var title: String {
                switch self {
                case .mobileNet:
                    return "Mobile Net"
                case .squeezeNet:
                    return "Squeeze Net"
                // Display name for each case you added.
                }
            }

            var model: MLModel {
                switch self {
                case .mobileNet:
                    return MobileNet().model
                case .squeezeNet:
                    return SqueezeNet().model
                // MLModel for each case you added.
                }
            }
        }
    }
}
The CoreML models you defined will be displayed on the top page, and you can run the analysis with any of them.
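To connect this back to the Vision code from step 3, here is a sketch (not taken from the sample) of how a request could be built for whichever case the user selects, via the enum's model property:
// Build a Vision request for the model the user picked on the top page.
func makeRequest(for item: MLModelListViewModel.Section.MLModels) throws -> VNCoreMLRequest {
    let visionModel = try VNCoreMLModel(for: item.model)
    return VNCoreMLRequest(model: visionModel, completionHandler: handleModel)
}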
About CoreML 2
- Create ML lets you create models directly on your Mac.
- Model processing is up to 30% faster.
Conclusion
This machine learning exercise was easy to implement with CoreML and Vision. Since Apple makes it so easy, I think services using machine learning will increase more and more in the future. I am really looking forward to the evolution of CoreML.
Hacarus is currently hiring data scientists and FPGA engineers. If you are interested in working at Hacarus, please visit the recruitment page on our website.