Digital Imaging and Communications in Medicine (DICOM) Annotation Using the Computer Vision Annotation Tool (CVAT)


– CVAT was used as an annotation tool for medical AI development

– CVAT was modified so that it can read DICOM files

– A class to read in DICOM was added to cvat/apps/engine/

– Some features that are useful in real projects, such as default label settings, were also added

1. About CVAT

Hello, my name is Yuya Sumie, and I am an intern at HACARUS. To develop a machine learning algorithm, we first need to define what the “correct” output is. In machine learning, the process of producing this correct data is called “annotation”, and the annotated data, which is the target the model is trained to reproduce, is usually created by humans. For example, to develop a module for detecting lesions, the annotation consists of a patient’s CT/MRI images marked to show where the lesion is. Various annotation tools are available for creating this correct data.

One of them is CVAT (Computer Vision Annotation Tool), a free, open-source annotation tool that runs in the browser. In this article, I’ll discuss why CVAT was chosen for this medical imaging project and how it can be used in practice. I’ll also describe the improvements I made to CVAT to enhance its practical usability.

2. Why CVAT was chosen

Annotating data for medical image detection AI poses several challenges.
The first one is the peculiarity of the data format.

Typically, CT and MRI images handled in the medical field are stored in the DICOM (Digital Imaging and Communications in Medicine) format. A DICOM file is not just an image: it is a container that also holds information such as when the image was taken and the environment in which it was created. Therefore, ordinary image-loading libraries intended for formats such as JPEG or PNG cannot process it. To process DICOM in Python, you need a dedicated library such as pydicom.
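To make the format difference concrete, here is a minimal sketch that tells the three containers apart by their leading bytes. It relies on the fact that a DICOM file written in the standard file format starts with a 128-byte preamble followed by the four bytes "DICM"; the function names sniff_format and sniff_file are made up for this illustration and are not part of CVAT or pydicom.

```python
JPEG_MAGIC = b"\xff\xd8\xff"        # JPEG files start with these bytes
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"    # PNG signature
DICOM_PREAMBLE_LEN = 128            # DICOM files begin with a 128-byte preamble
DICOM_MAGIC = b"DICM"               # followed by this 4-byte prefix


def sniff_format(header: bytes) -> str:
    """Guess the container format from the first bytes of a file."""
    if header.startswith(JPEG_MAGIC):
        return "jpeg"
    if header.startswith(PNG_MAGIC):
        return "png"
    if header[DICOM_PREAMBLE_LEN:DICOM_PREAMBLE_LEN + 4] == DICOM_MAGIC:
        return "dicom"
    return "unknown"


def sniff_file(path: str) -> str:
    """Hypothetical helper: read just enough of a file to classify it."""
    with open(path, "rb") as f:
        return sniff_format(f.read(DICOM_PREAMBLE_LEN + len(DICOM_MAGIC)))
```

A check like this only identifies the container. Reading the tags and pixel data still requires a DICOM library such as pydicom, and files that omit the preamble would be reported as unknown here.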

Secondly, the data needs to be kept confidential.

Medical institutions anonymize the data so that it is not possible to tell whose data it is. But that doesn’t mean the data can be uploaded to a public cloud and processed there. Even anonymized data requires proper management, and for that you need a tool that can run on your own in-house server.

The third challenge is the complexity of installing an annotation tool onto a PC.

Annotating a lesion requires specialized knowledge and needs to be performed by a doctor. It is not easy to have a new tool installed on a PC used in a medical institution. With that in mind, an annotation tool that works in a browser was required.

CVAT can solve the second and third challenges. By deploying CVAT on your own servers and then accessing that server from a PC elsewhere, you can handle the data without having to expose the data or add new tools.

As for the first challenge, the publicly available source code does not fully address it, but since CVAT is OSS and written in Django (a Python web framework), I thought that it could be solved with a little bit of tweaking. Thus, CVAT was adopted as the annotation tool.


3. How to perform annotation for DICOM

Basics of CVAT

– Docker / docker-compose

– nginx

– Django, React, TypeScript

– Operates on Ubuntu, Windows 10, macOS Mojave

– Annotation output in CVAT, PASCAL VOC, YOLO, MS COCO and other formats

Additions and Modifications

In addition to allowing DICOM to be annotated, I made some modifications to increase effectiveness when using it.

– Modified to accommodate DICOM

– Modified to be able to read in a number of DICOM files at once

– Modified to separate jobs according to DICOM Series

– Modified to set the default annotation label

To accommodate DICOM

First, because the server side of CVAT is written in Python, pydicom had to be added to the requirements:


# cvat/requirements/base.txt

+ pydicom


CVAT can already annotate PDF files by converting them to JPEG, and based on that mechanism we added a class that converts DICOM to JPEG.


# cvat/apps/engine/

import os

import numpy as np
from PIL import Image

class DicomListExtractor(MediaExtractor):
    def __init__(self, source_path, dest_path, image_quality, step=1, start=0, stop=0):
        if not source_path:
            raise Exception('No Dicom found')

        import pydicom

        self._dimensions = []
        series = dict()
        self._jpeg_source_paths = []

        # self._source_path is assumed to be set by MediaExtractor.__init__()
        # (that call is not shown in this excerpt)
        for i, source in enumerate(self._source_path):
            dcm = pydicom.read_file(source)

            # Group frames by SeriesTime; the first frame of a series opens it
            series_time = dcm.get("SeriesTime", "")
            if series_time not in series:
                series[series_time] = Series(i, dcm.get("SeriesDescription", ""))
            # Extend the series up to the current frame
            series[series_time].stop_frame = i

            # Convert the pixel data to an 8-bit image and save it as JPEG
            img = _normalize_image(dcm.pixel_array)
            pilImg = Image.fromarray(img)
            jpeg_source_path = os.path.splitext(source)[0] + '.jpg'
            pilImg.save(jpeg_source_path, 'JPEG')

        # Get the list of Series sorted in ascending order of SeriesTime
        self._series = [v for _, v in sorted(series.items())]


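def _normalize_image(img, min_percent=0, max_percent=99, gamma=1.2):
    # Stretch intensities between the two percentiles, apply gamma, output 8-bit
    img = img.astype(np.float64)
    vmin = np.percentile(img, min_percent)
    vmax = np.percentile(img, max_percent)
    if vmax <= vmin:
        # Flat image: avoid dividing by zero
        return np.zeros(img.shape, dtype=np.uint8)
    img = (img - vmin) / (vmax - vmin)
    img[img < 0] = 0
    img = np.power(img, gamma) * 255
    img = np.clip(img, 0, 255)
    return img.astype(np.uint8)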


I read the DICOM file with pydicom.read_file() (called pydicom.dcmread() in newer pydicom releases) and normalized the pixel values to 8 bits with _normalize_image().

In this way, DICOM was converted to JPEG so that it can be treated the same as other existing formats when annotated.
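The arithmetic behind the normalization can be followed on a handful of numbers. The sketch below restates it without NumPy, on a flat list of pixel values, purely as an illustration: the names percentile and normalize_pixels are hypothetical, the percentile uses linear interpolation (the default behaviour of np.percentile), and the early return for a flat image is an added safeguard rather than something described in the article.

```python
import math


def percentile(values, percent):
    """Linearly interpolated percentile of a non-empty list."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * percent / 100
    lower = math.floor(rank)
    upper = math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def normalize_pixels(pixels, min_percent=0, max_percent=99, gamma=1.2):
    """Map raw intensities to 0..255 with a percentile window and gamma."""
    vmin = percentile(pixels, min_percent)
    vmax = percentile(pixels, max_percent)
    if vmax <= vmin:
        return [0 for _ in pixels]  # flat image: nothing to stretch
    result = []
    for value in pixels:
        scaled = (value - vmin) / (vmax - vmin)
        scaled = min(max(scaled, 0.0), 1.0)  # clip to the window
        result.append(int(scaled ** gamma * 255))
    return result


# 100 ordinary values plus one extreme outlier (e.g. a metal artifact)
raw = list(range(100)) + [10000]
eight_bit = normalize_pixels(raw)
```

With the 99th percentile as the upper bound, the single outlier is clipped to 255 instead of compressing every other pixel toward black, which is the reason for windowing by percentile rather than by the raw minimum and maximum.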

Enabling multiple DICOMs to be read at once

Without any further modifications, DICOM files must be uploaded folder by folder, which becomes a big hassle as the number of cases increases. CVAT already had the ability to upload zip files, so I extended that mechanism to accept zip files that contain DICOM.

The class that reads zip files in cvat/apps/engine/ is called ArchiveExtractor. It inherits from DirectoryExtractor, which in turn inherits from ImageListExtractor, the class that processes images such as JPEG.

I decided to revise this inheritance relationship: the source_paths read by DirectoryExtractor are divided into img_paths (paths of images) and dicom_paths (paths of DICOM files). Processing them with ImageListExtractor and DicomListExtractor respectively makes CVAT compatible not only with images but also with zip files that contain DICOM.
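The split itself can be sketched as follows. This is an illustration only: the function name split_source_paths and the two extension sets are assumptions, and the article does not say how the modified DirectoryExtractor actually recognizes DICOM files.

```python
import os

# Assumed extension lists, chosen for this example only
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}
DICOM_EXTENSIONS = {".dcm"}


def split_source_paths(source_paths):
    """Divide paths into (img_paths, dicom_paths) by file extension."""
    img_paths = []
    dicom_paths = []
    for path in sorted(source_paths):
        extension = os.path.splitext(path)[1].lower()
        if extension in DICOM_EXTENSIONS:
            dicom_paths.append(path)
        elif extension in IMAGE_EXTENSIONS:
            img_paths.append(path)
        # anything else (notes, reports, ...) is ignored
    return img_paths, dicom_paths


img_paths, dicom_paths = split_source_paths(
    ["case1/002.dcm", "overview.JPG", "case1/001.DCM", "readme.txt"]
)
```

Each list can then be handed to the matching extractor. Note that DICOM files exported from scanners sometimes have no extension at all, in which case a check on the file contents would be needed instead.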

Dividing the DICOM files into jobs that reflect the Series

DICOM files carry tags that identify the set of images acquired together in one CT or MRI scan; such a set is called a Series. CVAT, for its part, has a unit of annotation work called a job. Linking each Series to a job makes it easier to organize the data and to see what kind of data is being annotated. The linking is done automatically when the data is loaded, so that as soon as a task is created, it is already divided into jobs.


# cvat/apps/engine/

class DicomListExtractor(MediaExtractor):
    def __init__(self, source_path, dest_path, image_quality, step=1, start=0, stop=0):
        # (parts of __init__ unrelated to Series are omitted)
        series = dict()

        for i, source in enumerate(self._source_path):
            dcm = pydicom.read_file(source)

            series_time = dcm.get("SeriesTime", "")
            if series_time not in series:
                # First frame of a new series: start_frame = stop_frame = i
                series[series_time] = Series(i, dcm.get("SeriesDescription", ""))
            # Every frame extends its series up to the current index
            series[series_time].stop_frame = i

        # Series sorted in ascending order of SeriesTime
        self._series = [v for k, v in sorted(series.items())]

class Series:
    def __init__(self, start_frame, description):
        self.start_frame = start_frame
        self.stop_frame = start_frame
        self.description = description


Jobs in CVAT are represented by segment_frames, which contain a start_frame and a stop_frame, and a new class, Series, was added to manage these values for each DICOM Series.

In detail, the get() method of the pydicom dataset is used to obtain the SeriesTime, and frames that share the same SeriesTime are treated as one Series, which determines its start_frame and stop_frame.
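The grouping logic can be tried without any DICOM files by feeding it the tag values directly. The following is a minimal sketch under that assumption: group_frames_by_series is a hypothetical name, the input is a list of (SeriesTime, SeriesDescription) pairs in frame order, and Series is rewritten as a dataclass for brevity.

```python
from dataclasses import dataclass


@dataclass
class Series:
    start_frame: int
    stop_frame: int
    description: str


def group_frames_by_series(frames):
    """Turn per-frame (series_time, description) pairs into frame ranges."""
    series = {}
    for i, (series_time, description) in enumerate(frames):
        if series_time not in series:
            # First frame of this series
            series[series_time] = Series(i, i, description)
        # Every frame pushes the end of its series forward
        series[series_time].stop_frame = i
    # Ascending order of SeriesTime, as in the article's code
    return [s for _, s in sorted(series.items(), key=lambda item: item[0])]


frames = [
    ("101500", "T1"), ("101500", "T1"), ("101500", "T1"),
    ("102000", "T2"), ("102000", "T2"),
]
segments = group_frames_by_series(frames)
```

Here the five frames yield two ranges, 0 to 2 and 3 to 4, each of which would become one job. The approach assumes that the frames of a Series are contiguous in the input order; if files from different Series were interleaved, the ranges would overlap.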

Setting the default annotation label

The data handled in this project uses a fixed set of annotation labels, and because there are quite a few of them, it is a pain to enter them by hand every time a task is created. So I made the default labels pre-filled whenever an annotation task is created.



// cvat-ui/src/components/create-task-page/create-task-content.tsx

import { DEFAULT_LABELS } from './default-label';

const defaultState = {
    basic: {
        name: '',
    },
    advanced: {
        zOrder: false,
        lfs: false,
    },
    labels: DEFAULT_LABELS,
    files: {
        local: [],
        share: [],
        remote: [],
    },
};

// cvat-ui/src/components/create-task-page/default-label.tsx

export const DEFAULT_LABELS = [
    {
        "name": "cHCC",
        "attributes": [],
    },
    {
        "name": "nHCC",
        "attributes": [],
    },
];




For this modification, I changed the TypeScript code in the cvat-ui folder, which defines the UI of the whole of CVAT. The variable defaultState holds the initial values of the form that is filled in when creating a task, including its labels. I therefore defined DEFAULT_LABELS in a separate file and used it as the initial value of labels.


4. In conclusion

By modifying the code as described above, CVAT can be made to handle DICOM and actually be used in real projects. This article focused on the preparation phase of a project rather than on AI development itself, but I would be happy if it shows that at HACARUS we choose our annotation tools for good reasons and modify them where needed to keep them user friendly.

In addition, as a developer, I would like you to know that not only using open tools but also understanding and modifying OSS can be very informative. Personally, it was a great opportunity to read and learn from code that was different from what I was used to. If you understand such tools, you can apply them to other projects and truly enhance the user experience.

This was a post about the little touches that can be applied to the creation of training data, which we don’t usually think about.