

At the beginning of the AI4EU program, the healthcare pilot aimed to provide brain imaging as a reference dataset. The collection phase was planned for March to June 2020. Because of the COVID-19 pandemic, we proposed instead to collect a dataset of chest Computed Tomography (CT) scans from many hospitals. As explained below, interpreting this medical exam is very useful for confirming COVID-19, and AI is key to facilitating the work of radiologists.


Business Category

Today, Europe is a continent badly affected by the COVID-19 pandemic. The containment strategy currently in place in most European countries aims to slow the spread of the virus and reverse the growth of the epidemic, thereby lowering the number of cases. However, the ability to screen the population at scale remains critical for moving from this global containment to more targeted measures. Today, the RT-PCR test returns results in anywhere from a few hours to two days[1]. Moreover, it cannot be deployed everywhere because of limited availability. Other complementary solutions need to be deployed, particularly in hospitals.

A recent study of a large dataset[2] has demonstrated that clinical signs of COVID-19 are visible on chest Computed Tomography (CT) scans and that expert radiologists can detect them. A recent Chinese-American study[3] has shown that COVID-19 infection can be differentiated from other viral pneumonias.

All the published articles[4],[5],[6] confirm that signs are visible at an early stage. They also confirm that CT has excellent sensitivity, superior to that of the RT-PCR test. For instance, a Chinese study[7] based on more than 1,000 patients reports a better sensitivity for chest CT (98%) than for RT-PCR (71%):

“About 81% of the patients with negative RT-PCR results but positive chest CT scans were re-classified as highly likely or probable cases with COVID-19, by the comprehensive analysis of clinical symptoms, typical CT manifestations and dynamic CT follow-ups”.

To conclude, the use of chest CT offers several advantages:

  • Scans are routinely used for pulmonary diseases.
  • The protocol is well known to medical staff in European countries.
  • The equipment is heavily used and widely available in hospitals and in the medical environment as a whole.

In this context, Artificial Intelligence (AI) can help test the suspected population in less time than the RT-PCR test, and with more consistent results. A recent Chinese study[8] of 3,332 patients reports an encouraging level of performance obtained with Deep Learning techniques. AI can also support other tasks, such as categorizing the type of pulmonary disease (COVID-19, SARS or Community Acquired Pneumonia) and grading the severity of the disease.

Some AI algorithms are already on the market (for example, from the Chinese company Infervision[9]). They offer encouraging performance, but raise significant issues such as the explainability and verifiability of their results.

Algorithms should also be trained on large, real datasets drawn from research and medical programs, to ensure that the results are robust and scalable.

Europe must help to set up a robust approach based on European values, allowing the creation of algorithms that respect the five interconnected dimensions of human-centred AI considered in AI4EU[10].

This pilot aims to provide a reference dataset in healthcare that can be used to develop algorithms for detecting COVID-19. The dataset contains medical images, diagnostic information and radiological reports.

The first phase of this pilot consists of the definition, collection and formatting of a publishable corpus. The data collected will come from the field of conventional radiology. Each record will comprise the acquired image, the age and sex of the patient, the clinical indications, the purpose of the examination and the radiological report.

The corpus will consist of at least 100,000 complete records from at least 5 different producing facilities. The collected data will be anonymized.
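The completeness and coverage targets above can be expressed as a simple check. The following is a minimal sketch, not the pilot's actual tooling: the `RadiologyRecord` class, its field names and the `corpus_meets_targets` helper are hypothetical names introduced for illustration, and only the two thresholds (100,000 records, 5 facilities) come from the text.

```python
from dataclasses import dataclass, fields
from typing import Iterable, Optional


@dataclass
class RadiologyRecord:
    # Hypothetical record layout mirroring the fields listed in the text.
    facility_id: str
    image_path: Optional[str]
    age: Optional[int]
    sex: Optional[str]
    clinical_indication: Optional[str]
    exam_purpose: Optional[str]
    report: Optional[str]


def is_complete(record: RadiologyRecord) -> bool:
    """A record is complete when no field is missing or blank."""
    for f in fields(record):
        value = getattr(record, f.name)
        if value is None:
            return False
        if isinstance(value, str) and not value.strip():
            return False
    return True


def corpus_meets_targets(
    records: Iterable[RadiologyRecord],
    min_records: int = 100_000,
    min_facilities: int = 5,
) -> bool:
    """Check the two stated targets: enough complete records, enough facilities."""
    complete = [r for r in records if is_complete(r)]
    facilities = {r.facility_id for r in complete}
    return len(complete) >= min_records and len(facilities) >= min_facilities
```

Only complete records count towards both targets here; whether incomplete records should still count towards facility coverage is a design choice this sketch does not settle.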

Figure 1


The second phase focuses on the extraction and organization of concepts. We will rely on the SNOMED-CT ontology, the reference for medical terms. Extracting the concepts and organizing them according to SNOMED-CT will make it possible to compare the concepts linked by SNOMED-CT references, consolidated with good-practice guides formalized in the same way. This may result in a concept search engine able to find similar reports. We will define KPIs to evaluate the normalization of each report based on the concepts it involves.
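One simple way to picture a concept-based search for similar reports is to represent each report as the set of concept codes extracted from it and to rank reports by set overlap. The sketch below assumes this representation and uses Jaccard similarity as a stand-in; the pilot does not state which similarity measure it uses. The function names and the placeholder codes (`"C1"`, `"C2"`, ...) are hypothetical and are not real SNOMED-CT identifiers.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two concept sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(
    query: Set[str],
    corpus: Dict[str, Set[str]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (report_id, score) pairs, best match first."""
    scored = [(rid, jaccard(query, concepts)) for rid, concepts in corpus.items()]
    # Sort by descending score, then by report id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_k]


# Placeholder concept codes, for illustration only.
corpus = {
    "report_1": {"C1", "C2", "C3"},
    "report_2": {"C1"},
    "report_3": {"C4"},
}
ranking = most_similar({"C1", "C2"}, corpus, top_k=2)
```

A set-overlap measure ignores the hierarchy of the ontology; a real system would likely also exploit the relations between concepts.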

Figure 2

Cooperation of the partners

The pilot has an ongoing collaboration with ORA to make it possible to share this large dataset. Its unusual size (more than 4 terabytes) requires dedicated resources.

Another collaboration is taking place with BSC to define a common way to exploit their resources for learning from this large dataset.

The main collaboration, however, is with UPM, which contributes to the analysis of the text reports and to concept extraction. They have built a prototype, and the next phase will consist of integrating the text reports from the collected dataset.

Exploitation of the AI4EU Platform

We distinguish three modes of exploitation that were highly relevant for the AI4Healthcare pilot:

  1. Usage of the Community services

The online Q&A sessions on the exploitation of the COVID-19 dataset gave us access to AI expertise that was useful for understanding how to process the data. By opening a discussion group for the COVID-19 dataset, we also noted very strong interest from the AI4EU community.

  2. Publication of resources in the AI4EU Catalogue

Two components were published in the AI4EU Resource Catalogue: the COVID-19 dataset and a Docker container for the text report analysis.

Major achievements

At this stage of the pilot, three major steps have been carried out:

  1. Defining the legal context of the data collection. Since the dataset contains sensitive medical data, a clear anonymization procedure needed to be defined. This was complicated by the nature of the data: medical images contain metadata. Verification tools to double-check the anonymization are currently being finalized.
  2. Regarding the collection itself, we collected 400,000 CT scan slices in DICOM format. The quality of these images is better than that of the PNG images provided by the reference Chinese dataset.
  3. A prototype of concept extraction has been built. Its system design is based on the Ecgen-radiology and MIMIC-CXR datasets. It provides a case-based recommender, in which new recommendations are based on previous similar cases. Among other things, the system is capable of:
  • Selecting similar cases based on radiology images and reports (Deep Learning Computer Vision and NLP methods are employed).
  • Rearranging the report into sections automatically (a Deep Learning NLP text classification system is used for this).
  • Suggesting medical concepts missing from the report, based on the similar cases retrieved (a Deep Learning NLP NER system powers this functionality).
  • Validating reports automatically based on previously rejected cases (a Deep Learning classifier is constantly retrained on manually validated and rejected cases).
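
The anonymization double-check mentioned in step 1 can be pictured as a scan of each image's metadata for fields that should have been removed or blanked. The sketch below is only an illustration under stated assumptions: metadata is modelled as a plain dictionary keyed by standard DICOM attribute keywords, the list of identifying keywords is a small illustrative subset rather than a complete de-identification profile, and the function names are hypothetical. It does not describe the pilot's actual verification tools. Patient age and sex are deliberately allowed, since the corpus keeps them.

```python
from typing import Dict, List

# Illustrative subset of DICOM attribute keywords that can identify a
# patient or a site. This is an assumption, not a complete profile.
IDENTIFYING_KEYWORDS = frozenset({
    "PatientName",
    "PatientID",
    "PatientBirthDate",
    "PatientAddress",
    "InstitutionName",
    "ReferringPhysicianName",
    "AccessionNumber",
})


def find_identifying_fields(metadata: Dict[str, str]) -> List[str]:
    """Return the identifying keywords that still carry a non-empty value."""
    leaks = [
        keyword
        for keyword, value in metadata.items()
        if keyword in IDENTIFYING_KEYWORDS and str(value).strip()
    ]
    return sorted(leaks)


def is_anonymized(metadata: Dict[str, str]) -> bool:
    """True when no identifying keyword carries a value."""
    return not find_identifying_fields(metadata)
```

A check like this only covers metadata; identifying text burned into the pixel data would need a separate verification step.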

    The concrete assets related to the AI4Healthcare pilot are available in the AI Catalog of the platform under the following references:

[1] « Curetis Group Company Ares Genetics and BGI Group Collaborate to Offer Next-Generation Sequencing and PCR-based Coronavirus (2019-nCoV) Testing in Europe », GlobeNewswire News Room,

[2] Ai T, Yang Z, Hou H et al. Correlation of Chest CT and RT-PCR Testing in Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases. Radiology. 2020;200642

[3] Bai HX, Hsieh B, Xiong Z et al. Performance of radiologists in differentiating COVID-19 from viral pneumonia on chest CT. Radiology. 2020 Mar 10:200823

[4] Wu J, Wu X, Zeng W, Guo D, Fang Z, Chen L, Huang H, Li C. Chest CT Findings in Patients with Corona Virus Disease 2019 and its Relationship with Clinical Features. Invest Radiol. 2020 Feb 21. doi: 10.1097/RLI.0000000000000670

[5] Zhao W, Zhong Z, Xie X, Yu Q, Liu J. Relation Between Chest CT Findings and Clinical Conditions of Coronavirus Disease (COVID-19) Pneumonia: A Multicenter Study. AJR Am J Roentgenol. 2020 Mar 3:1-6. doi: 10.2214/AJR.20.22976

[6] Song F, Shi N, Shan F, Zhang Z, Shen J, Lu H, Ling Y, Jiang Y, Shi Y. Emerging Coronavirus 2019-nCoV Pneumonia. Radiology. 2020 Feb 6:200274

[7] Tao Ai, Zhenlu Yang, Hongyan Hou, Chenao Zhan, Chong Chen, Wenzhi Lv, Qian Tao, Ziyong Sun, Liming Xia. Correlation of Chest CT and RT-PCR Testing in Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases. Radiology, 2020; 200642 DOI: 10.1148/radiol.2020200642

[8] Lin Li, Lixin Qin, Zeguo Xu et al. Artificial Intelligence Distinguishes COVID-19 from Community Acquired Pneumonia on Chest CT. Radiology, published online 2020 Mar 19,

[9] Infervision, HTTPS://

[10] Alessandro Saotti, Salvo Rinzivillo, Florian Zimmermann, Alberto Sanfeliu, Michele Lombardi, João Costeira, and Luciano Serani, AI4EU Deliverable D7.1, Outcomes from the Strategic Orientation Workshop. Milton-Keynes, United Kingdom, 2019 Sept 19-20
