Approved
Imaging SOP
Purpose
Guidance for Data Acquisition Sites on imaging data processing, harmonization, upload, minimum metadata, and populating OMOP tables. The document is presented in two sections:
- The first section covers the technical procedures for preparing image files for submission as part of a site's upload dataset. It includes instructions for data de-identification, removal of patient health identifiers, and extraction of a minimal required set of metadata describing the image files and clinical study.
- The second section covers procedures for linking image files with their Observational Medical Outcomes Partnership (OMOP) clinical data. Image file linkage is also discussed in the Multimodal Linkage SOP. This section gives an example of populating the interim image_registry table, which holds an entry for each DICOM file included in the upload.
For your information, this section also gives examples of populating the other image workgroup extension tables presented by Park, Nagy et al [see ref]. These extension tables will be created and populated for you centrally. The information is also intended for those who wish to use the imaging data and extension tables once they have been fully integrated into the CHoRUS Bridge2AI central MERGE data repository.
The aim of this SOP is to describe the end-to-end process of transforming image modality data into an OMOP standards-compliant dataset for CHoRUS upload.
Section 1: Procedures
Data Preparation
-
Identify data extraction partners and methods (e.g., CTSI)
-
Identify the sources of the data (e.g., PACS, site research repository)
-
Determine which image modalities would need to be extracted, date range, and size
-
Confirm the data format you will receive (e.g., DICOM). DICOM is the target format for CHORUS.
-
Confirm the type of IDs used to link this data to the patient (e.g., name) or other hospital ID (e.g., MRN), procedure codes, and time shifting. Refer to the Standards SOP on data linkage and date shifting. Linkage IDs: a) Accession number = OMOP: image_occurrence_id; b) Procedure number = OMOP: procedure_occurrence_id.
-
Ensure accession numbers for specific image tests are part of the encounters of interest. The preferred method is using EHR search to filter tests based on encounter (e.g., CDW) and not pulling directly from PACS based on the patient identifiers. This is to simplify image test mapping to the primary encounter of interest for the multimodal data pull.
-
Map header metadata relevant to your local site to the header metadata selected by CHORUS (list of metadata fields below). Refer to points 12 and 13 (the two DICOM deidentification steps below) for lookup table specifics for the deidentification tool. Including the RSNA LOINC codes that specify the type of study would be useful. Imaging studies should have a corresponding entry in the PROCEDURE_OCCURRENCE OMOP table, so they can be linked to the rest of the data using the procedure_occurrence_id field. Refer to the Multimodal Linkage SOP for specifics about linking images to the other data. For date shifting, refer to the Date Shifting SOP.
Study Level: AccessionNumber, StudyDescription, StudyInstanceUID, Modality, StudyDate, Manufacturer, ManufacturersModelName, StudyTime, MagneticFieldStrength, BodyPartExamined, Radiopharmaceutical, ContrastBolusAgent, ContrastBolusRoute
Series Level: SeriesDescription, SeriesInstanceUID, SliceThickness, ViewPosition, ImageLaterality, ImagesinAcquisition, TransducerType, TransducerFrequency, SeriesNumber
Image Level: N/A
-
Identify metadata not covered by CHORUS that would be relevant to your site locally.
-
Pursue local data quality review (e.g., data missingness).
-
Identifiable crosswalk table creation (stays local): map every original DICOM metadata field to the deidentification procedure applied to it (e.g., replacement, erasure) and to the new field in the deidentified DICOM. The crosswalk table stays local at the site and is not shared. Refer to points 12 and 13 below, depending on the DICOM deidentification status at your site.
-
Deidentified DICOM metadata tables (for sharing): these enable querying the imaging data without loading the DICOM files.
-
DICOM with metadata already deidentified at your local site (skip to #13 if your data is not already deidentified): DICOM files can be shared if the site already has an approved solution that may or may not require pixel deidentification, but you should still use the CHORUS-specific CTP version to process your data consistently.
- https://github.com/chorus-ai/CTP-deid/tree/main
- You will need to prepare lookup tables: "image_map.csv" and "personal_map.csv" (examples in CHoRUS_metadata_deid_instruction/pydicom/loopup_table/). PatientID and AccessionNumber in the DICOM metadata are replaced by person_id and image_occurrence_id from the OMOP table. Selected data tags (see repository Table 1) are shifted by a predefined number of days (specific to each PatientID). Ensure this shift is consistent across EHR and waveform data.
- You will first need to run a pydicom script for wrangling these lookup tables and then run the CTP tool
- For CTP set up instructions, contact Xiang Li, PhD: XLI60@mgh.harvard.edu
- DICOM with metadata that needs deidentification: You should use the CHORUS-specific CTP tool.
- https://github.com/chorus-ai/CTP-deid/tree/main
- You will need to prepare lookup tables: "image_map.csv" and "personal_map.csv" (examples in CHoRUS_metadata_deid_instruction/pydicom/loopup_table/). PatientID and AccessionNumber in the DICOM metadata are replaced by person_id and image_occurrence_id from the OMOP table. Selected data tags (see repository Table 1) are shifted by a predefined number of days (specific to each PatientID). Ensure this shift is consistent across EHR and waveform data.
- You will first need to run a pydicom script for wrangling these lookup tables and then run the CTP tool
- For CTP set up instructions, contact Xiang Li, PhD: XLI60@mgh.harvard.edu
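The requirement that one per-person shift be applied everywhere can be illustrated with a minimal Python sketch. It is not the CTP implementation: the `SHIFT_DAYS` mapping, its person IDs and offsets, and both function names are hypothetical, and the sketch assumes dates arrive either as DICOM DA strings (YYYYMMDD) or as OMOP-style timestamps.

```python
from datetime import datetime, timedelta

# Hypothetical per-person offsets in days. Real values come from the site's
# own lookup tables and must match the shift used for EHR and waveform data.
SHIFT_DAYS = {"10001001": -17, "10001002": 42}


def shift_dicom_date(value: str, person_id: str) -> str:
    """Shift a DICOM DA value (YYYYMMDD) by the person's fixed offset."""
    original = datetime.strptime(value, "%Y%m%d")
    shifted = original + timedelta(days=SHIFT_DAYS[person_id])
    return shifted.strftime("%Y%m%d")


def shift_omop_datetime(value: str, person_id: str) -> str:
    """Shift an OMOP-style timestamp (yyyy-MM-dd HH:mm:ss) by the same offset."""
    original = datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    shifted = original + timedelta(days=SHIFT_DAYS[person_id])
    return shifted.strftime("%Y-%m-%d %H:%M:%S")
```

Because both helpers read the same offset, an image date and an EHR timestamp for the same person remain the same number of days apart after shifting.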
-
Test deidentified DICOM readability: load a file in Horos/OsiriX to confirm that the DICOM opens.
-
Folder and file structure and naming: ensure that folders and files follow the structure and naming described below.
The Images folder should contain all images for the patient, with images organized in the standard DICOM hierarchy with study/series folders.
Folder and file names are built from the following fields:
- Patient Identification: Typically includes a de-identified person ID.
- Modality: Refers to the type of equipment used for the scan, such as MR (Magnetic Resonance), CT (Computed Tomography), XR (X-ray), or US (Ultrasound); DICOM tag Modality (0008,0060).
- Accession Id: The Accession number, may be a de-identified version of DICOM tag AccessionNumber (0008,0050).
- Series Id: The series Id, may be a de-identified version of DICOM tag SeriesInstanceUID (0020,000E).
- Instance Number: Represents the specific image number within a series, InstanceNumber (0020,0013).
Example of a DICOM Image folder and filenames
root
└── 10001001
    └── Images
        └── CT
            └── 1234
                └── 5678
                    ├── 000.dcm
                    ├── 001.dcm
                    ├── 002.dcm
                    ...
This example represents a CT scan for patient 10001001 with Accession Number 1234 and Series Id 5678, with instance numbers 000, 001, and 002. Please make sure that these fields match the DICOM metadata tags.
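The naming convention can be expressed as a short Python sketch that builds the expected relative path and checks an existing path against metadata values. The function names and the metadata keys `person_id` and `SeriesId` are hypothetical, and the three-digit zero padding is an assumption read off the example above rather than a stated rule.

```python
from pathlib import PurePosixPath


def dicom_relative_path(person_id, modality, accession_id, series_id, instance_number):
    """Build <person>/Images/<modality>/<accession>/<series>/<instance>.dcm."""
    # Assumption: instance numbers are zero-padded to three digits.
    filename = f"{int(instance_number):03d}.dcm"
    return PurePosixPath(
        str(person_id), "Images", modality, str(accession_id), str(series_id), filename
    )


def path_matches_metadata(path, metadata):
    """Return True when folder and file names agree with the metadata values."""
    expected = dicom_relative_path(
        metadata["person_id"],
        metadata["Modality"],
        metadata["AccessionNumber"],
        metadata["SeriesId"],
        metadata["InstanceNumber"],
    )
    return PurePosixPath(path) == expected


# Values taken from the folder example above.
print(dicom_relative_path(10001001, "CT", 1234, 5678, 0))
```

A check like `path_matches_metadata` could be run over every file before upload to catch folders whose names have drifted from the deidentified tags.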
DICOM Tags Recommended to Keep
Image Identity & Structure
- Modality: 0008, 0060
- Manufacturer: 0008, 0070
- Instance Number: 0020, 0013
- Image Position (Patient): 0020, 0032
- Image Orientation (Patient): 0020, 0037
- Slice Location: 0020, 1041
Image Geometry & Spatial Resolution
- Rows: 0028, 0010
- Columns: 0028, 0011
- Pixel Spacing: 0028, 0030
- Slice Thickness: 0018, 0050
- Spacing Between Slices: 0018, 0088
- Photometric Interpretation: 0028, 0004
- Bits Allocated: 0028, 0100
- Bits Stored: 0028, 0101
- High Bit: 0028, 0102
- Pixel Representation: 0028, 0103
Pixel Data & Intensity Interpretation
- Window Center: 0028, 1050
- Window Width: 0028, 1051
- Rescale Intercept: 0028, 1052
- Rescale Slope: 0028, 1053
- Rescale Type: 0028, 1054
Modality-Specific Acquisition Parameters
X-ray
- KVP: 0018, 0060
- X-Ray Tube Current: 0018, 1151
- Exposure: 0018, 1152
- Exposure in µAs: 0018, 1153
- Exposure Time: 0018, 1150
- Focal Spot(s): 0018, 1190
- Filter Material: 0018, 7050
- Filter Thickness Minimum: 0018, 7052
- Filter Thickness Maximum: 0018, 7054
CT
- KVP: 0018, 0060
- X-Ray Tube Current: 0018, 1151
- Exposure: 0018, 1152
- Convolution Kernel: 0018, 1210
MRI
- Scanning Sequence: 0018, 0020
- Sequence Variant: 0018, 0021
- Scan Options: 0018, 0022
- Repetition Time: 0018, 0080
- Echo Time: 0018, 0081
- Magnetic Field Strength: 0018, 0087
- Flip Angle: 0018, 1314
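As an illustration of how the tag list above might be used, the Python sketch below parses the "group, element" notation into numeric tag pairs and selects only recommended tags from a header. It uses a plain dictionary as a stand-in for a parsed DICOM header, lists only a few of the tags, and is meant for building a metadata table rather than as a deidentification step. The names `parse_tag`, `KEEP_TAGS`, and `keep_recommended` are hypothetical.

```python
def parse_tag(text):
    """Convert a string such as '0008, 0060' into a (group, element) pair."""
    group, element = (int(part.strip(), 16) for part in text.split(","))
    return (group, element)


# A partial copy of the recommended list, in the notation used above.
RECOMMENDED = """
Modality: 0008, 0060
Image Position (Patient): 0020, 0032
Rows: 0028, 0010
Columns: 0028, 0011
Pixel Spacing: 0028, 0030
Rescale Slope: 0028, 1053
"""

KEEP_TAGS = {}
for line in RECOMMENDED.strip().splitlines():
    name, _, tag_text = line.rpartition(":")
    KEEP_TAGS[parse_tag(tag_text)] = name.strip()


def keep_recommended(header):
    """Return only the entries whose tag appears in KEEP_TAGS."""
    return {tag: value for tag, value in header.items() if tag in KEEP_TAGS}


# Stand-in header: keys are (group, element) pairs, values are tag values.
header = {(0x0008, 0x0060): "CT", (0x0010, 0x0010): "DOE^JANE", (0x0028, 0x0010): 512}
print(keep_recommended(header))
```

In this stand-in header the patient name entry is not on the list, so it is left out of the result.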
DICOM Pixel de-identification
- under development
DICOM defacing
- under development
Data Upload to Azure
- refer to Azure data upload SOP
Section 2: OMOP Image Registry Procedures
2.0 Image Data Preparation
It is recommended that sites first implement the OMOP image registry table as described in the Multimodal Linkage SOP. This can be done by creating a database table or a CSV file to be part of the upload process with the same image_registry table field names and order.
SQL: CREATE Table
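CREATE TABLE image_registry (
    image_registry_id integer NOT NULL AUTO_INCREMENT, -- sequential id (AUTO_INCREMENT is MySQL syntax)
    file_id varchar(1024) NOT NULL,                    -- name of the file
    proc_id integer NOT NULL,                          -- procedure identifier
    person_id integer NOT NULL,                        -- from person.person_id
    group_id integer NOT NULL,                         -- file grouping identifier
    visit_id integer NOT NULL,                         -- from visit_occurrence.visit_occurrence_id
    datetime datetime NOT NULL,                        -- shifted date/time stamp of the file
    src_file varchar(1024) NOT NULL,                   -- name and path of the source file
    trg_file varchar(1024) DEFAULT NULL,               -- populated centrally
    PRIMARY KEY (image_registry_id)
);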
CSV: Header labels: (please implement columns in the order listed below)
image_registry_id - sequential integer value
file_id varchar(1024) – name of the file, e.g., “wo2d4400fflse.wfdb”
proc_id integer – from procedure_occurrence.procedure_occurrence_id
person_id integer – from the person.person_id
group_id integer – file grouping identifier
visit_id integer – from visit_occurrence.visit_occurrence_id
datetime datetime – SHIFTED date/time stamp of the file consistent with other OMOP tables and dates. Use format yyyy-MM-dd HH:mm:ss LOCAL time zone.
src_file varchar(1024) – name and path of the source file
trg_file varchar(1024) – this will be populated centrally, leave blank or null
Note: If you choose to create a SQL table, once the image_registry table has been populated, it will need to be exported to a CSV file to be included in your image file dataset upload.
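For sites that build the CSV directly, the following Python sketch writes registry rows in the column order listed above, numbers them sequentially, formats the timestamp, and leaves trg_file blank. The function name `write_registry` and the sample row are hypothetical, and the datetime passed in is assumed to be shifted already.

```python
import csv
import io
from datetime import datetime

# Column order taken from the header list above.
COLUMNS = ["image_registry_id", "file_id", "proc_id", "person_id", "group_id",
           "visit_id", "datetime", "src_file", "trg_file"]


def write_registry(rows, handle):
    """Write registry rows as CSV, numbering them and formatting timestamps."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for number, row in enumerate(rows, start=1):
        record = dict(row)
        record["image_registry_id"] = number  # sequential integer value
        # yyyy-MM-dd HH:mm:ss, local time, already shifted by the caller
        record["datetime"] = row["datetime"].strftime("%Y-%m-%d %H:%M:%S")
        record["trg_file"] = ""  # populated centrally, so left blank
        writer.writerow(record)


# Hypothetical example row; the identifiers are illustrative only.
rows = [{
    "file_id": "000.dcm",
    "proc_id": 1111,
    "person_id": 10001001,
    "group_id": 1,
    "visit_id": 11,
    "datetime": datetime(2022, 8, 1, 13, 5, 0),
    "src_file": "10001001/Images/CT/1234/5678/000.dcm",
}]
buffer = io.StringIO()  # for a real file, use open(path, "w", newline="")
write_registry(rows, buffer)
print(buffer.getvalue())
```

Fixing the column list in one constant keeps the header and every row in the same order, which is the main requirement stated for the CSV.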
| file_id | proc_id | person_id | group_id | visit_id | datetime | src_file | trg_file |
|---|---|---|---|---|---|---|---|
| af99220 | 1111 | 1 | 1 | 11 | 8/18/2022 | ecg124.wfdb | null |
| ff99338 | 2222 | 1 | 2 | 22 | 9/22/2023 | ecg133.wfdb | null |
| eb33728 | 3333 | 3 | 1 | 33 | 2/11/2020 | ecg220.wfdb | null |
| 1a10020 | 4444 | 4 | 1 | 44 | 4/12/2019 | ecg544.wfdb | null |
| 9c99939 | 5555 | 4 | 2 | 55 | 4/18/2019 | ecg304.wfdb | null |
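Table 2.0 Example CSV image registry file. The image_registry_id column is not shown and the dates appear in short form; the uploaded CSV must include image_registry_id and use the yyyy-MM-dd HH:mm:ss format described above.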
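Before upload, a registry CSV can be checked against the requirements above. The Python sketch below is one possible check, not an official validator: `validate_registry` is a hypothetical name, and treating both an empty value and the text "null" as acceptable for trg_file is an assumption based on the "leave blank or null" instruction.

```python
import csv
import io
from datetime import datetime

EXPECTED_HEADER = ["image_registry_id", "file_id", "proc_id", "person_id",
                   "group_id", "visit_id", "datetime", "src_file", "trg_file"]


def validate_registry(handle):
    """Return a list of human-readable problems found in a registry CSV."""
    reader = csv.reader(handle)
    header = next(reader, [])
    if header != EXPECTED_HEADER:
        return [f"header mismatch: {header}"]
    problems = []
    stamp = header.index("datetime")
    target = header.index("trg_file")
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            problems.append(f"line {line_number}: expected {len(header)} fields, found {len(row)}")
            continue
        try:
            datetime.strptime(row[stamp], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            problems.append(f"line {line_number}: datetime {row[stamp]!r} is not yyyy-MM-dd HH:mm:ss")
        if row[target].strip().lower() not in ("", "null"):
            problems.append(f"line {line_number}: trg_file should be blank or null")
    return problems


# Hypothetical row that uses a short-form date.
sample = io.StringIO(
    ",".join(EXPECTED_HEADER) + "\n"
    "1,000.dcm,1111,10001001,1,11,8/18/2022,10001001/Images/CT/1234/5678/000.dcm,null\n"
)
print(validate_registry(sample))
```

The sample row is reported because its date is not in the required format; a file with no problems returns an empty list.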
Once the imaging data (the DICOM files and the image_registry CSV file) have been received centrally, the other imaging extension tables (Park, Nagy et al 2024) will be created and populated for your site and integrated into the MERGE database. Data users should review the Image Workgroup extension to understand what metadata, features, and linkages are available for their particular clinical use case.
Related Office Hours
The following office hour sessions provide additional context and demonstrations related to the SOP:
-
[05-14-2026] End-to-end imaging process
- Video Recording | Meeting Summary | Meeting Slides
- Review of the end-to-end process for preparing and uploading imaging data
-
[01-29-2026] Registry table refresher
- Video Recording | Meeting Summary | Meeting Slides
- Review waveform & imaging data submission requirements for 2nd upload.
Reference Materials
RSNA CTP Documentation: https://mircwiki.rsna.org/index.php?title=MIRC_CTP
Park WY, Jeon K, Schmidt TS, Kondylakis H, Alkasab T, Dewey BE, You SC, Nagy P. Development of Medical Imaging Data Standardization for Imaging-Based Observational Research: OMOP Common Data Model Extension. J Imaging Inform Med. 2024 Apr;37(2):899-908. doi: 10.1007/s10278-024-00982-6. Epub 2024 Feb 5. PMID: 38315345; PMCID: PMC11031512.