Approved

Standard Operating Protocol: OMOP Mapping Principles

Background

The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) is a standardized schema for observational health data, maintained by the OHDSI (Observational Health Data Sciences and Informatics) community. A crucial step in converting source electronic health record (EHR), claims, or registry data into the OMOP CDM is mapping the source system’s codes (diagnosis codes, procedure codes, lab test codes, medications, etc.) to OMOP’s Standardized Vocabularies. OMOP’s vocabularies include standard terminologies like SNOMED CT for conditions, RxNorm for drugs, LOINC for measurements, among others. All clinical event tables in OMOP (e.g., CONDITION_OCCURRENCE, DRUG_EXPOSURE, MEASUREMENT) allow only standard concepts – meaning source codes must be translated to standard concepts before insertion. This mapping ensures that disparate source data speak a common language, enabling researchers and analysts to use consistent definitions and tools across all mapped datasets.

Applicable Roles and Responsibilities

This SOP is intended for researchers, data analysts, data engineers, clinical informaticists, and ETL developers responsible for transforming source healthcare data into the OMOP CDM. It provides guidance on mapping principles across all domains, with examples illustrating one-to-one mappings, one-to-many mappings, and mappings that span different domains.

Purpose

The purpose of this Standard Operating Protocol (SOP) is to establish a clear and consistent process for mapping source data codes to OMOP standard concepts. Adhering to these mapping principles will ensure data quality, semantic consistency, and interoperability of the transformed data. This SOP covers the end-to-end mapping workflow – from initial code review and tool usage (SQL mapping, Athena, Usagi) through handling complex mappings – and provides a template checklist to verify that all standard conventions are met. By following this SOP, mapping teams will produce an OMOP-conformant dataset where each clinical fact is represented by the appropriate standard concept ID, allowing for reliable multi-database research and analytics.

Glossary

Term | Definition
Mapping (a map) | A defined link between a concept (term) in one code system or dataset and a concept in another code system (sometimes the same one) that has the same or very similar meaning. In OMOP, mapping is the process of transforming a source concept into a standard concept using Maps to and Maps to value relationships.
Concept ID | An integer key that uniquely identifies a concept in the OHDSI Vocabularies and is used as a reference key in all *_concept_id fields across the OMOP tables.
Source Concept | Concept that represents the original code from the source system (often non-standard) in its original state, before mapping to a standard concept.
Standard Concept | Concept designated as standard in OMOP and used to represent clinical facts in CDM tables in a standardized way (after mapping from source concepts).
Custom Concept | Concept created and maintained by users when no suitable external standard code exists; it can be designated as standard for local use. Custom concepts are assigned concept_ids above 2,000,000,000.
Classification Concept | Concept used for grouping and hierarchy (roll-up) purposes, not typically for recording patient-level data.
Concept Synonym | Alternative concept name (synonym, abbreviation, translation) linked to a concept_id to support various representations, flexible search, and matching.
Invalid Reason | Flag on a concept indicating why it is no longer valid (e.g., deprecated, or upgraded and replaced).
Source Vocabulary | Coding system in which the original data are recorded (e.g., ICD10CM, local lab codes).
Target Vocabulary | Destination vocabulary to which source codes are mapped (e.g., SNOMED for conditions, RxNorm for drugs).
Vocabulary ID | Short identifier of the vocabulary a concept belongs to (e.g., ICD10CM, SNOMED, RxNorm).
Domain ID | High-level category (Condition, Drug, Procedure, Measurement, Observation, Device, etc.) that defines which semantic space and set of rules a concept belongs to, and designates the CDM table that should store the concept.
Concept Class ID | Subtype defined by the external vocabulary authors (e.g., Clinical Finding, Ingredient, Brand Name) and used to organize concepts semantically or structurally.
Relationship ID | Short identifier of the relationship between two concepts (e.g., Is a, Subsumes, Maps to, Maps to value).
Maps to / Maps to value | Special relationships used to map source codes to standard concepts. Maps to links the source concept to a primary standard concept; Maps to value links it to an additional value/detail concept (e.g., result, historical condition, severity). Maps to value is used only for Measurement and Observation concepts, where the second concept populates the value_as_concept_id field.
One-to-one mapping | The preferred mapping scenario (also called an equivalent mapping), in which the target standard concept expresses exactly the same clinical meaning as the source term and fully preserves its intent.
One-to-many mapping | Mapping scenario where a single source code maps to multiple standard concepts (e.g., a combined diagnosis that maps to two separate SNOMED conditions, or a multicomponent drug that maps to several separate RxNorm/RxNorm Extension Ingredients).
Athena | OHDSI web interface for browsing and downloading the OHDSI Vocabularies and inspecting concepts, their attributes, synonyms, domains, and relationships.
Usagi | OHDSI desktop tool that suggests candidate standard concepts for source terms using string similarity; it is used to build mappings and export them into tables.

Prerequisites

  • Access to the OHDSI Athena browser.
  • OHDSI Standardized Vocabularies loaded into the vocabulary tables of an OMOP database instance.
  • Familiarity with the OMOP CDM, vocabulary structure, and domain-specific logic.
  • Installed and configured OHDSI Usagi with the current OHDSI Vocabularies snapshot.

Procedures

Step 1: Compile Source Codes and Metadata

Begin by gathering a comprehensive list of all source codes that require mapping. This typically involves extracting distinct codes from the source data (e.g., diagnosis codes, procedure codes, lab test identifiers, medication codes) along with their descriptions and any relevant metadata (such as code type, frequency of occurrence, the five most frequent values for each measurement, and expected units). Ensure you understand each code’s context (e.g., which field or domain it comes from in the source). For example, you might list all ICD-10 diagnosis codes from a hospital EHR or all local laboratory test codes from a lab system. This compiled list will serve as the input for the mapping process.
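The extraction described above can be sketched as a simple aggregation. This is a minimal illustration only: the table name diagnoses and its columns (patient_id, dx_code, dx_name) are hypothetical, and an in-memory SQLite database stands in for the real source system.

```python
import sqlite3

# Hypothetical source table: one row per recorded diagnosis (names are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diagnoses (patient_id INTEGER, dx_code TEXT, dx_name TEXT)")
conn.executemany(
    "INSERT INTO diagnoses VALUES (?, ?, ?)",
    [
        (1, "E11.9", "Type 2 diabetes mellitus without complications"),
        (2, "E11.9", "Type 2 diabetes mellitus without complications"),
        (2, "I63", "Cerebral infarction"),
        (3, "E11.9", "Type 2 diabetes mellitus without complications"),
    ],
)

# Distinct codes with their description and frequency, most frequent first.
source_codes = conn.execute(
    """
    SELECT dx_code, dx_name, COUNT(*) AS record_count
    FROM diagnoses
    GROUP BY dx_code, dx_name
    ORDER BY record_count DESC, dx_code
    """
).fetchall()

for row in source_codes:
    print(row)
```

Sorting by frequency lets the mapping team prioritize the codes that cover the largest share of records.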

Step 2: Determine Source Vocabulary and Domain(s)

For each source code, identify its vocabulary/system and intended OMOP domain(s). Common source vocabularies include ICD-10/ICD-9 (diagnoses), CPT/HCPCS (procedures), LOINC (labs/tests), RxNorm or NDC (drugs), etc. Knowing the source vocabulary helps decide the target standard vocabulary (since OMOP often has predefined mappings for many vocabularies).

Expected Domain | Standard Vocabulary Priority
Condition / Observation | SNOMED → OMOP Extension; for oncology → include ICDO3
Measurement | LOINC → SNOMED → OMOP Extension → CPT4/HCPCS → all others; for oncology → include Cancer Modifier; for genomics → include OMOP Genomic
Procedure | SNOMED → LOINC → OMOP Extension → ICD10PCS/ICD9Proc/CPT4/HCPCS → all others
Device | SNOMED → all others
Drug | RxNorm (US) / RxNorm Extension (non-US) / for vaccines → CVX
Unit | UCUM
Gender | Gender (OMOP)
Ethnicity | Ethnicity (OMOP)
Race | Race (OMOP)
Type Concept | Type Concept (OMOP)
Route | SNOMED → OMOP Extension
Regimen | HemOnc
Meas Value | LOINC → SNOMED → NAACCR (for oncology) → all others
Visit | NUCC/CMS Place of Service/Medicare Specialty/HES Specialty/UB04 Type of bill → Visit (OMOP)
Episode | Episode (OMOP)

Also, form an initial assumption about the likely domain of the most frequent source codes in each table, i.e., what kind of information they seem to represent clinically. Keep in mind that the final domain is determined by the target standard concepts.

  • Diagnoses (expected to map to a Condition/Observation domain concept).
  • Procedures (map to a Procedure/Measurement domain concept).
  • Medications (map to a Drug/Device domain concept).
  • Labs (map to a Measurement/Observation domain concept).
  • Devices (map to a Device domain concept).
  • Facts about a patient's social history or historical health records (map to an Observation domain concept).

Example: An ICD-10-CM diagnosis code I63 "Cerebral infarction" is a Condition domain source code and should map to a SNOMED concept for Cerebral infarction. A local lab code for "HbA1c" is a Measurement and should map to a LOINC concept for Hemoglobin A1c test ("Hemoglobin A1c/Hemoglobin.total in Blood"). If you are unsure of a code’s domain (some codes might be tricky, like certain ICD-10 Z-codes that represent social circumstances or history), consult clinicians, OHDSI domain guidelines or check in OHDSI Athena before mapping.

Tip: Identifying the expected domain guides you to the right target vocabulary. However, in OMOP some source vocabularies map to unexpected domains (e.g., certain ICD-10-CM codes map to Measurement or Observation, and some CPT4/HCPCS codes map to Drug or Device). Therefore, we do not recommend restricting your mapping with domain filters. Instead, let the mapped standard concept determine the target domain, and then use that domain to decide which CDM table should be populated.

Step 3: Search for Existing Standard Mappings (Automated Lookup)

  • Run SQL queries against the OHDSI Standardized Vocabularies to see if mappings already exist. The vocabularies often include many source codes as non-standard concepts with pre-defined "Maps to" relationships to standard concepts.

Note: In some cases, however, your source system may already use OMOP-preferred standard vocabularies (e.g., RxNorm for drugs, LOINC for labs), so the source code itself may already be a standard concept (standard_concept = 'S') and require no additional "Maps to" mapping - only verification of domain and correctness.

  • Using SQL (if you have the vocabulary tables in a database): You can join CONCEPT and CONCEPT_RELATIONSHIP to resolve "Maps to" and "Maps to value" relationships; optionally, include CONCEPT_SYNONYM if you need additional source text fields to support searching.
  • To map ICD-10-CM codes, find their concept_ids in the CONCEPT table (where vocabulary_id = 'ICD10CM'), then find the records in the CONCEPT_RELATIONSHIP table where concept_id_1 is one of those ICD-10-CM concept_ids and relationship_id IN ('Maps to', 'Maps to value'). In each matching row, concept_id_2 is the standard concept. Rows with relationship_id = 'Maps to' give the primary standard concept(s); rows with relationship_id = 'Maps to value' (if any exist) give additional value/detail concepts that typically populate value_as_concept_id. This automated join approach can quickly map known vocabularies.
-- Query example: resolve source ICD-10-CM codes to their standard concepts
SELECT a.dx_id,
       a.dx_id_dx_name,
       r.relationship_id,      -- 'Maps to' or 'Maps to value'
       c.concept_id,
       c.concept_code,
       c.concept_name,
       c.vocabulary_id,
       c.domain_id,
       c.concept_class_id
FROM diagnoses a
-- Source code looked up as an ICD10CM concept
JOIN concept b
  ON b.concept_code = a.dx_id
 AND b.vocabulary_id = 'ICD10CM'
-- Valid mapping relationships only
JOIN concept_relationship r
  ON r.concept_id_1 = b.concept_id
 AND r.relationship_id IN ('Maps to', 'Maps to value')
 AND r.invalid_reason IS NULL
-- Target must be a standard concept
JOIN concept c
  ON c.concept_id = r.concept_id_2
 AND c.standard_concept = 'S';

Case 1: ICD-10-CM E11.9 "Type 2 diabetes mellitus, without complications" has a "Maps to" relationship to the SNOMED concept "Type 2 diabetes mellitus". If a source code has multiple "Maps to" results (meaning one-to-many mapping), you will retrieve multiple target concept_ids. According to OMOP conventions, each source record should then produce multiple records – one per mapped concept – in the CDM. (Do not arbitrarily pick one concept and discard others; if the source code’s meaning truly encompasses multiple concepts, all should be represented to avoid loss of information.)

Case 2: ICD-10-CM Z86.16 "Personal history of COVID-19" has two relationships:
  • Maps to → History of event (concept_id 1340204, vocabulary OMOP Extension, domain Observation). This becomes the observation_concept_id.
  • Maps to value → COVID-19 (concept_id 37311061, vocabulary SNOMED, domain Condition). This becomes the value_as_concept_id.
In the CDM, a single source record with Z86.16 should therefore produce one Observation record with observation_concept_id = 1340204 and value_as_concept_id = 37311061.
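The Z86.16 case can be sketched as a pivot that folds the "Maps to" and "Maps to value" rows into a single Observation record. This is an illustration under stated assumptions: the tables are cut-down stand-ins for CONCEPT and CONCEPT_RELATIONSHIP, the source concept_id 900001 and the target concept codes are placeholders, and the pivot assumes the source code has only one "Maps to" target.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER, concept_name TEXT, domain_id TEXT,
        vocabulary_id TEXT, concept_code TEXT, standard_concept TEXT
    );
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER, concept_id_2 INTEGER,
        relationship_id TEXT, invalid_reason TEXT
    );
    """
)

SOURCE_CONCEPT_ID = 900001  # placeholder id for the ICD-10-CM source concept (assumption)

conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?)",
    [
        (SOURCE_CONCEPT_ID, "Personal history of COVID-19", "Observation", "ICD10CM", "Z86.16", None),
        (1340204, "History of event", "Observation", "OMOP Extension", "placeholder-1", "S"),
        (37311061, "COVID-19", "Condition", "SNOMED", "placeholder-2", "S"),
    ],
)
conn.executemany(
    "INSERT INTO concept_relationship VALUES (?, ?, ?, ?)",
    [
        (SOURCE_CONCEPT_ID, 1340204, "Maps to", None),
        (SOURCE_CONCEPT_ID, 37311061, "Maps to value", None),
    ],
)

# Pivot the two relationship rows into one row per source code.
observation_rows = conn.execute(
    """
    SELECT src.concept_code AS observation_source_value,
           MAX(CASE WHEN r.relationship_id = 'Maps to'
                    THEN tgt.concept_id END) AS observation_concept_id,
           MAX(CASE WHEN r.relationship_id = 'Maps to value'
                    THEN tgt.concept_id END) AS value_as_concept_id
    FROM concept src
    JOIN concept_relationship r
      ON r.concept_id_1 = src.concept_id
     AND r.relationship_id IN ('Maps to', 'Maps to value')
     AND r.invalid_reason IS NULL
    JOIN concept tgt
      ON tgt.concept_id = r.concept_id_2
     AND tgt.standard_concept = 'S'
    WHERE src.vocabulary_id = 'ICD10CM'
    GROUP BY src.concept_code
    """
).fetchall()

print(observation_rows)
```

A source code with several "Maps to" targets would need a different approach, since MAX would silently keep only one of them.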

Step 4: Utilize Mapping Tools and/or Athena Browser to Perform Manual Mapping

For source codes that are not already mapped via Step 3 (for example, local codes, less common vocabularies not in the standard tables, or cases where the automatic join found no result), use Usagi to accelerate the mapping, the Athena browser, or both:

  • Usagi is an OHDSI tool that suggests mappings based on term similarity. Before using it, you first need to build the Usagi index by loading the OMOP vocabularies you downloaded from Athena (CSV files) into Usagi. After the vocabularies are imported, load your list of unmapped source codes (with their descriptions). Usagi will then run a term-matching algorithm against the imported OMOP vocabularies to propose candidate standard concepts that lexically or semantically resemble each source term.
  • Review Usagi’s suggestions for each code. Set an appropriate similarity score threshold to filter out poor matches, and manually inspect high-scoring suggestions. At this stage, human judgment is critical – ensure the suggested concept truly matches the meaning of the source code (not just a partial text match). Usagi conveniently flags whether a concept is standard and shows its domain and vocabulary. With its standard-concept filter enabled, it proposes only standard concepts.
  • Pick the best fitting concept(s) for each source code. In many cases, it will be a one-to-one match (one source code to one standard concept). If the tool suggests multiple concepts to cover one source term’s meaning, carefully consider if a one-to-many mapping is needed (see Step 6 and examples below). Document the chosen mapping for each code within Usagi’s interface or export the results.
  • Using Athena: enter the unmapped source code or its description in Athena’s search. If the code/name is recognized, Athena will show a concept entry. Check the concept’s details to see if it is marked as standard or non-standard. If non-standard, look at its "Maps to" relationships ("Non-standard to Standard map") – Athena displays the target standard concept(s) it maps to. For instance, searching ICD-10-CM code I26.0 "Pulmonary embolism with acute cor pulmonale" in Athena reveals it maps to two SNOMED concepts: one for Pulmonary embolism and one for Acute cor pulmonale. This indicates a one-to-many mapping: the ICD code’s meaning is split into two SNOMED condition concepts, so the source record must produce two rows in the CONDITION_OCCURRENCE table.
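The idea of scoring candidates and applying a similarity threshold can be illustrated with a small sketch. It uses Python's difflib purely as a stand-in: this is not Usagi's matching algorithm, and the candidate list and threshold value are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def suggest_matches(source_term, candidate_names, threshold=0.5):
    """Return (candidate, score) pairs scoring at or above threshold, best first."""
    scored = []
    for name in candidate_names:
        # Case-insensitive character-level similarity in the range 0..1.
        score = SequenceMatcher(None, source_term.lower(), name.lower()).ratio()
        if score >= threshold:
            scored.append((name, round(score, 3)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidate concept names taken from examples in this SOP.
candidates = ["Hypovolemia", "Pulmonary embolism", "Acute cor pulmonale"]

# A spelling variant of a source term still ranks the intended concept first.
suggestions = suggest_matches("Hypovolaemia", candidates)
print(suggestions)
```

A high score only shows that the strings look alike; a reviewer still has to confirm that the clinical meaning matches.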

Step 5: Clarify and Curate Ambiguous Codes

  • Even after automated and semi-automated steps, some source codes may remain uncertain. These often require curation by a subject matter expert:
    • Search Athena manually using keywords from the source code’s description. Sometimes tweaking the search terms or using synonyms helps. For instance, a source diagnosis "Heart attack" might not pull up results, but searching "myocardial infarction" in Athena would find the SNOMED concept for myocardial infarction.
    • Consult clinicians or domain experts for codes that are clinically ambiguous or unclear. Understanding the exact meaning of a source code is crucial for finding the correct standard concept.
    • Ensure standard concept and correct domain: When manually selecting a mapping, confirm that the target concept has standard_concept = 'S' (standard) in the OMOP vocabulary. Avoid mapping to classification concepts (standard_concept = 'C'), which are higher-level groupings not used for recording individual patient data.
    • If no equivalent concept exists, follow the OHDSI convention of choosing a broader or closest standard concept rather than losing the data entirely. This is called an uphill mapping. For instance, a very specific local condition might be mapped to a more general SNOMED condition if a direct match is not available – make sure to note that a loss of granularity occurred (e.g., via SSSOM mapping metadata).
    • The opposite approach, mapping to a more specific (more granular) standard concept than the source term (downhill mapping), is generally discouraged, as it adds clinical detail that is not explicitly present in the source. If applied at all, it should be clearly documented as an assumption rather than a fact.
    • If a source code is truly unique (e.g., a custom lab assay with no standard counterpart), you may create a custom concept (with a concept_id above 2,000,000,000). Custom concepts can act as stand-ins for source codes. Use custom concepts sparingly and document them clearly (including creating entries in the concept and source_to_concept_map tables as needed).
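Registering a custom concept might look like the sketch below. It is a simplified illustration: the tables carry only a subset of the real CONCEPT and SOURCE_TO_CONCEPT_MAP columns, and the vocabulary name "LocalLab", the concept class "Custom", the helper function names, and the source codes are all hypothetical.

```python
import sqlite3

CUSTOM_ID_FLOOR = 2_000_000_000  # custom concept_ids must be above this value

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER PRIMARY KEY, concept_name TEXT, domain_id TEXT,
        vocabulary_id TEXT, concept_class_id TEXT, standard_concept TEXT,
        concept_code TEXT
    );
    CREATE TABLE source_to_concept_map (
        source_code TEXT, source_vocabulary_id TEXT, source_code_description TEXT,
        target_concept_id INTEGER, target_vocabulary_id TEXT
    );
    """
)


def next_custom_concept_id(conn):
    """Return the next free concept_id in the custom range."""
    (current_max,) = conn.execute(
        "SELECT MAX(concept_id) FROM concept WHERE concept_id > ?",
        (CUSTOM_ID_FLOOR,),
    ).fetchone()
    return (current_max or CUSTOM_ID_FLOOR) + 1


def add_custom_concept(conn, source_code, description, domain_id, vocabulary_id):
    """Insert a custom standard concept and a matching source_to_concept_map row."""
    concept_id = next_custom_concept_id(conn)
    conn.execute(
        "INSERT INTO concept VALUES (?, ?, ?, ?, ?, ?, ?)",
        (concept_id, description, domain_id, vocabulary_id, "Custom", "S", source_code),
    )
    conn.execute(
        "INSERT INTO source_to_concept_map VALUES (?, ?, ?, ?, ?)",
        (source_code, vocabulary_id, description, concept_id, vocabulary_id),
    )
    return concept_id


first_id = add_custom_concept(conn, "LAB-0042", "In-house assay X", "Measurement", "LocalLab")
second_id = add_custom_concept(conn, "LAB-0043", "In-house assay Y", "Measurement", "LocalLab")
print(first_id, second_id)
```

Allocating ids from the current maximum keeps the custom range contiguous, which makes the concepts easier to audit later.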

Step 6: Verify One-to-One vs One-to-Many Mappings

  • After determining the best mapping for each source code, prepare to implement these mappings in the ETL process. This typically involves creating a mapping reference table (or spreadsheet) that links each source code to its target concept(s):
    • One-to-One Mapping: Most source codes will map to a single standard concept. In these cases, each source record will generate a single record in the corresponding OMOP CDM table. Example: ICD-10 code E86.1 "Hypovolemia" maps one-to-one to the standard SNOMED concept for Hypovolemia. In the ETL, for every occurrence of E86.1 in the source, you would create one CONDITION_OCCURRENCE record with the SNOMED concept_id for Hypovolemia.
    • One-to-Many Mapping: Some source codes represent a combination of clinical ideas and thus map to multiple standard concepts. In these cases, a single source record must spawn multiple OMOP records (to capture all components).

Example 1: ICD-10 code I26.0 "Pulmonary embolism with acute cor pulmonale" requires two condition concepts: one for Pulmonary embolism and one for Acute cor pulmonale, since no single SNOMED concept covers both aspects. Thus, a single diagnosis entry of I26.0 in the source would result in two records in the CONDITION_OCCURRENCE table – one with concept_id for pulmonary embolism and another with concept_id for acute cor pulmonale (both tied to the same patient and date).

Example 2: Multi-ingredient drug product MORPH 2.5MG / DEXAMETH 2MG INH.SOL can be mapped to two ingredient concepts: morphine (concept_id 1110410) AND dexamethasone (concept_id 1518254). In this case, one source order/administration should result in two DRUG_EXPOSURE records – one for morphine and one for dexamethasone – again linked by the same person, datetime, and source identifier.

Example 3: The compounded drug product GABA 5% / Amitript 2% / Ketamine 5% can be mapped to three drug concepts, one Clinical Drug Component (the OMOP concept class for drugs with a known ingredient and drug strength) per ingredient: gabapentin 50 MG/ML (concept_id 797436) + amitriptyline 20 MG/ML (concept_id 589159) + ketamine 50 MG/ML (concept_id 19085322). A single prescription/dispense of this compounded drug should therefore produce three DRUG_EXPOSURE records, all sharing the same dates and source identifiers but with different drug_concept_ids for each ingredient.

  • When implementing, join to or lookup in the mapping table. If using a SQL-based ETL, joining source data to a mapping table will "duplicate" source rows for one-to-many mappings (because the mapping table will have multiple entries for that one source code). This is desired, as it yields multiple output records. Do not collapse or randomly choose one concept from a one-to-many map – all mapped concepts should be recorded. Each of the multiple records can carry the same source identifier and timestamp, differing only in the standard concept_id.
  • Preserve Source Values: Always fill the OMOP _source_value fields with the original source code or term, and the corresponding _source_concept_id field with the concept_id of the source code (if the source code exists in the OMOP vocabulary). This preserves lineage. In one-to-many scenarios, each of the multiple target records would have the same source_value and source_concept_id (representing the single originating entry).
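The row "duplication" produced by joining to a mapping table can be seen in a small sketch based on the morphine/dexamethasone example. The table names source_orders and mapping, their columns, and the person and date values are hypothetical; only the two target concept_ids come from this SOP.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_orders (
        order_id INTEGER, person_id INTEGER, order_date TEXT, drug_name TEXT
    );
    CREATE TABLE mapping (source_code TEXT, target_concept_id INTEGER);
    """
)
conn.execute(
    "INSERT INTO source_orders VALUES (1, 101, '2024-01-15', 'MORPH 2.5MG / DEXAMETH 2MG INH.SOL')"
)
conn.executemany(
    "INSERT INTO mapping VALUES (?, ?)",
    [
        ("MORPH 2.5MG / DEXAMETH 2MG INH.SOL", 1110410),  # morphine
        ("MORPH 2.5MG / DEXAMETH 2MG INH.SOL", 1518254),  # dexamethasone
    ],
)

# The join yields one output row per mapping entry, so one order becomes two rows.
drug_exposure_rows = conn.execute(
    """
    SELECT o.person_id,
           m.target_concept_id AS drug_concept_id,
           o.order_date        AS drug_exposure_start_date,
           o.drug_name         AS drug_source_value
    FROM source_orders o
    JOIN mapping m ON m.source_code = o.drug_name
    ORDER BY m.target_concept_id
    """
).fetchall()

for row in drug_exposure_rows:
    print(row)
```

Both output rows carry the same drug_source_value, so they can be traced back to the single originating order.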

Step 7: Route Mapped Concepts to the Correct CDM Tables

  • Ensure the mapped concepts are stored in the correct CDM table/field according to their domains. OMOP’s design is that the domain of the standard concept dictates where the record goes:
Standard Concept Domain | CDM Event Table(s) and Field(s) to Populate
Condition | CONDITION_OCCURRENCE.condition_concept_id, CONDITION_ERA.condition_concept_id
Drug | DRUG_EXPOSURE.drug_concept_id, DRUG_ERA.drug_concept_id, DOSE_ERA.drug_concept_id
Procedure | PROCEDURE_OCCURRENCE.procedure_concept_id
Measurement | MEASUREMENT.measurement_concept_id
Observation | OBSERVATION.observation_concept_id
Device | DEVICE_EXPOSURE.device_concept_id
Specimen | SPECIMEN.specimen_concept_id
Visit | VISIT_OCCURRENCE.visit_concept_id, VISIT_DETAIL.visit_detail_concept_id
Episode | EPISODE.episode_concept_id
Unit | MEASUREMENT.unit_concept_id, OBSERVATION.unit_concept_id, DEVICE_EXPOSURE.unit_concept_id
Meas Value | MEASUREMENT.value_as_concept_id, OBSERVATION.value_as_concept_id
Gender | PERSON.gender_concept_id
Race | PERSON.race_concept_id
Ethnicity | PERSON.ethnicity_concept_id
Regimen | EPISODE.episode_object_concept_id
Route | DRUG_EXPOSURE.route_concept_id
Type Concept | All *_type_concept_id fields
  • Special case: one-to-many across different domains: Sometimes a single source code maps to multiple standard concepts from different domains. In this case, one source record must produce multiple CDM records in different event tables, each driven by the domain of the mapped concept.

Example: ICD-10-CM code C7B "Secondary neuroendocrine tumors" has two "Maps to" relationships in the OMOP vocabulary: to 36769180 "Metastasis", which belongs to the Measurement domain (Cancer Modifier), and to 1244604 "Neuroendocrine neoplasm, malignant", which belongs to the Condition domain (SNOMED). This means that each source record with C7B should produce two separate CDM records: one in the MEASUREMENT table (with measurement_concept_id = 36769180) and one in the CONDITION_OCCURRENCE table (with condition_concept_id = 1244604). These records will typically share the same person_id, date/time, and the same source identifiers, reflecting that they originate from the same underlying encounter.

The key rule is: do not try to force all mapped concepts into a single table just because they come from the same source field or vocabulary. Always let the domain of the mapped standard concept determine which CDM event table you populate – even if that means a single source code results in multiple rows across multiple tables.
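Domain-driven routing can be sketched as a lookup from domain to target table, using the C7B example. This is a simplified illustration: the lookup covers only six event domains, the function name route_record is hypothetical, and the record fields event_date and source_value are generic stand-ins for the table-specific CDM column names.

```python
from collections import defaultdict

# Domain of the mapped standard concept -> (CDM table, concept_id field).
DOMAIN_TO_TABLE = {
    "Condition": ("CONDITION_OCCURRENCE", "condition_concept_id"),
    "Drug": ("DRUG_EXPOSURE", "drug_concept_id"),
    "Procedure": ("PROCEDURE_OCCURRENCE", "procedure_concept_id"),
    "Measurement": ("MEASUREMENT", "measurement_concept_id"),
    "Observation": ("OBSERVATION", "observation_concept_id"),
    "Device": ("DEVICE_EXPOSURE", "device_concept_id"),
}


def route_record(source_record, mapped_concepts):
    """Build one output record per mapped concept, grouped by target table."""
    routed = defaultdict(list)
    for concept in mapped_concepts:
        table, concept_field = DOMAIN_TO_TABLE[concept["domain_id"]]
        routed[table].append(
            {
                "person_id": source_record["person_id"],
                "event_date": source_record["event_date"],
                concept_field: concept["concept_id"],
                "source_value": source_record["source_code"],
            }
        )
    return dict(routed)


# Hypothetical source record; the two mapped concepts come from the C7B example.
source_record = {"person_id": 101, "event_date": "2024-03-01", "source_code": "C7B"}
mapped = [
    {"concept_id": 36769180, "domain_id": "Measurement"},
    {"concept_id": 1244604, "domain_id": "Condition"},
]

routed = route_record(source_record, mapped)
for table, records in routed.items():
    print(table, records)
```

A domain missing from the lookup raises a KeyError, which is preferable to silently placing a record in the wrong table.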

Step 8: Quality Assurance and Validation

  • After applying the mappings in your ETL logic, perform thorough QA checks on the output:
    • Complete Mapping Coverage: Verify that every source code has been handled. There should be no unmapped codes silently dropped. For any source code that could not be mapped to a standard concept, you have two main options: 1) Assign a concept_id of 0 ("No matching concept") for that record in the CDM and still carry over the source_value for transparency; and/or 2) create a custom concept (to be used as standard) as noted earlier. Whichever approach you choose, document how unmapped codes are flagged. It is important that analysts know if some data was not standardized.
    • Standard Concepts Only: Ensure that all concept_ids used in clinical event tables are standard. This means checking that each non-zero concept_id in your Condition, Procedure, Drug, etc. tables has standard_concept = 'S' in the vocabulary; concept_id 0 is the only permitted exception, for unmapped records. No source or classification concept_ids should appear in those fields. This check can be done by joining your result concept_ids back to the concept table. It is a hard requirement that only standard concepts populate the primary *_concept_id fields.
    • Domain Consistency: Cross-verify that the domain of each concept_id matches the table it is in. For example, no Drug domain concept_id should end up in CONDITION_OCCURRENCE, no Procedure in MEASUREMENT, etc. If mismatches are found, it indicates an error in mapping or in placing the record.
    • Clinical Sanity Checks: Do some spot checks on high-frequency or clinically important mappings. For instance, if lab results are mapped, confirm the units and values make sense and unit_concept_ids are standard (e.g., ensure "mg/dL" unit mapped to correct UCUM concept).
    • Counts and Duplicates: In one-to-many mappings, patients will have multiple records for one source event. Confirm that this inflation of record count is expected and documented. It is helpful to communicate in documentation that certain source codes split into multiple records. Also, use the source_value to ensure you can trace back if needed – multiple records with the same source reference should share the source_value for clarity.
    • Review with Stakeholders: If possible, review the mapping outcomes with clinicians or data owners, especially for contentious mappings (e.g., where a code was mapped to a broader concept, or where a composite code was split). A quick review of a sample of patient data in the new CDM format against the original records can confirm that the mapping logic didn’t introduce anomalies.
    • Iterate Fixes: If QA finds any issues (e.g., a code mapped incorrectly, or an unmapped code discovered later), update the mapping and re-run the ETL for those parts. It is common to have a few iterations to achieve a near-100% proper mapping.
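The standard-concept and domain-consistency checks above could be run as two queries per event table. The sketch below does this for CONDITION_OCCURRENCE using cut-down tables; concept_id 900001 is a placeholder for a non-standard source concept and the sample rows are invented to trigger each check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (concept_id INTEGER, domain_id TEXT, standard_concept TEXT);
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER, condition_concept_id INTEGER,
        condition_source_value TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?)",
    [
        (1244604, "Condition", "S"),    # standard Condition concept (C7B example)
        (1340204, "Observation", "S"),  # standard, but belongs to Observation
        (900001, "Condition", None),    # placeholder non-standard source concept
    ],
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?)",
    [
        (1, 1244604, "C7B"),     # valid
        (2, 1340204, "Z86.16"),  # wrong table for this domain
        (3, 900001, "E11.9"),    # non-standard concept in a standard field
        (4, 0, "LOCAL-1"),       # unmapped; concept_id 0 is allowed
    ],
)

# Check 1: non-zero concept_ids that are missing from CONCEPT or not standard.
non_standard = conn.execute(
    """
    SELECT co.condition_occurrence_id
    FROM condition_occurrence co
    LEFT JOIN concept c ON c.concept_id = co.condition_concept_id
    WHERE co.condition_concept_id <> 0
      AND (c.standard_concept IS NULL OR c.standard_concept <> 'S')
    ORDER BY co.condition_occurrence_id
    """
).fetchall()

# Check 2: concepts whose domain does not match the table they were written to.
wrong_domain = conn.execute(
    """
    SELECT co.condition_occurrence_id, c.domain_id
    FROM condition_occurrence co
    JOIN concept c ON c.concept_id = co.condition_concept_id
    WHERE c.domain_id <> 'Condition'
    ORDER BY co.condition_occurrence_id
    """
).fetchall()

print("non-standard:", non_standard)
print("wrong domain:", wrong_domain)
```

Both result sets should be empty in a conformant dataset; any row returned points to a mapping or routing error to fix before release.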

Step 9: Documentation and Maintenance

  • Finally, document the mapping work for future reference and maintain it over time:
    • Mapping Specification: Create a document or spreadsheet that lists each source code, its description, and the chosen target standard concept(s) (with concept IDs and names). Also note any special handling (e.g., "this source code produces two records: concept A and concept B") or any custom concepts created. This serves as both a record for peer review and a reference for future ETL updates.
    • Versioning: Record the version of the OMOP vocabulary used for mapping (e.g., vocabulary release date). The standard vocabularies are updated twice a year; a concept you used might be deprecated in the future or new mappings might become available. Having the version noted helps when updating or troubleshooting later. It is good practice to periodically refresh your mappings when vocabularies update.
    • Ongoing Maintenance: If new source codes appear (e.g., a new diagnosis code introduced in the source system), update the mapping table using the same process (do not forget to check for new codes regularly if the source system is dynamic). Also, if you find that analysts or data users are confused by certain mappings (for instance, the one-to-many cases or the domain shifts), consider adding explanatory notes in the documentation or even implementing flags.

Additional Considerations

  • Ambiguous Source Concepts (Combination/Alternative meanings): Be cautious with source codes that represent an "either/or" clinical situation or a combination that is not strictly additive. For example, a source diagnosis might be labeled "Disease A or Disease B". If you were to map this to two separate concepts (A and B), it would falsely imply the patient had both conditions. In such cases, a better approach might be to map to a single, less-specific concept that covers the general category. For instance, if a source code means "Coronary heart disease OR Myocardial infarction", you might map it to a general "Coronary heart disease" concept alone, and preserve the original text in source_value to indicate the uncertainty. Document these decisions clearly. There is no perfect solution within OMOP for "either/or" ambiguity.
  • Use of Custom Concepts: If you create custom concepts for unmappable source codes or novel clinical terms, follow the conventions: use concept_ids above 2,000,000,000, assign an appropriate concept_name, domain_id, vocabulary_id, and set standard_concept to "S" if you intend to use it as a standard (e.g., in ATLAS).
  • Standard Vocabulary Updates and Version Alignment: The OHDSI Standardized Vocabularies are updated twice a year. After an update, previously mapped concept_ids might become invalid or new preferred mappings might be introduced. Establish a routine to review and update mappings when vocabularies are updated. This could involve running a script to detect whether any concept_id in your mapping spec now has invalid_reason IS NOT NULL (indicating deprecation) and finding the replacement via concept_relationship ("Maps to" or "Concept replaced by" relationships). Keeping the mappings current will ensure your ETL remains valid and your data stays standardized over time.
  • Communication to Data Users: Ensure that researchers and analysts using the transformed data are aware of the mapping approach and any nuances. Provide them with a data dictionary or mapping summary. This is especially important for one-to-many mappings (so they understand that counts of condition occurrences might increase for certain combined codes) and domain shifts (so they know to look in the Observation table for "history of" records, for example). It might be useful to hold office hours with the analysis team after a new ETL release to walk through how the source data was mapped.
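The post-update review described under Standard Vocabulary Updates could be scripted along the following lines. This is a sketch under stated assumptions: mapping_spec is a hypothetical table holding your mapping specification, the concept_ids 9001 to 9003 are invented, and the vocabulary tables are reduced to the columns the check needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id INTEGER, concept_name TEXT,
        standard_concept TEXT, invalid_reason TEXT
    );
    CREATE TABLE concept_relationship (
        concept_id_1 INTEGER, concept_id_2 INTEGER,
        relationship_id TEXT, invalid_reason TEXT
    );
    CREATE TABLE mapping_spec (source_code TEXT, target_concept_id INTEGER);
    """
)
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?, ?)",
    [
        (9001, "Old concept", None, "U"),          # upgraded, no longer valid
        (9002, "Replacement concept", "S", None),  # current standard concept
        (9003, "Still valid concept", "S", None),
    ],
)
conn.execute(
    "INSERT INTO concept_relationship VALUES (9001, 9002, 'Concept replaced by', NULL)"
)
conn.executemany(
    "INSERT INTO mapping_spec VALUES (?, ?)",
    [("LOCAL-A", 9001), ("LOCAL-B", 9003)],
)

# Find mapping targets that are now invalid, with a valid standard replacement if any.
stale_mappings = conn.execute(
    """
    SELECT DISTINCT m.source_code,
           m.target_concept_id AS old_concept_id,
           repl.concept_id     AS replacement_concept_id
    FROM mapping_spec m
    JOIN concept old
      ON old.concept_id = m.target_concept_id
    LEFT JOIN concept_relationship r
      ON r.concept_id_1 = old.concept_id
     AND r.relationship_id IN ('Maps to', 'Concept replaced by')
     AND r.invalid_reason IS NULL
    LEFT JOIN concept repl
      ON repl.concept_id = r.concept_id_2
     AND repl.standard_concept = 'S'
     AND repl.invalid_reason IS NULL
    WHERE old.invalid_reason IS NOT NULL
    ORDER BY m.source_code
    """
).fetchall()

print(stale_mappings)
```

Rows with an empty replacement_concept_id have no automatic successor and need to be remapped manually.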

Mapping Checklist for Mappers

Before finalizing the ETL and considering the mapping complete, use the following checklist to ensure all principles have been met:

  • Mapped to Standard Concepts: All source codes are mapped to concepts where standard_concept = 'S'. No non-standard or classification concept_ids appear in the clinical event tables. (If a source code couldn’t be mapped, it is documented and handled via concept_id = 0 or a custom concept, rather than using a wrong concept.)
  • One-to-Many Mappings Implemented: For each source code that required multiple standard concepts to cover its meaning, the ETL creates multiple records (or uses multiple fields as appropriate) so that no semantic information is lost. All components of the source concept’s meaning are represented. Example verified: combined codes like "X with Y" produce two records with concept X and concept Y.
  • Domain Conformance: The domain of each mapped concept matches the CDM table/field where the data is stored. Any intentional domain shifts are noted and correct.
  • Source Values Preserved: The original source code and description are recorded in the _source_value fields for traceability. The _source_concept_id is filled for source codes that exist in the OMOP vocabulary (including custom concepts if created). This allows verification and back-mapping if needed.
  • Unmapped or Uncertain Codes Addressed: No source code was ignored. For any code that did not have a clear mapping, an explicit decision is recorded (mapped to broader concept, set to 0 with note, or added as custom concept). These instances are documented in the mapping spec for transparency.
  • Clinical Review Completed: A subject matter expert (e.g., clinician or terminology specialist) has reviewed the mappings for clinical validity, especially for ambiguous cases or custom mappings. Any feedback from this review has been incorporated (e.g., choosing a more appropriate concept, adjusting one-to-many logic, etc.).
  • Documentation and Sign-off: The mapping documentation (specification spreadsheet or ETL documentation) is updated and complete. It includes mapping rationale where non-obvious. The mapping has been approved by relevant stakeholders or governance if required. A plan for maintenance (e.g., periodic review after vocabulary updates) is noted.

The following office hour sessions provide additional context and demonstrations related to this SOP:

Resources