Exposome Geocoder – Input Preparation and Usage Guide
Note: This toolkit does not share any Protected Health Information (PHI).
This repository provides a reproducible workflow to geocode patient location data (Phase 1) and link the resulting coordinates with exposome datasets (Phase 2). The workflow ensures that sensitive address data remains local while generating standardized exposure metrics that can be shared with the central server without identifiers.
This SOP describes the workflow for running the code that geocodes patient location data and links latitude/longitude coordinates with exposome datasets. All code is executed locally at each site. Only the exposure tables containing exposome data are shared with the central server; no address-level data is transmitted or stored centrally. Sites should use the most granular address information available to them, or latitude/longitude coordinates.
Demo video: Watch here
📑 Table of Contents
- Overview
- Input Options
- Usage Guide
- Phase 1 Priority Workflow (Recommended)
- References & sample files
- Related Office Hours
- Appendix
Overview
This workflow uses two separate Docker containers to support end-to-end geocoding and data linkage:
- Exposome Geocoder Container (`prismaplab/exposome-geocoder:1.0.4`): Performs address- or coordinate-based geocoding to generate latitude/longitude for LOCATION workflows, and supports FIPS workflows when needed, using DeGAUSS backend tools.
- Exposome Linkage Container (`ghcr.io/chorus-ai/chorus-postgis-exposure:main`): Integrates the geocoded outputs with relevant environmental and social determinant datasets to produce analysis-ready files.
Together, these containers enable:
- Address and latitude/longitude-based geocoding
- LOCATION and LOCATION_HISTORY preparation for linkage
- OMOP CDM geocoding extraction and processing
- GIS linkage with PostGIS-SDoH indices (ADI, SVI, AHRQ)
Input Options
Phase 1 (Geocoding) Input: To generate coordinates, you need to prepare only ONE of the following data elements per encounter (Option 1: Address, Option 2: Coordinates, or Option 3: OMOP CDM tables).
Phase 2 (Linkage) Input: Regardless of the input option chosen for Phase 1, the final output MUST be transformed into two specific CSV files to run Phase 2.
- LOCATION.csv: Contains the physical coordinates (latitude, longitude) and identifiers (location_id).
- LOCATION_HISTORY.csv: Contains the temporal mapping of a person (`entity_id`, which is the same as `person_id`) to a location (`location_id`) over a specific time range (`start_date`, `end_date`).
See Appendix A for the Data Dictionary and population logic.
Option 1: Address
Sample input files here
- Format A: Multi-Column Address
| street | city | state | zip | year | entity_id |
|---|---|---|---|---|---|
| 1250 W 16th St | Jacksonville | FL | 32209 | 2019 | 1 |
| 2001 SW 16th St | Gainesville | FL | 32608 | 2019 | 2 |
Tip: Street and ZIP are required for precise geocoding; records missing these fields may be geocoded imprecisely or fail.
- Format B: Single Column Address
| address | year | entity_id |
|---|---|---|
| 1250 W 16th St Jacksonville FL 32209 | 2019 | 1 |
| 2001 SW 16th St Gainesville FL 32608 | 2019 | 2 |
Optional Supporting Files
Including the following optional files will help streamline the end-to-end workflow between geocoding and exposome linkage:

Important: Do not date-shift your LOCATION/LOCATION_HISTORY files before linkage. Date shifting (if used) should occur after the Step 4 linkage (see Step 6).

If these files are provided during geocoding, the output will automatically include the updated latitude and longitude information required for the PostGIS linkage container. If they are not provided, users will need to manually update their LOCATION files with the geocoded latitude/longitude before executing the linkage commands.
LOCATION.csv (Follows CDM format)
| location_id | address_1 | address_2 | city | state | zip | county | location_source_value | country_concept_id | country_source_value | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1248 N Blackstone Ave | | FRESNO | CA | 93703 | | | UNITED STATES OF AMERICA | UNITED STATES OF AMERICA | 36.75891146 | -119.7902719 |
LOCATION_HISTORY.csv (Follows CDM format)
| location_id | relationship_type_concept_id | domain_id | entity_id | start_date | end_date |
|---|---|---|---|---|---|
| 1 | 32848 | 1147314 | 3763 | 1998-01-01 | 2020-01-01 |
Option 2: Coordinates
Sample input files here
| location_id | latitude | longitude | zip | entity_id | start_date |
|---|---|---|---|---|---|
| 1 | 30.353463 | -81.6749 | 32209 | 1 | 2019-01-05 |
| 2 | 29.634219 | -82.3433 | 32608 | 2 | 2019-02-14 |
As with address-based input, including LOCATION.csv and LOCATION_HISTORY.csv enables seamless downstream processing with the linkage container.
Important for auto-generated LOCATION_HISTORY.csv (if it is not provided as input and only the location for the index encounter is available):
- Coordinates can be provided, but rows should include `location_id`, either address or ZIP, `entity_id` (which is the same as `person_id` for a given subject), and `start_date` (for which `visit_start_date` can be used as a proxy if no `start_date` is available).
Option 3: OMOP CDM
| Table | Required Columns |
|---|---|
| person | person_id |
| visit_occurrence | visit_occurrence_id, visit_start_date, visit_end_date, person_id |
| location | location_id, address_1, address_2, city, state, zip, location_source_value, country_concept_id, country_source_value, latitude, longitude |
| location_history | location_id, relationship_type_concept_id, domain_id, entity_id, start_date, end_date |
If you already have an OMOP CDM instance with the required elements, it can be used to prepare the LOCATION and LOCATION_HISTORY CSV tables required by Phase 2.
Usage Guide
Phase 1 Priority Workflow (Recommended)
For Phase 1 geocoding (latitude/longitude generation), use:
- Script: `Address_to_LOCATION.py`
- Container: `prismaplab/exposome-geocoder:1.0.4`

This workflow is recommended because it directly prepares the files needed for Phase 2 linkage:
- `LOCATION.csv` (updated with latitude/longitude and `modifier_source_value`)
- `LOCATION_HISTORY.csv` (maintained in OMOP format)
Step 1: Prepare Input Data
For the recommended workflow, place files in one input folder and run one command.
Preferred input (recommended for all sites):
- `LOCATION.csv` in OMOP-style format
- `LOCATION_HISTORY.csv` in OMOP-style format

Also supported by Address_to_LOCATION.py:
- encounter-level CSV files with address columns (for example `address`, or `address_1`/`street` plus city/state/zip)
- encounter-level CSV files with `latitude`/`longitude` (or `lat`/`lon`)

Minimum required columns across input files:
- `location_id`
- either address or ZIP information (for `LOCATION.csv` generation)

Recommended columns for complete LOCATION_HISTORY.csv auto-generation:
- `entity_id` (which is the same as `person_id` for a given subject)
- `start_date` (for which `visit_start_date` can be used as a proxy if no `start_date` is available)
- `year` (used as `start_date = YYYY-01-01` only when start date columns are missing)
Input file discovery behavior (for flexibility across sites):
- `LOCATION.csv` is preferred when present (filename match is case-insensitive) and is used as the primary input.
- If `LOCATION.csv` is not present, the script ingests all other `.csv` files in the input folder (excluding `LOCATION_HISTORY.csv`), concatenates them, and processes them as encounter-level input.
- Arbitrary encounter file names are supported as long as the required columns are recognizable (for example `address_1`/`street`/`address`, and `latitude`/`longitude` or `lat`/`lon`).
Folder Structure
- Place CSV file(s) in a dedicated folder, for example:
  - 📂 `input_location/`
    - `LOCATION.csv`
    - `LOCATION_HISTORY.csv`

⚠️ Only `.csv` files are supported. Convert `.xlsx` or other formats before running the tool.
Guidance on Populating LOCATION_HISTORY.csv:
This table links a person to a specific location for a specific time range. If LOCATION_HISTORY.csv is not provided, Address_to_LOCATION.py auto-generates it from the input rows used to build LOCATION.csv (a sketch of this logic follows the validation notes below).
- Default identifiers used by the toolkit during auto-generation:
  - `relationship_type_concept_id = 32848` (project default OMOP concept ID for the location-person relationship type)
  - `domain_id = 1147314` (project default OMOP concept ID corresponding to the `PERSON` domain)
- `entity_id` mapping during auto-generation:
  - uses the input `entity_id` when present
  - otherwise uses the input `person_id`
  - if both are missing/blank, the script logs an alert and leaves `entity_id` blank
- Date mapping during auto-generation:
  - `start_date` uses the input `start_date` when present; otherwise `visit_start_date` is used to impute `start_date`
  - if start date fields are missing and `year` is present, `start_date` is set to `YYYY-01-01`
  - `end_date` uses the input `end_date` when present; otherwise `visit_end_date` is used to impute `end_date`; `year` is never used to populate `end_date`
  - if start-date values are missing/blank or invalid, the script logs an alert and leaves `start_date` blank
Important validation behavior:
- The script does not synthesize `entity_id` values.
- The script uses `year` only as a `start_date` fallback (`YYYY-01-01`) when explicit start date fields are absent.
- If `entity_id` or date values are missing/invalid, the script continues execution, reports alert messages with row numbers, and leaves the corresponding output fields blank.
- If you have full residential history: use the actual move-in (`start_date`) and move-out (`end_date`) dates.
- If you only have the location for the index ICU encounter and do not have access to previous residential addresses with the date stamps required by the location history table, you can use the following logic to populate LOCATION_HISTORY.csv: when linking to specific encounters, set `start_date` equal to the relevant encounter admission date, i.e., `visit_start_date` (in the `visit_occurrence` table), and set `end_date` to NULL.
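For intuition, here is a minimal pandas sketch of the auto-generation fallbacks described above. This is illustrative only, not the toolkit's source; the function name `build_location_history` and the use of pandas are assumptions.

```python
# Illustrative sketch of the LOCATION_HISTORY auto-generation fallbacks.
import pandas as pd

RELATIONSHIP_TYPE_CONCEPT_ID = 32848  # project default: location-person relationship
PERSON_DOMAIN_ID = 1147314            # project default: PERSON domain concept

def build_location_history(rows: pd.DataFrame) -> pd.DataFrame:
    """Auto-generate LOCATION_HISTORY from the rows used to build LOCATION.csv."""
    blank = pd.Series(pd.NA, index=rows.index, dtype=object)
    out = pd.DataFrame({
        "location_id": rows["location_id"],
        "relationship_type_concept_id": RELATIONSHIP_TYPE_CONCEPT_ID,
        "domain_id": PERSON_DOMAIN_ID,
        # entity_id: input entity_id, else person_id, else left blank (alert logged)
        "entity_id": rows.get("entity_id", blank).fillna(rows.get("person_id", blank)),
    })
    # start_date: start_date -> visit_start_date -> year as YYYY-01-01 -> blank
    start = rows.get("start_date", blank).fillna(rows.get("visit_start_date", blank))
    if "year" in rows.columns:
        year_fallback = rows["year"].dropna().astype(int).astype(str) + "-01-01"
        start = start.fillna(year_fallback)
    out["start_date"] = start
    # end_date: end_date -> visit_end_date; year is never used for end_date
    out["end_date"] = rows.get("end_date", blank).fillna(rows.get("visit_end_date", blank))
    return out
```

For example, `build_location_history(pd.read_csv("input_location/encounters.csv"))` would emit one history row per input row, with blanks wherever no fallback applies.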
Step 2: Run Geocoding for LOCATION Outputs (Primary)
Container: prismaplab/exposome-geocoder:1.0.4
Ensure Docker Desktop is running.
This step uses the Exposome Geocoder container to:
- preserve any valid existing latitude/longitude provided by your site in `LOCATION.csv`
- populate latitude/longitude from addresses
- impute missing latitude/longitude using a staged geocoding fallback that uses available ZIP code information
- output an updated `LOCATION.csv` for Phase 2
For macOS / Linux / Ubuntu
docker run -it --rm \
-v "$(pwd)":/workspace \
-v /var/run/docker.sock:/var/run/docker.sock \
-e HOST_PWD="$(pwd)" \
-w /workspace \
prismaplab/exposome-geocoder:1.0.4 \
/app/code/Address_to_LOCATION.py -i <input_folder_path>
For Windows
- Open Command Prompt or PowerShell
- Run the command `wsl`
- Execute the same command as above inside your WSL terminal.
Example:
If your files are inside 📂 `input_location/`, run:
docker run -it --rm -v "$(pwd)":/workspace -v /var/run/docker.sock:/var/run/docker.sock -e HOST_PWD="$(pwd)" -w /workspace prismaplab/exposome-geocoder:1.0.4 /app/code/Address_to_LOCATION.py -i input_location
Geocoding Levels Used by Address_to_LOCATION.py
The script assigns the following levels to indicate how latitude and longitude coordinates were obtained:
- Level 1: Provided coordinates already present in input
- Level 2: Address-based geocoding
- Level 3: ZIP9-based geocoding fallback
- Level 4: ZIP5-based geocoding fallback
- Failed: No valid coordinate could be assigned
The assigned level is written to modifier_source_value in LOCATION.csv.
Success Rate Reporting
The script writes a geocoding summary report in output/:
geocoding_summary_<timestamp>.csv
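To inspect the report quickly, here is a minimal pandas sketch (illustrative only; file locations and column names are as documented above and in Appendix A):

```python
# Illustrative check of Phase 1 results; assumes pandas and the output/ folder.
import glob
import pandas as pd

# Latest summary report written by the script
latest = sorted(glob.glob("output/geocoding_summary_*.csv"))[-1]
print(pd.read_csv(latest).T)  # total_records, success_rate_percent, level counts, ...

# Cross-check against the per-row provenance recorded in LOCATION.csv
loc = pd.read_csv("output/LOCATION.csv")
print(loc["modifier_source_value"].value_counts(dropna=False))  # Level 1-4 / Failed
```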
Legacy / Optional FIPS Workflows
If a site needs FIPS-specific outputs in Phase 1, it can still run Address_to_FIPS.py or OMOP_to_FIPS.py using the same container tag (1.0.4).
For OMOP Input (Option 3)
To extract and geocode directly from an OMOP database:
docker run -it --rm \
-v "$(pwd)":/workspace \
-v /var/run/docker.sock:/var/run/docker.sock \
-e HOST_PWD="$(pwd)" \
-w /workspace \
prismaplab/exposome-geocoder:1.0.4 \
/app/code/OMOP_to_FIPS.py \
--user <your_username> \
--password <your_password> \
--server <server_address> \
--port <port_number> \
--database <database_name>
Staged geocoding, which imputes latitude and longitude when the address is missing but a ZIP code is available, has not been incorporated into this script.
Note on Dependencies (Firewall Warning):
The geocoding scripts attempt to pull Docker images automatically. If you have a strict firewall, you may need to pull these images manually before running the script:
docker pull ghcr.io/degauss-org/geocoder:3.3.0
docker pull ghcr.io/degauss-org/census_block_group:0.6.0
Step 3: Output Structure
After running the geocoder container, the tool generates output files in the output/ folder.
Primary Output (Address_to_LOCATION.py)
Files Generated
- `LOCATION.csv` (updated lat/lon + `modifier_source_value`)
- `LOCATION_HISTORY.csv` (OMOP schema preserved)
- `geocoding_summary_<timestamp>.csv` (success metrics)
- `geocode_failures_<timestamp>.csv` (only when failed records exist)
- `log/address_to_location_<timestamp>.log`
Legacy FIPS/Zip Outputs (when using FIPS scripts)
Sample outputs: demo/address_files/output
Each input file can produce:
- `<filename>_with_coordinates.csv` — input + latitude/longitude
- `<filename>_with_fips.csv` — input + FIPS codes

These scripts have not been updated with the ZIP-based latitude/longitude imputation and are not recommended for Phase 1; however, they may be useful if you need to generate FIPS codes for other purposes.
IMPORTANT TO NOTE
Phase 2 input preparation note: The recommended workflow is to run Address_to_LOCATION.py and use the generated LOCATION.csv and LOCATION_HISTORY.csv directly in Phase 2. Ensure your `location_id` values are consistent between LOCATION.csv and LOCATION_HISTORY.csv before running Phase 2 (a quick check is sketched below).
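A minimal pandas sketch of this consistency check (illustrative only; assumes the Phase 1 outputs are in `output/`):

```python
# Pre-Phase 2 sanity check: every LOCATION_HISTORY row should reference a
# location_id that exists in LOCATION.csv.
import pandas as pd

loc = pd.read_csv("output/LOCATION.csv")
hist = pd.read_csv("output/LOCATION_HISTORY.csv")

orphans = hist.loc[~hist["location_id"].isin(loc["location_id"])]
if orphans.empty:
    print("location_id values are consistent across both files.")
else:
    print(f"{len(orphans)} LOCATION_HISTORY rows reference unknown location_id values:")
    print(orphans["location_id"].unique())
```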
Reason Column Values (for failed records):
Possible values include:
- Street missing
- City missing
- State missing
- Zip missing
- ZIP9 not found in zip9-fips12 crosswalk
- ZIP5 not found in HUD crosswalk
OMOP Input (Option 3)
Sample outputs: demo/OMOP/output
Folder Structure
OMOP_data/
├── valid_address/             # Records with address, no lat/lon
├── invalid_lat_lon_address/   # Records missing both address and lat/lon
└── valid_lat_long/            # Records with lat/lon
OMOP_FIPS_result/
├── address/
│   ├── address_with_coordinates.zip   # CSVs with lat/lon from address
│   └── address_with_fips.zip          # CSVs with FIPS codes
├── latlong/
│   └── latlong_with_fips.zip          # CSVs with FIPS from coordinates
└── invalid/                   # Usually empty; no usable location data
LOCATION.csv
LOCATION_HISTORY.csv
Step 4: GIS Linkage with PostGIS-Exposure Tool
Purpose:
Spatially joins the lat/lon (and FIPS) from geocoding with geospatial indices (ADI, SVI, AHRQ) and produces EXTERNAL_EXPOSURE.csv.
Prerequisites for GIS Linkage
- Docker installed.
- Clone the postgis-exposure repository.
- Update the `LOCATION` and `LOCATION_HISTORY` files to include the geocoded lat/lon from Step 2 (not needed if you included these files during the geocoding step).
- Ensure the `DATA_SRC_SIMPLE.csv` and `VRBL_SRC_SIMPLE.csv` files are available (centrally managed; no edits required).
- Important: Do not date-shift your `LOCATION`/`LOCATION_HISTORY` files before linkage. Date shifting (if used) should occur after this step.
Sample DATA_SRC_SIMPLE.csv and VRBL_SRC_SIMPLE.csv: here
Expected Outputs
EXTERNAL_EXPOSURE.csvcontaining linked indices (ADI, SVI, AHRQ metrics).
GIS Linkage Workflow
- Start the Postgres/PostGIS container following the instructions in the postgis-exposure repository. Container sequence: start/load database → ingest location tables → run the produce script. First Docker command (prepares the database):

docker run --rm --name postgis-chorus \
  --env POSTGRES_PASSWORD=dummy \
  --env VARIABLES=134,135,136 \
  --env DATA_SOURCES=1234,5150,9999 \
  -v $(pwd)/test/source:/source \
  -d ghcr.io/chorus-ai/chorus-postgis-exposure:main

  - Replace `VARIABLES` with the comma-separated list of variable IDs you need from `VRBL_SRC_SIMPLE.csv`.
  - Replace `DATA_SOURCES` with the relevant data source IDs (from `DATA_SRC_SIMPLE.csv`).
- Generate the external exposure file:

docker exec postgis-chorus /app/produce_external_exposure.sh

- Output: `EXTERNAL_EXPOSURE.csv` will appear in your mounted directory (e.g., `./test/source`).
Notes & Tips
- Run these commands in Terminal (Mac) or WSL/PowerShell/Command Prompt on Windows; WSL is more robust for Docker on Windows.
- If your site needs more variables, expand `VARIABLES` accordingly.
- Important: The container may only run successfully once. To rerun, you may need to delete the container and image, then pull the image again.
Step 5: Validate & Inspect Outputs
- Open `EXTERNAL_EXPOSURE.csv` and confirm:
  - Patient ID, lat, lon, FIPS
  - ADI, SVI, AHRQ, and VRBL-coded fields
- Spot-check a few records for accuracy (see the sketch after this list).
- If errors occur:
  - Ensure `LOCATION` has valid lat/lon/FIPS
  - Confirm `VARIABLES` and `DATA_SOURCES` are correct
  - Check mount paths
Step 6: Optional - Site-level Date Shifting
Purpose: Anonymize temporal data while preserving relative timelines.
Guidelines:
- Apply date shifts locally before upload — do not date-shift prior to GIS linkage.
- Input: `EXTERNAL_EXPOSURE.csv` (from Step 4)
- Output: `EXTERNAL_EXPOSURE_date_shifted.csv`
See Date Shifting SOP for More Details.
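For illustration only, here is a minimal date-shifting sketch that applies one random per-person offset; the ±365-day window and column choices are assumptions, so follow the Date Shifting SOP for the approved procedure.

```python
# Minimal sketch: one random offset per person preserves relative timelines
# within each person. Illustrative only -- see the Date Shifting SOP.
import numpy as np
import pandas as pd

exp = pd.read_csv("EXTERNAL_EXPOSURE.csv",
                  parse_dates=["exposure_start_date", "exposure_end_date"])

rng = np.random.default_rng()
offsets = {pid: int(rng.integers(-365, 366)) for pid in exp["person_id"].unique()}
shift = pd.to_timedelta(exp["person_id"].map(offsets), unit="D")

exp["exposure_start_date"] += shift
exp["exposure_end_date"] += shift
exp.to_csv("EXTERNAL_EXPOSURE_date_shifted.csv", index=False)
```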
Step 7: Upload & Centralized De-identification
- Upload the (optionally date-shifted) `EXTERNAL_EXPOSURE.csv` to the central repository.
- The central team will apply further de-identification.
References & sample files
Geocoding
- Sample files: Geocoding Demo Files
GIS Linkage
- Sample files: PostGIS Exposure CSVs
  - Site-specific: `LOCATION`, `LOCATION_HISTORY`
  - Centrally managed: `DATA_SRC_SIMPLE`, `VRBL_SRC_SIMPLE`
Related Office Hours
The following office hour sessions provide additional context and demonstrations related to this SOP:
- [08-07-25] Integration of GIS and SDoH data with OMOP
  - Video Recording | Transcript
  - Comprehensive session on integrating GIS and social determinants of health data
- [09-18-25] Processing OMOP location_history table into external_exposure table
  - Video Recording | Transcript
  - Technical implementation of location data processing for external exposures
- [09-25-25] End-to-end demo for capturing GIS data with OMOP
  - Video Recording | Transcript
  - Complete workflow demonstration for GIS data capture and processing
- [10-16-2025] End-to-end demo for capturing GIS data with OMOP or address/latlong
  - Video Recording | Transcript
  - Complete workflow demonstration for GIS data capture and processing based on updated documentation
Appendix
Appendix A: Data Dictionary and Logic
To successfully run Phase 2, your data must match the OMOP CDM definitions below.
These field requirements align with OMOP CDM v6.0 LOCATION and LOCATION_HISTORY expectations.
1. LOCATION Table
Represents physical location or address information.
| Field | Description |
|---|---|
| location_id | The unique key assigned to a Location. Each instance of a Location in the source data should use this key. [REQUIRED] |
| address_1 | First line of the address. [RECOMMENDED, OPTIONAL] |
| address_2 | Second line of the address. |
| city | City name. [RECOMMENDED, OPTIONAL] |
| state | State name. [RECOMMENDED, OPTIONAL] |
| zip | ZIP code as string. ZIP+4 preferred; ZIP5 accepted. [RECOMMENDED, OPTIONAL] |
| county | County name. [RECOMMENDED, OPTIONAL] |
| location_source_value | Source text/value for location. [OPTIONAL] |
| latitude | Geocoded latitude (Float). [OPTIONAL] |
| longitude | Geocoded longitude (Float). [OPTIONAL] |
Notes for this toolkit:
- The generated `LOCATION.csv` may also include OMOP-compatible country columns (`country_concept_id`, `country_source_value`) and geocoding provenance (`modifier_source_value`).
2. LOCATION_HISTORY Table
Stores relationships between persons and geographic locations over time.
| Field | Description |
|---|---|
| location_id | References the location_id in the LOCATION table. [REQUIRED] |
| relationship_type_concept_id | OMOP concept ID for location-person relationship type. Defaults to 32848. [REQUIRED] |
| domain_id | Domain of the entity. For this toolkit output, this is emitted as OMOP concept id 1147314 (PERSON domain concept). [REQUIRED] |
| entity_id | Unique identifier for the entity; should be person_id. [REQUIRED] |
| start_date | Date the relationship started. [REQUIRED] |
| end_date | Date the relationship ended. [RECOMMENDED, OPTIONAL] |
Notes for this toolkit:
- `entity_id` should align with OMOP `PERSON.person_id` when `domain_id` is PERSON.
- If `LOCATION_HISTORY.csv` is not supplied, the script auto-generates it from available input fields.
- Missing values (`entity_id`/`person_id`, `start_date`/`visit_start_date`) trigger an alert; the script continues, and the corresponding fields remain blank.
- If only `year` is provided, the script sets `start_date` to `YYYY-01-01` and does not infer `end_date` from `year`.
3. EXTERNAL_EXPOSURE Table
After Phase 2 execution, the pipeline generates the external_exposure table with the columns below.
| Variable | Description |
|---|---|
| external_exposure_id | Unique row identifier for the exposure record. |
| location_id | Foreign key linking to the input LOCATION.csv file. |
| person_id | Foreign key linking to entity_id in the input LOCATION_HISTORY.csv file. |
| exposure_start_date | Start date of the exposure event (calculated overlap). |
| exposure_end_date | End date of the exposure event. |
| exposure_source_value | Name of the environmental variable linked. |
| value_as_number | Numerical value of the environmental variable. |
| unit_concept_id | OMOP Concept ID representing the unit of measure. |
| exposure_concept_id | OMOP Concept ID representing the environmental variable. |
| exposure_type_concept_id | OMOP Concept ID for the type of exposure. |
| value_as_concept_id | OMOP Concept ID for categorical results. |
| modifier_source_value | Geocoding level indicating the information used for coordinate generation (see Geocoding Levels in Step 2). |
Note: This table reflects the exposure data generated as output of Phase 2.
4. GEOCODING_SUMMARY Report (Phase 1 output)
The geocoding_summary_<timestamp>.csv report provides record-level completion metrics for Phase 1 geocoding.
| Column | Description |
|---|---|
| total_records | Total number of rows processed in LOCATION.csv. |
| records_with_coordinates | Number of rows with valid latitude and longitude after geocoding. |
| records_without_coordinates | Number of rows that still do not have valid coordinates. |
| success_rate_percent | Percent of rows with valid coordinates (records_with_coordinates / total_records * 100). |
| level1_provided | Count of rows where input coordinates were already valid. |
| level2_address | Count of rows geocoded from address-level input. |
| level3_zip9 | Count of rows geocoded using ZIP9 fallback. |
| level4_zip5 | Count of rows geocoded using ZIP5 fallback. |
| failed | Count of rows with unresolved geocoding failures. |
5. GEOCODE_FAILURES Report (Phase 1 output)
The geocode_failures_<timestamp>.csv report includes only rows that remain unresolved after staged geocoding.
| Column | Description |
|---|---|
| location_id | Location identifier associated with the failed geocoding row. |
| address_1 | Normalized primary street/address line available at failure time. |
| address_2 | Normalized secondary address/unit line available at failure time. |
| city | Normalized city available at failure time. |
| state | Normalized state available at failure time. |
| zip | Normalized ZIP5 available at failure time. |
| geocode_level | Final geocoding status (failed). |
| geocode_reason | Aggregated reason(s) indicating why geocoding could not be completed. |
Appendix B: Geocoding Workflow
This guide outlines the scripts, workflows, and Docker-based DeGAUSS toolkit used to generate latitude and longitude coordinates from patient data. The process follows a two-step geocoding workflow powered by DeGAUSS and executed locally via Docker containers.
Method: DeGAUSS Toolkit (Docker-based)
DeGAUSS consists of two Docker containers:
- Geocoder (3.3.0) — Converts address to latitude/longitude
- Census Block Group (0.6.0) — Converts latitude/longitude to Census Tract FIPS codes
| Step | Purpose | Docker Image |
|---|---|---|
| 1 | Address → Coordinates | ghcr.io/degauss-org/geocoder:3.3.0 |
| 2 | Coordinates → FIPS | ghcr.io/degauss-org/census_block_group:0.6.0 |
DeGAUSS Docker Commands (Executed Internally)
# Step 1: Get Coordinates from Address
docker run --rm -v "ABS_OUTPUT_FOLDER:/tmp" \
ghcr.io/degauss-org/geocoder:3.3.0 \
/tmp/<your_preprocessed_input.csv> <threshold>
# Step 2: Get FIPS from Coordinates
docker run --rm -v "ABS_OUTPUT_FOLDER:/tmp" \
ghcr.io/degauss-org/census_block_group:0.6.0 \
/tmp/<your_coordinate_output.csv> <year>
Replace values:
- `ABS_OUTPUT_FOLDER` → absolute path to your output directory
- `<threshold>` → numeric value (e.g., `0.7`)
- `<year>` → either `2010` or `2020`
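For sites scripting their own wrappers, here is a minimal Python sketch of these calls (illustrative only; the toolkit's scripts wrap these commands internally, and the input/output file names below are placeholders, not the exact names the scripts use):

```python
# Illustrative sketch of shelling out to the DeGAUSS containers from Python.
import subprocess

ABS_OUTPUT_FOLDER = "/absolute/path/to/output"  # must be an absolute path
THRESHOLD = "0.7"   # geocoding score threshold
YEAR = "2020"       # census vintage: 2010 or 2020

# Step 1: address -> coordinates
subprocess.run(
    ["docker", "run", "--rm", "-v", f"{ABS_OUTPUT_FOLDER}:/tmp",
     "ghcr.io/degauss-org/geocoder:3.3.0",
     "/tmp/preprocessed_input.csv", THRESHOLD],
    check=True,
)

# Step 2: coordinates -> FIPS
subprocess.run(
    ["docker", "run", "--rm", "-v", f"{ABS_OUTPUT_FOLDER}:/tmp",
     "ghcr.io/degauss-org/census_block_group:0.6.0",
     "/tmp/coordinate_output.csv", YEAR],
    check=True,
)
```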
Script Highlights
While our toolkit supports both LOCATION generation and FIPS workflows, Phase 2 requires latitude and longitude coordinates in LOCATION-based files.
Address_to_LOCATION.py Logic
This script is the recommended Phase 1 workflow:
- Reads `LOCATION.csv` (or compatible encounter-level CSV inputs)
- Preserves valid existing latitude/longitude values
- Uses a staged fallback to fill missing coordinates (sketched after this list):
  - address geocoding
  - ZIP9 fallback
  - ZIP5 fallback
- Writes:
  - `LOCATION.csv` with updated coordinates and `modifier_source_value`
  - `LOCATION_HISTORY.csv` in OMOP format
  - geocoding summary and failure reports
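A conceptual Python sketch of this staged fallback (illustrative only, not the script's actual source; `geocode_address`, `lookup_zip9`, and `lookup_zip5` are hypothetical stand-ins for the DeGAUSS call and the crosswalk lookups):

```python
# Conceptual sketch of the staged geocoding fallback.
import math

def _valid(x):
    """True when x is a usable numeric coordinate."""
    return x is not None and not (isinstance(x, float) and math.isnan(x))

def geocode_address(row):   # stand-in for DeGAUSS address geocoding
    return None

def lookup_zip9(zip_code):  # stand-in for the zip9-fips12 crosswalk lookup
    return None

def lookup_zip5(zip_code):  # stand-in for the HUD ZIP5 crosswalk lookup
    return None

def resolve_coordinates(row: dict):
    """Return (latitude, longitude, level) for one LOCATION row."""
    if _valid(row.get("latitude")) and _valid(row.get("longitude")):
        return row["latitude"], row["longitude"], "Level 1"  # coordinates provided
    for fn, arg, level in ((geocode_address, row, "Level 2"),
                           (lookup_zip9, row.get("zip"), "Level 3"),
                           (lookup_zip5, row.get("zip"), "Level 4")):
        coords = fn(arg)
        if coords:
            return coords[0], coords[1], level
    return None, None, "Failed"  # recorded in modifier_source_value
```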
Address_to_FIPS.py Logic
This script handles CSV-based input:
- Reads CSV files
- Normalizes address or uses lat/lon
- Runs the DeGAUSS Docker containers to generate:
  - latitude/longitude (via `ghcr.io/degauss-org/geocoder`)
  - FIPS codes (via `ghcr.io/degauss-org/census_block_group`)
- Packages outputs into ZIP
OMOP_to_FIPS.py Logic
This script integrates directly with OMOP CDM:
- Extracts OMOP CDM data
- Categorizes into valid/invalid address or coordinates
- Executes FIPS generation (same as CSV workflow)
- Packages outputs into ZIP