Exposome Geocoder – Input Preparation and Usage Guide
Note: This toolkit does not require or share any Protected Health Information (PHI).
This repository provides a reproducible workflow to geocode patient location data and link the resulting Census Tract (FIPS 11-digit) identifiers with Exposome datasets for environmental exposure analysis.
Demo video: Watch here
📑 Table of Contents
- Geocoding Patient Data for Exposome Linkage
- 📑 Table of Contents
- Overview
- Input Options
- Usage Guide
- Appendix
Overview
This workflow uses two separate Docker containers to support end-to-end geocoding and data linkage:
- Exposome Geocoder Container (`prismaplab/exposome-geocoder:1.0.3`)
  Performs address- or coordinate-based geocoding to generate Census Tract (FIPS 11-digit) codes using DeGAUSS backend tools.
- Exposome Linkage Container (`ghcr.io/chorus-ai/chorus-postgis-sdoh:main`)
  Integrates the geocoded outputs with relevant environmental and social determinant datasets to produce analysis-ready files.
Together, these containers enable:
- Address and latitude/longitude-based geocoding
- OMOP CDM geocoding extraction and processing
- GIS linkage with PostGIS-SDoH indices (ADI, SVI, AHRQ)
Input Options
You need to prepare only ONE of the following data elements per encounter.
Option 1: Address
Sample input files here
- Format A: Multi-Column Address
| street | city | state | zip | year | entity_id |
|---|---|---|---|---|---|
| 1250 W 16th St | Jacksonville | FL | 32209 | 2019 | 1 |
| 2001 SW 16th St | Gainesville | FL | 32608 | 2019 | 2 |
Tip: Street and ZIP are required. Missing these fields may lead to imprecise geocoding.
- Format B: Single Column Address
| address | year | entity_id |
|---|---|---|
| 1250 W 16th St Jacksonville FL 32209 | 2019 | 1 |
| 2001 SW 16th St Gainesville FL 32608 | 2019 | 2 |
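Either layout can be produced programmatically. A minimal sketch using Python's `csv` module that writes a Format A file (the folder and file names simply match the examples used later in this guide; only the column names matter to the tool):

```python
import csv
import os

# Write a Format A (multi-column address) input file.
# Folder/file names below are illustrative; the geocoder only
# requires the column names shown in the table above.
os.makedirs("input_address", exist_ok=True)
rows = [
    {"street": "1250 W 16th St", "city": "Jacksonville", "state": "FL",
     "zip": "32209", "year": "2019", "entity_id": "1"},
    {"street": "2001 SW 16th St", "city": "Gainesville", "state": "FL",
     "zip": "32608", "year": "2019", "entity_id": "2"},
]
with open("input_address/patients_address.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f, fieldnames=["street", "city", "state", "zip", "year", "entity_id"])
    writer.writeheader()
    writer.writerows(rows)
```

Keeping `entity_id` unique per row makes it easy to join the geocoded output back to your source records later.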
Optional Supporting Files
Including the following optional files will help streamline the end-to-end workflow between geocoding and exposome linkage:
- Important: Do not date-shift your `LOCATION`/`LOCATION_HISTORY` files before linkage. Date shifting (if used) should occur after the linkage in Step 4 (see Step 6).
- If these files are provided during geocoding, the output will automatically include the updated latitude and longitude information required for the PostGIS linkage container.
- If they are not provided, you will need to manually update your `LOCATION` files with the geocoded latitude/longitude before executing the linkage commands.
LOCATION.csv (Follows CDM format)
| location_id | address_1 | address_2 | city | state | zip | county | location_source_value | country_concept_id | country_source_value | latitude | longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1248 N Blackstone Ave | | FRESNO | CA | 93703 | | | UNITED STATES OF AMERICA | UNITED STATES OF AMERICA | 36.75891146 | -119.7902719 |
LOCATION_HISTORY.csv (Follows CDM format)
| location_id | relationship_type_concept_id | domain_id | entity_id | start_date | end_date |
|---|---|---|---|---|---|
| 1 | 32848 | 1147314 | 3763 | 1998-01-01 | 2020-01-01 |
Option 2: Coordinates
Sample input files here
| latitude | longitude | entity_id |
|---|---|---|
| 30.353463 | -81.6749 | 1 |
| 29.634219 | -82.3433 | 2 |
As with address-based input, including LOCATION.csv and LOCATION_HISTORY.csv enables seamless downstream processing with the linkage container.
Option 3: OMOP CDM
| Table | Required Columns |
|---|---|
| person | person_id |
| visit_occurrence | visit_occurrence_id, visit_start_date, visit_end_date, person_id |
| location | location_id, address_1, address_2, city, state, zip, location_source_value, country_concept_id, country_source_value, latitude, longitude |
| location_history | location_id, relationship_type_concept_id, domain_id, entity_id, start_date, end_date |
Usage Guide
Step 1: Prepare Input Data
Prepare only ONE of the data elements described under Input Options per encounter.
For Option 1 (Address) or Option 2 (Coordinates), your data must be in a CSV file format.
Folder Structure
- Place the CSV file(s) in a dedicated folder:
  - 📂 `input_address/` (for address-based data)
  - 📂 `input_coordinates/` (for coordinate-based data)
- Optionally, include:
  - `LOCATION.csv`
  - `LOCATION_HISTORY.csv`
⚠️ Only `.csv` files are supported. Convert `.xlsx` or other formats before running the tool.
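A quick pre-flight check can catch unsupported files before you launch the container. This helper is not part of the toolkit; it is just an illustrative sketch that lists anything in an input folder that is not a `.csv`:

```python
from pathlib import Path

def find_non_csv(folder):
    """Return names of files in `folder` that are not .csv and would
    need converting before the geocoder container is run."""
    return sorted(p.name for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() != ".csv")
```

Run it against your `input_address/` or `input_coordinates/` folder; an empty list means you are ready to proceed.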
Step 2: Generate FIPS Codes
Container: prismaplab/exposome-geocoder:1.0.3
Ensure Docker Desktop is running.
This step uses the Exposome Geocoder container to:
- Convert addresses or coordinates into latitude/longitude
- Assign 11-digit Census Tract (FIPS) codes
For CSV Input (Option 1 & 2)
For macOS / Linux / Ubuntu
docker run -it --rm \
-v "$(pwd)":/workspace \
-v /var/run/docker.sock:/var/run/docker.sock \
-e HOST_PWD="$(pwd)" \
-w /workspace \
prismaplab/exposome-geocoder:1.0.3 \
/app/code/Address_to_FIPS.py -i <input_folder_path>
For Windows
- Open Command Prompt or PowerShell
- Run `wsl`
- Execute the same command as above inside your WSL terminal.
Example:
If your file is named `patients_address.csv` inside 📂 `input_address/`, run:
docker run -it --rm -v "$(pwd)":/workspace -v /var/run/docker.sock:/var/run/docker.sock -e HOST_PWD="$(pwd)" -w /workspace prismaplab/exposome-geocoder:1.0.3 /app/code/Address_to_FIPS.py -i input_address
For OMOP Input (Option 3)
To extract and geocode directly from an OMOP database:
docker run -it --rm \
-v /var/run/docker.sock:/var/run/docker.sock \
-v "$(pwd)":/workspace \
-e HOST_PWD="$(pwd)" \
-w /workspace \
prismaplab/exposome-geocoder:1.0.3 \
/app/code/OMOP_to_FIPS.py \
--user <your_username> \
--password <your_password> \
--server <server_address> \
--port <port_number> \
--database <database_name>
Step 3: Output Structure
After running the geocoder container (for Option 1, 2, or 3), the tool generates output files in the output/ folder.
CSV Input (Option 1 & 2)
Sample outputs: demo/address_files/output
Files Generated: each input file produces:
- `<filename>_with_coordinates.csv` — input + latitude/longitude
- `<filename>_with_fips.csv` — input + FIPS codes
Output Folder Example
output/
├── coordinates_from_address_<timestamp>.zip
├── geocoded_fips_codes_<timestamp>.zip
`<timestamp>` indicates when the script was executed (e.g., `20250624_150230`).
If LOCATION.csv and LOCATION_HISTORY.csv were included, they are copied to output/ but not zipped.
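Once unzipped, the `*_with_fips.csv` file can be joined back to the original input on `entity_id`. A minimal sketch (the `fips` column name here is an assumption; check the actual output header at your site):

```python
import csv

def merge_on_entity_id(input_path, fips_path, key="entity_id"):
    """Join a geocoded output CSV back to its input CSV on `key`,
    adding the geocoded columns to each input row."""
    with open(fips_path, newline="") as f:
        fips_by_id = {row[key]: row for row in csv.DictReader(f)}
    merged = []
    with open(input_path, newline="") as f:
        for row in csv.DictReader(f):
            # Rows without a geocoded match are kept unchanged.
            merged.append({**row, **fips_by_id.get(row[key], {})})
    return merged
```

This keeps unmatched input rows intact, which is useful when some records fail geocoding and land in the `reason` column instead.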
Zipped Output Columns Description
| Column | Description |
|---|---|
| Latitude | Latitude returned from the geocoder |
| Longitude | Longitude returned from the geocoder |
| geocode_result | Outcome of geocoding — `geocoded` for successful matches, `Imprecise Geocode` if not precise |
| reason | Failure reason if applicable (see Reason Column Values) |
Reason Column Values
Used when geocoding fails or is imprecise. Possible values include:
- Hospital address given – Detected from known hardcoded hospital addresses.
- Street missing – No street info provided.
- Blank/Incomplete address – Address is empty or has missing components.
- Zip missing – ZIP code not provided.
💡 Tip: You can expand hospital detection by adding known addresses to `HOSPITAL_ADDRESSES` in `Address_to_FIPS.py`.
Formatting Note for HOSPITAL_ADDRESSES:
- Single-line string
- Lowercase letters and numbers only
- No commas or special characters
- Fields separated by single spaces
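A helper like the following can convert a raw address into that format before you paste it into `HOSPITAL_ADDRESSES`. This normalizer is illustrative (it is not the function the script uses), but it applies the four rules above:

```python
import re

def normalize_hospital_address(addr):
    """Apply the HOSPITAL_ADDRESSES formatting rules: single line,
    lowercase letters and digits only, fields separated by single spaces."""
    addr = addr.lower()
    # Replace commas, punctuation, and newlines with spaces.
    addr = re.sub(r"[^a-z0-9 ]+", " ", addr)
    # Collapse runs of whitespace into single spaces.
    return " ".join(addr.split())
```

For example, a multi-line address with punctuation collapses to one clean lowercase string suitable for the list.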
OMOP Input (Option 3)
Sample outputs: demo/OMOP/output
Folder Structure
OMOP_data/
├── valid_address/ # Records with address, no lat/lon
├── invalid_lat_lon_address/ # Records missing both address and lat/lon
├── valid_lat_long/ # Records with lat/lon
OMOP_FIPS_result/
├── address/
│ ├── address_with_coordinates.zip # CSVs with lat/lon from address
│ └── address_with_fips.zip # CSVs with FIPS codes
├── latlong/
│ └── latlong_with_fips.zip # CSVs with FIPS from coordinates
├── invalid/ # Usually empty; no usable location data
├── LOCATION.csv
└── LOCATION_HISTORY.csv
Step 4: GIS Linkage with PostGIS-Exposure Tool
Purpose:
Spatially joins the lat/lon (and FIPS) from geocoding with geospatial indices (ADI, SVI, AHRQ) and produces EXTERNAL_EXPOSURE.csv.
Prerequisites for GIS Linkage
- Docker installed.
- Clone postgis-exposure repository
- Update the `LOCATION` and `LOCATION_HISTORY` files to include the geocoded lat/lon from Step 2. (Not needed if you included these files during the geocoding step.)
- Ensure the `DATA_SRC_SIMPLE.csv` and `VRBL_SRC_SIMPLE.csv` files are available (centrally managed; no edits required).
- Important: Do not date-shift your `LOCATION`/`LOCATION_HISTORY` files before linkage. Date shifting (if used) should occur following this step.
Sample DATA_SRC_SIMPLE.csv and VRBL_SRC_SIMPLE.csv: here
Expected Outputs
- `EXTERNAL_EXPOSURE.csv` containing the linked indices (ADI, SVI, AHRQ metrics).
GIS Linkage Workflow
- Start the Postgres/PostGIS container following the instructions in the postgis-exposure repository. Container sequence: start/load database → ingest location tables → run the produce script.

  First Docker command (prepares the database):

  docker run --rm --name postgis-chorus \
    --env POSTGRES_PASSWORD=dummy \
    --env VARIABLES=134,135,136 \
    --env DATA_SOURCES=1234,5150,9999 \
    -v $(pwd)/test/source:/source \
    -d ghcr.io/chorus-ai/chorus-postgis-sdoh:main

  - Replace `VARIABLES` with the comma-separated list of variable IDs you need from `VRBL_SRC_SIMPLE.csv`.
  - Replace `DATA_SOURCES` with the relevant data source IDs (from `DATA_SRC_SIMPLE.csv`).
- Generate the external exposure file:

  docker exec postgis-chorus /app/produce_external_exposure.sh

- Output: `EXTERNAL_EXPOSURE.csv` will appear in your mounted directory (e.g., `./test/source`).
Notes & Tips
- Run these commands in Terminal (Mac) or WSL/PowerShell/Command Prompt on Windows; WSL is more robust for Docker on Windows.
- If your site needs more variables, expand `VARIABLES` accordingly.
- Important: The container may only run successfully once. To rerun, you may need to delete the container and image, then pull the image again.
Step 5: Validate & Inspect Outputs
- Open `EXTERNAL_EXPOSURE.csv`. Confirm it contains:
  - Patient ID, lat, lon, FIPS
  - ADI, SVI, AHRQ, and VRBL-coded fields
- Spot-check a few records for accuracy.
- If errors:
  - Ensure `LOCATION` has valid lat/lon/FIPS
  - Confirm `VARIABLES` and `DATA_SOURCES` are correct
  - Check mount paths
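The column checks above can be scripted. This spot-check sketch assumes generic column names (`person_id`, `latitude`, `longitude`); adjust `REQUIRED` to match the actual `EXTERNAL_EXPOSURE.csv` header at your site:

```python
import csv

# Assumed column names; edit to match your site's actual output header.
REQUIRED = ["person_id", "latitude", "longitude"]

def check_exposure_file(path):
    """Return a list of problems found: missing required columns,
    or rows with empty lat/lon values."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
        if missing:
            return ["missing columns: " + ", ".join(missing)]
        for line_no, row in enumerate(reader, start=2):
            if not row["latitude"] or not row["longitude"]:
                problems.append("empty lat/lon at line %d" % line_no)
    return problems
```

An empty result list means the basic structure checks pass; the ADI/SVI/AHRQ values still warrant a manual spot-check against known tracts.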
Step 6: Optional - Site-level Date Shifting
Purpose: Anonymize temporal data while preserving relative timelines.
Guidelines:
- Apply date shifts locally before upload — do not date-shift prior to GIS linkage.
- Input: `EXTERNAL_EXPOSURE.csv` (from Step 4)
- Output: `EXTERNAL_EXPOSURE_date_shifted.csv`
See Date Shifting SOP for More Details.
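To illustrate the "preserve relative timelines" requirement, a common approach is one random offset per patient applied to every date column. The column names, offset range, and date format below are assumptions, not the SOP's actual parameters; follow the Date Shifting SOP for the real procedure:

```python
import random
from datetime import datetime, timedelta

def shift_dates(rows, date_cols, id_col="person_id", max_days=365, seed=42):
    """Shift all dates for a given patient by the same random offset,
    so intervals between that patient's dates are unchanged."""
    rng = random.Random(seed)
    offsets = {}
    shifted = []
    for row in rows:
        pid = row[id_col]
        if pid not in offsets:
            offsets[pid] = timedelta(days=rng.randint(1, max_days))
        new_row = dict(row)
        for col in date_cols:
            d = datetime.strptime(new_row[col], "%Y-%m-%d") - offsets[pid]
            new_row[col] = d.strftime("%Y-%m-%d")
        shifted.append(new_row)
    return shifted
```

Because the offset is keyed by patient, two visits 10 days apart remain 10 days apart after shifting, while the absolute dates are anonymized.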
Step 7: Upload & Centralized De-identification
- Upload the (optionally date-shifted) `EXTERNAL_EXPOSURE.csv` to the central repository.
- The central team will apply further de-identification.
References & sample files
Geocoding
- Sample files: Geocoding Demo Files
GIS Linkage
- Sample files: PostGIS Exposure CSVs
  - Site-specific: `LOCATION`, `LOCATION_HISTORY`
  - Centrally managed: `DATA_SRC_SIMPLE`, `VRBL_SRC_SIMPLE`
Related Office Hours
The following office hour sessions provide additional context and demonstrations related to this SOP:
- [08-07-25] Integration of GIS and SDoH data with OMOP
  - Video Recording | Transcript
  - Comprehensive session on integrating GIS and social determinants of health data
- [09-18-25] Processing OMOP location_history table into external_exposure table
  - Video Recording | Transcript
  - Technical implementation of location data processing for external exposures
- [09-25-25] End-to-end demo for capturing GIS data with OMOP
  - Video Recording | Transcript
  - Complete workflow demonstration for GIS data capture and processing
- [10-16-2025] End-to-end demo for capturing GIS data with OMOP or address/latlong
  - Video Recording | Transcript
  - Complete workflow demonstration for GIS data capture and processing based on updated documentation
Appendix
Geocoding Workflow
This guide outlines the scripts, workflows, and Docker-based DeGAUSS toolkit used for generating Census Tract (FIPS) information from patient data. To convert patient location data into Census Tract identifiers (FIPS11), we use a two-step geocoding process powered by DeGAUSS, executed locally via Docker containers.
Method: DeGAUSS Toolkit (Docker-based)
DeGAUSS consists of two Docker containers:
- Geocoder (3.3.0) — Converts address to latitude/longitude
- Census Block Group (0.6.0) — Converts latitude/longitude to Census Tract FIPS codes
| Step | Purpose | Docker Image |
|---|---|---|
| 1 | Address → Coordinates | ghcr.io/degauss-org/geocoder:3.3.0 |
| 2 | Coordinates → FIPS | ghcr.io/degauss-org/census_block_group:0.6.0 |
DeGAUSS Docker Commands (Executed Internally)
# Step 1: Get Coordinates from Address
docker run --rm -v "ABS_OUTPUT_FOLDER:/tmp" \
ghcr.io/degauss-org/geocoder:3.3.0 \
/tmp/<your_preprocessed_input.csv> <threshold>
# Step 2: Get FIPS from Coordinates
docker run --rm -v "ABS_OUTPUT_FOLDER:/tmp" \
ghcr.io/degauss-org/census_block_group:0.6.0 \
/tmp/<your_coordinate_output.csv> <year>
Replace values:
- `ABS_OUTPUT_FOLDER` → absolute path to your output directory
- `<threshold>` → numeric value (e.g., `0.7`)
- `<year>` → either `2010` or `2020`
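If you want to drive these two steps from a script, the following sketch assembles the command lines. The name of the intermediate coordinates CSV written by step 1 varies by DeGAUSS version and threshold, so it is passed in explicitly here rather than guessed:

```python
from pathlib import Path

def degauss_commands(output_folder, input_csv, coords_csv,
                     threshold="0.7", year="2020"):
    """Build the two docker invocations shown above as argument lists
    suitable for subprocess.run(). coords_csv is the file produced by
    step 1 (its exact name depends on the DeGAUSS version/threshold)."""
    abs_out = str(Path(output_folder).resolve())
    step1 = ["docker", "run", "--rm", "-v", abs_out + ":/tmp",
             "ghcr.io/degauss-org/geocoder:3.3.0",
             "/tmp/" + input_csv, threshold]
    step2 = ["docker", "run", "--rm", "-v", abs_out + ":/tmp",
             "ghcr.io/degauss-org/census_block_group:0.6.0",
             "/tmp/" + coords_csv, year]
    return step1, step2
```

Each list can be handed to `subprocess.run(cmd, check=True)` on a machine with Docker available.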
Script Highlights
Address_to_FIPS.py Logic
This script handles CSV-based input:
- Reads CSV files
- Normalizes address or uses lat/lon
- Runs the DeGAUSS Docker containers to generate:
  - Latitude/Longitude (via `ghcr.io/degauss-org/geocoder`)
  - FIPS codes (via `ghcr.io/degauss-org/census_block_group`)
- Packages outputs into ZIP
OMOP_to_FIPS.py Logic
This script integrates directly with OMOP CDM:
- Extracts OMOP CDM data
- Categorizes into valid/invalid address or coordinates
- Executes FIPS generation (same as CSV workflow)
- Packages outputs into ZIP
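The categorization step maps each OMOP location record to one of the subfolders shown in Step 3's output structure. A sketch of that triage logic (the completeness rules here are assumptions; `OMOP_to_FIPS.py` may apply additional checks):

```python
def categorize_location(row):
    """Route an OMOP location record to the subfolder names used in the
    Step 3 output structure, based on which fields are populated."""
    has_latlon = bool(row.get("latitude")) and bool(row.get("longitude"))
    has_address = bool(row.get("address_1")) and bool(row.get("zip"))
    if has_latlon:
        return "valid_lat_long"   # already has coordinates; skip geocoding
    if has_address:
        return "valid_address"    # geocode address to coordinates first
    return "invalid_lat_lon_address"  # no usable location data
```

Records with coordinates skip the address geocoder entirely and go straight to FIPS assignment, which is why the OMOP workflow produces separate `address/` and `latlong/` result folders.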