AWAITING APPROVAL
SOP for Privacy Scan Tool Operations
Version History
Purpose
This Standard Operating Protocol (SOP) outlines the procedures for using the Privacy Scan Tool to identify and manage potential privacy risks in datasets used within the CHORUS project. It is intended for project team members responsible for data privacy assessments, risk mitigation, and compliance with privacy standards.
Procedures
STEP 1: INSTALL PRIVACY SCAN TOOL
- Ensure the following prerequisites are available:
  - Python 3.6 or higher.
  - The Python libraries listed in requirements.txt in the repository (installed after cloning, as described below).
  - Git, for cloning the repository.
- Clone the Privacy Scan Tool repository from GitHub:
  - git clone https://github.com/chorus-ai/privacy_scan_tool.git
  - cd privacy_scan_tool
- Install the dependencies by running:
  - pip install -r requirements.txt
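A quick preflight check can confirm the Python and Git prerequisites before cloning. The sketch below is not part of the Privacy Scan Tool repository; the function name check_prerequisites is hypothetical, and it checks only the Python version and whether git is on PATH, because the packages in requirements.txt are not listed in this SOP.

```python
import shutil
import sys

# Hypothetical preflight helper; not part of the Privacy Scan Tool repository.
MIN_PYTHON = (3, 6)  # minimum version stated in STEP 1


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems found; an empty list means ready to clone."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            "Python {}.{} found; {}.{} or higher is required".format(
                version_info[0], version_info[1], *MIN_PYTHON
            )
        )
    # Git must be on PATH to clone the repository.
    if which("git") is None:
        problems.append("git was not found on PATH")
    # Packages from requirements.txt are not checked here; they are installed later.
    return problems
```

An empty list means both checks passed; otherwise print the returned messages and resolve them before continuing.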
STEP 2: PREPARE DATASETS FOR SCANNING
- Ensure datasets are in a supported format (CSV or JSON) or are accessible through a database connection.
- If necessary, anonymize or pseudonymize sensitive fields before scanning.
- Copy the dataset into the Privacy Scan Tool’s input directory, or configure a connection string for direct database access.
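Where sensitive identifiers must be pseudonymized before scanning, a keyed hash (HMAC-SHA256) gives each value a stable pseudonym that cannot be regenerated or matched by anyone who lacks the secret key. The sketch below assumes CSV input; the helper names and the example column patient_id are illustrative, and key storage and rotation are left to the project's own procedures.

```python
import csv
import hashlib
import hmac
import io

# Hypothetical helpers for illustration; not part of the Privacy Scan Tool.


def pseudonymize(value, secret_key):
    """Map a value to a stable pseudonym using HMAC-SHA256 (keyed hashing)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_csv(text, fields, secret_key):
    """Return CSV text with the named columns replaced by pseudonyms."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for field in fields:
            if row.get(field):  # leave empty cells empty
                row[field] = pseudonymize(row[field], secret_key)
        writer.writerow(row)
    return out.getvalue()
```

Because the same key always yields the same pseudonym, records can still be linked across files; keep the key out of the scanned dataset and out of the repository.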
STEP 3: CONFIGURE THE SCAN TOOL
- Adjust the tool’s configuration to match the dataset and privacy rules:
  - Modify the config.yaml file to set parameters such as:
    - Dataset path or database connection details.
    - Privacy thresholds (e.g., k-anonymity and l-diversity levels).
    - Fields to scan (the sensitive fields to be evaluated).
- Define custom privacy rules if needed by extending the rule set in rules.py.
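To make the threshold parameters concrete: k-anonymity is the size of the smallest group of records that share the same quasi-identifier values, and distinct l-diversity is the smallest number of different sensitive values within any such group. The sketch below computes both from these standard definitions; it is an illustration, not the Privacy Scan Tool's own implementation, and the column names are hypothetical.

```python
from collections import defaultdict

# Illustrative implementation of the standard definitions; not the tool's code.


def equivalence_classes(rows, quasi_identifiers):
    """Group rows by their combination of quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """k = size of the smallest equivalence class (0 for an empty dataset)."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min((len(g) for g in groups.values()), default=0)


def l_diversity(rows, quasi_identifiers, sensitive):
    """Distinct l-diversity: fewest distinct sensitive values in any class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min((len({r[sensitive] for r in g}) for g in groups.values()), default=0)
```

A dataset meets a configured threshold k when k_anonymity(...) returns at least k, and likewise for l with l_diversity(...).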
STEP 4: EXECUTE THE PRIVACY SCAN
- Run the tool with the following command:
  - python privacy_scan.py --config config.yaml
- Monitor the output for real-time feedback on privacy vulnerabilities. The tool generates a detailed report highlighting potential risks, categorized by severity (low, medium, high).
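For scripted or scheduled runs (for example, under STEP 9), the documented command can be wrapped with Python's subprocess module. This is a minimal sketch; the function names are hypothetical, and treating a non-zero exit code as a failed scan is an assumption about the tool rather than documented behavior.

```python
import subprocess
import sys

# Hypothetical wrapper around the documented command; not part of the tool.


def build_scan_command(config_path="config.yaml", python=sys.executable):
    """Build: python privacy_scan.py --config <config_path>."""
    return [python, "privacy_scan.py", "--config", config_path]


def run_scan(config_path="config.yaml", repo_dir="."):
    """Run the scan from the repository root; output streams to the console.

    Returns the process exit code (assumed: 0 means the scan completed).
    """
    completed = subprocess.run(build_scan_command(config_path), cwd=repo_dir)
    return completed.returncode
```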
STEP 5: REVIEW AND INTERPRET RESULTS
- Examine the generated privacy report, which includes:
  - Field Name: the dataset field evaluated.
  - Risk Level: the severity of the privacy risk (low, medium, high).
  - Privacy Violation Type: the type of privacy violation (e.g., re-identification risk, insufficient anonymization).
  - Recommendation: suggested actions to mitigate the risk.
Example report entries:
Field Name | Risk Level | Violation Type | Recommendation
---|---|---|---
patient_id | High | Re-identification Risk | Apply pseudonymization or remove the field
zipcode | Medium | Granularity of Location Data | Aggregate to a 3-digit ZIP code
birthdate | High | Direct Identifier | Use an age range instead of the exact birthdate
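If the report is exported as CSV with the column headers shown in the table above, findings can be filtered and ordered so that high-risk fields are handled first. The CSV format and the helper names are assumptions; adjust the parsing to whatever format the tool actually produces.

```python
import csv
import io

# Hypothetical helpers; assumes a CSV report with the headers in the table above.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def load_findings(csv_text):
    """Parse report rows into dicts keyed by column header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def prioritize(findings, minimum="medium"):
    """Keep findings at or above `minimum` severity, most severe first.

    Rows with an unrecognized Risk Level are dropped.
    """
    cutoff = SEVERITY_ORDER[minimum.lower()]
    kept = [f for f in findings
            if SEVERITY_ORDER.get(f["Risk Level"].strip().lower(), 99) <= cutoff]
    # sorted() is stable, so equal-severity rows keep their report order.
    return sorted(kept, key=lambda f: SEVERITY_ORDER[f["Risk Level"].strip().lower()])
```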
STEP 6: MITIGATE PRIVACY RISKS
- Apply the recommended mitigations to reduce the identified privacy risks:
  - Aggregate, pseudonymize, or anonymize sensitive fields.
- Re-run the Privacy Scan Tool after applying mitigations to confirm that the risks have been addressed.
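The example recommendations in STEP 5 translate into simple record-level transformations: drop patient_id, truncate zipcode to its first three digits, and replace birthdate with a 10-year age band. The sketch assumes dict records and ISO-format (YYYY-MM-DD) dates, and it parses dates with strptime rather than date.fromisoformat so that it runs on Python 3.6. Under the HIPAA Safe Harbor method, a 3-digit ZIP prefix may be kept only if the area it covers has more than 20,000 people; this sketch does not check that.

```python
from datetime import date, datetime

# Hypothetical helpers mirroring the example recommendations; not the tool's code.


def zip3(zipcode):
    """Generalize a 5-digit ZIP code to its first three digits."""
    return zipcode.strip()[:3]


def age_range(birthdate, as_of, width=10):
    """Replace an exact birthdate with an age band such as '40-49'."""
    # Subtract one year if the birthday has not yet occurred by as_of.
    age = as_of.year - birthdate.year - (
        (as_of.month, as_of.day) < (birthdate.month, birthdate.day)
    )
    low = (age // width) * width
    return "{}-{}".format(low, low + width - 1)


def mitigate(row, as_of, drop=("patient_id",)):
    """Apply the example mitigations to one record and return a new dict."""
    out = {k: v for k, v in row.items() if k not in drop}
    if "zipcode" in out:
        out["zipcode"] = zip3(out["zipcode"])
    if "birthdate" in out:
        born = datetime.strptime(out.pop("birthdate"), "%Y-%m-%d").date()
        out["age_range"] = age_range(born, as_of)
    return out
```

After transforming the data, re-run the scan on the output as described above.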
STEP 7: DOCUMENT THE PROCESS AND RESULTS
- Document each scan and the mitigations applied for future reference, including:
  - Date of the scan.
  - Dataset description.
  - Privacy violations detected.
  - Actions taken to resolve the violations.
- Store the report and documentation securely in a version-controlled repository:
  - GitHub: Upload the report to the privacy-scan-reports folder, using a branch named scan-report-[dataset-name] for version control.
  - Name the report using the convention PrivacyScanReport_DatasetName_MMDDYY.
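The naming conventions above can be generated rather than typed by hand. In this sketch the helper names are hypothetical, converting the dataset name to a lower-case, hyphenated branch name is an assumption (the SOP only gives scan-report-[dataset-name]), and the JSON layout for the scan record is illustrative. No file extension is added because the SOP does not specify one.

```python
import json
import re
from datetime import date

# Hypothetical helpers for the STEP 7 conventions; not part of the tool.


def report_filename(dataset_name, scan_date):
    """PrivacyScanReport_DatasetName_MMDDYY."""
    return "PrivacyScanReport_{}_{}".format(dataset_name, scan_date.strftime("%m%d%y"))


def report_branch(dataset_name):
    """scan-report-[dataset-name], lower-cased with non-alphanumerics as hyphens."""
    slug = re.sub(r"[^a-z0-9]+", "-", dataset_name.lower()).strip("-")
    return "scan-report-" + slug


def scan_record(scan_date, description, violations, actions):
    """Serialize the documentation fields required by STEP 7 as JSON."""
    return json.dumps({
        "scan_date": scan_date.isoformat(),
        "dataset_description": description,
        "violations_detected": violations,
        "actions_taken": actions,
    }, indent=2)
```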
STEP 8: SHARE THE PRIVACY REPORT WITH THE TEAM
- Once the scan is complete and privacy risks have been mitigated, distribute the final privacy report to the designated team members for review:
  - Email: Share the report with the Data Privacy Lead (e.g., Jane Doe at jane.doe@organization.org).
  - GitHub: Commit and push the report to the repository for broader team access.
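The email step can be prepared with Python's standard email package. The sketch composes the message with the report attached but does not send it; the function name is hypothetical, the sender address is a placeholder, and the recipient should be the actual Data Privacy Lead.

```python
from email.message import EmailMessage
from pathlib import Path

# Hypothetical helper; composes the message only, nothing is sent.


def build_report_email(report_path, sender, recipient, subject=None):
    """Compose an email with the privacy report attached."""
    path = Path(report_path)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject or "Privacy scan report: " + path.name
    msg.set_content("Attached is the final privacy scan report for review.")
    # Generic binary attachment, since the report's file type is not specified.
    msg.add_attachment(path.read_bytes(), maintype="application",
                       subtype="octet-stream", filename=path.name)
    return msg
```

To send, pass the message to an SMTP server available in your environment, for example smtplib.SMTP(host).send_message(msg), where host is your organization's mail server.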
STEP 9: CONTINUOUS MONITORING AND RE-ASSESSMENT
- Set a regular schedule for privacy scans based on data updates (e.g., monthly or quarterly).
- Periodically review the scan tool configuration and update the privacy rules to align with evolving privacy standards and regulatory requirements (e.g., GDPR, HIPAA).
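A scan schedule can be checked programmatically. The sketch below treats a scan as due when the monthly or quarterly interval has elapsed, or when the dataset was updated after the last scan; triggering on updates is an assumption about how "based on data updates" is meant to apply, and the function names are hypothetical.

```python
import calendar
from datetime import date

# Hypothetical scheduling helpers; months between scans for each schedule.
INTERVAL_MONTHS = {"monthly": 1, "quarterly": 3}


def add_months(d, months):
    """Shift a date by whole months, clamping to the target month's last day."""
    total = d.month - 1 + months
    year, month = d.year + total // 12, total % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_scan_due(last_scan, schedule):
    """Date on which the next scheduled scan falls due."""
    return add_months(last_scan, INTERVAL_MONTHS[schedule])


def scan_is_due(last_scan, schedule, today, data_updated=None):
    """Due if the data changed after the last scan or the interval has elapsed."""
    if data_updated is not None and data_updated > last_scan:
        return True
    return today >= next_scan_due(last_scan, schedule)
```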