@chopkinsmade chopkinsmade commented Dec 23, 2025

What

  • Update the run security scan so that its summary also lists the paths that were skipped or excluded
  • Replace the async for loop with an asyncio.TaskGroup so that the Presidio scans run concurrently instead of sequentially

Why

We already display a summary of the files that contained personal data, and any that were scanned but had no personal data present. This PR enhances the summary to include the excluded and skipped results. This is an example of the new output:

--------PERSONAL DATA SCAN SUMMARY--------

FILES EXCLUDED
+-------------------------------------+
|                 Path                |
+-------------------------------------+
| .github/workflows/org.common-ci.yml |
|      .github/workflows/test.yml     |
+-------------------------------------+

FILES SKIPPED
+--------------------------------------------------------------------+
|                                Path                                |
+--------------------------------------------------------------------+
|                           .dockerignore                            |
|                     .github/actions/README.md                      |
|                 .github/actions/doc/diagram.drawio                 |
|                  .github/actions/doc/diagram.png                   |
|                  .github/actions/notify/Readme.md                  |
|        .github/actions/vulnerability-scan/python/Readme.md         |
|     .github/actions/vulnerability-scan/upload-to-s3/Readme.md      |
|                             .gitignore                             |
|                          .python-version                           |
|                       .vscode/settings.json                        |
|                             CODEOWNERS                             |
|                             Dockerfile                             |
|                              Makefile                              |
|                             README.md                              |
|                           pyproject.toml                           |
|                          src/__init__.py                           |
|                      src/actions/__init__.py                       |
|                  src/actions/compare_versions.py                   |
|                       src/hooks/__init__.py                        |
|                          src/hooks/cli.py                          |
|                        src/hooks/config.py                         |
|                      src/hooks/hooks_base.py                       |
|                   src/hooks/presidio/__init__.py                   |
|                 src/hooks/presidio/path_filter.py                  |
|                   src/hooks/presidio/scanner.py                    |
|       src/hooks/presidio/spacy_post_processing_recognizer.py       |
|                src/hooks/run_personal_data_scan.py                 |
|                   src/hooks/run_security_scan.py                   |
|                  src/hooks/trufflehog/__init__.py                  |
|                  src/hooks/trufflehog/scanner.py                   |
|                  src/hooks/trufflehog/vendors.py                   |
|                src/hooks/validate_security_scan.py                 |
|                       src/proxy/__init__.py                        |
|                        src/proxy/plugins.py                        |
|                         tests/__init__.py                          |
|                         tests/conftest.py                          |
|          tests/integration/hooks/presidio/test_scanner.py          |
|                tests/integration/hooks/test_cli.py                 |
|                   tests/unit/actions/__init__.py                   |
|            tests/unit/actions/test_compare_versions.py             |
|                    tests/unit/hooks/__init__.py                    |
|               tests/unit/hooks/presidio/__init__.py                |
|           tests/unit/hooks/presidio/test_path_filter.py            |
|             tests/unit/hooks/presidio/test_scanner.py              |
| tests/unit/hooks/presidio/test_spacy_post_processing_recognizer.py |
|                    tests/unit/hooks/test_cli.py                    |
|                tests/unit/hooks/test_hooks_base.py                 |
|          tests/unit/hooks/test_run_personal_data_scan.py           |
|             tests/unit/hooks/test_run_security_scan.py             |
|          tests/unit/hooks/test_validate_security_scan.py           |
|              tests/unit/hooks/trufflehog/__init__.py               |
|            tests/unit/hooks/trufflehog/test_scanner.py             |
|            tests/unit/hooks/trufflehog/test_vendors.py             |
|                  tests/unit/proxy/test_plugins.py                  |
|                              uv.lock                               |
+--------------------------------------------------------------------+

FILES WITHOUT PERSONAL DATA
+------------------------------------------------------------+
|                            Path                            |
+------------------------------------------------------------+
|         .github/actions/notify/message/action.yml          |
|      .github/actions/notify/vulnerability/action.yml       |
|    .github/actions/vulnerability-scan/python/action.yml    |
| .github/actions/vulnerability-scan/upload-to-s3/action.yml |
|                   .github/dependabot.yml                   |
|          .github/workflows/automated-release.yml           |
|        .github/workflows/build-and-push-to-ecr.yml         |
|        .github/workflows/build-and-push-to-ghcr.yml        |
|                  .github/workflows/cd.yml                  |
|          .github/workflows/org.common-ci.test.yml          |
|            .github/workflows/org.docker-ci.yml             |
|            .github/workflows/org.python-ci.yml             |
|           .github/workflows/org.terraform-ci.yml           |
|                  .pre-commit-config.yaml                   |
|                   .pre-commit-hooks.yaml                   |
|               example.pre-commit-config.yaml               |
|                personal-data-exclusions.txt                |
|                  security-exclusions.txt                   |
|           src/hooks/presidio/engine_config.yaml            |
|             src/hooks/presidio/nlp_config.yaml             |
|         src/hooks/presidio/recognizer_config.yaml          |
|               tests/test_data/COMMIT_MSG.txt               |
+------------------------------------------------------------+

FILES CONTAINING PERSONAL DATA

tests/test_data/personal_data.csv
+---------------+---------------------+-------+
|      Type     |        Value        | Score |
+---------------+---------------------+-------+
|    LOCATION   |        london       |  0.85 |
| EMAIL_ADDRESS | [email protected] |  1.0  |
|    LOCATION   |        London       |  0.85 |
|  PHONE_NUMBER |     07111111111     |  0.4  |
+---------------+---------------------+-------+

tests/test_data/personal_data.txt
+---------------+----------------------+-------+
|      Type     |        Value         | Score |
+---------------+----------------------+-------+
| EMAIL_ADDRESS | [email protected] |  1.0  |
| EMAIL_ADDRESS | [email protected] |  1.0  |
|  UK_POSTCODE  |       SW1A 1AA       |  0.5  |
|  PHONE_NUMBER |     07111111111      |  0.4  |
|  PHONE_NUMBER |     02920000000      |  0.4  |
|  PHONE_NUMBER |    +442920000002     |  0.4  |
+---------------+----------------------+-------+

tests/test_data/personal_data.yaml
+---------------+---------------------+-------+
|      Type     |        Value        | Score |
+---------------+---------------------+-------+
| EMAIL_ADDRESS | [email protected] |  1.0  |
|  UK_POSTCODE  |       SW1A 1AA      |  0.85 |
|  PHONE_NUMBER |     07111111111     |  0.75 |
+---------------+---------------------+-------+

tests/test_data/personal_data.yml
+---------------+---------------------+-------+
|      Type     |        Value        | Score |
+---------------+---------------------+-------+
| EMAIL_ADDRESS | [email protected] |  1.0  |
|  UK_POSTCODE  |       SW1A 1AA      |  0.85 |
|  PHONE_NUMBER |     07111111111     |  0.75 |
+---------------+---------------------+-------+
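
As a rough illustration of how the four groups in the example output could be assembled, the sketch below buckets per-path results by status and emits the sections in a fixed order. All names (`Status`, `PathResult`, `build_summary`) are assumptions made for this example and are not taken from the PR, and it prints plain lines rather than the boxed tables shown above.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Section headings copied from the example output; the enum itself is hypothetical.
    EXCLUDED = "FILES EXCLUDED"
    SKIPPED = "FILES SKIPPED"
    CLEAN = "FILES WITHOUT PERSONAL DATA"
    FLAGGED = "FILES CONTAINING PERSONAL DATA"


@dataclass
class PathResult:
    # Hypothetical per-path record: findings are (type, value, score) tuples.
    path: str
    status: Status
    findings: list[tuple[str, str, float]] = field(default_factory=list)


def build_summary(results: list[PathResult]) -> str:
    # Bucket every result under its status.
    grouped: dict[Status, list[PathResult]] = defaultdict(list)
    for result in results:
        grouped[result.status].append(result)

    lines = ["--------PERSONAL DATA SCAN SUMMARY--------"]
    for status in Status:  # Enum iteration follows definition order
        if not grouped[status]:
            continue  # omit sections that have no paths
        lines += ["", status.value]
        for result in sorted(grouped[status], key=lambda r: r.path):
            lines.append(result.path)
            for kind, value, score in result.findings:
                lines.append(f"  {kind}  {value}  {score}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_summary([
        PathResult("uv.lock", Status.SKIPPED),
        PathResult(".github/dependabot.yml", Status.CLEAN),
    ]))
```

Keeping the status on each result, rather than in separate lists, means a new category only needs one more enum member.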

How this has been tested

  • I have tested locally
  • Testing not required

Reviewer Checklist

  • I have reviewed the PR and ensured no secret values are present

@chopkinsmade chopkinsmade force-pushed the feature/include-more-scan-results branch 2 times, most recently from 06447a6 to 7e405ba Compare December 24, 2025 09:23
@chopkinsmade chopkinsmade marked this pull request as ready for review December 24, 2025 09:33
@chopkinsmade chopkinsmade requested a review from a team as a code owner December 24, 2025 09:33
Signed-off-by: DBT pre-commit check
@chopkinsmade chopkinsmade force-pushed the feature/include-more-scan-results branch from 7e405ba to f17a6ef Compare December 24, 2025 14:06