Geocoder

Geocoding service consisting of a Photon backend search engine and a Proxy frontend.

Deployment

All deployment runs from the main branch.

Proxy

Automatic — Push to main → builds and deploys to dev → tst → prd, with acceptance tests after each. Tags prod-approved after successful prod deploy.

Manual: proxy.yml also supports manual dispatch (target: dev only | dev → tst → prd | tst → prd).

Photon

Scheduled — Daily at 07:27 UTC: full data import + build + deploy to tst → prd. Checks out the prod-approved tag (updated by Proxy CI after successful prod deploy) to avoid using untested commits.

Manual:

  • photon.yml — Import data, build Photon image, deploy (target: dev only | dev → tst → prd | tst → prd)
  • photon-rebuild.yml — Rebuild Photon image from existing Nominatim data, deploy
  • photon-deploy.yml — Deploy an existing Photon image tag

Sweden (dev only)

Photon data artifacts (GCS)

Built artifacts live in the public bucket gs://ent-geocoder-prd/:

| Prefix | Contents |
|---|---|
| nominatim-data/ | nominatim.ndjson.gz per build (+ .sha256) |
| nominatim-data-se/ | Sweden variant |
| photon-data/ | photon_data.tar.gz per build (+ .sha256) |
| photon-data-se/ | Sweden variant |
| data-sources/ | Daily-refreshed third-party source files |

Each build writes to <prefix>/<tag>/<filename>. The <tag> is generated once and shared between the docker image and the GCS upload, so geocoder-photon:<tag> always pairs with gs://.../photon-data/<tag>/photon_data.tar.gz. Pointer files at the prefix root track recent builds:

  • latest.txt — most recent build from any branch
  • latest-prod.txt — most recent build deployed to prod (written by photon-scheduled.yml)
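
The pairing between a tag and its artifacts can be sketched as a small shell helper. This is an illustration only, not code from the repository: the function name photon_artifact_url and the tag example-tag are hypothetical, the public https://storage.googleapis.com/ form of the bucket URL is assumed, and the Sweden variants are assumed to use the same filenames.

```shell
# Hypothetical helper: build the public URL of a build artifact from its tag.
# Layout described in this README: <prefix>/<tag>/<filename>
BUCKET_URL="https://storage.googleapis.com/ent-geocoder-prd"

photon_artifact_url() {
  # $1 = prefix (e.g. photon-data), $2 = build tag
  case "$1" in
    photon-data|photon-data-se)       filename="photon_data.tar.gz" ;;
    nominatim-data|nominatim-data-se) filename="nominatim.ndjson.gz" ;;
    *) echo "unknown prefix: $1" >&2; return 1 ;;
  esac
  printf '%s/%s/%s/%s\n' "$BUCKET_URL" "$1" "$2" "$filename"
}

# Example with a made-up tag
photon_artifact_url photon-data example-tag
```

Appending .sha256 to the printed URL gives the assumed location of the checksum sidecar.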

The photon container fetches photon_data.tar.gz from $PHOTON_DATA_URL on startup, verifies its .sha256 sidecar, and writes a photon_data/.ready sentinel after extraction so in-place container restarts skip the download. CI derives the URL from the image tag in _deploy-and-test.yml and injects it into helm values; templates/photon-data-validation.yaml fails the helm render if it's missing.
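
The startup behaviour described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the container's actual entrypoint: the function names are hypothetical, the sidecar is assumed to live at "$PHOTON_DATA_URL.sha256" and to start with the hex digest, and the tarball is assumed to unpack to a top-level photon_data/ directory.

```shell
# Sketch only; the real entrypoint may differ (see assumptions above).

verify_sha256() {
  # $1 = file, $2 = sidecar whose first field is the expected hex digest
  expected=$(awk '{print $1; exit}' "$2")
  actual=$(sha256sum "$1" | awk '{print $1}')
  [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

fetch_photon_data() {
  # $1 = data URL, $2 = target directory (default: current directory)
  url="$1"; dest="${2:-.}"
  if [ -f "$dest/photon_data/.ready" ]; then
    echo "photon_data is ready, skipping download"
    return 0
  fi
  tmp=$(mktemp -d) || return 1
  curl -fsSL -o "$tmp/data.tar.gz" "$url" &&
    curl -fsSL -o "$tmp/data.tar.gz.sha256" "$url.sha256" &&
    verify_sha256 "$tmp/data.tar.gz" "$tmp/data.tar.gz.sha256" || {
      echo "download or checksum verification failed for $url" >&2
      rm -rf "$tmp"
      return 1
    }
  rm -rf "$dest/photon_data"        # discard any partial extraction
  tar -xzf "$tmp/data.tar.gz" -C "$dest" || return 1
  touch "$dest/photon_data/.ready"  # written last, so it implies a complete index
  rm -rf "$tmp"
}

# Usage: fetch_photon_data "$PHOTON_DATA_URL" /path/to/workdir
```

Writing the sentinel only after extraction succeeds means an interrupted download or unpack is retried on the next start instead of being mistaken for a usable index.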

Rolling back to a previous build:

# See available pointers and recent tags
curl -s https://storage.googleapis.com/ent-geocoder-prd/photon-data/latest-prod.txt

# Re-deploy a known-good image (the data is paired automatically)
gh workflow run photon-deploy.yml -f target='tst → prd' -f image_tag=<previous-tag>

90-day lifecycle rule (apply once per bucket):

{
  "lifecycle": {
    "rule": [
      {
        "action": {"type": "Delete"},
        "condition": {
          "age": 90,
          "matchesPrefix": ["nominatim-data", "photon-data"],
          "matchesSuffix": [".gz", ".sha256"]
        }
      }
    ]
  }
}

The matchesSuffix filter spares the latest*.txt pointer files.
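
To see which object names the rule would touch, its name conditions can be mirrored locally. The helper below is hypothetical and ignores the age condition; the object names in the loop are made-up examples that follow the layout described earlier.

```shell
# Hypothetical local check: does an object name satisfy both
# matchesPrefix and matchesSuffix of the lifecycle rule?
lifecycle_rule_matches() {
  # $1 = object name inside the bucket
  case "$1" in
    nominatim-data*|photon-data*) ;;   # matchesPrefix
    *) return 1 ;;
  esac
  case "$1" in
    *.gz|*.sha256) return 0 ;;         # matchesSuffix
    *) return 1 ;;
  esac
}

for name in \
  photon-data/example-tag/photon_data.tar.gz \
  photon-data-se/example-tag/photon_data.tar.gz.sha256 \
  photon-data/latest-prod.txt \
  data-sources/example.gz
do
  if lifecycle_rule_matches "$name"; then
    echo "deleted after 90 days: $name"
  else
    echo "kept: $name"
  fi
done
```

Because the prefixes are matched as plain string prefixes, the -se variants are covered as well, while data-sources/ and the pointer files fall outside the rule.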

Usage

Running locally

# Build geocoder
./gradlew build

# Download a photon jar
cd photon
./import/download-photon-jar.sh

# EITHER download source data, convert to nominatim.ndjson (downloads nominatim-converter binary automatically)
./import/create-nominatim-data.sh import/config/sources-prod.conf -z

# OR download the latest nominatim.ndjson build by GitHub Actions
./download-latest-nominatim-data.sh

# Create the photon index
./import/create-photon-data.sh nominatim.ndjson.gz

# OR just download the latest Photon search index built by GitHub Actions
rm -rf photon_data
./download-latest-photon-data.sh

# Run Photon
./photon-start.sh

# Switch to a different terminal and start the proxy (or just run `no.entur.geocoder.proxy.AppKt` from your IDE)
cd ../proxy
java -jar build/libs/proxy-all.jar

Now try some example requests:

curl -s 'http://localhost:8080/v2/autocomplete?text=sk%C3%B8yen%20stasjon&size=20'
curl -s 'http://localhost:8080/v2/reverse?point.lat=59.92&point.lon=10.67&boundary.circle.radius=1&size=10&layers=address%2Clocality'

Adding &debug=true will also reveal native Photon results with importance (input weight) and score (calculated weight).

You can also access Photon directly:

curl -s 'http://localhost:2322/api?q=Berglyveien&include=layer.stopplace'

Or query the OpenSearch endpoint:

curl -s 'http://localhost:9201/photon/_mapping' | jq .       # Available fields
curl -s 'http://localhost:9201/photon/_doc/719158973' | jq . # Get document by ID

Debugging data in k8s / GKE

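Querying OpenSearch directly in k8s:

# Forward the pod's OpenSearch port (run in a separate terminal and substitute a current pod name)
kubectl --context dev port-forward geocoder-photon-85994c94dd-6lqhv -n geocoder 9201
# Look up the ID of the first hit through the dev Photon API
curl -s 'https://geocoder-photon.dev.entur.io/api?q=ullerud' | jq '.features[].properties.osm_id' | head -1
200127208213
# Fetch that document from OpenSearch through the forwarded port
curl -s 'http://localhost:9201/photon/_doc/200127208213' | jq -c "[._source.importance, ._source.name.default]"
[0.23010299956639815,"Ullerud terrasse"]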
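
Pod names change with every rollout, so the name in the port-forward command will usually be stale. One possible workaround, sketched here on the assumption that the pod is managed by a Deployment (pod names then follow <deployment>-<replicaset-hash>-<suffix>), is to forward to the Deployment instead. The helper name deployment_from_pod is hypothetical.

```shell
# Hypothetical helper: strip the ReplicaSet hash and pod suffix from a pod name.
deployment_from_pod() {
  printf '%s\n' "${1%-*-*}"
}

deployment_from_pod geocoder-photon-85994c94dd-6lqhv   # prints geocoder-photon

# The port-forward can then target the Deployment rather than a specific pod:
# kubectl --context dev -n geocoder port-forward deployment/geocoder-photon 9201
```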

Verifying score and importance

We set the importance field in the Nominatim data, while score is calculated by Photon.

$ curl -s 'http://localhost:8080/v2/autocomplete?text=Oslo&debug=true&size=1' \
  | jq -c '.geocoding.debug.raw_data[] | [.localeTags.name.default, .infos.importance, .score]'
["Oslo",1.0,51.235104]
["Oslo lufthavn",0.347712,26.492702]
["Oslo S",0.330103,25.840235]
["Oslo bussterminal",0.330103,24.307642]

(The debug output contains three more results than requested; see PhotonAutocompleteRequest.RESULT_PRUNING_HEADROOM.)
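
A quick sanity check on such output is to confirm that the scores never increase from one result to the next. The sketch below assumes results are expected in descending score order, as in the sample above; the function name scores_descending is hypothetical.

```shell
# Hypothetical check: succeed if the numbers on stdin (one per line) never increase.
scores_descending() {
  awk 'NR > 1 && $1 + 0 > prev { bad = 1 } { prev = $1 + 0 } END { exit bad }'
}

# Scores copied from the sample output above
printf '%s\n' 51.235104 26.492702 25.840235 24.307642 | scores_descending &&
  echo "scores are in descending order"

# Against a running proxy, the same check could be fed from the debug output:
# curl -s 'http://localhost:8080/v2/autocomplete?text=Oslo&debug=true&size=1' \
#   | jq '.geocoding.debug.raw_data[].score' | scores_descending
```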

Using a patched Photon version

Build and release patched Photon

  • Fetch Photon from source (https://github.com/komoot/photon) and make your changes
  • Build with ./gradlew build
  • Create a tag and push it (git push --tags entur) to Entur's fork (https://github.com/entur/photon)
  • Draft a new release at https://github.com/entur/photon/releases/new
  • Click "Select tag" and choose the tag name
  • Fill in release title and description
  • Add photon-<tag>.jar from Photon's target/ folder as a binary asset
  • Check "Set as a pre-release"
  • Publish the release
  • On the release page, right-click the photon-<tag>.jar asset and copy the link address
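
The release steps above can also be scripted with the GitHub CLI. The following is a sketch rather than an established part of this workflow: the function name publish_patched_photon and the GH override variable are hypothetical, and it assumes gh is authenticated with write access to entur/photon and that the tag has already been pushed.

```shell
# Sketch: CLI equivalent of drafting and publishing the pre-release.
# Set GH=echo to print the command instead of running it.
publish_patched_photon() {
  # $1 = tag, $2 = release title, $3 = release description
  "${GH:-gh}" release create "$1" "target/photon-$1.jar" \
    --repo entur/photon \
    --prerelease \
    --title "$2" \
    --notes "$3"
}

# Usage, from the Photon checkout after building:
# publish_patched_photon <tag> "<title>" "<description>"
```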

Update geocoder to use the patched Photon

Links

Grafana dashboards

Internal references

External references