
Configuration Options

Matt Miller edited this page Dec 17, 2025 · 3 revisions


This document describes the main configuration options available in the BookReconciler service.

Main Config Options


These settings control the core behavior of the reconciliation service and can be configured via the web interface on the Home tab.

Title Matching Behavior

Controls how the service matches your data against bibliographic databases.

Single Match Mode

  • Config Key: POST45_RECONCILIATION_MODE = 'single'
  • Description: Matches to the best single Manifestation it can find
  • Best for: When you want a single specific edition/manifestation
  • Input Requirements:
    • Title (required) - The book title to search
    • Author (recommended) - Author name for better matching
    • Publication Year (optional) - If provided, the service will try to match the desired edition

Cluster Match Mode

  • Config Key: POST45_RECONCILIATION_MODE = 'cluster'
  • Description: Builds a cluster of matching Manifestations that together represent the work as a whole
  • Best for:
    • Works that have many editions
    • Gathering as many identifiers as possible (ISBNs, OCLC numbers, etc.)
  • Input Requirements:
    • Title (required) - The book title to search
    • Author (recommended) - Author name for better matching
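The practical difference between the two modes can be sketched in Python. This is a hypothetical illustration only: the Candidate class, the helper names, and the placeholder identifiers and scores are made up, and the service's real candidate retrieval, scoring, and publication-year handling are not shown.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # Hypothetical record for one Manifestation returned by a data source
    title: str
    identifier: str
    score: int  # fuzzy match score, 0-100


def match_single(candidates):
    """'single' mode sketch: return the one best-scoring Manifestation, or None."""
    return max(candidates, key=lambda c: c.score, default=None)


def match_cluster(candidates, threshold):
    """'cluster' mode sketch: keep every Manifestation that clears the threshold."""
    return [c for c in candidates if c.score >= threshold]


# Placeholder candidates, not real search results
candidates = [
    Candidate("The Great Gatsby", "id-A", 100),
    Candidate("The great Gatsby", "id-B", 96),
    Candidate("Gatsby's Girl", "id-C", 55),
]

print(match_single(candidates).identifier)
print([c.identifier for c in match_cluster(candidates, 80)])
```

Single mode yields one identifier per input row, while cluster mode returns every candidate above the cutoff, which is why it gathers more ISBNs and OCLC numbers for works with many editions.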

Extend Data Behavior

Controls how multiple values are returned when extending data (adding ISBNs, LCCNs, OCLC numbers, etc.).

Join Mode

  • Config Key: POST45_DATA_EXTEND_MODE = 'join'
  • Description: Multiple values are returned in the same field, separated by a pipe | character
  • Output Example:
    123456789 | 987654321 | 192837465 | 564738291
    
  • Best for: Keeping data in a single cell/field

Row Mode

  • Config Key: POST45_DATA_EXTEND_MODE = 'row'
  • Description: Each value is returned in a new row
  • Output Example:
    123456789
    987654321
    192837465
    564738291
    
  • Best for: When you need each identifier on its own row for further processing
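The two extend modes can be sketched as simple formatting functions. This is an assumption-based illustration, not the service's code: the function names are hypothetical, and the " | " separator (pipe with surrounding spaces) is taken from the Join Mode output example above. The split_joined helper shows how a join-mode cell could be turned back into individual values if you later need them on separate rows.

```python
ids = ["123456789", "987654321", "192837465", "564738291"]


def extend_join(values, sep=" | "):
    """'join' mode sketch: every value packed into a single cell."""
    return sep.join(values)


def extend_row(values):
    """'row' mode sketch: each value gets its own row (one cell per row)."""
    return [[value] for value in values]


def split_joined(cell):
    """Recover the individual values from a 'join'-mode cell."""
    return [part.strip() for part in cell.split("|") if part.strip()]


print(extend_join(ids))
for row in extend_row(ids):
    print(row[0])
```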

Remove Subtitle from Titles

Controls whether subtitles are stripped from title strings before matching.

Keep Subtitles

  • Config Key: POST45_REMOVE_SUBTITLE = false
  • Description: Leave title strings as they are in the data
  • Example: "The Great Gatsby: A Novel" stays as-is

Remove Subtitles

  • Config Key: POST45_REMOVE_SUBTITLE = true
  • Description: Attempt to remove the subtitle from each title string (typically text after a colon or dash)
  • Example: "The Great Gatsby: A Novel" becomes "The Great Gatsby"
  • Best for: When subtitles are inconsistent between your data and the target databases
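A minimal sketch of subtitle removal, assuming the subtitle begins at the first colon or at a dash surrounded by spaces. The exact rule the service applies is not documented here, so the remove_subtitle name and the regular expression are hypothetical. Requiring spaces around the dash keeps hyphenated words intact.

```python
import re

# Hypothetical rule: the subtitle starts at the first colon, or at a hyphen,
# en dash, or em dash with whitespace on both sides.
_SUBTITLE_SEP = re.compile(r"\s*(?::|\s[-\u2013\u2014]\s)")


def remove_subtitle(title: str) -> str:
    """Return the title with any trailing subtitle stripped."""
    main = _SUBTITLE_SEP.split(title, maxsplit=1)[0].strip()
    # If stripping would leave nothing (e.g. a leading colon), keep the original
    return main or title.strip()


print(remove_subtitle("The Great Gatsby: A Novel"))
print(remove_subtitle("Dune - Deluxe Edition"))
```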

Cluster Quality Score


These settings control how close a fuzzy match must be for a result to be included in a cluster. Each data source (Library of Congress, Google Books, and OCLC/WorldCat) has its own independent quality score setting.

The service uses the token sort ratio from the fuzzywuzzy library to compare titles and authors. Higher quality scores require closer matches, resulting in fewer but more accurate results. Lower quality scores allow more variation, which can catch more editions but may include false positives.
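To see why token sort ratio tolerates reordered names and titles, here is a standard-library approximation. In fuzzywuzzy itself the scorer is fuzz.token_sort_ratio; the sketch below is not that function. It mimics the idea (normalize, sort the words, then compare the strings) using difflib, so its scores may differ slightly from fuzzywuzzy's, especially when python-Levenshtein is installed or the text contains non-ASCII characters.

```python
import re
from difflib import SequenceMatcher


def _normalize(text: str) -> str:
    # Replace punctuation with spaces and lowercase, roughly like fuzzywuzzy's preprocessing
    return re.sub(r"\W", " ", text).lower().strip()


def token_sort_ratio(a: str, b: str) -> int:
    """Approximate token sort ratio: compare the two strings with their words sorted."""
    sorted_a = " ".join(sorted(_normalize(a).split()))
    sorted_b = " ".join(sorted(_normalize(b).split()))
    return round(100 * SequenceMatcher(None, sorted_a, sorted_b).ratio())


print(token_sort_ratio("Fitzgerald, F. Scott", "F. Scott Fitzgerald"))
print(token_sort_ratio("The Great Gatsby", "Great Gatsby, The"))
print(token_sort_ratio("The Great Gatsby", "Tender Is the Night"))
```

Because word order is discarded before comparison, "Fitzgerald, F. Scott" and "F. Scott Fitzgerald" score as an exact match, while genuinely different titles fall well below the medium threshold.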

Score Thresholds

| Quality Level | Fuzzy Score Threshold | Description |
|---|---|---|
| Very High | >= 95% | Only nearly exact matches |
| High | >= 90% | Very close matches only |
| Medium | >= 80% | Balanced precision and recall |
| Low | >= 60% | Allows moderate variation |
| Very Low | >= 30% | Catches distant matches (may include false positives) |

Library of Congress (id.loc.gov)

  • Config Key: POST45_ID_CLUSTER_QUALITY_SCORE
  • Options: very low, low, medium, high, very high
  • Default: medium
  • Description: Controls the minimum fuzzy match score required for Library of Congress results to be included in a cluster

Google Books

  • Config Key: POST45_GOOGLE_CLUSTER_QUALITY_SCORE
  • Options: very low, low, medium, high, very high
  • Default: medium
  • Description: Controls the minimum fuzzy match score required for Google Books results to be included in a cluster

OCLC/WorldCat

  • Config Key: POST45_OCLC_CLUSTER_QUALITY_SCORE
  • Options: very low, low, medium, high, very high
  • Default: high
  • Description: Controls the minimum fuzzy match score required for WorldCat results to be included in a cluster
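How the per-source settings combine with the Score Thresholds table can be sketched as a lookup. The config keys, level names, thresholds, and defaults come from this page; the config dictionary and the threshold_for and keep_in_cluster helpers are hypothetical and only illustrate that each source is filtered against its own threshold.

```python
# Thresholds from the Score Thresholds table (fuzzy scores are 0-100)
QUALITY_THRESHOLDS = {
    "very low": 30,
    "low": 60,
    "medium": 80,
    "high": 90,
    "very high": 95,
}

# Documented default quality level for each source's config key
DEFAULT_QUALITY = {
    "POST45_ID_CLUSTER_QUALITY_SCORE": "medium",
    "POST45_GOOGLE_CLUSTER_QUALITY_SCORE": "medium",
    "POST45_OCLC_CLUSTER_QUALITY_SCORE": "high",
}


def threshold_for(config_key, config=None):
    """Hypothetical helper: resolve a source's quality level to a numeric cutoff."""
    level = (config or {}).get(config_key, DEFAULT_QUALITY[config_key])
    return QUALITY_THRESHOLDS[level.strip().lower()]


def keep_in_cluster(score, config_key, config=None):
    """A result joins the cluster only if it meets its own source's threshold."""
    return score >= threshold_for(config_key, config)


# With the defaults, a score of 85 passes for Library of Congress but not for WorldCat
print(keep_in_cluster(85, "POST45_ID_CLUSTER_QUALITY_SCORE"))
print(keep_in_cluster(85, "POST45_OCLC_CLUSTER_QUALITY_SCORE"))
```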

When to Adjust Quality Scores

  • Increase quality score when:

    • You're getting false positives (wrong books in clusters)
    • Your input data has clean, accurate titles and authors
    • You need high confidence in matches
  • Decrease quality score when:

    • You're missing known editions that should match
    • Your input data has variations in spelling or formatting
    • You want to cast a wider net and manually review results

HathiTrust Database

Before you can use HathiTrust as a service, you need to build its local database. In the HathiTrust Config section, click the Build Database button. The service then goes through the process of building the local database, which is about 5GB.

You cannot use HathiTrust as a service until this step is complete.
