Configuration Options

This document describes the main configuration options available in the BookReconciler service.

Main Config Options

These settings control the core behavior of the reconciliation service and can be configured via the web interface on the Home tab.

Title Matching Behavior

Controls how the service matches your data against bibliographic databases.

Single Match Mode

Config Key: POST45_RECONCILIATION_MODE = 'single'
Description: Matches each record to the single best Manifestation it can find
Best for: When you want a single specific edition/manifestation
Input Requirements:
- Title (required) - The book title to search
- Author (recommended) - Author name for better matching
- Publication Year (optional) - If provided, will try to match the desired edition

Cluster Match Mode

Config Key: POST45_RECONCILIATION_MODE = 'cluster'
Description: Builds a cluster of matching Manifestations which represent the work as a whole
Best for:
- Works that have many editions
- Gathering as many identifiers as possible (ISBNs, OCLC numbers, etc.)
Input Requirements:
- Title (required) - The book title to search
- Author (recommended) - Author name for better matching
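
As a rough illustration of the input requirements for the two modes, the sketch below checks a row before it is sent for matching. The function name build_query and the row keys title, author, and year are hypothetical and not part of BookReconciler; only the mode values 'single' and 'cluster' come from the settings above.

```python
def build_query(row, mode):
    """Return the fields used for matching in the given mode (illustrative only)."""
    if mode not in ("single", "cluster"):
        raise ValueError(f"unknown reconciliation mode: {mode!r}")

    title = (row.get("title") or "").strip()
    if not title:
        # Title is required in both modes.
        raise ValueError("a title is required")

    query = {"title": title}

    author = (row.get("author") or "").strip()
    if author:
        # Author is recommended in both modes.
        query["author"] = author

    year = row.get("year")
    if mode == "single" and year:
        # Publication year is only listed as an input for Single Match Mode.
        query["year"] = str(year)

    return query


row = {"title": "The Great Gatsby", "author": "F. Scott Fitzgerald", "year": 1925}
single_query = build_query(row, "single")
cluster_query = build_query(row, "cluster")
```

In this sketch the year is passed along only in Single Match Mode, since Cluster Match Mode aims to gather every edition rather than one specific edition.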

Extend Data Behavior

Controls how multiple values are returned when extending data (adding ISBNs, LCCNs, OCLC numbers, etc.).

Join Mode

Config Key: POST45_DATA_EXTEND_MODE = 'join'
Description: Multiple values are returned in the same field, separated by a pipe | character

Output Example:

123456789 | 987654321 | 192837465 | 564738291

Best for: Keeping data in a single cell/field

Row Mode

Config Key: POST45_DATA_EXTEND_MODE = 'row'
Description: Each value is returned in a new row

Output Example:

123456789
987654321
192837465
564738291

Best for: When you need each identifier on its own row for further processing
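
The difference between the two extend modes can be sketched in a few lines. This is a minimal illustration, not the service's own code: the function name extend_values is hypothetical, and the identifiers are the sample values from the Join Mode example.

```python
def extend_values(values, mode):
    """Lay out identifier strings for 'join' or 'row' mode (illustrative only)."""
    if mode == "join":
        # One cell holding every value, separated by " | ".
        return [" | ".join(values)]
    if mode == "row":
        # One cell per value; each cell would land on its own row.
        return list(values)
    raise ValueError(f"unknown extend mode: {mode!r}")


ids = ["123456789", "987654321", "192837465", "564738291"]
joined = extend_values(ids, "join")  # a single cell
rows = extend_values(ids, "row")     # four cells, one per row
```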

Remove Subtitle from Titles

Controls whether subtitles are stripped from title strings before matching.

Keep Subtitles

Config Key: POST45_REMOVE_SUBTITLE = false
Description: Leave title strings as they are in the data
Example: "The Great Gatsby: A Novel" stays as-is

Remove Subtitles

Config Key: POST45_REMOVE_SUBTITLE = true
Description: Attempts to remove the subtitle from each title string (typically the text after a colon or dash)
Example: "The Great Gatsby: A Novel" becomes "The Great Gatsby"
Best for: When subtitles are inconsistent between your data and the target databases
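
One plausible way to strip a subtitle is sketched below. The exact rules the service applies are not documented here, so the separator pattern (a colon, or a dash with whitespace on both sides) and the function name remove_subtitle are assumptions made for illustration.

```python
import re

# Assumed separators: a colon, or a hyphen / en dash / em dash surrounded by whitespace.
_SUBTITLE_SEPARATOR = re.compile(r"\s*:\s*|\s+[-\u2013\u2014]\s+")


def remove_subtitle(title):
    """Return the text before the first subtitle separator (illustrative only)."""
    main_title = _SUBTITLE_SEPARATOR.split(title, maxsplit=1)[0].strip()
    # Fall back to the original string if nothing is left before the separator.
    return main_title or title


short_title = remove_subtitle("The Great Gatsby: A Novel")
```

Requiring whitespace around a dash keeps hyphenated titles such as "Catch-22" intact in this sketch.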

Cluster Quality Score

These settings control how high a result's fuzzy match score must be for it to be included in a cluster. Each data source (Library of Congress, Google Books, and OCLC/WorldCat) has its own independent quality score setting.

The service uses the token sort ratio from the fuzzywuzzy library to compare titles and authors. A higher quality score requires closer matches, which yields fewer but more accurate results. A lower quality score allows more variation, which can catch more editions but may introduce false positives.
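
To show what a token sort comparison does, here is a standard-library stand-in. It is not fuzzywuzzy's implementation (the library exposes this as fuzz.token_sort_ratio) and its scores may differ slightly from the library's; it only demonstrates the idea of sorting the words before comparing, so that word order does not affect the score.

```python
import re
from difflib import SequenceMatcher


def token_sort_ratio(a, b):
    """Approximate a token sort ratio on a 0-100 scale (stdlib stand-in)."""
    def normalize(text):
        # Lowercase, keep only word characters, then sort the tokens alphabetically.
        tokens = re.findall(r"\w+", text.lower())
        return " ".join(sorted(tokens))

    matcher = SequenceMatcher(None, normalize(a), normalize(b))
    return round(100 * matcher.ratio())


# The same words in a different order compare as identical after sorting.
reordered_score = token_sort_ratio("Gatsby, The Great", "The Great Gatsby")
```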

Score Thresholds

Quality Level	Fuzzy Score Threshold	Description
Very High	>= 95%	Only nearly exact matches
High	>= 90%	Very close matches only
Medium	>= 80%	Balanced precision and recall
Low	>= 60%	Allows moderate variation
Very Low	>= 30%	Catches distant matches (may include false positives)
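
The table above amounts to a lookup from quality level to a minimum score. A minimal sketch, assuming scores on a 0-100 scale; the names QUALITY_THRESHOLDS and passes_quality are hypothetical and not taken from the service's code.

```python
# Minimum fuzzy score (0-100) for each quality level, copied from the table above.
QUALITY_THRESHOLDS = {
    "very high": 95,
    "high": 90,
    "medium": 80,
    "low": 60,
    "very low": 30,
}


def passes_quality(score, quality_level):
    """Return True if a fuzzy score meets the threshold for the given level."""
    return score >= QUALITY_THRESHOLDS[quality_level]


# A score of 85 is accepted at "medium" but rejected at "high".
accepted_at_medium = passes_quality(85, "medium")
accepted_at_high = passes_quality(85, "high")
```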

Library of Congress (id.loc.gov)

Config Key: POST45_ID_CLUSTER_QUALITY_SCORE
Options: very low, low, medium, high, very high
Default: medium
Description: Controls the minimum fuzzy match score required for Library of Congress results to be included in a cluster

Google Books

Config Key: POST45_GOOGLE_CLUSTER_QUALITY_SCORE
Options: very low, low, medium, high, very high
Default: medium
Description: Controls the minimum fuzzy match score required for Google Books results to be included in a cluster

OCLC/WorldCat

Config Key: POST45_OCLC_CLUSTER_QUALITY_SCORE
Options: very low, low, medium, high, very high
Default: high
Description: Controls the minimum fuzzy match score required for WorldCat results to be included in a cluster
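
The three per-source settings and their defaults can be summarized as follows. This is a sketch only: how BookReconciler stores and validates these values is not described here, and the helper resolve_quality is hypothetical. The config keys, options, and defaults are the ones listed above.

```python
QUALITY_OPTIONS = ("very low", "low", "medium", "high", "very high")

DEFAULT_QUALITY = {
    "POST45_ID_CLUSTER_QUALITY_SCORE": "medium",      # Library of Congress
    "POST45_GOOGLE_CLUSTER_QUALITY_SCORE": "medium",  # Google Books
    "POST45_OCLC_CLUSTER_QUALITY_SCORE": "high",      # OCLC/WorldCat
}


def resolve_quality(overrides):
    """Merge user overrides into the defaults, rejecting unknown keys or levels."""
    settings = dict(DEFAULT_QUALITY)
    for key, value in overrides.items():
        if key not in DEFAULT_QUALITY:
            raise KeyError(f"unknown config key: {key}")
        if value not in QUALITY_OPTIONS:
            raise ValueError(f"invalid quality level: {value!r}")
        settings[key] = value
    return settings


# Loosen only the WorldCat threshold; the other sources keep their defaults.
settings = resolve_quality({"POST45_OCLC_CLUSTER_QUALITY_SCORE": "medium"})
```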

When to Adjust Quality Scores

Increase quality score when:
- You're getting false positives (wrong books in clusters)
- Your input data has clean, accurate titles and authors
- You need high confidence in matches
Decrease quality score when:
- You're missing known editions that should match
- Your input data has variations in spelling or formatting
- You want to cast a wider net and manually review results

HathiTrust Database

Before you can use HathiTrust as a service, you need to build its local database. In the HathiTrust Config section, click the Build Database button. The service will then build the 5GB local database.

You cannot use HathiTrust as a service until this step is complete.
