Configuration Options
This document describes the main configuration options available in the BookReconciler service.
These settings control the core behavior of the reconciliation service and can be configured via the web interface on the Home tab.
The reconciliation mode controls how the service matches your data against bibliographic databases.
- Config Key: POST45_RECONCILIATION_MODE = 'single'
- Description: Matches to the best single Manifestation it can find
- Best for: When you want a single specific edition/manifestation
- Input Requirements:
  - Title (required) - The book title to search
  - Author (recommended) - Author name for better matching
  - Publication Year (optional) - If provided, will try to match the desired edition
- Config Key: POST45_RECONCILIATION_MODE = 'cluster'
- Description: Builds a cluster of matching Manifestations which represent the work as a whole
- Best for:
  - Works that have many editions
  - Gathering as many identifiers as possible (ISBNs, OCLC numbers, etc.)
- Input Requirements:
  - Title (required) - The book title to search
  - Author (recommended) - Author name for better matching
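The difference between the two modes can be sketched in a few lines of Python. This is an illustration only, not the service's code: the `Candidate` type, the `pick_single` and `build_cluster` helpers, the `min_score` parameter, and the use of publication year as a tie-breaker are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    """Hypothetical scored search result (not a BookReconciler type)."""
    title: str
    year: Optional[int]
    score: int  # fuzzy match score, 0-100
    isbns: List[str] = field(default_factory=list)


def pick_single(candidates: List[Candidate],
                wanted_year: Optional[int] = None) -> Optional[Candidate]:
    """'single' mode idea: keep only the best candidate.

    Assumption: a matching publication year breaks ties between
    equally scored candidates.
    """
    def rank(c: Candidate) -> Tuple[int, bool]:
        year_match = wanted_year is not None and c.year == wanted_year
        return (c.score, year_match)

    return max(candidates, key=rank, default=None)


def build_cluster(candidates: List[Candidate],
                  min_score: int = 80) -> Tuple[List[Candidate], List[str]]:
    """'cluster' mode idea: keep every candidate at or above the threshold
    and pool their identifiers."""
    kept = [c for c in candidates if c.score >= min_score]
    isbns = sorted({isbn for c in kept for isbn in c.isbns})
    return kept, isbns
```

With two equally scored editions from different years, `pick_single` returns one Manifestation while `build_cluster` keeps both and returns the union of their ISBNs.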
The data extend mode controls how multiple values are returned when extending data (adding ISBNs, LCCNs, OCLC numbers, etc.).
- Config Key: POST45_DATA_EXTEND_MODE = 'join'
- Description: Multiple values are returned in the same field, separated by a pipe (|) character
- Output Example: 123456789 | 987654321 | 192837465 | 564738291
- Best for: Keeping data in a single cell/field
- Config Key: POST45_DATA_EXTEND_MODE = 'row'
- Description: Each value is returned in a new row
- Output Example (one value per row):
  - 123456789
  - 987654321
  - 192837465
  - 564738291
- Best for: When you need each identifier on its own row for further processing
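As a rough illustration of the two output shapes, the sketch below formats the same list of identifiers both ways. The `extend_values` function and its `(record_id, value)` tuples are hypothetical names chosen for this example; only the `join` and `row` mode names and the ` | ` separator come from the settings described above.

```python
from typing import List, Tuple


def extend_values(record_id: str, values: List[str],
                  mode: str = "join") -> List[Tuple[str, str]]:
    """Return (record_id, cell_value) pairs for one record.

    'join' -> one pair holding all values separated by ' | '
    'row'  -> one pair per value
    """
    if mode == "join":
        return [(record_id, " | ".join(values))]
    if mode == "row":
        return [(record_id, value) for value in values]
    raise ValueError(f"unknown extend mode: {mode!r}")


ids = ["123456789", "987654321", "192837465", "564738291"]
joined = extend_values("book-1", ids, mode="join")
rows = extend_values("book-1", ids, mode="row")
```

Here `joined` holds a single cell, `123456789 | 987654321 | 192837465 | 564738291`, while `rows` holds four pairs, one per identifier.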
The remove-subtitle setting controls whether subtitles are stripped from title strings before matching.
- Config Key: POST45_REMOVE_SUBTITLE = false
- Description: Leave title strings as they are in the data
- Example: "The Great Gatsby: A Novel" stays as-is
- Config Key: POST45_REMOVE_SUBTITLE = true
- Description: Attempt to remove subtitle from any title strings (typically text after a colon or dash)
- Example: "The Great Gatsby: A Novel" becomes "The Great Gatsby"
- Best for: When subtitles are inconsistent between your data and the target databases
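One plausible way to strip a subtitle is to cut the title at the first colon or spaced dash. The sketch below assumes exactly that rule. The `remove_subtitle` helper and its regular expression are illustrative and may differ from what the service actually does.

```python
import re

# Split on a colon followed by whitespace, or on a dash surrounded by
# whitespace. Requiring the whitespace leaves titles such as "Catch-22"
# and "10:04" untouched.
_SUBTITLE_SPLIT = re.compile(r"\s*:\s+|\s+[-\u2013\u2014]\s+")


def remove_subtitle(title: str) -> str:
    """Return the part of the title before the first subtitle separator."""
    head = _SUBTITLE_SPLIT.split(title, maxsplit=1)[0].strip()
    # Fall back to the original string if nothing is left.
    return head or title
```

For example, `remove_subtitle("The Great Gatsby: A Novel")` returns `"The Great Gatsby"`, while `remove_subtitle("Catch-22")` returns the title unchanged.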
These settings set the minimum fuzzy match score a result must reach to be included in a cluster. Each data source (Library of Congress, Google Books, and OCLC/WorldCat) has its own independent quality score setting.
The service uses the token sort ratio from the fuzzywuzzy library to compare titles and authors. Higher quality scores require closer matches, resulting in fewer but more accurate results. Lower quality scores allow more variation, which can catch more editions but may include false positives.
| Quality Level | Fuzzy Score Threshold | Description |
|---|---|---|
| Very High | >= 95% | Only nearly exact matches |
| High | >= 90% | Very close matches only |
| Medium | >= 80% | Balanced precision and recall |
| Low | >= 60% | Allows moderate variation |
| Very Low | >= 30% | Catches distant matches (may include false positives) |
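To make the thresholds concrete, the sketch below approximates a token sort comparison with Python's standard library and checks a score against the table. It is a stand-in, not the service's code: the real comparison is fuzzywuzzy's `fuzz.token_sort_ratio`, whose scores can differ slightly from this `difflib`-based version, and the `token_sort_score`, `passes`, and `THRESHOLDS` names are invented for the example.

```python
import re
from difflib import SequenceMatcher

# Minimum scores taken from the quality level table.
THRESHOLDS = {"very high": 95, "high": 90, "medium": 80, "low": 60, "very low": 30}


def token_sort_score(a: str, b: str) -> int:
    """Approximate token sort ratio on a 0-100 scale.

    Lowercase both strings, drop punctuation, sort the words, then
    compare the re-joined strings. Simplification: only ASCII letters
    and digits are kept.
    """
    def normalize(text: str) -> str:
        tokens = re.sub(r"[^0-9a-z]+", " ", text.lower()).split()
        return " ".join(sorted(tokens))

    ratio = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return round(100 * ratio)


def passes(score: int, quality: str) -> bool:
    """True if the score meets the threshold for the given quality level."""
    return score >= THRESHOLDS[quality]
```

Because the words are sorted before comparison, `token_sort_score("Gatsby, The Great", "The Great Gatsby")` is 100 even though the word order differs. A score of 85 passes at `medium` but not at `high`.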
- Config Key: POST45_ID_CLUSTER_QUALITY_SCORE
- Options: very low, low, medium, high, very high
- Default: medium
- Description: Controls the minimum fuzzy match score required for Library of Congress results to be included in a cluster
- Config Key: POST45_GOOGLE_CLUSTER_QUALITY_SCORE
- Options: very low, low, medium, high, very high
- Default: medium
- Description: Controls the minimum fuzzy match score required for Google Books results to be included in a cluster
- Config Key: POST45_OCLC_CLUSTER_QUALITY_SCORE
- Options: very low, low, medium, high, very high
- Default: high
- Description: Controls the minimum fuzzy match score required for WorldCat results to be included in a cluster
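Because each source has its own setting, the same score can be accepted from one source and rejected from another. The sketch below shows this with the documented defaults. The `filter_results` function, the short source labels (`loc`, `google`, `oclc`), and the `(source, title, score)` tuples are assumptions made for illustration; only the config keys, defaults, and thresholds come from this document.

```python
from typing import Dict, List, Optional, Tuple

THRESHOLDS = {"very high": 95, "high": 90, "medium": 80, "low": 60, "very low": 30}

# Documented defaults for the three per-source settings.
DEFAULTS = {
    "POST45_ID_CLUSTER_QUALITY_SCORE": "medium",      # Library of Congress
    "POST45_GOOGLE_CLUSTER_QUALITY_SCORE": "medium",  # Google Books
    "POST45_OCLC_CLUSTER_QUALITY_SCORE": "high",      # OCLC/WorldCat
}

# Hypothetical short labels mapped to their config keys.
SOURCE_KEYS = {
    "loc": "POST45_ID_CLUSTER_QUALITY_SCORE",
    "google": "POST45_GOOGLE_CLUSTER_QUALITY_SCORE",
    "oclc": "POST45_OCLC_CLUSTER_QUALITY_SCORE",
}

Result = Tuple[str, str, int]  # (source, title, fuzzy score)


def filter_results(results: List[Result],
                   config: Optional[Dict[str, str]] = None) -> List[Result]:
    """Keep results whose score meets the threshold for their own source."""
    settings = {**DEFAULTS, **(config or {})}
    kept = []
    for source, title, score in results:
        level = settings[SOURCE_KEYS[source]]
        if score >= THRESHOLDS[level]:
            kept.append((source, title, score))
    return kept
```

With the defaults, a result scoring 85 is kept when it comes from Library of Congress or Google Books but dropped when it comes from WorldCat. Passing `{"POST45_OCLC_CLUSTER_QUALITY_SCORE": "medium"}` as `config` would keep it there too.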
- Increase quality score when:
  - You're getting false positives (wrong books in clusters)
  - Your input data has clean, accurate titles and authors
  - You need high confidence in matches
- Decrease quality score when:
  - You're missing known editions that should match
  - Your input data has variations in spelling or formatting
  - You want to cast a wider net and manually review results
Before you can use HathiTrust as a service, you need to build its local database. Under the HathiTrust Config, click the Build Database button. The service will then go through the process of building the 5GB local database.
You cannot use HathiTrust as a service until this step is complete.