Commit 7dc4781

Niels Ekkelenkamp and claude committed
Add untranslated entry detection, auto version from tags, MIT license
- Untranslated check flags missing translations (skips source language)
- Source language auto-detected: exactly 1 locale with all entries untranslated
- hatch-vcs derives version from git tags (no manual bumping)
- Switch license from Apache 2.0 to MIT
- --check-untranslated / --no-check-untranslated CLI flag

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
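The source-language rule in the commit message (exactly one locale whose entries are all untranslated) can be illustrated with a short sketch. It is not po-lint's implementation: the helper name `detect_source_locale` and the input shape, a mapping from locale code to `(msgid, msgstr)` pairs, are assumptions made for this example.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence


def detect_source_locale(
    catalogs: Mapping[str, Sequence[tuple[str, str]]],
) -> str | None:
    """Hypothetical helper: return the one locale whose entries are all
    untranslated, or None if zero or several locales qualify."""
    candidates = [
        locale
        for locale, entries in catalogs.items()
        # Assumption: an empty catalog is not a candidate; otherwise every
        # msgstr must be the empty string.
        if entries and all(msgstr == "" for _msgid, msgstr in entries)
    ]
    # "Exactly 1 locale" rule: any other count is treated as undetectable.
    return candidates[0] if len(candidates) == 1 else None


catalogs = {
    "en": [("Hello", ""), ("Goodbye", "")],
    "nl": [("Hello", "Hallo"), ("Goodbye", "")],
    "fr": [("Hello", "Bonjour"), ("Goodbye", "Au revoir")],
}
print(detect_source_locale(catalogs))  # en
```

Returning `None` when zero or several locales qualify mirrors the "exactly 1" condition; in that case the source language would have to be set explicitly instead.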
1 parent e62bbc1 commit 7dc4781

9 files changed

Lines changed: 131 additions & 208 deletions


.github/workflows/ci.yml

Lines changed: 4 additions & 0 deletions
@@ -15,6 +15,8 @@ jobs:
         python-version: ["3.10", "3.11", "3.12"]
     steps:
       - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
       - uses: astral-sh/setup-uv@v4
         with:
           version: "latest"
@@ -37,6 +39,8 @@ jobs:
       id-token: write
     steps:
       - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
       - uses: astral-sh/setup-uv@v4
         with:
           version: "latest"
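The `fetch-depth: 0` lines go with the switch to hatch-vcs: `actions/checkout` fetches a single commit without tags by default, so a build in that checkout would not see the release tags the version is derived from. One way to confirm which kind of clone a job has is `git rev-parse --is-shallow-repository`; the Python wrapper below, `is_shallow_clone`, is a hypothetical helper for illustration and is not part of this repository's CI.

```python
from __future__ import annotations

import subprocess
import tempfile


def is_shallow_clone(repo_dir: str) -> bool | None:
    """Hypothetical helper: True for a shallow clone, False for a full one,
    None if git is missing or repo_dir is not inside a git repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--is-shallow-repository"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:  # no git executable on PATH
        return None
    if result.returncode != 0:  # e.g. "fatal: not a git repository"
        return None
    return result.stdout.strip() == "true"


# A fresh temporary directory is normally not a repository, so this usually
# prints None; run it inside a CI checkout to see True or False.
print(is_shallow_clone(tempfile.mkdtemp()))
```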

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -8,3 +8,4 @@ build/
 .pytest_cache/
 *.bin
 *.ftz
+src/po_lint/_version.py

LICENSE

Lines changed: 21 additions & 201 deletions
@@ -1,201 +1,21 @@
-Apache License
-Version 2.0, January 2004
-http://www.apache.org/licenses/
-
-TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
-
-1. Definitions.
-
-"License" shall mean the terms and conditions for use, reproduction,
-and distribution as defined by Sections 1 through 9 of this document.
-
-"Licensor" shall mean the copyright owner or entity authorized by
-the copyright owner that is granting the License.
-
-"Legal Entity" shall mean the union of the acting entity and all
-other entities that control, are controlled by, or are under common
-control with that entity. For the purposes of this definition,
-"control" means (i) the power, direct or indirect, to cause the
-direction or management of such entity, whether by contract or
-otherwise, or (ii) ownership of fifty percent (50%) or more of the
-outstanding shares, or (iii) beneficial ownership of such entity.
-
-"You" (or "Your") shall mean an individual or Legal Entity
-exercising permissions granted by this License.
-
-"Source" form shall mean the preferred form for making modifications,
-including but not limited to software source code, documentation
-source, and configuration files.
-
-"Object" form shall mean any form resulting from mechanical
-transformation or translation of a Source form, including but
-not limited to compiled object code, generated documentation,
-and conversions to other media types.
-
-"Work" shall mean the work of authorship, whether in Source or
-Object form, made available under the License, as indicated by a
-copyright notice that is included in or attached to the work
-(an example is provided in the Appendix below).
-
-"Derivative Works" shall mean any work, whether in Source or Object
-form, that is based on (or derived from) the Work and for which the
-editorial revisions, annotations, elaborations, or other modifications
-represent, as a whole, an original work of authorship. For the purposes
-of this License, Derivative Works shall not include works that remain
-separable from, or merely link (or bind by name) to the interfaces of,
-the Work and Derivative Works thereof.
-
-"Contribution" shall mean any work of authorship, including
-the original version of the Work and any modifications or additions
-to that Work or Derivative Works thereof, that is intentionally
-submitted to Licensor for inclusion in the Work by the copyright owner
-or by an individual or Legal Entity authorized to submit on behalf of
-the copyright owner. For the purposes of this definition, "submitted"
-means any form of electronic, verbal, or written communication sent
-to the Licensor or its representatives, including but not limited to
-communication on electronic mailing lists, source code control systems,
-and issue tracking systems that are managed by, or on behalf of, the
-Licensor for the purpose of discussing and improving the Work, but
-excluding communication that is conspicuously marked or otherwise
-designated in writing by the copyright owner as "Not a Contribution."
-
-"Contributor" shall mean Licensor and any individual or Legal Entity
-on behalf of whom a Contribution has been received by Licensor and
-subsequently incorporated within the Work.
-
-2. Grant of Copyright License. Subject to the terms and conditions of
-this License, each Contributor hereby grants to You a perpetual,
-worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-copyright license to reproduce, prepare Derivative Works of,
-publicly display, publicly perform, sublicense, and distribute the
-Work and such Derivative Works in Source or Object form.
-
-3. Grant of Patent License. Subject to the terms and conditions of
-this License, each Contributor hereby grants to You a perpetual,
-worldwide, non-exclusive, no-charge, royalty-free, irrevocable
-(except as stated in this section) patent license to make, have made,
-use, offer to sell, sell, import, and otherwise transfer the Work,
-where such license applies only to those patent claims licensable
-by such Contributor that are necessarily infringed by their
-Contribution(s) alone or by combination of their Contribution(s)
-with the Work to which such Contribution(s) was submitted. If You
-institute patent litigation against any entity (including a
-cross-claim or counterclaim in a lawsuit) alleging that the Work
-or a Contribution incorporated within the Work constitutes direct
-or contributory patent infringement, then any patent licenses
-granted to You under this License for that Work shall terminate
-as of the date such litigation is filed.
-
-4. Redistribution. You may reproduce and distribute copies of the
-Work or Derivative Works thereof in any medium, with or without
-modifications, and in Source or Object form, provided that You
-meet the following conditions:
-
-(a) You must give any other recipients of the Work or
-Derivative Works a copy of this License; and
-
-(b) You must cause any modified files to carry prominent notices
-stating that You changed the files; and
-
-(c) You must retain, in the Source form of any Derivative Works
-that You distribute, all copyright, patent, trademark, and
-attribution notices from the Source form of the Work,
-excluding those notices that do not pertain to any part of
-the Derivative Works; and
-
-(d) If the Work includes a "NOTICE" text file as part of its
-distribution, then any Derivative Works that You distribute must
-include a readable copy of the attribution notices contained
-within such NOTICE file, excluding those notices that do not
-pertain to any part of the Derivative Works, in at least one
-of the following places: within a NOTICE text file distributed
-as part of the Derivative Works; within the Source form or
-documentation, if provided along with the Derivative Works; or,
-within a display generated by the Derivative Works, if and
-wherever such third-party notices normally appear. The contents
-of the NOTICE file are for informational purposes only and
-do not modify the License. You may add Your own attribution
-notices within Derivative Works that You distribute, alongside
-or as an addendum to the NOTICE text from the Work, provided
-that such additional attribution notices cannot be construed
-as modifying the License.
-
-You may add Your own copyright statement to Your modifications and
-may provide additional or different license terms and conditions
-for use, reproduction, or distribution of Your modifications, or
-for any such Derivative Works as a whole, provided Your use,
-reproduction, and distribution of the Work otherwise complies with
-the conditions stated in this License.
-
-5. Submission of Contributions. Unless You explicitly state otherwise,
-any Contribution intentionally submitted for inclusion in the Work
-by You to the Licensor shall be under the terms and conditions of
-this License, without any additional terms or conditions.
-Notwithstanding the above, nothing herein shall supersede or modify
-the terms of any separate license agreement you may have executed
-with Licensor regarding such Contributions.
-
-6. Trademarks. This License does not grant permission to use the trade
-names, trademarks, service marks, or product names of the Licensor,
-except as required for reasonable and customary use in describing the
-origin of the Work and reproducing the content of the NOTICE file.
-
-7. Disclaimer of Warranty. Unless required by applicable law or
-agreed to in writing, Licensor provides the Work (and each
-Contributor provides its Contributions) on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
-implied, including, without limitation, any warranties or conditions
-of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
-PARTICULAR PURPOSE. You are solely responsible for determining the
-appropriateness of using or redistributing the Work and assume any
-risks associated with Your exercise of permissions under this License.
-
-8. Limitation of Liability. In no event and under no legal theory,
-whether in tort (including negligence), contract, or otherwise,
-unless required by applicable law (such as deliberate and grossly
-negligent acts) or agreed to in writing, shall any Contributor be
-liable to You for damages, including any direct, indirect, special,
-incidental, or consequential damages of any character arising as a
-result of this License or out of the use or inability to use the
-Work (including but not limited to damages for loss of goodwill,
-work stoppage, computer failure or malfunction, or any and all
-other commercial damages or losses), even if such Contributor
-has been advised of the possibility of such damages.
-
-9. Accepting Warranty or Additional Liability. While redistributing
-the Work or Derivative Works thereof, You may choose to offer,
-and charge a fee for, acceptance of support, warranty, indemnity,
-or other liability obligations and/or rights consistent with this
-License. However, in accepting such obligations, You may act only
-on Your own behalf and on Your sole responsibility, not on behalf
-of any other Contributor, and only if You agree to indemnify,
-defend, and hold each Contributor harmless for any liability
-incurred by, or claims asserted against, such Contributor by reason
-of your accepting any such warranty or additional liability.
-
-END OF TERMS AND CONDITIONS
-
-APPENDIX: How to apply the Apache License to your work.
-
-To apply the Apache License to your work, attach the following
-boilerplate notice, with the fields enclosed by brackets "[]"
-replaced with your own identifying information. (Don't include
-the brackets!) The text should be enclosed in the appropriate
-comment syntax for the file format. We also recommend that a
-file or class name and description of purpose be included on the
-same "printed page" as the copyright notice for easier
-identification within third-party archives.
-
-Copyright [yyyy] [name of copyright owner]
-
-Licensed under the Apache License, Version 2.0 (the "License");
-you may not use this file except in compliance with the License.
-You may obtain a copy of the License at
-
-http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing, software
-distributed under the License is distributed on an "AS IS" BASIS,
-WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-See the License for the specific language governing permissions and
-limitations under the License.
+MIT License
+
+Copyright (c) 2026 Pescheck
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md

Lines changed: 12 additions & 3 deletions
@@ -1,6 +1,6 @@
 # python-po-lint

-Lint `.po` translation files for contamination, wrong languages, shifts, and garbled text.
+Lint `.po` translation files for contamination, wrong languages, missing translations, shifts, and garbled text.

 Uses [fastText](https://fasttext.cc/) language identification with carrier phrase confirmation and confused language score merging for high accuracy with zero false positives.

@@ -9,6 +9,7 @@ Uses [fastText](https://fasttext.cc/) language identification with carrier phras
 - **Wrong language detection** — fastText-based with top-5 scoring, confused language merging, and carrier phrase confirmation
 - **Wrong script detection** — catches Cyrillic in a Dutch file, Arabic in French, Latin in Chinese, etc.
 - **Distinctive character detection** — catches Russian-specific chars in Ukrainian and vice versa
+- **Untranslated entry detection** — flags missing translations, auto-detects source language
 - **Shifted entry detection** — finds translations that got shifted to the wrong msgid
 - **Garbled text detection** — catches corrupted/broken unicode
 - **Ignore rules**`.po-lint-ignore` file with language scoping and msgctxt support
@@ -53,6 +54,9 @@ po-lint locale/ --min-detection-length 25

 # Specify source language (default: en)
 po-lint locale/ --source-language en
+
+# Disable untranslated entry check
+po-lint locale/ --no-check-untranslated
 ```

 ## Configuration
@@ -85,6 +89,10 @@ min_text_length = 3
 # Use compact fastText model instead of full
 compact_model = false

+# Check for untranslated entries (default: true)
+# Source language is auto-detected or set via source_language
+check_untranslated = true
+
 # Regex patterns to ignore (matched against msgid and msgstr)
 ignore_patterns = []
 ```
@@ -112,8 +120,9 @@ screening status::Some msgid
 1. **Wrong script check** — fast, no model needed. Checks if the translation uses the expected writing system.
 2. **Distinctive character check** — detects cross-contamination between languages sharing a script (e.g. Russian/Ukrainian).
 3. **Garbled text check** — flags corrupted unicode.
-4. **Shifted entry check** — flags suspiciously short translations for long source strings.
-5. **Wrong language check** — uses fastText with three layers of false positive prevention:
+4. **Untranslated entry check** — flags entries with empty `msgstr`. The source language is auto-detected (the locale where all entries are untranslated) or can be set explicitly. Skipped for the source language.
+5. **Shifted entry check** — flags suspiciously short translations for long source strings.
+6. **Wrong language check** — uses fastText with three layers of false positive prevention:
    - **Confused language score merging** — redistributes scores from commonly confused languages (e.g. Danish/Norwegian, Portuguese/Spanish)
    - **Source language allowance** — borrowed words from the source language are common and allowed
    - **Carrier phrase confirmation** — re-tests with a language-specific phrase prepended to distinguish false positives from real contamination
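Check 4 in the README list can be sketched in a few lines: flag each entry whose `msgstr` is empty, and return nothing for the source-language catalog. This is an illustrative sketch, not po-lint's code; `UntranslatedIssue`, `find_untranslated`, and the `(msgid, msgstr)` input shape are hypothetical, and plural forms (`msgstr[n]`) and fuzzy flags are deliberately left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class UntranslatedIssue:
    """Hypothetical issue record for one untranslated entry."""
    locale: str
    msgid: str


def find_untranslated(
    locale: str,
    entries: list[tuple[str, str]],
    source_locale: str | None,
) -> list[UntranslatedIssue]:
    """Flag entries with an empty msgstr, skipping the source-language catalog."""
    # The source catalog is expected to be untranslated, so it is never flagged.
    if locale == source_locale:
        return []
    return [
        UntranslatedIssue(locale, msgid)
        for msgid, msgstr in entries
        # Assumption: the PO header entry (empty msgid) is not a real entry.
        if msgid and msgstr == ""
    ]


nl_entries = [("Hello", "Hallo"), ("Goodbye", ""), ("Save", "Opslaan")]
for issue in find_untranslated("nl", nl_entries, source_locale="en"):
    print(issue)  # UntranslatedIssue(locale='nl', msgid='Goodbye')
```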

pyproject.toml

Lines changed: 8 additions & 2 deletions
@@ -1,6 +1,6 @@
 [project]
 name = "python-po-lint"
-version = "0.1.1"
+dynamic = ["version"]
 description = "Lint .po translation files for contamination, wrong languages, shifts, and garbled text"
 readme = "README.md"
 license = "MIT"
@@ -18,9 +18,15 @@ dependencies = [
 po-lint = "po_lint.cli:main"

 [build-system]
-requires = ["hatchling"]
+requires = ["hatchling", "hatch-vcs"]
 build-backend = "hatchling.build"

+[tool.hatch.version]
+source = "vcs"
+
+[tool.hatch.build.hooks.vcs]
+version-file = "src/po_lint/_version.py"
+
 [tool.hatch.build.targets.wheel]
 packages = ["src/po_lint"]
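With `dynamic = ["version"]` and hatch-vcs, the version number lives in git tags instead of `pyproject.toml`. One conventional way for code to read it at runtime is the standard library's `importlib.metadata`, sketched below; whether po-lint does this or imports the generated `src/po_lint/_version.py` is not shown in this commit, and the `"0+unknown"` fallback is an arbitrary choice for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "python-po-lint") -> str:
    """Return the installed distribution's version, or a placeholder when
    the package is not installed (e.g. running from an unbuilt checkout)."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return "0+unknown"


print(installed_version())
```

Under this setup a release is cut by creating a git tag such as `v0.2.0` rather than editing `pyproject.toml`; builds from untagged commits get a development version derived from the nearest tag.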

src/po_lint/cli.py

Lines changed: 11 additions & 0 deletions
@@ -51,6 +51,12 @@ def main(argv: list[str] | None = None) -> int:
         default=None,
         help="Minimum cleaned text length for language detection (default: 30).",
     )
+    parser.add_argument(
+        "--check-untranslated",
+        action=argparse.BooleanOptionalAction,
+        default=None,
+        help="Check for untranslated entries (default: true). Use --no-check-untranslated to disable.",
+    )
     parser.add_argument(
         "--compact-model",
         action="store_true",
@@ -86,6 +92,10 @@ def main(argv: list[str] | None = None) -> int:
         args.min_detection_length if args.min_detection_length is not None
         else config.min_detection_length
     )
+    check_untranslated = (
+        args.check_untranslated if args.check_untranslated is not None
+        else config.check_untranslated
+    )

     # Resolve locale directories
     if args.paths:
@@ -117,6 +127,7 @@ def main(argv: list[str] | None = None) -> int:
             min_text_length=config.min_text_length,
             min_detection_length=min_detection_length,
             ignore_patterns=config.ignore_patterns,
+            check_untranslated=check_untranslated,
         )
         all_issues.extend(issues)
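The new flag uses `argparse.BooleanOptionalAction` (Python 3.9+) with `default=None`, which gives three states: `--check-untranslated` yields `True`, `--no-check-untranslated` yields `False`, and omitting both leaves `None` so the config value applies. A standalone sketch of that precedence rule; the `resolve_check_untranslated` helper and the config value passed in by hand are assumptions for illustration.

```python
import argparse


def resolve_check_untranslated(argv: list[str], config_value: bool) -> bool:
    """CLI flag overrides config; an absent flag (None) falls back to config."""
    parser = argparse.ArgumentParser(prog="po-lint")
    parser.add_argument(
        "--check-untranslated",
        action=argparse.BooleanOptionalAction,
        default=None,  # None means "not given on the command line"
    )
    args = parser.parse_args(argv)
    return (
        args.check_untranslated if args.check_untranslated is not None
        else config_value
    )


print(resolve_check_untranslated([], config_value=True))  # True
print(resolve_check_untranslated(["--no-check-untranslated"], config_value=True))  # False
print(resolve_check_untranslated(["--check-untranslated"], config_value=False))  # True
```

The `is not None` test matters: a plain truthiness check would make `--no-check-untranslated` (`False`) indistinguishable from omitting the flag.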

src/po_lint/config.py

Lines changed: 2 additions & 0 deletions
@@ -27,6 +27,7 @@ class Config:
     min_detection_length: int = 30
     ignore_patterns: list[str] = field(default_factory=list)
     compact_model: bool = False
+    check_untranslated: bool = True

     def resolve_locale_dirs(self, base_dir: Path) -> list[Path]:
         """Resolve all locale directories from paths and packages.
@@ -91,4 +92,5 @@ def load_config(project_dir: Path | None = None) -> Config:
         min_detection_length=tool_config.get("min_detection_length", 30),
         ignore_patterns=tool_config.get("ignore_patterns", []),
         compact_model=tool_config.get("compact_model", False),
+        check_untranslated=tool_config.get("check_untranslated", True),
     )
