
html-cls m4 #422

Merged
darkrush merged 6 commits into
ccprocessor:dev from
feifei2023:html_cls_m4/dev
May 22, 2025

darkrush commented May 22, 2025

Collaborator

Update the HTML classifier to the 25m4 version.
The dev doc is at https://aicarrier.feishu.cn/wiki/VMxMwFmSXionpikvRZZcHcLxnKg

  1. The classifier now has 4 HTML types:
  • 0: 'Article'
  • 1: 'Forum_or_Article_with_commentsection'
  • 2: 'Content Listing'
  • 3: 'Other'
  2. Simplification now uses html-alg-lib.
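A minimal sketch of how downstream code might map the classifier's integer output to the four labels listed above. It assumes the model emits one score per class in id order; the actual output format of `llm_web_kit/model/html_layout_cls.py` is not shown in this PR, and the names `HTML_TYPE_LABELS`, `label_for` and `label_from_scores` are hypothetical.

```python
# Hypothetical sketch: the id -> label table is copied from this PR's
# description; the helper names are illustrative, NOT taken from llm_web_kit.
from typing import Sequence

HTML_TYPE_LABELS: dict[int, str] = {
    0: "Article",
    1: "Forum_or_Article_with_commentsection",
    2: "Content Listing",
    3: "Other",
}


def label_for(class_id: int) -> str:
    """Return the HTML type name for a predicted class id."""
    try:
        return HTML_TYPE_LABELS[class_id]
    except KeyError:
        raise ValueError(f"unknown html type id: {class_id}") from None


def label_from_scores(scores: Sequence[float]) -> str:
    """Pick the highest-scoring class (assumes one score per class, in id order)."""
    if len(scores) != len(HTML_TYPE_LABELS):
        raise ValueError(f"expected {len(HTML_TYPE_LABELS)} scores, got {len(scores)}")
    # argmax over class ids
    best = max(range(len(scores)), key=lambda i: scores[i])
    return label_for(best)
```

For example, `label_from_scores([0.05, 0.10, 0.80, 0.05])` returns `'Content Listing'`.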

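…gs.py, unwrap_tags.py; simplify the HTML processing logic and update the html-alg-lib dependency to 2.0.2.
…est_remove_tags.py, test_simplify.py, test_unwrap_tags.py; simplify the test structure and remove test cases that are no longer used.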

codecov Bot commented May 22, 2025


Codecov Report

All modified and coverable lines are covered by tests ✅


@@            Coverage Diff             @@
##              dev     #422      +/-   ##
==========================================
- Coverage   89.96%   89.63%   -0.33%     
==========================================
  Files          82      106      +24     
  Lines        5581     8310    +2729     
==========================================
+ Hits         5021     7449    +2428     
- Misses        560      861     +301     
Files with missing lines Coverage Δ
llm_web_kit/model/html_classify/model.py 100.00% <100.00%> (ø)
llm_web_kit/model/html_layout_cls.py 90.90% <100.00%> (ø)
llm_web_kit/model/html_lib/simplify.py 100.00% <100.00%> (ø)

... and 25 files with indirect coverage changes
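The coverage percentages in the diff follow from Hits / Lines (and Hits + Misses = Lines in both columns). A small check, assuming Codecov's default of two decimal places with rounding down, which reproduces both reported values and the -0.33% delta:

```python
from decimal import Decimal, ROUND_DOWN


def coverage_pct(hits: int, lines: int) -> Decimal:
    """Coverage in percent, truncated to two decimals (assumed Codecov default)."""
    pct = Decimal(hits) * 100 / Decimal(lines)
    return pct.quantize(Decimal("0.01"), rounding=ROUND_DOWN)


base = coverage_pct(5021, 5581)  # dev
head = coverage_pct(7449, 8310)  # #422
print(base, head, head - base)   # 89.96 89.63 -0.33
```

Rounding half-up instead would give 89.97% and 89.64%, which do not match the report.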


darkrush merged commit 6e1a0b1 into ccprocessor:dev on May 22, 2025
6 checks passed


1 participant