1 Commit

Author SHA1 Message Date
viettranx 37158af231 fix(web_fetch): replace regex HTML parsing with DOM-based extraction
Regex-based htmlToMarkdown/htmlToText leaked CSS, JS, and non-content
elements into the output. Replaced them with the golang.org/x/net/html DOM
parser, which extracts <body> only and skips 16 non-content element types
(script, style, noscript, svg, template, iframe, form, nav, footer, etc.).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 17:10:55 +07:00