goclaw

tiennm99/goclaw

mirror of https://github.com/tiennm99/goclaw.git synced 2026-06-11 00:13:12 +00:00

Commit Graph

Author	SHA1	Message	Date
viettranx	37158af231	fix(web_fetch): replace regex HTML parsing with DOM-based extraction Regex-based htmlToMarkdown/htmlToText leaked CSS, JS, and non-content elements. Replaced with golang.org/x/net/html DOM parser that extracts <body> only and skips 16 non-content element types (script, style, noscript, svg, template, iframe, form, nav, footer, etc.). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-03 17:10:55 +07:00


1 Commit