Strip HTML preservation option — implementation report
Overview
- Added a stripHtml parser-setting checkbox in the Parser Settings dialog and persisted it into diagram metadata (defaults to stripping markup).
- Propagated the flag through the Pyodide runtime configuration so the Python pipeline can toggle literal sanitization.
- Implemented parser overrides that preserve literal HTML content while keeping identifiers sanitized, regenerating legacy/draw_io_parser.py from overrides.
- Expanded Bun integration tests and pytest coverage to validate both default stripping and HTML preservation flows.
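The split between literals and identifiers described above can be illustrated with a minimal sketch. It is not the code in legacy/overrides; the names strip_markup, literal_value, and identifier_value are hypothetical, and the sketch assumes that "sanitized" simply means tags are dropped and only text content is kept.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Hypothetical helper: collects text content and drops all tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_markup(value: str) -> str:
    """Return the text content of an HTML fragment."""
    parser = _TextOnly()
    parser.feed(value)
    parser.close()  # flushes any text still buffered after the last tag
    return "".join(parser.parts).strip()


def literal_value(raw: str, strip_html: bool = True) -> str:
    """Literals keep their markup only when stripping is switched off."""
    return strip_markup(raw) if strip_html else raw


def identifier_value(raw: str) -> str:
    """Identifiers are sanitized regardless of the flag (assumed behavior)."""
    return strip_markup(raw)
```

With the default, literal_value("<b>Health</b> Dept") yields the plain text, while passing strip_html=False returns the input unchanged. identifier_value ignores the flag entirely, which mirrors the report's statement that identifiers stay sanitized.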
Code changes
- src/rdfexport.ts updates the UI, metadata serialization, and pipeline invocation.
- src/pyodideRuntime.ts extends the DrawioPyodideConfig interface with stripHtml and forwards it when booting Pyodide.
- pyodide_pipeline/drawio_pipeline.py respects the new flag and threads it through to parser overrides.
- legacy/overrides/strip_html.py defines a custom NodeHTMLParser that captures raw HTML segments.
- legacy/overrides/rml_export.py restores preserved HTML on literal objects while leaving IRIs sanitized.
- legacy/draw_io_parser.py regenerated to embed override behavior.
- legacy/tests/test_patched_parser.py adds assertions for both sanitized and preserved literal paths.
- tests/rdfexport.test.ts adds an end-to-end pipeline test verifying HTML markup appears in Turtle output when stripping is disabled.
- tests/fixtures/AA37 Department of Health-with-metadata-preserve-html.drawio fixture augments metadata with stripHtml="false".
- pyproject.toml excludes chat transcripts from Ruff linting.
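As an illustration of the "defaults to stripping" behavior, the sketch below shows one way a stripHtml attribute could be read from diagram XML. It is an assumption-laden example, not the pipeline's actual code: the function names and the metadata element in the usage note are hypothetical, and the report only confirms that the fixture carries stripHtml="false".

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_strip_html(value: Optional[str]) -> bool:
    """Interpret the stored flag; anything but an explicit 'false' strips."""
    if value is None:
        return True  # flag absent: keep the default of stripping markup
    return value.strip().lower() != "false"


def strip_html_from_metadata(xml_text: str) -> bool:
    """Find the first element carrying a stripHtml attribute (assumed layout)."""
    root = ET.fromstring(xml_text)
    for element in root.iter():  # iter() visits the root element as well
        if "stripHtml" in element.attrib:
            return parse_strip_html(element.attrib["stripHtml"])
    return True  # no flag anywhere in the document: default to stripping
```

For example, strip_html_from_metadata('<mxfile><metadata stripHtml="false"/></mxfile>') returns False, while a document without the attribute returns True. Treating every unrecognized value as "strip" is a conservative design choice made for this sketch, so that a malformed flag never lets markup through unintentionally.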
Testing
- bun run check
- bun run test
- bun run test:log:linux
- pytest legacy/tests/test_patched_parser.py
All commands completed successfully; pytest was run inside the project virtual environment.