# Strip HTML preservation option — implementation report ## Overview - Added a `stripHtml` parser-setting checkbox in the Parser Settings dialog and persisted it into diagram metadata (defaults to stripping markup). - Propagated the flag through the Pyodide runtime configuration so the Python pipeline can toggle literal sanitization. - Implemented parser overrides that preserve literal HTML content while keeping identifiers sanitized, regenerating `legacy/draw_io_parser.py` from overrides. - Expanded Bun integration tests and pytest coverage to validate both default stripping and HTML preservation flows. ## Code changes - `src/rdfexport.ts` updates the UI, metadata serialization, and pipeline invocation. - `src/pyodideRuntime.ts` extends the `DrawioPyodideConfig` interface with `stripHtml` and forwards it when booting Pyodide. - `pyodide_pipeline/drawio_pipeline.py` respects the new flag and threads it through to parser overrides. - `legacy/overrides/strip_html.py` defines a custom `NodeHTMLParser` that captures raw HTML segments. - `legacy/overrides/rml_export.py` restores preserved HTML on literal objects while leaving IRIs sanitized. - `legacy/draw_io_parser.py` regenerated to embed override behavior. - `legacy/tests/test_patched_parser.py` adds assertions for both sanitized and preserved literal paths. - `tests/rdfexport.test.ts` adds an end-to-end pipeline test verifying HTML markup appears in Turtle output when stripping is disabled. - `tests/fixtures/AA37 Department of Health-with-metadata-preserve-html.drawio` fixture augments metadata with `stripHtml="false"`. - `pyproject.toml` excludes chat transcripts from Ruff linting. ## Testing - `bun run check` - `bun run test` - `bun run test:log:linux` - `pytest legacy/tests/test_patched_parser.py` All commands completed successfully (Pytest executed within the project virtual environment).