Strip HTML preservation option — implementation report

Overview

  • Added a stripHtml parser-setting checkbox in the Parser Settings dialog and persisted it into diagram metadata (defaults to stripping markup).

  • Propagated the flag through the Pyodide runtime configuration so the Python pipeline can toggle literal sanitization.

  • Implemented parser overrides that preserve literal HTML content while keeping identifiers sanitized, and regenerated legacy/draw_io_parser.py from those overrides.

  • Expanded Bun integration tests and pytest coverage to validate both default stripping and HTML preservation flows.
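
The default-to-stripping behavior described above can be illustrated with a minimal sketch. The function name resolve_strip_html and the idea of passing metadata as a plain string mapping are assumptions made for illustration only; the report does not show how the pipeline actually reads the flag.

```python
from typing import Mapping, Optional


def resolve_strip_html(metadata: Mapping[str, Optional[str]]) -> bool:
    """Hypothetical helper: decide whether literal markup should be stripped.

    Stripping stays enabled unless the metadata explicitly turns it off,
    which mirrors the "defaults to stripping markup" behavior in the report.
    """
    raw = metadata.get("stripHtml")
    if raw is None:
        # Attribute absent (for example, an older diagram): keep stripping.
        return True
    # Treat a small set of spellings as "off"; anything else keeps stripping on.
    return raw.strip().lower() not in {"false", "0", "no"}
```

Under these assumptions, a diagram without the attribute resolves to True, and one carrying stripHtml="false" (as in the preserve-html fixture) resolves to False.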

Code changes

  • src/rdfexport.ts updates the UI, metadata serialization, and pipeline invocation.

  • src/pyodideRuntime.ts extends the DrawioPyodideConfig interface with stripHtml and forwards it when booting Pyodide.

  • pyodide_pipeline/drawio_pipeline.py respects the new flag and threads it through to parser overrides.

  • legacy/overrides/strip_html.py defines a custom NodeHTMLParser that captures raw HTML segments.

  • legacy/overrides/rml_export.py restores preserved HTML on literal objects while leaving IRIs sanitized.

  • legacy/draw_io_parser.py is regenerated to embed the override behavior.

  • legacy/tests/test_patched_parser.py adds assertions for both sanitized and preserved literal paths.

  • tests/rdfexport.test.ts adds an end-to-end pipeline test verifying HTML markup appears in Turtle output when stripping is disabled.

  • tests/fixtures/AA37 Department of Health-with-metadata-preserve-html.drawio is a fixture whose metadata adds stripHtml="false".

  • pyproject.toml excludes chat transcripts from Ruff linting.
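
As a rough illustration of the idea behind the strip_html.py and rml_export.py overrides, the sketch below uses the standard library's html.parser to collect both a stripped and a raw view of a label. It is not the project's NodeHTMLParser: the class name RawCapturingParser and the helper render_literal are hypothetical, and the real override may capture segments differently.

```python
from html.parser import HTMLParser


class RawCapturingParser(HTMLParser):
    """Hypothetical parser that records plain text and raw markup side by side."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.text_parts: list[str] = []  # text content only
        self.raw_parts: list[str] = []   # tags and text, in source order

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as written.
        self.raw_parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept as a single raw segment.
        self.raw_parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.raw_parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references are already decoded here (convert_charrefs=True).
        self.text_parts.append(data)
        self.raw_parts.append(data)


def render_literal(value: str, strip_html: bool = True) -> str:
    """Return the literal text, with or without its markup."""
    parser = RawCapturingParser()
    parser.feed(value)
    parser.close()  # flush any buffered trailing text
    parts = parser.text_parts if strip_html else parser.raw_parts
    return "".join(parts)
```

In this sketch, only literal values would go through render_literal with strip_html=False; identifiers such as IRIs would continue to use the stripped form, matching the split the report describes.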

Testing

  • bun run check

  • bun run test

  • bun run test:log:linux

  • pytest legacy/tests/test_patched_parser.py

All commands completed successfully; pytest was run inside the project virtual environment.