Malformed CURIE Detection Enhancements – Worklog (2025-10-20)
Overview
Tightened the DrawIOXMLTree._extract_individual_and_arrow_and_literal_cells override so literal detection requires the absence of a containing mxCell.
Ensured malformed rdf:type values (missing prefix separator, empty prefix, or empty reference) now reliably raise NotInKnownException instead of being treated as literals.
Verified that the regenerated parser surfaces the override and retains absolute-IRI handling in downstream serialization.
Confirmed that the extended AA37 mock fixture exercises colon-only, dangling prefix, and missing prefix scenarios.
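The malformed rdf:type cases listed above can be sketched as a small standalone check. This is an illustration only, not the code in curie_validator.py: the function name check_type_curie, the KNOWN_PREFIXES table, the "://" test for absolute IRIs, and the shape of NotInKnownException (assumed here to be a plain Exception subclass) are all assumptions.

```python
from __future__ import annotations


class NotInKnownException(Exception):
    """Stand-in for the project's exception of the same name (assumed shape)."""


# Hypothetical prefix table; the real parser presumably builds its own.
KNOWN_PREFIXES = {"rdf", "rdfs", "owl", "ex"}


def check_type_curie(value: str, known_prefixes: set[str] = KNOWN_PREFIXES) -> tuple[str, str]:
    """Split an rdf:type value into (prefix, reference) or raise NotInKnownException."""
    value = value.strip()

    # Absolute IRIs are passed through untouched (assumed heuristic: contains "://").
    if "://" in value:
        return ("", value)

    prefix, sep, reference = value.partition(":")
    if not sep:
        raise NotInKnownException(f"{value!r}: missing prefix separator ':'")
    if not prefix:
        # Also covers the colon-only value ":".
        raise NotInKnownException(f"{value!r}: empty prefix")
    if not reference:
        # Dangling prefix such as "ex:".
        raise NotInKnownException(f"{value!r}: empty reference")
    if prefix not in known_prefixes:
        raise NotInKnownException(f"{value!r}: unknown prefix {prefix!r}")
    return (prefix, reference)
```

With these assumptions, "ex:Person" and "http://example.org/Person" are accepted, while "Person", ":Person", "ex:", ":" and "zz:Person" each raise with a message naming the specific defect.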
Key Changes
legacy/overrides/curie_validator.py
  Track the immediate parent id for each mxCell and classify literal candidates only when they lack a parent box (parent of 1).
  Preserve the new malformed rdf:type checks (missing separator, empty prefix, empty reference, unknown prefix) with detailed error reporting.
legacy/draw_io_parser.py
  Regenerated from overrides to capture the literal classification tweaks.
Tests/Fixtures
  Reran the pytest suite targeting patched parser behaviours (legacy/tests/test_patched_parser.py).
  Exercised the Bun-integrated regression pipeline via bun run test and bun run test:log:linux (log captured under tests/demo_logs/test.log).
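The parent-tracking rule noted for curie_validator.py under Key Changes can be sketched as follows. This is a minimal illustration, not the override itself: the function name, the sample XML, and the cell ids are hypothetical, and it assumes the usual draw.io layout in which cell 0 is the root, cell 1 is the default layer, and a cell sitting directly on the canvas carries parent="1".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: "inner" sits inside the box "box"; "lit" is free-standing.
SAMPLE = """<mxGraphModel><root>
  <mxCell id="0"/>
  <mxCell id="1" parent="0"/>
  <mxCell id="box" value="ex:Person" vertex="1" parent="1"/>
  <mxCell id="inner" value="rdf:type ex:Agent" vertex="1" parent="box"/>
  <mxCell id="lit" value="&quot;Alice&quot;" vertex="1" parent="1"/>
</root></mxGraphModel>"""


def cells_without_parent_box(xml_text, layer_id="1"):
    """Return ids of cells whose immediate parent is the layer, not another box."""
    root = ET.fromstring(xml_text)
    # Record the immediate parent id of every mxCell, in document order.
    parent_of = {cell.get("id"): cell.get("parent") for cell in root.iter("mxCell")}
    # Only these cells stay eligible for literal classification; the further
    # checks that decide whether one really is a literal are omitted here.
    return [cell_id for cell_id, parent in parent_of.items() if parent == layer_id]


print(cells_without_parent_box(SAMPLE))
```

In this sketch the cell "inner" is excluded because its parent is "box", so a malformed value inside a container can no longer fall through to the literal path.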
Testing Summary
bun run check
pytest legacy/tests/test_patched_parser.py::test_parse_drawio_rejects_malformed_type_variants -vv
pytest legacy/tests/test_curie_validator.py -q
Full bun run test
bun run test:log:linux (log archived)
All checks succeeded; malformed rdf:type variants now raise as intended.