Malformed CURIE Detection Enhancements – Worklog (2025-10-20)

Overview

  • Tightened the DrawIOXMLTree._extract_individual_and_arrow_and_literal_cells override so that a cell is detected as a literal only when no other mxCell contains it.

  • Ensured malformed rdf:type values (missing prefix separator, empty prefix, or empty reference) now reliably raise NotInKnownException instead of being treated as literals.

  • Verified that the regenerated parser picks up the override and retains absolute-IRI handling in downstream serialization.

  • Confirmed extended AA37 mock fixture exercises colon-only, dangling prefix, and missing prefix scenarios.
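
The malformed rdf:type cases listed above can be sketched roughly as follows. This is an illustrative outline only, not the code in the override: MalformedCurieError is a hypothetical stand-in for the project's NotInKnownException, split_curie and known_prefixes are assumed names, and absolute-IRI handling is deliberately left out.

```python
class MalformedCurieError(ValueError):
    """Hypothetical stand-in for the project's NotInKnownException."""


def split_curie(value, known_prefixes):
    """Return (prefix, reference) for a CURIE, or raise MalformedCurieError.

    Assumed helper for illustration; absolute IRIs are not handled here.
    """
    text = value.strip()
    # Case 1: no ':' at all, e.g. "Person"
    if ":" not in text:
        raise MalformedCurieError(f"{value!r}: missing prefix separator ':'")
    # Split on the first colon only, so references may contain colons
    prefix, reference = text.split(":", 1)
    # Case 2: nothing before the colon, e.g. ":Person" or ":"
    if not prefix:
        raise MalformedCurieError(f"{value!r}: empty prefix")
    # Case 3: nothing after the colon, e.g. "ex:"
    if not reference:
        raise MalformedCurieError(f"{value!r}: empty reference")
    # Case 4: well-formed, but the prefix was never declared
    if prefix not in known_prefixes:
        raise MalformedCurieError(f"{value!r}: unknown prefix {prefix!r}")
    return prefix, reference
```

Checking the empty prefix before the empty reference means a colon-only value is reported as an empty prefix; the real override may order or word its errors differently.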

Key Changes

  1. legacy/overrides/curie_validator.py

    • Track the immediate parent id for each mxCell and classify a cell as a literal candidate only when it lacks a parent box, i.e. its parent id is 1 (taken here to be the top-level draw.io layer rather than another shape).

    • Preserve the new malformed rdf:type checks (missing separator, empty prefix, empty reference, unknown prefix) with detailed error reporting.

  2. legacy/draw_io_parser.py

    • Regenerated from overrides to capture the literal classification tweaks.

  3. Tests/Fixtures

    • Reran pytest suite targeting patched parser behaviours (legacy/tests/test_patched_parser.py).

    • Exercised Bun-integrated regression pipeline via bun run test and bun run test:log:linux (log captured under tests/demo_logs/test.log).
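
The parent-tracking rule from item 1 can be sketched as below. This is a minimal illustration under stated assumptions, not the regenerated parser: literal_candidate_ids and DEFAULT_LAYER_ID are hypothetical names, and the sketch assumes that a parent id of "1" denotes the default draw.io layer and that shape cells carry vertex="1". It only selects candidates; the further split into individuals and literals is omitted.

```python
import xml.etree.ElementTree as ET

# Assumption: cells placed directly on the canvas have parent="1".
DEFAULT_LAYER_ID = "1"


def literal_candidate_ids(xml_text):
    """Return ids of vertex cells that are not contained in another cell."""
    root = ET.fromstring(xml_text)
    # Record the immediate parent id of every mxCell.
    parent_of = {
        cell.get("id"): cell.get("parent") for cell in root.iter("mxCell")
    }
    # Keep only vertices whose parent is the default layer, in document order.
    return [
        cell.get("id")
        for cell in root.iter("mxCell")
        if cell.get("vertex") == "1"
        and parent_of.get(cell.get("id")) == DEFAULT_LAYER_ID
    ]
```

A cell nested inside a box has that box's id as its parent, so it is excluded from the candidate list.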

Testing Summary

  • bun run check

  • pytest legacy/tests/test_patched_parser.py::test_parse_drawio_rejects_malformed_type_variants -vv

  • pytest legacy/tests/test_curie_validator.py -q

  • Full bun run test

  • bun run test:log:linux (log archived)

All checks succeeded; malformed rdf:type variants now raise NotInKnownException as intended.