Task 2a – Remove Hardcoded CURIE Validation from DrawIO Parser
Summary
Replaced static RiC-O class, object property, and datatype property whitelists with dynamic CURIE validation driven by the parser’s prefix dictionary.
Extended the Arrow representation and downstream processing to classify datatype/object properties without relying on global lists.
Added regression baselines for legacy fixtures and comprehensive pytest coverage, including CURIE acceptance, rejection, and fixture isomorphism checks after normalisation.
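The prefix-driven validation summarised above can be sketched roughly as follows. This is a minimal illustration, not the parser's actual code: split_curie, ensure_known_curie, UnknownPrefixError, and the PREFIXES mapping are hypothetical stand-ins, since the real helper signatures are not shown in this report.

```python
from typing import Dict, Tuple


class UnknownPrefixError(ValueError):
    """Raised when a CURIE uses a prefix missing from the prefix dictionary."""


def split_curie(value: str) -> Tuple[str, str]:
    """Split 'prefix:local' into its two parts and reject anything else."""
    prefix, sep, local = value.strip().partition(":")
    # A local part starting with '//' means the value is a full IRI, not a CURIE.
    if not sep or not prefix or not local or local.startswith("//"):
        raise ValueError(f"not a CURIE: {value!r}")
    return prefix, local


def ensure_known_curie(value: str, prefixes: Dict[str, str]) -> str:
    """Return the expanded IRI, or raise if the prefix is not declared."""
    prefix, local = split_curie(value)
    if prefix not in prefixes:
        raise UnknownPrefixError(f"unknown prefix {prefix!r} in {value!r}")
    return prefixes[prefix] + local


# Illustrative prefix dictionary (assumed contents, not the parser's own).
PREFIXES = {
    "rico": "https://www.ica.org/standards/RiC/ontology#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}

expanded = ensure_known_curie("rico:Record", PREFIXES)
```

Under this approach, any term whose prefix is declared is accepted, so adding a namespace only requires a new dictionary entry rather than a longer whitelist.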
Implementation Notes
Introduced _split_curie and _ensure_known_curie helpers to centralise CURIE validation using prefix dictionaries.
Updated DrawIOXMLTree._arrow to track whether an edge targets a literal or an individual by inspecting literal cells and previously parsed individuals.
Refactored individual_blocks to return both the aggregated blocks and dynamically discovered property classifications; downstream serialisation now consumes these sets.
Normalised graph comparison in tests to exclude ontology preamble and property type declarations that vary with runtime or broad property inventories.
Generated .nt baselines for every pristine .drawio fixture (skipping legacy failures) using the unmodified parser prior to the refactor for regression purposes.
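As a rough illustration of the dynamic classification described in these notes, the sketch below sorts edge predicates into datatype and object properties according to what each edge points at. The Arrow dataclass, its fields, and classify_properties are assumptions made for the example; the real Arrow representation and individual_blocks return shape are not shown here.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Arrow:
    source: str     # id of the cell the edge starts from
    target: str     # id of the cell the edge points to
    predicate: str  # CURIE written on the edge label


def classify_properties(
    arrows: Iterable[Arrow],
    literal_cells: Set[str],
    individual_cells: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (datatype_properties, object_properties) inferred from edge targets."""
    datatype_props: Set[str] = set()
    object_props: Set[str] = set()
    for arrow in arrows:
        if arrow.target in literal_cells:
            # Edge ends at a literal cell, so the predicate carries a data value.
            datatype_props.add(arrow.predicate)
        elif arrow.target in individual_cells:
            # Edge ends at a previously parsed individual, so it links resources.
            object_props.add(arrow.predicate)
    return datatype_props, object_props


# Illustrative cell ids and predicates only.
arrows = [
    Arrow("rec1", "lit1", "rico:name"),
    Arrow("rec1", "agent1", "rico:hasCreator"),
]
datatype_props, object_props = classify_properties(
    arrows, literal_cells={"lit1"}, individual_cells={"rec1", "agent1"}
)
```

Because the two sets are derived from the diagram itself, serialisation can emit property declarations for whatever predicates actually occur instead of consulting a global list.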
Testing
pytest src/main/webapp/plugins/rdfexport/legacy/tests/test_patched_parser.py
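The normalisation applied before comparing graphs in these tests could look something like the following. It is a simplified sketch under stated assumptions: triples are plain string tuples, the "preamble" is taken to mean only the owl:Ontology declaration, and set equality stands in for a real isomorphism check, which would also have to match blank nodes. The names normalise and IGNORED_TYPES are hypothetical.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL = "http://www.w3.org/2002/07/owl#"

# Declarations assumed to vary between runs and therefore ignored when comparing.
IGNORED_TYPES = {
    OWL + "Ontology",
    OWL + "ObjectProperty",
    OWL + "DatatypeProperty",
}


def normalise(triples: Iterable[Triple]) -> Set[Triple]:
    """Drop ontology and property type declarations, keep everything else."""
    return {
        t for t in triples
        if not (t[1] == RDF_TYPE and t[2] in IGNORED_TYPES)
    }


# Illustrative data: the two graphs differ only in a property declaration.
baseline = [
    ("http://example.org/rec1", "http://example.org/name", '"Minutes"'),
]
generated = baseline + [
    ("http://example.org/name", RDF_TYPE, OWL + "DatatypeProperty"),
]
same_after_normalising = normalise(baseline) == normalise(generated)
```

Filtering both sides with the same function keeps the comparison focused on the instance data that the fixtures are meant to pin down.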
Outstanding Questions / Follow-ups
Consider extending prefix acquisition beyond the static get_prefixes() helper once metadata-driven prefix injection (Task 2b) is implemented.
Evaluate whether property definition triples should be generated on-demand for non-RiC namespaces or exposed through configuration knobs.
Follow-up Automation (2025-10-09)
Added regenerate_baselines.py to replay the legacy parser from historical commits, backfill missing property classifications, and materialise .nt graphs for pristine fixtures when baselines are absent.
Defaulted the helper to skip overwriting existing baselines while still executing pytest so the regression suite runs against the freshly generated fixtures.
Confirmed the helper against HEAD^, observed the static-property failure, and documented the automatic fallback behaviour plus the ability to force regeneration via --force-overwrite when required.
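The skip-by-default overwrite behaviour can be pictured with a small sketch like the one below. The write_baseline function and its force_overwrite parameter are hypothetical and only mirror the idea behind the --force-overwrite flag; they are not taken from regenerate_baselines.py.

```python
from pathlib import Path


def write_baseline(path: Path, content: str, force_overwrite: bool = False) -> bool:
    """Write a baseline file; return True if written, False if skipped."""
    if path.exists() and not force_overwrite:
        # Default behaviour: leave an existing baseline untouched.
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return True
```

Returning a flag instead of raising lets a caller log which fixtures were skipped and still go on to run the test suite afterwards.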
Additional Testing
python src/main/webapp/plugins/rdfexport/legacy/tests/regenerate_baselines.py --max-commits 50
Baseline Regeneration Script Invocation (2025-10-09)
Added a convenience shell wrapper run_regeneration.sh under src/main/webapp/plugins/rdfexport/legacy/tests/ that invokes the reproducible baseline helper with the requested commit, window, and overwrite flags.
Attempted to execute the wrapper to confirm behaviour; execution failed because the sandbox image does not include the rdflib dependency required by regenerate_baselines.py.
Left the failure details in the execution log so downstream runs can install dependencies or supply a virtual environment before rerunning the helper.
Additional Testing
src/main/webapp/plugins/rdfexport/legacy/tests/run_regeneration.sh
Baseline Regeneration Wrapper Verification (2025-10-10)
Installed the missing rdflib dependency in the sandbox environment and re-ran run_regeneration.sh without modifying repository files.
Confirmed that the helper replays commit cf8f84bb84ff83843b6726ac96aff3a2055f4275, regenerates all pristine fixture baselines with forced overwrites, and executes pytest to verify the regenerated outputs.
Additional Testing
src/main/webapp/plugins/rdfexport/legacy/tests/run_regeneration.sh