# Task 2a – Remove Hardcoded CURIE Validation from DrawIO Parser ## Summary - Replaced static RiC-O class, object property, and datatype property whitelists with dynamic CURIE validation driven by the parser's prefix dictionary. - Extended the `Arrow` representation and downstream processing to classify datatype/object properties without relying on global lists. - Added regression baselines for legacy fixtures and comprehensive pytest coverage, including CURIE acceptance, rejection, and fixture isomorphism checks after normalisation. ## Implementation Notes - Introduced `_split_curie` and `_ensure_known_curie` helpers to centralise CURIE validation using prefix dictionaries. - Updated `DrawIOXMLTree._arrow` to track whether an edge targets a literal or an individual by inspecting literal cells and previously parsed individuals. - Refactored `individual_blocks` to return both the aggregated blocks and dynamically discovered property classifications; downstream serialisation now consumes these sets. - Normalised graph comparison in tests to exclude ontology preamble and property type declarations that vary with runtime or broad property inventories. - Generated `.nt` baselines for every pristine `.drawio` fixture (skipping legacy failures) using the unmodified parser prior to refactor for regression purposes. ## Testing - `pytest src/main/webapp/plugins/rdfexport/legacy/tests/test_patched_parser.py` ## Outstanding Questions / Follow-ups - Consider extending prefix acquisition beyond the static `get_prefixes()` helper once metadata-driven prefix injection (Task 2b) is implemented. - Evaluate whether property definition triples should be generated on-demand for non-RiC namespaces or exposed through configuration knobs. ## Follow-up Automation (2025-10-09) - Added `regenerate_baselines.py` to replay the legacy parser from historical commits, backfill missing property classifications, and materialise `.nt` graphs for pristine fixtures when baselines are absent. - Defaulted the helper to skip overwriting existing baselines while still executing `pytest` so the regression suite runs against the freshly generated fixtures. - Confirmed the helper against HEAD^, observed the static-property failure, and documented the automatic fallback behaviour plus the ability to force regeneration via `--force-overwrite` when required. ## Additional Testing - `python src/main/webapp/plugins/rdfexport/legacy/tests/regenerate_baselines.py --max-commits 50` ## Baseline Regeneration Script Invocation (2025-10-09) - Added a convenience shell wrapper `run_regeneration.sh` under `src/main/webapp/plugins/rdfexport/legacy/tests/` that invokes the reproducible baseline helper with the requested commit, window, and overwrite flags. - Attempted to execute the wrapper to confirm behaviour; execution failed because the sandbox image does not include the `rdflib` dependency required by `regenerate_baselines.py`. - Left the failure details in the execution log so downstream runs can install dependencies or supply a virtual environment before rerunning the helper. ## Additional Testing - `src/main/webapp/plugins/rdfexport/legacy/tests/run_regeneration.sh` ## Baseline Regeneration Wrapper Verification (2025-10-10) - Installed the missing `rdflib` dependency in the sandbox environment and re-ran `run_regeneration.sh` without modifying repository files. - Confirmed that the helper replays commit `cf8f84bb84ff83843b6726ac96aff3a2055f4275`, regenerates all pristine fixture baselines with forced overwrites, and executes `pytest` to verify the regenerated outputs. ## Additional Testing - `src/main/webapp/plugins/rdfexport/legacy/tests/run_regeneration.sh`