On my discussion list, I’ll be spending several weeks playing with AI to build a real-world software project: not a toy, but a full-scale project with complexity, testing, and lots of decision points.
For this project, I’m choosing to implement Invisible XML, or ixml.
(Wait, don’t leave! I understand that XML is a sore point for many engineers. I, too, have shuddered after having to edit a pom.xml file. Despite its widespread misapplication into all kinds of places where it doesn’t belong, though, XML does have a very nice tool ecosystem, including declarative processing with XSLT. Even if you’re not convinced, stick around, because ixml makes the ugly parts of XML ‘invisible’, and gives us a project complex enough to really put AI collaboration to the test.)
So ixml is all about grammars. For example, here’s a (highly simplified; don’t @ me!) grammar for an email address. (Grammar geeks take note: there’s no lexer. Everything is defined down to the character level.)
email-address: local-part, "@", domain.
local-part: (letter | digit), (letter | digit | "." | "-" | "_")+.
domain: subdomain, (".", subdomain)+.
subdomain: (letter | digit), (letter | digit | "-"), (letter | digit)+.
-letter: ["A"-"Z"; "a"-"z"].
-digit: ["0"-"9"].
Not terribly different from what you might see in an IETF or W3C specification. But the interesting thing is that you can use an ixml grammar to define the ixml grammar. It’s its own fixed point.
The easiest way to play with this is John Lumley’s online jωxml workbench. Paste the entire code block above into the grammar section, put your favorite email address in the input section, and punch the green GO! button.
This approach gets you not only validation of the email address, but also a parse tree labeling the various pieces of the input, like ‘local-part’ or ‘subdomain’.
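For instance, an input like jane.doe@example.com would come back roughly as the tree below (the `-` mark on letter and digit hides those rules in the output, so only the named pieces survive as elements, with matched characters as text; indentation added here for readability):

```xml
<email-address>
   <local-part>jane.doe</local-part>
   @
   <domain>
      <subdomain>example</subdomain>
      .
      <subdomain>com</subdomain>
   </domain>
</email-address>
```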
The project goal, then, is to create an implementation of the ixml spec in idiomatic Rust, with minimal dependencies and the option to cross-compile to WebAssembly. An extensive conformance test suite is available, so part of the work is wiring up the entire set of tests and driving conformance to 100%.
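To make “everything down to the character level” concrete, here’s a hedged sketch: a hand-rolled recursive-descent matcher for just the toy email grammar above, emitting an ixml-style XML tree. This is not the project itself; a real ixml engine has to handle arbitrary, possibly ambiguous grammars (typically with an Earley-style parser), not one hard-coded rule set.

```rust
fn is_letter(c: char) -> bool {
    c.is_ascii_alphabetic()
}

fn is_digit(c: char) -> bool {
    c.is_ascii_digit()
}

// local-part: (letter | digit), (letter | digit | "." | "-" | "_")+.
// Returns the matched text and the remaining input, or None on failure.
fn local_part(input: &str) -> Option<(&str, &str)> {
    let mut iter = input.char_indices();
    let (_, first) = iter.next()?;
    if !(is_letter(first) || is_digit(first)) {
        return None;
    }
    let mut end = first.len_utf8();
    let mut tail = 0; // the second group is `+`, so at least one more char
    for (i, c) in iter {
        if is_letter(c) || is_digit(c) || matches!(c, '.' | '-' | '_') {
            end = i + c.len_utf8();
            tail += 1;
        } else {
            break;
        }
    }
    if tail == 0 {
        return None;
    }
    Some((&input[..end], &input[end..]))
}

// subdomain: (letter | digit), (letter | digit | "-"), (letter | digit)+.
// Per the toy grammar, that means at least three characters.
fn subdomain_ok(s: &str) -> bool {
    let b: Vec<char> = s.chars().collect();
    b.len() >= 3
        && (is_letter(b[0]) || is_digit(b[0]))
        && (is_letter(b[1]) || is_digit(b[1]) || b[1] == '-')
        && b[2..].iter().all(|&c| is_letter(c) || is_digit(c))
}

// email-address: local-part, "@", domain.
// domain: subdomain, (".", subdomain)+.
// On success, returns the parse tree serialized the way ixml would:
// hidden rules (letter, digit) vanish, matched characters become text.
fn parse_email(input: &str) -> Option<String> {
    let (local, rest) = local_part(input)?;
    let rest = rest.strip_prefix('@')?;
    let subs: Vec<&str> = rest.split('.').collect();
    if subs.len() < 2 || !subs.iter().all(|s| subdomain_ok(s)) {
        return None;
    }
    let domain = subs
        .iter()
        .map(|s| format!("<subdomain>{s}</subdomain>"))
        .collect::<Vec<_>>()
        .join(".");
    Some(format!(
        "<email-address><local-part>{local}</local-part>@<domain>{domain}</domain></email-address>"
    ))
}

fn main() {
    println!("{}", parse_email("jane.doe@example.com").unwrap());
}
```

Even this throwaway version shows why the real thing is interesting: every repetition operator, separator, and hiding mark in the grammar has to be interpreted generically rather than hand-translated like this.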
I actually started such a project, fully hand-coded with no AI involvement, but life and a job change got in the way. It was my first-ever Rust project, and it ended up abandoned just as it was reaching the it-kinda-works stage.
For those following along at home, here’s a concrete test case. If working software could consume this input
http://yahoo.com/
And process it with these rules (the ++ operator means ‘one or more, separated by’; and don’t @ me…this is just a smoke test example…)
url: scheme, ":", authority, path.
scheme: letter+.
authority: "//", host.
host: sub++".".
sub: letter+.
path: ("/", seg)+.
seg: fletter*.
-letter: ["a"-"z"]; ["A"-"Z"]; ["0"-"9"].
-fletter: letter; ".".
to produce an accurate parse tree like this (whitespace added for clarity)
<url>
<scheme>http</scheme>
:
<authority>//<host><sub>yahoo</sub>.<sub>com</sub></host></authority>
<path>/<seg/></path>
</url>
I’d consider that a success. Of course, with a goal of 100% conformance it’s just the beginning, but still a good start.
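Wiring expected outputs like the one above into automated tests raises a small practical question: the hand-written expected tree carries indentation for readability. A trivial helper I’d reach for is a whitespace normalizer; this is a sketch only, since a serious conformance harness would want real XML canonicalization (whitespace inside text content can be significant):

```rust
// Strip per-line indentation so a pretty-printed expected tree can be
// compared against an engine's compact output.
fn normalize(xml: &str) -> String {
    xml.lines().map(str::trim).collect::<Vec<_>>().concat()
}

fn main() {
    let expected = "<url>\n  <scheme>http</scheme>\n  :\n  <authority>//<host><sub>yahoo</sub>.<sub>com</sub></host></authority>\n  <path>/<seg/></path>\n</url>";
    println!("{}", normalize(expected));
}
```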
Anyway, I learned a lot from the experience of building this by hand. With AI as a collaborator, can we start…and finish…a comparable project?
The only way to find out is to join my distribution list, the Problem Solvers Digest. See you there!