My assumption is that this would convert to PXML as:
[tag (A=1 B=2 C=3)[D 4][E 4]]
Which would be fine and not lossy; however, in your paper you said that the PXML parser would surface both attributes and child nodes as child nodes. So would the node object then have a type (attribute or child)? Otherwise, it would seem we'd lose the knowledge of which was which.
My assumption is that this would convert to PXML as:
[tag (A=1 B=2 C=3)[D 4][E 4]]
Yes, correct!
When a pXML document is parsed into memory, each node has a flag to indicate if it has been defined with the attributes or tag syntax. Hence the information is not lost, and it is considered when pXML is converted to XML, or vice versa.
In my next article about the parser I will show examples of this.
I will also add a note in this article to clarify this.
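Until those examples are published, here is a minimal sketch of how such a flag could work. It is not the actual pXML parser: the Node class, the from_attribute field, and to_xml are hypothetical names, and the sketch assumes attribute values need no XML escaping.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    text: Optional[str] = None
    # Hypothetical flag: True if the node was written with attribute
    # syntax, e.g. (A=1), False if written with tag syntax, e.g. [D 4].
    from_attribute: bool = False
    children: List["Node"] = field(default_factory=list)


def to_xml(node: Node) -> str:
    """Serialize a node, using the flag to decide attribute vs. child element."""
    # Note: no XML escaping is done here; this is only an illustration.
    attrs = "".join(
        f' {c.name}="{c.text}"' for c in node.children if c.from_attribute
    )
    body = "".join(to_xml(c) for c in node.children if not c.from_attribute)
    return f"<{node.name}{attrs}>{node.text or ''}{body}</{node.name}>"


# In-memory form of: [tag (A=1 B=2 C=3)[D 4][E 4]]
tag = Node("tag", children=[
    Node("A", "1", from_attribute=True),
    Node("B", "2", from_attribute=True),
    Node("C", "3", from_attribute=True),
    Node("D", "4"),
    Node("E", "4"),
])
xml = to_xml(tag)
print(xml)
```

All five items are stored as child nodes, yet the round trip back to XML still knows which ones were attributes.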
Sorry, I was in a hurry that morning and the formatting is terrible.
I think it's easier to read the code if it matches the old XML syntax as closely as possible. So, please keep the <> braces. [] and {} are hard to type on a German QWERTZ keyboard, where <> is very easy to type.
Next, keep the attributes the old style. In order to do that, we need a separator for the start of the contents of an element. I'd pick the ':' colon character for that, but I'm open for better ideas. Using the '\' for escaping is excellent.
So the XML/XHTML
<a href="link" target="blank" >Click here > </a>
[] and {} are hard to type on a German QWERTZ keyboard
As stated already in other comments, there is unfortunately no bracket pair that is easy to type on all keyboards. Therefore I chose the pair that is easiest to type for most people in the world. Maybe a parameter could be added to the parser, so that users can choose their preferred bracket pair ([], <>, {}, or ()). However, this solution has disadvantages too, as explained in another comment.
Gernot Frisch wrote:
I'd pick the ':' colon character for that, but I'm open for better ideas.
':' is problematic, because it's used already as namespace separator. <foo:bar> is parsed as an empty element with name 'bar' in namespace 'foo'. So I'll continue my comment with '/' as separator (arbitrary choice).
Gernot Frisch wrote:
keep the attributes the old style. In order to do that, we need a separator for the start of the contents of an element.
Your syntax reduces verbosity for elements that have only attributes, or attributes and text:
But it makes elements with text only less user-friendly:
pXML: [div text]
your syntax: [div / text]
or: [div/text]
I compared many other examples. An important point is to consider typical markup code (e.g. XHTML):
pXML: [p This is [i italic] and [b bold]]
Your syntax: [p/This is [i/italic] and [b/bold]]
In this case the pXML code is easier to read and write.
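To make the markup example concrete, here is a minimal sketch of a converter for the tag-only subset of pXML shown above. It is not the real pXML parser: parse_node and pxml_to_xml are hypothetical names, and attributes and backslash escapes are deliberately left out.

```python
from xml.sax.saxutils import escape


def parse_node(src: str, pos: int = 0):
    """Parse one [name content] node at src[pos]; return (xml, next_pos)."""
    if pos >= len(src) or src[pos] != "[":
        raise ValueError(f"expected '[' at position {pos}")
    pos += 1
    start = pos
    # The node name runs up to the first space or bracket.
    while pos < len(src) and src[pos] not in " []":
        pos += 1
    name = src[start:pos]
    if not name:
        raise ValueError(f"missing node name at position {start}")
    # A single space separates the name from the content.
    if pos < len(src) and src[pos] == " ":
        pos += 1
    parts = []
    while pos < len(src) and src[pos] != "]":
        if src[pos] == "[":
            # Nested node: recurse and continue after its closing bracket.
            child, pos = parse_node(src, pos)
            parts.append(child)
        else:
            # Plain text: escape characters that are special in XML.
            parts.append(escape(src[pos]))
            pos += 1
    if pos >= len(src):
        raise ValueError("missing closing ']'")
    return f"<{name}>{''.join(parts)}</{name}>", pos + 1


def pxml_to_xml(src: str) -> str:
    xml, _ = parse_node(src.strip())
    return xml


print(pxml_to_xml("[p This is [i italic] and [b bold]]"))
```

Because the name ends at the first space, no extra separator character is needed between the name and the text.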
As with brackets, there is no absolute-best-one-size-fits-all syntax. The challenge is to choose (among the infinite set of possible syntaxes) a syntax that is well suited in most cases.
In the context of pXML, it is important to have a user-friendly syntax for markup code and config data.
Moreover, it is always possible to provide optional lenient parsing for specific domains. However, lenient parsing requires look-ahead parsing (and maybe also regexes), which makes parsing more complex and less efficient. Lenient parsing also requires more rules, and sometimes there must be specific rules for specific tags. Therefore lenient parsing should not be part of basic pXML.
However, it can make the syntax much more user-friendly in specific use cases. As explained in the article, I use lenient parsing in PML (it makes PML code succinct). Here is an example:
strict pXML: [image (source=ball.png title="Red ball")]
lenient PML 1: [image source=ball.png title="Red ball"] // parentheses not required
lenient PML 2: [image source=ball.png title=Red ball] // quotes not required, even if value contains spaces
final PML: [image ball.png title=Red ball] // name not required for default attribute
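The following sketch illustrates why unquoted values with spaces need look-ahead: a value only ends where the next name= begins. This is not how PML is actually implemented; parse_lenient_attributes is a hypothetical helper, and it assumes that values never contain a word followed by '='.

```python
import re

# An attribute starts wherever a word is immediately followed by '='.
ATTR_START = re.compile(r"(\w+)=")


def parse_lenient_attributes(src: str) -> dict:
    """Split 'name=value name=value' where values may contain spaces."""
    matches = list(ATTR_START.finditer(src))
    attrs = {}
    for i, m in enumerate(matches):
        # Look ahead: the value ends where the next attribute name starts,
        # or at the end of the input for the last attribute.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(src)
        value = src[m.end():end].strip()
        # Quotes are optional; strip them if present.
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        attrs[m.group(1)] = value
    return attrs


print(parse_lenient_attributes("source=ball.png title=Red ball"))
print(parse_lenient_attributes('source=ball.png title="Red ball"'))
```

Both calls yield the same two attributes, but the parser has to scan ahead for the next name before it knows where a value stops, which is the extra complexity mentioned above.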
Your bracket choice of [ ] may be fine for a QWERTY keyboard, but here in France we live with an AZERTY one. It requires CTRL and ALT. I do not like to trigger this pair: DEL (SUPPR on a French keyboard) is not far from reach!
How many people in the world are using an AZERTY (French) keyboard? And how many use a QWERTY (English) keyboard?
If a designer has to favor one or the other he/she will select the keyboard that most people use.
Unfortunately there is no bracket pair that is easy to type on all keyboard layouts (as can easily be verified here).
To minimize the need for typing inconvenient key combinations, it might be useful to use an editor/tool that allows you to reconfigure your keyboard, use hotkeys, predefined code snippets, auto-completion, etc.
Perhaps a better solution would be to allow an optional pml document header that allows the writer to specify what the bracket characters are, and the tokenizer adjusts appropriately? Make the default [] and let the rest of the world pick ones that work better for their keyboard layouts?
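As a rough sketch of that idea, a pre-pass could rewrite a document that uses another bracket pair into the default [] form before tokenizing. Everything here is an assumption: normalize_brackets is a hypothetical function, no header syntax is defined, and the sketch assumes that backslash escapes a bracket and that brackets inside quoted values get no special treatment.

```python
def normalize_brackets(src: str, open_ch: str, close_ch: str) -> str:
    """Rewrite src, written with open_ch/close_ch as delimiters, to use [ and ]."""
    out = []
    i = 0
    while i < len(src):
        ch = src[i]
        if ch == "\\" and i + 1 < len(src):
            nxt = src[i + 1]
            if nxt in (open_ch, close_ch) and nxt not in "[]":
                # An escaped custom bracket is plain text in the [] form.
                out.append(nxt)
            else:
                # Keep any other escape sequence unchanged.
                out.append(ch + nxt)
            i += 2
        elif ch == open_ch:
            out.append("[")
            i += 1
        elif ch == close_ch:
            out.append("]")
            i += 1
        elif ch in "[]":
            # Literal square brackets would become delimiters, so escape them.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(normalize_brackets("{p This is {i italic} and {b bold}}", "{", "}"))
print(normalize_brackets("<p a[0] \\< b>", "<", ">"))
```

Note that choosing () as the pair would clash with the parentheses pXML already uses for attributes, which is one of the disadvantages of making the brackets configurable.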
(doc title:"Test"
  (ch title:"An Unusual Surprise"
    (p Look at the following picture:)
    (image source:"images/strawberries.jpg")
    (p Text of paragraph 2)
    (p Text of paragraph 3)))
Seems like you've got a crippled re-implementation of Lisp syntax.
Every year modern programming languages get a little bit closer to the feature set of mid-eighties Lisp, so I'm not surprised that the syntax itself is being independently rediscovered.
TLDR - All languages converge on Lisp features, and all syntax converges on S-expressions.
Seems like you've got a crippled re-implementation of Lisp syntax.
So you took a pXML example, replaced [] by (), replaced the indentation at the end with ))) (to make it look like Lisp), and then tell people that pXML is just "a crippled re-implementation of Lisp syntax". Unbelievable! Terribly unfair and simply wrong!
Member 13301679 wrote:
All languages converge on Lisp features
Many languages use brackets to define boundaries (e.g. C- and Java-like languages use {}, XML uses <>, pXML uses []). But that doesn't make these languages "Lisp-like" if you just replace the brackets with (). Wrong and useless statement.
Member 13301679 wrote:
all syntax converges on S-expressions
Wrong too (unless you mean that all syntax is just a list of tokens). pXML does not use s-expressions, although it might seem like that for people who don't understand the basic concepts. Consider this:
s-expression: (a b c d)
pXML: [a b c d]
The first example (s-expression) denotes a list with four elements: a, b, c, and d.
The second example (pXML), which is conceptually the same as writing <a>b c d</a> in XML, denotes a tree node with name a and with string "b c d" as content.
That's semantically very different!
Moreover, the syntax for attributes in XML and pXML (e.g. a = "b c d") is totally unrelated to s-expressions or Lisp.
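The difference can be shown with a small sketch. Both readers below are hypothetical helpers that only handle flat, unnested input; they are meant to show the resulting data, not to be real parsers for either syntax.

```python
def read_sexpr_list(src: str) -> list:
    """Read a flat s-expression such as '(a b c d)' as a list of atoms."""
    inner = src.strip()
    if not (inner.startswith("(") and inner.endswith(")")):
        raise ValueError("expected (...)")
    # Every whitespace-separated token is one element of the list.
    return inner[1:-1].split()


def read_pxml_node(src: str) -> dict:
    """Read a flat pXML node such as '[a b c d]' as a name plus text content."""
    inner = src.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError("expected [...]")
    # Only the first token is the name; the rest is one text string.
    name, _, text = inner[1:-1].partition(" ")
    return {"name": name, "text": text}


print(read_sexpr_list("(a b c d)"))
print(read_pxml_node("[a b c d]"))
```

The first call gives four separate elements, while the second gives one node named a whose content is the single string "b c d".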
So you took a pXML example, replaced [] by (), replaced the indentation at the end with ))) (to make it look like Lisp), and then tell people that pXML is just "a crippled re-implementation of Lisp syntax". Unbelievable! Terribly unfair and simply wrong!
Actually, Lisp came first, so yeah, pXML simply replaces all the '(' with '[' and all the ')' with ']'. This is exactly what I meant - pXML takes Lisp syntax and superficially changes some characters.
Quote:
Member 13301679 wrote:
All languages converge on Lisp features
Many languages use brackets to define boundaries (e.g. C- and Java-like languages use {}, XML uses <>, pXML uses []). But that doesn't make these languages "Lisp-like" if you just replace the brackets with (). Wrong and useless statement.
That's not "features", that's "syntax". When I say that all programming languages converge on Lisp feature-wise, I mean that they eventually get features that were in Lisp 30 years ago.
Syntax is separate.
Quote:
Wrong too (unless you mean that all syntax is just a list of tokens)
It literally is. That's literally what the AST in all programming languages is!
Quote:
The first example (s-expression) denotes a list with four elements: a, b, c, and d.
The second example (pXML), which is conceptually the same as writing <a>b c d</a> in XML, denotes a tree node with name a and with string "b c d" as content.
That's semantically very different!
Not in Lisp. This valid Lisp code:
(defmacro a (&body rest)
  (progn
    (format t "<a>")
    (dolist (r rest)
      (format t "~a " r))
    (format t "</a>")))
Turns any occurrence of "(a b c d)" into a tree "<a>b c d</a>". Running this in my terminal gives me this:
$ cat t.lisp
(defmacro a (&body rest)
  (progn
    (format t "<a>")
    (dolist (r rest)
      (format t "~a " r))
    (format t "</a>")))
(a b c d)
$ clisp -q < t.lisp
A
<a>B C D </a>
So, yeah, in Lisp every first element of an s-expression can be turned into the root of a tree that holds every other element, recursively.
I know this, because in 2001 I wrote a system for a client to generate the HTML and DOM from s-expressions. Generating html was as simple as this:
And, IIRC, I wasn't the only one doing stuff like this. XML is a subset of Lisp s-expressions, html is a subset of XML. Your proposal is html transformed, hence it's a subset of Lisp s-expressions.
It also reminded me of Lisp when I first saw it. The critique may be harsh but I think mostly correct.
The Lisp-HTML approach shown seems a little bit more readable to me, and if I'm not wrong there are some systems that still use it today for some webpages.
I think some comparisons to HTML templating engines(?) (e.g. Emmet, pug, ...) might be good, as they also try to reduce complexity or make it more human-readable but still support the full HTML feature set, since they are transpiled into HTML.
I'm also not sure whether pXML is really more readable for humans. The nesting shown seems not necessarily easier to read for deep and complex XML documents. And with good syntax highlighting XML is good enough.
I dislike your proposal, which is not simple on a French keyboard, where the [ and ] characters are not easy to type!!! What you propose is interesting, but it is only a bad dream. You never speak about positional attributes or named attributes.