|
Very useful comment! Thank you.
rhyous wrote: needs to write a serializer and a deserializer in the top 10 most popular languages
I am currently working on:
- an open-source serializer/deserializer in Java
- an article to introduce the serializer/deserializer (to be published here on codeproject)
- a dedicated pXML website, where users can submit implementations in other languages
Everything is planned to be published before the end of next month (May 2021).
rhyous wrote: Even though you don't need a shift, ...
Good explanation showing why different people have different preferences when it comes to the choice of brackets.
rhyous wrote: a syntax you could put on the first line that would change the bracket type
That's an interesting idea worth pondering.
However, adding such a parameter would increase complexity. The rules for escaping become more complicated, readers/writers risk becoming less efficient, pXML documents become less uniform, and incompatibility issues might occur in practice.
Suppose the following pXML that uses standard brackets []:
[doc
[foo This text contains \[, \], < and >]
]
A user who prefers <> would write the same code like this:
#BRACKETS=<>
<doc
<foo This text contains [, ], \< and \>>
>
Unescaping now becomes more complex for deserializers (readers), and possibly less efficient, because they now have to consider the #BRACKETS parameter.
Complexity is also increased for serializers (writers), because they have to apply escape rules that depend on the #BRACKETS parameter.
If somebody decides to change the brackets in a document, they must take care to adapt the escape sequences as well.
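To make the dependency concrete, here is a minimal Python sketch (not the actual pXML implementation). The function names `escape_text` and `unescape_text` are hypothetical, and I assume that only the backslash and the active bracket pair need escaping in text:

```python
def escape_text(text: str, brackets: str = "[]") -> str:
    """Escape the backslash and the active bracket pair in a text node.

    Assumption: only these three characters are special inside text.
    """
    open_b, close_b = brackets
    special = {"\\", open_b, close_b}
    return "".join("\\" + ch if ch in special else ch for ch in text)


def unescape_text(text: str, brackets: str = "[]") -> str:
    """Reverse escape_text; must be called with the same bracket pair."""
    open_b, close_b = brackets
    special = {"\\", open_b, close_b}
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in special:
            out.append(text[i + 1])  # drop the backslash, keep the character
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


raw = "This text contains [, ], < and >"
print(escape_text(raw, "[]"))  # escapes [ and ] only
print(escape_text(raw, "<>"))  # escapes < and > only
```

Both functions need the bracket pair as an extra argument, so the same text is serialized differently depending on the #BRACKETS setting, and a reader that assumes the wrong pair will unescape it incorrectly.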
To avoid confusion, one could decide that all types of brackets must always be escaped, but that would be very inconvenient for people who use a lot of brackets in their texts.
Moreover, if pXML snippets from different sources are merged (a pXML feature for later), things get even more complicated when different snippets use different brackets. Readers must then be able to change brackets on the fly.
I experienced this complexity myself a while ago when I had to create a parser with this kind of flexibility.
Unfortunately, it seems there is no one-size-fits-all solution for brackets.
The basic pXML syntax should be kept as simple as possible, because this makes it easier for people to create readers/writers. Maybe a brackets parameter could later be added as an optional extension or an experimental feature, before taking a final decision based on community feedback.
|
Thanks, your article hit upon an issue I have been dealing with. I really appreciate the analysis of [] versus alternatives and the thought given to making it easy to read. My ambition was to create something much simpler, just a few attributes, to mark up code for UI help etc., all of which is generated. Good job.
Bob Steiner
|
The following is valid XML:
<tag A="1" B="2" C="3"><D>4</D><E>5</E></tag>
My assumption is that this would convert to PXML as:
[tag (A=1 B=2 C=3)[D 4][E 5]]
Which would be fine and not lossy; however, in your paper you said that the PXML parser would surface both attributes and child nodes as child nodes. So would the node object then have a type (attribute or child)? Otherwise, it would seem we'd lose the knowledge of which was which.
|
David On Life wrote: My assumption is that this would convert to PXML as:
[tag (A=1 B=2 C=3)[D 4][E 5]]
Yes, correct!
When a pXML document is parsed into memory, each node has a flag to indicate if it has been defined with the attributes or tag syntax. Hence the information is not lost, and it is considered when pXML is converted to XML, or vice versa.
In my next article about the parser I will show examples of this.
I will also add a note in this article to clarify this.
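Until then, here is a minimal Python sketch of the idea (not the actual parser). The class `Node`, its field `is_attribute`, and the function `to_xml` are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass, field
from typing import List
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Node:
    name: str
    text: str = ""
    # True if the node was written with the (name=value) attribute syntax,
    # False if it was written with the [name ...] tag syntax.
    is_attribute: bool = False
    children: List["Node"] = field(default_factory=list)


def to_xml(node: Node) -> str:
    """Convert a node to XML, using the flag to choose attribute or element."""
    attrs = "".join(
        f" {c.name}={quoteattr(c.text)}" for c in node.children if c.is_attribute
    )
    body = escape(node.text) + "".join(
        to_xml(c) for c in node.children if not c.is_attribute
    )
    return f"<{node.name}{attrs}>{body}</{node.name}>"


# In-memory form of: [tag (A=1 B=2 C=3)[D 4][E 5]]
tag = Node("tag", children=[
    Node("A", "1", is_attribute=True),
    Node("B", "2", is_attribute=True),
    Node("C", "3", is_attribute=True),
    Node("D", "4"),
    Node("E", "5"),
])
print(to_xml(tag))
```

All five nodes are children of `tag`, but the flag lets the writer restore A, B and C as attributes and D and E as child elements.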
|
I use XML mostly with attributes. I'd prefer the syntax:
<tag att="value" : text of tag>
So your first example would be
<i:foo>
|
Sorry, but I really don't understand what you mean. Could you explain please?
|
Sorry, I was in a hurry that morning and the formatting is terrible.
I think it's easier to read the code if it matches the old XML syntax as closely as possible. So, please keep the <> braces. [] and {} are hard to type on a German QWERTZ keyboard, where <> is very easy to type.
Next, keep the attributes in the old style. In order to do that, we need a separator for the start of the contents of an element. I'd pick the ':' colon character for that, but I'm open to better ideas. Using the '\' for escaping is excellent.
So the XML/XHTML
<a href="link" target="blank" >Click here > </a>
becomes:
<a href="link" target=blank : Click here\> >
Preview:
Click here > (note the > character!)
To me, that is very readable, because it's very close to the original and it's very short to type.
|
Interesting suggestion for a different syntax!
Gernot Frisch wrote: [] and {} are hard to type on a German QWERTZ keyboard
As stated already in other comments, there is unfortunately no bracket pair that is easy to type on all keyboards. Therefore I chose the pair that is easiest to type for most people in the world. Maybe a parameter could be added to the parser, so that users can choose their preferred bracket pair ([], <>, {}, or ()). However, this solution has disadvantages too, as explained in another comment[^]
Gernot Frisch wrote: I'd pick the ':' colon character for that, but I'm open for better ideas.
':' is problematic, because it's used already as namespace separator. <foo:bar> is parsed as an empty element with name 'bar' in namespace 'foo'. So I'll continue my comment with '/' as separator (arbitrary choice).
Gernot Frisch wrote: keep the attributes the old style. In order to do that, we need a separator for the start of the contents of an element.
Your syntax reduces verbosity for elements that have only attributes, or attributes and text:
pXML: [img (src=ball.png alt=Ball)]
your syntax: [img src=ball.png alt=Ball]
pXML: [a (href=link target=blank) Click here]
your syntax: [a href=link target=blank / Click here]
But it makes elements with text only less user-friendly:
pXML: [div text]
your syntax: [div / text]
or: [div/text]
I compared many other examples. An important point is to consider typical markup code (e.g. XHTML):
pXML: [p This is [i italic] and [b bold]]
Your syntax: [p/This is [i/italic] and [b/bold]]
In this case the pXML code is easier to read and write.
As with brackets, there is no absolute-best-one-size-fits-all syntax. The challenge is to choose (among the infinite set of possible syntaxes) a syntax that is well suited in most cases.
In the context of pXML, it is important to have a user-friendly syntax for markup code and config data.
Moreover, it is always possible to provide optional lenient parsing for specific domains. However, lenient parsing requires look-ahead parsing (and maybe also regexes), which makes parsing more complex and less efficient. Lenient parsing also requires more rules, and sometimes there must be specific rules for specific tags. Therefore lenient parsing should not be part of basic pXML.
However, it can make the syntax much more user-friendly in specific use cases. As explained in the article, I use lenient parsing in PML[^] (it makes PML code succinct). Here is an example:
strict pXML: [image (source=ball.png title="Red ball")]
lenient PML 1: [image source=ball.png title="Red ball"]
lenient PML 2: [image source=ball.png title=Red ball]
final PML: [image ball.png title=Red ball]
Thanks for your suggestions.
|
Nice and clean. XML is just too darn wordy.
Roger House
|
Your bracket choice of [ ] may be fine for a QWERTY keyboard, but we live here in France with an AZERTY one, where it requires CTRL and ALT. I do not like to type this pair: the DEL key (SUPPR on a French keyboard) is not far from reach!
|
> we live here in France with an AZERTY one
How many people in the world are using an AZERTY (French) keyboard? And how many use a QWERTY (English) keyboard?
If a designer has to favor one or the other, he/she will select the keyboard that most people use.
Unfortunately there is no bracket pair that is easy to type on all keyboard layouts (as can easily be verified here).
To minimize the need for typing inconvenient key combinations, it might be useful to use an editor/tool that allows you to reconfigure your keyboard, use hotkeys, predefined code snippets, auto-completion, etc.
modified 22-Apr-21 3:35am.
|
Perhaps a better solution would be to allow an optional pml document header that allows the writer to specify what the bracket characters are, and the tokenizer adjusts appropriately? Make the default [] and let the rest of the world pick ones that work better for their keyboard layouts?
|
Yes, that's a good suggestion. Thank you.
See my answer[^] to another member who suggested a BRACKETS parameter.
|
I agree 100 % with the rationale behind PML and pXML
|
Glad you liked it. Thank you so much.
|
Meanwhile, in Lisp syntax:
(doc title:"Test"
  (ch title:"An Unusual Surprise"
    (p Look at the following picture:)
    (image source:"images/strawberries.jpg")
    (p Text of paragraph 2)
    (p Text of paragraph 3)))
Seems like you've got a crippled re-implementation of Lisp syntax.
Every year modern programming languages get a little bit closer to the feature set of mid-eighties Lisp, so I'm not surprised that the syntax itself is being independently rediscovered.
TLDR - All languages converge on Lisp features, and all syntax converges on S-expressions.
|
Member 13301679 wrote: Seems like you've got a crippled re-implementation of Lisp syntax.
So you took a pXML example, replaced [] by () , replaced the indentation at the end with ))) (to make it look like Lisp), and then tell people that pXML is just "a crippled re-implementation of Lisp syntax". Unbelievable! Terribly unfair and simply wrong!
Member 13301679 wrote: All languages converge on Lisp features
Many languages use brackets to define boundaries (e.g. C- and Java-like languages use {} , XML uses <> , pXML uses [] ). But that doesn't make these languages "Lisp-like" if you just replace the brackets with () . Wrong and useless statement.
Member 13301679 wrote: all syntax converges on S-expressions
Wrong too (unless you mean that all syntax is just a list of tokens). pXML does not use s-expressions, although it might seem like that for people who don't understand the basic concepts. Consider this:
s-expression: (a b c d)
pXML: [a b c d]
The first example (s-expression) denotes a list with four elements: a , b , c , and d .
The second example (pXML), which is conceptually the same as writing <a>b c d</a> in XML, denotes a tree node with name a and with string "b c d" as content.
That's semantically very different!
Moreover, the syntax for attributes in XML and pXML (e.g. a = "b c d" ) is totally unrelated to s-expressions or Lisp.
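To illustrate the difference, here is a minimal Python sketch of a reader for a small pXML subset (tags, text and backslash escapes only; no attributes). It is not the actual pXML parser: the names `parse_node` and `parse_sexpr_atoms` are hypothetical, and the whitespace handling (one separator after the name) is an assumption.

```python
def parse_node(src: str, pos: int = 0):
    """Parse one '[name content]' node at src[pos]; return (node, next_pos).

    A node is a (name, content) tuple; content is a list that mixes
    text strings and child nodes.
    """
    if src[pos] != "[":
        raise ValueError(f"expected '[' at position {pos}")
    pos += 1
    start = pos
    while pos < len(src) and not src[pos].isspace() and src[pos] not in "[]":
        pos += 1
    name = src[start:pos]
    if pos < len(src) and src[pos].isspace():
        pos += 1  # skip the single separator after the name (assumption)
    content, text = [], []
    while pos < len(src):
        ch = src[pos]
        if ch == "\\" and pos + 1 < len(src):
            text.append(src[pos + 1])  # escaped character is taken literally
            pos += 2
        elif ch == "[":
            if text:
                content.append("".join(text))
                text = []
            child, pos = parse_node(src, pos)
            content.append(child)
        elif ch == "]":
            if text:
                content.append("".join(text))
            return (name, content), pos + 1
        else:
            text.append(ch)
            pos += 1
    raise ValueError("missing closing ']'")


def parse_sexpr_atoms(src: str):
    """Split a flat s-expression such as '(a b c d)' into its atoms."""
    return src.strip()[1:-1].split()


print(parse_sexpr_atoms("(a b c d)"))  # a list of four atoms
print(parse_node("[a b c d]")[0])      # a node named a with one text string
```

The s-expression reader yields four separate atoms, while the pXML reader yields one node named a whose content is the single string "b c d". Mixed content such as [p This is [i italic] and [b bold]] comes out as text interleaved with child nodes.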
|
Quote: So you took a pXML example, replaced [] by (), replaced the indentation at the end with ))) (to make it look like Lisp), and then tell people that pXML is just "a crippled re-implementation of Lisp syntax". Unbelievable! Terribly unfair and simply wrong!
Actually, Lisp came first, so yeah, pXML simply replaces all the '(' with '[' and all the ')' with ']'. This is exactly what I meant - pXML takes Lisp syntax and superficially changes some characters.
Quote:
Member 13301679 wrote:
All languages converge on Lisp features
Many languages use brackets to define boundaries (e.g. C- and Java-like languages use {}, XML uses <>, pXML uses []). But that doesn't make these languages "Lisp-like" if you just replace the brackets with (). Wrong and useless statement.
That's not "features", that's "syntax". When I say that all programming languages converge on Lisp feature-wise, I mean that they eventually get features that were in Lisp 30 years ago.
Syntax is separate.
Quote: Wrong too (unless you mean that all syntax is just a list of tokens)
It literally is. That's literally what the AST in all programming languages is!
Quote: The first example (s-expression) denotes a list with four elements: a, b, c, and d.
The second example (pXML), which is conceptually the same as writing <a>b c d</a> in XML, denotes a tree node with name a and with string "b c d" as content.
That's semantically very different!
Not in Lisp. This valid Lisp code:
(defmacro a (&body rest)
  (progn
    (format t "<a>")
    (dolist (r rest)
      (format t "~a " r))
    (format t "</a>")))
Turns any occurrence of "(a b c d)" into a tree "<a>b c d</a>". Running this in my terminal gives me this:
$ cat t.lisp
(defmacro a (&body rest)
(progn
(format t "<a>")
(dolist (r rest)
(format t "~a " r))
(format t "</a>")))
(a b c d)
$ clisp -q < t.lisp
A
<a>B C D </a>
So, yeah, in Lisp every first element of an s-expression can be turned into the root of a tree that holds every other element, recursively.
I know this, because in 2001 I wrote a system for a client to generate the HTML and DOM from s-expressions. Generating html was as simple as this:
(html
  (body
    (h1 "Login")
    (form action: /myform.php method: get
      (span "Enter username")
      (input name: username)
      (span "Enter password")
      (input name: password type: password)
      (checkbox name: must_remember "Remember me!")
      (button type: submit "Login"))))
And, IIRC, I wasn't the only one doing stuff like this. XML is a subset of Lisp s-expressions, HTML is a subset of XML. Your proposal is HTML transformed, hence it's a subset of Lisp s-expressions.
|
It also reminded me of Lisp when I first saw it. The critique may be harsh but I think mostly correct.
The Lisp-HTML approach shown seems a little bit more readable to me, and if I'm not wrong there are some systems that still use it today for some webpages.
I think some comparisons to HTML templating engines (e.g. Emmet, pug, ...) might be good, as they also try to reduce complexity or make it more human-readable but still support the full HTML feature set, because they are transpiled into HTML.
I'm also not sure whether pXML is really more readable for humans. The nesting shown seems not necessarily easier to read for deep and complex XML documents. And with good syntax highlighting XML is good enough.
|