|
Thank you for the suggestion. I have VS Code, so it would be a useful add-on.
Regards
|
|
|
|
|
|
|
|
I have - it caused me to look out my old copy of MS FrontPage.
|
|
|
|
|
ah, well, that sucks.
I have always hand-rolled my HTML and CSS. I've never used an editor other than Visual Studio, VS Code, etc.
|
|
|
|
|
Thank you for the suggestion - I will investigate it
Regards
|
|
|
|
|
It's usually in reference to XML parsers, but it's a generic parsing model that can apply to parsing anything. Contrast .NET's XmlTextReader (a pull parser) with a SAX XML parser (a push parser)
The reason I ask is because I use the term a lot in my articles lately, and I'm trying to figure out if it might be worth it to write an article about the concept.
I don't want to waste time with it if it's something most people have heard of before.
It's hard for me to know because I deep-dived into parsing for a year and everything is familiar to me now.
Real programmers use butterflies
|
|
|
|
|
Never heard of it
|
|
|
|
|
Thanks. I'm just trying to determine if it might be worth an article of its own since I've implemented so many of them.
They basically work like this (example for JSON):
if (!fileLC.open("./data.json")) {
    printf("Json file not found\r\n");
    return;
}
JsonReader jsonReader(fileLC);
long long int nodes = 0;
milliseconds start = duration_cast<milliseconds>(system_clock::now().time_since_epoch());
bool done = false;
while (!done && jsonReader.read()) {
    ++nodes;
    switch (jsonReader.nodeType()) {
    case JsonReader::Value:
        printf("Value ");
        switch (jsonReader.valueType()) {
        case JsonReader::String:
            printf("String: %s\r\n", jsonReader.value());
            break;
        case JsonReader::Real:
            printf("Real: %f\r\n", jsonReader.realValue());
            break;
        case JsonReader::Integer:
            printf("Integer: %lli\r\n", jsonReader.integerValue());
            break;
        case JsonReader::Boolean:
            printf("Boolean: %s\r\n", jsonReader.booleanValue() ? "true" : "false");
            break;
        case JsonReader::Null:
            printf("Null: (null)\r\n");
            break;
        default:
            printf("Undefined!\r\n");
            break;
        }
        break;
    case JsonReader::Field:
        printf("Field %s\r\n", jsonReader.value());
        break;
    case JsonReader::Object:
        printf("Object (Start)\r\n");
        break;
    case JsonReader::EndObject:
        printf("Object (End)\r\n");
        break;
    case JsonReader::Array:
        printf("Array (Start)\r\n");
        break;
    case JsonReader::EndArray:
        printf("Array (End)\r\n");
        break;
    case JsonReader::Error:
        printf("Error: (%d) %s\r\n", jsonReader.lastError(), jsonReader.value());
        done = true;
        break;
    }
}
milliseconds end = duration_cast<milliseconds>(system_clock::now().time_since_epoch());
printf("Scanned %lli nodes and %llu characters in %d milliseconds using %d bytes of LexContext\r\n",
       nodes, fileLC.position() + 1, (int)(end.count() - start.count()), (int)fileLC.used());
fileLC.close();
Sorry for the long code. As you can see, they are not at all easy to use, but they are easier than a SAX XML parser.
Some advanced ones (like my recent offerings) support querying and data extraction to make it easier. In the case of mine, it's also more efficient to query than it is to stupidly read() through the whole file like the above. But the above is a standard pull parse.
Real programmers use butterflies
|
|
|
|
|
Is it a pushmi-pullyu[^] that eats parsnips?
"I have no idea what I did, but I'm taking full credit for it." - ThisOldTony
"Common sense is so rare these days, it should be classified as a super power" - Random T-shirt
AntiTwitter: @DalekDave is now a follower!
|
|
|
|
|
Hmmm,
I don't think the terminology you are using is as universal as you think it is. Outside of XML/JSON parsing and maybe compiler construction nobody uses that label for such a simple algorithm. It looks like Stefan Haustein came up with that name when he was writing kXML. Then it seems Aleksander Slominski wrote a paper using the same nomenclature in 1998 and it's been growing ever since. I can't find any reference to 'Pull parsing' before 1998.
Looks like 100% of the patents that mention 'Pull Parsing' are XML related[^].
I'm going to rename it Pull-My-Finger parsing and see if it catches on.
|
|
|
|
|
I don't know how universal it is - that's what I'm trying to figure out. I just don't know that there's another term for the model.
I've implemented pull parsers for all kinds of sources. Recently, I implemented a crazy efficient querying pull parser that can process bulk JSON even on an 8-bit Arduino with 8 KB of RAM (it actually needs a lot less RAM than that in practice), and it blazes on a real computer.
Diet JSON and a Coke: An exploration of incredibly efficient JSON processing[^]
Real programmers use butterflies
|
|
|
|
|
honey the codewitch wrote: I don't know how universal it is - that's what I'm trying to figure out. I just don't know that there's another term for the model.
It doesn't really matter... occupational nomenclature is invented. If you write an article on 'Pull Parsers' then perhaps 1,000,000 more people will use that name. That's sorta how it works.
honey the codewitch wrote: I've implemented pull parsers for all kinds of sources.
I can see that you enjoy parsing. And I enjoy reading your journey.
Best Wishes,
-David Delaune
|
|
|
|
|
I'm pretty much with Randor on this: you should use it; people should have the wherewithal to look it up, or if it's an article aimed at beginners, a brief description (plus more in-depth links) of the terms might be appropriate. It's a technical article, so technical language is fine, and you'll help spread the terms.
To answer your direct question - no, I haven't heard of push/pull parsers, but I mostly worked it out from the context.
|
|
|
|
|
Thank you. That's helpful.
Real programmers use butterflies
|
|
|
|
|
I've learned quite a lot from your musings in the lounge, but I've only skimmed through your technical articles on parsing, as they are way too specific for my needs.
Which means my knowledge on parsing is still fairly superficial, so any reasonably easy to read breakdown on the principles (not just push vs pull) would be appreciated.
Wrong is evil and must be defeated. - Jeff Ello
Never stop dreaming - Freddie Kruger
|
|
|
|
|
Thanks. Here's a comment I just posted to RickZeeland which should hopefully serve as a quick explanation. I included code in it just so you could see it in all its ugliness.
The Lounge - Pull Parsing[^]
Real programmers use butterflies
|
|
|
|
|
Write a Wikipedia article. That'll make it true.
I can't imagine any kind of reader/parser which doesn't tokenize by pulling.
|
|
|
|
|
That's not exactly what a pull parser is.
A pull parser parses one small step at a time before returning control to the caller.
while(reader.read()) {...}
You call it like that, and inside the loop you check the nodeType() and the value() and such to get information about the node at the current location.
Microsoft built one for XML in .NET called XmlReader - you've probably used a derivative of it before, if not directly, then indirectly by way of another XML facility like XPath or the DOM.
Newtonsoft has one for JSON, but I don't like it, personally.
Real programmers use butterflies
|
|
|
|
|
Yeah, I do that, but it's at a higher level. So -- for instance -- when my loader finds an array of Widgets, it iterates all the Widgets in that array, loading each into the database.
|
|
|
|
|
Yeah, I build that kind of stuff on top of the pull parser. In my Diet JSON and a Coke article I go into that - constructing queries out of navigation and data extraction elements.
You basically build queries and then feed those to the reader, and it drives the reader for you (in fact, it's more efficient than calling read() yourself).
Real programmers use butterflies
|
|
|
|
|
I don't query or search, I simply iterate tokens until I reach the start of an array of objects I'm interested in.
Then I iterate those objects.
That way, I read each file only once.
For the most part, each of the files I'm reading is just one array of objects and I load the whole thing into one database table.
Only the most recent files I'm working with contain multiple arrays containing different types of objects -- and each type of object gets thrown at a different database table.
|
|
|
|
|
I made my parser with selective bulk loading of machine-generated JSON in mind, which means when you search, it does partial parsing and no normalization, allowing it to find what you're after FAST at the expense of some of the well-formedness checking (but like I said, it's geared for machine-generated dumps).
Not that it matters in a .NET environment, but my parser also will not use memory to hold anything you didn't explicitly request, which means you only need enough bytes to scan the file and then store your results. I often do queries with about 256 bytes of RAM to work with. It doesn't even compare field names or undecorate strings in memory - it does it right off the input source (usually a disk, a socket, or a string).
My latest codebase I'm working on will even allow you to stream value elements (field values and array members) so you can read massive BLOB values in the document. Gigabytes.
Real programmers use butterflies
modified 24-Dec-20 17:51pm.
|
|
|
|
|
honey the codewitch wrote: selective bulk loading
Yup.
honey the codewitch wrote: machine generated JSON
Yup.
honey the codewitch wrote: partial parsing
Supported.
honey the codewitch wrote: no normalization
That's up to a higher level to determine.
honey the codewitch wrote: at the expense of some of the well formedness checking
Basically none.
honey the codewitch wrote: It doesn't even compare field names
Why would it? That's up to a higher level to determine.
honey the codewitch wrote: undecorate strings in memory
Unquote? Unescape? I do that as late as possible, not until I know I want the value.
Bear in mind also that the underlying reader/tokenizer (?) is not used only for JSON, but for CSV as well.
 ___________________________________
|              Loader               |
|___________________________________|
| JSONenumerator |  | CSVenumerator |
|________________|  |_______________|
| JSONtokenizer  |  | CSVtokenizer  |   Unquoting and unescaping happen here, as appropriate
|________________|__|_______________|
|      STREAMtokenizer (base)       |
|===================================|
|            TextReader             |
|===================================|
|
|
|
|