|
Brandish an ancient weapon at the bard (11)
// TODO: Insert something here
|
|
|
|
|
SHAKESPEARE - the bard
brandish - SHAKE
ancient weapon - SPEAR(E)
|
|
|
|
|
Well done
Tag, you're it tomorrow.
// TODO: Insert something here
|
|
|
|
|
Thanks. Definitely a lot easier than yesterdays. That had me completely foxed!
|
|
|
|
|
I have to convert a recursive traversal to an iterative one.
My poor ESP32 can't handle the stack my JSON reader is generating. I have an optimization for skipping entire subtrees wihout actually parsing everything therein. The problem is it calls skipObjectPart() which may call skipArrayPart() which may call skipObjectPart() . Here I thought I was already ahead because it doesn't use stack in cases like { "foo": { "bar": {} } } only on the edge conditions where you're transitioning from a nested array to a nested object and vice versa like { "foo": [ { "bar": {} } ] } but even with that optimization it takes too much stack on my test document (about 120kB of highly nested JSON data)
So now I need to figure out how to do the whole thing iteratively
I do not like having to eliminate recursion. I guess this is what I get for porting from C# code to an embedded device.
It just occurred to me that I could eliminate the recursion entirely and easily by not distinguishing between {} and []. I would expect the document to be balanced in terms of left right brackets but i could ignore what the actual brackets are. That would lead to things like accepting { foo: [} ] but I think I might be able to get away that. Consider what a bad document looks like. If it's corrupt, how likely is it to be corrupt but still balanced in terms of general left and right brackets? Do you think this is an acceptable optimization for a tiny device who is reading data that 99% of the time will be machine generated? I've seen some pretty nefarious hacks to get big data to run on small systems, and this is one such situation. I'm flexible.
Real programmers use butterflies
modified 8-Dec-20 22:55pm.
|
|
|
|
|
When it has to be done, it has to be done. I had to do the same thing with QuadTree traversal, because recursive procedures were much too slow.
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
Too slow? How can that possibly be?
Until yesterday I would have come with my old and evil ways, but now I'm a reborn wraith (*) and must tell you that the compiler is far better than you at taking care of such things. Stacks are social constructs now, so you can easily ignore it. Then you can easily ignore it. It's size is now irrelevant and the calling overhead in recursive methods is also gone. Maybe it will help if your classes identify as iterative? Try to set the [iterative] attribute.
(*) No, I'm not.
I have lived with several Zen masters - all of them were cats.
His last invention was an evil Lasagna. It didn't kill anyone, and it actually tasted pretty good.
|
|
|
|
|
I would say it would be difficult to get the number of brackets matching and being wrong, but not hugely unlikely, as they share a key, so manually editing could introduce that issue.
You have to ask yourself two questions:
1. What would the issue be if that error is ignored?
2. How difficult would it be to add the checking (possibly at a different pipeline stage) i.e. by pushing a boolean onto a stack when getting a [{ and pulling and checking when getting a ]}?
I would probably try and check it, but I do not know the answer to 1. and do not know how close to the hardware/timing limit adding this check would take you. Also it could be overkill for parsing something that is machine generated.
Good luck
|
|
|
|
|
1. I'm not sure in the specific since this is a json reader and it depends on what kind of data they're using it for. It would mean practically at least in the abstract that it would accept invalid documents. The way they're invalid is so contrived though that I can't think of what that would even look like for the reader. It basically means it works kind of like overlapping tags do in old HTML except this overlaps your "array" data with your "object" data
2. It's a very finite memory issue. The problem is this JSON reader is designed to parse data of any length and any degree of nesting. That works when you have gobs of RAM like in .NET on a modern laptop but it's much different when you're doing everything on the stack and that stack is small on a 520kB ram TOTAL machine. Worse, I have no idea how much memory I'd need. And the icing on that fail cake is you get no warning when you're out of stack. Your machine reboots. This to me seems like a worse problem than accepting potentially invalid documents, now that I think about it. What do you think?
Real programmers use butterflies
|
|
|
|
|
I would allow ignoring potentially invalid documents.
You are not trying to validate the document, but to read it. Applying the robustness rule works here (Be conservative in what you do, be liberal in what you accept from others.)
BTW, thanks for sharing these projects you are working on, they are fascinating.
|
|
|
|
|
The problem is that '{' and '[' introduce different types of entities. '{' should be followed by <propertyname>':' <value> [',' <propertyname>':' <value>]... '}' whereas '[' should be followed by <value>[',' <value> ']' (where 'x' represents a literal x). To disambiguate, you'd have to see if the first two tokens after the '{' | '[' are <propertyname>':' rather than just parsing the first token after '{' | '[' which can be ambiguous (JSON accepts a quoted string as a <value> and as a <propertyname> ).
|
|
|
|
|
My JSON reader (in C# of course) is not recursive, it keeps a Stack of incomplete Objects as it reads.
It Pushes a new Object onto the Stack when it finds a { or [ .
It Pops the top Object off the Stack when it finds a } or ] .
But, looking at the code now, I see that it doesn't verify that they match -- only that they balance.
I wrote it in a hurry.
|
|
|
|
|
Underneath, I use a pull parser, which doesn't keep a stack either - it uses a state machine and forces the user of it to keep track of where they are. However, it recurses when it skips over (partial parsing). I could have made that use a stack but it would have been actually a bit harder to port to my arduino stuff. In the arduino version I use neither - I just keep a depth count, and i don't check if { [ ] } match, only that they balance, like you do.
Real programmers use butterflies
|
|
|
|
|
honey the codewitch wrote: forces the user of it to keep track of where they are
Yes, that's at the next lower layer, an iterator which simply returns each piece of the JSON -- { , } , [ , ] , or a named value.
honey the codewitch wrote: it recurses when it skips over (partial parsing)
Mine copies it all into the Object. Then, when the Object is complete, it can "filter out" any parts which the application doesn't want (if requested) before returning it.
|
|
|
|
|
Ah, see, there we have a fundamentally different design.
My C# library has a low level pull based parser. This examines small windows of near infinite size documents. You move through it an element at a time in a loop, like Microsoft's XmlTextReader.** The pull parser is fast. Calling Read() is fast, but calling SkipSubtree() is significantly faster than reading through it because i do a partial parse - just enough to make sure the document is valid - i don't normalize.
In that parse is also ParseSubtree() which takes the section of the document you are on and puts it into an in-memory tree for you, which you can then do stuff like jsonpath filtering and navigation expressions or in place modification on.
Usually, you'll just parse an entire document into a tree and work with that, but for huge documents that's not practical, so you use the pull parser to navigate/skip to where you need to be, and then just load that subset you need rather than the whole document.
In my port to Arduino, I don't have the in memory trees, just the pull parser, but I may add the ability to do small in memory trees later. Everything else is pretty much the same except the functions are camelCase.
** Pull parsers are actually fantastic for embedded stuff because they allow you to process very large documents a bit at a time.
Real programmers use butterflies
|
|
|
|
|
honey the codewitch wrote: a low level pull based parser
I'm unsure what you mean by that, but it's likely to be about what I mean. Maybe yours is more generalized, while mine is JSON-specific.
Mine was designed primarily to quickly iterate the members of arrays within some large JSON files (20GB?) so the data contained can be written to SQL Server.
It also reads an element at a time, and builds up an Object as it goes, then passes each complete Object on for processing (writing to SQL Server) individually, so I have only a few Objects in memory at a time (determined by how many threads I'm using). So I don't load the whole file into memory at one time.
So, I may have a file which contains a number of arrays. I tell my utility to Read() the file a piece at a time and "when you find an array named 'Foo', load the Foo table with its contents ; when you find an array named 'Bar', load the Bar table ; etc." and then each requested array has its contents iterated object-by-object by some number of threads, processed, and written to SQL Server. I assume this is similar to your ParseSubtree()
Non-requested arrays and objects don't become full Objects, this is likely similar to your SkipSubtree() except for that it's really just Read() ing until it finds something it was told to look for -- it won't skip a subtree within an object it has been told to read.
Most files I'm reading contain only one big array each. And some JSON files I generate I segment into several parts to ease handling of them.
Filtering out parts of an Object happens after each Object has been fully populated, which allows the process to support more complex filtering requests.
|
|
|
|
|
It's probably easiest just to show you:
jsonReader.begin(file);
while (jsonReader.read()) {
switch (jsonReader.nodeType()) {
case JsonReader2k::Value: Serial.print("Value ");
switch (jsonReader.valueType()) { case JsonReader2k::String: Serial.print("String: ");
jsonReader.undecorate(); Serial.println(jsonReader.value()); break;
case JsonReader2k::Number: Serial.print("Number: ");
Serial.println(jsonReader.numericValue()); break;
case JsonReader2k::Boolean: Serial.print("Boolean: ");
Serial.println(jsonReader.booleanValue()); break;
case JsonReader2k::Null: Serial.print("Null: ");
Serial.println("null"); break;
}
break;
case JsonReader2k::Key: Serial.print("Key ");
Serial.println(jsonReader.value());
break;
case JsonReader2k::Object: Serial.println("Object");
break;
case JsonReader2k::EndObject: Serial.println("End Object");
break;
case JsonReader2k::Array: Serial.println("Array");
break;
case JsonReader2k::EndArray: Serial.println("End Array");
break;
case JsonReader2k::Error: Serial.println("Error!");
break;
}
}
file.close();
}
Which emits this:
Object
Key "backdrop_path"
Value String: /lgTB0XOd4UFixecZgwWrsR69AxY.jpg
Key "created_by"
Array
Object
Key "id"
Value Number: 1233032.00
Key "credit_id"
Value String: 525749f819c29531db09b231
Key "name"
Value String: Matt Nix
Key "profile_path"
Value String: /qvfbD7kc7nU3RklhFZDx9owIyrY.jpg
End Object
End Array
Key "episode_run_time"
Array
Value Number: 45.00
End Array
Key "first_air_date"
Value String: 2007-06-28
Key "genres"
For this:
{
"backdrop_path": "/lgTB0XOd4UFixecZgwWrsR69AxY.jpg",
"created_by": [
{
"id": 1233032,
"credit_id": "525749f819c29531db09b231",
"name": "Matt Nix",
"profile_path": "/qvfbD7kc7nU3RklhFZDx9owIyrY.jpg"
}
],
"episode_run_time": [
45
],
"first_air_date": "2007-06-28",
"genres":
...
It sounds like you're doing something similar except dumping to a database whereas I'm just spewing debug to the serial port for demonstration. (This is arduino C++ code)
Real programmers use butterflies
modified 9-Dec-20 12:58pm.
|
|
|
|
|
Home - /dev/null as a Service[^]
I... what.
What do you get when you cross a joke with a rhetorical question?
The metaphorical solid rear-end expulsions have impacted the metaphorical motorized bladed rotating air movement mechanism.
Do questions with multiple question marks annoy you???
|
|
|
|
|
That puts all of the <X>aaS in their place. Brilliant!
Freedom is the freedom to say that two plus two make four. If that is granted, all else follows.
-- 6079 Smith W.
|
|
|
|
|
I wonder if they take spontaneous job applications? I'm quite adept at that myself...
Anything that is unrelated to elephants is irrelephant Anonymous
- The problem with quotes on the internet is that you can never tell if they're genuine Winston Churchill, 1944
- Never argue with a fool. Onlookers may not be able to tell the difference. Mark Twain
|
|
|
|
|
Johnny J. wrote: I wonder if they take spontaneous job applications? I guess job applications all go straight in the bin!
Johnny J. wrote: I'm quite adept at that myself... rm -r * we've all been there. Well I have!
|
|
|
|
|
5teveH wrote: rm -r * we've all been there. Well I have! |
Strictly for amateurs. A professional uses rm -rf .*
N.B. I don't think this works any longer, but many years ago I had a new-to-unix DBA with root permissions do this. He wanted to get rid of all the hidden directories ... He managed that and a great deal more, besides!
Keep Calm and Carry On
|
|
|
|
|
|
Johnny J. wrote: I wonder if they take spontaneous job applications? Since you haven't done it yet, can it really be spontaneous anymore?
|
|
|
|
|
Maybe they have a service for that.
Wrong is evil and must be defeated. - Jeff Ello
Never stop dreaming - Freddie Kruger
|
|
|
|