The Lounge - CodeProject

First Prev Next

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

Slacker00717-Dec-20 8:38

Slacker007

17-Dec-20 8:38

we use Newtonsoft with all of our Web APIs, etc. never had any noticeable issues with performance.

I guess if you are parsing big json files then, perhaps that is an issue, but we don't do that. so....

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

honey the codewitch17-Dec-20 8:41

honey the codewitch

17-Dec-20 8:41

If you ever find yourself bulk loading JSON dumps into a database, you can do better. Hell, you could use my tiny JSON C# lib which is around here at CP somewhere.

Real programmers use butterflies

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

Jörgen Andersson17-Dec-20 10:15

Jörgen Andersson

17-Dec-20 10:15

Tell me when you make a parser for XML.
I'm loading 80 GB into a database every week, and XML (or rather the built in tools) seriously isn't made for that.

Wrong is evil and must be defeated. - Jeff Ello
Never stop dreaming - Freddie Kruger

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

honey the codewitch17-Dec-20 10:18

honey the codewitch

17-Dec-20 10:18

will do!

Real programmers use butterflies

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

PIEBALDconsult17-Dec-20 13:09

PIEBALDconsult

17-Dec-20 13:09

I load 51GB of XML with what SSIS has built-in. It takes about twelve minutes.

I load 5GB of JSON with my own parser. It takes about eight minutes.
I load 80GB of JSON with my own parser -- this dataset has tripled in size over the last month. It's now taking about five hours.

These datasets are in no way comparable, I'm just comparing the size-on-disk of the files.

I will, of course, accept that my JSON loader is a likely bottleneck, but I have nothing else to compare it against. It seemed "good enough" two years ago when I had a year-end deadline to meet.
I may also be able to configure my JSON Loader to use BulkCopy, as I do for the 5GB dataset, but I seem to recall that the data wasn't suited to it.

At any rate, I'm in need of an alternative, but it can't be third-party.

Next year will be different.

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

Jörgen Andersson18-Dec-20 6:19

Jörgen Andersson

18-Dec-20 6:19

PIEBALDconsult wrote:
I load 51GB of XML with what SSIS has built-in. It takes about twelve minutes.

How much memory do you have?
Early tests of mine ran out of memory.
Or have I done something wrong?

Mine takes an hour for 85GB XML, but that uses bulkcopy. Early versions without bulkcopy indicated that it would indeed take 5-6 hours.

Wrong is evil and must be defeated. - Jeff Ello
Never stop dreaming - Freddie Kruger

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

PIEBALDconsult18-Dec-20 8:21

PIEBALDconsult

18-Dec-20 8:21

I don't know what SSIS does internally, but I doubt it loads the entire XML document into memory all at once.
I don't know how much RAM or how many processors the servers have.
I ran the XML load on my laptop, 16GB of RAM and usage increased by only four percent.

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

Jörgen Andersson18-Dec-20 8:24

Jörgen Andersson

18-Dec-20 8:24

Ok, then I had some other problem, I might take another look at SSIS then.

Wrong is evil and must be defeated. - Jeff Ello
Never stop dreaming - Freddie Kruger

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

honey the codewitch18-Dec-20 6:45

honey the codewitch

18-Dec-20 6:45

if you can run C++ binaries on the server this might give you better performance, especially if you're only doing loads of part of the data.

JSON on Fire: JSON (C++) is a Blazing JSON Library that can Run on Low Memory Devices[^]

Real programmers use butterflies

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

PIEBALDconsult17-Dec-20 8:44

PIEBALDconsult

17-Dec-20 8:44

If people didn't constantly reinvent the wheel, we'd still be using wooden wheels several feet in diameter. Laugh | :laugh:

Use the right wheel for the right job. Don't try to adapt to an existing wheel if it just doesn't do the job.

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

honey the codewitch17-Dec-20 8:44

honey the codewitch

17-Dec-20 8:44

agreed!

Real programmers use butterflies

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

Nelek17-Dec-20 9:14

Nelek

17-Dec-20 9:14

honey the codewitch wrote:

People are religious about never reinventing the wheel, but it's not always such a bad thing - it depends on the wheel.

M.D.V.

If something has a solution... Why do we have to worry about?. If it has no solution... For what reason do we have to worry about?
Help me to understand what I'm saying, and I'll explain it better to you
Rating helpful answers is nice, but saying thanks can be even nicer.

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

Stuart Dootson17-Dec-20 23:51

Stuart Dootson

17-Dec-20 23:51

honey the codewitch wrote:
hosting the .NET CLI in C++ just to use a .NET package from C++ to parse a little JSON seems heavy handed and horribly inefficient.

If you're using C++, why not use a C++ JSON library such as Modern JSON, RapidJSON or simdjson?

Or if you do develop your own library, you might be interested to look at simdjson's 'On Demand' parsing approach...

Java, Basic, who cares - it's all a bunch of tree-hugging hippy cr*p

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

honey the codewitch18-Dec-20 0:37

honey the codewitch

18-Dec-20 0:37

They use too much memory and can't target IoT. of them simdjson shows the most potential but it still isn't about 71 bytes to do an episodes query off of a tmdb.com show data dump

Real programmers use butterflies

modified 18-Dec-20 6:46am.

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

John Stewien17-Dec-20 19:28

John Stewien

17-Dec-20 19:28

Some people have to work on air gap networks, where you can not copy anything to the network. It comes configured with a couple of approved things like the operating system, and whatever comes bundled with say Visual Studio 2015, and that's it. Nothing else gets in. With good reason too, e.g. see supply chain poisoning like the recent SolarWinds incident.

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

Sander Rossel17-Dec-20 11:40

Sander Rossel

17-Dec-20 11:40

I'm pretty trusting.
When someone says they're going to give me JSON I assume they'll give me JSON.
So I'd go for it and worry about validation when the party that should be giving me JSON isn't giving me JSON.
So far that has worked pretty well.
In practice, these kind of things rarely break.
You either get JSON or no JSON at all, but rarely (or even never) a badly formed JSON.

Best,
Sander

Azure DevOps Succinctly (free eBook)
Azure Serverless Succinctly (free eBook)
Migrating Apps to the Cloud with Azure
arrgh.js - Bringing LINQ to JavaScript

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

honey the codewitch17-Dec-20 12:55

honey the codewitch

17-Dec-20 12:55

I agree!

Real programmers use butterflies

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

Alexander Munro17-Dec-20 20:01

Alexander Munro

17-Dec-20 20:01

Since JSON is such a well defined construct simple parsers are very easy to write. I have a few. The nub is of course in 'a few'. It really falls into the case usage arena. If you know the data a quick regex parser will do. Regex parsers are fundamentally flawed though, and tend to fail on large data sets containing mixed characters (locale is a pain).

So, well-formedness is largely there already. Two dimensional arrays only require a few lines of code. Multi dimensional arrays just a few more. Large unknown datasets across languages? Use someone else's library and save yourself time.

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

Member 1330167917-Dec-20 20:22

Member 13301679

17-Dec-20 20:22

Quote:
Or you could do one that's significantly faster but skips well formedness checking during search/skip operations, which can lead to later error reporting or missed errors

As with all input to your program, you validate on reception. All the other code that uses that input after that can then assume valid input and you can choose whatever shortcuts you want to on the assumption of valid input.

Doesn't matter if the input is JSON, XML, key/value pairs from .ini files or tokens, you only validate it once on reception.

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

Mehdi Gholam17-Dec-20 21:28

Mehdi Gholam

17-Dec-20 21:28

The spec is pretty clear, so correctness and errors are clear.

To be fast is another matter, see fastJSON - Smallest, Fastest Polymorphic JSON Serializer[^] and GitHub - simdjson/simdjson: Parsing gigabytes of JSON per second[^]

Exception up = new Exception("Something is really wrong.");
throw up;

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

obeobe17-Dec-20 23:26

obeobe

17-Dec-20 23:26

A key question is what this parser will be used for.

Is it for a hobby project or a production system?

What would be the benefits of the higher performance? Will it be perceivable for human users? Will it save money by requiring less hardware? How much money?

Is there an impact on the development effort? What is the impact on the resulting code in terms of maintainability?

What would be the cost of choosing one option now and updating to the other option later? (is it a full rewrite? would it be simpler to go from A to B, or from B to A? etc.)

What would be the code of implementing both options and letting the user (well, caller) decide which one to use?

There are many things to factor in this decision. Maybe different developers will give different weights to these considerations, and inexperienced developers will overlook some or all of them, but I believe that for most developers the answer would (and should) be "it depends on the details of the situation".

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

Member 1409260518-Dec-20 2:02

Member 14092605

18-Dec-20 2:02

Stability over performance!

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

MGuerrieri18-Dec-20 2:53

MGuerrieri

18-Dec-20 2:53

I take a function-first approach. You won't be able to parse the JSON if it's not well formed, so I would do that check first. If performance is poor, then I'd do a trace to find the bottlenecks and address them if possible. I wouldn't want to spend my time unnecessarily tracking down import errors.

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

honey the codewitch18-Dec-20 3:36

honey the codewitch

18-Dec-20 3:36

I look at it this way - and keep in mind this is purely hypothetical:

Let's say you're bulk uploading parts of some JSON out of a huge dataset. Almost always that JSON is machine generated because who writes huge JSON by hand? Scanning through it quickly is important. If at some point you get a bad data dump, might it be better to roll back that update and then run a validator over the bad document that one time out 1000 when it fails, rather than paying for that validation every other 999 times?

Real programmers use butterflies

Re: I'd like to ask a question about JSON to get a feel for priorities of coders here

Mark Meuer18-Dec-20 3:42

Mark Meuer

18-Dec-20 3:42

As a general rule, I try to follow these steps in order:

1. Make the program run right.
2. Make the program run right.
3. Make the program run right.
4. If I really need to, make it faster.

Last Visit: 31-Dec-99 18:00 Last Update: 25-Jun-24 3:43

Refresh

ᐊ Prev 1...4303 4304 4305 430643074308 4309 4310 4311 4312 Next ᐅ

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.

Welcome to the Lounge