I would have thought that a 'noXML' movement would make much more sense than the so-called 'noSQL' movement.
Last autumn I was faced with an odd bit of code that threw 13MB of XML around a solution based on RabbitMQ. Needless to say, this didn't exactly do wonders for the performance of the solution, so I spent a couple of days implementing a proprietary – gasp! – serialization mechanism for the data.
This reduced the amount of data to 32KB, or about 0.24% of the original size.
Now, this isn't the first time I've turned an abysmally performing solution into something halfway decent by getting rid of everything related to XML except for the app.config, and I know many of us have had similar experiences.
So how come noSQL has gained tremendous traction, while XML-based communication like REST, OData, or, even worse, SOAP is often used by the same applications that are now based on 'noSQL' databases?
I’m basically looking for sensible ways to explain that XML is a bad idea – when you have large amounts of data – to decision makers without stepping on too many toes.
|
I believe I have seen comments that suggest XML is 30% to 300% larger in size on average.
Seems reasonable. As for your example I can only suppose that you have very verbose syntax and/or extensive use of namespaces on individual elements to get that amount of reduction.
There is a category of formats aimed at reducing the size of XML, called "binary XML":
http://en.wikipedia.org/wiki/Binary_XML[^]
Espen Harlinn wrote: I'm basically looking for sensible ways to explain that XML is a bad idea – when you have large amounts of data
That, of course, is a simplistic statement. It ignores both what "large" means as well as ignoring how the data is composed.
It also completely ignores some of the problems XML attempts to solve, such as adding more data later and dealing with data across multiple languages and platforms. A binary solution is often fragile in the first regard and requires libraries for each target in the second.
To illustrate with your own example: if I need to process one 32MB file a day, then "large" means nothing. If, however, I need to process 10,000 of them a second, then size is going to have a huge impact. However, the vast majority of businesses will never need to do the second case.
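Since a hand-rolled binary format really does have to solve the "adding more data" problem itself, here is a minimal sketch of the usual mitigation - a length-prefixed record that lets an older reader skip fields it doesn't know about. The record layout and field names are my own illustration, not anything from the thread above.

```csharp
// Sketch only (made-up layout): each record carries its own length, so a reader
// built against an older version of the format can skip trailing fields it does
// not understand. XML tolerates unknown elements for free; with binary you have
// to design this in yourself.
using System.IO;

public static class VersionedRecord
{
    public static void Write(BinaryWriter writer, int tagId, double value, double? extraField)
    {
        using (var body = new MemoryStream())
        using (var bodyWriter = new BinaryWriter(body))
        {
            bodyWriter.Write(tagId);
            bodyWriter.Write(value);
            if (extraField.HasValue)            // a field added in a later version
                bodyWriter.Write(extraField.Value);

            writer.Write((int)body.Length);     // length prefix
            writer.Write(body.ToArray());
        }
    }

    public static void ReadV1(BinaryReader reader, out int tagId, out double value)
    {
        int length = reader.ReadInt32();
        long start = reader.BaseStream.Position;
        tagId = reader.ReadInt32();
        value = reader.ReadDouble();
        // Skip whatever this version of the reader does not understand.
        reader.BaseStream.Position = start + length;
    }
}
```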
|
jschell wrote: As for your example I can only suppose that you have very verbose syntax and/or extensive use of namespaces on individual elements to get that amount of reduction
You can say that again
jschell wrote: It ignores both what "large" means
Often cases where 100MB/s bandwidth becomes a bottleneck.
jschell wrote: ignoring how the data is composed
For the case I mentioned, it was a process model with related process information.
Sometimes I work with pretty large amounts of process data - basically tag identifier, timestamp, quality and process value - but I might see 50,000 to 100,000 of those each second, and doing that using DataContract kills performance.
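To make the size argument concrete, here is a minimal sketch of how little space such a sample needs when packed by hand. The field names and widths are my own assumptions for illustration, not the actual proprietary format mentioned above.

```csharp
// Sketch only: a process sample packed into a fixed 22-byte layout.
using System.IO;

public struct ProcessSample
{
    public int TagId;        // 4 bytes
    public long Timestamp;   // 8 bytes (e.g. DateTime.Ticks)
    public ushort Quality;   // 2 bytes
    public double Value;     // 8 bytes -> 22 bytes per sample

    public void Write(BinaryWriter writer)
    {
        writer.Write(TagId);
        writer.Write(Timestamp);
        writer.Write(Quality);
        writer.Write(Value);
    }

    public static ProcessSample Read(BinaryReader reader)
    {
        return new ProcessSample
        {
            TagId = reader.ReadInt32(),
            Timestamp = reader.ReadInt64(),
            Quality = reader.ReadUInt16(),
            Value = reader.ReadDouble()
        };
    }
}
```

At 100,000 samples per second that is roughly 2.2MB/s of payload, while an element-per-field XML rendering of the same sample is easily several times larger before namespaces are even added.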
|
Clearly sounds like the original code was using the wrong tool for the job.
"If your actions inspire others to dream more, learn more, do more and become more, you are a leader." - John Quincy Adams
"You must accept one of two basic premises: Either we are alone in the universe, or we are not alone in the universe. And either way, the implications are staggering." - Wernher von Braun
|
ahmed zahmed wrote: Clearly sounds like the original code was using the wrong tool for the job.
Obviously, and somewhat poorly implemented at that.
My problem, on the other hand, is making a convincing case for implementing the OMG Data Distribution Service[^] for .NET.
|
Espen Harlinn wrote: My problem, on the other hand, is making a convincing case for implementing
If I wanted a sustained rate of 100,000 tps, then I suspect I would end up with a proprietary protocol that started from a specific analysis of the data.
|
The DDS Interoperability Protocol for the OMG Data Distribution Service has the required performance characteristics - I've done a bit of experimentation with OpenSplice DDS[^].
I've spent most of my career doing similar stuff; figuring out how to do this is not the problem.
Business case, backing, bickering and funding are the issues I will have to deal with.
|
Espen Harlinn wrote: Often cases where 100MB/s bandwidth becomes a bottleneck.
Not sure what you mean since, at least where I am, there is no such thing as a 100MB limit for most businesses. And businesses that expect that volume should have a bigger pipe.
Espen Harlinn wrote: but I might see 50,000 to 100,000 of those each second, and doing that using DataContract kills performance.
However, MOST businesses will never even see close to that volume.
As an example, I did a calc several years ago that suggested that the credit card rate for the entire US at Xmas would be 2,000 transactions per second.
And no one is going to be handling that volume on a single pipe, because the big customers would require colocation.
Not to mention that I seriously doubt you can be doing 100,000 a second on a 100Mb/s (at that actual limit) network.
|
jschell wrote: Not sure what you mean since, at least where I am.
Most of my work is related to the oil and energy industry.
jschell wrote: Not to mention that I seriously doubt you can be doing 100,000 a second on a 100Mb/s (at that actual limit) network.
As long as I use binary, and only for test purposes - that is, no processing and no storage - it seems the practical limit is about 400,000 messages containing 20 bytes each on a 100MB/s dedicated network.
|
Espen Harlinn wrote: As long as I use binary, and only for test purposes - that is, no processing and no storage - it seems the practical limit is about 400,000 messages containing 20 bytes each on a 100MB/s dedicated network.
Presumably 20 bytes of data.
The minimum IP header is 20 bytes (and I suspect more for TCP, but good enough), so your minimum packet size is 40 bytes.
40 bytes x 400,000 x 8 bits/byte = 128,000,000 bits.
So it is safe to say that your "100MB/s" network is not in fact limited to just 100Mb/s.
And this of course presumes that you messed with the default minimum packet size. Otherwise the real rate is something like 300Mb/s, which is what many "100Mb/s" networks get these days.
Also, why not just batch the data instead of attempting to deal with each one?
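To make the batching suggestion concrete, here is a rough sketch - assuming plain .NET sockets and the made-up ProcessSample struct from the earlier sketch, not the actual test code - of packing many samples into one length-prefixed frame per send, so the per-packet header cost is paid once per batch instead of once per sample.

```csharp
// Sketch only: batch samples into a single frame with a 4-byte count header,
// then push the whole frame in one write on an existing NetworkStream.
using System.IO;
using System.Net.Sockets;

public static class SampleBatcher
{
    public static byte[] BuildFrame(ProcessSample[] samples)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(samples.Length);      // how many samples follow
            foreach (var sample in samples)
                sample.Write(writer);          // 22 bytes per sample
            return stream.ToArray();
        }
    }

    public static void Send(NetworkStream network, ProcessSample[] samples)
    {
        byte[] frame = BuildFrame(samples);
        network.Write(frame, 0, frame.Length); // one send per batch
    }
}
```

A batch of 1,000 samples is roughly 22KB of payload carried in a handful of packets, instead of 1,000 packets each paying the 40-byte header mentioned above.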
|
jschell wrote: 40 bytes x 400,000 x 8 bits/byte = 128,000,000 bits.
It's a stream of "messages" - and, yes, I can see that I should have expressed myself more clearly.
jschell wrote: Also why not just batch the data instead of attempting to deal with each one?
I think you are suggesting what I actually did for this test ...
|
Espen Harlinn wrote: I think you are suggesting what I actually did for this test ...
Eh?
If so then you did not send 400,000 TCP messages. Instead you sent X TCP messages, where X was less than 400,000, and each TCP message contained more than one of your data blocks.
|
jschell wrote: If so then you did not send 400,000 TCP messages. Instead you sent X TCP messages, where X was less than 400,000, and each TCP message contained more than one of your data blocks.
Correct - as I mentioned, I can see that what I wrote could easily be misunderstood, sorry about that ... but what I asked wasn't really a technical question.
I would have thought that I'm not the first who has had problems convincing non-technical people that XML isn't a silver bullet, and that it's sometimes smart to look elsewhere.
My guess was that somebody had a few winning arguments they would have liked to share, that's all.
|
Espen Harlinn wrote: I would have thought that I'm not the first who has had problems convincing non-technical people that XML isn't a silver bullet
Use what I posted in the other response (binary XML). That is still XML, and it has low overhead.
|
Imagine being hired to come in and save the day and fix a giant mess that involved XML. Now imagine being hired to fix the same mess, but where everything uses a different undocumented binary protocol with mixed endianness.
|
What I'd like to do is implement the OMG Data Distribution Service[^] for .NET with some sort of OPC UA[^] bridge.
While I'm sure this will solve a lot of problems in the long run, it's not a trivial project.
|
Espen Harlinn wrote: I spent a couple of days implementing a proprietary – gasp! – serialization mechanism for the data.
But if you were using WCF you could have just created a TCP endpoint and sent binary data just like that! BAM, WCF magic wins!
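For reference, a minimal sketch of that kind of setup - the contract, service and endpoint address are made up for illustration. NetTcpBinding uses WCF's binary message encoding rather than text XML by default, which is what makes it so much lighter on the wire.

```csharp
// Sketch only: a self-hosted WCF service on a net.tcp endpoint.
using System;
using System.ServiceModel;

[ServiceContract]
public interface ISampleSink
{
    [OperationContract(IsOneWay = true)]
    void Push(byte[] batch);                      // e.g. a frame of packed samples
}

public class SampleSink : ISampleSink
{
    public void Push(byte[] batch) { /* decode and process the batch */ }
}

public static class SampleHost
{
    public static void Main()
    {
        var host = new ServiceHost(typeof(SampleSink));
        host.AddServiceEndpoint(typeof(ISampleSink),
                                new NetTcpBinding(SecurityMode.None),
                                "net.tcp://localhost:9000/samples");
        host.Open();
        Console.WriteLine("Listening - press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}
```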
|
killabyte wrote: But if you were using WCF you could have just created a TCP endpoint and sent binary data just like that! BAM, WCF magic wins!
Which, for the project I mentioned, I implemented and demonstrated. Somebody has still not forgiven me for that. I won the argument about serialization, and lost the argument about replacing RabbitMQ with WCF.
Which brings me back to my question about how to argue, in this case, against the use of XML. To me, it seems that while most can agree that using the right tool for the job is the right thing to do, that agreement often holds only in theory.
|
Espen Harlinn wrote: Which brings me back to my question about how to argue, in this case, against the use of XML
I would argue that the only reason to accept the extra overhead (XML output in this case) is to provide interop with Java or something similar; other than that, it is overhead that can be optimised out for speed/performance/sanity.
|
Which should be obvious, but obviously isn't and that's part of my problem.
|
Then you should wield your senior architect powers and make an executive decision that can't be overruled except by the chairman of the board!!!
Sounds like you have to justify too much detail to people that don't understand.
|
Which brings me back to why I'm missing a noXML movement - while I do find XML useful, it's not a silver bullet.
|
I hate XML for this reason. WAY too verbose. I have started to adopt YAML in my projects at work and am so far pretty happy with it.
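As a rough illustration of the verbosity difference, here is one way to compare the two side by side - the class and values are made up, and YamlDotNet is just one convenient serializer for the YAML half.

```csharp
// Sketch: serialize the same object as XML and as YAML and compare the sizes.
// Requires the YamlDotNet NuGet package for the YAML part.
using System;
using System.IO;
using System.Xml.Serialization;
using YamlDotNet.Serialization;

public class Reading
{
    public int TagId { get; set; }
    public DateTime Timestamp { get; set; }
    public double Value { get; set; }
}

public static class FormatComparison
{
    public static void Main()
    {
        var reading = new Reading { TagId = 42, Timestamp = DateTime.UtcNow, Value = 98.6 };

        var xmlWriter = new StringWriter();
        new XmlSerializer(typeof(Reading)).Serialize(xmlWriter, reading);

        string yaml = new SerializerBuilder().Build().Serialize(reading);

        Console.WriteLine("XML:  {0} characters", xmlWriter.ToString().Length);
        Console.WriteLine("YAML: {0} characters", yaml.Length);
    }
}
```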
|
Other responses suggest that the issue of XML is trivial compared to the other issue: that you are attempting to get sustained 100,000 transactions per second.
And your other responses suggest that is a real need rather than just an optimal goal.
Consequently there are all sorts of problems that must be solved.
And using XML, especially with the ludicrous misuse you cited before, is wrong. The protocol in such a case must have extremely low overhead, and XML does not fit that bill. That rate probably requires tweaking the network as well.
However, if your requirements are otherwise, then that additional information could be relevant.