I have an app that runs on multiple platforms (Windows, iOS, Android), as either a native app or a web app.
JSON is a nice lingua franca -- I thought.
You see, my app has simple string-based keys the user adds to keep track of her sites.
Since I'm not sure what the user might type, I go ahead and encode the keys to Base64.
Normally you see some Base64 encoded keys which look like:
c3VwZXJzaXRl
c2Vjb25kU2l0ZQ==
dGhyZWU=
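For anyone who wants to reproduce those values, here is a minimal sketch of the encoding step in JavaScript. The function names encodeKey and decodeKey are made up for illustration, and the plain-text keys are simply what the sample values decode to. It goes through UTF-8 first, on the assumption that the user might type characters btoa() cannot handle directly.

```javascript
// Hypothetical helper: Base64-encode a user-typed key via its UTF-8 bytes,
// so that input outside Latin-1 does not make btoa() throw.
function encodeKey(key) {
  const bytes = new TextEncoder().encode(key);
  let binary = "";
  for (const b of bytes) {
    binary += String.fromCharCode(b);
  }
  return btoa(binary);
}

// Hypothetical helper: reverse of encodeKey.
function decodeKey(encoded) {
  const binary = atob(encoded);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

console.log(encodeKey("supersite"));  // c3VwZXJzaXRl
console.log(encodeKey("secondSite")); // c2Vjb25kU2l0ZQ==
console.log(encodeKey("three"));      // dGhyZWU=
```

The trailing equals signs are just Base64 padding, which is why they show up in some keys and not others.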
Serialize Object As JSON
In my web apps and on Windows I serialize all the user's sites as objects via JSON and it works great and looks like:
[{"HasSpecialChars":false,"HasUpperCase":false,"Key":"c3VwZXJzaXRl","MaxLength":0},{"HasSpecialChars":false,"HasUpperCase":false,"Key":"dGhyZWU=","MaxLength":0},{"HasSpecialChars":false,"HasUpperCase":false,"Key":"c2Vjb25kU2l0ZQ==","MaxLength":0},{"HasSpecialChars":false,"HasUpperCase":false,"Key":"eWV0QW5vdGhlcg==","MaxLength":0}]
Of course on web sites (JavaScript), it's the old JSON.stringify(allSiteKeys) that handles that so nicely.
And on Windows I've always used the Newtonsoft libraries and it all works great. All interchangeable.
Android Gson - Google's JSON
Enter the problem.
While developing on Android I wanted to serialize the data the same way, so I turned to Gson, which is Google's JSON library and a common way of doing this on Android.
However, I noticed that values which were properly Base64 encoded were output to JSON in an interesting way. Every equals sign was altered to \u003d, which is the Unicode escape for the equals sign.
That means my JSON would be altered to look like:
[{"HasSpecialChars":false,"HasUpperCase":false,"Key":"c3VwZXJzaXRl","MaxLength":0},{"HasSpecialChars":false,"HasUpperCase":false,"Key":"dGhyZWU\u003d","MaxLength":0},{"HasSpecialChars":false,"HasUpperCase":false,"Key":"c2Vjb25kU2l0ZQ\u003d\u003d","MaxLength":0},{"HasSpecialChars":false,"HasUpperCase":false,"Key":"eWV0QW5vdGhlcg\u003d\u003d","MaxLength":0}]
NOTE: This is not a case of oddly encoded Base64 -- if I look directly at the Base64 encoded data it still has the equals signs. This only occurred when the data was serialized to JSON.
Searching For An Answer
It wasn't easy to find the answer, because a Google developer explains it this way:
Google dev: "= is a special javascript character that must be escaped to unicode in JSON so that a string literal can be embedded in XHTML without further escaping."
XHTML? Oh, are we back in the year 2000?
It made no sense to me and others who were like, "Uh, JSON is a format for transmission. Why would I care if there is an equals sign in the data? It's just bytes."
It is quite difficult to find docs on the Gson library. Not great stuff.
But, finally, I found a Stack Overflow answer that mentions that you have to use the Gson builder to create the initial Gson object, and when you do, you have to turn off HTML escaping:
Gson gson = new GsonBuilder().disableHtmlEscaping().create();
Normally, that would just look like:
Gson gson = new Gson();
Everything else is the same so when you serialize it you still call the same code:
String jsonSiteKeys = gson.toJson(allSiteKeys);
The GsonBuilder() is almost like magic, because how would you ever know it is there?
Different Dialect Which Includes HTML Escaping
The final point is "Why would the dev think that the normal case is to include HTML escaping when this is a transmit format?" Why not think that is the special case? Especially since other libraries handle it without the escaping.
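To make the "dialect" concrete from the JavaScript side, here is a rough sketch that imitates HTML-safe escaping on top of JSON.stringify. This is not Gson's code: toHtmlSafeJson is a made-up name, and the set of escaped characters (<, >, &, = and the single quote) is my understanding of what Gson escapes by default, so treat it as an assumption.

```javascript
// Assumed set of characters escaped in "HTML-safe" mode (not taken from the post).
const HTML_SENSITIVE = /[<>&=']/g;

// Hypothetical helper: serialize, then rewrite each sensitive character
// as a \uXXXX escape. These characters only ever occur inside JSON strings,
// so the result is still valid JSON.
function toHtmlSafeJson(value) {
  return JSON.stringify(value).replace(
    HTML_SENSITIVE,
    (ch) => "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const site = { Key: "dGhyZWU=", MaxLength: 0 };

console.log(JSON.stringify(site)); // {"Key":"dGhyZWU=","MaxLength":0}
console.log(toHtmlSafeJson(site)); // {"Key":"dGhyZWU\u003d","MaxLength":0}
```

Seen this way, the escaped form is something you opt into when the JSON is going to be pasted into a page, which is why it feels odd as the default for a transmit format.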
All That For Compatible JSON?
That is a sick amount of time just to get JSON in a compatible format.
In the future when AI takes over all development work, how will it handle this? It will not allow anyone else to write JSON libraries.
modified 9-Dec-19 17:03pm.
|
Thanks for the link, I will check it out.
|
I once had a similar issue with a GUID.
Apparently there are different ways to look at a GUID.
I'm not really sure what it was anymore, some encoding issue or whatever, but two applications showed the same GUID differently.
|
Sander Rossel wrote: I'm not really sure what it was anymore, some encoding issue or whatever, but two applications showed the same GUID differently
If you look at Create GUID (Visual Studio Tools) - snapshot, you can generate GUIDs like the ones below.
All the same value, but different formats:
1. <guid("0c2d39cd-f486-435b-bdba-64a124d04d31")>
2. [Guid("0C2D39CD-F486-435B-BDBA-64A124D04D31")]
3. {0C2D39CD-F486-435B-BDBA-64A124D04D31}
4. // {0C2D39CD-F486-435B-BDBA-64A124D04D31}
   static const GUID <<name>> =
   { 0xc2d39cd, 0xf486, 0x435b, { 0xbd, 0xba, 0x64, 0xa1, 0x24, 0xd0, 0x4d, 0x31 } };
5. // {0C2D39CD-F486-435B-BDBA-64A124D04D31}
   DEFINE_GUID(<<name>>,
   0xc2d39cd, 0xf486, 0x435b, 0xbd, 0xba, 0x64, 0xa1, 0x24, 0xd0, 0x4d, 0x31);
6. // {0C2D39CD-F486-435B-BDBA-64A124D04D31}
   IMPLEMENT_OLECREATE(<<class>>, <<external_name>>,
   0xc2d39cd, 0xf486, 0x435b, 0xbd, 0xba, 0x64, 0xa1, 0x24, 0xd0, 0x4d, 0x31);
|
Yeah, but this was really like one tool showing "abc" while another showed "123" and they still somehow had the same underlying value.
I remember this was a thing on MongoDB, so I just installed Robo 3T to look at the options, and the setting is Legacy UUID Encoding (do not decode, or use Java, .NET, or Python encoding).
A coworker used another tool with another encoding, so we were looking at the same document but seeing different values.
|
Sander Rossel wrote: Yeah, but this was really like one tool showing "abc" while another showed "123"
That is interesting and painful. Very similar to my problem.
|
Mostly painful
|
Oh no my friend, it gets far worse than that. A GUID is simply a 128-bit number in 5 chunks of 32-16-16-16-48 bits. GUIDs like {0C2D39CD-F486-435B-BDBA-64A124D04D31} are what you get when the first chunk of 32 bits is converted into hex, then the next chunk, and so on. But when you use some dumb-ass third-party system that stores the bytes in a different order *cough*mongodb*cough*, the string representation of the same data can look different in two different systems despite being equal. I know this is more of a UUID issue than a GUID one, but it shows the kind of nightmares you come across when you stray from the Microsoft path.
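A small sketch of that byte-order effect, in JavaScript. It takes the 16 bytes of the example GUID and prints them two ways: read straight through, and with the first three chunks read as little-endian (which is how a .NET Guid byte array is laid out). The function names are made up, and I am not claiming this is exactly what any particular MongoDB tool does.

```javascript
// The 16 bytes of the example GUID, in the order they appear in its string form.
const bytes = [0x0c, 0x2d, 0x39, 0xcd, 0xf4, 0x86, 0x43, 0x5b,
               0xbd, 0xba, 0x64, 0xa1, 0x24, 0xd0, 0x4d, 0x31];

const hex = (arr) => arr.map((b) => b.toString(16).padStart(2, "0")).join("");

// Read every chunk in storage order (big-endian fields).
function formatBigEndian(b) {
  return [hex(b.slice(0, 4)), hex(b.slice(4, 6)), hex(b.slice(6, 8)),
          hex(b.slice(8, 10)), hex(b.slice(10, 16))].join("-");
}

// Read the first three chunks as little-endian; the last two stay as stored.
// slice() copies, so reverse() does not disturb the original array.
function formatMixedEndian(b) {
  return [hex(b.slice(0, 4).reverse()), hex(b.slice(4, 6).reverse()),
          hex(b.slice(6, 8).reverse()), hex(b.slice(8, 10)),
          hex(b.slice(10, 16))].join("-");
}

console.log(formatBigEndian(bytes));   // 0c2d39cd-f486-435b-bdba-64a124d04d31
console.log(formatMixedEndian(bytes)); // cd392d0c-86f4-5b43-bdba-64a124d04d31
```

Same 16 bytes, two different strings, and neither tool is "wrong" by its own rules.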
|
Quote: when you stray from the Microsoft path
Stay on the path! Mirkwoodicrosoft has many hidden dangers!
- I would love to change the world, but they won’t give me the source code.
|
+1 for the {4C 4F 54 52} reference
|
F-ES Sitecore wrote: A GUID is simply a 128-bit number in 5 chunks of 32-16-16-16-48 bits. GUIDs like {0C2D39CD-F486-435B-BDBA-64A124D04D31} are what you get when the first chunk of 32 bits is converted into hex, then the next chunk, and so on. But when you use some dumb-ass third-party system that stores the bytes in a different order *cough*mongodb*cough*, the string representation of the same data can look different in two different systems despite being equal.
Oh that is really terrible. You'd think they'd mention Big-Endian v. Little-Endian or some such.
|
Sander Rossel wrote: different ways to look at a GUID
Yes, there are "case-sensitive" Guids. I guess it was project ids in a Visual Studio solution file...
Oh sanctissimi Wilhelmus, Theodorus, et Fredericus!
|
A good reason to ALWAYS add your own abstraction layer on top of ANY API that you use.
It is a lot easier to do it up front than to try to retrofit it later.
This keeps your dependencies well documented as well.
String jsonSiteKeys = MyLib.json.toJson(allSiteKeys);
Or maybe better:
String jsonSiteKeys = MyLib.toJson(allSiteKeys);
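The same idea on the JavaScript side of the app could look something like the sketch below. MyLib and toJson are the names from the lines above; fromJson is my own addition for symmetry, and the whole thing is only an illustration of the wrapper pattern, not a recommendation for a specific design.

```javascript
// Hypothetical wrapper: the rest of the app calls MyLib.toJson / MyLib.fromJson
// and never touches the underlying JSON API directly, so any serializer
// setting only has to change in this one place.
const MyLib = {
  toJson(value) {
    return JSON.stringify(value);
  },
  fromJson(text) {
    return JSON.parse(text);
  },
};

const allSiteKeys = [{ Key: "dGhyZWU=", MaxLength: 0 }];
const jsonSiteKeys = MyLib.toJson(allSiteKeys);
console.log(jsonSiteKeys); // [{"Key":"dGhyZWU=","MaxLength":0}]
```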
|
The bigger issue is that the difference in encoding doesn't make any difference.
According to the JSON standard this encoding is valid and will make no difference to the deserialization operation. In fact, all the characters could be encoded in this way, and it should still work.
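This is easy to check in JavaScript. The sketch below parses the same key three ways: unescaped, with only the equals sign escaped, and with every character escaped. The variable names are just for illustration.

```javascript
// The same JSON string value written three ways.
const plain = '"dGhyZWU="';
const equalsEscaped = '"dGhyZWU\\u003d"';

// Escape every single character as \uXXXX.
const fullyEscaped =
  '"' +
  Array.from("dGhyZWU=", (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  ).join("") +
  '"';

console.log(JSON.parse(plain));         // dGhyZWU=
console.log(JSON.parse(equalsEscaped)); // dGhyZWU=
console.log(JSON.parse(fullyEscaped));  // dGhyZWU=
```

So a conforming parser on the other platforms should read the escaped output without any changes. The difference only bites if something compares or stores the raw JSON text.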
Mike...
|
As if you need another reason.
But here is a good one that maybe you've seen before.
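var extra = null;
extra += 'a' + 'b' + 'c';   // extra is now "nullabc", not "abc"
extra += "abc";             // extra is now "nullabcabc"
var numberTest = null;
numberTest += 55 + 33 + 38; // numberTest is 126, because null counts as 0 here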
Here's a snapshot (via imgur) of the Firefox dev console in case you find it difficult to believe.
I've been using JS for quite some time and try to stay up on these types of things but this one bit me today. Really quite annoying.
I guess it pre-decides that it's a string so it figures it should convert the null object to the null string. I'm so glad it does that!!!
Really a PEBKAC error but still quite annoying.
|
raddevus wrote: I guess it pre-decides that it's a string so it figures it should convert the null object to the null string.
Nope - it "decides" when it executes the addition operator.
JavaScript Addition Operator in Details
If one operand is a string, the other is converted to a string and the values are concatenated. String(null) === "null"
Otherwise, they are converted to numbers and added. Number(null) === 0
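Those two rules can be checked directly. A minimal sketch, with variable names chosen only for illustration:

```javascript
const asString = String(null);          // "null"
const asNumber = Number(null);          // 0

// A string operand is present, so null is converted to "null" and concatenated.
const concatenated = null + "abc";      // "nullabc"

// No string operand, so null is converted to 0 and added.
const added = null + (55 + 33 + 38);    // 126

console.log(asString, asNumber, concatenated, added);
```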
And yes, it sucks. Just be grateful it doesn't borrow VB's habit of coercing strings that look like numbers into numbers, adding them together, and then converting the result back to a string.
"These people looked deep within my soul and assigned me a number based on the order in which I joined."
- Homer
|
Richard Deeming wrote: Nope - it "decides" when it executes the addition operator.
Thanks for the info. It's a very good point and I should've thought of it myself -- that it occurs when the concatenation is done.
|
if you learn the rules of coercion you may love the language.
there's nothing ambiguous about it. you will become more expressive, but your expressions may look like gibberish to people coming from stricter languages.
on the other hand, if you too are used to a stricter language, didn't it bother you that you are assigning null to extra and then adding characters or integers to it?
doesn't it feel more natural to define extra as "" if you add strings to it, or as 0 if you add integers to it?
|
sickfile wrote: didn't it bother you that you are assigning null to extra and then adding characters or integers to it?
That's actually a good point and good thinking.
I actually do like JavaScript, I just find that I get bit by these things and there are a lot of esoteric things to know about the language that cause my brain to burn more calories than I probably should have to. But that is the natural laziness that all we devs have I suppose.
The real thing that bothers me about JavaScript is the fact that there are two values which are so similar but are different:
undefined
null
I could just use one or the other really.
But then I become unsure at times and think, "hmm...is JS going to think this is undefined at this point? Is it going to think it is null?" Then, in the case of a string, I also have to wonder if it is an empty string (and there is no String.Empty to compare to, so I have to use double-quotes "").
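One way to stop wondering is a tiny helper that treats all three cases the same. This is only a sketch: isBlank is a made-up name, and whether 0 or false should also count as "blank" is a design choice I have assumed the answer to (they don't).

```javascript
// Hypothetical helper: true for null, undefined, or the empty string.
function isBlank(value) {
  // Loose equality with null matches both null and undefined,
  // but not 0, "" or false.
  return value == null || value === "";
}

console.log(isBlank(undefined)); // true
console.log(isBlank(null));      // true
console.log(isBlank(""));        // true
console.log(isBlank("abc"));     // false
console.log(isBlank(0));         // false
```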
These are the things that are just bothersome. But they are bothersome because I cannot concentrate on JavaScript as much as I'd like to, since I'm attempting to create solutions in software -- not just understand pedantic syntax.
I think that is why at times, I finally just have to rant about JS.
|
that's ok, we all rant here and there.
null is an object, or at least typeof reports it as one. it has its purposes. it just takes time to see a significant difference between null and undefined.
not so long ago i stumbled upon a situation at work where a coworker had decided to write the absence of an object in the database simply as 0, instead of null.
all i can tell you is, and i remember it well, that the logic i had to use, because the field was either an object or 0 in its absence, was something like 3 boolean expressions joined with || or &&. if the absence of the object had been marked with null i would have needed only one boolean expression, even though if (0) and if (null) yield the same result.
there are many articles on the net that explain the differences between 0, "", undefined, null, etc., and the subject can sometimes be so confusing that you have to go through the comments of the article to figure it out, and i'm talking about the best articles out there. a text that's no good will only confuse a person.
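for what it's worth, most of those differences fit in a few lines you can paste into a console. a small sketch, nothing more; the variable names are only for illustration:

```javascript
const values = [0, "", undefined, null];

// All four are falsy, so a plain if (...) cannot tell them apart.
const allFalsy = values.every((v) => !v);             // true

// typeof tells them apart, with the well-known oddity for null.
const types = values.map((v) => typeof v);
// ["number", "string", "undefined", "object"]

// Loose equality groups null with undefined, and 0 with "".
const nullLooseUndefined = null == undefined;         // true
const nullStrictUndefined = null === undefined;       // false
const zeroLooseNull = 0 == null;                      // false
const zeroLooseEmpty = 0 == "";                       // true

// void 0 is just another way of writing undefined.
const voidIsUndefined = void 0 === undefined;         // true

console.log(allFalsy, types, nullLooseUndefined, zeroLooseNull);
```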
|
raddevus wrote: there are two values which are so similar but are different:
undefined
null
or three. Don't forget
void 0
|