The case of curious characters

We stumbled upon an interesting issue at work the other day. A was working on adding a search feature to a table (grid) whose cells held a mix of string and date fields. The implementation used a client-side search utility provided by our client SDK, which internally used tries to index and search string tokens.

The feature worked well overall, but strangely, searching on date fields failed in IE (v11) while working fine in other browsers. So if you searched for a string like ‘3/23’, it would match in Chrome and Firefox, but not in IE o_O

What is special about these date fields, we wondered, that makes this issue specific to IE? On a closer look, we found that the trie wasn’t being constructed properly in IE. We jumped into the client SDK code and looked around, but did not find anything suspicious. We also tried a bunch of other things, like changing the system date format, and entering a date into a string field in another column and searching on it, but the results didn’t really provide any clues.

V then stepped in, looked at the part of the SDK where the trie was built, and found that it was somehow failing to add the date fields to the index. Debugging further, V found that the date strings contained characters we had not expected: they weren’t ordinary strings, and something in them was making the trie construction fail.
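We don’t know exactly how the SDK’s trie reacted to these characters beyond the failure described above. As a hedged illustration only, here is a hypothetical minimal trie (not the SDK’s implementation; `Trie`, `insert` and `hasPrefix` are names made up for this sketch) showing one way invisible leading characters defeat a prefix search even when insertion itself succeeds. The IE-style token is hand-built, assuming a mark before every token of the date.

```javascript
// Hypothetical minimal trie, NOT the SDK's code: each node maps a single
// character to a child node; "$end" marks the end of an inserted word.
function Trie() {
  this.root = {};
}

// Walk the word character by character, creating nodes as needed.
Trie.prototype.insert = function (word) {
  var node = this.root;
  for (var i = 0; i < word.length; i++) {
    var ch = word.charAt(i);
    if (!node[ch]) node[ch] = {};
    node = node[ch];
  }
  node.$end = true;
};

// True if some inserted word starts with the given prefix.
Trie.prototype.hasPrefix = function (prefix) {
  var node = this.root;
  for (var i = 0; i < prefix.length; i++) {
    node = node[prefix.charAt(i)];
    if (!node) return false;
  }
  return true;
};

var trie = new Trie();
// Assumed IE-style token: U+200E before each of month, "/", day, "/", year.
trie.insert("\u200E3\u200E/\u200E23\u200E/\u200E2016");
console.log(trie.hasPrefix("3/23")); // false: the root only has a U+200E child
trie.insert("3/23/2016");
console.log(trie.hasPrefix("3/23")); // true
```

Because the first character of the stored key is the invisible mark, a user typing ‘3/23’ never reaches the branch that holds the date.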

How were the date fields getting added to the index?

Our implementation was calling toLocaleDateString() on the Date object and passing the resulting string to the search utility to build the index. It turns out that toLocaleDateString() was returning different values in Chrome and in IE. Here’s a small piece of code that demonstrates this:
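A minimal sketch of this kind of check, assuming an ES5 environment so that it also runs in IE 11 (the `describe` helper and the sample date are hypothetical): it formats a date with `toLocaleDateString()`, then prints the string, its length, and the UTF-16 code of every character, which is what exposes characters that don’t show up on screen.

```javascript
// Hypothetical helper: list every UTF-16 code unit of a string as a
// zero-padded hex value, e.g. "0x0033" for "3" or "0x200E" for an LRM.
function describe(str) {
  var codes = [];
  for (var i = 0; i < str.length; i++) {
    var hex = str.charCodeAt(i).toString(16).toUpperCase();
    codes.push("0x" + ("0000" + hex).slice(-4));
  }
  return { value: str, length: str.length, codes: codes };
}

// 23 March 2016 in the local time zone; the formatted text depends on the
// browser and the user's locale settings.
var dateString = new Date(2016, 2, 23).toLocaleDateString();
var info = describe(dateString);
console.log(info.value, info.length);
console.log(info.codes.join(" "));
```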

If you run this code in Chrome and IE, this is what you’ll see:

[Screenshots of the console output: Chrome, IE (v11)]

What? Chrome reports the length of the string as 8, which is what you’d expect. IE has brought its own characters to the party. It turns out IE adds Left-to-Right marks to the string: 0x200E (U+200E) is the Unicode code point for the LEFT-TO-RIGHT MARK, an invisible formatting character. IE inserts this mark before every token of the date string (month, ‘/’, day, ‘/’, year), adding 5 characters to the string length even though the string looks identical on screen.
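To make the arithmetic concrete, here is a sketch with hand-built strings (assumed values, not captured IE output): a plain date and an IE-style copy with U+200E in front of each of its five tokens. Stripping the marks with a regular expression is shown only to make the extra characters visible; it is not the fix this post recommends.

```javascript
// Hand-built strings for illustration (assumed, not captured from IE).
var plain = "3/23/2016";
var ieStyle = "\u200E3\u200E/\u200E23\u200E/\u200E2016"; // LRM before each of the 5 tokens

// U+200E is invisible when rendered, but every mark still counts toward length.
console.log(plain.length);   // 9
console.log(ieStyle.length); // 14

// Count the marks, then remove them and compare with the plain string.
var marks = ieStyle.match(/\u200E/g) || [];
var stripped = ieStyle.replace(/\u200E/g, "");
console.log(marks.length);       // 5
console.log(stripped === plain); // true
```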

The answer on this Stack Overflow thread sums up the issue nicely:

Any of the output of toLocaleString, toLocaleDateString, or toLocaleTimeString are meant for human-readable display only

If the intent is anything other than to display to the user, then you should use one of these functions instead:

toISOString will give you an ISO8601/RFC3339 formatted timestamp
toGMTString or toUTCString will give you an RFC822/RFC1123 formatted timestamp
getTime will give you an integer Unix Timestamp with millisecond precision
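A sketch of what that advice can look like in practice, under stated assumptions: the dates are fixed examples, and `toSearchToken` is a hypothetical helper, not part of our code or the SDK. `toISOString()` and `getTime()` give locale-independent values; if the index still needs an “M/D/YYYY” token so that a query like ‘3/23’ matches, one option is to assemble it from the numeric getters, so no locale-specific formatting characters can creep in.

```javascript
// A fixed instant (23 March 2016, 00:00 UTC) so the output is deterministic.
var instant = new Date(Date.UTC(2016, 2, 23));
console.log(instant.toISOString()); // 2016-03-23T00:00:00.000Z
console.log(instant.getTime());     // 1458691200000

// Hypothetical helper (not from any SDK): build an "M/D/YYYY" search token
// from numeric getters; getMonth() is zero-based, hence the + 1.
function toSearchToken(date) {
  return (date.getMonth() + 1) + "/" + date.getDate() + "/" + date.getFullYear();
}

var local = new Date(2016, 2, 23); // 23 March 2016 in the local time zone
console.log(toSearchToken(local));        // 3/23/2016
console.log(toSearchToken(local).length); // 9
```

The design choice here is to keep what the user sees (toLocaleDateString()) separate from what gets indexed, so a browser’s display formatting can’t change the index.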
