I have an array of arrays of strings, i.e. Array[Array[String]].

The strings are folder names in a filesystem and about 100k unique folder names are possible.

So the data structure will look like:

Array[ <- Outer Array which is not sorted, 100 million in length
       Array [/ABC/DEF,/XYZ/YTR,.......] <- This inner array is sorted on folder names
       Array [/CDE/FRT,/TUV/HYT,........] <- Want to generate a shorter unique id for each of these
    ]

For each array of folder names in the outer array, I want to generate a unique id. I know that simply hashing the strings can lead to collisions and hence isn't safe, but I was wondering whether the fact that the inner array is sorted can be exploited to build a hashing scheme. The id can be up to 500 characters long. Is there a Java/Scala library that does this? Assume I can't do a groupBy etc. on this dataset.

What I have tried:

Did some research on the internet.
Updated 25-Mar-18 21:05pm
Comments
Venkat Dabri 26-Mar-18 12:20pm    
Can I use a SHA-512 algorithm for this?
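For reference, here is a minimal Java sketch of what a SHA-512 based id could look like, using the JDK's own java.security.MessageDigest rather than a third-party library. The class name FolderSetDigest and the method idFor are made up for illustration. It assumes the inner array is already sorted, so equal sets of folders always produce the same byte stream, and it length-prefixes every name so that element boundaries cannot be confused. A collision is still possible in principle; with a 512-bit digest it is just extremely unlikely. The hex id is 128 characters, which fits within the 500-character limit.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of any library.
final class FolderSetDigest {

    // Returns a 128-character hex SHA-512 digest of the (already sorted) folder names.
    static String idFor(String[] sortedFolders) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            // Not expected with the default JDK providers; treated as fatal here.
            throw new IllegalStateException(e);
        }
        for (String folder : sortedFolders) {
            byte[] bytes = folder.getBytes(StandardCharsets.UTF_8);
            // Length prefix so that ["/A", "/B"] and ["/A/B"] feed different byte streams.
            md.update(intToBytes(bytes.length));
            md.update(bytes);
        }
        byte[] digest = md.digest();
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Big-endian encoding of a 32-bit length.
    private static byte[] intToBytes(int v) {
        return new byte[] {(byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v};
    }
}
```

Calling FolderSetDigest.idFor on two inner arrays with the same folder names in the same order returns the same string, so no groupBy is needed to line the ids up.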

1 solution

The index of every entry in the inner array is unique. If you concatenate it with the index in the outer array then your final id is globally unique. Or am I missing something?
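A tiny Java sketch of that idea, with the made-up names PositionalId and idFor and an arbitrary ":" separator. It only illustrates the suggestion and assumes the outer index fits in a long:

```java
// Hypothetical helper: builds an id purely from positions, not from content.
final class PositionalId {

    // Combines the outer-array index and the inner-array index into one id.
    static String idFor(long outerIndex, int innerIndex) {
        return outerIndex + ":" + innerIndex;
    }
}
```

Such an id is unique per position, but two inner arrays with identical contents at different outer positions receive different ids.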

By the way, a hash is a hash: collisions are always possible in principle. You need another technique if uniqueness is a requirement.
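One such technique, offered only as a sketch, is to intern the arrays: keep a table from content to a sequential id, so that equal content always gets the same id and different content never shares one. The class FolderSetInterner below is hypothetical, and it assumes all distinct inner arrays fit in the memory of a single JVM, which may not hold for 100 million rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: exact, collision-free ids via a lookup table.
final class FolderSetInterner {

    // Maps each distinct folder list to the id it was first assigned.
    private final Map<List<String>, Long> ids = new HashMap<>();

    // Returns the same id for equal arrays and a fresh id for each new one.
    long idFor(String[] sortedFolders) {
        // Copy into a List so the key has value-based equals/hashCode
        // and is not affected by later changes to the caller's array.
        List<String> key = new ArrayList<>(Arrays.asList(sortedFolders));
        Long existing = ids.get(key);
        if (existing != null) {
            return existing;
        }
        long fresh = ids.size();
        ids.put(key, fresh);
        return fresh;
    }
}
```

The map's internal hashCode may collide, but HashMap resolves that with equals, so the returned ids stay exact. The cost is the memory for the table and the need for one shared instance.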
Comments
Venkat Dabri 26-Mar-18 11:33am    
No, my requirement is that identical elements of the outer array get the same unique id.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)