A colleague recently reported this use case: a Switch flow creates folders using the “Archive Hierarchy” element. Nothing unusual so far, but something seems wrong. Two folders are created and they bear the exact same name! No OS on Earth would create folders with identical names. So what, had our friend just created a glitch in the Matrix?
Once the surprise wears off, the answer is obvious… no! Despite all the evidence, those folders have different names, even if the command line would have you think otherwise.
Your brain is playing tricks on you. You simply can’t have two folders with identical names in the same location, so those two elements do not have the same name! Ready for a game of spot the difference?
We know what you are thinking right now: it’s just a matter of trailing space characters… The first folder name is “Jobs” and the second one is “Jobs  ” (i.e. two extra spaces after Jobs). But here, nothing so obvious! To go further, you have to consider the content and the encoding of the folder name.
Selecting the folder names doesn’t reveal any extra spaces… or so it seems.
Let’s have a look at the hexadecimal code of our two folder names, and we start to see the difference. The first name is encoded as “457420766F696c61cc80” and the second as “457420766F696c61cc80e2808b”. At least one character has slipped unnoticed into our folder name. Analyzing the code “E2808B” reveals the plot: it is the UTF-8 encoding of U+200B, the “Zero Width Space” character, which is basically a space without width.
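If you want to repeat the investigation yourself, here is a minimal Python sketch. It assumes the hex dumps quoted above are UTF-8 bytes (which the “E2808B” sequence suggests); the helper name describe is ours, purely for illustration.

```python
import unicodedata

# Hex dumps of the two folder names, copied from the text above.
first_hex = "457420766F696c61cc80"
second_hex = "457420766F696c61cc80e2808b"


def describe(hex_name):
    """Decode a UTF-8 hex dump and label every code point it contains."""
    name = bytes.fromhex(hex_name).decode("utf-8")
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in name]


first = describe(first_hex)
second = describe(second_hex)

# Whatever the second name has beyond the first is the intruder.
extra = second[len(first):]
print(extra)  # ['U+200B ZERO WIDTH SPACE']
```

Printing the full first list is instructive too: the name ends with U+0300 COMBINING GRAVE ACCENT, so the accent on the final “a” is stored as a separate code point.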
That’s why manually selecting the folder names didn’t reveal anything. Operators can certainly go crazy over this. And the situation can become even more fun if you work with NAS servers and collaborators using different operating systems. Almost every OS uses Unicode these days, but you can’t rule out that sooner or later you will use a tool with a different encoding, such as ANSI.
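To see what a mix of encodings does to a name, here is a small Python sketch. It assumes that “ANSI” means Windows-1252 (cp1252), which is the usual case on Western European Windows systems; your tools may use another code page.

```python
# One accented letter, as it would appear in a folder name.
name = "\u00e9"  # é

utf8_bytes = name.encode("utf-8")   # bytes written by a UTF-8 tool
ansi_bytes = name.encode("cp1252")  # bytes written by a cp1252 ("ANSI") tool

print(utf8_bytes.hex())  # c3a9
print(ansi_bytes.hex())  # e9

# A cp1252 tool reading the UTF-8 bytes shows two characters instead of one.
misread = utf8_bytes.decode("cp1252")
print(misread)  # Ã©
```

Same letter, two different byte sequences: a tool that guesses the wrong encoding will display, or even create, a different name.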
Even the simplest characters, such as “e acute” (é), may surprise you. But before we go any further, one must understand the difference between a glyph and a character. A character is an abstract unit of text, while a glyph is the shape actually drawn on screen; a glyph can represent all or part of a character, or several characters at once (think of ligatures). What you see as a single “é” does not have to be one character: it can be made of two, the basic “e” and the accent (a diacritical mark).
That said, Unicode specifies that the accented character can be written in its precomposed form (U+00E9 => é) or in its decomposed form (U+0065 U+0301 => e + ́). By extension, if one system generates a folder name in one form and another system uses the other, you might get two folders that appear to have the same name simply because their names are not encoded the same way!
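Here is a short Python sketch of those two forms of “é”, using the standard unicodedata module. The helper same_after_nfc is a hypothetical name of ours; comparing names after NFC normalization is one possible approach, not a description of what Switch or any given OS does.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

# Both render as é, yet they are different strings with different bytes.
print(precomposed == decomposed)           # False
print(precomposed.encode("utf-8").hex())   # c3a9
print(decomposed.encode("utf-8").hex())    # 65cc81


def same_after_nfc(a, b):
    """Compare two names once both are in the composed (NFC) form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_after_nfc(precomposed, decomposed))  # True
```

Normalizing to NFD (the decomposed form) on both sides would work just as well; what matters is that both names go through the same form before being compared.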
If you need to generate folders through automated systems, with collaborators using different operating systems and tools, we can only advise you to avoid accented characters. Let’s remember that computers were born in the land of ASCII.
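If the names come from an automated system, one way to follow this advice is to clean them up before the folder is created. The sketch below is only an illustration under our own assumptions: the function name ascii_folder_name is hypothetical, and silently dropping characters may not suit every workflow.

```python
import unicodedata


def ascii_folder_name(name):
    """Return a plain-ASCII version of a folder name (illustrative only)."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    # Dropping everything outside ASCII removes the combining marks,
    # and invisible characters such as the zero-width space along with them.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Trailing spaces are another classic source of look-alike names.
    return ascii_only.strip()


print(ascii_folder_name("Et voila\u0300\u200b"))  # Et voila
print(ascii_folder_name("\u00e9t\u00e9"))         # ete
```

Be aware that letters without an ASCII base letter (ø or ß, for instance) simply disappear with this approach, so a real workflow may prefer an explicit replacement table.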
If you keep seeing folders and files with seemingly identical names, you should take a look at character encoding. The law is the law: no two files or folders with identical names in the same folder.
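To hunt down such twins in an existing folder, a rough Python sketch could look like this. The functions canonical and find_lookalikes are hypothetical helpers of ours; they treat names as look-alikes when they match after NFC normalization, removal of invisible format characters (Unicode category Cf, which includes the zero-width space) and trimming of surrounding spaces.

```python
import os
import unicodedata
from collections import defaultdict


def canonical(name):
    """Reduce a name to what a human eye would roughly see."""
    nfc = unicodedata.normalize("NFC", name)
    visible = "".join(ch for ch in nfc if unicodedata.category(ch) != "Cf")
    return visible.strip()


def find_lookalikes(names):
    """Group names that differ in bytes but share the same canonical form."""
    groups = defaultdict(list)
    for name in names:
        groups[canonical(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Print the UTF-8 hex of every suspicious group in the current folder.
    for group in find_lookalikes(os.listdir(".")):
        print([name.encode("utf-8").hex() for name in group])
```

Run it from the folder you suspect; each printed group lists names that look the same but are stored differently.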
You tried everything but it keeps bugging you? Contact us for a tailor-made study.
Photo by Markus Spiske on Unsplash