Question for the masses: What's a programming language you've seen that handles text strings really well? Specifically, great support for Unicode (i.e. cleanly handles the distinction between code points and code units, stuff like that)? Is there anything out there that you think fits that bill?

@JoshJers the problem is that there's no such thing as a Unicode character, especially now that you have joining characters for emoji that can join an arbitrary number of different emoji into a new one. Having researched this, the best idea I found is UTF-8 everywhere, and then you can iterate over the code points. But really I think no language solves this, because it's not strictly solvable.
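To make the code unit vs. code point vs. "visible thing" gap concrete, here's a rough Python sketch. The sample string is an assumption picked for illustration (two woman emoji and a girl emoji glued together with ZERO WIDTH JOINER); it isn't from anyone's actual code.

```python
# One "family" emoji, which many fonts draw as a single glyph:
# WOMAN + ZWJ + WOMAN + ZWJ + GIRL (illustrative sample, not from the thread).
family = "\U0001F469\u200D\U0001F469\u200D\U0001F467"

# Python's str is indexed by code point, so len() counts code points.
code_points = len(family)

# Code units depend on the encoding: bytes for UTF-8, 16-bit units for UTF-16.
utf8_units = len(family.encode("utf-8"))
utf16_units = len(family.encode("utf-16-le")) // 2

print(code_points, utf8_units, utf16_units)  # 5 18 8
```

None of those three numbers is 1, which is the count a person looking at the rendered emoji would probably give.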

@JoshJers I did write iterators over code points in C# (forward and backward) but that still doesn't give you actual characters so *farting noises*
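For anyone curious what forward and backward code point iteration looks like, here's a minimal Python sketch over UTF-8 bytes. It is not the C# code mentioned above; the function names `iter_code_points` and `iter_code_points_reversed` are made up for this example, and it assumes the input is already well-formed UTF-8 (no validation at all).

```python
def iter_code_points(data: bytes):
    """Yield code points front to back. Assumes well-formed UTF-8."""
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte's high bits give the sequence length.
        if lead < 0x80:
            length, cp = 1, lead
        elif lead >> 5 == 0b110:
            length, cp = 2, lead & 0x1F
        elif lead >> 4 == 0b1110:
            length, cp = 3, lead & 0x0F
        else:
            length, cp = 4, lead & 0x07
        # Each continuation byte contributes its low 6 bits.
        for b in data[i + 1 : i + length]:
            cp = (cp << 6) | (b & 0x3F)
        yield cp
        i += length


def iter_code_points_reversed(data: bytes):
    """Yield code points back to front. Assumes well-formed UTF-8."""
    end = len(data)
    while end > 0:
        start = end - 1
        # Continuation bytes look like 0b10xxxxxx; skip back to the lead byte.
        while start > 0 and (data[start] & 0xC0) == 0x80:
            start -= 1
        yield ord(data[start:end].decode("utf-8"))
        end = start
```

Walking backward works because UTF-8 is self-synchronizing: continuation bytes can never be mistaken for lead bytes. As the post says, though, this still only gets you code points, not what a reader would call characters.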

@eniko yeah and there's not even a really great way to HANDLE that without, like, "here's an array of utf-32 chars that represent the whole visible thing, lol have fun" at every step

At work I wrote the code that does canonical string comparison (i.e. if you have é written as e + accent or as a single character, they'll compare as identical) and it was a nightmare of tables and clever table compression
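The é case looks roughly like this in Python, leaning on the standard library's `unicodedata.normalize` instead of hand-rolled tables. This is only a sketch of the idea, not the implementation described above; `canonically_equal` is a hypothetical helper name.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after putting both into the same normalization form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


composed = "\u00E9"      # é as one precomposed code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

print(composed == decomposed)                   # False
print(canonically_equal(composed, decomposed))  # True
```

Normalizing both sides allocates new strings on every comparison, which is presumably part of why a production version ends up as tables and clever compression.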

@eniko (actually first became aware of how annoying the problem is because @AbuhRae was working on a thing where she had a Korean filename (Korean can be represented in multiple ways), and one OS was dealing with filenames in composed mode and another was dealing with filenames in decomposed mode and god help you transferring files from one to the next)
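A small Python sketch of that filename mismatch, assuming one side stores names composed (NFC) and the other decomposed (NFD). The filename and the `find_name` helper are invented for illustration; real filesystems differ in the details of what they normalize.

```python
import unicodedata


def find_name(listing, wanted):
    """Return the entry canonically equal to `wanted`, or None (hypothetical helper)."""
    key = unicodedata.normalize("NFC", wanted)
    for entry in listing:
        if unicodedata.normalize("NFC", entry) == key:
            return entry
    return None


# "한글.txt" with each Hangul syllable as a single precomposed code point.
composed_name = "\uD55C\uAE00.txt"
# The same name with every syllable split into its individual jamo.
decomposed_name = unicodedata.normalize("NFD", composed_name)

listing = [decomposed_name]  # pretend directory listing from the "decomposed" system

print(len(composed_name), len(decomposed_name))  # 6 10
print(composed_name in listing)                  # False
print(find_name(listing, composed_name) is not None)  # True
```

Both names render identically, so a plain equality lookup failing here is exactly the kind of thing that's miserable to debug.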

@JoshJers @eniko God, I remember this happening and driving me insane, but I’ve completely lost the context 😅
