**Josh Jersild** @JoshJers@peoplemaking.games · May 23, 2023, 21:47

Question for the masses: What's a programming language that you've seen that handles text strings really well? Specifically, great support for Unicode (i.e. cleanly handles the distinction between code points and code units, stuff like that)? Is there anything out there that you think fits that bill?

**Eniko (moved ➡ gamedev.place)** @eniko@peoplemaking.games · May 23, 2023, 22:23

@JoshJers the problem is that there's no such thing as a unicode character, especially now that you have joining characters for emoji that can join an arbitrary number of different emojis into a new one. Having researched this the best idea I found is utf8 everywhere and then you can iterate over the code points. But really I think no language solves this because it's not strictly solvable
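To make the code unit / code point / visible character distinction concrete, here is a minimal Python sketch. The family emoji is just an illustrative choice of a zero-width-joiner sequence and is not taken from the thread. Python's standard library has no grapheme cluster iterator, so the sketch only counts code units and code points; the "one visible character" part is exactly what it cannot tell you.

```python
# Family emoji: MAN + ZERO WIDTH JOINER + WOMAN + ZERO WIDTH JOINER + GIRL.
# Renderers that support the sequence draw it as a single glyph.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

utf8_units = len(family.encode("utf-8"))            # 8-bit code units
utf16_units = len(family.encode("utf-16-le")) // 2  # 16-bit code units
code_points = len(family)                           # a Python str is indexed by code point

print(utf8_units, utf16_units, code_points)  # 18 8 5

# Iterating a str yields code points, not user-perceived characters.
for cp in family:
    print(f"U+{ord(cp):04X}")
```

So one visible emoji is 18 UTF-8 code units, 8 UTF-16 code units, and 5 code points, and none of those numbers is 1.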

**Eniko (moved ➡ gamedev.place)** @eniko@peoplemaking.games · May 23, 2023, 22:25

@JoshJers I did write iterators over code points in C# (forward and backward) but that still doesn't give you actual characters so *farting noises*

**Josh Jersild** @JoshJers@peoplemaking.games · May 23, 2023, 22:29

@eniko yeah and there's not even a really great way to HANDLE that without, like, "here's an array of utf-32 chars that represent the whole visible thing, lol have fun" at every step

At work I wrote the code that does canonical string comparison (i.e. if you have é written as e + accent or as a single character, they'll compare as identical) and it was a nightmare of tables and clever table compression
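For a sense of what canonical comparison means in practice, here is a small Python sketch. It leans on the standard library's `unicodedata.normalize`, which hides the tables, so it is not the hand-built implementation described above. The helper name `canonically_equal` is made up for this example.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFD (canonical decomposition)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


precomposed = "\u00E9"  # é as a single code point
combining = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

print(precomposed == combining)                   # False: different code points
print(canonically_equal(precomposed, combining))  # True: same canonical form
```

Normalizing both sides on every comparison is the simple version; a production implementation would presumably avoid allocating normalized copies, which is where the table work comes in.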

**Josh Jersild** @JoshJers@peoplemaking.games · May 23, 2023, 22:32

@eniko (actually first became aware of how annoying the problem is because @AbuhRae was working on a thing where she had a Korean filename (Korean can be represented in multiple ways), and one OS was dealing with filenames in composed mode and another was dealing with filenames in decomposed mode and god help you transferring files from one to the next)
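A rough Python sketch of the filename situation, assuming the two forms in question are NFC (composed) and NFD (decomposed). The filename `한글.txt` and the helper `same_filename` are made up for illustration; the thread does not say which name or which operating systems were involved.

```python
import unicodedata

name = "\uD55C\uAE00.txt"  # "한글.txt", written with precomposed Hangul syllables

composed = unicodedata.normalize("NFC", name)    # one code point per syllable
decomposed = unicodedata.normalize("NFD", name)  # each syllable split into jamo

print(len(composed), len(decomposed))  # 6 10
print(composed == decomposed)          # False, although both display the same


def same_filename(a: str, b: str) -> bool:
    """Hypothetical helper: compare names after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(same_filename(composed, decomposed))  # True
```

A naive byte-for-byte lookup of the composed name on a system that stored the decomposed one would miss the file, which matches the kind of failure described above.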

**AbuhRae** @AbuhRae@computerfairi.es · May 25, 2023, 15:01

@JoshJers @eniko God, I remember this happening and driving me insane, but I’ve completely lost the context 😅
