so, you've seen ™ and ™️ before. but like. why are there two. well, i have an explanation! the answer is: FE0F

first, unicode. unicode is a standard definition of a bunch of codepoints, where a codepoint is just a number with meaning. for example, unicode codepoint U+263A refers to ☺︎, or "White Smiling Face", and U+1F431 refers to 🐱, or "Cat Face".
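
if you want to poke at this yourself, here's a small python sketch that looks up a codepoint's number and official name. `describe` is just a name i made up for the example; `ord` and `unicodedata.name` are from the standard library.

```python
import unicodedata

def describe(char: str) -> str:
    """Return 'U+XXXX NAME' for a single codepoint (hypothetical helper)."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"

print(describe("\u263A"))      # U+263A WHITE SMILING FACE
print(describe("\U0001F431"))  # U+1F431 CAT FACE
```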

so, let's start by looking at the codepoints for ™. decoding it, it becomes the codepoint U+2122, referred to as "Trade Mark Sign". this was added in unicode 1.1 in 1993, a decent time ago!

next, the codepoints for ™️. decoding it, we get two codepoints! U+2122 (™︎) and U+FE0F. wait. who is FE0F. why is he in my emoji
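
"decoding" here can be as simple as listing the codepoints in a string. a minimal python sketch, with the two strings written as escapes so nothing gets lost in copy-paste (`codepoints` is a made-up helper name):

```python
def codepoints(text: str) -> list[str]:
    """List every codepoint in text as 'U+XXXX' (hypothetical helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]

plain = "\u2122"        # the trade mark sign on its own
emoji = "\u2122\uFE0F"  # the trade mark sign followed by U+FE0F

print(codepoints(plain))  # ['U+2122']
print(codepoints(emoji))  # ['U+2122', 'U+FE0F']
```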

well, unicode isn't as simple as a series of codepoints that refer to single characters. take a look at é̗ for example. this is three codepoints, U+0065 (Latin Small Letter E), U+0301 (Combining Acute Accent), and U+0317 (Combining Acute Accent Below). the first codepoint is simple enough, it's just e. the next two, however, are combining codepoints. this means that they combine with the codepoint before them to modify it. U+0301 adds an acute accent above the previous codepoint, and U+0317 adds an acute accent below the previous codepoint. this example specifically isn't very useful (i don't know any language with a é̗ character beyond conlangs), but it becomes very useful for languages that use a lot of diacritics. imagine if we had to make a new set of characters for each set of possible diacritics! big waste of space, we shouldn't have done that!
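
here's a rough python sketch of the same three codepoints. `unicodedata.combining` gives 0 for a base character and a non-zero class for combining marks. the last line shows the "we shouldn't have done that" part: e plus U+0301 also exists as one precomposed codepoint, U+00E9, and NFC normalization folds the pair into it.

```python
import unicodedata

sequence = "e\u0301\u0317"  # e + acute accent above + acute accent below

for ch in sequence:
    # combining class: 0 means base character, non-zero means combining mark
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))

# the two-codepoint form and the precomposed form normalize to the same thing
print(unicodedata.normalize("NFC", "e\u0301") == "\u00E9")  # True
```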

so, what is U+FE0F? well, it's a special codepoint called "Variation Selector-16". variation selectors are a block of 16 unicode codepoints, U+FE00 through U+FE0F. only some of them are actually used in defined sequences so far, but among those currently in use are U+FE0E (VS15) and U+FE0F (VS16). from wikipedia: "VS15 and VS16 are reserved to request that a character should be displayed as text or as an emoji respectively." so, what's happening with ™️ is that it's combining a U+2122 (™) and a U+FE0F (Variation Selector-16) to request an emoji version of ™. they're the same character, just that one has been instructed to become an emoji!
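
to make that concrete, here's a small python sketch that sticks VS15 or VS16 onto a character. `with_presentation` is a helper i made up for illustration, not anything standard, and whether the two results actually look different depends on your font and platform.

```python
import unicodedata

VS15 = "\uFE0E"  # requests text presentation
VS16 = "\uFE0F"  # requests emoji presentation

def with_presentation(char: str, selector: str) -> str:
    """Drop any existing VS15/VS16, then append the requested selector (hypothetical helper)."""
    base = char.replace(VS15, "").replace(VS16, "")
    return base + selector

tm_text = with_presentation("\u2122", VS15)
tm_emoji = with_presentation("\u2122", VS16)

print(unicodedata.name(VS16))                 # VARIATION SELECTOR-16
print([f"U+{ord(c):04X}" for c in tm_emoji])  # ['U+2122', 'U+FE0F']
print(tm_text[0] == tm_emoji[0])              # True, same base character either way
```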


also, for the interested, here's the word "unicode" with a shit ton of combining characters: ù́̂̃̄̅̆̇̈̉n̖̗̘̙̐̑̒̓̔̕i̡̢̧̨̠̣̤̥̦̩c̴̵̶̷̸̰̱̲̳̹ò͇͈͉́͂̓̈́͆ͅd͓͔͕͖͙͐͑͒͗͘eͣͤͥͦͧͨͩ͢͠͡. what appears to be seven letters is actually 77 codepoints, taking up 147 bytes when encoded in utf-8. or 154 in utf-16. or 308 in utf-32. (add a byte order mark and those last two become 156 and 312.) why does anyone use utf-16 if it's longer? historical reasons :3
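
if you want to check the byte math, here's a rough python sketch. it rebuilds a string with the same shape (each letter of "unicode" followed by ten combining marks) instead of copying the one above, so the exact marks are my assumption. one thing to watch: python's plain "utf-16" and "utf-32" codecs prepend a byte order mark, which adds 2 and 4 bytes to the totals.

```python
# each letter of "unicode" followed by ten consecutive combining marks,
# starting at U+0300 (an assumed reconstruction, not a copy of the original)
text = ""
mark = 0x0300
for letter in "unicode":
    text += letter
    for _ in range(10):
        text += chr(mark)
        mark += 1

print(len(text))                      # 77 codepoints
print(len(text.encode("utf-8")))      # 147
print(len(text.encode("utf-16-le")))  # 154, no byte order mark
print(len(text.encode("utf-32-le")))  # 308, no byte order mark
print(len(text.encode("utf-16")))     # 156, includes a 2-byte byte order mark
print(len(text.encode("utf-32")))     # 312, includes a 4-byte byte order mark
```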

TL;DR:
™️ is ™︎ but instructed to be an emoji

@mia legitimately fascinating. thanks for sharing! :3
