this post was submitted on 25 May 2026
Programmer Humor
Can this really not be fixed?
I still see this in various text that's meant to be readable.
Usually ampersands are the biggest culprit. Is it just a really sacred data type that can't be upgraded to include punctuation, yet can include the foreign-looking wingdings that try to stand in for it?
I'm just confused about why those characters have multi-character reference names that aren't part of the regular alphabet or punctuation set either, yet those names still show up instead of the erroneous reference simply being replaced with the actual character.
It's 2026, just dig out this fossil and fix it already.
The characters are all on your own PC. The text data is actually just numbers, each one referencing the index of a character in a reference table.
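To make the "text is just numbers" idea concrete, here is a minimal Python sketch. It uses the built-in `ord` and `chr`, which look a character up in Python's own table (Unicode); the sample string is just an illustration, not anything from the post.

```python
# A short sample string, including the ampersand from the question above.
text = "Hi & bye"

# Each character is stored as a number: its index in the reference table.
code_points = [ord(ch) for ch in text]
print(code_points)

# Looking each number up in the same table gives the original text back.
restored = "".join(chr(n) for n in code_points)
print(restored)
```

As long as the reader and the writer agree on the table, the round trip is lossless.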
Early on someone thought "let's create a bunch of different reference tables and each country uses the one that is best for them so we don't have to include every character in the world".
But that thinking has a critical problem: when you write some text that will only be read within your country, you don't need to keep track of which table you used, because everyone will be using the same one. Soon you forget that there are other tables for other countries, so when you do send an international text using your table as a reference, the person on the other side will parse it using their own table and the resulting text will be different. And sometimes when this mixup happens, the index referenced by the text may, in the other table, actually be some internal control character that is not meant for rendering.
These days the problem is "mostly fixed" by the near-universal adoption of a single reference table that aims to include everything you may ever need (even a lot of emojis). But a table this large means that each character in a text may need more bytes to represent the intended index, so the total file size for the same text can be larger than it would be with a smaller, country-specific table.
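A rough way to see the size trade-off is to compare byte counts directly. This sketch assumes UTF-8 as the encoding of the universal table and uses `latin-1` and `cp1251` as stand-ins for the older national tables; the helper name `sizes` is made up for the example.

```python
def sizes(text: str, legacy_encoding: str) -> tuple[int, int]:
    """Return (bytes in the legacy table, bytes in UTF-8) for the same text."""
    return len(text.encode(legacy_encoding)), len(text.encode("utf-8"))

plain = sizes("hello", "latin-1")       # plain ASCII letters
accented = sizes("café", "latin-1")     # one accented letter
cyrillic = sizes("привет", "cp1251")    # all Cyrillic letters

for label, (legacy, utf8) in [("hello", plain), ("café", accented), ("привет", cyrillic)]:
    print(f"{label}: legacy={legacy} bytes, utf-8={utf8} bytes")
```

With UTF-8 the cost depends on the text: plain ASCII stays the same size, while text made mostly of non-ASCII letters can roughly double.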
The wrong UTF encoding is usually the issue.
That’s the joke. 😅