is there a type of quasi-OCR already done that i can leverage to just recognize exact characters which are always rendered identically like this or do i need to get my paws dirty in the python mines
several hours later here's a writeup on how i did:
https://www.tumblr.com/maplesynth/742691202686730240
spoilers, i didn't recover the whole JPEG. but about half of it is intelligible
i've also posted it on my non-tumblr blog now: https://maple.pet/blog/solving-a-base64-mystery-nobody-asked-for
@mavica_again FWIW I have gone through the Python mines doing exactly this. Even with custom code I could not get better than roughly 99% accuracy, mostly because of lookalike characters. Every off-the-shelf OCR system I tried was much worse (no better than roughly 85% IIRC).
Honestly, unless you have some kind of checksum to verify the result, you might have to just do it manually. Either manually typing in the characters ("human OCR" 😛), or having some code do a first pass and manually reviewing the result.
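For what it's worth, if the text is base64 that should decode to a JPEG (as it is in this thread), there's a cheap sanity check short of a real checksum. A rough sketch, assuming the recognized text is in a string; `check_recognized_base64` is a made-up name, and passing this check only means the alphabet, padding, and first few bytes look right, not that every character was read correctly:

```python
import base64
import binascii

# JPEG files begin with the SOI marker FF D8, followed by another FF marker byte.
JPEG_MAGIC = b"\xff\xd8\xff"


def check_recognized_base64(text):
    """Return (ok, message) for recognized base64 text expected to hold a JPEG.

    Hypothetical helper: a sanity check only, not a substitute for a checksum.
    """
    # Drop line breaks and spaces that come from the screenshot layout.
    compact = "".join(text.split())
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently skipping them.
        data = base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        return False, f"not valid base64: {exc}"
    if not data.startswith(JPEG_MAGIC):
        return False, "decoded bytes do not start with the JPEG marker"
    return True, f"{len(data)} bytes, JPEG marker present"
```

A lookalike swap such as I for l still produces valid base64, so this catches structural mistakes rather than single-character misreads.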
@diazona given that windows cleartype makes every single character pixel-perfect identical, and i've reduced it to 16 colors for even higher accuracy, and even I and l are distinctive (dunno which one is which, but i can tell the two apart), i think i'll get 100%. there's no way i'm typing all of it out lol, it's massive
@mavica_again Ahh I didn't realize the characters are identical to the pixel. That's a different story (from what I had); should be totally doable for you.
FWIW the approach I took started by separating the different character images from each other. My text was monospaced so I could use a grid approach, but that wouldn't work for you. I imagine that you could write some algorithm to maintain a "cursor" in the image and continually detect and advance past the next character to the right though.
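The "cursor" idea could look roughly like this. It's a minimal sketch, not anyone's actual code from this thread: it assumes one line of text has already been cropped out as a 2D list of palette indices, that every glyph renders pixel-identically, and that each template is a crop of the full line height. `glyph_matches`, `read_line`, and the `templates` dict are names made up for illustration.

```python
def glyph_matches(image, glyph, x):
    """True if the glyph's pixels equal the image pixels starting at column x."""
    width = len(glyph[0])
    if x + width > len(image[0]):
        return False
    # Assumes glyph and image have the same number of rows (one text line).
    return all(row[x:x + width] == glyph_row
               for row, glyph_row in zip(image, glyph))


def read_line(image, templates, background=0):
    """Walk a cursor left to right, matching exact glyph templates.

    image:     list of rows, each a list of palette indices
    templates: dict mapping a character to its glyph (same row count as image)
    """
    # Try wider glyphs first so a narrow glyph cannot shadow a wider one
    # that happens to start with the same columns.
    ordered = sorted(templates.items(), key=lambda kv: -len(kv[1][0]))
    text = []
    x = 0
    line_width = len(image[0])
    while x < line_width:
        for char, glyph in ordered:
            if glyph_matches(image, glyph, x):
                text.append(char)
                x += len(glyph[0])  # advance past the recognized character
                break
        else:
            # Nothing matched here: skip blank columns silently, but flag
            # unexplained ink (one "?" per column) for manual review.
            if any(row[x] != background for row in image):
                text.append("?")
            x += 1
    return "".join(text)
```

The templates would have to be cropped by hand once per character. For monospaced text the inner loop isn't needed, since the cursor can just step by the fixed cell width.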
Anyway, good luck 🙂 it'd make an interesting blog post if you figure it out!
@diazona yeah that's pretty much how i thought about doing it, i just wanted to know if someone already had so i don't have to reinvent the wheel 😅
@mavica_again Gotcha, makes sense. I dunno, you might actually be the first!
Wikidata may know. I think I remember seeing something similar on their Discord… if you don’t find someone, they may know of someone at least…
no, conventional OCR does not work on this at all