is there a type of quasi-OCR already done that i can leverage to just recognize exact characters which are always rendered identically like this or do i need to get my paws dirty in the python mines

several hours later here's a writeup on how i did:

tumblr.com/maplesynth/74269120

spoilers, i didn't recover the whole JPEG. but about half of it is intelligible

@mavica_again FWIW I have gone through the Python mines doing exactly this. Even with custom code I could not get better than roughly 99% accuracy, mostly because of lookalike characters. Every off-the-shelf OCR system I tried was much worse (no better than roughly 85% IIRC).

Honestly, unless you have some kind of checksum to verify the result, you might have to just do it manually. Either manually typing in the characters ("human OCR" 😛), or having some code do a first pass and manually reviewing the result.

@diazona given that windows cleartype makes every single character pixel perfect identical and i've reduced it to 16 colors for even higher accuracy and even I and l are distinctive (dunno which one is which, but i can tell the two apart) i think i'll get 100%. there's no way i'm typing all of it out lol it's massive

@mavica_again Ahh I didn't realize the characters are identical to the pixel. That's a different story (from what I had); should be totally doable for you.

FWIW the approach I took started by separating the different character images from each other. My text was monospaced so I could use a grid approach, but that wouldn't work for you. I imagine that you could write some algorithm to maintain a "cursor" in the image and continually detect and advance past the next character to the right though.
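A minimal sketch of the "cursor" idea above, not anyone's actual code from this thread. It assumes the image has already been cut into single text lines, reduced to palette indices, and that one pixel-exact template per character has been collected by hand. The names `read_line`, `crop`, `GLYPHS`, `LINE` and `UNKNOWN`, and the choice of `0` as the background index, are all made up for illustration. Wider templates are tried first because a narrow glyph (here `l`) can be identical to the left edge of a wider one (here `L`).

```python
from typing import Dict, List, Sequence, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]  # rows of palette indices
UNKNOWN = "?"  # hypothetical marker for pixels no template explains


def crop(line: Sequence[Sequence[int]], x: int, width: int) -> Bitmap:
    """Return the columns x..x+width-1 of a text line as a hashable bitmap."""
    return tuple(tuple(row[x:x + width]) for row in line)


def read_line(line: Sequence[Sequence[int]],
              glyphs: Dict[str, Bitmap],
              background: int = 0) -> str:
    """Walk a cursor left to right, emitting each exactly matching glyph."""
    # Widest templates first, so a narrow glyph cannot win over a wider
    # glyph that starts with the same columns.
    ordered = sorted(glyphs.items(), key=lambda kv: -len(kv[1][0]))
    total_width = len(line[0])
    out: List[str] = []
    x = 0
    while x < total_width:
        for char, glyph in ordered:
            w = len(glyph[0])
            if x + w <= total_width and crop(line, x, w) == glyph:
                out.append(char)
                x += w  # advance the cursor past the matched glyph
                break
        else:
            # Nothing matched here: skip blank columns silently, flag
            # anything else for manual review (no two markers in a row).
            is_blank = all(row[x] == background for row in line)
            if not is_blank and (not out or out[-1] != UNKNOWN):
                out.append(UNKNOWN)
            x += 1
    return "".join(out)


# Toy 3-pixel-high "font" (assumption: real templates come from the image).
GLYPHS: Dict[str, Bitmap] = {
    "I": ((1, 1, 1), (0, 1, 0), (1, 1, 1)),
    "L": ((1, 0), (1, 0), (1, 1)),
    "l": ((1,), (1,), (1,)),
}

# "L", blank column, "I", blank column, "l"
LINE = [
    [1, 0, 0, 1, 1, 1, 0, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 0, 1],
]

print(read_line(LINE, GLYPHS))  # LIl
```

Anything that comes back as `?` is a spot where no template fit, which is where the manual review pass mentioned earlier would go. Widest-first is only a heuristic; a real font may need templates that include the spacing around each glyph.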

Anyway, good luck 🙂 it'd make an interesting blog post if you figure it out!

@diazona yeah that's pretty much how i thought about doing it, i just wanted to know if someone already had so i don't have to reinvent the wheel 😅

@mavica_again Gotcha, makes sense. I dunno, you might actually be the first!

@mavica_again @diazona

Wikidata may know. I think I remember seeing something similar on their Discord… if you don’t find someone, they may at least know of someone…
