**maple "mavica" syrup [6502]** @mavica_again@computerfairi.es · Feb 18, 2024, 02:21

is there a type of quasi-OCR already done that i can leverage to just recognize exact characters which are always rendered identically like this or do i need to get my paws dirty in the python mines

#programming #ocr #python

fe9641f41f9130fe.png

**maple "mavica" syrup [6502]** @mavica_again@computerfairi.es · Feb 18, 2024, 02:24

no, conventional OCR does not work on this at all

**David Zaslavsky** @diazona@techhub.social · Feb 18, 2024, 02:43

@mavica_again FWIW I have gone through the Python mines doing exactly this. Even with custom code I could not get better than roughly 99% accuracy, mostly because of lookalike characters. Every off-the-shelf OCR system I tried was much worse (no better than roughly 85% IIRC).

Honestly, unless you have some kind of checksum to verify the result, you might have to just do it manually. Either manually typing in the characters ("human OCR" 😛), or having some code do a first pass and manually reviewing the result.

**maple "mavica" syrup [6502]** @mavica_again@computerfairi.es · Feb 18, 2024, 03:06

@diazona given that windows cleartype makes every single character pixel perfect identical and i've reduced it to 16 colors for even higher accuracy and even I and l are distinctive (dunno which one is which, but i can tell the two apart) i think i'll get 100%. there's no way i'm typing all of it out lol it's massive
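If every glyph really does render to the same pixels each time, recognition can be an exact dictionary lookup rather than fuzzy OCR. A minimal sketch of that idea, assuming the image has already been reduced to palette indices; the tiny glyph shapes and the names `freeze`, `TABLE` and `recognize` are made up for illustration and are not real ClearType output:

```python
# Hypothetical 16-colour glyph bitmaps: rows of palette indices (0 = background).
# These shapes are invented for the example, not taken from a real font.
GLYPH_UPPER_I = ((1, 1, 1),
                 (0, 1, 0),
                 (1, 1, 1))
GLYPH_LOWER_L = ((0, 1, 0),
                 (0, 1, 0),
                 (0, 1, 2))


def freeze(bitmap):
    """Turn a list-of-lists bitmap into a hashable tuple-of-tuples key."""
    return tuple(tuple(row) for row in bitmap)


# Exact-match table: because each glyph always renders to the same pixels,
# the bitmap itself can serve as the dictionary key.
TABLE = {
    freeze(GLYPH_UPPER_I): "I",
    freeze(GLYPH_LOWER_L): "l",
}


def recognize(bitmap):
    """Return the character for a glyph bitmap, or None if it is unknown."""
    return TABLE.get(freeze(bitmap))
```

Lookalike characters stop being a problem under this assumption: as long as two glyphs differ in even one pixel, they land on different keys.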

**David Zaslavsky** @diazona@techhub.social · Feb 18, 2024, 03:17

@mavica_again Ahh I didn't realize the characters are identical to the pixel. That's a different story (from what I had); should be totally doable for you.

FWIW the approach I took started by separating the different character images from each other. My text was monospaced so I could use a grid approach, but that wouldn't work for you. I imagine that you could write some algorithm to maintain a "cursor" in the image and continually detect and advance past the next character to the right though.

Anyway, good luck 🙂 it'd make an interesting blog post if you figure it out!
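The "cursor" idea above could look roughly like the sketch below. It is only one possible reading of it, assuming a single text line already cropped to glyph height, pixels given as palette indices with 0 as background, and a table of known glyph bitmaps. The names `matches_at`, `read_line`, `TEMPLATES` and `LINE` and the tiny glyph shapes are hypothetical.

```python
def matches_at(image, x, glyph):
    """True if `glyph` equals the image pixels starting at column x."""
    width = len(glyph[0])
    if x + width > len(image[0]):
        return False
    return all(
        tuple(image_row[x:x + width]) == tuple(glyph_row)
        for image_row, glyph_row in zip(image, glyph)
    )


def read_line(image, templates):
    """Walk a cursor left to right, consuming one known glyph at a time.

    `image` is a list of rows of palette indices; `templates` maps a
    character to its bitmap, which must have the same height as `image`.
    """
    # Try wider glyphs first so a narrow glyph that happens to be a
    # prefix of a wider one does not win by accident.
    ordered = sorted(templates.items(), key=lambda item: -len(item[1][0]))
    text = []
    x = 0
    while x < len(image[0]):
        for char, glyph in ordered:
            if matches_at(image, x, glyph):
                text.append(char)
                x += len(glyph[0])  # advance past the recognized glyph
                break
        else:
            if any(row[x] != 0 for row in image):
                # Fail loudly so a missing template gets noticed.
                raise ValueError(f"unknown glyph at column {x}")
            x += 1  # blank column between glyphs: skip it
    return "".join(text)


# Made-up glyphs and a made-up line containing "I", "l", "I".
TEMPLATES = {
    "I": [[1, 1, 1], [0, 1, 0], [1, 1, 1]],
    "l": [[1, 0], [1, 0], [1, 2]],
}
LINE = [
    [1, 1, 1, 0, 1, 0, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 2, 0, 1, 1, 1],
]
print(read_line(LINE, TEMPLATES))  # IlI
```

This version does not emit spaces; a run of blank columns wider than the normal gap between glyphs would have to be treated as a word break, and that threshold depends on the font.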