**maple "mavica" syrup [6502]** @mavica_again@computerfairi.es · Feb 18, 2024, 02:21

is there a type of quasi-OCR already done that i can leverage to just recognize exact characters which are always rendered identically like this, or do i need to get my paws dirty in the python mines?

#programming #ocr #python

fe9641f41f9130fe.png

**maple "mavica" syrup [6502]** @mavica_again@computerfairi.es · Feb 18, 2024, 02:24

no, conventional OCR does not work on this at all

**maple "mavica" syrup [6502]** @mavica_again@computerfairi.es · Feb 18, 2024, 06:26

sickos voice after a few hours of python yes ha ha yes

3ea1bdfc250b858a.png

**maple "mavica" syrup [6502]** @mavica_again@computerfairi.es · Feb 18, 2024, 06:26

now i just need to sample the whole fuckin base64 alphabet

**maple "mavica" syrup [6502]** @mavica_again@computerfairi.es · Feb 18, 2024, 19:59

several hours later here's a writeup on how i did:

https://www.tumblr.com/maplesynth/742691202686730240

spoilers, i didn't recover the whole JPEG. but about half of it is intelligible
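
A partially recovered transcription can still be sanity-checked. The sketch below is not taken from the writeup; it assumes the recognized text is standard base64 and that the payload is a JPEG, which always starts with the bytes `FF D8`. `decode_prefix` and `looks_like_jpeg` are hypothetical helper names.

```python
import base64
import binascii


def decode_prefix(text):
    """Decode as much of a base64 transcription as forms whole 4-char groups.

    Returns the decoded bytes, or None if a character outside the
    base64 alphabet slipped through recognition.
    """
    cleaned = "".join(text.split())          # drop line breaks and spaces
    usable = len(cleaned) - len(cleaned) % 4  # base64 decodes in groups of 4
    try:
        return base64.b64decode(cleaned[:usable], validate=True)
    except binascii.Error:
        return None


def looks_like_jpeg(data):
    """True if the bytes begin with the JPEG start-of-image marker FF D8."""
    return data is not None and data[:2] == b"\xff\xd8"


if __name__ == "__main__":
    sample = "/9j/4AAQSkZJRg=="  # base64 of a typical JFIF header start
    print(looks_like_jpeg(decode_prefix(sample)))
```

A check like this only confirms the start of the file; it says nothing about errors further in, which is why a mostly-correct transcription can still give a half-intelligible image.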

**maple "mavica" syrup [6502]** @mavica_again@computerfairi.es · Feb 18, 2024, 23:03

i've also posted it on my non-tumblr blog now: https://maple.pet/blog/solving-a-base64-mystery-nobody-asked-for

**David Zaslavsky** @diazona@techhub.social · Feb 18, 2024, 02:43

@mavica_again FWIW I have gone through the Python mines doing exactly this. Even with custom code I could not get better than roughly 99% accuracy, mostly because of lookalike characters. Every off-the-shelf OCR system I tried was much worse (no better than roughly 85% IIRC).

Honestly, unless you have some kind of checksum to verify the result, you might have to just do it manually. Either manually typing in the characters ("human OCR" 😛), or having some code do a first pass and manually reviewing the result.

**maple "mavica" syrup [6502]** @mavica_again@computerfairi.es · Feb 18, 2024, 03:06

@diazona given that windows cleartype makes every single character pixel-perfect identical, and i've reduced it to 16 colors for even higher accuracy, and even I and l are distinctive (dunno which one is which, but i can tell the two apart), i think i'll get 100%. there's no way i'm typing all of it out lol, it's massive
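
If every glyph really is pixel-for-pixel identical after the 16-color reduction, recognition can in principle be an exact dictionary lookup instead of fuzzy matching. A minimal sketch, assuming each glyph has already been cropped to a 2D grid of palette indices; `glyph_key`, `build_table`, and `recognize` are hypothetical names, not from the thread.

```python
def glyph_key(pixels):
    """Turn a 2D grid of palette indices into a hashable key."""
    return tuple(tuple(row) for row in pixels)


def build_table(samples):
    """Build a lookup table from hand-labelled (character, pixels) pairs."""
    table = {}
    for char, pixels in samples:
        key = glyph_key(pixels)
        # Two different characters with the same bitmap cannot be told apart.
        if key in table and table[key] != char:
            raise ValueError(f"{char!r} and {table[key]!r} render identically")
        table[key] = char
    return table


def recognize(table, pixels):
    """Return the character for an exact bitmap match, or None if unseen."""
    return table.get(glyph_key(pixels))


if __name__ == "__main__":
    # Toy 1x3 glyphs: they differ only in the colour of the bottom pixel.
    table = build_table([("I", [[1], [1], [1]]), ("l", [[1], [1], [2]])])
    print(recognize(table, [[1], [1], [2]]))
```

Lookalikes such as I and l stay separable here as long as their bitmaps differ by at least one pixel, and `build_table` fails loudly if they do not.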

**David Zaslavsky** @diazona@techhub.social · Feb 18, 2024, 03:17

@mavica_again Ahh I didn't realize the characters are identical to the pixel. That's a different story (from what I had); should be totally doable for you.

FWIW the approach I took started by separating the different character images from each other. My text was monospaced so I could use a grid approach, but that wouldn't work for you. I imagine that you could write some algorithm to maintain a "cursor" in the image and continually detect and advance past the next character to the right though.

Anyway, good luck 🙂 it'd make an interesting blog post if you figure it out!
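
The "cursor" idea for proportional text could be sketched roughly as follows. This is an illustration under assumptions, not the method either poster used: one line of text has been cropped to a 2D grid of palette indices, every template has the same height as the line, and each template is trimmed so its first column contains ink. `read_line` and `column_is_blank` are hypothetical names.

```python
def column_is_blank(line, x, background=0):
    """True if column x of the line contains only background pixels."""
    return all(row[x] == background for row in line)


def read_line(line, templates, background=0):
    """Walk a cursor left to right, matching known glyphs exactly.

    line: 2D grid (list of rows) of palette indices for one text line.
    templates: dict mapping character -> 2D grid of the same height.
    Unknown shapes are reported as "?" so they can be reviewed by hand.
    """
    width = len(line[0])
    # Try wider glyphs first so a narrow glyph cannot shadow a wider one
    # that happens to start with the same columns.
    ordered = sorted(templates.items(), key=lambda item: len(item[1][0]), reverse=True)
    text = []
    x = 0
    while x < width:
        if column_is_blank(line, x, background):
            x += 1  # inter-character gap: just advance the cursor
            continue
        for char, glyph in ordered:
            w = len(glyph[0])
            if x + w <= width and all(
                list(row[x:x + w]) == list(glyph_row)
                for row, glyph_row in zip(line, glyph)
            ):
                text.append(char)
                x += w  # advance past the recognized glyph
                break
        else:
            # No template matched: flag it and skip to the next blank column.
            text.append("?")
            while x < width and not column_is_blank(line, x, background):
                x += 1
    return "".join(text)


if __name__ == "__main__":
    templates = {"I": [[1], [1], [1]], "L": [[1, 0], [1, 0], [1, 1]]}
    line = [[1, 0, 0, 1], [1, 0, 0, 1], [1, 1, 0, 1]]
    print(read_line(line, templates))
```

Greedy widest-first matching is a simplification; glyphs that touch or overlap their neighbours would need a smarter search.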