morto

joined 1 week ago
[–] morto@piefed.social 5 points 1 week ago (4 children)

and doesn't need to be exactly right

What kinds of tasks do you think don't need to be exactly right?

[–] morto@piefed.social 3 points 1 week ago

I'm not sure it would be viable for a long book, and I'm also avoiding Google, but thanks for helping. I got some nice suggestions in this thread.

[–] morto@piefed.social 3 points 1 week ago

Well, I'm avoiding Google, but I will keep it in mind as a last resort, thanks.

[–] morto@piefed.social 3 points 1 week ago

I'm giving preference to open source tools, but that's a good thing to know, thanks

[–] morto@piefed.social 3 points 1 week ago

Thanks for the suggestions. That OCR_translate looks interesting. I will prioritize other recommended tools that seem to be more focused on books, but I bookmarked it for future needs.

[–] morto@piefed.social 3 points 1 week ago (1 children)

I used Tesseract, but the output PDF didn't have visible text, and I found no way to change that. Maybe I don't know how to use it properly, or it's not intended to keep formatting.
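For what it's worth, as far as I know the invisible text is expected: Tesseract's `pdf` output is a searchable PDF, meaning the page image with a hidden text layer on top. To get text you can actually read and paste into a translator, the `txt` output is probably the simpler route. Here is a minimal sketch of that, assuming the pages are already exported as PNG files and the `chi_sim` (Simplified Chinese) language data is installed. The function names `build_tesseract_command` and `ocr_pages` are made up for illustration and are not part of Tesseract.

```python
import shutil
import subprocess
from pathlib import Path

# Output formats this sketch accepts; each is passed to the CLI as a config name.
SUPPORTED_FORMATS = {"txt", "hocr", "pdf"}


def build_tesseract_command(image_path, output_base, lang="chi_sim", output_format="txt"):
    """Build the CLI call: tesseract IMAGE OUTPUTBASE -l LANG FORMAT.

    Tesseract appends the extension itself, so output_base has no suffix.
    """
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {output_format}")
    return ["tesseract", str(image_path), str(output_base), "-l", lang, output_format]


def ocr_pages(image_dir, out_dir, lang="chi_sim"):
    """OCR every PNG in image_dir to plain text; return the expected .txt paths.

    Assumption: one PNG per scanned page, named so that sorting gives page order.
    """
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image in sorted(Path(image_dir).glob("*.png")):
        base = out_dir / image.stem
        # check=True raises CalledProcessError if Tesseract fails on a page.
        subprocess.run(build_tesseract_command(image, base, lang), check=True)
        written.append(Path(str(base) + ".txt"))
    return written
```

Calling `ocr_pages("pages", "text")` would then leave one `.txt` file per page in `text/`, which can be fed to whatever translation tool you prefer. This drops the layout entirely, so it only helps if plain text is enough.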

[–] morto@piefed.social 4 points 1 week ago (1 children)

That PaddleOCR looks very interesting. It will even extract images and formulas and somewhat preserve formatting in the output! I will try this one, even if it takes more than a day to process with my low-end CPU. Thank you for the suggestion!

 

Situation: I have a scanned book in Chinese that I'd like to read, and there is no translation available. I really want to read it, because it would probably help a lot with my university project.

What I tried: I created a version with OCR to get a text layer so I could run a translation tool on it, but found no way to make the OCR text visible. I also tried this tool, but the OCR didn't work for me, and I found no way to use it with a local model.

Have any of you ever done a similar task? I'd appreciate any kind of suggestions and tips.