Turning a scanned pdf into a searchable document


Question

Howdy folks,

Great site you've got going here, really love the articles. I was wondering if someone could help with a technical question... I consider myself pretty savvy, but no one knows everything, and this is something I've never attempted before.

Basically, I've volunteered with a local organization to help scan some old newsletters. That's no problem, and I've got several done, saved both as separate images and as PDFs. It's not a requirement, but I was told to OCR them if I could. The problem is, all that I understand about OCR is how to turn photographed text into actual 'plain' text, the kind you can copy and paste. It sounds to me like they want the PDFs themselves to be searchable for keywords and the like, which would also make it easier for them to create a master index to all of the old issues.

I'm using GIMP to crop the images, and I turned them into a PDF with Open Office. I really don't want to buy any software for this, especially not anything expensive, so free solutions are definitely preferred. Does anyone have any idea how I might go about accomplishing this?

Thanks so much!


5 answers to this question

  • 0

You could try omniformat.

It does have an OCR module, but I haven't tried it, so I don't know how good it is.

I do OCR as part of my job and, depending on quantity, I can help you out. PM me if omniformat isn't good enough.
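
If you'd rather try a free command-line route, here is a minimal sketch. It assumes the open-source Tesseract OCR engine is installed and on your PATH (that is an assumption on my part, not something mentioned above), and the function names are just made up for illustration. Tesseract's `pdf` output mode writes a PDF that shows the scanned image with an invisible text layer behind it, which is what makes the file searchable.

```python
import subprocess


def build_ocr_command(image_path, output_base, language="eng"):
    """Build the Tesseract command that writes <output_base>.pdf.

    The trailing "pdf" argument asks Tesseract for a PDF containing the
    page image plus a hidden, searchable text layer.
    """
    return ["tesseract", str(image_path), str(output_base), "-l", language, "pdf"]


def make_searchable_pdf(image_path, output_base, language="eng"):
    """Run Tesseract on one scanned page (assumes `tesseract` is installed)."""
    command = build_ocr_command(image_path, output_base, language)
    # check=True raises CalledProcessError if Tesseract reports a failure.
    subprocess.run(command, check=True)
    return str(output_base) + ".pdf"
```

Calling `make_searchable_pdf("page1.png", "page1")` should leave a `page1.pdf` next to your scan, one page per image. How accurate the text layer is will depend on the scan quality, so it's worth spot-checking a few searches.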


  • 0

Alright man, thanks, I'm totally new to this. I've been playing around with stuff, but so far all I've been able to do is get the PDF turned into a .txt document. What I need to do is make the PDF itself searchable (I mean, press Ctrl+F and enter a keyword). Will this accomplish that?


  • 0

I probably sound like a total idiot here, but I still can't figure it out :/ Like I said, I volunteered to scan, not OCR, but it'll look nicer if I do it all, heh. With omniformat, all I've been able to figure out is how to convert images to PDF, PDF to images, and even PDF to text. How do I set it up to insert the searchable text into the PDF?
