Index PDF files content


#1

Hello,

I’m running Pydio 7.0.4 on a Windows Server (AMPPS) and I’m looking for help indexing the content of PDF files so that users can search directly inside those files.

My problem is that I don’t know how to install the packages for UNICONV + XPDF INTEGRATION mentioned in the Lucene Indexer documentation.

Also, the “Advanced Search” box doesn’t seem to have an option to search inside file contents.

Any advice will be much appreciated!


#2

Hi,
I don’t really know how to do it on Windows Server, but I’ve found some guides that could help you:


#3

If the content of your PDF file doesn’t get indexed, there is probably an issue with the OCR (text) layer. It basically depends on which software was used to create the document. Upload it to whatever PDF editing tool you have at hand; I use this one, for example: https://form-cd-401s.pdffiller.com/ because it’s enough for this purpose and costs less than Acrobat and others. There you’ll be able to fix the issues, if there are any.
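
To check whether a PDF has a usable text layer at all before blaming the indexer, you can run it through the pdftotext tool that ships with Xpdf and see if any text comes out. Below is a minimal sketch in Python, assuming pdftotext is installed and on the PATH. The function names and the 20-character threshold are my own illustrative choices, not anything from Pydio.

```python
import subprocess
import sys


def has_text_layer(extracted: str, min_chars: int = 20) -> bool:
    """True if the extracted text has at least min_chars non-whitespace characters."""
    # Drop all whitespace (spaces, newlines, form feeds between pages).
    visible = "".join(extracted.split())
    return len(visible) >= min_chars


def extract_text(pdf_path: str) -> str:
    """Run pdftotext on pdf_path and return the text it writes to stdout."""
    # Assumption: pdftotext is on the PATH. "-" as the output file means stdout.
    result = subprocess.run(
        ["pdftotext", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Usage: python check_pdf_text.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        if has_text_layer(extract_text(path)):
            print(f"{path}: text layer found")
        else:
            print(f"{path}: no usable text, probably a scanned image")
```

If a file is reported as having no usable text, it is most likely a scanned image and needs to go through OCR before any indexer can search inside it.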