Index PDF files content

devneo01 · April 20, 2018, 11:55am

Hello,

I’m running Pydio 7.0.4 on a Windows Server (AMPPS) and I’m looking for help indexing the content of PDF files so that users can search directly inside those files.

My problem is that I don’t know how to install the packages for UNICONV + XPDF INTEGRATION mentioned in the Lucene Indexer documentation.

Also, it seems like the “Advanced Search” box doesn’t have the option to search directly in files.

Any advice would be much appreciated!

zayn · April 24, 2018, 10:44am

Hi,
I don’t really know how to do it on Windows Server, but I’ve found some guides that could help you:

vicWeller · July 5, 2018, 3:07pm

If the content of your PDF file isn’t being indexed, there is probably an issue with its OCR (text) layer. That basically depends on what software was used to create the document. Upload it to whatever PDF editing tool you have on hand; I use this one, for example: https://form-cd-401s.pdffiller.com/, because it’s enough for this purpose and costs less than Acrobat and others. There you’ll be able to fix any issues that come up.
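
A quick way to tell whether a given PDF has an extractable text layer at all is to run it through the `pdftotext` command-line tool (shipped with Xpdf and Poppler) and see whether anything comes out. Below is a minimal Python sketch of that check, not anything Pydio itself provides. It assumes `pdftotext` is installed and on the PATH; the helper names `extract_text` and `looks_indexable` and the 20-character threshold are hypothetical choices for illustration.

```python
import subprocess
import sys


def extract_text(pdf_path, pdftotext="pdftotext"):
    """Run pdftotext and return the extracted text as a string.

    Assumes the pdftotext executable (Xpdf or Poppler) is on the PATH.
    Passing "-" as the output file sends the text to stdout.
    """
    result = subprocess.run(
        [pdftotext, pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    # The output encoding differs between builds, so decode leniently.
    return result.stdout.decode("utf-8", errors="replace")


def looks_indexable(text, min_chars=20):
    """Return True if the text has at least min_chars non-whitespace characters.

    min_chars is an arbitrary illustrative threshold, not a Pydio setting.
    """
    visible = sum(1 for ch in text if not ch.isspace())
    return visible >= min_chars


if __name__ == "__main__" and len(sys.argv) > 1:
    text = extract_text(sys.argv[1])
    if looks_indexable(text):
        print("Text layer found; the content should be indexable.")
    else:
        print("Little or no text extracted; the PDF may be image-only and need OCR.")
```

If the script reports little or no text, the file is most likely a scanned image without a text layer, and no indexer will find anything in it until it has been through OCR.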
