Applying this patch: can't index any PDF file any more #1

maxodoble · 2018-04-06T10:57:31Z

Hi,
I tried this patch on a test repo of Alfresco 201707GA.

High CPU usage is gone for the problematic test PDF page, but now no new PDFs get indexed any more.
The log shows:

2018-04-06 12:50:24,851 WARN [content.metadata.AbstractMappingMetadataExtracter] [catalina-exec-35] Metadata extraction failed (turn on DEBUG for full error):
Extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@24dfb72e
Content: ContentAccessor[ contentUrl=store://2018/4/6/12/50/908747cc-a822-418a-87cb-4e79d8130a5f.bin, mimetype=application/pdf, size=364282, encoding=UTF-8, locale=en_US]
Failure: org/apache/tika/parser/pdf/PDF2XHTMLnull
2018-04-06 12:50:45,094 DEBUG [content.metadata.MetadataExtracterConfigImpl] [catalina-exec-50] Tika metadata options passed to tika parser: TIKA_PARSER_PARSE_SHAPES=false

When I remove your patch, new PDF files are indexed correctly again.

@angelborroy any idea why this is happening?
Cheers,

Max

angelborroy-ks · 2018-04-06T11:02:17Z

The Tika version is different in 201707-GA, so a different patch is probably required, as this patch was developed for 201605-GA.

I know that Alfresco itself has solved this issue for 201803-EA, but I don't know when a new "GA" will be available.

sumitt · 2023-12-28T07:15:24Z

Hi @angelborroy-ks @maxodoble,

I am facing the same problem on Alfresco 201707GA: after applying the patch, no PDF content gets indexed. PDFs can be found only through their metadata.

Is there any solution? @maxodoble, have you found anything that resolves the content indexing?

Best regards,
Sumit Tomar
