Hi,
I tried this patch on a test repo of Alfresco 201707GA.
High CPU usage is gone for the problematic test PDF page, but now no new PDFs get indexed any more.
Log shows:
2018-04-06 12:50:24,851 WARN [content.metadata.AbstractMappingMetadataExtracter] [catalina-exec-35] Metadata extraction failed (turn on DEBUG for full error): Extracter: org.alfresco.repo.content.metadata.PdfBoxMetadataExtracter@24dfb72e Content: ContentAccessor[ contentUrl=store://2018/4/6/12/50/908747cc-a822-418a-87cb-4e79d8130a5f.bin, mimetype=application/pdf, size=364282, encoding=UTF-8, locale=en_US] Failure: org/apache/tika/parser/pdf/PDF2XHTMLnull
2018-04-06 12:50:45,094 DEBUG [content.metadata.MetadataExtracterConfigImpl] [catalina-exec-50] Tika metadata options passed to tika parser: TIKA_PARSER_PARSE_SHAPES=false
When I remove your patch, new PDF files are indexed correctly again.
@angelborroy any idea why this is happening?
Cheers,
Max
I am also facing the same problem on Alfresco 201707GA: after applying the patch, no PDF content is indexed. PDFs can only be found through their metadata.
Is there a solution? @maxodoble, have you found anything that resolves the content indexing?