Releases: weblyzard/inscriptis

Integrated feedback obtained through the Journal of Open Source Software review process

11 Oct 14:11
  • improved documentation based on feedback provided by @reality, @rlskoeser and @sbenthall as part of the Journal of Open Source Software review process.
  • the Inscriptis web service is now included in the Python package and can be started with:
     export FLASK_APP="inscriptis.service.web"
     python3 -m flask run
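
Once the service is running, a client can post HTML to it and read back the plain-text rendering. The following is a minimal sketch rather than the project's reference client: it assumes Flask's default address (http://localhost:5000) and a `/get_text` endpoint that accepts the HTML document as the POST body. The names `SERVICE_URL`, `build_request` and `html_to_text` are hypothetical helpers introduced here for illustration.

```python
import urllib.request

# Assumption: Flask's default host/port and a /get_text endpoint that
# accepts the HTML document as the body of a POST request.
SERVICE_URL = "http://localhost:5000/get_text"


def build_request(html: str, url: str = SERVICE_URL) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the HTML document."""
    return urllib.request.Request(
        url,
        data=html.encode("utf-8"),
        headers={"Content-Type": "text/html; charset=UTF-8"},
        method="POST",
    )


def html_to_text(html: str, url: str = SERVICE_URL) -> str:
    """Send the HTML to the running service and return the response body."""
    with urllib.request.urlopen(build_request(html, url)) as response:
        return response.read().decode("utf-8")
```

With the service started as shown above, calling `html_to_text("<h1>Hello</h1><p>World</p>")` should return the text rendering of that snippet.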

Improved document model, parsing of borderline cases & HTML annotation support

12 Jul 08:48
5e5fcc3

Changes

HTML parsing:

  • new: improved model for handling text blocks and lines
  • chg: improved HTML parsing of tables, enumerations and margins; fixed borderline cases
  • chg: improved whitespace handling
  • add: cover more borderline cases with unit tests

Inscriptis core:

  • new: annotation support
  • new: processing of annotation rules and annotation output
  • new: type hints
  • add: extended and improved documentation

Inscript command line client:

  • new: added --annotation-rules option for annotation support.
  • new: added --post-processor option to export and visualize annotations (HTML, XML and surface form export)
  • chg: apply --encoding to Web URLs as well
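
The two new options are meant to be combined: an annotation rules file tells the client which HTML elements to label, and a post-processor decides how the labelled text is exported. The sketch below is illustrative only. It assumes that the rules file is a JSON object mapping tag names to lists of labels, that the client is invoked as `inscript.py`, and that `html` is an accepted post-processor name; the rule set and the helper functions are hypothetical.

```python
import json

# Hypothetical rule set: each HTML tag maps to the labels assigned to its text.
ANNOTATION_RULES = {
    "h1": ["heading"],
    "b": ["emphasis"],
    "i": ["emphasis"],
}


def rules_to_json(rules: dict) -> str:
    """Serialize annotation rules in the (assumed) JSON file format."""
    return json.dumps(rules, indent=2)


def build_command(source: str, rules_path: str, post_processor: str = "html") -> list:
    """Assemble an argument list suitable for subprocess.run().

    Assumption: the client is available on the PATH as `inscript.py`.
    """
    return [
        "inscript.py",
        source,
        "--annotation-rules", rules_path,
        "--post-processor", post_processor,
    ]
```

Writing `rules_to_json(ANNOTATION_RULES)` to `rules.json` and running the list returned by `build_command("page.html", "rules.json")` would then request an annotated export of `page.html`.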

Misc:

  • chg: migrated to semantic versioning as described at https://semver.org/.
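
For readers unfamiliar with the scheme, semantic versions take the form MAJOR.MINOR.PATCH with an optional pre-release suffix, and a pre-release always sorts before the release it precedes. The sketch below illustrates that ordering. It is a simplified reading of the semver.org precedence rules, not code from Inscriptis, and the version strings in the usage note are hypothetical examples rather than actual release tags.

```python
import re

# MAJOR.MINOR.PATCH, optional "-prerelease", optional "+build" (ignored for ordering).
_SEMVER = re.compile(
    r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?(?:\+[0-9A-Za-z.-]+)?$"
)


def semver_key(version: str) -> tuple:
    """Return a sort key that orders versions by semver precedence."""
    match = _SEMVER.match(version)
    if match is None:
        raise ValueError(f"not a semantic version: {version!r}")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    prerelease = match.group(4)
    if prerelease is None:
        # A normal release ranks above every pre-release of the same version.
        return (major, minor, patch, 1, ())
    # Numeric identifiers compare as integers and rank below alphanumeric ones.
    identifiers = tuple(
        (0, int(part), "") if part.isdigit() else (1, 0, part)
        for part in prerelease.split(".")
    )
    return (major, minor, patch, 0, identifiers)
```

Sorting `["2.0.0", "1.2.0", "2.0.0-rc.2", "2.0.0-rc.1"]` with `key=semver_key` places the two release candidates between 1.2.0 and 2.0.0.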

Note

In terms of functionality, this release corresponds to Inscriptis 2.0rc2.

Fixed annotations for borderline cases

10 Jul 15:21

Please refer to https://github.com/weblyzard/inscriptis/releases/tag/2.0rc1 for a list of all new features. This release candidate fixes the following issues in rc1:

  • fixed annotations for some borderline cases
  • improved documentation compared to 2.0rc1

Improved document model, parsing of borderline cases & HTML annotation support

30 Jun 09:51
84ec720
  1. HTML parsing:

    • new: new model for handling blocks and lines
    • chg: improved HTML parsing of tables, enumerations and margins; fixed borderline cases
    • chg: improved whitespace handling
    • add: cover more borderline cases with unit tests
  2. Inscriptis core:

    • new: support for annotation rules and annotation output
    • new: annotation post-processors (html, xml, surface form)
    • new: type hints
    • chg: extended and improved documentation
  3. Inscript command line client:

    • chg: apply --encoding to Web URLs as well

1.2

14 May 09:40
  • tables: add support for vertical (valign, CSS: vertical-align) and horizontal (align) cell alignment (fixes: #33)
  • improved handling of HTML attributes and styles
  • code cleanup
  • migrated the build from Travis CI to GitHub Actions
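
To see the new cell alignment in action, one can render a small table whose cells differ in height. The sketch below is a hedged illustration: the table markup and the `render_table` helper are made up for this example, and the exact text layout depends on the installed Inscriptis version, so no expected output is shown.

```python
# Hypothetical sample table: the first cell spans two lines, so the second
# cell's vertical alignment (valign) becomes visible in the rendering.
TABLE_HTML = """
<table>
  <tr>
    <td>first line<br>second line</td>
    <td align="right" valign="bottom">right, bottom</td>
  </tr>
</table>
"""


def render_table(html: str = TABLE_HTML) -> str:
    """Render the table as plain text using Inscriptis."""
    # Imported inside the function so that this module still loads
    # in environments where inscriptis is not installed.
    from inscriptis import get_text

    return get_text(html)
```

Calling `print(render_table())` with Inscriptis 1.2 or later installed should show the second cell right-aligned and placed on the lower line of the row.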

Improved margin handling & more liberal licensing

04 Jan 12:51
d6e275d
  • ignore top margins at the beginning of a document.
  • more liberal licensing:
    • the license change has been triggered by another project that created a Java port of inscriptis.
    • to facilitate the free sharing of code and ideas between our two projects, we have (i) obtained the permission of all contributors for a license change, and (ii) changed the inscriptis license to the "Apache License 2.0".

Improved testing and Python 3.9 support

08 Dec 06:33
  • minor performance improvements and code optimizations
  • added Python 3.9 test environment
  • improved test coverage
  • updated package metadata
  • improved tox configuration

Improved HTML rendering, command line client and Web service

20 May 19:10
  1. added support for rendering tags with the white-space: pre CSS property (e.g. <pre>, which is often used for formatting code).
  2. API change: a ParserConfig object replaces the parameters display_images, deduplicate_captions, display_links and indentation in get_text() and when initializing the Inscriptis class:
      from lxml.html import fromstring
      from inscriptis.html_engine import Inscriptis
      from inscriptis.model.config import ParserConfig

      # `html` is a str holding the HTML document to convert
      html_tree = fromstring(html)
      # optional parser configuration fine tuning
      config = ParserConfig(display_links=True, display_anchors=True)
      parser = Inscriptis(html_tree, config)
      text = parser.get_text()
  3. command line client:
    • added option for displaying anchor links
    • --encoding now sets the HTML and output encoding
    • new --version option
  4. Web service
    • use the relaxed CSS profile by default
    • added version call
  5. Documentation fixes and improvements

Improved performance and code structure, documentation and unit testing

20 Dec 17:16
  • improved performance and code structure.
  • use the metadata defined in ./inscriptis/__init__.py both for versioning and in setup.py.
  • improved test coverage
  • created Sphinx API, usage and testing documentation, published at https://inscriptis.readthedocs.org
  • requires Python 3.5+ (dropped support for Python 2.7)

Correct inscript.py default indentation strategy.

25 Sep 13:20

Use the extended indentation strategy by default, as outlined in the README.md.