Kielipankki Korp frontend upgrade to 2023-MM-DD (9.3.0)

Jyrki Niemi edited this page Mar 15, 2023 · 2 revisions

Kielipankki Korp frontend upgrade 2022-02-02 (9.1.0) -> 2023-MM-DD (9.3.0): notes and tasks

General

  • Goal: Merge our additions or modifications feature by feature and create pull requests.
  • Our features are in separate branches, merged to the current master branch. Many of them are based on a single base commit in Språkbanken’s code, but some depend on each other.
  • Språkbanken’s code has been refactored in places, so simply merging (or rebasing) our existing feature branches onto the new code will most likely not work in general. The changes could perhaps be rebased interactively (to retain the original commit information), but in most cases that would require moving and modifying the changes. In that case, it might also be better to make some changes (such as converting configuration variable access) in separate commits.
  • Some of our modifications have been made obsolete or unnecessary by the changes and fixes in Språkbanken’s code.
  • Our frontend configuration also contains code that will need to be changed because of changes in Språkbanken’s code. At least the following needs to be done:
    • config.js needs to be replaced with config.yml.
    • Extended search components should be defined differently: they should not be in-line in attribute definitions but in custom/extended.js and referred to by name in attribute definitions.
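The change to extended search components can be pictured with a small sketch. It is not Korp's actual API: the registry object `extendedComponents` (standing in for what custom/extended.js would export), the attribute properties `extendedComponent` and `extended_component`, the component name `dateSelect` and the helper `resolveExtendedComponent` are all assumptions made for illustration, and the real names should be checked against Språkbanken's documentation.

```javascript
// Hypothetical registry, standing in for what custom/extended.js would export.
const extendedComponents = {
  // A named component; only a template string is given, for brevity.
  dateSelect: { template: "<input type='date'>" },
};

// Before: the component is defined in-line in the attribute definition.
const attrInline = {
  label: "date",
  extendedComponent: { template: "<input type='date'>" },
};

// After: the attribute definition only refers to the component by name.
const attrByName = {
  label: "date",
  extended_component: "dateSelect",
};

// Look up a named component in the registry, failing loudly on unknown names.
function resolveExtendedComponent(attr, registry) {
  const name = attr.extended_component;
  if (!Object.prototype.hasOwnProperty.call(registry, name)) {
    throw new Error(`Unknown extended component: ${name}`);
  }
  return registry[name];
}

console.log(resolveExtendedComponent(attrByName, extendedComponents).template);
```

With this arrangement the attribute definition contains only data (a name), which is what allows it to live in a non-JavaScript configuration format.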

Major changes in Språkbanken’s code

  • Corpus configuration has moved to the backend, so most of the configuration will be moved there.
    • However, some JavaScript code will remain in the frontend configuration.
    • Because of this, the backend corpus configurations will need to be functional before the updated frontend can be used.
  • Configuration variables should be accessed as settings["config_var"], not settings.configVar as previously.
  • Some code has been converted from AngularJS controllers to components, in preparation for moving to Vue.js.
  • The corpus selector has been completely reimplemented.
  • Language codes now use ISO 639-2 (three-letter codes) instead of two-letter codes.
  • Authentication has been reimplemented, apparently with Shibboleth support, so we should probably check whether we could also use the new implementation.
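For the language code change, translation objects keyed by two-letter codes need their keys converted. The following is a minimal sketch of such a conversion, assuming that the UI languages are Finnish, Swedish and English; the names `LANG_CODE_MAP` and `convertLangKeys` and the sample labels are made up for illustration and are not part of Korp.

```javascript
// Two-letter to three-letter (ISO 639-2) codes for the languages assumed
// here; extend the map as needed.
const LANG_CODE_MAP = new Map([
  ["fi", "fin"],
  ["sv", "swe"],
  ["en", "eng"],
]);

// Return a copy of a translations object with its keys converted to
// three-letter codes. Keys without a mapping are kept unchanged.
function convertLangKeys(translations) {
  const result = {};
  for (const [lang, text] of Object.entries(translations)) {
    result[LANG_CODE_MAP.get(lang) || lang] = text;
  }
  return result;
}

console.log(convertLangKeys({ fi: "Haku", sv: "Sökning", en: "Search" }));
```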

Features (branches) to be merged

Branches have several different prefixes, which often carry some meaning:

  • dev-v9-kp-: Strictly Kielipankki-specific branches. We should find a different way of implementing these features (maybe as plugins or via configuration), so that these branches would not be needed.
  • f-v9-: These branches may need more work. Some of them may have become obsolete, and many features should be discussed with Språkbanken first, as the present approach might no longer work or be the best one.
  • fix-: These contain bug fixes, but the bugs might already have been fixed at Språkbanken.
  • t-: In general, these might be most relevant to Språkbanken and perhaps also easiest to port.
  1. dev-v9-kp-mods

    The only thing this branch currently does is to disable the GitHub workflow “Korp with SB config”, which requires Språkbanken’s configuration. It would be better if we could make do without this, but how?

  2. dev-v9-kp-ui-style

    This currently only adds Kielipankki logo images. They should probably be moved to the frontend configuration, but using images from the configuration will probably require changes elsewhere. It should be discussed with Språkbanken how that could be done.

  3. f-v9-allow-no-preselected-corpora

    Allow selecting no corpora initially (configurable).

  4. f-v9-configurable-lemgram-completion

    Support configurable lemgram completion, instead of using only Karp.

  5. f-v9-corpusfolder-extra-nonfolder-props

    Specify a list of property names in a corpus folder that do not designate a subfolder. This has most probably been obsoleted by changes in the corpus configuration format.

  6. f-v9-generate-ui-lang-menu

    Specify UI languages in the configuration instead of editing index.pug. This is most probably obsolete, as similar functionality has been implemented by Språkbanken.

  7. f-v9-grouped-select

    Add a datasetSelectController with value grouping to the extended search.

  8. f-v9-handle-unavail-corpora

    Support handling configured but unavailable corpora (configurable behaviour). This requires that the backend /corpus_info endpoint recognizes the parameter report_undefined_corpora.

  9. f-v9-load-customstyles

    Load custom styles (modifications) in the configuration. There might be better ways to implement this kind of functionality, so this could be discussed (in GitHub) with Språkbanken.

  10. f-v9-main-menu-include

    This will need to be implemented differently, as Pug is no longer used as the source format for index.html. This should be discussed with Språkbanken.

  11. f-v9-merge-transl-files

    A Webpack configuration modification to merge locale-*.json translation files from configuration and plugins, to allow overriding some translations locally. There might be better ways to do this; maybe ask in a GitHub discussion?

  12. f-v9-plugin-facility

    A plugin facility for the Korp frontend. In addition to plugins.js, the changes include hook points in many parts of the code that may have been refactored. Also the Webpack configuration contains code for discovering plugins. Some hook points may have become obsolete.

    This incorporates f-v9-merge-transl-files (see the previous item) and f-v9-pug-multiple-paths-plugin (see the next item), so some major changes are likely to be required for the plugins to work as currently.

  13. f-v9-pug-multiple-paths-plugin

    It seems that as of 2023-02-08, Pug is no longer used as the source format for index.html, so this is apparently no longer relevant.

    We should then find another way to override e.g. the main menu without having to modify the main code for that. We should probably ask in a GitHub discussion what would be a good way to do that.

  14. fix-backend-based-kwic-download

    Fix the KWIC download that uses the backend CGI script korp_download.cgi. Språkbanken has implemented KWIC download in the frontend, so they do not currently use the backend-based approach, and some changes elsewhere had broken it.

    The backend CGI script should be converted to an endpoint plugin for the current Korp backend.

  15. fix-corpuschooser-intermediate

    Fix the corpus chooser to correctly mark folders containing both locked and unlocked corpora. This may have become obsolete, or at least it probably needs to be rewritten, because the corpus chooser has been completely reimplemented.

  16. fix-corpuschooser-multiple-linked-langs

    Fix the corpus chooser to support multiple linked languages in parallel corpora. We should check if the functionality has been implemented at Språkbanken, but at least this has to be rewritten as the corpus chooser has been reimplemented.

  17. fix-require-geo-map-attr-name

    Fix to actually require geo in the names of map attributes. This has been fixed at Språkbanken, so this is obsolete.

  18. fix-sidebar-dataset-attr-mapping

    Map values of attributes with non-array datasets through the dataset mapping to show their values in the sidebar. It might be good to ask if this is the desired behaviour or not.

  19. t-calc-map-center-dynamically

    Calculate the map centre coordinates dynamically based on the points on the map.

  20. t-config-attr-sidebarHideLabel

    Support attribute property sidebarHideLabel to hide a label in the sidebar. At least the name of the property should be converted to sidebar_hide_label.

  21. t-copy-markup-from-config

    A Webpack configuration change to copy markup files from the configuration directory. This might be obsolete, or at least we should discuss how to implement it.

  22. t-default-transl-langs

    Configure languages to fall back to if a translation is not found for a UI item or attribute value. We should check how this currently works in Språkbanken’s Korp.

  23. t-faster-attribute-mapping

    Speed up some functions used in the corpus selector. We should check if this is still relevant.

  24. t-localize-kwic

    Allow localizing “KWIC” in the Korp UI. This is a minor change that will have to be rewritten because Pug is no longer used.

  25. t-localize-support-within-postposition

    Support adding a postposition after the “within” selection list. A tiny change that will have to be partly rewritten because Pug is no longer used.

  26. t-mode-switch-restore-params

    Support saving and restoring hash parameters when switching modes in Korp (configurable).

  27. t-reduce-attr-select-width-in-stylesheet

    Move the width of the statistics attribute selection list to the stylesheet instead of having it fixed in the Pug/HTML code.

  28. t-remove-fixed-sv

    Remove fixed references to Swedish as the default UI language, and allow configuring the locales used by different UI languages. We should check if this is needed any more.

  29. t-settings-SimpleSearchGetLemgramCQP

    Support a configurable way to construct lemgrams in the simple search, instead of using Språkbanken’s default. This probably requires changes because of changes in the configuration system: we should find out where the function for constructing lemgrams should be defined.

  30. t-settings-isKorpLabsURL

    Specify a function in the configuration to test whether the “Korp Labs” version of Korp (a test version) is being used. This probably requires changes because of changes in the configuration system: we should find out where this function should be defined.

  31. t-settings-logoKorpVersion

    Allow specifying in the configuration the version number shown with the Korp logo. This is not essential: it was implemented so that Korp Labs would also show “v9” instead of “v10” as in Språkbanken’s Korp.

  32. t-sidebar-video-configurable-size

    Allow configuring the sidebar video size, instead of using 100%.

  33. t-stats_cqp-pass-attribute-name

    Pass the attribute name to a stats_cqp function to allow the same custom function to be used for different attributes.

  34. t-tokens-in-custom-attr-patterns

    Allow referring to all tokens in the hit in custom attribute patterns.
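For item 19 (t-calc-map-center-dynamically), one simple way to derive a centre from the points on the map is to take the midpoint of their bounding box. The sketch below assumes that approach; the branch itself may compute the centre differently, and the function name `calcMapCenter` and the `{ lat, lng }` point shape are made up for illustration.

```javascript
// Return the midpoint of the bounding box of the given points, or null if
// there are no points. Assumes that points are { lat, lng } objects in
// degrees and that they do not straddle the antimeridian (180° longitude).
function calcMapCenter(points) {
  if (points.length === 0) return null;
  let minLat = Infinity;
  let maxLat = -Infinity;
  let minLng = Infinity;
  let maxLng = -Infinity;
  for (const { lat, lng } of points) {
    minLat = Math.min(minLat, lat);
    maxLat = Math.max(maxLat, lat);
    minLng = Math.min(minLng, lng);
    maxLng = Math.max(maxLng, lng);
  }
  return { lat: (minLat + maxLat) / 2, lng: (minLng + maxLng) / 2 };
}

console.log(
  calcMapCenter([
    { lat: 60, lng: 20 },
    { lat: 70, lng: 30 },
    { lat: 62, lng: 22 },
  ])
);
// { lat: 65, lng: 25 }
```

A bounding-box midpoint keeps all points equally far from the edges of the view, whereas an average of the coordinates would be pulled towards dense clusters of points.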

Features not yet ported to Kielipankki Korp frontend 9

Some features in Kielipankki Korp 5 have not yet been ported to Kielipankki Korp frontend 9. These features are in most cases not in separate branches, so the relevant changes should be extracted based on commit information and diffs. At least some of them could or should perhaps be implemented as plugins.

The features include at least the following:

  • Restricted corpora modal: When the user tries to access a restricted corpus, a modal dialogue opens informing them of that and offering login.

  • Support a “prequery” in the simple search: Search only within structures (texts, paragraphs or sentences) containing the given word forms or lemmas. This should perhaps be implemented as a plugin or at least be configurable.

  • Support implicitly adding a given CQP expression between any two tokens in the extended search, defined corpus-wise. This is useful for corpora containing some kind of markup represented as tokens.
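The last feature above can be illustrated with a minimal sketch. It assumes that the extended search produces one CQP token expression per token and that each corpus may define an expression to be allowed between tokens. The setting name `betweenTokensCqp`, the corpus identifiers, the attribute `type` and the function `buildCqp` are hypothetical and are not taken from Kielipankki Korp 5.

```javascript
// Hypothetical per-corpus settings: a CQP expression to insert between tokens.
const corpusSettings = {
  corpus_with_markup: { betweenTokensCqp: '[type = "markup"]*' },
  plain_corpus: {},
};

// Join the token expressions of a query for one corpus, inserting the
// corpus-specific expression (if any) between each pair of adjacent tokens.
function buildCqp(tokenExprs, corpusId) {
  const between = (corpusSettings[corpusId] || {}).betweenTokensCqp;
  const separator = between ? ` ${between} ` : " ";
  return tokenExprs.join(separator);
}

const tokens = ['[word = "hyvää"]', '[word = "päivää"]'];
console.log(buildCqp(tokens, "corpus_with_markup"));
// [word = "hyvää"] [type = "markup"]* [word = "päivää"]
console.log(buildCqp(tokens, "plain_corpus"));
// [word = "hyvää"] [word = "päivää"]
```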

Plugins

General

  • Plugins are in the Kielipankki-korp-frontend repository as branches plugins/*, merged to plugins/master.

Individual plugins

  1. about_modifier

  2. config_augment_info

  3. config_corpus_features

  4. config_corpusinfo_copier

  5. config_corpus_settings_modifier

  6. config_licence_category

  7. config_logical_corpora

  8. config_sidebar_order

  9. config_url_opts

  10. corpus_aliases

  11. corpuschooser_item_formatter

  12. corpuschooser_prompt_empty

  13. corpusinfo_formatter

  14. news_banner

  15. shibboleth_auth

  16. sidebar_link_section

Frontend configuration

  • Frontend configuration is in the Kielipankki-korp-frontend repository as config/master, with test instances as config/*.
  • Some of the JavaScript code that has been in the mode files needs to be moved to custom and referenced by name from the corpus configuration in the backend.
    • Some custom code has not been ported even to Korp 9.1.0 yet (e.g. ScotsCorr).
  • Configuration variables should be changed from camelCase to snake_case.
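The renaming from camelCase to snake_case could be partly mechanised. The following is a minimal sketch, assuming that the new names are straightforward conversions of the old ones, which may not hold for every variable; the helper names `camelToSnake` and `snakeCaseKeys` are made up for illustration.

```javascript
// Convert a camelCase name to snake_case by inserting an underscore between
// a lower-case letter or digit and a following upper-case letter.
function camelToSnake(name) {
  return name.replace(/([a-z0-9])([A-Z])/g, "$1_$2").toLowerCase();
}

// Return a shallow copy of a configuration object with its top-level keys
// renamed to snake_case. Nested objects are not touched.
function snakeCaseKeys(config) {
  return Object.fromEntries(
    Object.entries(config).map(([key, value]) => [camelToSnake(key), value])
  );
}

console.log(camelToSnake("sidebarHideLabel"));
// sidebar_hide_label
console.log(snakeCaseKeys({ configVar: 1 }));
// { config_var: 1 }
```

Any name produced this way should still be checked against the names actually used in Språkbanken’s code before it is relied on.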