Add natural translation for DSL #574

BrentBlanckaert · 2024-12-11T18:37:56Z

You can run the pre-processor by using

python -m tested.nat_translation ./exercise/simple-example/program_language_map/suite.yaml en # English translation

pdawyndt · 2024-12-11T19:46:27Z

Maybe we could support translations of a testplan (and rollout of templates?) as

python -m tested.translate <testplan>

with an extra option (or argument) to pass the natural language for the translation.
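A minimal sketch of what such an entry point could look like, assuming an `argparse`-based command line. The module name `tested.translate` comes from the suggestion above; the `--language` option, its `en` default, and the `parse_args` helper are hypothetical names chosen for illustration only.

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    """Parse the (hypothetical) command line of the translation pre-processor."""
    parser = argparse.ArgumentParser(
        prog="python -m tested.translate",
        description="Translate a test plan into one natural language.",
    )
    # Positional argument: the test plan to translate.
    parser.add_argument("testplan", help="path to the YAML test plan")
    # Assumed option name and default; the thread only asks for "an extra option".
    parser.add_argument(
        "-l", "--language", default="en", help="natural language code, e.g. en or nl"
    )
    return parser.parse_args(argv)


# Example: python -m tested.translate suite.yaml --language nl
example = parse_args(["suite.yaml", "--language", "nl"])
```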

BrentBlanckaert · 2024-12-12T13:33:38Z

> Maybe we could support translations of a testplan (and rollout of templates?) as
> python -m tested.translate <testplan>
> with an extra option (or argument) to pass the natural language for the translation.

This should work.

BrentBlanckaert · 2024-12-12T13:47:45Z

@pdawyndt in #559 it also says that translations for files should be provided. In what sense?
I've got something like the following:

- files: !natural_language
    en:
      - name: "file.txt"
        url: "media/workdir/file.txt"
    nl:
      - name: "fileNL.txt"
        url: "media/workdir/fileNL.txt"

This seems pointless since I could also just do:

- files:
  - name: "file.txt"
    url: "media/workdir/file.txt"
  - name: "fileNL.txt"
    url: "media/workdir/fileNL.txt"

pdawyndt · 2024-12-12T18:10:14Z

Not really pointless as TESTed will show all "linked files" to the students. For each file, TESTed will try to find its name in the expression/statement and then turn that into a hyperlink. If it doesn't find the filename, it will add it to a list of files that is displayed for the testcase.
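The behaviour described above could be sketched roughly as follows. This is not TESTed's actual implementation: the function name `link_files`, the plain substring match, and the HTML anchor format are all assumptions made for illustration (HTML escaping is left out to keep the sketch short).

```python
import re


def link_files(statement: str, files: list[dict[str, str]]) -> tuple[str, list[dict[str, str]]]:
    """Hyperlink file names that occur in the statement; return the others as a list."""
    by_name = {file["name"]: file["url"] for file in files}
    found: set[str] = set()

    def to_link(match: re.Match) -> str:
        name = match.group(0)
        found.add(name)
        return f'<a href="{by_name[name]}">{name}</a>'

    if by_name:
        # Longest names first, so "fileNL.txt" is not shadowed by a shorter name.
        names = sorted(by_name, key=len, reverse=True)
        pattern = "|".join(re.escape(name) for name in names)
        # A single pass avoids re-matching names inside links that were just inserted.
        statement = re.sub(pattern, to_link, statement)

    # Files that were not mentioned end up in the list shown with the testcase.
    unlinked = [file for file in files if file["name"] not in found]
    return statement, unlinked


files = [
    {"name": "file.txt", "url": "media/workdir/file.txt"},
    {"name": "fileNL.txt", "url": "media/workdir/fileNL.txt"},
]
rendered, remaining = link_files('read_lines("file.txt")', files)
```

With both files listed for every language, `fileNL.txt` would show up in the list of files for the English testcase as well, which is the reason a per-language `files` map is not pointless.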

BrentBlanckaert · 2024-12-13T13:23:58Z

@pdawyndt, I started looking into adding a translation table like

translation:
  animal:
    en: "animal"
    nl: "dier"
  result:
    en: "result"
    nl: "resultaat"

Is it even useful to then also add support for it in a statement like the following?

- statement: !natural_language
   en: 'result = Trying(10, "{animal}")'
   nl: 'resultaat = Proberen(10, "{animal}")'

I would suggest not looking any deeper once a natural_language map is found, and only using the translation map when the expected value (like a string) is given directly.

pdawyndt · 2024-12-13T16:49:13Z

I definitely have many exercises where this (the combination of translation and template strings) is useful. So I would say yes. If we use Python format strings, then we could even write your example as

- statement: !natural_language
   en: 'result = Trying(10, {animal!r})'
   nl: 'resultaat = Proberen(10, {animal!r})'

And not even bother about using single or double quotes or escaping any quotes in the thing you put in the placeholders (which otherwise adds a lot of complication on the side of the DSL-author).

If you have a variable statement pointing to the format string for the statement, a dictionary translation containing the merged translation from the DSL hierarchy and a dictionary data containing the testcase data, turning the template string into the actual string (by filling up the placeholders) would then come down to

statement = statement.format(**translation, **data)

If we also allow data to be a YAML array instead of a YAML map (positional instead of named placeholders), then formatting is done by

statement = statement.format(*data, **translation)

For example, if data = [3, 4, 7] then we could have

'{} +  {} = {}'

or with explicit positions (which would also allow reordering and reusing the array values)

'{0} +  {1} = {2}'
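Put together, the two calls above behave as in the following sketch. The contents of `translation` and `data` are made-up sample values; only `str.format` and the `!r` conversion are standard Python.

```python
# Sample values, assumed for illustration only.
translation = {"animal": "dier", "result": "resultaat"}
data = {"count": 10}

# Named placeholders: translation keys and testcase data share one namespace.
template = "{result} = Proberen({count}, {animal!r})"
statement = template.format(**translation, **data)
# statement is now: resultaat = Proberen(10, 'dier')

# Positional placeholders: data is a list, translations stay named.
positional = [3, 4, 7]
sum_text = "{} + {} = {}".format(*positional, **translation)

# Explicit positions allow reordering (and reusing) the array values.
reordered = "{2} = {0} + {1}".format(*positional, **translation)
```

Two caveats follow from how Python works: `!r` applies Python's `repr`, so the quoting follows Python's rules, and `format(**translation, **data)` raises a `TypeError` when both dictionaries contain the same key.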

BrentBlanckaert · 2024-12-15T19:41:14Z

Currently I've implemented support for !natural_language and a translation map you can define globally, in a tab and in a context. Here is a quick rundown of everything that is possible:

The `translation` map looks like the following:

translation:
  animal:
    en: "animals"
    nl: "dieren"
  result:
    en: "results"
    nl: "resultaten"

This can be defined:

- Next to the tabs (globally)
- In a tab
- In a context

The `!natural_language` map can be defined in the following ways:

In a `tab`

If tab (the name) is a dict, it is assumed to be a !natural_language map, so using !natural_language is not necessary.
- After translating the !natural_language map, the name is assumed to be a string. This string is then formatted using the translation maps.

In a `testcase`

- For a statement or expression, using !natural_language is mandatory.
  - If it is there, the translation is performed first.
  - After that, it checks whether the result is a dict. If it is, each value is formatted based on the translation maps.
  - If it's a string, formatting is performed on that string.
- When a stdin is a dict, it is assumed to be a !natural_language map, so using !natural_language is not necessary.
  - A translation is performed on this dict.
  - The result should be a string, which is always formatted, even if stdin wasn't a dict.
- For arguments the same holds as for stdin, except that the result will be a list and formatting is performed on each item.
- stderr, exception and stdout follow the same structure:
  - Using !natural_language is mandatory. If it's there, the translation is performed. If the result is a string, it is formatted.
  - If a dict remains, we look at the "data" key ("message" for exception):
    - Check if that is a dict.
    - If it is, perform the translation (no !natural_language needed).
  - The value of "data" should be a string, or should be one after the translation. That string is formatted.
- For files I've only added support for !natural_language. No formatting is done.
- If the return is an Oracle:
  - We look at the arguments and do exactly the same as specified before.
  - After that we look at the value. If a translation is done, using !natural_language is mandatory. That translation turns it into a list, dict, int or string, which is then parsed and correctly formatted.
- If it's not an Oracle, we check if it's a !natural_language map. If it is, we parse the result of the translation for possible formatting. Otherwise we just parse the value for possible formatting.
- For a description, using !natural_language is also mandatory. The result of that translation is formatted if it's a string.
  - When it's a dict, check the "description" key. If that is a dict, then it's a translation. After that, the value of the "description" key should always be a string, which is formatted.

BrentBlanckaert · 2025-01-04T15:54:33Z

tested/nat_translation.py

+
+
+def create_enviroment() -> Environment:
+    enviroment = Environment()


This doesn't seem to be relevant for our case, since we're not passing any HTML or XML through this.

tested/nat_translation.py

+    # def represent_str(dumper, data):
+    #     return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')


BrentBlanckaert · 2025-01-05T14:25:43Z

Documentation Natural language translation

This documentation discusses two things:
the natural_language map and
the translations map, both of which can be defined
in the test-suite.

Globally

You can start with a list of tabs or a dictionary.
In this dictionary you can define the tabs, but now you can
also define a translations map. In this map you can
specify for each key a corresponding translation in
different languages. For example, we could create the
following test-suite:

translations:
  animals:
    en: "animals"
    nl: "dieren"
  humans:
    en: "humans"
    nl: "mensen"
tabs:
  - tab: "{{animals}}"
    ...
  - tab: "{{humans}}"
    ...

This defines a translations map with the keys animals
and humans. The values define the corresponding
translations.
Each of these keys can be used in the tabs map to perform
translations on nearly every string (for example: the tab-titles).
This is done by using double curly braces around the key ({{animals}}).
If a translation to Dutch was performed, it would generate
the following test-suite:

tabs:
  - tab: "dieren"
    ...
  - tab: "mensen"
    ...

TESTed would be able to understand this again.
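The translation step above can be sketched with the standard library alone. The PR itself uses Jinja for formatting; the regex-based `render` below is only a simplified stand-in that handles plain `{{key}}` placeholders (no filters or expressions), and the names `select_language` and `render` are hypothetical.

```python
import re

# Matches plain {{ key }} placeholders only; a stand-in for the real formatter.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def select_language(translations: dict, language: str) -> dict:
    """Reduce {key: {lang: text}} to {key: text} for one language."""
    return {
        key: per_language[language]
        for key, per_language in translations.items()
        if language in per_language
    }


def render(text: str, values: dict) -> str:
    """Fill {{key}} placeholders; unknown keys are left untouched in this sketch."""
    return PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)


translations = {
    "animals": {"en": "animals", "nl": "dieren"},
    "humans": {"en": "humans", "nl": "mensen"},
}
tabs = [{"tab": "{{animals}}"}, {"tab": "{{humans}}"}]

values = select_language(translations, "nl")
translated_tabs = [{"tab": render(tab["tab"], values)} for tab in tabs]
```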

Say you want the tab name to be "{dieren}" instead of "dieren".
In this case, there are two things you could do:
- You could use "{{ animals|braces }}", or
- "{{ '{' + animals + '}' }}"

Inside a tab

As discussed above, a tab could have a title that can be
translated using the translations map, but you could also
use a natural_language map for the title:

- tab: !natural_language
    en: 'animal/{{animals}}'
    nl: 'dier/{{animals}}'
  ...

In this case, the natural_language map is used to generate the
tab-title. The key animals can also be used here.
So a combination of a natural_language map with
key placeholders for the translations map is possible.

In a tab you can also define a translations map that can
overwrite certain translations for that tab.

Inside a tab, context and testcase, you can also define files
that act as potential input files. At the top level you can define
a natural_language map for them, and you can use
placeholders for the url and name attributes of each file:

files: !natural_language
  en:
    - name: "file_{{animal}}.txt"
      url: "media/workdir/file_{{animal}}.txt"
  nl:
    - name: "bestand_{{animal}}.txt"
      url: "media/workdir/bestand_{{animal}}.txt"

Inside a context

A tab can contain a context object that contains all the
testcases. Just like a tab, you can define a translations map
inside of it.
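How the scopes combine could be sketched as follows, assuming that an inner scope replaces a key of an outer scope as a whole. Both that assumption and the helper name `merge_translations` are illustrative; the sample overrides are made up.

```python
def merge_translations(*scopes: dict) -> dict:
    """Merge translations maps from outermost to innermost scope.

    Later (inner) scopes win. The input dictionaries are not modified.
    """
    merged: dict = {}
    for scope in scopes:
        merged.update(scope)
    return merged


global_scope = {
    "animals": {"en": "animals", "nl": "dieren"},
    "humans": {"en": "humans", "nl": "mensen"},
}
tab_scope = {"animals": {"en": "pets", "nl": "huisdieren"}}  # overrides one key
context_scope = {}  # a context without its own translations map

merged = merge_translations(global_scope, tab_scope, context_scope)
```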

Inside a testcase

A testcase can contain all kinds of different things, among
them statements and expressions. The following are examples of the
kind of translations you can do:

- statement: !natural_language
    en: '{{result}} = Trying(10)'
    nl: '{{result}} = Proberen(10)'
- expression: !natural_language
    en: 'count_words({{result}})'
    nl: 'tel_woorden({{result}})'

For statements and expressions you can also define a
programming-language-specific map. Normally you don't need to add
anything special for this, but for consistency reasons
you must now also add !programming_language. So you could have
the following expression:

- expression: !programming_language
    javascript: !natural_language
      en: "{{animal}}_javascript_en(1 + 1)"
      nl: "{{animal}}_javascript_nl(1 + 1)"
    typescript: !natural_language
      en: "{{animal}}_typescript_en(1 + 1)"
      nl: "{{animal}}_typescript_nl(1 + 1)"
    java: !natural_language
      en: "Submission.{{animal}}_java_en(1 + 1)"
      nl: "Submission.{{animal}}_java_nl(1 + 1)"
    python: !natural_language
      en: "{{animal}}_python_en(1 + 1)"
      nl: "{{animal}}_python_nl(1 + 1)"

An equivalent of this would be:

- expression: !natural_language
    en: !programming_language
      javascript: "{{animal}}_javascript_en(1 + 1)"
      typescript: "{{animal}}_typescript_en(1 + 1)"
      java: "Submission.{{animal}}_java_en(1 + 1)"
      python: "{{animal}}_python_en(1 + 1)"
    nl: !programming_language
      javascript: "{{animal}}_javascript_nl(1 + 1)"
      typescript: "{{animal}}_typescript_nl(1 + 1)"
      java: "Submission.{{animal}}_java_nl(1 + 1)"
      python: "{{animal}}_python_nl(1 + 1)"

If no statements and expressions are specified,
there must be a stdin and arguments provided.

For both of these, you can specify a natural_language
map. For stdin the values should be strings, which
can also contain placeholders that are filled in
with the translations map.

For arguments, you can also specify a natural_language
map. The values should be lists that represent the arguments.
Formatting can be performed on all strings in those lists.

Next up, you can also specify stdout, stderr, and
exception. All three of these follow the
same format when it comes to translations.
At the top level, you can specify a natural_language
map. The values should either be a string or a
dictionary.

If it's a string, it can just be formatted with the
translations map.
If it's a dictionary, it should contain the key data
(message for exception). The value of that could
be another natural_language map.
The values of that natural_language map can be
of any YAML type. Strings in those values can also be
formatted with the translations map.

An example could be the following:

stderr:
  data: !natural_language
    en: "Nothing to see here {{User}}"
    nl: "Hier is niets te zien {{User}}"
  config:
    ignoreWhitespace: true

You can also add a file which corresponds to an output file.
This can be a natural_language map and the values should
be dictionaries. Those dictionaries should contain
the keys content and location, whose values can be formatted.

When specifying the return, it can be a lot of things
at the top level:

- An oracle, which is basically a dictionary where two
  keys can have translations:
  - arguments: works the exact same way as discussed above.
  - value: works the same way as data in stderr and stdout.
- A natural_language map: the values are expected to be valid YAML types, where the strings can be formatted.
- A valid YAML value, where the strings can be formatted.

An example of this would be the following:

return: !oracle
  value: !natural_language
    en: "The {{result}} 10 is OK!"
    nl: "Het {{result}} 10 is OK!"
  oracle: "custom_check"
  file: "test.py"
  name: "evaluate_test"
  arguments: !natural_language
    en: ["The value", "is OK!", "is not OK!"]
    nl: ["Het {{result}}", "is OK!", "is niet OK!"]

Lastly, there is a description that could be added.
At the top level, this can be a natural_language map.
The values of this are either a dictionary or a string.

If it's a string, it can simply be formatted.
If it's a dictionary, it should contain the key description.
This can also be a natural_language map. The values
should be strings. These strings can be formatted.

An example of this would be the following:

description:
  description: !natural_language
    en: "Eleven_{{elf}}"
    nl: "Elf_{{elf}}"
  format: "code"

niknetniko

I have not looked at all the implementation code itself, I mostly read the "documentation".

I don't see any immediate conflicts in the DSL with other features, so I think the proposal is good.
I think the choice for explicit !natural_language is a good choice, even if it adds some verbosity to the test suites. It is always easier to add shorthands later if the experience of using the feature indicates that they are needed (the reverse is much more difficult).

So in summary, I think this is a good way of adding translation to the DSL, while keeping simple things simple but still providing a fairly flexible approach.

jorg-vr

First of all sorry for the late review.

As Niko already reviewed the docs, and I agree with his take, I have mostly looked at the code.

I have left more questions than actual comments, sorry for that, I am not very familiar with this part of the code.

I think it would be a big improvement should it be possible to write the code a bit more agnostic of the actual TESTed DSL. But it might be best to discuss this with @pdawyndt before you start a rewrite, as he might deem it more worthwhile for you to focus on other features instead of over-optimizing this one.

jorg-vr · 2025-01-14T09:10:50Z

tested/dsl/translate_parser.py

@@ -148,6 +184,8 @@ def _parse_yaml(yaml_stream: str) -> YamlObject:
            yaml.add_constructor("!" + actual_type, _custom_type_constructors, loader)
    yaml.add_constructor("!expression", _expression_string, loader)
    yaml.add_constructor("!oracle", _return_oracle, loader)
+    yaml.add_constructor("!natural_language", _natural_language_map, loader)
+    yaml.add_constructor("!programming_language", _programming_language_map, loader)


I am a bit confused why !programming_language is added in this PR, as I assumed this to already work before this PR. But I am not very familiar with this part of the code.

Could you explain it to me?

jorg-vr · 2025-01-14T09:12:48Z

tests/test_dsl_yaml.py

@@ -20,7 +20,14 @@
    StringTypes,


If this is easy to do, I would put these new tests in a separate file.
(But do keep them here if creating a second testfile requires a lot of code duplication)

jorg-vr · 2025-01-14T09:29:23Z

tested/nat_translation.py

@@ -0,0 +1,412 @@
+import sys


I must say I expected this file to be much simpler.

In principle the !natural_language should simply be replaced by the content of the specified language in the map.
So in my mind, this code should not know whether it is working within a tab, context, testcase,...

The fact that this preprocess script is so heavily linked to the precise TESTed DSL, will make it harder to maintain in the future. Any change to the TESTed DSL will also have to be verified here.

Do you think it is possible to write a more abstract solution, or have I missed some potential issues?
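To make the question concrete, a DSL-agnostic version could be sketched as a plain recursive walk. This assumes the YAML loader wraps every `!natural_language` mapping in a marker class; `NaturalLanguageMap` and `resolve` are hypothetical names, not code from this PR. The sketch deliberately ignores the scoped translations maps and the string formatting, which may be exactly the parts that need knowledge of tabs, contexts and testcases.

```python
class NaturalLanguageMap(dict):
    """Hypothetical marker for a mapping tagged !natural_language in YAML."""


def resolve(node, language: str):
    """Replace every natural-language map by the entry for one language."""
    # Check the marker first: it is a dict subclass.
    if isinstance(node, NaturalLanguageMap):
        return resolve(node[language], language)
    if isinstance(node, dict):
        return {key: resolve(value, language) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve(item, language) for item in node]
    return node  # scalars are returned unchanged


suite = {
    "tabs": [
        {
            "tab": NaturalLanguageMap(en="animals", nl="dieren"),
            "testcases": [
                {"expression": {"python": NaturalLanguageMap(en="count(1)", nl="tel(1)")}}
            ],
        }
    ]
}
resolved = resolve(suite, "nl")
```

Because the walk does not care where a map occurs, the two nesting orders of !programming_language and !natural_language shown in the documentation resolve to the same result.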

jorg-vr · 2025-01-14T09:39:04Z

tested/nat_translation.py

+
+
+def wrap_in_braces(value):
+    return f"{{{value}}}"


Is this a standard way of doing this in jinja2?
To me it felt a bit odd. But I assume this was an explicit feature request?
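For reference, this is what the f-string in the diff evaluates to. In an f-string, `{{` and `}}` are literal braces, so the triple braces are a literal brace around a normal `{value}` replacement field. The function body is copied from the diff; `wrap_explicit` is a hypothetical, more explicit equivalent added for comparison.

```python
def wrap_in_braces(value):
    # "{{" -> literal "{", "{value}" -> the value, "}}" -> literal "}"
    return f"{{{value}}}"


def wrap_explicit(value):
    # Same result without relying on brace escaping.
    return "{" + str(value) + "}"


wrapped = wrap_in_braces("dieren")
```

This matches the "{dieren}" case from the documentation comment, where the function is presumably what backs the `braces` filter.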

BrentBlanckaert added 3 commits December 11, 2024 19:35

Made first version for translation using !natural_language

d1a544b

forgot to push actual file

d0cf7de

fixed linting

0334e1f

github-advanced-security bot found potential problems Dec 11, 2024

fixed pyright issue

c1114bc

add test for unit-test

c61b563

github-advanced-security bot found potential problems Dec 11, 2024

Fixed some bugs and wrote another test for io

145deae

github-advanced-security bot found potential problems Dec 12, 2024

setup main

e14c758

BrentBlanckaert added 3 commits December 12, 2024 20:06

Made a small fix

65fb097

Tested an extra edge case

e941ef6

Cleaned up code and added extra cases.

eef397b

BrentBlanckaert added 4 commits December 13, 2024 19:04

Started on usage with translation table.

30bcdcc

Added support for translation-table in global scope, tab-scope and co…

4230003

…ntext-scope

Cleaned up code and fixed pyright issue

1ddef15

fixed tests and added more

5dabc80

github-advanced-security bot found potential problems Dec 15, 2024

fixed some small issues

69a77d3

BrentBlanckaert added 4 commits December 17, 2024 11:40

made some small fixes

eb62f92

wrote an extra test

6e258cf

fix spelling mistake

1db1db2

fixed linting issue

4dedd8c

BrentBlanckaert added 2 commits December 17, 2024 17:43

increasing test coverage

b9786c2

removed some redundant code

c63e391

github-advanced-security bot found potential problems Dec 19, 2024

BrentBlanckaert added 10 commits December 19, 2024 18:36

Adding a few comments

a35c49a

Cleaned up code some more and added extra cases for input and output …

10b0eb3

…tests

Updated statement/expression case and added programmingLanguageMap fo…

3539227

…r consistency

started added new json schema

e1180f9

Made some changes to schema

466ec16

fixed some bugs in the schema

49079a0

fixed some bugs and fixed the tests

4174beb

fixed an edge case and made an extra test for it.

1a79a82

added the actual writing to a file.

81ce161

changed formatter to jinja

64a00cd

github-advanced-security bot found potential problems Jan 4, 2025

small cleanup

88a2ede

BrentBlanckaert marked this pull request as ready for review January 7, 2025 17:07

BrentBlanckaert requested review from niknetniko and jorg-vr January 7, 2025 17:08

niknetniko reviewed Jan 7, 2025

BrentBlanckaert self-assigned this Jan 11, 2025

BrentBlanckaert added the enhancement New feature or request label Jan 11, 2025

jorg-vr reviewed Jan 14, 2025
