Command line tools for XML sync testing between languages #222

Open. Wants to merge 1 commit into base: master.
155 changes: 101 additions & 54 deletions scripts/translation/README.md
@@ -16,15 +16,19 @@ Because of the above, it's possible to silence each alert independently. These
scripts will output `--add-ignore` commands that, if executed, will omit the
specific alerts in future executions.

## First execution
## broken.php

The first execution of these scripts may generate an inordinate amount of
alerts. It's advised to initially run each command separately, and work the
alerts on a case by case basis. After all interesting cases are fixed,
it's possible to rerun the command and `grep` the output for `--add-ignore`
lines, run these commands, and by so, mass ignore the residual alerts.
`doc-base/scripts/broken.php` tests whether individual XML files are
ill-formed, that is, whether a file contains a Unicode BOM, carriage returns
(CR), or XML contents that are not
[well-balanced](https://www.w3.org/TR/xml-fragment/#defn-well-balanced).

Unbalanced XML contents are invalid XML and will result in a broken build.
BOM and CR marks may not result in broken builds, but *will* cause several
tools below to misbehave, as `libxml` behaviour changes if XML text contains
these bytes.

## qaxml-attributes.php (structural)
## qaxml-attributes.php

`doc-base/scripts/translation/qaxml-attributes.php` checks if all translated
files have the same tag-attribute-value triplets. Tag's attributes are
@@ -35,7 +39,7 @@ This script accepts an `--urgent` option, to filter alerts related to `xml:id`
attributes. This will help translators of languages that are failing to build
to focus on the mismatches most likely related to build failures.

## qaxml-entities.php (structural)
## qaxml-entities.php

`doc-base/scripts/translation/qaxml-entities.php` checks if all translated
files contain the same XML Entities References as the original files.
@@ -55,15 +59,99 @@ entities when generating alerts. This is handy in languages that use some
`&zb;` and `&dh;` entities, and could run with `-zb -dh` to avoid generating
alerts for these entities' differences.

## Old tools (below)
## qaxml-pi.php

`doc-base/scripts/translation/qaxml-pi.php` checks if all translated files have
the same processing instructions (PI) as the original files. Unbalanced PIs may
cause compilation errors, as they are utilized in the manual build process.

## qaxml-tags.php

`doc-base/scripts/translation/qaxml-tags.php` checks if all translated files
have the same tags as the original files. A different number of tags between
source texts and translations indicates mismatched translated texts, and may
cause compilation errors.

This script accepts a `--detail` option that prints the lines of each
mismatched tag, to make work on big files easier.

This script also accepts a `--content=` option that checks the
*contents* of tags, to inspect tags whose contents are expected *not* to
be translated. Example below.

## qaxml-ws.php

`doc-base/scripts/translation/qaxml-ws.php` inspects whitespace usage inside
some known tags. Spurious whitespace may break manual linking or generate
visible artifacts.

## qaxml-revtag.php

`doc-base/scripts/translation/qaxml-revtag.php` checks if all translated
files have valid [revision tags](https://doc.php.net/guide/translating.md).
Files without revision tags in the expected format will fail to generate pretty
diffs on the [Translation status](https://doc.php.net/revcheck.php) website or
on locally generated `revcheck.php` status pages.

## Suggested execution

The first execution of these scripts may generate an inordinate number of
alerts. It's advised to initially run each command separately, and work through
the alerts on a case-by-case basis. After all interesting cases are fixed,
it's possible to rerun the command, `grep` the output for `--add-ignore`
lines, and run these commands, thereby mass-ignoring the residual alerts.

Structural checks:

```
php doc-base/scripts/broken.php
php doc-base/scripts/translation/qaxml-revtag.php

php doc-base/scripts/translation/qaxml-attributes.php
php doc-base/scripts/translation/qaxml-entities.php
php doc-base/scripts/translation/qaxml-pi.php
php doc-base/scripts/translation/qaxml-tags.php --detail
php doc-base/scripts/translation/qaxml-ws.php
```

The tools on `doc-base/scripts/translation/` are slowly being rewritten. While
this effort is not complete, the previous tools, document below, could be used
to supply for features yet not completed.
Tags where no translations are expected:

```
php doc-base/scripts/translation/qaxml-tags.php --content=acronym
php doc-base/scripts/translation/qaxml-tags.php --content=classname
php doc-base/scripts/translation/qaxml-tags.php --content=constant
php doc-base/scripts/translation/qaxml-tags.php --content=envar
php doc-base/scripts/translation/qaxml-tags.php --content=function
php doc-base/scripts/translation/qaxml-tags.php --content=interfacename
php doc-base/scripts/translation/qaxml-tags.php --content=parameter
php doc-base/scripts/translation/qaxml-tags.php --content=type
php doc-base/scripts/translation/qaxml-tags.php --content=classsynopsis
php doc-base/scripts/translation/qaxml-tags.php --content=constructorsynopsis
php doc-base/scripts/translation/qaxml-tags.php --content=destructorsynopsis
php doc-base/scripts/translation/qaxml-tags.php --content=fieldsynopsis
php doc-base/scripts/translation/qaxml-tags.php --content=funcsynopsis
php doc-base/scripts/translation/qaxml-tags.php --content=methodsynopsis
```
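
The per-tag invocations above differ only in the tag name, so they can be generated with a small loop. This is a minimal sketch assuming a POSIX shell; the `content_check_commands` helper name is hypothetical and not part of doc-base. It only prints the commands, so nothing runs until the output is reviewed.

```shell
# Hypothetical helper (not part of doc-base): print one qaxml-tags.php
# command line per tag name given as an argument.
content_check_commands() {
    for tag in "$@"; do
        printf 'php doc-base/scripts/translation/qaxml-tags.php --content=%s\n' "$tag"
    done
}

# Print the commands for three of the tags listed above.
content_check_commands acronym classname constant
```

Piping the printed lines to `sh` would run the checks in sequence.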

Tags where few translations are expected:

```
php doc-base/scripts/translation/qaxml-tags.php --content=code
php doc-base/scripts/translation/qaxml-tags.php --content=computeroutput
php doc-base/scripts/translation/qaxml-tags.php --content=filename
php doc-base/scripts/translation/qaxml-tags.php --content=literal
php doc-base/scripts/translation/qaxml-tags.php --content=varname
```
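
The mass-ignore step described at the start of this section (rerun, `grep` for `--add-ignore`, run the resulting commands) can be sketched as a small filter. This assumes a POSIX shell and that each suggested command is printed on its own line containing `--add-ignore=`; the `extract_ignore_commands` helper name and the `ignores.sh` file name are hypothetical.

```shell
# Hypothetical helper (not part of doc-base): read a script's output on
# stdin and keep only the suggested --add-ignore command lines, with
# leading whitespace removed. Lines such as --del-ignore are dropped.
extract_ignore_commands() {
    grep -e '--add-ignore=' | sed 's/^[[:space:]]*//'
}

# Possible usage (review ignores.sh before running it):
#   php doc-base/scripts/translation/qaxml-tags.php | extract_ignore_commands > ignores.sh
#   sh ignores.sh
```

Writing the commands to a file first, rather than piping straight into `sh`, leaves room to drop alerts that should still be fixed instead of ignored.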

---

Before using the old scripts, they need be configured:
## Old tools (below)

Documented below is the previous version of these tools. They are
deprecated and scheduled for removal very soon.


These old tools needed to be configured separately before use:
```
php doc-base/scripts/translation/configure.php $LANG_DIR
```
@@ -107,44 +195,3 @@ contents, as some tag contents are expected *not* to be translated.

`--detail` will also print line definitions of each mismatched tag,
to facilitate bisecting.

## Suggested execution

Structural checks:

```
php doc-base/scripts/translation/configure.php $LANG_DIR

php doc-base/scripts/translation/qarvt.php

php doc-base/scripts/translation/qaxml.a.php
php doc-base/scripts/translation/qaxml.e.php
php doc-base/scripts/translation/qaxml.p.php
php doc-base/scripts/translation/qaxml.t.php
php doc-base/scripts/translation/qaxml.w.php
```
Tags where is expected no translations:
```
php doc-base/scripts/translation/qaxml.t.php acronym
php doc-base/scripts/translation/qaxml.t.php classname
php doc-base/scripts/translation/qaxml.t.php constant
php doc-base/scripts/translation/qaxml.t.php envar
php doc-base/scripts/translation/qaxml.t.php function
php doc-base/scripts/translation/qaxml.t.php interfacename
php doc-base/scripts/translation/qaxml.t.php parameter
php doc-base/scripts/translation/qaxml.t.php type
php doc-base/scripts/translation/qaxml.t.php classsynopsis
php doc-base/scripts/translation/qaxml.t.php constructorsynopsis
php doc-base/scripts/translation/qaxml.t.php destructorsynopsis
php doc-base/scripts/translation/qaxml.t.php fieldsynopsis
php doc-base/scripts/translation/qaxml.t.php funcsynopsis
php doc-base/scripts/translation/qaxml.t.php methodsynopsis
```
Tags where is expected few translations:
```
php doc-base/scripts/translation/qaxml.t.php code
php doc-base/scripts/translation/qaxml.t.php computeroutput
php doc-base/scripts/translation/qaxml.t.php filename
php doc-base/scripts/translation/qaxml.t.php literal
php doc-base/scripts/translation/qaxml.t.php varname
```
4 changes: 3 additions & 1 deletion scripts/translation/libqa/ArgvParser.php
@@ -26,7 +26,6 @@ class ArgvParser
public function __construct( array $argv )
{
$this->argv = array_values( array_filter( $argv ) );
$this->used = [];
$this->used = array_fill( 0 , count( $argv ) , false );
}

@@ -58,6 +57,9 @@ public function consume( string $equals = null , string $prefix = null , int $po
$this->argv[ $pos ] = null;
$this->used[ $pos ] = true;

if ( $foundByPrefix )
return substr( $arg , strlen( $prefix ) );

return $arg;
}
}
10 changes: 7 additions & 3 deletions scripts/translation/libqa/OutputBuffer.php
@@ -28,6 +28,8 @@ class OutputBuffer
private OutputIgnore $ignore;
private string $options;

public int $printCount = 0;

public function __construct( string $header , string $filename , OutputIgnore $ignore )
{
$filename = str_replace( "/./" , "/" , $filename );
@@ -81,7 +83,7 @@ public function contains( string $text ) : bool
return false;
}

public function print( bool $useAlternatePrinting = false )
public function print( bool $alternatePrinting = false )
{
if ( count( $this->matter ) == 0 && count( $this->footer ) == 0 )
return;
@@ -93,9 +95,11 @@ public function print( bool $useAlternatePrinting = false )
if ( $this->ignore->shouldIgnore( $this , $hashFile , $hashHead , $hashFull ) )
return;

$this->printCount++;

print $this->header;

if ( $useAlternatePrinting )
if ( $alternatePrinting )
$this->printMatterAlternate();
else
foreach( $this->matter as $text )
@@ -128,8 +132,8 @@ private function printMatterAlternate() : void

for ( $idx = 0 ; $idx < count( $this->matter ) ; $idx++ )
{
if ( isset( $add[ $idx ] ) ) print $add[ $idx ];
if ( isset( $del[ $idx ] ) ) print $del[ $idx ];
if ( isset( $add[ $idx ] ) ) print $add[ $idx ];
}

foreach( $rst as $text )
19 changes: 8 additions & 11 deletions scripts/translation/libqa/OutputIgnore.php
@@ -20,23 +20,21 @@

class OutputIgnore
{
private bool $appendIgnores = true;
private bool $showIgnore = true;
private string $filename = ".qaxml.ignores";
private string $argv0 = "";

public bool $appendIgnoreCommands = true;
public ArgvParser $argv;

public function __construct( ArgvParser $argv )
{
$this->argv = $argv;
$this->argv0 = escapeshellarg( $argv->consume( position: 0 ) );

$arg = $argv->consume( prefix: "--add-ignore=" );
$item = $argv->consume( prefix: "--add-ignore=" );

if ( $arg != null )
if ( $item != null )
{
$item = substr( $arg , 13 );
$list = $this->loadIgnores();
if ( ! in_array( $item , $list ) )
{
@@ -46,10 +44,9 @@ public function __construct( ArgvParser $argv )
exit;
}

$arg = $argv->consume( prefix: "--del-ignore=" );
if ( $arg != null )
$item = $argv->consume( prefix: "--del-ignore=" );
if ( $item != null )
{
$item = substr( $arg , 13 );
$list = $this->loadIgnores();
$dels = 0;
while ( in_array( $item , $list ) )
@@ -66,7 +63,7 @@ public function __construct( ArgvParser $argv )
}

if ( $argv->consume( "--disable-ignore" ) != null )
$this->showIgnore = false;
$this->appendIgnoreCommands = false;
}

private function loadIgnores()
@@ -96,12 +93,12 @@ public function shouldIgnore( OutputBuffer $output , string $hashFile , string $
if ( in_array( $active , $marks ) )
$ret = true;
else
if ( $this->showIgnore )
if ( $this->appendIgnoreCommands )
$output->addFooter( " php {$this->argv0} --add-ignore=$active\n" );

// --del-ignore command

if ( $this->showIgnore )
if ( $this->appendIgnoreCommands )
foreach ( $marks as $mark )
if ( str_starts_with( $mark , $prefix ) )
if ( $mark != $active )
4 changes: 2 additions & 2 deletions scripts/translation/libqa/XmlFrag.php
@@ -37,13 +37,13 @@ static function listNodesRecurse( DOMNode $node , int $type, array & $ret )
XmlFrag::listNodesRecurse( $child , $type, $ret );
}

static function loadXmlFragmentFile( string $filename )
static function loadXmlFragmentFile( string $filename , bool $fakeDtdForMissingEntity = true )
{
$contents = file_get_contents( $filename );

[ $doc , $ent , $err ] = XmlFrag::loadXmlFragmentText( $contents , "" );

if ( count( $err ) == 0 )
if ( count( $err ) == 0 || $fakeDtdForMissingEntity == false )
return [ $doc , $ent , $err ];

$dtd = "<?xml version='1.0' encoding='utf-8'?>\n<!DOCTYPE frag [\n";
71 changes: 71 additions & 0 deletions scripts/translation/qaxml-pi.php
@@ -0,0 +1,71 @@
<?php /*
+----------------------------------------------------------------------+
| Copyright (c) 1997-2025 The PHP Group |
+----------------------------------------------------------------------+
| This source file is subject to version 3.01 of the PHP license, |
| that is bundled with this package in the file LICENSE, and is |
| available through the world-wide-web at the following url: |
| https://www.php.net/license/3_01.txt. |
| If you did not receive a copy of the PHP license and are unable to |
| obtain it through the world-wide-web, please send a note to |
| license@php.net, so we can mail you a copy immediately. |
+----------------------------------------------------------------------+
| Authors: André L F S Bacci <ae php.net> |
+----------------------------------------------------------------------+

# Description

Compare processing instructions usage between two XML files. */

require_once __DIR__ . '/libqa/all.php';

$argv = new ArgvParser( $argv );
$ignore = new OutputIgnore( $argv ); // may exit.
$argv->complete();

$list = SyncFileList::load();

foreach ( $list as $file )
{
$source = $file->sourceDir . '/' . $file->file;
$target = $file->targetDir . '/' . $file->file;
$output = new OutputBuffer( "# qaxml.p" , $target , $ignore );

[ $s , $_ , $_ ] = XmlFrag::loadXmlFragmentFile( $source );
Review comment (Member):

Why are you reusing the variable $_ twice? It's just going to overwrite it. And I don't think you are actually using this variable anywhere so why assign it at all?

[ $t , $_ , $_ ] = XmlFrag::loadXmlFragmentFile( $target );

$s = XmlFrag::listNodes( $s , XML_PI_NODE );
$t = XmlFrag::listNodes( $t , XML_PI_NODE );
Review comment (Member) on lines +37 to +38:

Suggested change
$s = XmlFrag::listNodes( $s , XML_PI_NODE );
$t = XmlFrag::listNodes( $t , XML_PI_NODE );
$s = XmlFrag::listNodes( $s , XML_PI_NODE );
$t = XmlFrag::listNodes( $t , XML_PI_NODE );
Suggested change
$s = XmlFrag::listNodes( $s , XML_PI_NODE );
$t = XmlFrag::listNodes( $t , XML_PI_NODE );
$source = XmlFrag::listNodes( $source , XML_PI_NODE );
$target = XmlFrag::listNodes( $target , XML_PI_NODE );

Please don't use single-character variable names.


$s = extractPiData( $s );
$t = extractPiData( $t );

if ( implode( "\n" , $s ) == implode( "\n" , $t ) )
Review comment (Member):

Shouldn't this be ===? I think it would make more sense.

continue;

$sideCount = array();
Review comment (Member):

Suggested change
$sideCount = array();
$sideCount = [];


foreach( $s as $v )
$sideCount[$v] = [ 0 , 0 ];
foreach( $t as $v )
Comment on lines +48 to +50
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this coding style. Usually it would be something like:

Suggested change
foreach( $s as $v )
$sideCount[$v] = [ 0 , 0 ];
foreach( $t as $v )
foreach ($s as $v)
$sideCount[$v] = [ 0 , 0 ];
foreach ($t as $v)

$sideCount[$v] = [ 0 , 0 ];

foreach( $s as $v )
$sideCount[$v][0] += 1;
foreach( $t as $v )
$sideCount[$v][1] += 1;

foreach( $sideCount as $k => $v )
if ( $v[0] != $v[1] )
$output->addDiff( $k , $v[0] , $v[1] );

$output->print();
}

function extractPiData( array $list )
{
$ret = array();
Review comment (Member):

Suggested change
$ret = array();
$ret = [];

foreach( $list as $elem )
$ret[] = "{$elem->target} {$elem->data}";
return $ret;
}