Feature/tighten hash usage 2 (fixing Issues #316, #423, and probably #492) #493

Open
wants to merge 53 commits into develop

Conversation

JoshuaGreen
Collaborator

This branch implements the same changes as feature/tighten_hash_usage, but is based on develop commit #5ffb472.  See PR #441 for the full explanation of what's going on here.

I believe this resolves Issues #316 and #423.  Additionally, I assume it resolves Issue #492, though as per the discussion there one could take the relevant changes even further if desired.

@JoshuaGreen changed the title Feature/tighten hash usage 2 Feature/tighten hash usage 2 fixing Issues #316, #423, and probably #492 May 19, 2024
@JoshuaGreen changed the title Feature/tighten hash usage 2 fixing Issues #316, #423, and probably #492 Feature/tighten hash usage 2 (fixing Issues #316, #423, and probably #492) May 19, 2024
@MuTsunTsai
Contributor

I saw you left DOMEASURE on in this PR. Shouldn't it be off by default?

@JoshuaGreen
Collaborator Author

@MuTsunTsai, that's set in develop commit #5ffb472, so I left it alone.  It's useful for testing, and I trust @thomas-maeder to disable such settings when he builds the releases.

…herefore always decrement, just like we always increment it when allocating.
…ise usage, we may be able to work with a smaller maximum alignment than what the system would require in general. This update allows us to select such a value as a compile-time flag. This, of course, would have to be carefully tested for each such system.
@JoshuaGreen
Collaborator Author

I did a bit more testing on this, and I'm fairly confident that the differences I see can be ascribed to using somewhat larger memory allocations for each object.  There are two factors at play here:

  1. Extra memory is required to enforce alignment.
  2. BCMemValue is one byte larger due to BCMemValue::Leng now being a short.

(1) can (now) be mitigated by manually setting the MAX_ALIGNMENT macro when compiling.  While the FXF infrastructure should be completely generic, it's likely that on specific systems with our specific usage a smaller MAX_ALIGNMENT value than what's computed there will work.  For example, on my Lubuntu virtual machine it seems that MAX_ALIGNMENT ≥ 8 is needed for well-definedness (according to UBSan) but even MAX_ALIGNMENT == 1 doesn't seem to cause any obvious problems in practice.  Experiments along these lines should be carefully tested before being used in specific releases.

Setting MAX_ALIGNMENT to 1 would still leave us with (2).  Memory usage is still a bit higher, but obviously much closer to the original values, and the output I see aligns with every allocation being one byte larger than before.
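
For anyone who wants to experiment with (1), here's a rough sketch of what the compile-time override could look like.  (The fallback default and the rounding macro are only illustrations, not the actual FXF code; only the MAX_ALIGNMENT name matches what's discussed above.)

#include <stddef.h>

#ifndef MAX_ALIGNMENT
#  if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)
#    define MAX_ALIGNMENT _Alignof(max_align_t) /* strictest standard alignment */
#  else
#    define MAX_ALIGNMENT sizeof(void *)        /* conservative pre-C11 guess */
#  endif
#endif

/* Round a requested size up to a multiple of MAX_ALIGNMENT (assumed to be
   a power of 2) so that consecutively placed blocks stay aligned. */
#define ROUND_TO_ALIGNMENT(n) \
  (((n) + ((size_t)MAX_ALIGNMENT - 1)) & ~((size_t)MAX_ALIGNMENT - 1))

Building with -DMAX_ALIGNMENT=8 (or even =1) then overrides the computed default, as described above.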

@thomas-maeder
Owner

This looks very good!
Across all the *.inp files, only 3 problems show a difference in the numbers produced when DOMEASURE is set:

hauntedchess.ref
323,326c323,326
<   add_to_move_generation_stack:   156050941
<                      play_move:   130481836
<  is_white_king_square_attacked:    34629453
<  is_black_king_square_attacked:   106030160
---
>   add_to_move_generation_stack:   167361760
>                      play_move:   134590349
>  is_white_king_square_attacked:    36197320
>  is_black_king_square_attacked:   110138673
nanna.ref
53,56c53,56
<   add_to_move_generation_stack:   699530479
<                      play_move:   703240780
<  is_white_king_square_attacked:    52542018
<  is_black_king_square_attacked:    26318548
---
>   add_to_move_generation_stack:   685036587
>                      play_move:   688631853
>  is_white_king_square_attacked:    55229450
>  is_black_king_square_attacked:    25636490
haan.out
301,304c301,304
<   add_to_move_generation_stack:   172423409
<                      play_move:   172423409
<  is_white_king_square_attacked:    43072625
<  is_black_king_square_attacked:   172169418
---
>   add_to_move_generation_stack:   183246656
>                      play_move:   183246656
>  is_white_king_square_attacked:    43113180
>  is_black_king_square_attacked:   182992665

It would be helpful to understand these differences before we merge the pull request.

DHT/dht.c Outdated
if (KeyV==0)
assert(key.value.object_pointer!=0); /* TODO: This assert assumes that object_pointer is the active member.
Is there a more generic test we could do? Do we need one? */
if ((ht->procs.DupKeyValue)(key.value, &KeyV.value))
Owner

Hmm.
Before, we signaled "key duplication failed" if DupKey returned 0.
Now, we signal it if DupKeyValue doesn't.
Can this be correct?

Collaborator Author
JoshuaGreen commented May 30, 2024

It's admittedly confusing.  Previously we asserted that key is non-NULL, so now I assert that key's stored object pointer is non-NULL.  I've replaced the previous call to DupKey with a call to DupKeyValue, which is specified to return 0 on success and nonzero on failure (e.g., memory exhaustion), hence the nonzero return value corresponds to DupKey returning NULL.  (DupKeyValue is allowed [and supposed] to "duplicate" a NULL object pointer and return 0; this is needed elsewhere in this file, as discussed below.)
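
To spell out the convention change, here's a simplified before/after sketch (error handling trimmed; the exact signatures in dht.c differ slightly):

/* Before: the duplicator returned the copy itself; 0 (NULL) meant failure. */
KeyV = (ht->procs.DupKey)(key);
if (KeyV == 0)
{
  TraceText("key duplication failed\n");
  /* ... bail out ... */
}

/* After: the duplicator fills an out-parameter and returns a status code;
   0 means success, nonzero means failure (e.g. memory exhaustion). */
if ((ht->procs.DupKeyValue)(key.value, &KeyV.value))
{
  TraceText("key duplication failed\n");
  /* ... bail out ... */
}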

DHT/dht.c Outdated
{
(ht->procs.FreeKey)((dhtValue)KeyV);
(ht->procs.FreeKeyValue)(KeyV.value);
TraceText("data duplication failed\n");
Owner

Hmm.
Before, we signaled "data duplication failed" if DupData returned 0.
Now, we signal it if it doesn't.
Can this be correct?

Collaborator Author
JoshuaGreen commented May 31, 2024

Analogously to above, DupData is specified to return 0 on success and nonzero on failure, so we're still checking the analogous condition.  What's special here was that previously we allowed for the original data to be NULL and (in that case) just set DataV to NULL without signaling an error, hence I had to allow the same here.  To make that work I had to require DupData to simply copy a NULL object pointer and return 0.
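
In other words, a conforming data duplicator now looks roughly like this (a sketch only: the union member shown is the one referenced above, and the real dhtValue and duplicators live in the DHT modules):

typedef union
{
  void const *object_pointer; /* the only member this sketch needs;
                                 the real dhtValue union has more */
} dhtValue_sketch;

/* Returns 0 on success, nonzero on failure.  A NULL object pointer is
   "duplicated" as NULL and counts as success. */
static int DupData_sketch(dhtValue_sketch src, dhtValue_sketch *dst)
{
  if (src.object_pointer == 0)
  {
    dst->object_pointer = 0;
    return 0; /* nothing to copy, but not an error */
  }
  /* ... allocate a copy of the pointed-to data here and store it in
     dst->object_pointer; return nonzero if the allocation fails ... */
  return 0;
}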

return 1;
while (length > ((size_t)-1))
{
if (memcmp(data1, data2, ((size_t)-1)))
Owner

The 3rd argument looks very strange.
Assuming that size_t is an unsigned type, this argument is very big.
Shouldn't we use the (equal) Length value of the two objects?

Collaborator Author

This is one of my insanely pedantic portability idioms.  The loop is intended to handle the extremely unlikely case that, somehow, length > ((size_t)-1), the latter being the largest third argument that memcmp can take.  In that case, rather than have that third argument silently truncated, I loop over the buffer in chunks of length (size_t)-1 until what's left is small enough to check correctly.  I doubt this loop will ever (or even can ever) execute on a real system.
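
Spelled out as a self-contained sketch (the parameter types here are placeholders; the real code compares the stored encodings):

#include <stddef.h>
#include <string.h>

/* Report whether two buffers differ, even if the length type could in
   principle hold values larger than (size_t)-1, the largest count that
   memcmp accepts.  On any real system the while loop never executes. */
static int buffers_differ(void const *data1, void const *data2,
                          unsigned long long length)
{
  unsigned char const *p1 = data1;
  unsigned char const *p2 = data2;
  while (length > (size_t)-1)
  {
    if (memcmp(p1, p2, (size_t)-1))
      return 1;
    p1 += (size_t)-1;
    p2 += (size_t)-1;
    length -= (size_t)-1;
  }
  return memcmp(p1, p2, (size_t)length) != 0;
}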

@JoshuaGreen
Collaborator Author

@thomas-maeder:

It would be helpful to understand these differences before we merge the pull request.

Agreed, and this is what I'm currently investigating.  This is a big update, and we shouldn't push it out until all concerns are resolved.  Thanks for taking a first pass.

@thomas-maeder
Owner

I have reduced the haan chess example that produces a difference and ended up with this:

anfang
forderung ser-#4
bedingung HaanerSchach loch b1
option Zugnummer
steine weiss kb3 16g6
schwarz ka1
endeproblem

If I test this with -maxmem 15k, with DOMEASURE activated, I notice that the executable from this pull request prints higher numbers than the executable from branch develop.

The reason is that compresshash() is invoked after move

#397 1939 16a5-g4 CURRMOVE_OF_PLY(nbply):13 nr_ghosts:0 current_ply:9

The executable from branch develop doesn't invoke compresshash().

compresshash() removes some elements from the hash table, which has the effect that the moves played differ starting at move 2039:

< 2039 Kc4-c3 CURRMOVE_OF_PLY(nbply):13 nr_ghosts:0 current_ply:8
---
> 2039 Kb5-a4 CURRMOVE_OF_PLY(nbply):22 nr_ghosts:0 current_ply:9

I have not yet understood why one executable has to compress the hash table and the other doesn't. FWIW, the values of size computed in DupBCMemValue() seem to be equal.

NB: you changed the type of Leng in DHT/dhtbcmem.h to unsigned short (from unsigned char). Do we need Leng to be longer? Could its type be uint8_t?

@JoshuaGreen
Collaborator Author

JoshuaGreen commented Jun 1, 2024

There are a few things in this pull request that increase the sizes of allocated blocks.

  1. Forcing alignment.
  2. Bumping BCMemValue::Leng to an unsigned short.
  3. (Using a union for dhtValue could increase block sizes as well, though that's less likely.)

At the time I believed (1) was necessary for correctness, and implementing it did eliminate some crashes that were otherwise unexplainable.  Also, it's clear that we need to do this if we're going to pretend that FXF is a perfectly generic infrastructure.  That written, I now suspect that the crashes were at least partially due to dhtValue originally including a long double member, as long doubles have some particularly stringent alignment requirements.  Having now removed that, we can likely get by with less on many systems.  This can be experimented with by #defining MAX_ALIGNMENT at compile-time to a small power of 2.  (In my next push that will be renamed to FXF_MAX_ALIGNMENT.)  On my system the computed alignment is 16, but even setting MAX_ALIGNMENT to 1 doesn't seem to cause problems (though anything < 8 makes UBSan complain).

(2) was implemented to resolve Issue #423.  As @MuTsunTsai noted in PR #360, we could sometimes require more than 255 bytes to encode a position.  My updates to optimisations/hash.h are intended to determine the maximum we could possibly need so that the initial buffers can be large enough.  For now, it seems that an unsigned short is sufficient to store the length of any position encoding we may need.
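
The relevant part of that change is just the type of Leng; the rest of the struct shown here is approximate (see DHT/dhtbcmem.h for the real declaration):

typedef struct
{
  unsigned short Leng;   /* was: unsigned char Leng */
  unsigned char Data[1]; /* the encoded position follows; member name and
                            declaration here are approximate */
} BCMemValue_sketch;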

UPDATE: I think I better understand (1) now.  The current version of FXF doesn't really enforce any particular alignment.  However, if the size of the allocation is a multiple of sizeof(void *) then the block is pulled off the high end of the allocated space, thereby providing the needed alignment as long as only sizeof(void *) alignment is needed.  In contrast, if the required size is not divisible by sizeof(void *) then the allocation is taken from the bottom end of the range, and no adjustments are made to ensure that that bottom remains aligned.  This works well enough in current practice since currently BCMemValue has no alignment requirements and, indeed, the only objects with alignment requirements are dhtElements which, conveniently, require sizeof(void *) alignment and have size that's a multiple of sizeof(void *) == 8.  When I started adjusting dhtValue I initially gave it a long double member, and that forced its alignment up to 16.  Removing that brings the required alignment back down to 8, so presumably I could go back to similar code, but we'd still be assuming that no oddly-sized blocks have any alignment requirements.  My version removes even that assumption, allowing us to, e.g., bump BCMemValue::Leng up to unsigned short in a well-defined way.
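
In pseudocode-ish C, the old rule I'm describing amounts to something like this (arena bookkeeping invented for the sketch; the real fxf code is more involved):

#include <stddef.h>

static char *arena_bottom; /* first free byte at the low end (sketch only)   */
static char *arena_top;    /* one past the last free byte, assumed to sit at */
                           /* a multiple of sizeof(void *) from the base     */

static void *alloc_direction_sketch(size_t size)
{
  if (size % sizeof(void *) == 0)
  {
    /* "Even" sizes come off the top, so the result stays aligned to
       sizeof(void *) as long as the top started out aligned. */
    arena_top -= size;
    return arena_top;
  }
  else
  {
    /* Odd sizes come off the bottom; nothing re-aligns the bottom, which is
       fine only while such blocks have no alignment requirements. */
    void *result = arena_bottom;
    arena_bottom += size;
    return result;
  }
}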

…. Also, implementing my TODO as a compile-time option. Primarily pushing now so that I can pull this into my testing infrastructure.
@JoshuaGreen
Collaborator Author

As a sanity check, I tried running the non-lengthy regression testing with the alignment set to 1 and BCMemValue::Leng reverted back to an unsigned char.  This eliminated two of the discrepancies, but nanna.inp discrepancies remain:

nanna.ref
53,56c53,56
<   add_to_move_generation_stack:   699530479
<                      play_move:   703240780
<  is_white_king_square_attacked:    52542018
<  is_black_king_square_attacked:    26318548
---
>   add_to_move_generation_stack:   700602407
>                      play_move:   704322485
>  is_white_king_square_attacked:    52565047
>  is_black_king_square_attacked:    26347819

I'll continue digging here.

@JoshuaGreen
Collaborator Author

JoshuaGreen commented Jun 2, 2024

This is interesting.  I:

  • set the maximum alignment to 1
  • set BCMemValue::Leng to unsigned char
  • enabled TESTHASH
  • (and eliminated the output between lines 981 and 990 of optimisations/hash.c, and similarly in mine; do we really need to keep this output around?)

I then tested each version on EXAMPLES/nanna.inp.  The first difference appears after a compression when my version has two extra free blocks of size 27; all other free counts are the same, as are all malloced counts.  The difference (though not the precise values) is the same after a call to allocDHTelement, but things look completely different before the next compression.  This will take some digging.

@JoshuaGreen
Collaborator Author

JoshuaGreen commented Jun 3, 2024

For those who wish to help figure this out, I'd like to lay out my thoughts.  I believe that many of the observed differences can be explained by my version simply allocating larger sizes than the current version.  We can discuss the costs and benefits of doing so, but I consider that secondary (until/unless we find a bug in that particular logic, of course).  For now, the most important thing is to figure out why we see differences even when the allocations are the same size.  To simulate this in my (current) code:

  • Add -DFXF_MAX_ALIGNMENT=1 to the compile line.
  • Change BCMemValue::Leng (back) into an unsigned char (in DHT/dhtbcmem.h).

With these changes my FXF allocations should be identical (modulo which end of the overall allocation the blocks are taken from), and output should be identical.  When I do this I do see identical behavior except for EXAMPLES/nanna.inp, so understanding those differences should be the first order of business (and is what I'm working on).

…o more with this alignment information, but for now let's just be sure our assumptions hold.
…he alignment we want, taking the bottom bit should be safe.
@JoshuaGreen
Collaborator Author

I'm ready to open this branch up to independent testing and, perhaps, eventual merging.

I restored the previous behavior of putting even allocations at the top of the range and uneven ones at the bottom.  Though the FXF code will by default ensure sufficient alignment for any use, I now allow a bit of customization that, admittedly, is very specific to Popeye's precise usage (and which the old version implicitly relied on anyway).  Specifically, I allow compile-time macros to set the alignment(s) needed by specifying the relevant types.  In the build system I've set those to types that I think are appropriate, and the remaining differences in testing are minimal.

There is the downside that it's impossible to verify that these settings are correct across all systems or that they will remain correct as the code continues to evolve.  As a mitigation, I modified the fxfAlloc interface to take the type as an argument, and in debug mode that type's alignment will be compared to the alignment FXF intends to provide.  If the needed alignment is larger than the provided alignment then a warning will be emitted.  This warning is non-fatal, but it should hopefully allow us to detect and eliminate any misalignment issues before or when they cause problems on some system.
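
For illustration, one plausible shape for that check, assuming C11 is available (fxfAllocRaw and fxfProvidedAlignment are placeholder names, not the interface I actually committed):

#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

void *fxfAllocRaw(size_t size);           /* placeholder for the real allocator  */
size_t fxfProvidedAlignment(size_t size); /* alignment fxf guarantees for a size */

#if !defined(NDEBUG)
#  define fxfAlloc(size, type) \
     ((alignof(type) > fxfProvidedAlignment(size) \
         ? (void)fprintf(stderr, "fxfAlloc: alignment of " #type \
                                 " exceeds what fxf provides\n") \
         : (void)0), \
      fxfAllocRaw(size))
#else
#  define fxfAlloc(size, type) fxfAllocRaw(size)
#endif

/* Usage: struct Foo *p = fxfAlloc(sizeof *p, struct Foo);  the warning is
   non-fatal, so the allocation still proceeds. */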
