Feature/tighten hash usage 2 (fixing Issues #316, #423, and probably #492) #493

Open
wants to merge 53 commits into develop

Conversation

JoshuaGreen
Collaborator

This branch implements the same changes as feature/tighten_hash_usage, but is based on develop commit #5ffb472.  See PR #441 for the full explanation of what's going on here.

I believe this resolves Issues #316 and #423.  Additionally, I assume it resolves Issue #492, though as per the discussion there one could take the relevant changes even further if desired.

@JoshuaGreen changed the title Feature/tighten hash usage 2 Feature/tighten hash usage 2 fixing Issues #316, #423, and probably #492 May 19, 2024
@JoshuaGreen changed the title Feature/tighten hash usage 2 fixing Issues #316, #423, and probably #492 Feature/tighten hash usage 2 (fixing Issues #316, #423, and probably #492) May 19, 2024
@MuTsunTsai
Contributor

I saw you left DOMEASURE on in this PR. Shouldn't it be off by default?

@JoshuaGreen
Collaborator Author

@MuTsunTsai, that's set in develop commit #5ffb472, so I left it alone.  It's useful for testing, and I trust @thomas-maeder to disable such settings when he builds the releases.

…herefore always decrement, just like we always increment it when allocating.
…ise usage, we may be able to work with a smaller maximum alignment than what the system would require in general. This update allows us to select such a value as a compile-time flag. This, of course, would have to be carefully tested for each such system.
@JoshuaGreen
Collaborator Author

I did a bit more testing on this, and I'm fairly confident that the differences I see can be ascribed to using somewhat larger memory allocations for each object.  There are two factors at play here:

  1. Extra memory is required to enforce alignment.
  2. BCMemValue is one byte larger due to BCMemValue::Leng now being a short.

(1) can (now) be mitigated by manually setting the MAX_ALIGNMENT macro when compiling.  While the FXF infrastructure should be completely generic, it's likely that on specific systems with our specific usage a smaller MAX_ALIGNMENT value than what's computed there will work.  For example, on my Lubuntu virtual machine it seems that MAX_ALIGNMENT ≥ 8 is needed for well-definedness (according to UBSan) but even MAX_ALIGNMENT == 1 doesn't seem to cause any obvious problems in practice.  Experiments along these lines should be carefully tested before being used in specific releases.

Setting MAX_ALIGNMENT to 1 would still leave us with (2).  Memory usage is still a bit higher, but obviously much closer to the original values, and the output I see aligns with every allocation being one byte larger than before.
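
For anyone who wants to experiment with (1), here's a rough sketch of what the compile-time override could look like.  (The fallback default and the rounding macro are only illustrations, not the actual FXF code; only the MAX_ALIGNMENT name matches what's discussed above.)

#include <stddef.h>

#ifndef MAX_ALIGNMENT
#  if defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 201112L)
#    define MAX_ALIGNMENT _Alignof(max_align_t) /* strictest standard alignment */
#  else
#    define MAX_ALIGNMENT sizeof(void *)        /* conservative pre-C11 guess */
#  endif
#endif

/* Round a requested size up to a multiple of MAX_ALIGNMENT (assumed to be
   a power of 2) so that consecutively placed blocks stay aligned. */
#define ROUND_TO_ALIGNMENT(n) \
  (((n) + ((size_t)MAX_ALIGNMENT - 1)) & ~((size_t)MAX_ALIGNMENT - 1))

Building with -DMAX_ALIGNMENT=8 (or even =1) then overrides the computed default, as described above.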

@thomas-maeder
Owner

This looks very good!
Across all the *.inp files, only 3 problems show a difference in the numbers produced when DOMEASURE is set:

hauntedchess.ref
323,326c323,326
<   add_to_move_generation_stack:   156050941
<                      play_move:   130481836
<  is_white_king_square_attacked:    34629453
<  is_black_king_square_attacked:   106030160
---
>   add_to_move_generation_stack:   167361760
>                      play_move:   134590349
>  is_white_king_square_attacked:    36197320
>  is_black_king_square_attacked:   110138673
nanna.ref
53,56c53,56
<   add_to_move_generation_stack:   699530479
<                      play_move:   703240780
<  is_white_king_square_attacked:    52542018
<  is_black_king_square_attacked:    26318548
---
>   add_to_move_generation_stack:   685036587
>                      play_move:   688631853
>  is_white_king_square_attacked:    55229450
>  is_black_king_square_attacked:    25636490
haan.out
301,304c301,304
<   add_to_move_generation_stack:   172423409
<                      play_move:   172423409
<  is_white_king_square_attacked:    43072625
<  is_black_king_square_attacked:   172169418
---
>   add_to_move_generation_stack:   183246656
>                      play_move:   183246656
>  is_white_king_square_attacked:    43113180
>  is_black_king_square_attacked:   182992665

It would be helpful to understand these differences before we merge the pull request.

DHT/dht.c Outdated
if (KeyV==0)
assert(key.value.object_pointer!=0); /* TODO: This assert assumes that object_pointer is the active member.
Is there a more generic test we could do? Do we need one? */
if ((ht->procs.DupKeyValue)(key.value, &KeyV.value))
Owner

Hmm.
Before, we signaled "key duplication failed" if DupKey returned 0.
Now, we signal it if DupKeyValue doesn't.
Can this be correct?

Collaborator Author
JoshuaGreen commented May 30, 2024

It's admittedly confusing.  Previously we asserted that key is non-NULL, so now I assert that key's stored object pointer is non-NULL.  I've replaced the previous call to DupKey with a call to DupKeyValue, which is specified to return 0 on success and nonzero on failure (e.g., memory exhaustion), hence the nonzero return value corresponds to DupKey returning NULL.  (DupKeyValue is allowed [and supposed] to "duplicate" a NULL object pointer and return 0; this is needed elsewhere in this file, as discussed below.)
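
To spell out the convention change, here's a simplified before/after sketch (error handling trimmed; the exact signatures in dht.c differ slightly):

/* Before: the duplicator returned the copy itself; 0 (NULL) meant failure. */
KeyV = (ht->procs.DupKey)(key);
if (KeyV == 0)
{
  TraceText("key duplication failed\n");
  /* ... bail out ... */
}

/* After: the duplicator fills an out-parameter and returns a status code;
   0 means success, nonzero means failure (e.g. memory exhaustion). */
if ((ht->procs.DupKeyValue)(key.value, &KeyV.value))
{
  TraceText("key duplication failed\n");
  /* ... bail out ... */
}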

DHT/dht.c Outdated
{
(ht->procs.FreeKey)((dhtValue)KeyV);
(ht->procs.FreeKeyValue)(KeyV.value);
TraceText("data duplication failed\n");
Owner

Hmm.
Before, we signaled "data duplication failed" if DupData returned 0.
Now, we signal it if it doesn't.
Can this be correct?

Collaborator Author
JoshuaGreen commented May 31, 2024

Analogously to above, DupData is specified to return 0 on success and nonzero on failure, so we're still checking the analogous condition.  What's special here was that previously we allowed for the original data to be NULL and (in that case) just set DataV to NULL without signaling an error, hence I had to allow the same here.  To make that work I had to require DupData to simply copy a NULL object pointer and return 0.
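
In other words, a conforming data duplicator now looks roughly like this (a sketch only: the union member shown is the one referenced above, and the real dhtValue and duplicators live in the DHT modules):

typedef union
{
  void const *object_pointer; /* the only member this sketch needs;
                                 the real dhtValue union has more */
} dhtValue_sketch;

/* Returns 0 on success, nonzero on failure.  A NULL object pointer is
   "duplicated" as NULL and counts as success. */
static int DupData_sketch(dhtValue_sketch src, dhtValue_sketch *dst)
{
  if (src.object_pointer == 0)
  {
    dst->object_pointer = 0;
    return 0; /* nothing to copy, but not an error */
  }
  /* ... allocate a copy of the pointed-to data here and store it in
     dst->object_pointer; return nonzero if the allocation fails ... */
  return 0;
}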

return 1;
while (length > ((size_t)-1))
{
if (memcmp(data1, data2, ((size_t)-1)))
Owner

The 3rd argument looks very strange.
Assuming that size_t is an unsigned type, this argument is very big.
Shouldn't we use the (equal) Length value of the two objects?

Collaborator Author

This is one of my insanely pedantic portability idioms.  The loop is intended to handle the extremely unlikely case that, somehow, length > ((size_t)-1), the latter being the largest third argument that memcmp can take.  In that case, rather than have that third argument silently truncated, I loop over the buffer in chunks of length (size_t)-1 until what's left is small enough to check correctly.  I doubt this loop will ever (or even can ever) execute on a real system.
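
Spelled out as a self-contained sketch (the parameter types here are placeholders; the real code compares the stored encodings):

#include <stddef.h>
#include <string.h>

/* Report whether two buffers differ, even if the length type could in
   principle hold values larger than (size_t)-1, the largest count that
   memcmp accepts.  On any real system the while loop never executes. */
static int buffers_differ(void const *data1, void const *data2,
                          unsigned long long length)
{
  unsigned char const *p1 = data1;
  unsigned char const *p2 = data2;
  while (length > (size_t)-1)
  {
    if (memcmp(p1, p2, (size_t)-1))
      return 1;
    p1 += (size_t)-1;
    p2 += (size_t)-1;
    length -= (size_t)-1;
  }
  return memcmp(p1, p2, (size_t)length) != 0;
}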

@JoshuaGreen
Collaborator Author

@thomas-maeder:

It would be helpful to understand these differences before we merge the pull request.

Agreed, and this is what I'm currently investigating.  This is a big update, and we shouldn't push it out until all concerns are resolved.  Thanks for taking a first pass.

@thomas-maeder
Owner

I have reduced the haan chess example that produces a difference and ended up with this:

anfang
forderung ser-#4
bedingung HaanerSchach loch b1
option Zugnummer
steine weiss kb3 16g6
schwarz ka1
endeproblem

If I test this with -maxmem 15k, with DOMEASURE activated, I notice that the executable from this pull request prints higher numbers than the executable from branch develop.

The reason is that compresshash() is invoked after move

#397 1939 16a5-g4 CURRMOVE_OF_PLY(nbply):13 nr_ghosts:0 current_ply:9

The executable from branch develop doesn't invoke compresshash().

compresshash() removes some elements from the hash table, which has the effect that the moves played differ starting at move 2039:

< 2039 Kc4-c3 CURRMOVE_OF_PLY(nbply):13 nr_ghosts:0 current_ply:8
---
> 2039 Kb5-a4 CURRMOVE_OF_PLY(nbply):22 nr_ghosts:0 current_ply:9

I have not yet understood why one executable has to compress the hash table and the other doesn't. FWIW, the values of size computed in DupBCMemValue() seem to be equal.

NB: you changed the type of Leng in DHT/dhtbcmem.h to unsigned short (from unsigned char). Do we need Leng to be longer? Could its type be uint8_t?

@JoshuaGreen
Collaborator Author

JoshuaGreen commented Jun 1, 2024

There are a few things in this pull request that increase the sizes of allocated blocks.

  1. Forcing alignment.
  2. Bumping BCMemValue::Leng to an unsigned short.
  3. (Using a union for dhtValue could increase block sizes as well, though that's less likely.)

At the time I believed (1) was necessary for correctness, and implementing it did eliminate some crashes that were otherwise unexplainable.  Also, it's clear that we need to do this if we're going to pretend that FXF is a perfectly generic infrastructure.  That written, I now suspect that the crashes were at least partially due to dhtValue originally including a long double member, as long doubles have some particularly stringent alignment requirements.  Having now removed that, we can likely get by with less on many systems.  This can be experimented with by #defining MAX_ALIGNMENT at compile-time to a small power of 2.  (In my next push that will be renamed to FXF_MAX_ALIGNMENT.)  On my system the computed alignment is 16, but even setting MAX_ALIGNMENT to 1 doesn't seem to cause problems (though anything < 8 makes UBSan complain).

(2) was implemented to resolve Issue #423.  As @MuTsunTsai noted in PR #360, we could sometimes require more than 255 bytes to encode a position.  My updates to optimisations/hash.h are intended to determine the maximum we could possibly need so that the initial buffers can be large enough.  For now, it seems that an unsigned short is sufficient to store the length of any position encoding we may need.
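
The relevant part of that change is just the type of Leng; the rest of the struct shown here is approximate (see DHT/dhtbcmem.h for the real declaration):

typedef struct
{
  unsigned short Leng;   /* was: unsigned char Leng */
  unsigned char Data[1]; /* the encoded position follows; member name and
                            declaration here are approximate */
} BCMemValue_sketch;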

UPDATE: I think I better understand (1) now.  The current version of FXF doesn't really enforce any particular alignment.  However, if the size of the allocation is a multiple of sizeof(void *) then the block is pulled off the high end of the allocated space, thereby providing the needed alignment as long as only sizeof(void *) alignment is needed.  In contrast, if the required size is not divisible by sizeof(void *) then the allocation is taken from the bottom end of the range, and no adjustments are made to ensure that that bottom remains aligned.  This works well enough in current practice since currently BCMemValue has no alignment requirements and, indeed, the only objects with alignment requirements are dhtElements which, conveniently, require sizeof(void *) alignment and have size that's a multiple of sizeof(void *) == 8.  When I started adjusting dhtValue I initially gave it a long double member, and that forced its alignment up to 16.  Removing that brings the required alignment back down to 8, so presumably I could go back to similar code, but we'd still be assuming that no oddly-sized blocks have any alignment requirements.  My version removes even that assumption, allowing us to, e.g., bump BCMemValue::Leng up to unsigned short in a well-defined way.
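
In pseudocode-ish C, the old rule I'm describing amounts to something like this (arena bookkeeping invented for the sketch; the real fxf code is more involved):

#include <stddef.h>

static char *arena_bottom; /* first free byte at the low end (sketch only)   */
static char *arena_top;    /* one past the last free byte, assumed to sit at */
                           /* a multiple of sizeof(void *) from the base     */

static void *alloc_direction_sketch(size_t size)
{
  if (size % sizeof(void *) == 0)
  {
    /* "Even" sizes come off the top, so the result stays aligned to
       sizeof(void *) as long as the top started out aligned. */
    arena_top -= size;
    return arena_top;
  }
  else
  {
    /* Odd sizes come off the bottom; nothing re-aligns the bottom, which is
       fine only while such blocks have no alignment requirements. */
    void *result = arena_bottom;
    arena_bottom += size;
    return result;
  }
}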

…. Also, implementing my TODO as a compile-time option. Primarily pushing now so that I can pull this into my testing infrastructure.
@JoshuaGreen
Collaborator Author

As a sanity check, I tried running the non-lengthy regression testing with the alignment set to 1 and BCMemValue::Leng reverted back to an unsigned char.  This eliminated two of the discrepancies, but nanna.inp discrepancies remain:

nanna.ref
53,56c53,56
<   add_to_move_generation_stack:   699530479
<                      play_move:   703240780
<  is_white_king_square_attacked:    52542018
<  is_black_king_square_attacked:    26318548
---
>   add_to_move_generation_stack:   700602407
>                      play_move:   704322485
>  is_white_king_square_attacked:    52565047
>  is_black_king_square_attacked:    26347819

I'll continue digging here.

@JoshuaGreen
Collaborator Author

JoshuaGreen commented Jun 2, 2024

This is interesting.  I:

  • set the maximum alignment to 1
  • set BCMemValue::Leng to unsigned char
  • enabled TESTHASH
  • (and eliminated the output between lines 981 and 990 of optimisations/hash.c, and similarly in mine; do we really need to keep this output around?)

I then tested each version on EXAMPLES/nanna.inp.  The first difference appears after a compression when my version has two extra free blocks of size 27; all other free counts are the same, as are all malloced counts.  The difference (though not the precise values) is the same after a call to allocDHTelement, but things look completely different before the next compression.  This will take some digging.

@JoshuaGreen
Collaborator Author

JoshuaGreen commented Jun 3, 2024

For those who wish to help figure this out, I'd like to lay out my thoughts.  I believe that many of the observed differences can be explained by my version simply allocating larger sizes than the current version.  We can discuss the costs and benefits of doing so, but I consider that secondary (until/unless we find a bug in that particular logic, of course).  For now, the most important thing is to figure out why we see differences even when the allocations are the same size.  To simulate this in my (current) code:

  • Add -DFXF_MAX_ALIGNMENT=1 to the compile line.
  • Change BCMemValue::Leng (back) into an unsigned char (in DHT/dhtbcmem.h).

With these changes my FXF allocations should be identical (modulo which end of the overall allocation the blocks are taken from), and output should be identical.  When I do this I do see identical behavior except for EXAMPLES/nanna.inp, so understanding those differences should be the first order of business (and is what I'm working on).

…o more with this alignment information, but for now let's just be sure our assumptions hold.
…he alignment we want, taking the bottom bit should be safe.
@JoshuaGreen
Collaborator Author

I'm ready to open this branch up to independent testing and, perhaps, eventual merging.

I restored the previous behavior of putting even allocations at the top of the range and uneven ones at the bottom.  Though the FXF code will by default ensure sufficient alignment for any use, I now allow a bit of customization that, admittedly, is very specific to Popeye's precise usage (and which the old version implicitly relied on anyway).  Specifically, I allow compile-time macros to set the alignment(s) needed by specifying the relevant types.  In the build system I've set those to types that I think are appropriate, and the remaining differences in testing are minimal.

There is the downside that it's impossible to verify that these settings are correct across all systems or that they will remain correct as the code continues to evolve.  As a mitigation, I modified the fxfAlloc interface to take the type as an argument, and in debug mode that type's alignment will be compared to the alignment FXF intends to provide.  If the needed alignment is larger than the provided alignment then a warning will be emitted.  This warning is non-fatal, but it should hopefully allow us to detect and eliminate any misalignment issues before or when they cause problems on some system.
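
For illustration, one plausible shape for that check, assuming C11 is available (fxfAllocRaw and fxfProvidedAlignment are placeholder names, not the interface I actually committed):

#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

void *fxfAllocRaw(size_t size);           /* placeholder for the real allocator  */
size_t fxfProvidedAlignment(size_t size); /* alignment fxf guarantees for a size */

#if !defined(NDEBUG)
#  define fxfAlloc(size, type) \
     ((alignof(type) > fxfProvidedAlignment(size) \
         ? (void)fprintf(stderr, "fxfAlloc: alignment of " #type \
                                 " exceeds what fxf provides\n") \
         : (void)0), \
      fxfAllocRaw(size))
#else
#  define fxfAlloc(size, type) fxfAllocRaw(size)
#endif

/* Usage: struct Foo *p = fxfAlloc(sizeof *p, struct Foo);  the warning is
   non-fatal, so the allocation still proceeds. */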
