Debug flags #169

scemama · 2024-11-12T14:35:15Z

This PR adds --enable-debug and --enable-sanitizer options to configure.ac, so that the GitHub Actions CI can run many additional checks on the library.
It depends on PR #168.
There are still some issues to fix before it can be merged, but I opened the draft so that you know I am working on it.

scemama · 2024-12-05T13:10:18Z

All good now! Aggressive compiler checks no longer produce any warnings, and we can now easily use the sanitizer to detect many issues at runtime.

q-posev · 2024-12-05T15:19:01Z

@scemama there is a bug at the compilation step:

src/trexio.c: In function ‘trexio_string_of_error_f’:
src/trexio.c:184:16: error: ‘MAX_STRING_LENGTH’ undeclared (first use in this function)
  184 |   if (sizeCp > MAX_STRING_LENGTH) sizeCp = MAX_STRING_LENGTH;
      |                ^~~~~~~~~~~~~~~~~
src/trexio.c:184:16: note: each undeclared identifier is reported only once for each function it appears in

scemama · 2024-12-05T17:28:32Z

Sorry... it compiles now :-)

q-posev · 2024-12-06T08:51:07Z

@scemama thanks! I ran a few tests through valgrind. The C tests look fine, although one pthread-related error is reported when they are compiled with all debug and sanitizer flags enabled. No error is reported with a conventional ./configure build.

However, with the Fortran test I get the errors below. (valgrind-libtool is my alias for libtool --mode=execute valgrind, which is the recommended way to run valgrind on libtool-built executables.)

I suspect it is an artifact of the Fortran test modifications from PR #168.

~/trexio $ valgrind-libtool ./tests/test_f
==27762== Memcheck, a memory error detector
==27762== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==27762== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==27762== Command: /home/q-posev/trexio/tests/.libs/test_f
==27762== 
============================================
         TREXIO VERSION STRING : 2.5.1       
         TREXIO MAJOR VERSION  :   2
         TREXIO MINOR VERSION  :   5
============================================
TREXIO_PACKAGE_VERSION : 2.5.1
TREXIO_GIT_HASH        : cd369bd1875a46e7da73457a24a8fad09d915a1e
HAVE_HDF5              : true
HDF5 library version: 1.10.7
 call test_write
 SUCCESS HAS NOT 1
 SUCCESS HAS NOT 2
 SUCCESS HAS NOT 2.1
 SUCCESS HAS NOT 2.2
 SUCCESS HAS NOT 3
 SUCCESS HAS NOT 4
 SUCCESS HAS NOT 5
 SUCCESS HAS NOT 6
 SUCCESS WRITE NUM
 SUCCESS WRITE CHARGE
 SUCCESS WRITE COORD
 SUCCESS WRITE LABEL
 SUCCESS WRITE POINT GROUP
 SUCCESS WRITE BASIS NUM
 SUCCESS WRITE INDEX
 SUCCESS WRITE INDEX TYPE
 SUCCESS WRITE AO NUM
 SUCCESS WRITE MO NUM
 SUCCESS WRITE ENERGY
 SUCCESS WRITE SPIN
 SUCCESS WRITE SPARSE
 SUCCESS WRITE SPARSE
 SUCCESS WRITE SPARSE
 SUCCESS WRITE SPARSE
 SUCCESS WRITE SPARSE
 SUCCESS WRITE DET LIST
 SUCCESS WRITE DET LIST
 SUCCESS WRITE DET LIST
 SUCCESS WRITE DET LIST
 SUCCESS WRITE DET LIST
 SUCCESS HAS 1
 SUCCESS HAS 2
 SUCCESS HAS 3
 SUCCESS HAS 4
 SUCCESS HAS 5
 SUCCESS HAS 6
 SUCCESS CLOSE
 call test_read
 SUCCESS READ NUM
 SUCCESS READ CHARGE
 SUCCESS READ COORD
 SUCCESS READ LABEL
 SUCCESS READ INDEX
 SUCCESS READ INDEX TYPE
 SUCCESS READ POINT GROUP
 SUCCESS READ SPARSE DATA
 SUCCESS READ SPARSE DATA EOF
 SUCCESS READ SPARSE SIZE
 SUCCESS GET INT64_NUM
 SUCCESS READ DET LIST
 SUCCESS READ DET NUM
 SUCCESS CONVERT DET LIST
 SUCCESS CONVERT ORB LIST
 call test_read_void
==27762== Conditional jump or move depends on uninitialised value(s)
==27762==    at 0x4C4A981: _gfortran_string_len_trim (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==27762==    by 0x10D1CE: test_read_void_ (test_f.f90:675)
==27762==    by 0x10FE4B: MAIN__ (test_f.f90:51)
==27762==    by 0x10B72E: main (test_f.f90:2)
==27762== 
==27762== Conditional jump or move depends on uninitialised value(s)
==27762==    at 0x4C4A8FD: _gfortran_string_len_trim (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==27762==    by 0x10D1CE: test_read_void_ (test_f.f90:675)
==27762==    by 0x10FE4B: MAIN__ (test_f.f90:51)
==27762==    by 0x10B72E: main (test_f.f90:2)
==27762== 
==27762== Syscall param write(buf) points to uninitialised byte(s)
==27762==    at 0x4DAD887: write (write.c:26)
==27762==    by 0x4C3BED8: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==27762==    by 0x4C445B1: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==27762==    by 0x4C36E14: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==27762==    by 0x4C39E41: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==27762==    by 0x4C3A323: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==27762==    by 0x10D1EA: test_read_void_ (test_f.f90:675)
==27762==    by 0x10FE4B: MAIN__ (test_f.f90:51)
==27762==    by 0x10B72E: main (test_f.f90:2)
==27762==  Address 0x639c798 is 40 bytes inside a block of size 512 alloc'd
==27762==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==27762==    by 0x49E0D88: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==27762==    by 0x4C44465: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==27762==    by 0x4C3B2A1: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==27762==    by 0x49DF3D1: ??? (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==27762==    by 0x400647D: call_init.part.0 (dl-init.c:70)
==27762==    by 0x4006567: call_init (dl-init.c:33)
==27762==    by 0x4006567: _dl_init (dl-init.c:117)
==27762==    by 0x40202C9: ??? (in /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
==27762== 
 Test error message: Error opening file�������G���������
 call test_write('test_write_f.h5', TREXIO_HDF5)
 SUCCESS HAS NOT 1
 SUCCESS HAS NOT 2
 SUCCESS HAS NOT 2.1
 SUCCESS HAS NOT 2.2
 SUCCESS HAS NOT 3
 SUCCESS HAS NOT 4
 SUCCESS HAS NOT 5
 SUCCESS HAS NOT 6
 SUCCESS WRITE NUM
 SUCCESS WRITE CHARGE
 SUCCESS WRITE COORD
 SUCCESS WRITE LABEL
 SUCCESS WRITE POINT GROUP
 SUCCESS WRITE BASIS NUM
 SUCCESS WRITE INDEX
 SUCCESS WRITE INDEX TYPE
 SUCCESS WRITE AO NUM
 SUCCESS WRITE MO NUM
 SUCCESS WRITE ENERGY
 SUCCESS WRITE SPIN
 SUCCESS WRITE SPARSE
 SUCCESS WRITE SPARSE
 SUCCESS WRITE SPARSE
 SUCCESS WRITE SPARSE
 SUCCESS WRITE SPARSE
 SUCCESS WRITE DET LIST
 SUCCESS WRITE DET LIST
 SUCCESS WRITE DET LIST
 SUCCESS WRITE DET LIST
 SUCCESS WRITE DET LIST
 SUCCESS HAS 1
 SUCCESS HAS 2
 SUCCESS HAS 3
 SUCCESS HAS 4
 SUCCESS HAS 5
 SUCCESS HAS 6
 SUCCESS CLOSE
 call test_read('test_write_f2.h5', TREXIO_HDF5)
 SUCCESS READ NUM
 SUCCESS READ CHARGE
 SUCCESS READ COORD
 SUCCESS READ LABEL
 SUCCESS READ INDEX
 SUCCESS READ INDEX TYPE
 SUCCESS READ POINT GROUP
 SUCCESS READ SPARSE DATA
 SUCCESS READ SPARSE DATA EOF
 SUCCESS READ SPARSE SIZE
 SUCCESS GET INT64_NUM
 SUCCESS READ DET LIST
 SUCCESS READ DET NUM
 SUCCESS CONVERT DET LIST
 SUCCESS CONVERT ORB LIST
==27762== Conditional jump or move depends on uninitialised value(s)
==27762==    at 0x4C4A981: _gfortran_string_len_trim (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==27762==    by 0x10D1CE: test_read_void_ (test_f.f90:675)
==27762==    by 0x10B72E: main (test_f.f90:2)
==27762== 
==27762== Conditional jump or move depends on uninitialised value(s)
==27762==    at 0x4C4A8FD: _gfortran_string_len_trim (in /usr/lib/x86_64-linux-gnu/libgfortran.so.5.0.0)
==27762==    by 0x10D1CE: test_read_void_ (test_f.f90:675)
==27762==    by 0x10B72E: main (test_f.f90:2)
==27762== 
 Test error message: Error opening file             test_write_f2.dir                                               ��>
==27762== 
==27762== HEAP SUMMARY:
==27762==     in use at exit: 1,864 bytes in 3 blocks
==27762==   total heap usage: 7,383 allocs, 7,380 frees, 5,647,684 bytes allocated
==27762== 
==27762== LEAK SUMMARY:
==27762==    definitely lost: 0 bytes in 0 blocks
==27762==    indirectly lost: 0 bytes in 0 blocks
==27762==      possibly lost: 0 bytes in 0 blocks
==27762==    still reachable: 1,864 bytes in 3 blocks
==27762==         suppressed: 0 bytes in 0 blocks
==27762== Rerun with --leak-check=full to see details of leaked memory
==27762== 
==27762== Use --track-origins=yes to see where uninitialised values come from
==27762== For lists of detected and suppressed errors, rerun with: -s
==27762== ERROR SUMMARY: 6 errors from 5 contexts (suppressed: 0 from 0)

scemama · 2024-12-06T10:17:30Z

In this statement:

  call trexio_string_of_error(rc, str)
  print *, 'Test error message: ', trim(str)

The string returned by trexio_string_of_error was shorter than the declared length of str. To find the last non-blank character, trim scans the trailing part of str, and since str was not initialized (e.g. with str = ''), trim read uninitialized bytes.

I fixed it in the tests by initializing str to ''.
I am now also fixing it in the library, so that it no longer relies on the caller passing an initialized string. Please wait a moment before merging.

scemama · 2024-12-06T10:20:41Z

Fixed! You can merge now :-)

q-posev · 2024-12-06T10:37:57Z

Thank you @scemama! Interestingly, I had not seen this bug before: I used to run valgrind on the Fortran test too, and it was clean. Perhaps I should add valgrind calls to the CI.

q-posev · 2024-12-06T10:40:45Z

If that's OK with you, I would prefer to fix the Python determinant tests in PR #168 first and then merge this PR. Nothing to do on your side; I will update this branch when it's done.
I hope to get some time over Christmas to fix the Python tests.

scemama · 2024-12-06T10:42:12Z

Perfect!

scemama · 2024-12-30T02:13:55Z

While I was improving the Rust binding, the determinant tests I wrote failed with the text backend on the master branch. This was caused by integers being written with too few characters in the files, which is fixed in this PR. I think this fix is an important one, but it breaks backward compatibility of the text backend. When we merge, we may need to bump the version to 3.0.0.

q-posev · 2024-12-30T14:11:38Z

Are you talking about the fix from PR #168? If so, it does not require a major version bump, since the TREXIO API remains unchanged. It is an important bug fix for determinant I/O in the text backend, which can be reflected in a minor version bump, but I am not convinced that API compatibility is broken.

scemama · 2024-12-30T16:30:26Z

No, I am talking about the current PR. This particular commit: 0867434
where we have things like this:

-  uint64_t line_length = dims[1]*11UL + 1UL; // 10 digits per int64_t bitfield + 1 space = 11 spots + 1 newline char
+  uint64_t line_length = dims[1]*21UL + 1UL; // 20 digits per int64_t bitfield + 1 space = 21 spots + 1 newline char

You are right that the API is unchanged, maybe we should not change the major version.
However, old text files will no longer be readable: TREXIO will return an error on them, but those files are very likely to be wrong anyway.

I think it is a bit urgent to fix this particular bug in the master branch. Maybe we can create another PR with only this commit to merge it quickly. What do you think?

I think that the current PR is also important: when I tried to run the rust interface with the TREXIO of this particular branch, I had many errors detected at runtime that were silent before (some safe functions were not really safe...). It helped me fix some silent bugs in the rust interface! :-)

q-posev · 2025-01-02T10:46:34Z

The commit you mentioned was introduced in PR #168 and then appeared here after you forked the branch. If merging this bug fix is urgent for you, I can merge PR #168 as it is, but the Python tests will remain broken until I find some time to fix them. Would that work for you?
This PR #169 introduced a lot of changes unrelated to the determinant IO and I am not convinced yet that we need all of them (I know that these changes make the compiler happy). I would prefer to take a detailed look at these changes before merging them, if that's OK with you, but this might take time given my current workload.
The safe functions were originally introduced as a dummy proxy for the Python SWIG interface. I don't know anyone who uses them directly. On the C side there is no safety guarantee anyway, because one might accidentally pass a pointer to a shifted memory address (e.g. after some pointer arithmetic), and the size-max argument is completely disconnected from the pointer that is passed. But I am absolutely happy to see these improvements, especially if they reinforce the Rust interface! :-)

scemama · 2025-01-02T13:35:32Z

> The commit you mentioned was introduced in PR #168 and then appeared here after you forked the branch. If merging this bug fix is urgent for you, I can merge PR #168 as it is, but the Python tests will remain broken until I find some time to fix them. Would that work for you?

This is a good idea! Can you comment out the python tests that are broken so that we can get a green CI?

> This PR #169 introduced a lot of changes unrelated to the determinant IO and I am not convinced yet that we need all of them (I know that these changes make the compiler happy).

It is not only that they make the compiler happy: they make it possible to use the address sanitizer and more aggressive checking in the CI, which will help keep the code clean in the long term.
I understand that this PR is big. Take the time you need to look at it carefully instead of merging in a rush ;-)

> The safe functions were originally introduced as a dummy proxy for the Python SWIG interface. I don't know anyone who uses them directly.

In the foreign interfaces, I always use the safe functions, and I may also use them in some QP plugins. I agree with you that they are not 100% safe, but they are as safe as the safe variants of the dangerous C functions (like strnlen, etc.).

q-posev

@scemama I am done with my review. I fixed the data corruption issue reported by @sheepforce, cleaned up some tests and added the valgrind checks to the CI.

We should fix the TREXIO exit codes decoding function before merging this PR.

src/templates_front/templator_front.org

src/templates_text/templator_text.org

src/templates_front/templator_front.org

scemama · 2025-01-08T14:20:01Z

> We should fix the TREXIO exit codes decoding function before merging this PR.

Done!

q-posev

Thanks @scemama ! Ready to merge?

scemama · 2025-01-09T10:14:19Z

@q-posev Thanks for the review!

scemama added 11 commits November 11, 2024 17:26

Removed many warnings + added more checks

cda0ad7

More checks

d14b98b

No warnings in trexio.c

debee61

Merge branch 'det-checks' into debug_flags

4bd1474

Merge branch 'det-checks' into debug_flags

d9e56c5

Added sanitizer flags for fortran

fad2df8

Added missing prototypes in the text backend

38a4b60

Fixed unused parameters in hdf5

8d534d8

Added download link of tar.gz in documentation

1a07e65

Merge branch 'master' into debug_flags

e5a3061

Fixed bug for arrays of strings in text backend

66bb1e9

scemama marked this pull request as ready for review December 5, 2024 12:10

scemama added 4 commits December 5, 2024 13:23

Fixed strncpy

3638513

Revert "Fixed strncpy"

770b136

This reverts commit 3638513.

Replaced some strnlen by memcpy

a5d39ac

Removed strnlen

72619db

q-posev self-requested a review December 5, 2024 15:24

Fix previous commit

cd369bd

scemama force-pushed the debug_flags branch from 4306e42 to cd369bd December 5, 2024 17:24

Fixed valgrind test_f.f90

c7d5d42

Fixed valgrind issue with trim

751b3c9

scemama added 3 commits December 30, 2024 03:20

Fixed previous commit

8cbc71e

Merge branch 'master' of github.com:TREX-CoE/trexio

156f09f

Merge branch 'master' into debug_flags

deb0e65

q-posev and others added 12 commits January 2, 2025 22:19

Merge branch 'master' into debug_flags

886561a

Add hardcore valgring checking

0fd9779

No need for -g in FCFLAGS

d804b14

Refactored CI: first do valgrind and then debug configure

1513eec

Fix CI

a9cdc27

Fix CI

0533cc6

Fix CI

68d6780

Install valgrind for the CI

a7cf055

Silence prints in tests

22be7fa

Remove TODO - the error handling is manual

155d695

Remove useless n_chunks

b62ddcb

Implement a check to prevent data corruption for sparse I/O

fb4e234

q-posev requested changes Jan 3, 2025

src/templates_front/templator_front.org

src/templates_text/templator_text.org (outdated)

src/templates_front/templator_front.org (outdated)

scemama added 3 commits January 8, 2025 15:01

+2 -> +1

fd1e23a

TREXIO exit codes decoding function

a889bb8

Update TREXIO exit codes decoding function

ff6e556

q-posev self-requested a review January 8, 2025 17:12

q-posev approved these changes Jan 8, 2025

scemama merged commit 173fc3b into master Jan 9, 2025
4 checks passed

q-posev deleted the debug_flags branch January 9, 2025 12:34