Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug when Out records self-update on a Raspberry Pi #153

Closed
codedump opened this issue Mar 11, 2024 · 12 comments
Closed

Bug when Out records self-update on a Raspberry Pi #153

codedump opened this issue Mar 11, 2024 · 12 comments

Comments

@codedump
Copy link

codedump commented Mar 11, 2024

Hello,

there appears to be a bug when using pythonSoftIOC on a Raspberry Pi.

Try out the "hello world" IOC example from the documentation (reproduced below without all the comments and without the aIn part, for better overview); then modify the on_update=... parameter to self-update the aOut record.

This is something that is necessary, for instance, to implement specific EPICS motor fields, e.g. HOMF, which are expected to update their own value back to 0 once they're done.

Here is the IOC in question:

from softioc import softioc, builder
import cothread
builder.SetDeviceName("MY-DEVICE-PREFIX")
ao = builder.aOut('AO', initial_value=12.45, always_update=True, on_update=lambda v: ao.set(7.62))
builder.LoadDatabase()
softioc.iocInit()
softioc.interactive_ioc(globals())

Then trying to actually trigger the self-update, by simply just writing a value -- any value:

$ caput MY-DEVICE-PREFIX:AO 3.14

This is how it fails on a Raspberry Pi v4, with fairly recent Debian 12.1 (I think it's Bookworm):

INFO: PVXS QSRV2 is loaded, permitted, and ENABLED.
Starting iocInit
############################################################################
## EPICS 7.0.7.0
## Rev. 7.0.7.99.0.2
## Rev. Date 7.0.7.99.0.2
############################################################################
iocRun: All initialization complete
Python 3.11.2 (main, Mar 13 2023, 12:18:29) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> Asynchronous callback raised uncaught exception
Traceback (most recent call last):
  File "/home/specuser/fifer-venv/lib/python3.11/site-packages/cothread/cothread.py", line 964, in callback_events
    action(*args)
  File "/home/specuser/fifer-venv/lib/python3.11/site-packages/softioc/cothread_dispatcher.py", line 29, in wrapper
    func(*func_args)
  File "/home/specuser/tmp/./foo.py", line 6, in <lambda>
    on_update=lambda v: ao.set(7.62))
                        ^^^^^^^^^^^^
  File "/home/specuser/fifer-venv/lib/python3.11/site-packages/softioc/device.py", line 264, in set
    db_put_field(_record.NAME, dbf_code, data, length)
  File "/home/specuser/fifer-venv/lib/python3.11/site-packages/softioc/imports.py", line 24, in db_put_field
    return _extension.db_put_field(name, dbr_type, pbuffer, length)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OverflowError: Python int too large to convert to C ssize_t

The same IOC code works on as expected an x86 based computer.

How to fix this?

Cheers,
F.

PS: sorry, the error message somehow got botched while copy-pasting. It's fixed now.

@Araneidae
Copy link
Collaborator

I think this might be a bug here: https://github.com/dls-controls/pythonSoftIOC/blob/f2330b6464856d0d05127a79c84ad166a0d47d2d/softioc/extension.c#L103
Specifically, the "n" argument causes pbuffer in the original call to be parsed as an ssize_t, but here we're actually passing a void * ... which ctypes will have passed to us as an integer.

Unfortunately there doesn't seem to be an intptr_t option, so I'm not sure what the correct fix is here, and there isn't a size_t option either. It would be interesting to know the actual value (in hex, please!) of the pbuffer being passed into imports.db_put_field; I'm assuming it's a 64-bit value with the top bit set being passed as an unsigned integer (am a bit surprised to see the top bit set in userspace code, so maybe something else is going on?)

The buffer being passed is constructed as

ctypes.addressof(ctypes.c_double(7.62))

This is a (large) integer that we somehow need to pass to extension.db_put_field as a pointer. Can you print the hex values you get on your system and compare?

I would expect you to be able to reproduce the problem even more simply: does calling ao.set(7.62) straight after .iocInit() also fail in the same way?

@Araneidae
Copy link
Collaborator

A closer look at how things are implemented suggests a possible fix, but we'll need to be able to reproduce the problem. Can you be very specific about the target system where you have the bug, as there are a lot of versions of the RPi around now. The output of uname -a would probably be quite helpful.

Looking at the sources for ctypes I see that addressof is implemented thus: https://github.com/python/cpython/blob/d308d33e098d8e176f1e5169225d3cf800ed6aa1/Modules/_ctypes/callproc.c#L1784

return PyLong_FromVoidPtr(((CDataObject *)obj)->b_ptr);

and according to the documentation we can call PyLong_AsVoidPtr() on this to get the origin value back.

@Araneidae
Copy link
Collaborator

This might do the trick:

diff --git a/softioc/extension.c b/softioc/extension.c
index fd71687..d14feac 100644
--- a/softioc/extension.c
+++ b/softioc/extension.c
@@ -98,9 +98,12 @@ static PyObject *db_put_field(PyObject *self, PyObject *args)
 {
     const char *name;
     short dbrType;
-    void *pbuffer;
+    PyObject *buffer_ptr;
     long length;
-    if (!PyArg_ParseTuple(args, "shnl", &name, &dbrType, &pbuffer, &length))
+    if (!PyArg_ParseTuple(args, "shOl", &name, &dbrType, &buffer_ptr, &length))
+        return NULL;
+    void *pbuffer = PyLong_AsVoidPtr(buffer_ptr);
+    if (!pbuffer)
         return NULL;
 
     struct dbAddr dbAddr;
@@ -133,9 +136,12 @@ static PyObject *db_get_field(PyObject *self, PyObject *args)
 {
     const char *name;
     short dbrType;
-    void *pbuffer;
+    PyObject *buffer_ptr;
     long length;
-    if (!PyArg_ParseTuple(args, "shnl", &name, &dbrType, &pbuffer, &length))
+    if (!PyArg_ParseTuple(args, "shOl", &name, &dbrType, &buffer_ptr, &length))
+        return NULL;
+    void *pbuffer = PyLong_AsVoidPtr(buffer_ptr);
+    if (!pbuffer)
         return NULL;
 
     struct dbAddr dbAddr;

@AlexanderWells-diamond , can we look into whether we can reproduce this issue? I think the fix above is probably the right way to do things, but it'll need some checking.

@codedump
Copy link
Author

A closer look at how things are implemented suggests a possible fix, but we'll need to be able to reproduce the problem. Can you be very specific about the target system where you have the bug, as there are a lot of versions of the RPi around now.

As I said in the bug report, it's a Raspberry Pi v4 with a Debian 12.1.

The output of uname -a would probably be quite helpful.

Linux fifer 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr  3 17:24:16 BST 2023 aarch64 GNU/Linux

The bug is also triggered by a ao.set(...) right after iocInit(), btw, and also by any other EPICS record type (I expressly tried stringOut, longOut and boolOut.

Everything works as expected when using In record types, e.g. boolIn or longIn, BTW.

@codedump
Copy link
Author

[...]
Unfortunately there doesn't seem to be an intptr_t option, so I'm not sure what the correct fix is here, and there isn't a size_t option either. It would be interesting to know the actual value (in hex, please!) of the pbuffer being passed into imports.db_put_field;

How would I provide that to you? Is there any way to access that from Python? Or do I need to build from source and printf() all over the place?

I'm assuming it's a 64-bit value with the top bit set being passed as an unsigned integer (am a bit surprised to see the top bit set in userspace code, so maybe something else is going on?)

The buffer being passed is constructed as

ctypes.addressof(ctypes.c_double(7.62))

This is a (large) integer that we somehow need to pass to extension.db_put_field as a pointer. Can you print the hex values you get on your system and compare?

This is on a x86 system:

>>> hex(ctypes.addressof(ctypes.c_double(7.62)))
'0x7f687cd8dc20'

And this on the Raspi:

hex(ctypes.addressof(ctypes.c_double(7.62)))
'0xf74eeaf8'

I would expect you to be able to reproduce the problem even more simply: does calling ao.set(7.62) straight after .iocInit() also fail in the same way?

Yes, it fails the same.

But keep in mind (see my other post) that this only fails for ...Out type of records, and it does fail for all of them as far as I can tell.

I don't know about pythonSoftIOC internals, but db_put_field doesn't sound like it's specific only to Out records, is it?

@codedump
Copy link
Author

codedump commented Mar 12, 2024

This might do the trick: [...]

I've patched pythonSoftIOC by hand and gave it a pip install -e .... Apparently your patch does indeed work. This is the Python script I'm using to test:

from softioc import softioc, builder
import cothread
builder.SetDeviceName("MY-DEVICE-PREFIX")
ao = builder.longOut('AO', initial_value=0)
builder.LoadDatabase()
softioc.iocInit()
ao.set(1)
softioc.interactive_ioc(globals()) 

Again, the same script failed previously.

Thanks!

@Araneidae
Copy link
Collaborator

That printf was probably enough information. Something very odd here ... I was expecting a 64-bit address! I definitely don't understand why that doesn't fit into a ssize_t. One wild idea ... is it possible that your Python is somehow using 32-bit addresses on a 64-bit system? I'm thinking of something like the rather weird x32 libraries ... but I'm pretty sure that's dead now and never made it onto aarch64.

Anyhow, it's excellent news that my fix works for you. We'll apply it our end and run through our regression tests, and I expect it can be a general fix.

Araneidae added a commit that referenced this issue Mar 12, 2024
As reported in issue #153 there appears to be a problem in converting a
Python integer generated by ctypes.address() back to a void* pointer when
naively using PyArg_ParseTuple.  Don't fully understand why this issue
arises, but the fix is straightforward.

Fixes issue #153
@Araneidae
Copy link
Collaborator

Fixed in merge #154

@Araneidae
Copy link
Collaborator

I don't know about pythonSoftIOC internals, but db_put_field doesn't sound like it's specific only to Out records, is it?

Actually, yes it is! The difference is that .set() on In records is part of the normal driver -> EPICS processing flow and so is forwarded through normal record processing, whereas on Out records the updated value has to be effectively injected into the EPICS processing layer through a low level API: db_put_field. The difference is that In records carry content from the IOC to the outside, and so the IOC is in charge, whereas for Out records the flow of content is "pushed" from outside into the IOC. The In/Out naming convention is back to front.

@EmilioPeJu
Copy link
Contributor

EmilioPeJu commented Mar 12, 2024

Hi @codedump, apparently you are using an Arch64 ABI called ILP32 in which pointers are 4 bytes, Could you please confirm that is the case? you can do that by building a small c program that prints the value of sizeof(void *), additionally, you could also check the size of ssize_t to know if the problem was about size mismatching or about unsigned/signed conversion

@codedump
Copy link
Author

codedump commented Mar 20, 2024 via email

@Araneidae
Copy link
Collaborator

Well, I'm very glad we caught this tricksy bug, and am happy with the fix .. so thanks for trying this on a weird system!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants