Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decode_fast fails with RS(255,247) with 4 errors, when it should not #5

Open
jason-s opened this issue Jun 10, 2018 · 5 comments
Open

Comments

@jason-s
Copy link

jason-s commented Jun 10, 2018

This is with unireedsolomon 1.0 via pip install unireedsolomon. The decode_fast() function seems to fail when it shouldn't. I've gotten failures with 4 errors for (255,247) and 8 errors for (255,239).

(And for the record, it doesn't seem faster than decode() at least not for (255,247) and (255,239). edit: oh, never mind, I wasn't timing it carefully enough. Just ran it on (255,191) with 32 errors and for 500 iterations it took 8 seconds for decode_fast() and 36 seconds for decode(). For (255,223) with 16 errors: 500 iterations took 4 seconds for decode_fast() and about 8 seconds for decode(). For (255,239) with 8 errors: 500 iterations took 1.9 seconds for decode_fast()and 2.9 seconds fordecode()`)

Python code:

import unireedsolomon as rs

decoder = rs.RSCoder(255,247,generator=2,prim=0x11d,fcr=0,c_exp=8)
e = [0]*255
e[68] = 4
e[150] = 128
e[181] = 1
e[184] = 16
print "Using decoder.decode():"
print decoder.decode(e)
print "Using decoder.decode_fast():"
print decoder.decode_fast(e)

Result:

Using decoder.decode():
('', '\x00')
Using decoder.decode_fast():
---------------------------------------------------------------------------
RSCodecError                              Traceback (most recent call last)
<ipython-input-9-d8a8a6fc239f> in <module>()
     11 print decoder.decode(e)
     12 print "Using decoder.decode_fast():"
---> 13 print decoder.decode_fast(e)

/Users/jmsachs/anaconda/lib/python2.7/site-packages/unireedsolomon/rs.pyc in decode_fast(self, r, nostrip, k, erasures_pos, only_erasures, return_string)
    439         # being the rightmost byte
    440         # X is a corresponding array of GF(2^8) values where X_i = alpha^(j_i)
--> 441         X, j = self._chien_search_faster(sigma)
    442 
    443         # Sanity check: Cannot guarantee correct decoding of more than n-k errata (Singleton Bound, n-k being the minimum distance), and we cannot even check if it's correct (the syndrome will always be all 0 if we try to decode above the bound), thus it's better to just return the input as-is.

/Users/jmsachs/anaconda/lib/python2.7/site-packages/unireedsolomon/rs.pyc in _chien_search_faster(self, sigma)
    884         if len(j) != errs_nb:
    885             # Note: decoding messages+ecc with length n > self.gf2_charac does work partially, but it's wrong, because you will get duplicated values, and then Chien Search cannot discriminate which root is correct and which is not. The duplication of values is normally prevented by the prime polynomial reduction when generating the field (see init_lut() in ff.py), but if you overflow the field, you have no guarantee anymore. We may try to use a bruteforce approach: the correct positions ARE in the final array j, but the problem is because we are above the Galois Field's range, there is a wraparound because of overflow so that for example if j should be [0, 1, 2, 3], we will also get [255, 256, 257, 258] (because 258 % 255 == 3, same for the other values), so we can't discriminate. The issue with that bruteforce approach is that fixing any errs_nb errors among those will always give a correct output message (in the sense that the syndrome will be all 0), so we may not even be able to check if that's correct or not, so there's clearly no way to decode a message of greater length than the field.
--> 886             raise RSCodecError("Too many (or few) errors found by Chien Search for the errata locator polynomial!")
    887 
    888         return X, j

RSCodecError: Too many (or few) errors found by Chien Search for the errata locator polynomial!
@jason-s
Copy link
Author

jason-s commented Jun 10, 2018

Another test case that also fails: (255,251) with default args and two errors

import unireedsolomon as rs

decoder = rs.RSCoder(255,251)
e = [0]*255
e[71] = 16
e[141] = 32
print "Using decoder.decode():"
print decoder.decode(e)
print "Using decoder.decode_fast():"
print decoder.decode_fast(e)

I can't seem to find any examples of (255,253) with one error failing.

@lrq3000
Copy link
Owner

lrq3000 commented Jun 11, 2018

Thank you for opening this issue, but I'm not sure I understand what you are trying to do.

Here the message you build is not encoded. Is the message you build something you already calculated by yourself beforehand?

In any case, it may well be that decode_fast() fails because the chien search used (chien_search_faster()) is a very specifically tailored function with tricks to speed up calculations by avoiding calculations on non interesting coefficients. I don't remember if I invented this trick or if I read it somewhere, but skipping coefficients is probably not a foolproof way to go about it, so you might try to change _chien_search_faster() by _chien_search_fast() in the decode_fast() method, this should be more reliable at the expense of some speed loss.

Also note that the library is extensively unit tested (see here: https://github.com/lrq3000/unireedsolomon/blob/master/unireedsolomon/tests/test_rs.py#L319), but it's true that your decoding case.

@jason-s
Copy link
Author

jason-s commented Jun 11, 2018

No, you're missing the point. The encoded message is the all-zero codeword (255 zeros) and these are examples of received messages with 2 errors for RS(255,251) and 4 errors for RS(255,247) that are decoded correctly (as they should) by decode(), but incorrectly by decode_fast(). Therefore this is a bug, and decode_fast() is an incorrect implementation of a Reed-Solomon decoder. It may work in most cases, but it fails in these.

This represents an error pattern that can be added to any message that should therefore fail decoding. For example:

import unireedsolomon as rs

coder = rs.RSCoder(255,251)
message = """
I do not like them
In a house.
I do not like them
With a mouse.
I do not like them
Here or there.
I do not like them
Anywhere.
I do not like green eggs and ham.
I do not like them, Sam-I-am.
"""
print "Correctly encoded message:"
encmsg = coder.encode(message)
print repr(encmsg)
encmsg_bytes = [ord(c) for c in encmsg]
encmsg_bytes[71] ^= 16
encmsg_bytes[141] ^= 32
recvmsg = ''.join(chr(b) for b in encmsg_bytes)
print "Received message:"
print repr(recvmsg)
print "Using decoder.decode():"
print decoder.decode(recvmsg)
print "Using decoder.decode_fast():"
print decoder.decode_fast(recvmsg)

which prints (note the errors of I do not like themI do not li{e them and I do not like themI do not like theM)

Correctly encoded message:
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\nI do not like them\nIn a house.\nI do not like them\nWith a mouse.\nI do not like them\nHere or there.\nI do not like them\nAnywhere.\nI do not like green eggs and ham.\nI do not like them, Sam-I-am.\n\x92O\x00O'
Received message:
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\nI do not li{e them\nIn a house.\nI do not like them\nWith a mouse.\nI do not like theM\nHere or there.\nI do not like them\nAnywhere.\nI do not like green eggs and ham.\nI do not like them, Sam-I-am.\n\x92O\x00O'
Using decoder.decode():
('\nI do not like them\nIn a house.\nI do not like them\nWith a mouse.\nI do not like them\nHere or there.\nI do not like them\nAnywhere.\nI do not like green eggs and ham.\nI do not like them, Sam-I-am.\n', '\x92O\x00O')
Using decoder.decode_fast():
---------------------------------------------------------------------------
RSCodecError                              Traceback (most recent call last)
<ipython-input-199-c74faa26c0ad> in <module>()
     26 print decoder.decode(recvmsg)
     27 print "Using decoder.decode_fast():"
---> 28 print decoder.decode_fast(recvmsg)

/Users/jmsachs/anaconda/lib/python2.7/site-packages/unireedsolomon/rs.pyc in decode_fast(self, r, nostrip, k, erasures_pos, only_erasures, return_string)
    439         # being the rightmost byte
    440         # X is a corresponding array of GF(2^8) values where X_i = alpha^(j_i)
--> 441         X, j = self._chien_search_faster(sigma)
    442 
    443         # Sanity check: Cannot guarantee correct decoding of more than n-k errata (Singleton Bound, n-k being the minimum distance), and we cannot even check if it's correct (the syndrome will always be all 0 if we try to decode above the bound), thus it's better to just return the input as-is.

/Users/jmsachs/anaconda/lib/python2.7/site-packages/unireedsolomon/rs.pyc in _chien_search_faster(self, sigma)
    884         if len(j) != errs_nb:
    885             # Note: decoding messages+ecc with length n > self.gf2_charac does work partially, but it's wrong, because you will get duplicated values, and then Chien Search cannot discriminate which root is correct and which is not. The duplication of values is normally prevented by the prime polynomial reduction when generating the field (see init_lut() in ff.py), but if you overflow the field, you have no guarantee anymore. We may try to use a bruteforce approach: the correct positions ARE in the final array j, but the problem is because we are above the Galois Field's range, there is a wraparound because of overflow so that for example if j should be [0, 1, 2, 3], we will also get [255, 256, 257, 258] (because 258 % 255 == 3, same for the other values), so we can't discriminate. The issue with that bruteforce approach is that fixing any errs_nb errors among those will always give a correct output message (in the sense that the syndrome will be all 0), so we may not even be able to check if that's correct or not, so there's clearly no way to decode a message of greater length than the field.
--> 886             raise RSCodecError("Too many (or few) errors found by Chien Search for the errata locator polynomial!")
    887 
    888         return X, j

RSCodecError: Too many (or few) errors found by Chien Search for the errata locator polynomial!

Optimization at the expense of correctness should be avoided, or at least heavily documented.

I discovered these cases while doing some statistical analysis of Reed-Solomon using Monte Carlo analysis to generate random patterns with more than t errors (where t = (n-k)/2 is the maximum number of errors that RS will always decode correctly), where RS is not guaranteed to work properly. My intent was to get an idea of how often RS succeeded, or made a bad correction, or failed during the Chien search. Just for validation purposes I verified that it corrected t errors. Using decode_fast() didn't in a few cases, so I posted it here.

I have willingness to help, and experience both with finite fields (I'm trying to finish a series of articles) and with implementing Reed-Solomon decoding myself once in Java at a former employer, but very little time unfortunately due to some issues with my family.

@jason-s
Copy link
Author

jason-s commented Jun 11, 2018

I am willing to add a unit test case that covers these and others (may need some help building/installing the package from source if it does anything besides pure Python), although please note that this would presently produce a failure.

It may take me a while to do so, however; my free time is spoken for until at least June 27.

@lrq3000
Copy link
Owner

lrq3000 commented Jun 11, 2018

I agree that optimization should not be at the cost of correctness, but please know that I tried my best to do just that, since most opensource implementations heavily rely on undocumented optimizations for very specific sets of parameters (I think this is the only one or at least one of the very few universal reed-solomon codec out there in any programming language :-) ). So I tried my best to "de-optimize" then "re-optimize" with a more generalizable code including all codec parameters where possible (eg, fcr which is usually assumed as a constant of 0 or 1).

I also lack time and knowledge (I did this as a hobby for a personal project, I'm not a professional of signal processing ;-) ), so please feel free to add a test case (for example in the unit test I pointed above, it's a cross-validation kind of test, if you add your case here then it will be tested against all implementations, fast and not fast). Following the results, I will update the package to only use what works with these cases :-)

Thank you very much @jason-s for having debugged this !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants