Unable to parse the Mainframe copybook which has a COBOL datatype of BBBB which means empty spaces #734
Comments
Hi, yes, 'BBBB' is something Cobrix does not support, mainly because we are not sure at the moment how to handle it properly. Does it work if you remove 'BBBB'? Does it produce the expected output in that case?
Hi @yruslan, thank you so much for your response! As advised, I will try removing the 'BBBB' from my copybook file, rerun the Cobrix program, and get back to you asap. Thank you
Hi @yruslan, one query: could you advise on what could be a replacement for 'BBBB'? I mean, is there any other COBOL datatype definition that is analogous to the use case of 'BBBB' and works with Cobrix too? Please note, I have yet to try your advice on removing the 'BBBB'. Sorry for the delay, I will get back on that asap! Thank you
Hi @suryagits, since 'B' just means inserting spaces in the data representation of the number, and because Cobrix converts numbers to Spark native binary formats, 'B' should not need a replacement. We may eventually implement it so that Cobrix ignores all 'B's in numbers. We haven't done it yet because we haven't encountered such PICs in our organization, so we can't confirm that ignoring 'B's would be the expected behavior. Once you confirm that removing 'B's from PICs produces correct output in numeric fields, we are going to implement native support for 'B's.
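For anyone who wants to try the workaround above without editing the copybook by hand, here is a minimal sketch in plain Python that removes 'B' insertion symbols from PIC clauses before the copybook is given to Cobrix. It is not part of Cobrix. The function name `strip_b_insertion` and the sample field names are made up for illustration, and the sketch assumes each picture string contains no whitespace and that comment lines do not contain PIC clauses. Keep in mind that in COBOL each 'B' occupies one character position in a DISPLAY field, so please check that the field offsets still line up with the data after the change.

```python
import re

# A PIC/PICTURE clause: group 1 is the keyword part, group 2 is the picture string.
# Assumption: the picture string has no embedded whitespace and ends at whitespace
# or at the period that terminates the entry.
PIC_CLAUSE = re.compile(
    r"((?<![\w-])PIC(?:TURE)?\s+(?:IS\s+)?)(\S+?)(?=\.?(?:\s|$))",
    re.IGNORECASE,
)

# A 'B' insertion symbol, optionally with a repeat count such as B(4).
# The lookbehind keeps the 'DB' (debit) editing symbol intact.
B_SYMBOL = re.compile(r"(?<!D)B(?:\(\d+\))?", re.IGNORECASE)


def strip_b_insertion(copybook: str) -> str:
    """Return the copybook text with 'B' symbols removed from picture strings."""

    def fix(match):
        keyword, picture = match.group(1), match.group(2)
        stripped = B_SYMBOL.sub("", picture)
        # If the picture consisted only of B's, leave the clause unchanged
        # rather than emit an empty picture.
        return keyword + (stripped or picture)

    return PIC_CLAUSE.sub(fix, copybook)


# Hypothetical sample; the real copybook from this issue is not shown here.
SAMPLE = (
    "       20 COL-A   PIC 9(4)BBBB.\n"
    "       20 COL-B   PIC X(10).\n"
)

if __name__ == "__main__":
    print(strip_b_insertion(SAMPLE))
```

The cleaned text can be saved as a new copybook file and passed to Cobrix as usual.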
Hi @yruslan , Given a copybook as below : 20 col_a
So, to answer your query: removing BBBB did help, and we are able to proceed further. Thank you so much! Based on the above observations, could you please let us know whether you will be adding support for the B's in future Cobrix releases, so that we can align with it? Thank you
Hi @suryagits, thanks for the detailed description! It is very helpful. Yes, I think support for 'B's can be added to Cobrix eventually. Let's keep this issue open. Just a couple more questions in order to understand how Cobrix should interpret BBBs.
Describe the bug
We are using Cobrix with PySpark and executing it on AWS EMR.
We have the EBCDIC file and its corresponding copybook in an AWS S3 bucket. While trying to parse the EBCDIC file using the copybook, we are getting an error.
Error message :
py4j.protocol.Py4jJavaError : An error occurred while calling o2021.loa : za.co.absa.cobrix.cobol.parser.exceptions.SyntaxErrorException : Syntax error in the copybook at line 29 : Invalid input 'BBBB' at position 29:45
Code snippet that caused the issue
Expected behavior
We expected Cobrix to successfully parse the EBCDIC file record column using the copybook, which has this datatype of 'BBBB'.
Context
PySpark Jar dependencies :
Copybook (if possible)
Attach a small data file that can help reproduce the issue, if possible : Need to check the feasibility due to confidentiality of the data. Will get back.