Missing newlines are not caught #33

Open
KuechlerO opened this issue Jul 29, 2023 · 3 comments

@KuechlerO

Hey guys,
thanks for this cool tool!

I just stumbled over a weird error while using a Julia program:

ERROR: LoadError: ArgumentError: malformed FASTQ file
Stacktrace:
[1] read!(rdr::FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{BufferedStreams.BufferedInputStream{IOStream}}}, rec::FASTX.FASTQ.Record)
 ...

So I used your tool to validate my FASTQ files.
However, your tool also reported that my FASTQ files were valid.

After some time I figured out that the final newline character was missing from my FASTQ files.

My suggestion is for this tool to also emit a warning or error in such cases.
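
For illustration, the kind of check suggested here could be sketched in shell as follows. This is only a minimal sketch and not something fq provides: the function name `ends_with_newline` is hypothetical, and it assumes a plain (uncompressed) text file. It relies on command substitution stripping trailing newlines, so the captured last byte comes back empty exactly when that byte is a newline.

```shell
# Hypothetical helper (not part of fq): succeeds only if the file's
# last byte is a newline character.
ends_with_newline() {
    # An empty file has no final line to terminate; report failure.
    [ -s "$1" ] || return 1
    # $(...) strips trailing newlines, so the captured byte is empty
    # only when the last byte of the file is "\n".
    [ -z "$(tail -c 1 "$1")" ]
}
```

It could be used as `ends_with_newline reads.fq || echo "warning: no final newline" >&2`, where `reads.fq` is a placeholder file name.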

@zaeleus
Contributor

zaeleus commented Jul 29, 2023

Hi @KuechlerO,

Do you mean that FASTX is not allowing a missing newline at EOF? Can you share an example of a record that is passing fq lint but failing FASTX?

@KuechlerO
Author

KuechlerO commented Jul 30, 2023

Yes exactly, I mean that FASTX is not allowing a missing newline at EOF.

Example FASTQ record that passes fq but throws an error with FASTX:

@OLI FN1:sample1:read1/1
CTGGCTTGATGGTTCTCTGGATTGGAGTCTGGCCATTGGCTGGAACGGCATCAACTTGGAAGCCAGTGATCGTCTCAGTCTTGGTTCTCCAGCTAATGGTGATGGTGGTCTCAGTAGCATCTGTC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF/BF7BFFFBFFFFF<BFFFFFFB

-> Missing a newline character at the end!
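
As a stopgap for any downstream tool that insists on a terminated last line, the file could be patched before parsing. A minimal sketch, assuming a plain (uncompressed) FASTQ file; `ensure_final_newline` is a hypothetical name, not a command from fq or FASTX:

```shell
# Hypothetical helper: append "\n" to a file only when its last byte
# is not already a newline. Empty files are left untouched.
ensure_final_newline() {
    # tail -c 1 yields the last byte; $(...) strips it if it is "\n",
    # so a non-empty result means the final line is unterminated.
    if [ -s "$1" ] && [ -n "$(tail -c 1 "$1")" ]; then
        printf '\n' >> "$1"
    fi
}
```

Running it twice on the same file is harmless, since the second call finds the newline already in place.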

@zaeleus
Contributor

zaeleus commented Jul 30, 2023

Text lines typically have two interpretations: they can either be terminated by a newline, as defined in POSIX, or separated by one, as in your case. Most line readers support both, including fq's FASTQ parser.
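
The difference between the two interpretations can be seen with the shell's own `read` builtin. This is a sketch of the general idea only, not how fq or FASTX.jl read lines; the function names `count_strict` and `count_lenient` are made up for the example.

```shell
# "Terminated" interpretation: a line only counts once its newline is
# seen, so an unterminated last line is silently dropped.
count_strict() {
    n=0
    while IFS= read -r line; do n=$((n + 1)); done
    echo "$n"
}

# Accepts both interpretations: read fails at EOF but still fills
# $line with any trailing text, so that text is counted as a line.
count_lenient() {
    n=0
    while IFS= read -r line || [ -n "$line" ]; do n=$((n + 1)); done
    echo "$n"
}

printf 'a\nb'   | count_strict    # prints 1
printf 'a\nb'   | count_lenient   # prints 2
printf 'a\nb\n' | count_strict    # prints 2
printf 'a\nb\n' | count_lenient   # prints 2
```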

I would recommend opening an issue with FASTX.jl about this. I was unable to reproduce your error, and it seems to be out of scope of fq.

$ echo -ne "@OLI FN1:sample1:read1/1\nCTGGCTTGATGGTTCTCTGGATTGGAGTCTGGCCATTGGCTGGAACGGCATCAACTTGGAAGCCAGTGATCGTCTCAGTCTTGGTTCTCCAGCTAATGGTGATGGTGGTCTCAGTAGCATCTGTC\n+\nBBBBBFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF/BF7BFFFBFFFFF<BFFFFFFB" > in.fq

$ julia

   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.9.2 (2023-07-05)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

(@v1.9) pkg> status FASTX
Status `~/.julia/environments/v1.9/Project.toml`
  [c2308a5c] FASTX v2.1.2

julia> using FASTX

julia> reader = FASTQReader(open("in.fq"))
FASTX.FASTQ.Reader{TranscodingStreams.NoopStream{IOStream}}(TranscodingStreams.NoopStream{IOStream}(<mode=idle>), 1, 1, FASTQ.Record:
  description: ""
     sequence: ""
      quality: "", true)

julia> first(reader)
FASTQ.Record:
  description: "OLI FN1:sample1:read1/1"
     sequence: "CTGGCTTGATGGTTCTCTGGATTGGAGTCTGGCCATTGG…"
      quality: "BBBBBFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFF…"
