Feature Request: OBO Enumerator #3

Thyra · 2020-06-19T17:17:01Z

I just tried parsing the Gene Ontology, which is pretty huge, and my MacBook almost had a heart attack. What do you think about adding a way to parse lazily (i.e. only parse one Term at a time, whenever somebody asks for it), either as a separate method or as an option such as lazy=true? I've been using a super-simple Python OBO parser until now, but since I'm packaging my software as a Ruby gem, everything should be Ruby. I also don't like the idea of multiple obo_parser equivalents floating around, or of everybody starting their own from scratch.
I understand this would make many of the sanity/crossref checks impossible, but I only care about very specific parts of the terms anyway and would rather have it parse quickly than safely in this case.


Thyra · 2020-06-20T10:31:25Z

I just noticed: what I'm asking for is actually not lazy parsing but transient parsing, where only one stanza is kept in memory at a time. Something like iterate_over_obo(IO) that returns an Enumerator, usable like this:

iterate_over_obo(File.open("go.obo")).each do |term|
  puts term.id.value
  # ...
end

names = iterate_over_obo(File.open("go.obo")).map do |term|
  term.name.value
end
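
To make the idea concrete, here is a rough sketch of how such an enumerator could work in plain Ruby. It is only an illustration under assumptions, not the gem's API: the `Stanza` struct and its `value_of` helper are hypothetical stand-ins for the gem's term objects, and the line handling is deliberately naive (no escape sequences, trailing modifiers, or `!` comment stripping, and header tags before the first stanza are skipped).

```ruby
require "stringio"

# Hypothetical, simplified stanza type; NOT the gem's object model.
Stanza = Struct.new(:type, :tags) do
  # First value recorded for a tag, or nil if the tag is absent.
  def value_of(tag)
    tags.fetch(tag, []).first
  end
end

# Hypothetical helper: returns an Enumerator that yields one stanza at a time.
# Only the stanza currently being built is held in memory.
def iterate_over_obo(io)
  Enumerator.new do |yielder|
    current = nil
    io.each_line do |line|
      line = line.strip
      # Skip blank lines and whole-line comments.
      next if line.empty? || line.start_with?("!")

      if (m = line.match(/\A\[(\w+)\]\z/))
        # A new stanza header such as [Term] closes the previous stanza.
        yielder << current if current
        current = Stanza.new(m[1], {})
      elsif current && (m = line.match(/\A([^:\s]+):\s*(.*)\z/))
        # A "tag: value" pair; the value is kept as raw text.
        (current.tags[m[1]] ||= []) << m[2]
      end
    end
    # Emit the last stanza once the input is exhausted.
    yielder << current if current
  end
end

obo = StringIO.new(<<~OBO)
  format-version: 1.2

  [Term]
  id: GO:0000001
  name: mitochondrion inheritance

  [Term]
  id: GO:0000002
  name: mitochondrial genome maintenance
OBO

names = iterate_over_obo(obo).map { |stanza| stanza.value_of("name") }
# names == ["mitochondrion inheritance", "mitochondrial genome maintenance"]
```

Because the enumerator reads straight from the IO, it is single-pass: a second iteration needs the IO rewound or reopened. With a real file, the block form of File.open would also make sure the handle gets closed.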

Do you think that would be a valuable addition to the gem and would you be able to implement it?

Thyra changed the title ~~Feature Request: Lazy Parsing~~ Feature Request: OBO Enumerator Jun 20, 2020
