IO optimization #164

keflavich · 2019-04-23T19:24:08Z

This is a WIP to speed up the reading and parsing of dendrograms from disk.

One key change in speeding up the parsing was removing a conversion from an array of coordinates to a list of coordinate tuples. Creating N tuples was apparently a very expensive step, and eliminating it cut the time in `_fast_reader` by more than 50%.
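To illustrate the kind of change described above, here is a minimal sketch, assuming the coordinates live in an `(N, ndim)` NumPy array. It is not the astrodendro code: the function names `values_at_tuples` and `values_at_array` are hypothetical, and the data is a toy array.

```python
import numpy as np


def values_at_tuples(data, coords):
    """Slow path (hypothetical): build one Python tuple per coordinate row."""
    # This creates N tuples and performs N separate indexing operations.
    return np.array([data[tuple(c)] for c in coords])


def values_at_array(data, coords):
    """Fast path (hypothetical): keep the coordinates as an array."""
    # tuple(coords.T) holds one index array per axis (ndim items, not N),
    # so NumPy fetches all N values in a single fancy-indexing call.
    return data[tuple(coords.T)]


# Toy data: a 2 x 3 x 4 cube and three coordinates into it.
data = np.arange(24).reshape(2, 3, 4)
coords = np.array([[0, 0, 0], [1, 2, 3], [0, 1, 2]])

slow = values_at_tuples(data, coords)
fast = values_at_array(data, coords)
```

Both paths return the same values. The difference is only in how many Python objects are created along the way, which is where the cost of the tuple conversion would come from for large N.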

The remaining work is (1) to make sure that we really didn't break anything and (2) to try to speed up `parse_newick`.

implemented some small speedups

still inexplicably have an order-of-magnitude speed difference between calling `parse_newick` on a newick string and doing the _exact same thing_ within the io code...

e-koch

I added a few minor comments. Some of the test failures are legitimate and at least one seems to be related to this PR (https://travis-ci.org/github/dendrograms/astrodendro/jobs/523668907#L1286).

e-koch · 2021-12-07T21:22:01Z

astrodendro/structure.py

-        self._smallest_index = min(self._indices)
+        try:
+            self._smallest_index = min(self._indices)
+        except ValueError:


What causes the `ValueError` in `min`? Is there a check somewhere for whether `_indices` is empty?
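For context on the question above: Python's built-in `min` raises `ValueError` when given an empty sequence. The sketch below shows that behavior and two ways to guard against it. The helper names are hypothetical, and what the PR actually does in the `except` branch is not shown in the excerpt, so returning `None` here is an assumption made only for illustration.

```python
def smallest_index_try(indices):
    """Guard with try/except, mirroring the shape of the diff above."""
    try:
        return min(indices)
    except ValueError:
        # min() of an empty sequence lands here.
        # Returning None is an assumption, not the PR's behavior.
        return None


def smallest_index_default(indices):
    """Same guard using the `default` keyword of min() (Python 3.4+)."""
    return min(indices, default=None)


empty_result = smallest_index_try([])
filled_result = smallest_index_default([(2, 1), (0, 5), (1, 0)])
```

Tuples compare element by element, so the smallest index tuple is the one with the lowest first coordinate, with ties broken by the later coordinates.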

astrodendro/io/util.py


keflavich added 5 commits April 23, 2019 12:12

@GiantMolecularCloud was having some problems with io times, so we

d11c7ad

implemented some small speedups

add a progressbar for the actual bottleneck

8ad22ed

add a JSON parser and move the parse_newick to a different location. We

f2e40f8

still inexplicably have an order-of-magnitude speed difference between calling `parse_newick` on a newick string and doing the _exact same thing_ within the io code...

move the newick parsing back to where it was

c7ba98c

add a print statement

5510e42

keflavich requested a review from e-koch December 7, 2021 17:55

e-koch requested changes Dec 7, 2021


keflavich added 2 commits December 7, 2021 18:03

fix tests

58f21df

make progressbar pseudo-optional (nothing is passing kwargs to

cd4f56e

parse_newick right now)
