Gridfs versus file system performance #301
Replies: 1 comment
-
Another thought on this. It would be VERY EASY for me to write a C/C++ function that would take an ensemble object, write only the sample data to a file with C fwrite, and return an index in a struct whose layout is to be determined. The python wrapper would then only need to reorganize the return into documents to be inserted into MongoDB, i.e. putting the right dir, dfile, and foff values with the correct waveform. The index would need to be designed so that process would be simple and error proof. A reader could be produced the same way, but its input would need to look much like the output of the writer: a python wrapper could pass a C function the file read index and a skeleton of the ensemble's data. I would bet that will really speed up file-based reads and writes. I suspect it would be a lot faster than a pure python solution, but I could be wrong on that.
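To make the idea concrete, here is a minimal sketch of the python-wrapper side, assuming the hypothetical C writer returns a per-member index of (foff, nbytes) pairs. The function name `index_to_documents` and the index layout are my assumptions for illustration, not an existing MsPASS API; only the dir/dfile/foff document keys come from the discussion above.

```python
# Hypothetical sketch: assume the C writer fwrites each member's samples back
# to back into one file and returns a list of (foff, nbytes) tuples, one per
# ensemble member, in member order.  The wrapper then only has to reshape
# that index into MongoDB document fragments.

def index_to_documents(dir_name, dfile, write_index):
    """Convert the assumed C writer index into per-member document fragments.

    dir_name, dfile : the directory and file the C writer wrote into
    write_index     : list of (foff, nbytes) tuples in member order
    """
    docs = []
    for foff, nbytes in write_index:
        docs.append(
            {
                "storage_mode": "file",
                "dir": dir_name,
                "dfile": dfile,
                "foff": foff,   # byte offset of this member's samples
                "nbytes": nbytes,
            }
        )
    return docs

# Example: three Seismogram sample blocks of 9600 bytes packed contiguously
index = [(0, 9600), (9600, 9600), (19200, 9600)]
docs = index_to_documents("/data/gathers", "source_0001.d", index)
```

The resulting dicts could then be merged into each member's Metadata before the usual MongoDB insert, which is the "reorganize the return into documents" step described above.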
-
Thought I'd pass along some numbers from a serial run I have running on my "quakes" machine. For reference, both the database and the file system in this workflow are driven from a RAID1 magnetic disk array (2 disks mirrored). All the job does is read TimeSeries data from gridfs as ensembles, run bundle, and then write the result out as files to the file system using save_ensemble_data with "file" as the storage mode. The files it is building are "common source gathers" with all the sample data packed into one file for each gather. Our current implementation, however, opens and closes the file after writing the sample data from each Seismogram object.
Here are some numbers:
Here is an (incomplete) sample of the output:
There is a lot more variance in the gridfs read time; I have no idea why. A rough guess from scanning the numbers is that the read time is around 1.5 times longer (on average) than the write time. That isn't horrible, BUT this is a serial job. We can test this with a parallel job later if you think it would be helpful, but I suspect the results are somewhat predictable: gridfs reads will be a throttle, although how that would trade off with writes in a parallel job is hard to predict. Incidentally, the overall throughput is not that bad for the writer. It is running about 20 Mb/s. I suspect strongly, however, that it could be sped up a lot if that time didn't include 1000+ open/close calls.
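The open/close overhead is easy to estimate independently of MsPASS. The sketch below writes the same 1000 equally sized binary segments twice: once with an open/close per segment (as the current implementation does) and once through a single open handle. Segment size and count are arbitrary choices for illustration, not the actual gather sizes from this run.

```python
# Quick self-contained estimate of per-segment open/close cost vs. a single
# open for the whole gather.  Not the MsPASS writer; just the I/O pattern.
import os
import tempfile
import time

def write_reopen_each(path, segments):
    """Append each segment with its own open/close, as save_ensemble_data does now."""
    t0 = time.perf_counter()
    for seg in segments:
        with open(path, "ab") as f:
            f.write(seg)
    return time.perf_counter() - t0

def write_single_open(path, segments):
    """Write all segments through one file handle."""
    t0 = time.perf_counter()
    with open(path, "wb") as f:
        for seg in segments:
            f.write(seg)
    return time.perf_counter() - t0

segments = [os.urandom(4096)] * 1000  # 1000 segments of 4 KiB each
with tempfile.TemporaryDirectory() as d:
    p_many = os.path.join(d, "many.d")
    p_one = os.path.join(d, "one.d")
    t_many = write_reopen_each(p_many, segments)
    t_one = write_single_open(p_one, segments)
    size_many = os.path.getsize(p_many)
    size_one = os.path.getsize(p_one)
```

On most systems the repeated open/close variant is noticeably slower, which would support the suspicion that the 1000+ open/close calls are a meaningful part of the write time, though the exact ratio depends on the OS and file system.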
I do think there are some clear things we need to fix with save_ensemble_data:
I suggest we change the names of the current args for save_ensemble_data: `dfile_list` should become `dfile`, and `dir_list` should become `dir`. The function should accept either a single string or a list of strings for both args. The first few lines of the method could essentially run the code above to create an internal list IF the input is a single string. If the type is list, it would just use it as it does now.
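That normalization step could look roughly like the sketch below. The helper name `normalize_path_arg` is mine for illustration; the str-or-list behavior is the proposal above.

```python
# Sketch of the proposed argument handling at the top of save_ensemble_data:
# expand a single string into a per-member list, pass an existing list through.

def normalize_path_arg(arg, nmembers):
    """Return a list of length nmembers for a dir/dfile style argument.

    arg      : a single str (applied to every member) or a list of str
    nmembers : number of members in the ensemble
    """
    if isinstance(arg, str):
        # One string given: every member goes to the same dir/dfile.
        return [arg] * nmembers
    if isinstance(arg, list):
        # A list is used as-is, but its length must match the ensemble.
        if len(arg) != nmembers:
            raise ValueError(
                "list argument length {} does not match number of members {}".format(
                    len(arg), nmembers
                )
            )
        return arg
    raise TypeError("argument must be a str or a list of str")

# Example: a single dfile string applied to every member of a 3-member ensemble
dfiles = normalize_path_arg("source_0001.d", 3)
```

With this in place, the rest of the method would be unchanged: it always sees per-member lists, whether the caller passed one string or many.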