Plassembler & Assembler Non Determinism

All following data was conducted on a Intel® Core™ i7-10700K CPU @ 3.80GHz on a machine running Ubuntu 20.04.6 LTS.

In the course of bechmarking plassembler, I ran into some interesting problems regarding non-determinism.

It was flagged in manuscript review that plassembler was non-deterministic, which I didn't actually know about. All my testing I'd done had been with the same amount of threads - so this will be something I take into future benchmarking analyses.

Previously (perhaps naively), I had assumed assemblers ( SPAdes, Flye and Unicycler (which uses SPAdes)) were deterministic. But after some digging, I quickly found out they were not.

It turns out essentially every assembler is not deterministic between threads due to multi-threading:

In practice from my benchmarking however, I found that Unicycler (which uses SPAdes), is "practically" deterministic between thread counts (i.e. it always gave the same output). So I did not investigate any further.

In Flye, the developers recently added a --deterministic option.

Interestingly, I found out that even Flye using --deterministic isn't deterministic between threads either, so there is some deeper stochastisity at play in Flye. So I decided to investigate this.

It turns out that --deterministic doesn't really make a difference to Flye's determinism - so I would probably not use it as it slows Flye down!

The following Tables show the results of Flye assemblies of real read sets for Wick et al here and Houtak et al (C222, subsampled to 30x and 60x), here.

I have included the Flye v2.9.2 assembly lengths with and without without --deterministic run using Plassembler. The exact same readset for all 3 runs (1, 8 and 16 threads) are used as input (as chopper is deterministic).

Note that this is with Plassembler v1.0.0, so if you dig into the details on Zenodo the log files will look different.

The exact command used within plassembler is:

flye --nano-hq <input fastq> --out-dir <output directory> --threads <thread count>

Flye without --deterministic

Isolate Threads 1 8 16
A. baumannii J9 Flye Contig Lengths (bp) 3797823; 145027; 12145 3797806; 145027; 12086 3797803; 145026; 12144
Plassembler Contig Lengths 145071; 6078 145069; 6078 145071; 6078
S. aureus C222 30x Flye Contig Lengths (bp) 2788497; 2472 2788497; 2472 2788499; 2472
Plassembler Contig Lengths 2473 2473 2473
S. aureus C222 60x Flye Contig Lengths (bp) 2788510; 2469 2788510; 2471 2788509; 1895
Plassembler Contig Lengths 2473 2473 2473
C. koseri MINF 9D Flye Contig Lengths (bp) 4757420; 64915; 18570 4757424; 64916; 9283 4757426; 64913; 18593
Plassembler Contig Lengths 64962; 9294 64962; 9294 64962; 9294
E. kobei MSB1 1B Flye Contig Lengths (bp) 4836290; 132437; 108265; 14925; 7173 4836272; 244830; 4650 4836272; 136413; 108366; 4650
Plassembler Contig Lengths 136482; 108411; 4665; 3715; 2370 136482; 108411; 4665; 3715; 2370 136482; 108411; 4665; 3715; 2370
Haemophilus sp002998595 M1C132 1 Flye Contig Lengths (bp) 2051487; 39372; 21394; 19915 2051476; 39371; 27888; 21412; 13546; 3160 2051483; 39373; 21404; 19929
Plassembler Contig Lengths 39345; 10719; 9975 39345; 10719; 9975 39345; 10719; 9975
K. oxytoca MSB1 2C Flye Contig Lengths (bp) 5802199; 118088; 58430; 9135 5802339; 118091; 58430; 9141 5802182; 118088; 58430; 4402
Plassembler Contig Lengths 118161; 58382; 10697; 9975; 4574 118161; 58382; 10697; 9975; 4574 118161; 58382; 10697; 9975; 4574
K. variicola INF345 Flye Contig Lengths (bp) 5415051; 250848; 243518; 31785; 11548 5415038; 250870; 243809; 31785; 11547; 1947 5415038; 250848; 243515; 30857; 11550
Plassembler Contig Lengths 250272; 243534; 30412 + 663 (linear); 5738; 3514; 1934 250914; 243534; 30412 + 663 (linear); 5783; 3514; 1934 250914; 243534; 30412 + 663 (linear); 5783; 3514; 1934

Flye with --deterministic

Isolate Threads 1 8 16
A. baumannii J9 Flye Contig Lengths (bp) 3797822; 144252; 11505 3797808; 145027; 18231 3797822; 144252; 18053
Plassembler Contig Lengths 145069; 6078 145069; 6078 145071; 6078
S. aureus C222 30x Flye Contig Lengths (bp) 2788497; 2472 2788496; 2471 2788497; 2471
Plassembler Contig Lengths 2473 2473 2473
S. aureus C222 60x Flye Contig Lengths (bp) 2788514; 2471 2788510; 2471 2788509; 2469
Plassembler Contig Lengths 2473 2473 2473
C. koseri MINF 9D Flye Contig Lengths (bp) 4757416; 64916; 18569 4757419; 64916; 17890 4757436; 64915; 18569
Plassembler Contig Lengths 64962; 9294 64962; 9294 64962; 9294
E. kobei MSB1 1B Flye Contig Lengths (bp) 4836273; 128941; 107534; 9421; 7516; 811 4836272; 244870; 8916 4836292; 136412; 108364; 9304
Plassembler Contig Lengths 136482; 108411; 4665; 3715; 2370 136482; 108411; 4665; 3715; 2370 136482; 108411; 4665; 3715; 2370
Haemophilus sp002998595 M1C132 1 Flye Contig Lengths (bp) 2080062; 48768; 39374; 21401; 19934; 5348 2051484; 39371; 21404; 19932 2051487; 39372; 30363; 19931
Plassembler Contig Lengths 48620 (likely prophage); 39398; 10719; 9975 39345; 10719; 9975 39345; 10719; 9975
K. oxytoca MSB1 2C Flye Contig Lengths (bp) 5802201; 118088; 58433; 9133 5802179; 118081; 58428; 9097 5802238; 118089; 58430; 9133
Plassembler Contig Lengths 118161; 58381; 10697; 9975; 4574 118161; 58382; 10697; 9975; 4574 118161; 58382; 10697; 9975; 4574
K. variicola INF345 Flye Contig Lengths (bp) 5415037; 250849; 243507; 31779; 5772; 1499 5415055; 250850; 243514; 31778; 11570 5415037; 250843; 243613; 31172; 17176; 11547; 6130
Plassembler Contig Lengths 250272; 243534; 30412 + 663 (linear); 5783; 3514; 1934 250272; 243534; 30412 + 663 (linear); 5783; 3514; 1934 250272; 243534; 30412 + 663 (linear); 5783; 3514; 1934