Why Does Plassembler Exist?

In long-read assembled bacterial genomes, small plasmids are difficult to assemble correctly with long-read assemblers. They commonly have circularisation issues and can be duplicated or missed (see this, this and this). This recent paper in Microbial Genomics by Johnson et al. also suggests that long-read assemblers are particularly prone to missing small plasmids.

Plassembler was therefore created as a fast automated tool to ensure plasmids are assembled correctly without duplicated regions for high-throughput uses - like Unicycler but a lot faster - and to provide some useful statistics as well (such as estimated plasmid copy numbers for both long and short read sets).
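To make the copy number statistic concrete, here is a minimal sketch of one common way to estimate it: the ratio of mean read depth on the plasmid to mean read depth on the chromosome. This is an illustration under that assumption, not necessarily the exact calculation Plassembler performs; the function name and the depth values are hypothetical.

```python
from statistics import mean


def estimate_copy_number(plasmid_depths, chromosome_depths):
    """Estimate copy number as mean plasmid depth / mean chromosome depth.

    Assumption: both arguments are per-base (or per-window) read depths
    taken from the same read set (either long or short reads).
    """
    chrom_mean = mean(chromosome_depths)
    if chrom_mean == 0:
        raise ValueError("chromosome depth is zero; cannot estimate copy number")
    return mean(plasmid_depths) / chrom_mean


# Hypothetical depths: the plasmid is covered about four times as deeply
# as the chromosome, so the estimate comes out at roughly 4 copies per cell.
chromosome = [50, 52, 48, 50]
plasmid = [200, 210, 190, 200]
copy_number = estimate_copy_number(plasmid, chromosome)
print(f"Estimated copy number: {copy_number:.1f}")
```

Running the same calculation separately on the long and short read depths gives one estimate per read set, which can then be compared.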

As it turns out (though this wasn't a motivation for making it), Plassembler also recovers more small plasmids than the existing gold standard tool Unicycler. I think this is because it throws away chromosomal reads, similar to subsampling short-read sets, which can improve assembly quality by reducing noise (see this and this and this - thanks Michael Hall for the references!). As plasmid reads then make up a larger proportion of the overall read set, there seems to be a higher chance of recovering smaller plasmids.

You can see this increase in accuracy and speed in the benchmarking results for simulated and real datasets.

Plassembler also uses mash as a quick way to determine whether each assembled contig has any similar hits in PLSDB.

Additionally, due to its mapping approach, Plassembler can also be used as a quality control tool for checking whether your long and short read sets come from the same isolate. This may be particularly useful if your read sets come from different extractions, or if you have multiplexed many samples and want to avoid mislabelling.

Why Not Just Use Unicycler?

Unicycler is awesome and still a good way to assemble plasmids from hybrid sequencing - Plassembler uses it! But there are a few reasons to use Plassembler instead:

  1. Time. Plassembler throws away all the chromosomal reads (i.e. most of them) before running Unicycler, so it is much faster (generally 3-10x faster in wall-clock time).
  2. Accuracy. Benchmarking has shown Plassembler is better than Unicycler in terms of recovering small plasmids.
  3. Plassembler will output only the likely plasmids, and can more easily be integrated into pipelines. You shouldn't be assembling the chromosome using Unicycler anymore so Plassembler can get you only what is necessary from Unicycler.
  4. Plassembler will give you summary depth and copy number stats for both long and short reads.
  5. Plassembler can be used as a quality control to check if your short and long reads come from the same sample - if Plassembler produces many non-circular contigs (particularly ones that have no hits in PLSDB), it is likely because your read sets do not come from the same isolate! See Quality Control.
  6. You will get information on whether each assembled contig has a similar entry in PLSDB. Especially for common pathogen species that are well represented in databases, this will likely tell you specifically what plasmid you have in your sample.
  7. Note: Especially for less commonly sequenced species, I would not suggest that the absence of a PLSDB hit is necessarily meaningful, especially for circular contigs - those would likely be novel plasmids uncaptured by PLSDB.
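
The quality control heuristic in point 5 can be sketched as a simple filter over the assembled contigs: count those that are both non-circular and without a PLSDB hit, and flag the sample if there are many. This is only an illustrative sketch; the `Contig` fields, the function names and the `max_suspicious` threshold are hypothetical and are not part of Plassembler's output format or code.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    circular: bool       # did the contig circularise?
    has_plsdb_hit: bool  # does it have a similar entry in PLSDB?


def suspicious_contigs(contigs):
    """Return names of contigs that are non-circular AND lack a PLSDB hit.

    Circular contigs without a hit are left alone, since they may simply
    be novel plasmids that PLSDB has not captured.
    """
    return [c.name for c in contigs if not c.circular and not c.has_plsdb_hit]


def reads_may_be_mismatched(contigs, max_suspicious=2):
    """Flag a sample when it has more suspicious contigs than the threshold.

    The default threshold is an arbitrary illustrative value.
    """
    return len(suspicious_contigs(contigs)) > max_suspicious


# Hypothetical sample: one clean circular plasmid plus several fragments.
sample = [
    Contig("contig_1", circular=True, has_plsdb_hit=True),
    Contig("contig_2", circular=False, has_plsdb_hit=False),
    Contig("contig_3", circular=False, has_plsdb_hit=False),
    Contig("contig_4", circular=False, has_plsdb_hit=False),
    Contig("contig_5", circular=True, has_plsdb_hit=False),
]
print(suspicious_contigs(sample))
print(reads_may_be_mismatched(sample))
```

A flagged sample is a prompt to check labels and extractions by hand rather than proof of a mix-up.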