featureCounts: an efficient general-purpose read summarisation program


I just spotted a link in my Twitter feed to a paper on a new read summarisation program, featureCounts. The second author, Gordon Smyth, is the guy who wrote limma, the linear model framework for microarrays. At present there is just a link to the paper on the arXiv. The program takes read alignments and summarises them according to the genomic feature they fall within or near.
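To make "read summarisation" concrete, here is a minimal Python sketch of the general idea: count each aligned read against the feature it overlaps. This is only an illustration and not how featureCounts is implemented. The function names, the tuple layouts and the toy coordinates are all made up for the example, and the rule for reads that hit more than one feature (set them aside as ambiguous) is just one possible choice.

```python
from collections import Counter

def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals [start, end] share at least one base."""
    return a_start <= b_end and b_start <= a_end

def summarise_reads(reads, features):
    """Count reads per feature (hypothetical helper, not a real library API).

    reads:    iterable of (chrom, start, end)
    features: iterable of (name, chrom, start, end)
    A read is counted only if it overlaps exactly one feature.
    """
    counts = Counter({name: 0 for name, *_ in features})
    unassigned = Counter()
    for chrom, start, end in reads:
        # Naive linear scan over every feature: simple, but slow at scale.
        hits = [name for name, f_chrom, f_start, f_end in features
                if f_chrom == chrom and overlaps(start, end, f_start, f_end)]
        if len(hits) == 1:
            counts[hits[0]] += 1
        elif not hits:
            unassigned["no_feature"] += 1
        else:
            unassigned["ambiguous"] += 1
    return dict(counts), dict(unassigned)

# Toy data, invented for illustration only.
features = [("geneA", "chr1", 100, 200),
            ("geneB", "chr1", 180, 300),
            ("geneC", "chr2", 50, 150)]
reads = [("chr1", 90, 120), ("chr1", 150, 190), ("chr1", 250, 280),
         ("chr2", 10, 40), ("chr2", 140, 170)]

counts, unassigned = summarise_reads(reads, features)
print(counts)      # {'geneA': 1, 'geneB': 1, 'geneC': 1}
print(unassigned)  # {'ambiguous': 1, 'no_feature': 1}
```

The scan here costs one comparison per read per feature, which is exactly the sort of thing that gets expensive with millions of reads; a serious tool would index the features first.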

Their program is a lot quicker and uses less memory than some commonly used tools, such as htseq (Python) and GenomicRanges (R/Bioconductor); see Table 1, for example.

[Image: screenshot of Table 1]

First, people are always complaining about the number of assembly or read alignment programs. Yet it seems to me that there are a lot of simple tools that are pretty inefficient, and their inefficiency has a substantial economic cost in the hardware you need and the time it takes to run them. Perhaps people think that cost is only a fraction of alignment or assembly, so forget it … move on.

The thing is, the GenomicRanges tool is written in C, and their tool featureCounts is also written in C. Admittedly, GenomicRanges is not strictly a summarisation tool but a more general framework for handling read alignments, yet it still seems hamstrung by R's memory use. It is also very difficult (to me anyway) to follow the C code underlying GenomicRanges and other Bioconductor sequencing packages.

At present, a lot of the high-level numerical or statistical work in genomics is done with R packages after summarisation, so this is perhaps just a step below the designed competency of R. However, I think the work demonstrated at companies like Revolution, with packages like bigmemory and Rcpp, and efforts like bigvis or data.table, all point to the fact that more of the genomic workflow could be done within R, or at least within an R wrapper.
