We show that when RNA fragments are sequenced from one end, as is the case on most platforms, the read count oscillates near the other end. We further show that these oscillations are well described by Kolmogorov's 1941 broken stick model. We investigate how the model can improve predictions of gene ends (3' transcript ends), but conclude that with present data the improvement is only marginal. The results highlight subtle effects in high-throughput transcriptomics experiments that have no biological origin but may nevertheless be used to extract biological information.
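The lognormality in the paper's title comes from Kolmogorov's fragmentation argument. Here is a minimal sketch of that argument, under illustrative assumptions that are not taken from the paper. Every piece is split at a Uniform(0, 1) fraction. This is repeated for a fixed number of generations, and lengths are sampled by following one randomly chosen lineage. The log-length is then a sum of independent terms, so it tends to a normal distribution. The function name and parameters are hypothetical. The sketch illustrates only the lognormal fragment lengths, not the read-count oscillation or the gene-end prediction.

```python
import math
import random
import statistics


def kolmogorov_fragment_lengths(n_fragments, generations, initial_length=1.0, seed=0):
    """Hypothetical sketch of binary Kolmogorov-style fragmentation.

    Assumption (illustrative only): each piece splits at a Uniform(0, 1)
    fraction every generation. Following one random lineage, the surviving
    piece keeps a factor U or 1 - U, and both are Uniform(0, 1). Its length
    is therefore initial_length times a product of independent uniforms.
    Its log-length is a sum of iid terms, so the lengths are roughly lognormal.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_fragments):
        length = initial_length
        for _ in range(generations):
            # 1.0 - rng.random() lies in (0, 1], which avoids log(0) later.
            length *= 1.0 - rng.random()
        lengths.append(length)
    return lengths


lengths = kolmogorov_fragment_lengths(n_fragments=20000, generations=10)
log_lengths = [math.log(x) for x in lengths]
# For U ~ Uniform(0, 1), E[log U] = -1 and Var[log U] = 1, so after 10
# generations the log-lengths should have mean near -10 and variance near 10.
print(statistics.fmean(log_lengths), statistics.variance(log_lengths))
```

A histogram of `log_lengths` should look approximately Gaussian, while `lengths` itself is strongly right-skewed. Real fragmentation protocols need not split uniformly, so this is a qualitative illustration rather than a fit to sequencing data.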
This talk is based on the paper "Lognormality and oscillations in the coverage of high-throughput transcriptomic data towards gene ends", arXiv:1303.4229, JSTAT (in press).