Understanding the non-coding component is essential for comprehending the complexity of eukaryotic genomes

The so-called “central dogma of molecular biology” has been the cornerstone of many of the exciting developments in biology during the past three to four decades. This dogma implied that biological information flows unidirectionally from nucleic acids to proteins through the genetic code. It also implied that the proteins produced by translation, on the basis of the genetic information encoded in DNA/RNA, determine the phenotype. Paradoxically, in spite of this central role of proteins in determining functional and structural phenotypes, it was also clear that only a very small fraction of eukaryotic genomic DNA actually codes for proteins. Two observations gave rise to the concept of “junk” or “selfish DNA”. The first was the apparent lack of correlation between the C-value and the biological complexity of a species or taxonomic group. The second was the (erroneous) belief that the bulk of chromosomal DNA is not transcribed, or does not produce a distinct phenotype when mutated. This concept soon caught the fancy of “molecular biologists”, who were excited by the power of newly discovered recombinant DNA methods and the emerging field of biotechnology. Thus, from the early 1980s onwards, it came to be widely believed that DNA sequences not directly involved in the synthesis of a polypeptide are “junk” or “selfish” and, therefore, not worthy of serious attention.

In hindsight, it is interesting to note that biochemical evidence from the 1960s suggested that much of the nuclear DNA may actually be transcribed. Molecular methods that became available later could have examined these observations seriously, but before that happened they were lost in the tidal waves of the “central dogma” and “junk/selfish DNA”. Biotechnology offered the attraction of rapid material gains by using powerful recombinant DNA methods to produce proteins of desired types, and this further neglected the non-coding DNA. That DNA was often accepted as something that exists but is of little consequence. Even when some “non-coding” transcripts were identified, their biological significance was seriously questioned. Such transcripts were often dismissed in one of three ways. They were suggested to be edited into functional mRNAs, or to be merely “run-on” transcripts produced when RNA polymerase failed to stop at the expected termination signal. Otherwise they were simply dumped as “selfish”.

Notwithstanding such prejudice, painstaking work in some laboratories helped establish a few “non-coding” RNAs as the functional end-products of at least some “genes”. Examples from the 1990s included the hsr-omega (heat shock RNA-omega) and roX (RNA-on-X) transcripts in Drosophila, and the Xist (X-inactive specific transcript) transcripts in humans and other mammals. Post-transcriptional gene silencing (PTGS) was discovered in the mid-1990s, initially in plants and later in the worm C. elegans. This discovery led to the identification of small RNAs, 21 to 32 nucleotides long, with far-reaching effects on gene expression. RNA interference (RNAi), brought about by a variety of small RNAs (miRNAs, siRNAs, piRNAs, etc.), soon became the buzzword of the late 1990s and early 2000s. It appeared to offer the most potent biotechnological solution for a variety of diseases. This hype about therapeutic applications of miRNAs/siRNAs has since mellowed somewhat. Recent discoveries showed that these small RNAs function through several complex layers of regulation and therefore, in most cases, cannot be directed to one specific gene product. However, the excitement about these small RNAs had one major unexpected “side-effect”: long non-coding RNAs (lncRNAs) also received greater attention and acceptance. This acceptance, together with significantly improved RNA sequencing methods developed during the past decade, supports the current general belief that nearly the whole genome is transcribed during an individual’s life when all cell types are taken together.

A large variety of short and long non-coding RNAs (ncRNAs) have been discovered and functionally characterized at an increasing rate in recent years. The small RNAs function mostly through RNAi and related processes. Many small and not-so-small RNAs appear to affect transcription of protein-coding genes through chromatin remodeling. Other lncRNAs act through direct interactions with proteins, which in turn affect cellular networks. Thus, many of these lncRNAs may act as hubs that maintain homeostasis among different cellular networks.

It is well known that the number of protein-coding genes does not differ greatly between simpler and more complexly organized eukaryotes. Even with alternative splicing, the diversity of the proteome remains limited. Regulatory systems produce different combinations and quantities of various proteins in a given cell. The resulting combinatorial diversity of proteins brings about cellular differentiation and, thus, biological complexity. Non-coding DNA sequences and transcripts play the most significant regulatory roles at multiple levels. These levels include chromatin organization and remodeling, transcription, post-transcriptional processing, RNA transport, translation, and post-translational modification and stability of proteins. Thus, non-coding sequences and transcripts in eukaryotes emerge as the real managers: they are not themselves “direct producers”, but they manage the orderly activities of proteins (and of protein-coding genes). These activities generate the complex cellular protein-interaction networks necessary for generating and maintaining eukaryotic complexity.

I have been interested in the hsr-omega gene of Drosophila and its non-coding transcripts for a little more than four decades. These studies have helped us understand novel ways in which a long non-coding RNA can exert pleiotropic effects on multiple phenotypes through interactions with a variety of proteins. I will discuss some of our recent studies to illustrate the regulatory roles of lncRNAs in eukaryotes.