Evolutionarily new sequences expressed in tumors
© Kozlov et al; licensee BioMed Central Ltd. 2006
Received: 21 July 2006
Accepted: 25 December 2006
Published: 25 December 2006
Earlier we suggested the concept of the positive evolutionary role of tumors. According to this concept, tumors provide conditions for the expression of evolutionarily new and/or sleeping genes in their cells. Thus, tumors are considered as an evolutionary proving ground or reservoir of expression. To support this concept, we have previously characterized in silico and experimentally a new class of human tumor-related transcribed sequences.
In this article we describe results of further studies of previously described tumor-related sequences. The results of molecular phylogeny studies, Southern hybridization experiments and computational comparison with genomes of other species are presented.
These results suggest that these previously described tumor-related human transcripts are also relatively evolutionarily new.
In previous studies [1, 2], we formulated the concept of the positive evolutionary role of tumors. According to this concept, tumors provide conditions for the expression of evolutionarily new and/or sleeping genes in their cells. Thus, tumors are considered as an evolutionary proving ground of expression.
In earlier work using the computational differential display approach, we identified a considerable number of human tumor-related expressed sequence tag (EST) clusters, many of which had not been described previously [3]. Experimental data confirmed the results obtained in silico, i.e., the tumor-specificity of expression of these sequences [4].
To experimentally examine our prediction [1, 2] that at least some tumor-related sequences are evolutionarily new, we performed Southern hybridization of our newly described tumor-related sequences with genomic DNA from different animal species. Hybridization was found only with human and orangutan DNA, with one exception in which a signal was also developed with chicken DNA.
We performed a search for ortholog sequences in fugu, tetraodon, zebrafish, frog, chicken, rat, mouse, cow, dog, macaque, and chimpanzee genomes using cross-species chained alignments. This search confirmed that our newly described tumor-related transcripts are relatively evolutionarily new, with some of their orthologs having originated in mammals and others in primates.
PCR experiments with specific primers were performed on a panel of DNAs from different primates. Amplified fragments were cloned and sequenced, and their molecular phylogeny was studied. The results show that these sequences form well-defined phylogenetic clusters which correspond to the phylogeny of primates as previously understood.
Taken together, our Southern hybridization, molecular phylogeny, and comparative genomics data support our prediction [1, 2] that evolutionarily new and/or sleeping sequences may be specifically expressed in tumor cells.
Transcribed sequences analyzed
Because of the constant rebuilding of the UniGene clusters and EST shuffling between them, we cannot follow the history of each cluster. Clusters very often do not mark a specific transcript, but a set of transcripts whose genome mapping regions are often neighboring but may not overlap. Therefore, we selected ESTs which were used for primer design in our previous investigations [3, 4] and followed the history of their sequences in UniGene.
In this paper, we present analyses of the following ESTs (UniGene build 185): [GenBank:AA166653], now in cluster Hs.426704 (former Hs.154173); [GenBank:AL040372], now in cluster Hs.133294; [GenBank:AI952931] from cluster Hs.128594 (former Hs.67624); and [GenBank:AI792557] from cluster Hs.133107.
PCR analysis and Southern hybridization
We performed Southern hybridization of [α-32P]-labeled sequence-specific fragments with genomic DNA from eleven different animal species: lamprey, fish, frog, chicken, pigeon, mouse, rat, guinea pig, sheep, horse, and human. Among the non-human species, Southern hybridization revealed a homologous sequence only in the chicken genome, and only with the AA166653-specific probe. In addition, we were able to demonstrate by Southern hybridization that a sequence homologous to the human AA166653-specific 11.2-kb fragment is present in orangutan DNA [see Additional file 1].
Table 1. Results of PCR experiments* and comparative genomics analysis within primates.
Column header: Species/Transcript (EST Clusters).
Row groups: New World monkeys; Old World monkeys; Apes and Human.
Listed species: Pongo pygmaeus (Sumatran); Pongo pygmaeus (Bornean); Gorilla gorilla (samples 1 and 2); Pan troglodytes (samples 1 and 2).
AL040372- and AI792557-specific tumor-related human sequences are found in the majority of primate species studied, even in the most archaic. The AI792557-homologous sequence is not found in lemurs and colobus monkeys. The AA166653-homologous sequence is present only in apes and macaques. The AI952931-homologous sequence is found in apes, new world monkeys, and macaques. None of the sequences discussed in this article could be amplified by PCR from genomic DNA of non-primate species in the evolutionary DNA panel using the selected primers.
Comparative genomics and bioinformatics analyses
Table 2. Summary of cross-species homology analysis results
Column headers: Compared Genomes where Homology was found*; Aligned Bases between Genomes; % of Aligned Bases**; Matched Bases between Genomes; % of Matched Bases***; Full length of Aligned Sequence.
Surviving cell value: chr2:132 864 310-132 864 909
Table 3. Duplications of tumour-related sequences studied in primate genomes
Column header: Mapping/Transcript (EST Cluster).
Row: Original transcript mapping on chromosome in human genome.
Row: Human duplications and their mapping: 5 (12, 16, Y × 3).
Row: The number of homologs in P. troglodytes genome and their mapping: 2 (1, 14); 4 (13 × 2, 18, Y).
Row: The number of homologs in M. mulatta genome**.
Cluster Hs.133294 corresponds to mRNA IQGAP3, which encodes a member of the Rho GTPase family of regulators involved in cytokinesis. Specifically, cluster Hs.133294 includes an alternatively spliced isoform of the IQGAP3 gene that arises by retention of its 672-nt intron. Earlier, we demonstrated that this isoform is characterized by broad tumor-related and embryonal expression, thus representing a new carcinoembryonic transcript.
The AL040372-specific sequence corresponding to the tumor-related transcript of interest is mapped to the 3'-terminal intron and the 3'-UTR of IQGAP3 mRNA. Sequences with strong homology to this genomic region are present in macaque (94%) and chimpanzee (99%) genomes. Moreover, sequences with a similarity of 52%–73% to this genomic region have been found in opossum, mouse, rat, dog, and cow genomes (Table 2). Interestingly, the part of the 3'-UTR exonic sequence which is overexpressed in human tumors according to UniGene data is not present (or is highly divergent) in the mouse genome [see Additional file 2].
Using BLAT, we found that AL040372- and AA166653-homologous sequences have duplicates in the human and nonhuman primate genomes (Table 3).
Molecular phylogenetic analysis
The prediction that evolutionarily new sequences may be expressed in tumor cells was made in our previous articles [1, 2]. To experimentally examine this prediction, we performed Southern hybridization of [α-32P]-labeled newly described tumor-related fragments with genomic DNA from different animal species. Sequences studied in the present article were selected from tumor-related transcripts revealed by an in silico search and experimentally described in our previous papers [3, 4].
Hybridization signals were detected only with human and orangutan DNA, with the single exception of a signal observed after hybridization of the AA166653-specific [α-32P]-labeled probe with chicken DNA [see Additional file 1]. This signal was consistently observed in several hybridization experiments. However, comparative genomics analysis has not revealed AA166653-homologous sequences in the chicken genome. We suggest that this signal may be an artifact of hybridization.
Interestingly, in the case of the AA166653-homologous sequence, signals on the Southern blot form a "ladder" [see Additional file 1], which is a feature of fragments located in a repetitive sequence. This is in good agreement with computational evidence that the AA166653-specific sequence is located in an intergenic spacer upstream of the 23 repeat region of the human ribosomal DNA complete repeating unit, which is tandemly repeated and forms arrays in genomes of eukaryotes.
Comparative genomics analysis has shown that the tumor-related transcripts under consideration have orthologs in mammalian genomes only and not in those of fishes, amphibia, and birds, with the single exception of a short sequence in the chicken genome with low homology to AI952931 (Tables 2 and S1).
The probes did not hybridize with DNA from mammals in which comparative genomics analysis found homologous sequences because of the low homology and short length of the orthologous sequences (Table 2).
We may conclude that Southern hybridization and comparative genomics data confirm the evolutionary novelty of the sequences studied, i.e., their origins in mammals or in primates.
The results of molecular phylogenetic analysis are in accordance with Southern hybridization and comparative genomics results. AA166653-homologous sequences are present only in apes and macaques and have no homology with any sequences in other mammals. The most archaic of the four species presented on the phylogenetic tree in Fig. 1b is the macaque. We cannot find an AA166653-specific sequence in primates before the divergence of old world monkeys and apes. Therefore, the origin of AA166653-specific sequences took place about 25 mya, during the divergence of macaques and apes.
AL040372-, AI792557- and AI952931-specific sequences formed separate clusters on phylogenetic trees demonstrating high nucleotide sequence divergence (from 20% to 35%) with related sequences in mammals (Fig. 1a,1c and 1d). AL040372-homologous sequences were found in lemurs – the most archaic members of the primate group. Lemur sequences demonstrate lower divergence from other primates (about 8%) than related sequences from non-primate animals (20% and more). Phylogenetic analysis has shown that lemur sequences belong to the primate phylogenetic cluster. Other primates form a separate non-lemur subcluster in this phylogenetic cluster (Fig. 1a).
AI792557-homologous sequences form a well-supported monophyletic group in apes and old world monkeys. These sequence homologs were found in the Ateles-Callimico group and were not present in older primates. The divergence of the Ateles-Callimico group from old world monkeys took place about 40 mya.
The cDNA of Hs.133107, which includes EST AI792557, is identified as PVT1, encoding the Pvt1 oncogene homolog. The Pvt1 locus is also a common integration site for murine leukemia viruses (MLV) on mouse chromosome 15 and is located approximately 270 kb from c-myc. MLV proviruses integrated in the Pvt1 locus activate c-myc expression by long-range (up to 300 kb) cis-effects [6]. In the human genome, the corresponding sequence is located on chromosome 8. Therefore, an evolutionarily new tumor-specific sequence with a high potential of oncogenicity is present in the mammalian lineage near the Pvt1 locus. Obvious overexpression of AI792557-specific transcripts in human tumors [3, 4] could be explained by enhanced transcriptional activity of the c-myc-regulating element.
The proportion of the mammalian genome which is transcribed is greater than usually realized [7, 8]. It turns out that large regions of the genome beyond the coding segments are transcribed, producing non-coding RNAs [3, 7, 9]. As shown in this article, two of the ESTs studied are from introns (plus or minus strands), one is from an intergenic spacer region, and one represents the 3'-UTR of an mRNA containing an alternatively spliced intron. According to our previous data, they do not contain easily recognized open reading frames or contain only short open reading frames.
There is a growing number of recent publications on non-coding RNAs and their possible functions [10–12]. But the fact that certain RNAs have low coding potential may also characterize them as evolving sequences. The concept of evolution by gene duplication [13] involves understanding that the extra copy of the duplicated gene may accumulate mutations and acquire a new function. Before acquisition of a new function, it may express RNA without long open reading frames or with stop codons and/or frame-shift mutations interrupting open reading frames. In a similar way, non-coding sequences could evolve and eventually acquire a function and/or longer open reading frames. The fact that we were able to demonstrate duplications of AL040372- and AA166653-homologous sequences in the human, chimpanzee, and macaque genomes (Table 3) supports this interpretation.
The Alu-Y element was found in an AL040372-homologous sequence in Ateles and Callimico. The presence of the Alu sequence in the genome may mediate DNA recombination, the creation of new exons, and the donation of new regulatory elements [14]. It was found in our study that part of the AL040372-homologous sequence in the lemur genome has an extension with no similarity to those of other primates (data not shown). In higher primates, this region demonstrates homology with the human genome.
Taken together, these data from Southern hybridization experiments, molecular phylogenetic studies, and computational evidence suggest that AA166653-, AL040372-, AI792557- and AI952931-homologous sequences are indeed evolutionarily new. They originate in mammals (AA166653 – in primates) and form phylogenetic clusters in primates. They are not expressed in normal cells [3, 4], i.e., they are sleeping.
Earlier [1, 2], we formulated the concept of the positive evolutionary role of tumors. According to this concept, tumors provide conditions for the expression of evolutionarily new and/or sleeping genes in their cells. We defined evolutionarily new genes as genes that participate in the origin of new cell types. The origin of a new cell type is a very rare event associated with progressive evolution. During 10^9 years of evolution of multicellular organisms, only about 200 specialized cell types have originated. Thus, within the framework of our hypothesis, sequences that originated in mammals may well be considered evolutionarily new.
We may guess that during the earliest period of the origin of mammals, genome evolution and cellular proliferative tumor-like processes provided material for the origin of diversity of mammalian cell and tissue types by generating a diversity of new gene expression patterns. Populations of tumor-bearing animals could be ancestors of the first mammals. Present-day tumors (at the earlier stages of progression) may somehow recapitulate these processes.
Our data presented in this and previous articles [3, 4] demonstrate the expression of relatively evolutionarily new (in respect to progressive evolution) and/or sleeping sequences in tumor cells and support the concept of the possible evolutionary role of tumors as a proving ground or evolutionary reservoir of expression. If proven to be correct, this concept may substantially increase our capabilities in the diagnosis and treatment of cancer. This concept may also describe one of the mechanisms of progressive evolution of animal species in which tumors participate.
Human, ape (Pan troglodytes, Gorilla gorilla, Pongo pygmaeus, Hylobates concolor), old world monkey (Erythrocebus patas, Macaca mulatta, Colobus guereza, Cercopithecus aethiops), and new world monkey (Callimico goeldii, Lemur catta, Ateles fusciceps) genomic DNAs were used in the study. All samples except human DNA were kindly provided by Dr. S. O'Brien (Chief, Laboratory of Genomic Diversity, National Cancer Institute). The DNA concentration of each sample was brought to 200 ng/μl before being used.
Oligonucleotide primers for PCR were designed with OLIGONEW software after alignment of human EST sequences and the corresponding regions of the human genome. We performed BLAST searches for all primer pairs created. Only PCR primers that corresponded to a unique location in the human genome and to an EST cluster of interest were used.
Primers for AA166653: 5'-TCTTTCTTGATGAATTATCTTATG-3' and 5'-ACACACCCTCATTCCCGC-3'; the expected fragment size is 443 bp. Primers for AL040372: 5'-GTCAACCTTCTCATCTTCCTC-3' and 5'-CAGGAAGTTGGGTAGATGTG-3'; the expected fragment sizes are 412 bp on cDNA and 1084 bp on genomic DNA. Primers for AI952931: 5'-TAATTGCATTCTTCAAAATTCTAC-3' and 5'-CTTCGCACCATTGAATAAAC-3'; the expected fragment size is 315 bp. Primers for AI792557: 5'-TACATAGTTGTTATCTTAAGGTG-3' and 5'-TGGGAATTCTATACTTTTGAC-3'; the expected fragment size is 344 bp. Histone H4 control primers: 5'-ATGTCTGGCCGTGGTAAAGG-3' and 5'-CCGAAGCCGTAAAGAGTGCG-3'; the expected fragment size is 300 bp.
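As a quick sanity check on the primer pairs listed above, the following minimal sketch reports the length, GC fraction, and a rule-of-thumb melting temperature of each primer. The Wallace rule used here (2 °C per A/T, 4 °C per G/C) is an assumption introduced for illustration only; it is not the method by which the primers were designed (OLIGONEW was used for that), and the helper name `primer_stats` is hypothetical.

```python
# Primer sequences copied from the text above (5'->3', forward then reverse).
PRIMERS = {
    "AA166653": ("TCTTTCTTGATGAATTATCTTATG", "ACACACCCTCATTCCCGC"),
    "AL040372": ("GTCAACCTTCTCATCTTCCTC", "CAGGAAGTTGGGTAGATGTG"),
    "AI952931": ("TAATTGCATTCTTCAAAATTCTAC", "CTTCGCACCATTGAATAAAC"),
    "AI792557": ("TACATAGTTGTTATCTTAAGGTG", "TGGGAATTCTATACTTTTGAC"),
    "histone H4": ("ATGTCTGGCCGTGGTAAAGG", "CCGAAGCCGTAAAGAGTGCG"),
}


def primer_stats(seq):
    """Return length, GC fraction and Wallace-rule Tm for one primer.

    The Wallace rule is a rough approximation intended for short oligos;
    it is an illustrative assumption, not part of the published protocol.
    """
    seq = seq.upper()
    if not seq or any(base not in "ACGT" for base in seq):
        raise ValueError("primer must be a non-empty string of A, C, G, T")
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return {
        "length": len(seq),
        "gc_fraction": gc / len(seq),
        "wallace_tm_c": 2 * at + 4 * gc,
    }


if __name__ == "__main__":
    for target, pair in PRIMERS.items():
        for direction, seq in zip(("forward", "reverse"), pair):
            print(target, direction, primer_stats(seq))
```

Such estimates are only a coarse guide; the annealing temperatures actually used are those given in the PCR conditions in this section.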
The PCR mixture contained 500 ng of genomic DNA as template, PCR buffer (1×), MgCl2 (4 mM), dNTP (each at 200 μM), specific forward and reverse primers (each at 0.2 μM), and Taq DNA Polymerase (1 u) in a total volume of 25 μl (all reagents were supplied by Fermentas, Lithuania).
PCR was carried out under the following conditions: 1 min at 95°C, 35 cycles each consisting of 30 s at 95°C and 30 s at 56°C for AA166653 primers and histone H4 primers or at 58°C for all other primers, and 1 min at 72°C. At the final stage of the PCR reaction, mixtures were incubated for 5 min at 72°C to elongate the DNA fragments synthesized. PCR products were separated by electrophoresis in 2% agarose gel and visualized by staining with ethidium bromide.
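The cycling conditions above can be written down as a small data structure, which makes it easy to compare the two annealing temperatures and to estimate the programmed hold time. This is a sketch under one stated assumption: the 1 min step at 72°C is read as the extension step of each of the 35 cycles. Ramp times between temperatures are ignored, and the function names are hypothetical.

```python
# Annealing temperatures (deg C) per primer pair, as stated in the text.
ANNEAL_C = {
    "AA166653": 56,
    "histone H4": 56,
    "AL040372": 58,
    "AI952931": 58,
    "AI792557": 58,
}


def pcr_program(anneal_c, cycles=35):
    """Return the program as (label, temperature_c, seconds, repeats) tuples.

    Assumption: the 1 min at 72 deg C belongs to every cycle (extension).
    """
    return [
        ("initial denaturation", 95, 60, 1),
        ("denaturation", 95, 30, cycles),
        ("annealing", anneal_c, 30, cycles),
        ("extension", 72, 60, cycles),
        ("final extension", 72, 300, 1),
    ]


def total_hold_seconds(steps):
    """Sum the programmed hold times, ignoring ramping between steps."""
    return sum(seconds * repeats for _, _, seconds, repeats in steps)


if __name__ == "__main__":
    for target, temp in ANNEAL_C.items():
        steps = pcr_program(temp)
        minutes = total_hold_seconds(steps) / 60
        print(f"{target}: anneal at {temp} C, about {minutes:.0f} min of holds")
```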
DNA samples were digested with HindIII (10 U per μg of DNA) for 16 h at 37°C. Digested DNA (8 μg per lane) was electrophoresed in 0.8% agarose gel overnight at 25 V/cm. Gels were stained with ethidium bromide to assess loading and blotted onto a nylon membrane, Hybond-N (Amersham, USA), according to the manufacturer's instructions.
PCR products specific for genes of interest were labeled with [α-32P]dCTP using the HexaLabel DNA Labeling Kit (Fermentas, Lithuania) according to the manufacturer's instructions. Filter prehybridization and hybridization were carried out according to the standard procedure [15]. Washing conditions were as follows: two times in 0.25 M sodium phosphate (pH 7.2), 5% SDS for 30–60 min at 65°C and two times in 0.125 M sodium phosphate (pH 7.2), 1% SDS for 30–60 min at 65°C (medium stringency) or two times in 20 mM sodium phosphate (pH 7.2), 1% SDS for 30–60 min at 65°C (high stringency). X-ray films were exposed to the blots for 3 days at -70°C with an intensifying screen.
Cloning and sequencing
Amplified fragments were cloned by standard techniques using the bacterial plasmid vector pGEM-T Easy (Promega, USA). Colonies of recombinant DH10B/R E. coli cells obtained by electrotransformation were selected. We subjected recombinant plasmids to restriction endonuclease analysis and isolated those with fragments of interest using the Wizard Minipreps Plasmid DNA Purification System (Promega, USA). Multiple clone sequencing was performed for each amplicon.
Sequencing was carried out by the Sanger method using the AutoCycle Sequencing Kit (Pharmacia Biotech, Sweden) and standard Cy5-labeled primers T7, whose binding sites flank the cloning site of recombinant fragment. We analyzed the products of the sequence reaction with an automated sequencer, ALFexpress (Pharmacia Biotech, Sweden), using the ALFwin v. 1.10 software package (Pharmacia Biotech, Sweden).
Molecular phylogenetic analysis
PCR amplified fragments of primate DNA were cloned as described above. A plasmid collection from each primate was created. In total, 86 clones containing sequences of interest were obtained. For each fragment, at least two clones were sequenced in forward and reverse directions in order to exclude PCR and sequencing errors. The BioEdit software was used to generate sequence alignments. The alignments consist of the following numbers of phylogenetically informative sites: 412 for the AL040372 fragment, 443 for the AA166653 fragment, 315 for the AI952931 fragment, and 344 for the AI792557 fragment. We constructed phylogenetic trees using the neighbor-joining method. Distance-based reconstructions and parsimony reconstructions based on the optimal alignments gave qualitatively similar phylogenetic results, with the same major clades and topological differences in nodes. The results of phylogenetic analysis are presented in Fig. 1.
Sequencing data were analyzed with the DNASIS v. 2.5 software (Hitachi Software Engineering, USA). We carried out alignments using the BioEdit software and excluded gap-containing sites. Kimura distances were computed with the DNADIST module and neighbor-joining trees were built with the NEIGHBOR module, both from the PHYLIP software package v.3.57c [16]. The reliability of the tree topology was assessed by bootstrapping with 1,000 replicates (the SEQBOOT and CONSENSE modules of PHYLIP). The tree was drawn with TreeView software.
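To illustrate the kind of distance that feeds a neighbor-joining tree, the sketch below computes the closed-form Kimura two-parameter distance between two aligned sequences. It is an illustration under stated assumptions, not a reimplementation of PHYLIP: DNADIST may compute its Kimura distance differently (for example, with a fixed transition/transversion ratio), and skipping gap sites per pair, as done here, is an assumption that can differ from removing gap-containing columns from the whole alignment. The function name `k2p_distance` is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Closed-form Kimura two-parameter distance for two aligned sequences.

    d = -0.5 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitions and transversions among compared sites.
    Sites with a gap or ambiguity code in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: site not compared
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0); that case is left to the caller.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Toy sequences for demonstration only, not data from this study.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

A matrix of such pairwise distances is the input that a neighbor-joining program takes; bootstrapping repeats the calculation on resampled alignment columns.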
Identification of gene duplications and comparative genomics analysis
BLAT searches among primate genome nucleotide sequences were conducted to reveal duplications of sequences under analysis. Matches with a level of identity greater than or equal to 80% of maximum for each sequence were taken as duplications.
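The selection rule described above can be sketched as a simple filter. The representation of a hit as a `(location, score)` pair, the reading of "level of identity" as a single numeric score per hit, and the function name `select_duplications` are all assumptions made for this illustration; the example values are invented and are not results from this study.

```python
def select_duplications(hits, fraction=0.80):
    """Keep hits scoring at least `fraction` of the best hit for a query.

    `hits` is a list of (location, score) tuples for ONE query sequence.
    Which BLAT quantity serves as the score is an assumption of this sketch.
    """
    if not hits:
        return []
    best = max(score for _, score in hits)
    threshold = fraction * best
    return [(location, score) for location, score in hits if score >= threshold]


if __name__ == "__main__":
    # Invented placeholder hits, for demonstration only.
    example_hits = [("locus_a", 100.0), ("locus_b", 85.0), ("locus_c", 60.0)]
    print(select_duplications(example_hits))
```

Because the threshold is relative to the best match of each sequence, the best hit (normally the original locus itself) is always retained; whether it should be counted as a duplication is a separate decision.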
The cross-species chained alignments database integrated in the Genome Browser tool was used to search for orthologous sequences in fugu, tetraodon, zebrafish, frog, chicken, rat, mouse, cow, dog, macaque, and chimpanzee genomes [17].
The authors thank S. O'Brien for the primate DNA panel and V. Evtushenko for discussions.
These results were presented:
• As a lecture at the National Cancer Institute/Center for Cancer Research Grand Rounds (November 2, 2004).
• As an oral abstract, «Expression of Evolutionary New Sequences in Human tumors», MBE 05 Conference, Allan Wilson Centre for Molecular Ecology and Evolution, University of Auckland, Auckland, New Zealand, 2005.
• As a plenary lecture, «In silico gel hybridization for tumor Ag discovery», International Cancer Vaccine Conference, May 26–28, 2006, Naples, Italy.
- Kozlov AP: Evolution of Living Organisms as a Multilevel Process. J Theor Biol. 1979, 81: 1. doi:10.1016/0022-5193(79)90076-6.
- Kozlov AP: Gene Competition and the Possible Evolutionary Role of Tumours. Medical Hypotheses. 1996, 46: 81. doi:10.1016/S0306-9877(96)90005-5.
- Baranova AV, Lobashev AV, Ivanov DV, Krukovskaya LL, Yankovsky NK, Kozlov A: In silico screening for tumour-specific expressed sequences in human genome. FEBS Lett. 2001, 508: 143. doi:10.1016/S0014-5793(01)03028-9.
- Krukovskaja LL, Baranova AV, Tyezelova T, Polev D, Kozlov AP: Experimental study of human expressed sequences newly identified in silico as tumour specific. Tumour Biol. 2005, 26: 17. doi:10.1159/000084182.
- Napier JR, Napier PH: The Natural History of The Primates. 1985, MIT Press, Cambridge, MA.
- Koehne CF, Lazo PA, Alves K, Lee JS, Tsichlis PN, O'Donnell PV: The Mlvi-1 locus involved in the induction of rat T-cell lymphomas and the pvt-1/Mis-1 locus are identical. J Virol. 1989, 63: 2366.
- Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR: Large-Scale Transcriptional Activity in Chromosomes 21 and 22. Science. 2002, 296: 916. doi:10.1126/science.1068597.
- Evtushenko VI, Hanson KP, Barabitckaya OV, Emelyanov AV, Reshetnikov VL, Kozlov AP: Determination of the upper level of expression of mammalian genome. Mol Biol (Mosk). 1989, 23: 663.
- Nekrutenko A: Reconciling the numbers: ESTs versus protein-coding genes. Mol Biol Evol. 2004, 21: 1278. doi:10.1093/molbev/msh125.
- Kelley RL, Kuroda MI: Noncoding RNA Genes in Dosage Compensation and Imprinting. Cell. 2000, 103: 9. doi:10.1016/S0092-8674(00)00099-4.
- Erdmann VA, Barciszewska MZ, Hochberg A, de Groot N, Barciszewski J: Regulatory RNAs. Cell Mol Life Sci. 2001, 58: 960. doi:10.1007/PL00000913.
- Huttenhofer A, Kiefmann M, Meier-Ewert S, O'Brien J, Lehrach H, Bachellerie JP, Brosius J: RNomics: an experimental approach that identifies 201 candidates for novel, small, non-messenger RNAs in mouse. EMBO J. 2001, 20: 2943. doi:10.1093/emboj/20.11.2943.
- Ohno S: Evolution by Gene Duplication. 1970, Springer-Verlag, New York.
- Batzer M, Deininger P: Alu repeats and human genomic diversity. Nature Rev Genet. 2002, 3: 370. doi:10.1038/nrg798.
- Sambrook J, Fritsch EF, Maniatis T: Molecular Cloning: A Laboratory Manual. 1989, Cold Spring Harbor, New York.
- PHYLIP. [http://evolution.genetics.washington.edu/phylip.html]
- Kent WJ, Baertsch R, Hinrichs A, Miller W, Haussler D: Evolution's cauldron: Duplication, deletion, and rearrangement in the mouse and human genomes. Proc Natl Acad Sci USA. 2003, 100: 11484. doi:10.1073/pnas.1932072100.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.