Phylogeny and polymorphism in the E6 and E7 of human papillomavirus: alpha-9 (HPV16, 31, 33, 52, 58), alpha-5 (HPV51), alpha-6 (HPV53, 66), alpha-7 (HPV18, 39, 59, 68) and alpha-10 (HPV6, 44) in women from Shanghai

Background: Persistent infection with human papillomaviruses (HPVs) has been associated with cervical intraepithelial neoplasia (CIN) and cervical cancer. However, why only a fraction of HPV cases progress to cancer is still unclear.
Methods: We focused on the heterogeneity, classification, evolution and dispersal of variants for 14 common HPV types in 262 HPV-positive patients with cervical lesions. The E6 and E7 genes of HPV were sequenced and compared with the HPV reference for sequence analysis. Phylogenetic trees were constructed using the neighbour-joining method with MEGA 7.0.
Results: In this study, 233 E6 and 212 E7 sequences were successfully amplified by PCR, and these sequences were divided into 5 species groups: alpha-9 (HPV16, 31, 33, 52, 58), alpha-5 (HPV51), alpha-6 (HPV53, 66), alpha-7 (HPV18, 39, 59, 68) and alpha-10 (HPV6, 44). The incidence of high-grade squamous intraepithelial lesion (HSIL) in patients infected with alpha-9 HPV was significantly increased compared with other groups (P < 0.0001), especially HPV16 (P < 0.0001). Strikingly, E7 had significantly fewer nonsynonymous variants in the HSIL group than in the <HSIL groups (P = 3.17 × 10⁻⁴). The A388C (K93N) variation in HPV58 E6 significantly reduced the risk of HSIL (P = 0.015). However, the T7220G (D32E) variation in HPV16 E6 and the A7689G (N29S) variation in HPV16 E7 increased the incidence of HSIL compared to the <HSIL group (P = 0.036 and 0.022, respectively).
Conclusions: Strict conservation of E7 is important for HPV carcinogenicity, especially N29 of HPV16. The findings in this work may inform preventative and therapeutic interventions for HPV infections and CIN.

However, why only a small proportion of HPV infections progress to precancer and cancer is unclear [4]. In addition to the pathogenic heterogeneity of distinct HPV types, previous studies indicate that HPV variants are also associated with different risks of cancer progression. For example, HPV16 variants differ significantly in their risks of persistent infection and of progression to cervical intraepithelial neoplasia (CIN) and cervical cancer [5,6]. Mirabello et al. observed that, compared to the most frequent A1/A2 sublineages, the A4, C, D2 and D3 sublineages conferred a higher hazard of CIN and cervical cancer [7]. The C variant (vs. B variant) of HPV52 was associated with an increased prevalence of cytologically diagnosed and histologically confirmed HSIL or worse lesions [8]. These data indicate that HPV variants have different phenotypic characteristics, including carcinogenicity.
HPV E6 and E7 are the major oncogenes, which are highly expressed in tumours and are related to inducing cellular immortalization, transformation, and carcinogenesis through protein-protein interactions with tumour suppressor proteins [9]. For example, E6 binds the conserved LxxLL consensus sequences of the ubiquitin ligase E6-associated protein (E6-AP), which works as a connecting bridge between E6 and p53, leading to its subsequent degradation [10]. Similarly, E7 targets and promotes the inactivation of RB1, thus inducing cell-cycle progression through activation of E2F-driven transcription [11].

Study population
In total, 262 HPV-positive patients (mean age 38.34 ± 10.52 years, range 21-78) with histopathologically confirmed cervical lesions, including 92 nonneoplastic lesions, 69 low-grade squamous intraepithelial lesions (LSIL) and 101 high-grade squamous intraepithelial lesions (HSIL), were recruited from the Cervical Disease Centre at the Shanghai First Maternity and Infant Hospital, Tongji University School of Medicine in Shanghai, China. Histopathological findings were classified as nonneoplastic (chronic cervicitis and inflammation-related regenerative changes), LSIL (CIN I/mild dysplasia), HSIL (CIN II and CIN III/moderate and severe dysplasia) or invasive carcinoma. CIN I refers to mildly atypical cellular changes in the lower third of the epithelium. CIN II refers to moderately atypical cellular changes confined to the basal two-thirds of the epithelium (formerly called moderate dysplasia) with preservation of epithelial maturation. CIN III refers to severely atypical cellular changes encompassing more than two-thirds of the epithelial thickness and includes full-thickness lesions (previously termed severe dysplasia or carcinoma in situ).
The inclusion criteria for the current study were HPV single infection and histopathological confirmation by colposcopic biopsy. The exclusion criteria were coinfection with different HPV types, lack of histopathological confirmation, and vaginitis or other bacterial or viral infection.
Genomic DNA isolation and HPV typing
DNA from exfoliated cervical cells was extracted using the TIANamp Genomic DNA Kit (No: 3304-9) according to the manufacturer's instructions. HPV genotyping was conducted using an HPV GenoArray Test Kit (HybriBio Ltd).

Amplification and sequencing
After HPV testing, the remaining DNA samples were stored at −80 °C and used to amplify E6 and E7 using specific primers (Table 1). Subsequently, PCR products excised from a 1.5% agarose gel were sequenced bidirectionally by SAIYIN Gene Biotechnology Company, Shanghai, China.

Phylogenetic tree analysis and sequence analysis
The neighbour-joining phylogenetic tree of the HPVs was constructed in MEGA 7.0 using the maximum composite likelihood method [12]. To construct distinct phylogenetic branches, the reference HPV sequences were obtained from the GenBank database. The phylogenetic trees were visualised in FigTree v1.4.3 and the online tool Evolview [13,14].
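To illustrate the neighbour-joining criterion used for the trees above, the following minimal Python sketch performs only the first joining step on a small distance matrix. It is not the MEGA 7.0 implementation; the function name `nj_first_join`, the taxon labels and the distances are hypothetical and are not study data.

```python
from itertools import combinations


def nj_first_join(names, dist):
    """Return the first pair joined by neighbour-joining and its branch lengths.

    names: list of taxon labels.
    dist:  symmetric matrix (list of lists) of pairwise distances,
           in the same order as names.
    """
    n = len(names)
    if n < 3:
        raise ValueError("neighbour-joining needs at least three taxa")

    # Total distance from each taxon to all others.
    totals = [sum(row) for row in dist]

    # Q(i, j) = (n - 2) * d(i, j) - total(i) - total(j).
    # The pair with the smallest Q is joined first.
    def q(pair):
        i, j = pair
        return (n - 2) * dist[i][j] - totals[i] - totals[j]

    i, j = min(combinations(range(n), 2), key=q)

    # Branch lengths from each joined taxon to the new internal node.
    len_i = 0.5 * dist[i][j] + (totals[i] - totals[j]) / (2 * (n - 2))
    len_j = dist[i][j] - len_i
    return (names[i], names[j]), (len_i, len_j)


# Hypothetical distances between five made-up variants (not study data).
names = ["a", "b", "c", "d", "e"]
dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
pair, lengths = nj_first_join(names, dist)
print(pair, lengths)
```

A full tree would repeat this step, replacing the joined pair with the new internal node and recomputing the distances until three nodes remain.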
The sequences were subsequently analysed by NCBI Blast, and all unique sequences were compared pairwise using the ClustalW tool of

Statistical analysis
Fisher's exact test was used for statistical analysis. P < 0.05 was used as the threshold to indicate statistical significance. All P values in the present study were two-sided. Power calculations were performed with G*Power software [15].
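As an illustration of the two-sided Fisher's exact test described above, the sketch below computes the P value for a 2×2 table from hypergeometric probabilities using only the Python standard library. The function name `fisher_exact_two_sided` and the counts are hypothetical; the study does not state which software performed the test, and the counts are not study data.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    p_obs = prob(a)

    # Sum the probabilities of all tables that are no more likely than
    # the observed one; the small tolerance guards against float ties.
    total = sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)
    )
    return min(1.0, total)


# Hypothetical counts (not study data):
# rows = variant carriers / non-carriers, columns = HSIL / <HSIL.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

With such small hypothetical counts the test is far from significant, which is why the text reports power alongside each nominally significant variant.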
Interestingly, we observed that one variant, found in four of 13 HPV59-positive samples, appeared to form a new candidate sublineage, B1-2 (Fig. 3a). A 9-base sequence (AGTGAAACT) was inserted after position 519 of the E6 sequence, and the 9 inserted bases were translated into the 3 amino acids SET (Fig. 3b and c). These diagnostic SNPs were unique to the B1-2 sublineage.
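The translation of the inserted bases can be checked with a short Python sketch using the standard genetic code. The helper `translate` is hypothetical, and the sketch assumes the insert is read in frame from its first base, as the reported SET translation implies.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO[index]
    for index, codon in enumerate(product(BASES, repeat=3))
}


def translate(dna):
    """Translate an in-frame DNA sequence into one-letter amino acids."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


insert = "AGTGAAACT"       # 9-base insertion reported in HPV59 E6
in_frame = len(insert) % 3 == 0
peptide = translate(insert)
print(in_frame, peptide)   # True SET
```

Because the insert length is a multiple of three, it adds three residues without shifting the downstream reading frame.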
Nonsynonymous mutations for the E6 and E7 genes within all types of HPV were evaluated. A burden test was used to determine whether the variant distribution differed between the IF, LSIL and HSIL groups by viral region (Table 3, Fig. 1, and Fig. 2). Despite nearly equal numbers of E6 and E7 sequences among the three groups (IF, 159; LSIL, 121; HSIL, 165), the IF group overall had a significantly higher number of variants than the LSIL and HSIL groups (P = 3.83 × 10⁻⁴). Strikingly, the E7 gene had significantly fewer nonsynonymous variants in the HSIL group than in the LSIL and IF groups (P = 3.17 × 10⁻⁴).
Figure legend: These sequences were divided into 5 species groups (alpha-5, alpha-6, alpha-7, alpha-9, alpha-10), of which alpha-10 was a low-risk (LR) clade. Green, grey and red circles represent cervicitis, low-grade squamous intraepithelial lesion and high-grade squamous intraepithelial lesion, respectively; stars represent nonsynonymous mutations, and blue stars represent insertions/deletions.
Table note: Comparison between Genus; * P values remain significant after Bonferroni adjustment for multiple tests. # P < 0.05 using analysis of variance. Boldface entries indicate the distribution of α-5, α-6, α-7, α-9 and α-10 HPV infection in the different populations (IF, LSIL and HSIL groups).
Moreover, we confirmed that the incidence of HSIL in patients infected with the alpha-9 HPV group was significantly increased compared with the other groups (P < 0.0001). We then further analysed nonsynonymous mutations of the alpha-9 HPV (HPV16, 31, 33, 52, 58) E6 and E7 genes in the HSIL case and control groups (Table 4). In the case group, 13 variations were observed in the E6 gene and 19 in the E7 gene. In the control group, 17 and 14 variations were found in the E6 and E7 genes, respectively. For HPV16, the distributions of the T7220G (D32E) variation in E6 and the A7689G (N29S) variation in E7 differed between the case group and the control group (P = 0.036 and 0.022; power (1 − β) = 0.562 and 0.629, respectively) (Table 4). For HPV58, the A388C (K93N) variation significantly reduced the risk of HSIL and was a protective factor (P = 0.015; power (1 − β) = 0.624). In the remaining three types of alpha-9 HPV, no significant differences in the distribution of other variations between the case group and the control group were found. In addition, we performed covariation analysis of the E6 and E7 genes of the five α-9 HPV types in the case and control groups (Table S1).
Genomic variation in both the human host and HPV may influence any stage of HPV infection in the induction of cervical cancer [27]. For E6, the T7220G (D32E) variation in HPV16 E6 was a risk factor that increased the incidence of HSIL, whereas the A388C (K93N) variation in HPV58 E6 significantly reduced the risk of HSIL. Previous studies have shown that susceptibility to cervical disease is increased by the specific protein interaction between HPV16 E6 (L83V) and p53 (Arg-72) [28]. Moreover, the T350G variant of HPV16 was found to degrade Bax and bind the E6-binding protein more efficiently [29]. We found that E7 was highly conserved in the HSIL group compared to the <HSIL group, and that A7689G (N29S) in E7 significantly increased the risk of HSIL. By contrast, the HPV16 A4 sublineage (P < 0.0001) and HPV16 E7 29S (P = 0.0002) rarely occurred in cancer patients compared to women with cervicitis in Vietnam [30]. HPV16 E7 S63F differed significantly between the case and control groups (P = 4.861 × 10⁻¹⁰) in a Han Chinese population [31]. The T20I/G63S substitutions in HPV16 A3 E7 significantly increased the risk of HSIL in the Taizhou area, China [32]. In short, the dispersal of HPV sublineages and variations was population-specific, and screening and treatment schemes should be developed according to the distribution of HPV variation in each region. Because our sample size was limited, larger studies are needed to confirm the role and mechanism of these mutations in the development of cervical cancer in the Shanghai area or south China.
In the current study, the E7 gene had significantly fewer nonsynonymous variants in the HSIL group than in the LSIL and IF groups (P = 3.17 × 10⁻⁴). Lisa Mirabello et al. confirmed this hypovariation: E7 had significantly fewer rare non-silent genetic variants in cancers (P = 6.13 × 10⁻⁵) compared to E6 [33]. Previous studies have reported that the HPV16 E7 protein leading to cervical cancer is virtually invariant, and that E7 displayed a fully conserved sequence [34,35]. In summary, variation in E7 is associated with a greatly decreased risk of CIN and invasive cancer.

Conclusions
In this study, we focused on the phylogeny and polymorphism of 14 HPV types based on the E6 and E7 genes. We also found that the E7 gene lacked significant genetic variation in CIN and was strictly conserved in HSIL. This comprehensive analysis will help us understand the clinical and biological effects of sequence changes and may inform preventative and therapeutic interventions for HPV-related CIN and cervical cancer.
Additional file 1: Table S1. Covariation analysis of α-9 HPV E6 and E7 genes in the case and control groups.