The burden of cancers associated with HIV in the South African public health sector, 2004–2014: a record linkage study

Introduction The impact of South Africa’s high human immunodeficiency virus (HIV) burden on cancer risk is not fully understood, particularly in the context of antiretroviral treatment (ART) availability. We examined national cancer trends and excess cancer risk in people living with HIV (PLHIV) compared to those who are HIV-negative. Methods We used probabilistic record linkage to match cancer records provided by the National Cancer Registry (NCR) to HIV data provided by the National Health Laboratory Service (NHLS). We also used a text search for specific HIV terms in the clinical section of pathology reports to determine the HIV status of cancer patients. We used logistic and Joinpoint regression models to evaluate the risk and trends in cancers in PLHIV compared to HIV-negative patients from 2004 to 2014. In sensitivity analysis, we used inverse probability weighting (IPW) to correct for possible selection bias. Results A total of 329,208 cancer cases from public sector laboratories were reported to the NCR from 2004 to 2014, with the HIV status known for 95,279 (28.9%) cancer cases. About 50% of all the female cancer cases (n = 30,486) with a known status were HIV-positive. PLHIV were at higher risk of AIDS-defining cancers (Kaposi sarcoma [adjusted OR: 134, 95% CI: 111–162], non-Hodgkin lymphoma [adjusted OR: 2.73, 95% CI: 2.56–2.91] and cervix [adjusted OR: 1.70, 95% CI: 1.63–1.77]), conjunctival cancer [adjusted OR: 21.5, 95% CI: 16.3–28.4] and human papilloma virus (HPV) related cancers (including penis [adjusted OR: 2.35, 95% CI: 1.85–2.99] and vulva [adjusted OR: 1.94, 95% CI: 1.67–2.25]) compared to HIV-negative patients. Analysis using the IPW population yielded comparable results. Conclusion There is a need for improved awareness and screening of conjunctival cancer and HPV-associated cancers at HIV care centres. Further research and discussion are warranted on inclusive HPV vaccination in PLHIV.


Introduction
In 2017, 25.7 million people in Africa were living with the human immunodeficiency virus (HIV) [1]. In South Africa, approximately 14% of the population was living with HIV in 2017 [2]. Since the introduction of antiretroviral treatment (ART) in 2004, longevity has increased amongst people living with HIV (PLHIV) in South Africa [3]. With this increase in longevity and the known association between cancer and HIV, the risk for cancer amongst PLHIV has increased. However, the additional risk of cancer in PLHIV in South Africa compared to those who are HIV negative in the ART era is not fully documented.
Studies in developed countries have shown a higher burden of non-AIDS-defining cancers (NADCs) amongst PLHIV in the ART era, particularly anal, skin, liver and lung cancer [4,5]. Factors associated with this burden include age, race, unavailability of ART in some cases, HIV transmission route, lifestyle-related factors and immunosuppression [4,6,7,8]. However, not all NADCs have exhibited differential rates before and after ART. For example, PLHIV have remained at low risk of colon, breast and prostate cancers, leading to the possibility that not all cancers are associated with immunosuppression [9]. In contrast, developing countries still have a higher burden of AIDS-defining cancers (ADCs), namely Kaposi sarcoma (KS), cervical cancer (CC), and non-Hodgkin lymphoma (NHL). This is largely due to co-infections with oncogenic viruses and, possibly, poor access to HIV care including ART [10,11,12].
Studies on HIV and cancer done in South Africa have involved HIV cohorts or case control studies which have limited generalization to the general population [13,14]. The cancer data provided by the National Cancer Registry (NCR) lacks information on HIV status amongst cancer patients as HIV status is not routinely collected in the cancer registry. The South African HIV Cancer Match (SAM) study is a probabilistic record linkage study. It consists of a national HIV cohort created from National Health Laboratory Service (NHLS) HIV laboratory data (CD4 counts, viral load, HIV tests), linked to the NCR data, in order to study cancer risk in HIV positive people [15]. The current study is nested within the SAM study. We aimed to determine the impact of HIV on cancer burden and the cancer risk in PLHIV compared to HIV negative people or the general South African population.

Study setting and design
The NHLS is the largest diagnostic pathology service in South Africa. It provides laboratory and public health services to over 80% of the South African population [16]. This is achieved through a national network of laboratories in all the nine provinces of South Africa. The NHLS' Corporate Data Warehouse (CDW) is an electronic data repository for all public sector laboratory data. The NCR's main mandate is pathology-based cancer surveillance with both private and public laboratories legislated to report all cancer cases to the institution. This was a cross sectional study of all cancers diagnosed in public sector laboratories from 2004 to 2014 with HIV data being obtained from the NHLS' CDW.

Study population, variables and data sources
We included all records of patients diagnosed with cancer in public healthcare laboratories from 2004 to 2014. Cancer diagnosis was coded according to the International Classification of Diseases for Oncology (ICD-O-3), excluding all cancer precursor lesions. Since the source of our HIV data was the NHLS, which services the public sector, we excluded cancer records from the private sector. Our rationale was that a patient who accessed cancer care at a private facility was more likely to access HIV care at a private facility as well [17]. In our linkage, only 1122 of the 335,589 cancer records reported from the private sector had a known HIV result, which supports our hypothesis.
An individual was considered HIV positive or negative if the HIV diagnostic test result was positive or negative respectively. If the result was indeterminate or neither positive nor negative, the HIV result was regarded as unknown. In addition, HIV monitoring tests such as HIV viral load and CD4 counts were used to assume an HIV positive status. To supplement the NHLS HIV dataset, repeated text mining was done to extract more HIV results from the clinical section of pathology reports on confirmed cases of cancer reported to the NCR. By definition, text mining refers to the drawing out of important and specific information from a block of text [18]. The text mining process used key terms that refer to or imply HIV status. The key words used included "HIV", "HIV+", "HIV positive", "AIDS", "haart", "ART", "ARV", "antiretroviral", "anti-retroviral", "RVD", "RVD positive", "retroviral disease", "immune suppression", "immunosuppression", "immuno-suppression", "acquired immune-deficiency", "retroreactive", "immunocompromised", "HIV reactive", "CD4", "regimen 1 treatment", "reg 1 treatment", "Retroviral disease", "RVD", "HIV", "HAART" and "ARV". From the extracted records, a series of samples was taken and reviewed to refine the search terms. Demographic characteristics and potential confounders such as age, gender and race were extracted from the NCR database.
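The key-term search described above can be illustrated with a minimal Python sketch. It is not the authors' search code: the term list is a subset of the key words listed above, and the function name `find_hiv_terms` and the use of word-boundary regular expressions are assumptions made for illustration.

```python
import re

# Illustrative subset of the key terms listed in the text (assumption: matching
# is case-insensitive, so "haart" and "HAART" are treated as the same term).
HIV_TERMS = [
    "HIV", "AIDS", "HAART", "ART", "ARV", "antiretroviral", "anti-retroviral",
    "RVD", "retroviral disease", "immunosuppression", "immunocompromised", "CD4",
]

# Longer terms are tried first; word boundaries stop short terms such as "ART"
# from matching inside unrelated words like "heart" or "artery".
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in sorted(HIV_TERMS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def find_hiv_terms(clinical_text):
    """Return the distinct key terms found in a clinical note, lowercased and sorted."""
    return sorted({m.group(0).lower() for m in PATTERN.finditer(clinical_text)})


if __name__ == "__main__":
    print(find_hiv_terms("Known RVD positive, on HAART. CD4 count 180."))
```

A hit only flags a report as mentioning HIV; a phrase such as "HIV negative" also matches the term "HIV", which is one reason flagged records would still need review before a status is assigned.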

Data management
The HIV and cancer datasets were linked using the in-house CDW probabilistic record linkage algorithm. This algorithm is used to link all the laboratory records that belong to the same individual within the entire NHLS database. The linkage variables include name, surname and date of birth. For records to be considered a match, the first letters of the first names must match and two components of the date of birth must also match. First names and surnames are given the same linkage weights (40% each) and the date of birth contributes 20% of the overall weight. For records with a recorded national identity number, exact matching is done and this is used to validate the probabilistic record linkage. Records that attain a score of 90% and above are considered a match. After linkage, duplicates were removed and private sector cancer records were excluded, leaving a final sample of 329,208 records.
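To make the weighting concrete, the following Python sketch scores a pair of records using the weights (40%, 40%, 20%) and the 90% threshold given above. The internals of the CDW algorithm are not described in this paper, so the `Record` type, the prefix-based `name_similarity` function and the way partial agreement is scored are all hypothetical.

```python
from dataclasses import dataclass

# Weights and threshold as stated in the text.
WEIGHTS = {"first_name": 0.40, "surname": 0.40, "dob": 0.20}
MATCH_THRESHOLD = 0.90


@dataclass
class Record:
    first_name: str
    surname: str
    dob: tuple  # (year, month, day)


def name_similarity(a, b):
    """Hypothetical similarity: length of the shared prefix over the longer name."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return shared / max(len(a), len(b))


def link_score(r1, r2):
    """Weighted agreement score in [0, 1]; 0.0 if the minimum rules are not met."""
    f1, f2 = r1.first_name.strip().lower(), r2.first_name.strip().lower()
    # Rule from the text: the first letters of the first names must match.
    if not f1 or not f2 or f1[0] != f2[0]:
        return 0.0
    # Rule from the text: at least two date-of-birth components must match.
    dob_agree = sum(x == y for x, y in zip(r1.dob, r2.dob))
    if dob_agree < 2:
        return 0.0
    return (
        WEIGHTS["first_name"] * name_similarity(r1.first_name, r2.first_name)
        + WEIGHTS["surname"] * name_similarity(r1.surname, r2.surname)
        + WEIGHTS["dob"] * dob_agree / 3
    )


def is_match(r1, r2):
    return link_score(r1, r2) >= MATCH_THRESHOLD


if __name__ == "__main__":
    a = Record("Thandi", "Dlamini", (1975, 3, 14))
    b = Record("Thandi", "Dlamini", (1975, 3, 15))  # one DOB component differs
    print(round(link_score(a, b), 3), is_match(a, b))
```

In this sketch two records with identical names and one differing date-of-birth component still clear the 90% threshold, whereas a disagreement on surname does not.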

Data analysis
We determined the characteristics of cancer patients (age, gender, race, cancer type (NADC or ADC) and cancer diagnosis year) by HIV status (positive, negative or unknown) with 95% confidence intervals. To determine the additional risk that PLHIV had of developing specific cancers as per ICD-O-3 coding, logistic regression models were fitted adjusting for age (as a continuous variable), gender (males and females), race (Asian, Black, Coloured and White) and cancer diagnosis year (modelled as a continuous variable).
We assessed trends in cancer risk for selected cancers by plotting yearly crude odds ratios using Joinpoint regression models (Joinpoint Regression Program, Version 4.6.0.0, April 2018, Statistical Research and Applications Branch, National Cancer Institute). The Joinpoint program allows one to determine whether an observed trend is statistically significant. In most cases the independent variable is the calendar year. Observed odds ratios (or other parameters such as incidence rates or counts) are joined by straight lines at each time point, hence the term joinpoint. The model then identifies the time point at which a significant change in trend is observed, as well as the magnitude of the change (Annual Percentage Change (APC)). Permutation tests are then used to select the final model that best describes the change in trends. To determine the contribution of HIV to the cancer burden in South Africa, we calculated Attributable Risk Fractions (ARFs) using adjusted odds ratios as demonstrated by Newson [20].
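As an illustration of the APC, the sketch below fits a single log-linear segment to yearly odds ratios and converts the slope to a percentage, using the standard definition APC = 100 × (exp(slope) − 1). It is a simplified stand-in rather than the Joinpoint program: it fits one segment by ordinary least squares, does not search for change points or run permutation tests, and the input values in the usage example are made up.

```python
import math


def annual_percent_change(years, odds_ratios):
    """Fit ln(OR) = a + b * year by least squares; return APC = 100 * (exp(b) - 1)."""
    n = len(years)
    logs = [math.log(v) for v in odds_ratios]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return 100.0 * (math.exp(slope) - 1.0)


if __name__ == "__main__":
    # Hypothetical yearly odds ratios growing by 5% per year.
    years = [2004, 2005, 2006, 2007, 2008]
    ors = [1.5 * 1.05 ** i for i in range(len(years))]
    print(round(annual_percent_change(years, ors), 2))
```

A positive APC indicates an odds ratio that rises year on year over the segment, and a negative APC indicates one that falls.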

Sensitivity analysis
Clinicians are more likely to request an HIV test if the patient is symptomatic, which creates a selection bias. Given the high proportion of missing HIV status, inverse probability weighting (IPW) methods were used as a post-hoc sensitivity analysis to correct for possible selection bias. We created the weights using age, gender, cancer diagnosis year and cancer type, similar to the method used by Dryden-Peterson et al. [10].
Analysis was done using Stata version 15 (College Station, TX: StataCorp LP). P-values of less than 0.05 were considered to be statistically significant.
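The idea behind the inverse probability weighting described above can be sketched in Python as follows. The analysis itself was done in Stata; this is only an illustration, and the stratum-based estimate of the probability of a known HIV status, the field names and the function `ipw_weights` are assumptions (a model-based probability, for example from a logistic regression, would be an equally plausible choice).

```python
from collections import defaultdict


def ipw_weights(records, strata_keys=("age_group", "gender", "year", "cancer_type")):
    """Weight each record with a known HIV status by 1 / P(status known | stratum).

    Records with unknown status (hiv_status is None) get weight 0.0, so that the
    known-status records in each stratum stand in for the whole stratum.
    """
    totals = defaultdict(int)
    known = defaultdict(int)
    for r in records:
        key = tuple(r[k] for k in strata_keys)
        totals[key] += 1
        if r["hiv_status"] is not None:
            known[key] += 1

    weights = []
    for r in records:
        if r["hiv_status"] is None:
            weights.append(0.0)
        else:
            key = tuple(r[k] for k in strata_keys)
            weights.append(totals[key] / known[key])
    return weights


if __name__ == "__main__":
    base = {"age_group": "25-49", "gender": "F", "year": 2010, "cancer_type": "ADC"}
    # Hypothetical stratum: four records, only one with a known HIV status.
    records = [dict(base, hiv_status="positive")] + [dict(base, hiv_status=None)] * 3
    print(ipw_weights(records))
```

Within each stratum the weights of the known-status records sum to the total number of records in that stratum, so the weighted sample has the same covariate distribution as the full cancer dataset.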

Results
From 2004 until 2014, a total of 329,208 cancers were reported to the NCR by the public sector laboratories. Probabilistic record linkage identified 90,796 HIV results, and text mining of cancer pathology reports found an additional 4483 HIV results. Of the 95,279 (28.9%) cancer patients with a known HIV status, 46,951 (14.3%) were HIV positive. Amongst PLHIV, cancer proportions were highest between the ages of 25 and 49 (Table 1 below). In contrast, 37% (n = 17,890) of all HIV negative individuals were in the over 60 age group. Across all the HIV status subgroups, the greater proportion of cancers was observed in the Black population at 62.6% (n = 206,286). A general increase in cancer proportions by calendar year was observed for all cancers irrespective of HIV status. Compared to the HIV negative individuals and those with an unknown status, more ADCs were observed in PLHIV. Throughout the study period, ADCs remained consistently higher than NADCs in HIV positive individuals (Fig. 1 below).
Correcting for age, gender, race, and year of cancer diagnosis, cancer risk was highest in the HIV positive population for all ADCs (Kaposi sarcoma, NHL, and cervical cancer) with an overall adjusted odds ratio of 4.5 (95% CI: 4.35-4.65). The NHL subtypes Burkitt's lymphoma (adjusted OR: 6.48, 95% CI: 5.21-8.07), diffuse large B-cell lymphoma (DLBCL) (adjusted OR: 2.93, 95% CI: 2.67-3.22) and diffuse immunoblastic large B-cell lymphoma (DILBCL) (adjusted OR: 12.1, 95% CI: 9.02-16.3) were each associated with HIV. Compared to HIV negative individuals, PLHIV had 74% lower odds of developing NADCs (adjusted OR: 0.26, 95% CI: 0.25-0.26). As a group, virus-related NADCs were not significantly associated with HIV, but most of the HPV-associated cancers, such as anal, penile, vulval and lip cancers, as well as Hodgkin's lymphoma (EBV-associated), were at higher risk in HIV positive individuals (p < 0.0001). Liver cancer, which is associated with hepatitis viruses, was not significantly associated with HIV. People living with HIV were at a higher risk for squamous cell carcinoma (SCC) of the skin, basal cell carcinoma (BCC), eye, and conjunctival cancers (p < 0.0001). Non-virus-related NADCs were also not associated with HIV. The weighted analysis produced results that were comparable to the complete case analysis (Table 2).
Trends in cancer risk for selected individual cancers varied, with significant increases observed for cervix, anus, vulva, conjunctiva and penis from 2004 to 2014 in PLHIV (Fig. 2 below). Although the APC was not significant for Kaposi sarcoma, there was a substantial ... There was no shift in burden between ADCs and NADCs observed amongst incident cancers in PLHIV (Fig. 1). Using weighted estimates of the odds ratio, 41% of all ADCs reported between 2004 and 2014 were attributable to HIV. The contribution of HIV to ADCs increased by 22% within the study period (Fig. 3). No particular contribution by HIV towards NADCs as a whole was noted, given the negative ARFs. The same was true for the category of virus-related NADCs (Fig. 3): HIV did not seem to contribute to the burden of virus-related NADCs amongst PLHIV in the public sector. However, the "protective" effect of HIV has been waning over time.

Discussion
Over the period 2004-2014 (ART era), the risk of all ADCs and some virus-related NADCs was higher amongst HIV positive individuals compared to those who were HIV negative. The risk of KS in our study was higher than that reported by Stein et al. in the pre-ART era [21]. In both our study and the Stein pre-ART study, the odds ratios were adjusted for age, gender, race and year of diagnosis, which allowed for comparability. A possible explanation for the higher risk in our study is that, until 2016 when the universal test and treat policy was adopted in South Africa, treatment initiation was dependent on CD4 count [22]. In 2004, ART became freely available in the public sector, with patients who had CD4 counts of less than 200 cells/μl or were in WHO stage IV of disease being eligible for treatment [23]. Patients were also evaluated to determine if they were psychologically fit to receive the treatment. In 2010, in addition to the 2004 recommendations, those who had a co-infection with TB were also automatically eligible for free ART [24]. In 2011, the criteria were expanded to include all patients who had a CD4 count of less than 350 cells/μl [24]. These CD4 count thresholds led to a high proportion of immunosuppressed individuals with a high burden of disease, a risk factor for ADCs [14,22]. As a result, the risk of KS remained elevated even after ART introduction. Moreover, it is possible that the pick-up rate of KS at HIV clinics improved with the expansion of ART and improvements in HIV treatment policies in South Africa, hence the greater strength of association observed. Despite the high risk reported for KS in our study, it was lower than reported in other studies, particularly those done in developed countries [4,25]. In South Africa, the prevalence of human herpesvirus 8 (HHV8) was high even before the HIV era, therefore creating a high KS background risk [26]. In addition, KS is often diagnosed clinically in the African context, with no biopsies or other samples being sent to the laboratory [27]. Therefore, KS may have been under-reported to the pathology-based cancer registry.
In contrast, the risk reported in our study for NHL (adjusted OR: 2.73, 95% CI 2.56-2.91) was lower than the one reported by Stein et al. (adjusted OR: 6.1, 95% CI 4.4-8.4), which points to a possible reduction in risk of NHL after the introduction of ART [21]. There was no change noted before and after ART in overall cervical cancer risk, although an upward trend was observed in the ART era. This is in line with other reports from Africa, with various reasons being put forth to account for the increase in cervical cancer risk even with the introduction of ART. These include advanced disease upon ART initiation and older age [28,29]. Another theory that has been put forward is the lack of a relationship between cervical cancer risk and immunosuppression. Some studies have demonstrated that low CD4 counts do not necessarily amount to increased risk of cervical cancer and other HPV-related cancers [29]. As such, restoration of immunity with ART will not necessarily lead to a reduced risk of cervical cancer. In addition, the prevalence of HPV (a known risk factor for cervical cancer) is higher amongst women living with HIV [29,30]. Possible co-infection with HPV has also been highlighted in this study, with increased risk amongst PLHIV observed for HPV-associated cancers such as vulva, anus, penis and lip. Besides the ADCs and HPV-related cancers, we observed that other cancers were strongly associated with HIV in the ART era. Compared to HIV negative individuals, the risk of conjunctival cancer, Hodgkin's lymphoma and BCC was also higher in PLHIV in our study. Before ART, there were no reports of conjunctival cancer and BCC as being high risk amongst PLHIV in South Africa [21]. The association between conjunctival cancer and HIV has been reported in Africa [29,31]. High rates of solar radiation and unproven associations with HPV have been cited as possible reasons why this cancer is common in Sub-Saharan Africa compared to other parts of the world [29].
As with SCC of the skin, we observed a stronger association between HIV and BCC. Reports have linked age and white race to higher BCC risk in PLHIV, with immunosuppression and increased viral loads only being linked to SCC of the skin [32,33]. On the other hand, PLHIV were less likely to develop virus-unrelated cancers such as breast and prostate, which is in line with the literature [4,7,25]. Lower risks were also observed for lung and liver cancers in PLHIV, consistent with the results reported by Stein et al. but contrary to other reports, especially those done in resource-rich areas [4,7,9]. In resource-rich countries, there is a higher prevalence of lifestyle-related factors in HIV cohorts, such as smoking, which results in lung cancer, and increased alcohol intake, which results in liver cancer [7]. In our study, it is still uncertain why the liver and lung cancer risk was lower in PLHIV compared to HIV negative individuals.
In the ART era, different cancer trends have been observed, with ADCs decreasing upon ART introduction in other settings [6,9,25]. In particular, KS has declined with the introduction and expansion of ART, hence supporting the association between this cancer and immunosuppression [6,25,34]. In our study, following the initial drop in KS risk after ART introduction in 2004, there has not been a significant change in risk amongst PLHIV in the ART era [10]. This is similar to what was reported in a recent study done in Botswana which demonstrated a decrease in KS risk with ART introduction but no significant change with increased roll out of ART [10]. The arguments for this are similar to the reasons why KS risk was reported as higher in our study compared to the pre-ART era, which include HIV treatment policies and improved pick-up rate. The trend in NHL risk exhibited a slight but insignificant decrease over the 11-year period. Whilst some studies have shown a decreasing trend in NHL in the ART era in PLHIV, others have shown stable trends even with the increased rollout of ART [10,25]. This has largely been because of Burkitt's lymphoma, as its incidence has remained constant even in the ART era.
Fig. 2 The line graphs were fitted in Joinpoint using crude odds ratios (dots). The annual percentage change in odds ratios was significant (p-value < 0.05) for all cancers selected for in-depth analysis of trends except for Kaposi sarcoma, Burkitt's lymphoma, NHL and Hodgkin's lymphoma.
Also showing increasing trends in the ART era were most HPV-related anogenital cancers (cervix, anus, penis and vulva). Although anal cancer is on the rise, the risk reported is not as high as observed in developed countries. This is possibly due to the difference in HIV epidemiology between South Africa and developed countries. In the latter, HIV is mainly transmitted amongst men who have sex with men (MSM) through receptive anal sex, whereas in South Africa HIV transmission is mainly heterosexual [4,5]. Co-infection with HPV is higher amongst people living with HIV, with the routes of transmission being similar to HIV [35]. Both anal and cervical cancer are associated with HPV, but the different transmission routes will result in more cervical cancer in the African context and more anal cancer in developed countries.
This was the first nationwide study to compare cancer risk amongst the HIV positive and HIV negative people in the ART era. Laboratory confirmation of both cancer and HIV allowed for high specificity of HIV and cancer diagnosis. Although a greater proportion of the HIV status was unknown, the methods used to ascertain HIV status such as probabilistic record linkage and text search ensured that we extracted and matched most of the available HIV records. In addition, probabilistic record linkage allowed us to identify records belonging to the same individual even in the absence of a unique identifier. The greater percentage of black population with HIV and cancer was reflective of the HIV epidemic in South Africa as well as patterns of access to public health services. In addition to this, the use of IPW allowed for assessment of the risk estimates given the possible selection bias due to the high proportion of missing HIV status. The conclusions from the weighted analysis (IPW) were comparable with the complete case analysis. Moreover, women were well represented with enough numbers for cancers that are common in females to be fully analysed.
Despite all these strengths, our study had limitations. Due to its laboratory-based surveillance system, the NCR underreports some cancers that are diagnosed clinically or radiologically, such as lung and liver cancers. This might result in misrepresentation of the association between HIV and these cancers. Although probabilistic record linkage allowed for matching, in the absence of a unique identifier there is still room for some false matches. The national unique identifier remains the gold standard. Another limitation of our study was overrepresentation of HIV positive individuals. Doctors are more likely to note down the HIV status of a patient if the patient tests positive. In addition to this, specific cancers such as KS and other symptoms that are known to be associated with HIV are more likely to prompt a clinician to request an HIV test for the patient [36]. This will result in higher HIV testing rates and subsequently higher HIV prevalence compared to the general population. Therefore, with the text mining of doctors' clinical notes in pathology reports, we were more likely to pick up those who tested positive than those who tested negative or were never tested. As such, our study also shares the same limitations as proportionate incidence ratio studies. The increased risk observed may be a reflection of a higher HIV prevalence resulting in more cancer cases that are associated with HIV in our study population compared to that in the general population. The evaluation of cancer risk in PLHIV as a function of time was not possible in this study. However, through the SAM study, determination of cancer risk with a person-time denominator will be possible. Data on other potential confounders such as lifestyle patterns (smoking, alcohol intake, diet and exercise) and other opportunistic infections were also not available. Access to this information would possibly have made the results more robust.

Conclusion
PLHIV have a higher risk for all ADCs and most virus-related NADCs. The risk of anogenital cancers and conjunctival cancer continues to rise in the ART era, which suggests that ART alone is inadequate in reducing cancer in PLHIV. Most of these cancers are HPV-related. Targeted public health interventions for HPV, such as screening and expansion of HPV vaccination (for cervical cancer) amongst PLHIV, are essential in reducing the burden. To consolidate these efforts, ART expansion and availability as well as retention in care should be strengthened. With the introduction of universal ART in 2016, further decreases in ADCs are expected, provided individuals report to health care centres before the HIV disease has advanced.