Statistical Analysis Strategy and Methods
The primary objective was to identify cofactors for cKS among people with KSHV infection. The secondary objective was to identify variables that distinguished KSHV seropositive from KSHV seronegative people without cKS. To address these objectives, KSHV seropositive controls were used as the referent group, and the multinomial logistic regression procedure was used to calculate the odds ratio (OR) and 95% confidence interval (CI) for each variable's association with cKS and, among the controls, with KSHV seronegativity.
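As an illustration of how a fitted regression coefficient translates into the reported OR and 95% CI, the following minimal sketch assumes a Wald-type interval on the log-odds scale. The function name and the fixed multiplier of 1.96 are assumptions for illustration; the design-based variance estimation performed by the survey software is not reproduced here.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a log-odds coefficient and its standard error into
    an odds ratio with a Wald-type confidence interval.

    beta: coefficient from a (multinomial) logistic model, versus the referent group
    se:   standard error of beta (assumed to be design-adjusted already)
    z:    normal quantile; 1.96 gives an approximate 95% interval
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper
```

In a multinomial model with KSHV seropositive controls as the referent, one such coefficient would exist per variable for each non-referent outcome (cKS and KSHV seronegativity).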
As described, weights were included in each regression model to adjust for the multi-stage sampling of the controls. Base weights were calculated as the product of the reciprocals of the selection probabilities at each stage of sampling. Non-response adjusted weights were then calculated as the product of these base weights and adjustment factors defined by cross-classified categories of age, gender, and (for controls) region (eastern/western Sicily). These non-response adjusted weights were further adjusted by post-stratification to constrain the weights to reflect the population totals by age, gender, and six zones (three community sizes × two regions). These non-response/post-stratification-adjusted weights, which were rescaled to sum to the sample sizes of the cases and controls, are the final sample weights for each participant's data. PROC MULTILOG in SUDAAN statistical software (SAS-Callable SUDAAN Release 10.0.1, Research Triangle Institute) was used to conduct weighted multinomial logistic regression analyses that incorporated the sample weights and accounted for the stratified cluster sampling of the controls.
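The first and last steps of the weighting scheme can be sketched as follows. This is a simplified illustration only: the function names are hypothetical, the non-response and post-stratification adjustments are omitted, and applying the rescaling separately to cases and to controls is an assumption.

```python
def base_weight(selection_probs):
    """Base weight = product of the reciprocals of the selection
    probabilities at each sampling stage."""
    weight = 1.0
    for p in selection_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError("selection probabilities must lie in (0, 1]")
        weight *= 1.0 / p
    return weight


def rescale_to_sample_size(weights):
    """Rescale weights so that they sum to the number of participants
    in the group (assumed here to be done per group)."""
    n = len(weights)
    total = sum(weights)
    return [w * n / total for w in weights]
```

For example, a control selected with probability 0.5 at the first stage and 0.1 at the second would have a base weight of 20 before any adjustment.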
Prior to considering plant and soil exposures, a core model was developed with 5 variables: sex and age category (<68, 68-74, 75-80, ≥81 years) to account for matching variables, plus diabetes, use of oral or topical corticosteroid medication in the past 10 years, and cigarette smoking (current, former, never). Cumulative time working with plants or soils, previously noted to be associated with elevated KSHV seroprevalence among women [9, 10], was considered but not retained in the core model. All plant and soil analyses were built on this core model, and all models included the identical participants. To assess confounding, plant and soil models were repeated with exclusion of the one core variable (diabetes) found to be associated with KSHV seronegativity. History of asthma, level of attained education, and both of these together were added to the final model to further assess possible confounding or effect modification.
To evaluate how exposures to multiple plants might relate to cKS risk, three dimension-reducing methods were employed. First, total contacts with all 20 plants were calculated by assigning values of 0, 2, 20, 200, and 2000 to the exposure categories (zero, <10, 10-100, 100-1000, >1000) for each plant and summing across plants (range, 0 to 23,224); the totals were then divided into quartiles for regression analysis.
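The total-contact score described above can be sketched as follows. The category labels used as dictionary keys, the function names, and the handling of scores that fall exactly on a quartile cutpoint are assumptions for illustration; the quartile cutpoints here come from Python's default quantile method, which may differ from the one used in the analysis.

```python
import statistics

# Assigned value for each reported exposure category (values from the text).
CATEGORY_VALUE = {
    "zero": 0,
    "<10": 2,
    "10-100": 20,
    "100-1000": 200,
    ">1000": 2000,
}


def total_contacts(categories):
    """Sum the assigned values over one participant's plant exposure categories."""
    return sum(CATEGORY_VALUE[c] for c in categories)


def quartile_cutpoints(scores):
    """Return the three cutpoints that split the scores into quartiles."""
    return statistics.quantiles(scores, n=4)


def quartile_index(score, cutpoints):
    """Return 1-4; a score equal to a cutpoint is placed in the lower quartile
    (an assumption, since the text does not specify tie handling)."""
    return 1 + sum(score > c for c in cutpoints)
```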
Factor analysis uses covariance relationships among multiple observed variables to generate a few underlying, but unobservable, quantities called factors. Four factors were generated with an orthogonal rotation method (VARIMAX and PROC FACTOR, SAS Institute, Cary, NC) based on the proportion of variance explained in the exposures to the 20 plants. These factors were labeled descriptively (Asteraceae, Euphorbia/Datura/Agave, Hypericum, and food/beverage/gladiolus) based on the interpretation of the factors from their factor loadings. The score for each factor was dichotomized at its median value for inclusion as an independent variable in the multinomial regression analysis.
PROC FASTCLUS in SAS was used to partition participants into clusters based on the Euclidean distances computed from the levels of contact with the 20 plants. The uncommon clusters, labeled C (high exposures including Hypericum and Euphorbia) and B (high exposures to plants other than Hypericum and Euphorbia), were compared to the more common cluster (relatively few plant exposures).
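To illustrate the general idea of partitioning participants by Euclidean distance, the sketch below implements a plain k-means loop from user-supplied starting centroids. It is not a reimplementation of PROC FASTCLUS, whose seed selection and stopping rules are not reproduced; the function names and the fixed iteration count are assumptions.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to the point in Euclidean distance."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def kmeans(points, seeds, iterations=10):
    """Assign each point (a vector of plant-contact levels) to its nearest
    centroid, then recompute centroids as cluster means; repeat."""
    centroids = [list(s) for s in seeds]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids
```

In the analysis each point would have 20 coordinates, one per plant; the resulting cluster label would then enter the regression as a categorical variable with the most common cluster as the referent.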
For 14 typical soils, the likelihood of each participant's exposure was categorized as none (childhood community with zero for soil or luminescence), low (<median of non-zero luminescence-weighted soil value) or high (≥ non-zero median). For two widely distributed soils (lithosol and eutric regosol) that were present in nearly all communities (<200 controls with zero exposure), tertiles of luminescence-weighted values were used. One uncommon soil (gleyic arenosol) was dichotomized as any versus no exposure.
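The none/low/high categorization for the 14 typical soils can be sketched as follows, assuming one luminescence-weighted soil value per participant and treating any non-positive value as no exposure. The function name and input layout are hypothetical.

```python
import statistics


def categorize_soil(values):
    """Classify each luminescence-weighted soil value as 'none' (zero),
    'low' (below the median of the non-zero values), or 'high'
    (at or above that median)."""
    nonzero = [v for v in values if v > 0]
    if not nonzero:
        return ["none"] * len(values)
    cut = statistics.median(nonzero)
    categories = []
    for v in values:
        if v <= 0:
            categories.append("none")
        elif v < cut:
            categories.append("low")
        else:
            categories.append("high")
    return categories
```

Note that the median is computed only over participants with non-zero values, so the low and high groups are roughly equal in size regardless of how many participants had no exposure.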
Lastly, all 20 plants in levels (zero, <100, ≥100 contacts; except any/none for Datura stramonium, Euphorbia characias euphorbiaceae, Hypericum perforatum guttiferae, and Hypericum hircinum, for which fewer than 20 participants reported ≥100 contacts) and all 17 soils (classified as in the preceding paragraph) were included in a backward-elimination stepwise regression model. In addition to the 5 variables in the core model, individual plants and soils with Ptrend ≤ 0.15 were retained. As a sensitivity analysis, childhood residential soil exposures were replaced with adulthood soil exposures. Overlaps of the soils that were strongly associated with cKS risk were illustrated (Figure 1E and 1F). In all models, P ≤ 0.05 was considered statistically significant.
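The backward-elimination logic can be sketched generically as below. The callable `p_value_fn` is hypothetical: it stands in for refitting the weighted model on a given variable list and returning a trend P value per variable. Dropping the single least significant candidate at each step is an assumption about the procedure; core variables are never candidates for removal.

```python
def backward_eliminate(candidates, core, p_value_fn, threshold=0.15):
    """Repeatedly refit the model and drop the candidate variable with the
    largest P value until every remaining candidate has P <= threshold.

    candidates: plant and soil variable names eligible for removal
    core:       variable names that are always retained
    p_value_fn: hypothetical callable mapping a list of variable names
                to a dict {variable: P value} from the refitted model
    """
    kept = list(candidates)
    while kept:
        p_values = p_value_fn(list(core) + kept)
        worst = max(kept, key=lambda v: p_values[v])
        if p_values[worst] <= threshold:
            break
        kept.remove(worst)
    return kept
```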