To the best of our knowledge, this work is the first on the reproducibility of histological and cytological diagnoses from a hospital located in a low-HDI country where a surgical pathology laboratory has been operating autonomously for just over 2 years. Furthermore, this review is inter-departmental, inter-continental, independent, and blinded, which distinguishes it from many other studies that used an intra-departmental approach [9, 22]. In a manner similar, though not identical, to the error definition by Renshaw [18], we looked at the hypothetical clinical implications of discordance and separated them into three levels: high hypothetical clinical implications (those in which treatment or prognosis might differ, with a likelihood of significantly altering patient investigation and treatment), intermediate hypothetical clinical implications (those in which the second opinion was not definitive enough to modify clinical management substantially), and no significant hypothetical clinical implications (those that did not imply different or relevant approaches to clinical management).
The overall findings showed substantial concordance between external reviewers from a country with a very high HDI and local pathologists from a country with a low HDI, and this concordance was very high in adult general and paediatric/adolescent pathological diagnoses. This indicates that, for the most common pathologies that exist in the Lake Region of Tanzania, there is a very high chance of receiving a reliable diagnosis based on the four diagnostic categories we considered (malignant, benign, inflammatory, or suspicious). We observed discordance with high hypothetical clinical implications in only 6.1% of diagnoses. Moreover, concordance was quite high even for lymphoproliferative diagnoses and for the histological subtypes of lymphoma. This is surprising because the original diagnoses were not supported by immunohistochemistry, which is not available at the local level. However, the lack of immunohistochemistry may be one reason for the considerable discordance in one case diagnosed as benign thymoma at the BMC and as T-cell lymphoma in the Italian review, as well as in one case of non-Hodgkin lymphoma and three cases of BL that were missed by BMC pathologists. It is worth mentioning that a standard diagnosis of BL should always be supported by the recommended algorithmic approach of Naresh et al. [15]. This discordance might be associated with the 6.9% of errors with high hypothetical clinical implications. The few instances of discordance we observed in general adult pathological diagnoses were not related to the lack of immunohistochemistry. For example, one case of focal prostatic adenocarcinoma diagnosed by BMC pathologists was not mentioned by the Italian reviewer. In our opinion, these pathologists could come to a consensus on this and other discordant cases involving suspicious diagnoses if a consensus conference were possible. To a lesser extent, the discordance observed in PAP tests could be related to the absence of screening programmes at the time of this survey. Moreover, this discordance was not related to the misclassification of invasive cervical cancer (it concerned low-grade squamous intraepithelial lesions). Finally, of the 20 smears, the Italian reviewer defined seven as technically insufficient and six as satisfactory with limitations.
Technical issues may also be responsible for the poorest results, and for the high hypothetical clinical implications, found in the subset of fluid/FNA cytological diagnoses. In fact, none of the cytological slides were considered technically satisfactory by the Italian reviewer, who rated them insufficient or just sufficient in terms of staining, thickness, and/or cellular representativeness of the smear. This could be related to the main problems facing the practice of pathology in Tanzania reported by Ngoma and Diwani [16] and Stefan et al. [23], i.e., a low number of histopathology technicians, poor recruitment of these technicians, and difficulty in obtaining and replacing essential reagents.
This analysis has some limitations. First, it is not a comprehensive evaluation of errors in surgical pathology according to Renshaw [18], because we did not proceed to the second step, which involves joint revision of discordant diagnoses by the original pathologist and the second reviewer, or to a possible third step (submission of cases to a third, outside reviewer as a “gold standard” when the first two observers fail to reach consensus) in order to classify definitive diagnostic errors. Second, the sample was probably not large enough to detect a reasonable percentage of errors according to Renshaw et al. [22]. Finally, we did not assess discordance by other, more detailed, important correlates such as grade of differentiation in tumours or specific morphological prognostic characteristics [9].
With regard to the sample size, we assumed that the initial number of 215 diagnoses was appropriate and would represent the best available data on the performance of a surgical pathology laboratory operating in sub-Saharan Africa. According to Cantor [3], the final overall sample of 196 cases seems reasonable for detecting an inter-rater agreement in the range of 30 to 50%, with an error margin of 20%. Testing a kappa of 0.75 against a null hypothesis of a kappa of 0.6 (i.e., that the agreement is substantial, not moderate) would have required 199 assessments made by two observers to achieve 90% power. Furthermore, as this was intended to be an exploratory survey, we did not evaluate discordance by grade of differentiation in tumours or specific morphological prognostic characteristics.
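For readers less familiar with the kappa statistic referred to above, the following is a minimal sketch of how Cohen's kappa can be computed from a cross-tabulation of two observers' diagnoses. The function name `cohen_kappa` and the counts in `example_table` are hypothetical and purely illustrative; they are not the data of this survey, and the sketch is not the sample size procedure of Cantor [3].

```python
from typing import Sequence


def cohen_kappa(table: Sequence[Sequence[int]]) -> float:
    """Cohen's kappa for a square agreement table.

    table[i][j] = number of cases that observer A placed in category i
    and observer B placed in category j.
    """
    k = len(table)
    if any(len(row) != k for row in table):
        raise ValueError("agreement table must be square")
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("agreement table is empty")

    # Observed agreement: proportion of cases on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n

    # Agreement expected by chance, from the marginal totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# HYPOTHETICAL counts, for illustration only (not the study data).
# Rows: local diagnosis; columns: reviewer diagnosis.
# Category order: malignant, benign, inflammatory, suspicious.
example_table = [
    [40, 2, 1, 2],
    [3, 50, 4, 1],
    [1, 3, 30, 1],
    [2, 2, 1, 7],
]

if __name__ == "__main__":
    print(f"kappa = {cohen_kappa(example_table):.3f}")
```

Under this sketch, kappa corrects the raw proportion of concordant diagnoses for the agreement that would be expected by chance given how often each observer uses each category.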
The rather constrained nature of our diagnostic categories (benign, malignant, inflammatory, and suspicious) reflects the immediate diagnostic needs in a region where chronic inflammatory diseases like tuberculosis, neglected tropical diseases (lymphatic filariasis, soil-transmitted helminthiases, schistosomiasis [25]), rhinosporidiosis, actinomycosis, or virus-related pathologies can cause organ or tissue enlargement or tumour-like clinical presentations. There is also a need to rule out malignancy in the biopsies of relatively frequent pathological fractures in local children suffering from sickle cell anaemia [6], and malignancy in the endometrium in a district with a high prevalence of molar pregnancies [10]. Furthermore, PAP tests and cytology are usually assessed in terms of the diagnostic categories we used [12].
The constrained nature of the diagnostic categories we used and the exploratory nature of the study also affect its comparability with other comparative reviews of surgical pathology laboratories. Nevertheless, our result of 13% discordant cytological diagnoses with high hypothetical clinical implications is in the range of major discordances found by Kuijpers et al. (9.1 to 19.4%) in their study of a routine review of fluid/FNA samples by expert cytopathologists. In that study, concordance was 60.10%, while in our survey it was 56.52%. Furthermore, assuming that our definition overlaps with the concept of major error in surgical pathology described by Frable [9], the proportions of discordance with high hypothetical clinical implications in general adult (5.6%), paediatric/adolescent (2.9%), and lymphoproliferative (6.9%) pathological diagnoses were somewhat higher than those in that review, in which the reported rates of major errors across all anatomical sites ranged from 0.26 to 5.7%.
Despite these limitations, and taking into account that our aim was to evaluate basic categorical discordance, we chose to carry out a much less expensive survey instead of performing the exhaustive process of evaluating pathology misclassifications. Again, it is important to note that our observers live on two different continents, and that one of them works in a resource-poor environment. Therefore, we think that our findings are of great value and could be used as a reference point for future analyses. We believe that one of the key factors behind the high overall diagnostic concordance we found was the long period of training at the BMC, during which local personnel had the chance to work (and learn) closely with experienced professionals.