A Conserved Fibrinogen and Immune Evasion Gene Signature Predicts Mortality Across Lung Cancer Histological Subtypes: An Interpretable Machine Learning Discovery Study
Fibrinogen and Immune Evasion Predict Lung Cancer Mortality
DOI:
https://doi.org/10.64949/sfa07913
Keywords:
lung cancer prognosis, fibrinogen, kynurenine pathway, KYNU, machine learning, SHAP, pan-lung cancer, immune evasion, XGBoost, external validation
Abstract
This article is a preprint and has not yet been peer-reviewed. Not for clinical use.
Background
TNM staging systematically misclassifies patients with favourable outcomes as high-risk, with direct consequences for treatment decisions. Whether the molecular drivers of lung cancer mortality are conserved across histological subtypes remains largely uncharacterised.
Methods
An XGBoost classifier was trained on 104 features (100 gene expression probes and four clinical variables) in 440 lung adenocarcinoma (LUAD) patients (GSE68465). A three-model ablation study quantified the independent contribution of gene expression over clinical staging alone. The trained model was applied without modification to an independent cohort (GSE30219, n=287), stratified into adenocarcinoma (n=85) and mixed-histology (n=202) subsets. SHapley Additive exPlanations (SHAP), Kaplan-Meier survival analysis, and decision curve analysis assessed biological and clinical utility.
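Decision curve analysis, one of the evaluation methods named above, compares models by their net benefit at a chosen risk threshold. The following is a minimal sketch of the standard net-benefit formula, not the study's own code. The function names, the toy labels, and the toy predicted risks are all hypothetical and are not taken from GSE68465 or GSE30219.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging patients whose predicted risk is >= threshold.

    net benefit = TP/n - (FP/n) * (threshold / (1 - threshold))
    y_true holds 1 for an event (death) and 0 for no event.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * (threshold / (1.0 - threshold))


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: flag every patient as high-risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * (threshold / (1.0 - threshold))


# Toy, made-up data for illustration only (1 = died, 0 = survived).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4, 0.3, 0.8]

for pt in (0.25, 0.5):
    print(pt, net_benefit(y_true, y_prob, pt), net_benefit_treat_all(y_true, pt))
```

A model is useful at a given threshold when its net benefit exceeds both the flag-everyone reference and zero (the flag-no-one strategy).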
Results
Clinical staging alone misclassified 49% of surviving patients as high-risk (specificity=0.51). Adding gene expression reduced this to 29% (specificity=0.71), corresponding to approximately 93 fewer incorrect high-risk designations per 1,000 patients. The full model achieved AUC=0.73 on the held-out test set of the training cohort, with stable external validation in the LUAD (AUC=0.71) and mixed-histology (AUC=0.69) subsets. When applied to the mixed-histology subset, the LUAD-trained model maintained near-identical biological feature hierarchies, with 12 of 14 top SHAP features shared across all three evaluation cohorts. The dominant signal in both external validation subsets was the fibrinogen chain gene pair FGG and FGA, which outranked pathological nodal stage and implicates coagulation-mediated tumour immune exclusion as a conserved mortality mechanism. Kynureninase (KYNU), a key mediator of IDO/TDO-mediated immune evasion, was independently recovered as a top predictor. Risk stratification significantly separated disease-free survival in both validation subsets (LUAD: log-rank p=0.044; mixed-histology: log-rank p=0.011).
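The link between the specificity gain and the per-1,000 figure can be made explicit. Among survivors, the share wrongly flagged as high-risk is 1 minus specificity, so the number of false high-risk designations per 1,000 patients also depends on what fraction of the cohort survives. The abstract does not report that fraction. The value used below is an assumption, back-calculated from the rounded figures so that the result lands near the reported 93, and the function names are hypothetical.

```python
def false_positives_per_1000(specificity, survivor_fraction):
    """Survivors wrongly flagged high-risk, per 1,000 patients overall."""
    if not 0.0 <= specificity <= 1.0:
        raise ValueError("specificity must be in [0, 1]")
    if not 0.0 <= survivor_fraction <= 1.0:
        raise ValueError("survivor_fraction must be in [0, 1]")
    return 1000.0 * survivor_fraction * (1.0 - specificity)


def fewer_false_positives_per_1000(spec_old, spec_new, survivor_fraction):
    """Reduction in false high-risk designations when specificity improves."""
    return (false_positives_per_1000(spec_old, survivor_fraction)
            - false_positives_per_1000(spec_new, survivor_fraction))


# ASSUMPTION: survivor fraction is not stated in the abstract; 0.465 is an
# illustrative back-calculation from rounded values, not a reported number.
ASSUMED_SURVIVOR_FRACTION = 0.465

reduction = fewer_false_positives_per_1000(0.51, 0.71, ASSUMED_SURVIVOR_FRACTION)
print(round(reduction, 1))
```

With a different survivor fraction the reduction scales proportionally, which is why the per-1,000 figure should be read alongside the cohort's event rate.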
Conclusion
Gene expression substantially improves survivor identification over clinical staging alone and reveals a conserved molecular signature of lung cancer mortality transcending histological boundaries. The dominance of fibrinogen and kynurenine pathway components across multiple histotypes suggests histotype-independent mechanisms of aggressiveness with direct implications for prognostic stratification and therapeutic targeting.
Data Availability Statement
All gene expression data are publicly available through the NCBI Gene Expression Omnibus under accession numbers GSE68465 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE68465) and GSE30219 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30219). Analysis code is available from the corresponding author upon reasonable request.
License
Copyright (c) 2026 Simbarashe G. Magwenzi

This work is licensed under a Creative Commons Attribution 4.0 International License.
