Non-small cell lung cancer biomarker discovery from unfractionated blood cell pellets.
Ben Herbert, Cameron Hill, Elisabeth Karsten, Dana Pascovici, Rosalee McMahon, Natasha Lucas
Blood biomarker studies are almost exclusively performed on plasma or serum. The blood collection and sample preparation steps are optimised to avoid haemolysis where possible because of the additional dynamic range issues arising from haemoglobin and the potential for artefactual results in immunoassays. In some clinical studies the cell pellet after plasma removal is stored frozen, although the lysis of cells exacerbates the sample preparation challenges for proteomics.
In this study we obtained plasma and frozen cell pellets from 16 patients with clinical stage II–IV non-small cell lung cancer (NSCLC) and 18 age- and sex-matched healthy controls. For proteomic analysis we focused on the cell pellets and used a novel sample preparation technique to enable broad coverage of soluble, intracellular and membrane proteins. In a previous study we demonstrated protein detection and robust quantification from single-shot shotgun LC-MS analysis of dried whole blood [1]. We used volumetric absorptive micro-sampling (VAMS) dried blood spot devices and loaded 30 µL aliquots of thawed cell pellet sample. The samples were dried, washed, and trypsin-digested in situ. Peptides were separated with a one-hour gradient and quantified using data-independent acquisition (DIA) on a QE-HFX Orbitrap, yielding approximately 3,700 protein identifications per sample.
An initial 508 differentially expressed proteins were identified. Ingenuity Pathway Analysis showed that these proteins are involved in a variety of functional pathways covering adhesion and migration, and that they include a strong network of known cancer-associated cytokines and enzymes. To narrow this list, proteins were ranked by area under the curve (AUC) and, separately, with a boosted regression importance filter.
The first filter produced a short list of markers that were both differentially expressed in the model and had an AUC > 0.9. The second filter calculated the importance rank of each protein using gradient boosting methods. The procedure was repeated 100 times, the number of times each protein was selected in the top 10 by importance was recorded, and the markers were ranked by this count. Using these methods, a set of 14 markers was identified with the AUC filter and 13 markers were identified with the importance rank filter that discriminate between NSCLC and healthy controls, with 3 markers common to both analyses.
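The two filters can be illustrated with a minimal sketch. It is not the analysis code used in the study, and it rests on several assumptions: the function names (auc, auc_filter, selection_counts), the data layout, and the protein labels P1 to P3 are hypothetical; the AUC is computed from the Mann-Whitney statistic and folded so that markers lower in cases also qualify; and the per-run importance scores are taken as given inputs, since the abstract does not state which gradient boosting implementation was used or what varied between the 100 repeats.

```python
from collections import Counter


def auc(cases, controls):
    """AUC via the Mann-Whitney statistic: the probability that a random
    case value exceeds a random control value, counting ties as one half."""
    wins = 0.0
    for x in cases:
        for y in controls:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def auc_filter(abundance, threshold=0.9):
    """Keep proteins whose AUC exceeds the threshold in either direction.

    `abundance` maps protein -> (case values, control values).
    Folding with max(a, 1 - a) is an assumption of this sketch.
    """
    kept = {}
    for protein, (cases, controls) in abundance.items():
        a = auc(cases, controls)
        a = max(a, 1.0 - a)  # marker may be lower in cases than in controls
        if a > threshold:
            kept[protein] = a
    return kept


def selection_counts(importance_runs, top_k=10):
    """Count how often each protein falls in the top_k of a repeat.

    `importance_runs` is a list of dicts (protein -> importance score),
    one dict per repeat of the boosted model, produced elsewhere.
    """
    counts = Counter()
    for run in importance_runs:
        ranked = sorted(run, key=run.get, reverse=True)
        counts.update(ranked[:top_k])
    return counts


# Hypothetical toy data: three proteins, four cases and four controls each.
abundance = {
    "P1": ([5.1, 6.0, 5.8, 6.3], [3.9, 4.2, 4.0, 4.4]),
    "P2": ([4.0, 4.5, 4.1, 4.6], [4.2, 4.4, 4.3, 4.0]),
    "P3": ([2.0, 2.1, 1.9, 2.2], [3.0, 3.1, 2.9, 3.3]),
}
# Hypothetical importance scores from three repeats.
runs = [
    {"P1": 0.6, "P2": 0.1, "P3": 0.3},
    {"P1": 0.5, "P2": 0.3, "P3": 0.2},
    {"P1": 0.2, "P2": 0.1, "P3": 0.7},
]

auc_markers = auc_filter(abundance)
counts = selection_counts(runs, top_k=2)
# Markers passing the AUC filter that were also selected in at least two repeats.
shared = set(auc_markers) & {p for p, n in counts.items() if n >= 2}
```

With the study's settings, top_k would be 10 over 100 repeats, and the overlap between the two lists corresponds to the markers common to both analyses.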
Our rapid and reproducible methods enable the production of high-quality data from small aliquots of complex samples that are typically seen as requiring significant fractionation prior to proteomic analysis.