Translational Proteomics: Solving the Reproducibility Riddle
Data analysis in proteomics is not fit for purpose – here’s how we can get it on track.
David Chiang
Proteomics, with its unlimited potential for biomedicine, has so far fallen short. I believe the reason is simple: sophisticated big data is being processed by simplistic bioinformatics on underpowered computers. Novices are dazzled by thousands of proteins characterized at the push of a button. But experts find that it is mostly common proteins that are correctly identified, that much of the quantitation is suspect, and, critically, that it is hard to tell whether any given identification is correct. How can we improve the utility of proteomics for identifying important low-abundance proteins? The trick is to borrow data-analysis methods from numerical data mining in physics rather than from abstract statistics.
Let’s say we run a pneumonia sample through a mass spectrometer to identify pathogens from their proteins. We process a one-gigabyte file containing 50,000 raw spectra with a fast PC program that identifies and quantifies peptides and proteins from 20 percent of the spectra at a 1 percent error rate. When analysis is this easy, who needs hypotheses or data understanding? We just need “better” software, defined as software that is faster, cheaper, and reports more proteins. Of course, this assumes that a 1 percent error rate is good enough, that a self-estimated error is always robust, and that quantity means quality. All three assumptions are obviously incorrect.
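To see why a 1 percent error rate is not good enough, consider the arithmetic behind this scenario. The short Python sketch below is a minimal illustration; every figure in it is the hypothetical number from the example above, not a measurement from any real run.

    # Back-of-the-envelope arithmetic for the hypothetical run described above.
    total_spectra = 50_000   # raw spectra in the ~1 GB file
    id_rate = 0.20           # fraction of spectra yielding an identification
    error_rate = 0.01        # the software's self-estimated error rate

    identified = int(total_spectra * id_rate)     # 10,000 peptide-spectrum matches
    expected_false = identified * error_rate      # about 100 false matches expected

    print(f"Identified spectra: {identified:,}")
    print(f"Expected false identifications at {error_rate:.0%} error: {expected_false:.0f}")

Ten thousand matches at 1 percent error still means roughly one hundred wrong identifications. And if, as argued above, the confident calls are dominated by common proteins, those errors fall disproportionately on exactly the low-abundance pathogen proteins the experiment was run to find.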