How to interpret your results/reports?
General ideas for interpreting MS results
What do we do to your sample(s)?
For both in-solution and in-gel samples, we first reduce the disulfide bonds of the proteins with DTT (dithiothreitol) and then alkylate the free -SH groups with IAA (iodoacetamide). We then add trypsin for digestion. Trypsin cleaves your protein on the C-terminal side of lysine and arginine residues, so the resulting peptides end in lysine or arginine (apart from the peptide at the protein's own C-terminus). The tryptic peptides are then subjected to tandem MS (MS/MS) analysis: the peptides are ionized and fragmented inside the mass spectrometer, and the fragment ions of each peptide are recorded in an MS/MS spectrum.
A database of all known proteins of the specified species is digested in silico into peptides, and a theoretical MS/MS spectrum is generated for each peptide. A database search engine compares the theoretical spectra with the acquired spectra and finds the best-matching peptide for each acquired (“real”) spectrum. One or more scores are produced to measure the similarity between the spectrum and its best peptide match. The best match is the most likely candidate peptide. In this sense, an identification is never simply right or wrong; it is only more or less likely to be correct.
The peptides are then assembled into proteins. Note that a lot of protein sequence information is lost during digestion. For example, many proteins share identical stretches of sequence (such as a common consensus sequence), so it can be impossible to tell which protein a shared peptide came from. To solve this problem, the concept of the protein group is introduced.
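The cleavage rule described above can be sketched in a few lines of Python. This is only an illustration, not the code used by any search engine: the function name tryptic_digest and the missed_cleavages parameter are made up for this example, and the "no cleavage before proline" rule is a common convention that is assumed here rather than taken from our protocol.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return tryptic peptides of a protein sequence (one-letter codes).

    Assumed rule: cleave after K or R, but not when the next residue is P.
    """
    # Step 1: cut the sequence into fully cleaved fragments.
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # C-terminal peptide, which need not end in K or R.
        fragments.append(sequence[start:])

    # Step 2: join neighbouring fragments to allow missed cleavages.
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


print(tryptic_digest("AKRPCKDE"))
print(tryptic_digest("AKRPCKDE", missed_cleavages=1))
```

In the toy sequence, the R followed by P is not cleaved, and the last peptide (DE) comes from the protein's C-terminus, so it does not end in K or R.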
What is a Protein Group?
We group proteins that share the same identified peptides into one protein group. Within a protein group, each protein has either the same set of identified peptides as the protein with the most identified peptides, or a subset of that set. Listing these proteins as separate items would add no information. For example, protein A has four identified peptides (a, b, c and d), while protein B has only peptides c and d. Proteins A and B are therefore grouped together. The actual protein grouping algorithm is much more complicated than this simple example.
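The simple A/B example can be written as a small Python sketch. It only implements the "same set or subset" rule from the paragraph above; as noted, real grouping algorithms are more involved. The function name group_proteins and the dictionary layout are hypothetical.

```python
def group_proteins(protein_peptides):
    """Group proteins whose identified peptides are a subset of another's.

    protein_peptides: dict mapping protein name -> set of identified peptides.
    Returns a list of groups; each group is a dict with the lead protein,
    its peptide set and all member proteins.
    """
    # Visit proteins with the most peptides first, so that a protein is
    # only ever compared against groups at least as large as itself.
    ordered = sorted(protein_peptides.items(),
                     key=lambda item: (-len(item[1]), item[0]))
    groups = []
    for name, peptides in ordered:
        peptides = set(peptides)
        for group in groups:
            if peptides <= group["peptides"]:  # same set or a subset
                group["members"].append(name)
                break
        else:
            groups.append({"lead": name, "peptides": peptides,
                           "members": [name]})
    return groups


identified = {
    "A": {"a", "b", "c", "d"},
    "B": {"c", "d"},
    "C": {"e"},
}
for group in group_proteins(identified):
    print(group["lead"], group["members"])
```

Here protein B is folded into the group led by protein A, while protein C forms its own group.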
Unique peptide and peptide hits
We use HPLC to separate the peptides before MS. With this configuration, the same peptide can be fragmented multiple times, so multiple spectra can be recorded for it. The number of peptide-spectrum matches is called peptide hits (or spectrum hits), while the unique peptides are the distinct peptide sequences that remain after removing the redundancy from the peptide hits. Peptide hits are useful for describing the relative abundance of a protein: in general, the larger the number, the more abundant the protein. The number of unique peptides largely determines the sequence coverage of the corresponding protein and also indicates the confidence of the protein identification.
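The difference between the two counts can be shown with a short sketch. The input format (one protein/peptide pair per matched spectrum) and the function name count_hits are assumptions made for this example; "unique" is used here in the sense of the paragraph above, i.e. distinct peptide sequences.

```python
from collections import defaultdict


def count_hits(psms):
    """Count peptide hits and unique peptides per protein.

    psms: iterable of (protein, peptide_sequence) pairs,
          one pair per peptide-spectrum match.
    """
    hits = defaultdict(int)      # every matched spectrum counts
    sequences = defaultdict(set)  # each distinct sequence counts once
    for protein, peptide in psms:
        hits[protein] += 1
        sequences[protein].add(peptide)
    return {
        protein: {"peptide_hits": hits[protein],
                  "unique_peptides": len(sequences[protein])}
        for protein in hits
    }


psms = [
    ("ProteinA", "ELVISK"), ("ProteinA", "ELVISK"), ("ProteinA", "ELVISK"),
    ("ProteinA", "SAMPLER"),
    ("ProteinB", "PEPTIDEK"),
]
print(count_hits(psms))
```

In this toy input, ProteinA has four peptide hits but only two unique peptides, because one sequence was matched by three spectra.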
What is a Razor peptide?
A razor peptide is a peptide that has been assigned to the protein group with the largest number of identified peptides (peptide IDs). If the razor peptide is also unique, it matches only this single protein group. If it is not unique, it is a razor peptide only for the group with the largest number of peptide IDs.
Let's say you have identified a peptide that matches protein group A and protein group B. Let's assume that protein group A has already been identified with 5 additional peptides, while protein group B has not been identified with any other peptide. Should you assign your peptide to group A or B (or both)? Occam's razor principle tells us that we should not make unnecessary assumptions. It is not necessary to assume that protein group B is present in your sample, because the presence of protein group A already explains all peptide IDs. Your peptide is therefore assigned to protein group A as a razor peptide. MaxQuant will also list it under protein group B for your information, but not as a razor peptide. Note, however, that protein group B will only show up in the proteinGroups file if it is also identified by at least one unique peptide (default settings in identify). In this way, MaxQuant always generates the shortest protein group list that is sufficient to explain all peptide IDs. Note that every peptide sequence is a razor peptide for only one protein group.
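The razor rule from the example above can be sketched as follows. This is not MaxQuant's implementation, only the basic idea: each peptide becomes a razor peptide for the matching group that has the most peptide IDs. The function name assign_razor, the input layout and the alphabetical tie-break are assumptions for this illustration.

```python
def assign_razor(peptide_to_groups):
    """Pick one razor protein group per peptide.

    peptide_to_groups: dict mapping peptide -> set of protein group names
                       that the peptide matches.
    """
    # Count how many identified peptides each protein group has.
    counts = {}
    for groups in peptide_to_groups.values():
        for group in groups:
            counts[group] = counts.get(group, 0) + 1

    razor = {}
    for peptide, groups in peptide_to_groups.items():
        # Most peptide IDs wins; ties are broken by name (arbitrary choice).
        razor[peptide] = min(groups, key=lambda g: (-counts[g], g))
    return razor


# Group A has 5 peptides of its own; one more peptide is shared with group B.
matches = {f"pep{i}": {"A"} for i in range(1, 6)}
matches["shared"] = {"A", "B"}
print(assign_razor(matches)["shared"])
```

The shared peptide is assigned to group A, so group B has no razor or unique peptide left and would not need to be reported.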
False Discovery/Positive Rate (FDR/FPR)
We use a search engine to do the “matching”, and score-based results inevitably contain some mismatches (false positives), for example because of poor-quality MS/MS spectra. Several strategies are used to estimate the proportion of false positives. The most frequently used one is the target-decoy database search, in which the search is run against a combined database composed of the normal (target) FASTA sequences and their reversed (decoy) sequences. If a spectrum is matched to a decoy sequence, the match is considered a false positive. We estimate the false discovery rate (FDR), often loosely called the false positive rate (FPR), by dividing the number of decoy matches by the number of target matches (or in a similar way). For large-scale datasets, e.g. large-scale in-solution digestion for protein identification, we usually filter at 1% FDR, so that about 99% of the accepted identifications are expected to be true positives. Although this strategy works well for large-scale proteomics, it is not appropriate for gel band identification, because the number of IDs is too small for statistical analysis.
Sometimes people accept a higher FDR in order to obtain more identifications.
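The decoy/target ratio described above can be illustrated with a short sketch. It uses the simple estimate FDR = decoy matches / target matches among all matches at or above a score cutoff; search engines and post-processing tools use refined variants of this. The function names and the toy scores are assumptions, and tied scores are not handled specially.

```python
def estimate_fdr(psms, cutoff):
    """Estimated FDR among matches with score >= cutoff.

    psms: list of (score, is_decoy) pairs from a target-decoy search.
    """
    targets = sum(1 for score, is_decoy in psms if score >= cutoff and not is_decoy)
    decoys = sum(1 for score, is_decoy in psms if score >= cutoff and is_decoy)
    return decoys / targets if targets else 1.0


def score_cutoff(psms, max_fdr=0.01):
    """Lowest score cutoff whose estimated FDR is still <= max_fdr (or None)."""
    targets = decoys = 0
    cutoff = None
    # Walk down the list from the best score to the worst.
    for score, is_decoy in sorted(psms, key=lambda p: p[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdr = decoys / targets if targets else 1.0
        if fdr <= max_fdr:
            cutoff = score
    return cutoff


# Toy data: 10 target hits (scores 100..91), one decoy hit (90), 2 target hits.
psms = [(s, False) for s in range(100, 90, -1)] + [(90, True), (89, False), (88, False)]
print(score_cutoff(psms, max_fdr=0.05))
print(estimate_fdr(psms, cutoff=88))
```

With only 13 matches, a single decoy hit changes the estimate from 0% to about 8%, which illustrates why the approach is unreliable for a gel band with few IDs. Raising max_fdr lowers the cutoff and lets more identifications through, at the cost of more false positives.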
Why are there so many proteins in my gel band?
Usually, a gel band contains not only your target protein but also some contaminants, such as keratin (which is practically unavoidable in the protein digestion process), as well as other low-abundance proteins.
How to interpret the table of your results
For LTQ identification
For protein identification we usually use the LTQ (linear ion trap) MS, which has the required sensitivity and accuracy. We use Mascot for the database search. The search results are then parsed by in-house software.
We will send you a file called “combined.xls”. In this file, each protein group is indexed with "$". If there is more than one protein in a group, the proteins are further numbered as $1-1, $1-2, and so on. The lines below each protein/protein group line (marked by $) list all the peptides identified from MS2 spectra, with additional information such as mass difference, pI and MW (see the column names). All of this information is also provided in the file combined_peptides, which is the filtered database search result from the MS raw data.
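If you want to process the table with a script, the "$" layout can be read roughly as follows. This is a sketch under several assumptions that are not guaranteed by the file itself: that the table has been exported as tab-delimited text, that the "$" index is in the first column, and that every non-empty line after a "$" line is a peptide line. The function name parse_combined and the example columns are hypothetical; check them against the column names in your own file.

```python
def parse_combined(lines):
    """Collect protein lines and peptide lines per protein group.

    lines: iterable of tab-delimited text lines (assumed export format).
    Returns a dict: group number (as text) -> {"proteins": [...], "peptides": [...]},
    where each entry is the list of column values of one line.
    """
    groups = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip empty lines
        fields = line.split("\t")
        if fields[0].startswith("$"):
            # "$1" and "$1-2" both belong to protein group "1".
            group_id = fields[0][1:].split("-")[0]
            current = groups.setdefault(group_id, {"proteins": [], "peptides": []})
            current["proteins"].append(fields)
        elif current is not None:
            # Any other line after a "$" line is treated as a peptide line.
            current["peptides"].append(fields)
    return groups


# Made-up example lines; the real file has more columns.
example = [
    "Index\tName\n",
    "$1-1\tProteinA\n",
    "$1-2\tProteinB\n",
    "\tELVISK\t0.01\n",
    "\tSAMPLER\t0.02\n",
    "$2\tProteinC\n",
    "\tPEPTIDEK\t0.00\n",
]
for group_id, content in parse_combined(example).items():
    print(group_id, len(content["proteins"]), "protein(s),",
          len(content["peptides"]), "peptide line(s)")
```

Lines before the first "$" line, such as the header, are ignored by this sketch.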
For LTQ-Orbitrap identification and PTM analysis
For PTM analysis we choose the LTQ-Orbitrap MS because of its higher mass accuracy, which helps to localize PTMs.
We use MaxQuant for the database search.
We send you a file called proteinGroup.xls, which is a summary of the protein groups identified. For PTM analysis, an additional file for the relevant modification is also included.
References:
Elias, J. E., et al. (2005), "Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations", Nature Methods 2:667-675.
Wang, G., et al. (2009), "Decoy Methods for Assessing False Positives and False Discovery Rates in Shotgun Proteomics", Anal Chem. 81(1):146-159.