Percolator

Example

A series of scripts that execute system tests is shipped with the source code. To learn more, see the section “TEST PERCOLATOR” in the README file included with the source code.

The data sets used in Percolator’s original publication are all available from the Noble lab web site. To test Percolator on one of them, run the following command sequence:


  $ mkdir test; cd test
  $ wget -q http://noble.gs.washington.edu/proj/percolator/data/yeast-01.sqt.tar.gz
  $ tar xvzf yeast-01.sqt.tar.gz
  $ sqt2pin -o pin.xml yeast-01.sqt yeast-01.shuffled.sqt
  $ percolator -X pout.xml pin.xml > yeast-01.psms
  Percolator version 1.15, Build Date Oct 21 2010 10:43:31
  Copyright (c) 2006-9 University of Washington. All rights reserved.
  Written by Lukas Käll (lukall@u.washington.edu) in the
  Department of Genome Sciences at the University of Washington.
  Issued command:
  percolator -E pin.xml -X pout.xml
  Started Thu Oct 21 13:22:36 2010 on unknown_host
  Hyperparameters fdr=0.01, Cpos=0, Cneg=0, maxNiter=10
  Train/test set contains 69705 positives and 69705 negatives, size ratio=1 and pi0=1
  selecting cpos by cross validation
  selecting cneg by cross validation
  Estimating 7086 over q=0.01 in initial direction
  Reading in data and feature calculation took 46.96 cpu seconds or 47 seconds wall time
  ---Training with Cpos selected by cross validation, Cneg selected by cross validation, fdr=0.01
  Iteration 1	: After the iteration step, 10212[1] target PSMs with q<0.01 were estimated by cross validation
  Iteration 2	: After the iteration step, 11256 target PSMs with q<0.01 were estimated by cross validation
  Iteration 3	: After the iteration step, 11564 target PSMs with q<0.01 were estimated by cross validation
  Iteration 4	: After the iteration step, 11667 target PSMs with q<0.01 were estimated by cross validation
  Iteration 5	: After the iteration step, 11706 target PSMs with q<0.01 were estimated by cross validation
  Iteration 6	: After the iteration step, 11714 target PSMs with q<0.01 were estimated by cross validation
  Iteration 7	: After the iteration step, 11716 target PSMs with q<0.01 were estimated by cross validation
  Iteration 8	: After the iteration step, 11741 target PSMs with q<0.01 were estimated by cross validation
  Iteration 9	: After the iteration step, 11742 target PSMs with q<0.01 were estimated by cross validation
  Iteration 10	: After the iteration step, 11747 target PSMs with q<0.01 were estimated by cross validation
  Obtained weights (only showing weights of first cross validation set)
  first line contains normalized weights, second line the raw weights
  lnrSp	 deltLCn deltCn	Xcorr	 Sp	   IonFrac Mass	   PepLen Charge1 Charge2 Charge3 enzN enzC enzInt lnNumSP dM	 absdM	   m0
  -0.454 0.094	 0.586	0.554[2] -0.044	   -0.0211 0.757   -0.647 0.138	  0.0316  -0.0599 1.19 1.22 -1.7   -0.0401 0.246 -0.222[3] -6.23
  -0.244 0.562	 6.71	0.921	 -0.000166 -0.13   0.00125 -0.118 1.35	  0.0632  -0.12	  3.01 2.79 -1.27  -1.29   0.382 -0.601	   7.92
  After all training done, 11621 target PSMs with q<0.01 were found when measuring on the test set
  Found 11621 target PSMs scoring over 1% FDR level on testset
  Merging results from 3 datasets
  Selecting pi_0=0.813 [4]
  Calibrating statistics - calculating q values
  New pi_0 estimate on merged list gives 11854 PSMs over q=0.01
  Calibrating statistics - calculating Posterior error probabilities (PEPs)
  Processing took 81.07 cpu seconds or 82 seconds wall time
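
For longer runs it can be convenient to pull the convergence figures out of a log like the one above rather than read them by eye. The following is a minimal sketch, not part of Percolator: the function name `parse_log` and the regular expressions are assumptions based only on the line formats shown in this example, and other Percolator versions may word these lines differently.

```python
import re

# A few lines copied from the log above. The tab after the iteration
# number is kept, since that is how the log prints it.
LOG = (
    "Iteration 1\t: After the iteration step, 10212 target PSMs with q<0.01 were estimated by cross validation\n"
    "Iteration 2\t: After the iteration step, 11256 target PSMs with q<0.01 were estimated by cross validation\n"
    "Iteration 10\t: After the iteration step, 11747 target PSMs with q<0.01 were estimated by cross validation\n"
    "Selecting pi_0=0.813\n"
)

# Matches "Iteration N : ... COUNT target PSMs"; the optional "[k]" allows
# for a footnote marker such as the one attached to the first iteration above.
ITERATION = re.compile(r"Iteration\s+(\d+)\s*:.*?(\d+)(?:\[\d+\])?\s+target PSMs")
PI0 = re.compile(r"Selecting pi_0=([0-9.]+)")


def parse_log(text):
    """Return ({iteration: target PSM count}, pi_0 or None) for a log text."""
    counts = {}
    pi0 = None
    for line in text.splitlines():
        match = ITERATION.search(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
            continue
        match = PI0.search(line)
        if match:
            pi0 = float(match.group(1))
    return counts, pi0


counts, pi0 = parse_log(LOG)
for iteration in sorted(counts):
    print(iteration, counts[iteration])
print("pi_0:", pi0)
```

Applied to the full log, the per-iteration counts make it easy to see that the number of target PSMs levels off after the first few iterations.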

Four interesting features of the output above are labelled with bracketed numbers:
[1] The number of target PSMs with a q-value below 0.01, as estimated by cross-validation.
[2] The weight of Xcorr is positive, meaning that a higher Xcorr gives a better score. This is the expected behaviour.
[3] The weight of absdM is negative, meaning that a larger difference between observed and calculated mass gives a worse score. This is also the expected behaviour.
[4] The selected pi_0 of 0.813 means that an estimated 81.3% of the PSMs are incorrect matches.
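
The reasoning about weight signs can be illustrated with a small sketch. It assumes that a PSM's score is a weighted sum of its normalized feature values plus the intercept m0, as in a linear classifier; that reading of the weight table is an assumption, not something the log states. The weights are copied from the first (normalized) weight line above, only four of the features are used for brevity, and the feature values and the function name `linear_score` are hypothetical.

```python
# Normalized weights copied from the log above for four of the features,
# together with the intercept m0. The remaining features are left out
# to keep the illustration short.
WEIGHTS = {"Xcorr": 0.554, "deltCn": 0.586, "absdM": -0.222, "PepLen": -0.647}
M0 = -6.23


def linear_score(features, weights=WEIGHTS, intercept=M0):
    """Weighted sum of (already normalized) feature values plus the intercept."""
    return sum(weights[name] * value for name, value in features.items()) + intercept


# Hypothetical normalized feature values for a single PSM.
base = {"Xcorr": 1.0, "deltCn": 0.5, "absdM": 0.2, "PepLen": 0.0}
higher_xcorr = dict(base, Xcorr=2.0)        # same PSM with a higher Xcorr
larger_mass_error = dict(base, absdM=1.2)   # same PSM with a larger mass error

# A positive weight raises the score when the feature grows ...
print(linear_score(higher_xcorr) > linear_score(base))
# ... and a negative weight lowers it.
print(linear_score(larger_mass_error) < linear_score(base))
```

Under this assumption, the sign of each weight directly gives the direction in which the feature moves the score, which is why the signs of Xcorr and absdM serve as a quick sanity check on the training.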