Software

Twister: a top-down driven approach to de novo protein sequencing

Twister is a software tool for de novo sequencing of proteins and peptides from tandem mass spectra. Given a set of deconvoluted top-down tandem mass spectra, it first generates a set of de novo sequences, and subsequently combines them into a set of aggregated paths. 

This is an alpha version; the first official version will be released upon acceptance of the respective paper. Until then, the version available via this web page will be updated regularly, and the changes will be reflected in the present description of the tool. However, no explicit change log will be maintained.

Twister will be gradually extended so as to handle combined sets of top-down and bottom-up spectra; this is reflected by the option keys. 

Based on the tag generation strategy applied by Twister, we developed the tag convolution approach to validating amino acid sequences, and implemented it in a standalone software tool.

Usage 

java -Xmx2G -jar Twister.jar -d=<input directory> [options]

Input

The input comprises a set of deconvoluted MS/MS spectra stored in the msalign format supported by the tool MS-Deconv, which we therefore recommend for deconvolution.

All the msalign files in the input directory will be considered as input.
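To check in advance which files a run would pick up, the input directory can be listed beforehand. The following is a minimal sketch, not part of Twister: the helper name find_msalign_files is hypothetical, and it assumes that msalign files carry the ".msalign" extension and that subdirectories are not searched.

```python
from pathlib import Path


def find_msalign_files(input_dir):
    """Return the msalign files found directly in input_dir, sorted by name."""
    directory = Path(input_dir)
    if not directory.is_dir():
        raise NotADirectoryError(f"not a directory: {input_dir}")
    # Assumption: input files end in ".msalign"; nested folders are ignored.
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix == ".msalign"
    )
```

An empty result suggests that the deconvolution step has not yet been run for this directory.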

Options

-k, --tag-length <integer value>

Specify the length of the tags to be extracted from the input spectra.

Default: 4

-e, --mass-tolerance <float value>

Specify the mass tolerance in mDa (give the number only, without units).

Default: 4 (mDa)
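To illustrate what a tolerance given in mDa means in practice, the sketch below tests whether the mass difference between two peaks matches an amino acid residue. It is an illustration under stated assumptions, not Twister's implementation: the function name matching_residues is hypothetical, only a handful of residues are listed (with standard monoisotopic masses), and an absolute tolerance is assumed.

```python
# Standard monoisotopic residue masses in Da (a small subset for illustration).
RESIDUE_MASS_DA = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "Q": 128.05858,
    "K": 128.09496,
}


def matching_residues(mass_a, mass_b, tolerance_mda=4.0):
    """Return residues whose mass equals |mass_b - mass_a| within the tolerance."""
    tolerance_da = tolerance_mda / 1000.0  # 1 mDa = 0.001 Da
    diff = abs(mass_b - mass_a)
    return [
        residue for residue, mass in RESIDUE_MASS_DA.items()
        if abs(diff - mass) <= tolerance_da
    ]
```

Q and K differ by about 36 mDa, so under these assumptions a 4 mDa tolerance tells them apart, whereas a tolerance of 40 mDa would not.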

-r, --peak reflection <0|1>

If set to 1, the peak reflection procedure will be applied prior to tag generation. If set to 0, no peak reflection will be performed.

Default: 1
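The reflection procedure itself is not specified on this page. One plausible reading, assumed here purely for illustration, is that every peak of neutral mass m in a spectrum with precursor mass M is complemented by a peak at M - m, so that fragments read from either terminus end up on a common mass scale. The function name reflect_peaks is hypothetical, and Twister's actual procedure may differ.

```python
def reflect_peaks(peak_masses, precursor_mass):
    """Return the original peaks together with their reflections, sorted by mass."""
    # Assumption: a peak at mass m is reflected to precursor_mass - m.
    reflected = [precursor_mass - m for m in peak_masses]
    # Keep only reflections that fall strictly inside the mass range.
    reflected = [m for m in reflected if 0.0 < m < precursor_mass]
    return sorted(list(peak_masses) + reflected)
```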

-m, --modifications <0|1>

If set to 1, the peaks presumed to correspond to water-loss ions will be removed from each spectrum. If set to 0, all the peaks will be kept.

Default: 1
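How water-loss peaks are recognized is not detailed here. A simple criterion, assumed for this sketch only, is to treat a peak as a water-loss peak when another peak lies one water mass (18.010565 Da, monoisotopic) above it within the mass tolerance. The function name remove_water_loss_peaks is hypothetical, and the quadratic search is kept for clarity rather than speed.

```python
WATER_MASS_DA = 18.010565  # monoisotopic mass of H2O in Da


def remove_water_loss_peaks(peak_masses, tolerance_mda=4.0):
    """Drop every peak that lies one water mass below another peak."""
    tolerance_da = tolerance_mda / 1000.0
    masses = sorted(peak_masses)
    kept = []
    for m in masses:
        # Assumption: m is a water-loss peak if some peak sits at m + H2O.
        has_parent = any(
            abs(other - (m + WATER_MASS_DA)) <= tolerance_da
            for other in masses
        )
        if not has_parent:
            kept.append(m)
    return kept
```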

-a, --aggregated string formation <0|1>

If set to 1, the aggregated strings will be computed from the de novo strings. If set to 0, no aggregated string formation will be performed.

Default: 1

-details, --details on the aggregated string formation <0|1>

If set to 1, the IDs of the spectra contributing to an aggregated string will be reported, for all the obtained aggregated strings. Can be set to 1 only if 'a' is set to 1. 

-gapped, --gapped string formation <0|1>

If set to 1, the gapped strings will be computed from the aggregated strings. If set to 0, no gapped string formation will be performed.

Default: 0

-bu, --type of the input data <0|1>

If set to 1, a high-resolution bottom-up data set is expected as input, and cysteines are assumed to be either carbamidomethylated or unmodified.
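When Twister is launched from a script, the argument list can be assembled programmatically. The sketch below is an illustration, not part of the tool: the helper name build_twister_command is hypothetical, and it assumes that every option is passed in the same -key=value form as -d in the usage line. It also enforces the one dependency between options stated above, namely that 'details' requires 'a' to be 1.

```python
def build_twister_command(input_dir, jar="Twister.jar", **options):
    """Assemble the argument list for a Twister run."""
    # 'a' defaults to 1, so 'details' is rejected only if 'a' is switched off.
    if options.get("details") == 1 and options.get("a", 1) != 1:
        raise ValueError("'details' can be set to 1 only if 'a' is set to 1")
    command = ["java", "-Xmx2G", "-jar", jar, f"-d={input_dir}"]
    # Assumption: all options follow the -key=value form.
    for key, value in options.items():
        command.append(f"-{key}={value}")
    return command
```

The returned list could then be handed to subprocess.run from the folder in which Twister.jar is located.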

Output

In the input directory, a subdirectory "results-Twister" is created, which contains the output files. The name of each output file starts with the short name "InputFolder" of the input folder.
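The expected output locations can be derived from the input path. The sketch below is a hedged illustration: the helper name expected_output_files is hypothetical, and the file name pattern "<InputFolder>_k<tag length>.<kind>.txt" is inferred from the sample output names listed under Test datasets rather than from a specification, so it may not hold for all option combinations.

```python
from pathlib import PurePosixPath


def expected_output_files(input_dir, k=4, details=False):
    """Return the output paths a run on input_dir is expected to produce."""
    folder = PurePosixPath(input_dir)
    out_dir = folder / "results-Twister"
    # Assumption: names start with the input folder's short name plus "_k<k>".
    prefix = f"{folder.name}_k{k}"
    names = [f"{prefix}.de-novo.txt", f"{prefix}.aggregated.txt"]
    if details:
        names.append(f"{prefix}.contributing-spectra.txt")
    return [out_dir / name for name in names]
```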

Test datasets

Top-down

Carbonic anhydrase 2 (CAH2): raw (436Mb), mzXML (80Mb), msalign (6Mb), sample output CAH2_k4.de-novo.txt and CAH2_k4.aggregated.txt for the default parameter settings, and CAH2_k4.contributing-spectra.txt for 'details' set to 1 and the default settings of other parameters.

Alemtuzumab: raw (966Mb), mzXML (181Mb), msalign (19Mb), sample output alemtuzumab_k4.de-novo.txt and alemtuzumab_k4.aggregated.txt for the default parameter settings, and alemtuzumab_k4.contributing-spectra.txt for 'details' set to 1 and the default settings of other parameters.

For both datasets, the mzXML files were obtained from the raw files using ReAdW and then passed to MS-Deconv. The resulting set of msalign files serves as the input for Twister.

Suppose the CAH2 dataset is stored in the folder /data/CAH2. From the folder in which the file Twister.jar is located, the sample output file CAH2_k4.de-novo.txt can be generated with the command

java -Xmx2G -jar Twister.jar -d="/data/CAH2"

or with the one that explicitly sets all the parameters to their default values:

java -Xmx2G -jar Twister.jar -d="/data/CAH2" -k="4" -e="4" -r="1" -m="1"

or with any command that explicitly sets only some of the parameters "k", "e", "r", "m" to their default values.

The file CAH2_k4.de-novo.txt will appear in the output folder "/data/CAH2/results-Twister", which will be created if it does not already exist.

Bottom-up

Carbonic anhydrase 2 (CAH2): raw (436Mb), mzXML (720Mb), msalign (55Mb).

HeLa: raw (769Mb), mzXML (131Mb), msalign (5Mb).

Publications and posters

K. Vyatkina, S. Wu, L. J. M. Dekker, M. M. VanDuijn, X. Liu, N. Tolic, M. Dvorkin, S. Alexandrova, T. M. Luider, L. Pasa-Tolic, and P. A. Pevzner, "De novo sequencing of peptides from top-down tandem mass spectra", Journal of Proteome Research, 14(11), pp. 4450-4462, 2015.

K. Vyatkina, S. Wu, L. J. M. Dekker, M. M. VanDuijn, X. Liu, N. Tolic, T. M. Luider, L. Pasa-Tolic, and P. A. Pevzner, "Top-down analysis of protein samples by de novo sequencing techniques", Bioinformatics, 32, pp. 2753-2759, 2016.

 

K. Vyatkina, L. J. M. Dekker, S. Wu, M. VanDuijn, V. Demyanyuk, X. Liu, M. Dvorkin, S. Alexandrova, N. Tolic, T. Luider, L. Pasa-Tolic, and P. A. Pevzner, "A Top-Driven Approach to De Novo Sequencing of Proteins", presented at the 62nd ASMS Conference on Mass Spectrometry and Allied Topics, June 15-19, 2014, Baltimore, MD, USA. (poster)