Sample preparation and data validation in Proteomics
8 – 12 March 2010
Warsaw, Poland
Organiser:
Michal Dadlez: Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
Draft
Report
Summary
The ESF School entitled “Sample preparation and data validation in Proteomics” aimed to increase
the awareness and capabilities of researchers in mass spectrometry based proteomics, leading to
better design of biological experiments. The course focused on data analysis (including statistical
methods), with elements of sample preparation, both of which are critical to a successful
experiment that takes advantage of mass spectrometry. Although the course was open to all
interested researchers, it was addressed especially to researchers already involved in proteomic
studies who do not work directly at MS facilities but have prior experience with, e.g., protein
purification. 21 participants were selected from several European and non-European Union
countries to attend the meeting. Most came from academic environments; two represented a
business-oriented organisation, an Innovation Centre. 12 of the participants were PhD students,
8 held a PhD and 1 was a full professor.
The meeting schedule was broadly divided into morning sessions, which were mostly devoted to
introductory lectures, and afternoon sessions providing hands-on training on various elements of
the proteomic experiment cycle. In the evenings there were additional sessions for participants who
wanted to discuss individual experiments outside the curriculum.
The outcomes so far are threefold. First, participants became more aware of mass spectrometry as a
technique in modern molecular biology, its advantages and limitations. Second, some of them
introduced changes in the design of their experiments to produce more reliable, higher quality
data. Participants who had previously cooperated with us learned how to analyse the data
efficiently, extracting valuable information while staying alert to false positives. Third, we have
begun a cooperation with 3 participants.
Scientific Content
The school was well-timed and the curriculum was carefully designed on the basis of our extensive
experience. Each topic began with a short theoretical introduction followed by simple exercises
meant to engage and encourage, moving smoothly into more advanced tasks. The first day of the
course was treated as the participants' arrival day, with only one talk in the afternoon, a sample
preparation session and organisational matters. The meeting started with an introductory talk by
Prof. Michal Dadlez, who presented the overall design of a proteomic experiment and how protein
identification is achieved. He emphasised how to avoid producing large amounts of useless proteomic
data. He also illustrated the role of statistics in analysing mass spectrometry data with an
example of a mass fingerprint experiment in which a protein can still be identified from randomly
selected lists of masses. He also introduced the idea of quantitative approaches, both labelling
techniques and label-free ones. Then there was a short session on organisational issues: we
discussed the agenda and course curriculum and briefly presented the Institute where the course
took place. Janusz Debski explained the idea of the meeting, pointing out that mass spectrometry is
often treated as a black box: samples are delivered to a core facility, somehow processed, and
produce some results, and it is not clear what should be done with them next. This situation is
quite common, as a mass spectrometry experiment consists of a number of steps requiring different
expertise. During the first day we also collected participants' samples and started to process
them.
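The remark from the introductory talk, that a protein can still be "identified" from a randomly
selected list of masses, can be illustrated with a small simulation. The Python sketch below is
only an illustration under stated assumptions, not the analysis shown at the school: the
theoretical peptide masses, the mass range, the tolerance and all function names are hypothetical.
It counts how many purely random peaks fall within a mass tolerance of the theoretical peptides of
one database entry, which is the background against which a real match count has to be judged.

```python
import random


def count_matches(observed, theoretical, tol_da=0.2):
    """Count observed masses lying within tol_da of any theoretical peptide mass."""
    return sum(
        any(abs(obs - theo) <= tol_da for theo in theoretical)
        for obs in observed
    )


def random_match_distribution(theoretical, n_peaks, trials=1000,
                              mass_range=(800.0, 3000.0), tol_da=0.2, seed=0):
    """Match counts obtained when the 'observed' peak list is purely random."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = []
    for _ in range(trials):
        peaks = [rng.uniform(*mass_range) for _ in range(n_peaks)]
        counts.append(count_matches(peaks, theoretical, tol_da))
    return counts


# Hypothetical theoretical peptide masses (Da) for a single database entry.
theoretical = [842.51, 1045.56, 1179.60, 1475.75, 1638.86, 2211.10, 2705.16]

counts = random_match_distribution(theoretical, n_peaks=30)
print("mean number of chance matches:", sum(counts) / len(counts))
print("largest number of chance matches:", max(counts))
```

Widening `tol_da` or lengthening the peak list raises the number of chance matches, which is one
reason a hit should be judged against what random data would produce rather than by the raw number
of matching masses.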
The second day of the meeting started with a lecture introducing the concepts of the proteome and
proteomics and the role of mass spectrometry in proteomics. Participants learnt the differences
between the classical and the proteomic approach to protein analysis, and different measurement
techniques were described: intact protein measurement and the top-down and bottom-up approaches.
Sample preparation issues were also discussed: what to expect from an ideal protocol, the most
efficient ways to clean up a sample from unwanted or interfering compounds, methods for protein
solubilisation, and which enzyme should be chosen given the aim of the experiment. We presented
non-standard but very promising methods for gaining better insight into the protein composition of
a sample, such as FASP (Filter Aided Sample Preparation), MudPIT (Multidimensional Protein
Identification Technology) and isoelectric focusing of peptides. Examples were shown of how the
above elements influence experimental conditions and how slight changes in the design can
dramatically change the output. The next lecture covered the fundamentals of mass spectrometry as a
technique, explaining crucial parameters such as mass accuracy, sensitivity and m/z and how they
relate to biological experiments, as well as the main types of mass spectrometers and which mass
analysers are suited to which type of experiment. Afterwards we discussed various strategies of
protein identification, beginning with the peptide mass fingerprint and moving on to more advanced
peptide fragmentation methods. As protein identification is directly bound to data analysis
software, the lecture covered the most commonly used search engines, with special emphasis on
Mascot, which has become a gold standard. After the lectures, participants had practical exercises
with the data analysis software Mascot. On this first day they were given data from a sample
protein mixture, which was easy to handle. Their aim was to produce a list of reliable protein
hits. They experimented with different Mascot parameters, observing how they influence the
results. A very important point was that, rather than just pointing out the most probable hits,
they had to explain how they came to that conclusion. As a result they really had to understand the
whole process of protein identification. This part of the course was very interactive, as
participants asked many questions.
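The link between enzyme choice and the peptide mass fingerprint discussed on the second day can be
made concrete with an in silico digestion. The Python sketch below is a minimal illustration, not
the procedure used by Mascot or by the course organisers: it assumes the usual simplified trypsin
rule (cleavage after K or R unless followed by P), uses standard monoisotopic residue masses
rounded to five decimals, and digests a made-up sequence. The function names are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water per peptide accounts for the free termini


def tryptic_digest(sequence, missed_cleavages=0):
    """Cleave after K or R unless the next residue is P (simplified trypsin rule)."""
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        last = i == len(sequence) - 1
        if last or (residue in "KR" and sequence[i + 1] != "P"):
            pieces.append(sequence[start:i + 1])
            start = i + 1
    # Join neighbouring pieces to model up to `missed_cleavages` missed sites.
    peptides = []
    for n in range(missed_cleavages + 1):
        for j in range(len(pieces) - n):
            peptides.append("".join(pieces[j:j + n + 1]))
    return peptides


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


# Made-up sequence, used only to show the cleavage rule at work.
for pep in tryptic_digest("MKRPLKAAR", missed_cleavages=1):
    print(pep, round(monoisotopic_mass(pep), 4))
```

Changing the cleavage rule in `tryptic_digest` to that of another enzyme gives a different peptide
list and therefore a different set of masses to match, which is why the choice of enzyme depends
on the aim of the experiment.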
The third day began with a lecture in which we moved from the user-friendly Mascot search results
output to raw spectra. The participants learnt how to read MS spectra of peptides and proteins; we
explained the concept of the isotopic and charge envelopes, why one peptide can generate a number
of spectra, what the resolution of a mass spectrometer is and why it matters in measurements of
biological samples, how the intensity of the spectra depends on peptide concentration, and what
the dynamic range is. Directly afterwards the course students were given the first set of
exercises, different MS spectra to be interpreted: they had to calculate the mass to charge value
of the molecules, deconvolute spectra from the m/z domain into the mass domain, etc. In the next
exercise they dealt with MS spectra of intact proteins, which present different features.
Importantly, all materials originated from the host laboratory's data and presented a variety of
real problems. For example, after calculating the mass of an intact protein (a recombinant protein
produced in bacteria), it turned out that two peaks with slightly different properties were visible
in the spectrum, instead of the single peak expected. After analysing the spectra, the participants
correctly concluded that both peaks correspond to the same protein, but that the bacteria had
processed the N-terminus of the protein in two ways. In the next exercise we switched to protein
identification from the raw data. Participants were asked to predict the sequence of a peptide on
the basis of its fragmentation spectrum and to identify the protein to which this peptide
corresponds. In the reciprocal task, they had to confirm the identified protein by simulating
digestion of the protein with a particular enzyme and checking whether any of the resulting
peptides is consistent with both the empirical fragmentation pattern and the MS spectrum. The aim
of these exercises was to understand the fundamentals of mass spectrometers and to gain some
practice in raw data analysis.
The following session was dedicated to the analysis of Mascot search results for a highly complex
protein mixture, where the choices were not as obvious as before, although this task again ended
with producing a list of reliable protein hits. We deliberately split Mascot data analysis over two
days because of the large amount of information that participants had to assimilate. Next, there
was a short lecture on the analysis of post-translational modifications. We explained what kinds of
modifications can be recognised by mass spectrometry, how they can be identified and localised
within the protein sequence, and what the limitations of the technique are in this respect. Then we
briefly described the function and structure of the most common modifications (methylation,
phosphorylation, glycosylation, common cysteine modifications) and how they can be analysed using
mass spectrometry. Some structural studies were also presented. After the lecture, participants
were given a couple of different exercises. First they were asked to localise a peptide
phosphorylation using the raw fragmentation spectra (without any help from search engines), so that
they could observe the mass shift of the fragment ions precisely and compare the spectrum with its
non-modified counterpart. Afterwards they analysed Mascot search results for a sample enriched in
phosphopeptides in order to identify the phosphorylation sites of the protein. This task was more
demanding, as fragmentation spectra of modified peptides are usually more complex. The last
exercise of the day was the most demanding one: we introduced participants to Mascot Error Tolerant
searches, a special kind of search in which most of the search parameters are relaxed. As a result,
the list of peptides assigned to the protein of choice is several times longer; the search proposes
various modifications of the protein, most of which are false assignments. Participants were asked
to propose a full map of the various protein modifications. Here, besides practising data analysis,
an important aspect was to look at the protein from a holistic perspective. Rather than looking at
an isolated peptide to confirm a modification, it is important to compare it with other peptides of
similar sequence but different modifications. The Error Tolerant task proved especially interesting
for participants, as some of them were dealing with similar problems. At the end we introduced a
slightly different approach, in which cross-analysis of a couple of samples may answer the
question. As an example, participants were asked to predict roughly the primary structure of a
protein, indicating the length of its N- and C-terminal domains. To answer, they had to analyse 4
independent search results and extract the relevant information.
Volunteers who still had some strength left carried out one more experiment: they tried to predict
potential components of a protein complex. In the first analysis they analysed proteins
specifically bound to the bait. In the second analysis, a potential interactor was immobilised and
the flow-through fraction was analysed. Proteins identified in both experiments (and absent from
the control samples) were considered potential candidates.
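The spectrum interpretation exercises of the third day (calculating m/z values and deconvoluting
from the m/z domain into the mass domain) rest on a few lines of arithmetic. The Python sketch
below is an illustration under stated assumptions, not the course material: it assumes positive
ions formed by protonation only, and the function names and the example mass are hypothetical. For
two neighbouring peaks of a charge envelope it recovers the charge and then the neutral mass; for
an isotopic envelope it estimates the charge from the peak spacing.

```python
PROTON = 1.007276        # mass of a proton, Da
ISOTOPE_STEP = 1.003355  # mass difference between 13C and 12C, Da


def mz(neutral_mass, charge):
    """m/z of a positive ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON) / charge


def neutral_mass(mz_value, charge):
    """Neutral mass recovered from an m/z value and a known charge."""
    return charge * (mz_value - PROTON)


def charge_from_adjacent_peaks(mz_high, mz_low):
    """Charge of the peak at mz_high, given the next peak of the charge
    envelope (one more proton, hence lower m/z) at mz_low."""
    return round((mz_low - PROTON) / (mz_high - mz_low))


def deconvolute(mz_high, mz_low):
    """Return (neutral mass, charge of the mz_high peak) for two adjacent peaks."""
    z = charge_from_adjacent_peaks(mz_high, mz_low)
    return neutral_mass(mz_high, z), z


def charge_from_isotope_spacing(spacing):
    """Charge estimated from the m/z spacing between neighbouring isotope peaks."""
    return round(ISOTOPE_STEP / spacing)


# Hypothetical intact protein of 16951.5 Da observed with 10 and 11 protons.
peak_10 = mz(16951.5, 10)
peak_11 = mz(16951.5, 11)
mass, z = deconvolute(peak_10, peak_11)
print(round(peak_10, 3), round(peak_11, 3), z, round(mass, 2))
```

Applied to every neighbouring pair in an envelope, the same calculation should give the same
neutral mass; a second series converging on a slightly different mass would correspond to a second
form of the protein, as in the N-terminal processing example described above.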
The next day of the meeting was dedicated principally to quantitative analysis. It began with an
introduction to label-free quantitation. We showed that some phenomena in biology are easier to
observe from a global point of view than by looking at individual proteins. Special attention was
paid to making sure the participants understood how quantitation is performed at the level of the
mass spectrometer, depending on the labelling method, and how to prepare a sample in order to
perform a high quality quantitative experiment. We stressed that the most important element in
these approaches is reproducible sample preparation, because in every analysis a normalisation
procedure is essential to identify differential proteins. The more accurate we are, the less
normalisation is required and the more sensitive our method is. The second aspect that needs to be
taken into account is that we must have at least some statistics: it is impossible to obtain high
quality results without confirming an observation in a number of replicates. At the moment many
mass spectrometry laboratories use their own, very often home-made software, so we focused on
explaining the principles of label-free quantitation rather than limiting the scope to our own
solutions. Nevertheless, after the lecture participants carried out a short data analysis of a
label-free quantitative experiment. Because of the time needed to perform all of the steps, some of
them were prepared in advance. The next lecture introduced the concept of labelling methods and
described those most commonly used: ICAT, SILAC, iTRAQ, mTRAQ etc. For these approaches more
dedicated software is available, and quantitation is also possible in Mascot; participants analysed
data produced with one of these methods. They had to identify proteins that distinguish a control
cell line from stressed cells; in each case they analysed two biological replicates in parallel.
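The role of normalisation and replicates in label-free quantitation can be sketched in a few
lines. The Python example below is a minimal illustration under stated assumptions and does not
reproduce the software used at the school: median normalisation of log2 intensities is just one
common choice, the protein names and intensities are invented, and the function names are
hypothetical. Each run is centred on its own median so that differences in sample loading cancel
out, and the fold change is then averaged over the replicates of each condition.

```python
import math
from statistics import mean, median


def log2_median_normalise(run):
    """Log2-transform one run's intensities and subtract the run median."""
    logs = {prot: math.log2(val) for prot, val in run.items() if val > 0}
    centre = median(logs.values())
    return {prot: v - centre for prot, v in logs.items()}


def log2_fold_changes(control_runs, treated_runs):
    """Mean normalised log2(treated / control) for proteins seen in every run."""
    control = [log2_median_normalise(r) for r in control_runs]
    treated = [log2_median_normalise(r) for r in treated_runs]
    shared = set.intersection(*(set(r) for r in control + treated))
    return {
        prot: mean(r[prot] for r in treated) - mean(r[prot] for r in control)
        for prot in sorted(shared)
    }


# Invented intensities: the second run of each condition was loaded differently,
# and only P3 really changes between the conditions.
control_runs = [{"P1": 100, "P2": 200, "P3": 400},
                {"P1": 200, "P2": 400, "P3": 800}]
treated_runs = [{"P1": 100, "P2": 200, "P3": 1600},
                {"P1": 50, "P2": 100, "P3": 800}]

fc = log2_fold_changes(control_runs, treated_runs)
for prot, value in fc.items():
    print(prot, round(value, 3))
```

With only three proteins the median is a crude reference; in a real dataset with many proteins it
is far more stable, and the spread of the values across replicates is what tells a genuine
difference from technical or biological variability.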
During the last day of the meeting there was one lecture, entitled “Emerging MS applications”. It
presented a couple of other applications of mass spectrometry that were beyond the curriculum. The
first was MSI, mass spectrometry imaging, a relatively new application in which tissue is cut into
slices (as for microscopy) and analysed directly by the mass spectrometer. The result is a spatial
distribution of masses within the tissue, which can be used to monitor changes within the tissue at
the molecular level. More detailed applications of top-down sequencing were also discussed, as well
as non-routine methods such as hydrogen/deuterium exchange. The rest of the day was set aside for
individual interactions with participants, who discussed their experiments and how they should be
adjusted.
Assessment of the results & impact of the event on the future direction of the field
The primary goal of the school was to familiarise researchers with mass spectrometry as a technique
in modern molecular biology, its advantages and limitations. To achieve the maximum impact, we
organised the course especially for those who are already involved in proteomic experiments but do
not work directly in mass spectrometry laboratories. Three main layers could be identified in the
course curriculum: (i) introductory lectures and exercises on mass spectrometry based proteomics,
(ii) exhaustive exercises in data analysis, ensuring that after the course participants would be
able to perform mass spectrometry data analysis on their own, (iii) interactions with participants
to discuss their scientific projects. Initially the main focus was placed on the second layer, as
it usually causes the most misunderstandings. To get participants more involved, we asked them to
bring their own samples so that, instead of working only on course materials, they would be dealing
with their own data. By the end, they were analysing the data very efficiently. Participants who
had previously cooperated with us learned how to prepare samples to meet the demands of mass
spectrometry and how to analyse the data efficiently, extracting valuable information while staying
alert to false positives. In the first layer we put the main emphasis on the drawbacks of each
technique, so that participants planning to perform mass spectrometry experiments would know which
equipment fits their needs best, what is doable and how to begin. Surprisingly, the third layer
exceeded our expectations, as it attracted a lot of interest. Many participants had varied, very
exciting projects investigating completely different problems, ranging from analysis of
S-glycosylation, through prediction of protein complexes, to analysis of proteins that incorporate
exogenous amino acids from phagocytosed bacteria. There were also, in our opinion, a few
spectacular examples of the course's impact: some participants decided to change their experiments
significantly, and one decided not to proceed with his samples after realising that his experiment
was not properly designed. We have also begun a scientific cooperation with 3 participants. At the
end of the meeting there was a suggestion to organise a similar course, but with a limited number
of participants and focused on phosphorylation analysis.
Programme
Monday
13:00 – 15:00 Participants arrival and registration
15:00 – 15:15 Inauguration of the course
15:15 – 16:30 Inauguration lecture
16:30 – 17:00 Establishing working groups, general affairs of the meeting, safety
issues
17:00 Get together party
Tuesday
09:00 – 10:00 Introduction to mass spectrometry
10:00 – 11:00 General sample preparation methods for MS measurements. Protein
precipitation and solubilisation techniques.
11:00 – 11:20 Coffee break
11:20 – 13:00 Fundamentals of mass spectrometry part I
13:00 – 14:00 Lunch Break
14:00 – 15:00 Protein identification by mass spectrometry
15:00 – 16:00 Introduction to Mascot – MS data analysis software
16:00 – 16:20 Coffee Break
16:20 – 18:00 Protein identification from low complexity samples
Wednesday
09:00 – 11:00 Fundamentals of mass spectrometry part II
11:00 – 11:20 Coffee Break
11:20 – 12:20 Interpretation of MS and MS/MS spectra
12:20 – 13:20 Search engines
13:20 – 14:00 Lunch break
14:00 – 16:00 Protein identification from highly complex samples
16:00 – 16:20 Coffee break
16:20 – 17:30 Identification of post-translational modifications
17:30 – 18:00 Discovery of protein complexes
Thursday
09:00 – 10:00 Introduction to label-free protein quantitation
10:00 – 11:00 Differences in protein profiles originating from technical and
biological variability or originating from different cell conditions.
11:00 – 11:20 Coffee Break
11:20 – 13:00 Application of statistical methods in proteome data analysis
13:00 – 14:00 Lunch Break
14:00 – 16:00 Quantitative analysis of label-free samples
16:00 – 16:20 Coffee Break
16:20 – 18:30 Quantitative analysis of iTRAQ labelled samples
Friday
09:00 – 10:00 Data extraction and protein interactions network building
10:00 – 10:30 MALDI imaging
10:30 – 11:30 Individual discussions – part I
11:30 – 12:30 Brown bag lunch
12:30 – 13:00 Individual discussions – part II
13:00 – 13:20 Thank God – it’s over.