Sample preparation and data validation in Proteomics
8 – 12 March 2010
Warsaw, Poland

Organiser:

Michal Dadlez: Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland

Draft Report

Summary

The ESF School entitled “Sample preparation and data validation in Proteomics” aimed to increase the awareness and capabilities of researchers in mass spectrometry based proteomics, leading to better design of biological experiments. The course focused on data analysis (including statistical methods), with elements of sample preparation: both are critical to a successful experiment that takes advantage of mass spectrometry. Although the course was open to all interested researchers, it was addressed especially to researchers already involved in proteomic studies who do not work directly at MS facilities but have prior experience with, e.g., protein purification. 21 participants were selected from several European Union and non-EU countries to attend the meeting. Most came from academic environments; two represented a business-oriented organisation, an Innovation Centre. 12 of the participants were PhD students, 8 held a PhD and 1 was a full professor.

The meeting schedule was broadly divided into morning sessions, mostly devoted to introductory lectures, and afternoon sessions providing hands-on training on various elements of the proteomic experiment cycle. In the evenings there were additional sessions for participants who wanted to discuss individual experiments that fell outside the curriculum.

The outcomes so far are threefold. First, participants became more aware of mass spectrometry as a technique in modern molecular biology, and of its advantages and limitations. Second, some of them changed the design of their experiments to produce more reliable, higher quality data. Participants who had previously cooperated with us learned how to analyse the data efficiently, extracting valuable information while staying alert to false positives. Third, we have begun a cooperation with 3 participants.

Scientific Content

The school was well timed and the curriculum was carefully developed on the basis of our extensive experience. Each topic began with a short theoretical introduction followed by simple exercises intended to engage and encourage, moving smoothly into more advanced tasks. The first day of the course was treated as the participants' arrival day, with only one talk in the afternoon, a sample preparation session and organisational matters. The meeting started with an introductory talk by Prof. Michal Dadlez, who presented the overall design of a proteomic experiment and how protein identification is achieved. He emphasised how to avoid producing large amounts of useless proteomic data. He also described the role of statistics in analysing mass spectrometry data, using the example of a mass fingerprint experiment in which a protein can still be identified from randomly selected lists of masses. He then introduced quantitative approaches, both label-based and label-free. A short session on organisational issues followed: we discussed the agenda and course curriculum and briefly presented the Institute where the course took place. Janusz Debski explained the idea behind the meeting, pointing out that mass spectrometry is often treated as a black box: samples are delivered to a core facility, somehow processed, and produce results with which it is not clear what should be done next. This situation is quite common, as a mass spectrometry experiment consists of a number of steps requiring different expertise. During the first day we also collected participants' samples and started to process them.

The second day started with a lecture introducing the concepts of the proteome and proteomics and the role of mass spectrometry in proteomics. Participants learnt the differences between the classical and the proteomic approach to protein analysis, and different measurement techniques were described: intact protein measurement and the top-down and bottom-up approaches. Sample preparation issues were also discussed: what to expect from an ideal protocol, the most efficient ways to clean up a sample from unwanted or interfering compounds, methods for protein solubilisation, and which enzyme should be chosen given the aim of the experiment. We presented non-standard but very promising methods for gaining better insight into the protein composition of a sample, such as FASP (Filter-Aided Sample Preparation), MudPIT (Multidimensional Protein Identification Technology) and isoelectric focusing of peptides. Examples were shown of how these elements influence experimental conditions and how slight changes in the design can dramatically change the output. The next lecture covered the fundamentals of mass spectrometry as a technique, explaining crucial parameters such as mass accuracy, sensitivity and m/z and how they relate to biological experiments, as well as the main types of mass spectrometers and which mass analysers are suited to which type of experiment. Afterwards we discussed various strategies of protein identification, beginning with peptide mass fingerprinting and moving on to more advanced peptide fragmentation methods. As protein identification is directly bound to data analysis software, the lecture covered the most commonly used search engines, with special emphasis on Mascot, which has become a gold standard.
After the lectures, participants had practical exercises with the data analysis software Mascot. On this first day they were given data from a sample protein mixture that was easy to handle. Their aim was to produce a list of reliable protein hits. They experimented with different Mascot parameters, observing how these influence the results. A very important point was that, rather than just pointing out the most probable hits, they had to explain how they came to their conclusions. As a result they really had to understand the whole process of protein identification. This part of the course was very interactive, as participants asked many questions.

The third day began with a lecture in which we moved from the user-friendly Mascot search results to the raw spectra. Participants learnt how to read MS spectra of peptides and proteins. We explained the concepts of the isotopic and charge envelopes, why one peptide can generate a number of spectra, what the resolution of a mass spectrometer is and why it matters when measuring biological samples, how signal intensity depends on peptide concentration, and what the dynamic range is. Directly afterwards the students were given the first set of exercises, different MS spectra to interpret: they had to calculate the mass-to-charge values of molecules, deconvolute spectra from the m/z domain into the mass domain, and so on. In the next exercise they dealt with MS spectra of intact proteins, which present different features. Importantly, all materials came from the host laboratory's data and presented a variety of real problems. For example, after calculating the mass of an intact protein (a recombinant protein produced in bacteria), two peaks with slightly different properties were visible in the spectrum instead of the single expected one. After analysing the spectra, participants correctly concluded that both peaks correspond to the same protein, but that the bacteria had processed its N-terminus in two ways. In the next exercise we switched to protein identification from the raw data. Participants were asked to predict the sequence of a peptide from its fragmentation spectrum and to identify the protein to which the peptide corresponds. In the reciprocal task, they had to confirm the identified protein by simulating its digestion with a particular enzyme and checking whether any of the resulting peptides was consistent with both the empirical fragmentation pattern and the MS spectrum.
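The m/z calculation and charge deconvolution exercises mentioned above can be illustrated with a minimal Python sketch. It is not the material used in the course: the function names and the example mass are hypothetical, and the code assumes positive-mode ions formed purely by protonation.

```python
PROTON_MASS = 1.007276      # Da, mass of a proton
ISOTOPE_SPACING = 1.003355  # Da, mass difference between 13C and 12C


def mz_from_mass(neutral_mass, charge):
    """m/z of a molecule of the given neutral mass carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz, charge):
    """Deconvolution of one peak: from the m/z domain back to the mass domain."""
    return charge * (mz - PROTON_MASS)


def charge_from_isotope_spacing(spacing):
    """Charge of a peptide ion from the distance between its isotopic peaks."""
    return round(ISOTOPE_SPACING / spacing)


def charge_from_adjacent_peaks(mz_low, mz_high):
    """Charge z of the peak at mz_high, given that mz_low carries z + 1 protons.

    Both peaks belong to the same charge envelope, so
    mz_high - mz_low = M / (z * (z + 1)) and mz_low - proton = M / (z + 1).
    """
    return round((mz_low - PROTON_MASS) / (mz_high - mz_low))


# Hypothetical intact protein mass, used only for illustration.
protein_mass = 16951.5
peak_10 = mz_from_mass(protein_mass, 10)
peak_11 = mz_from_mass(protein_mass, 11)

z = charge_from_adjacent_peaks(peak_11, peak_10)
print(z, mass_from_mz(peak_10, z))
```

Two neighbouring peaks of a charge envelope are enough to recover both the charge and the neutral mass. For peptides at sufficient resolution the isotopic spacing gives the charge directly, for instance a spacing of about 0.5 m/z units indicates a doubly charged ion.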
The aim of these exercises was to understand the fundamentals of mass spectrometers and to gain some practice in raw data analysis. The following session was dedicated to the analysis of Mascot search results for a highly complex protein mixture, where the choices were not as obvious as before, although this task again ended with producing a list of reliable protein hits. We deliberately split the Mascot data analysis over two days because of the large amount of information that participants had to assimilate. Next, there was a short lecture on the analysis of post-translational modifications. We explained what kinds of modifications can be recognised by mass spectrometry, how they can be identified and localised within the protein sequence, and what the limitations of the technique are in this respect. Then we briefly described the function and structure of the most common modifications (methylation, phosphorylation, glycosylation, common cysteine modifications) and how they can be analysed using mass spectrometry. Some structural studies were also presented. After the lecture, participants were given several different exercises. First they were asked to localise a phosphorylation within a peptide using the raw fragmentation spectrum (without any help from search engines), so that they could observe precisely the mass shift of the fragment ions and compare the spectrum with its non-modified counterpart. Afterwards they analysed Mascot search results for a sample enriched in phosphopeptides in order to pinpoint the phosphorylation sites of the protein. This task was more demanding, as fragmentation spectra of modified peptides are usually more complex. The last exercise of the day was the most demanding one: we introduced participants to Mascot Error Tolerant searches, a special kind of search in which most of the search parameters are relaxed.
As a result, the list of peptides assigned to the protein of interest is several times longer: the search proposes various modifications of the protein, most of which are false assignments.
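The manual phosphosite localisation exercise described earlier rests on a simple rule: only fragment ions that contain the modified residue are shifted by the mass of the phosphate group (about 79.966 Da). The sketch below illustrates that rule for singly charged b and y ions. The peptide sequence and the function names are hypothetical, and the masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276
PHOSPHO = 79.966331  # HPO3 added to Ser, Thr or Tyr


def fragment_ions(peptide, mods=None):
    """Singly charged b and y ion m/z values.

    `mods` maps a 1-based residue position to an added mass.
    Index i - 1 of each returned list holds b_i or y_i.
    """
    mods = mods or {}
    masses = [RESIDUE[aa] + mods.get(pos, 0.0)
              for pos, aa in enumerate(peptide, start=1)]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


def shifted_ions(peptide, site):
    """Numbers of the b and y ions that carry a phosphate placed at `site`."""
    b0, y0 = fragment_ions(peptide)
    b1, y1 = fragment_ions(peptide, {site: PHOSPHO})

    def shifted(plain, modified):
        return [i for i, (u, m) in enumerate(zip(plain, modified), start=1)
                if abs(m - u - PHOSPHO) < 1e-6]

    return shifted(b0, b1), shifted(y0, y1)


# Hypothetical peptide with a phosphoserine at position 3.
b_shift, y_shift = shifted_ions("AGSLK", 3)
print("shifted b ions:", b_shift)
print("shifted y ions:", y_shift)
```

Comparing the observed spectrum of the modified peptide with that of its unmodified counterpart and noting where the shift first appears in each ion series brackets the modified residue from both ends. Real spectra add complications this sketch ignores, such as neutral loss of phosphoric acid.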

Participants were asked to propose a full map of the protein's modifications. Here, besides practising data analysis, an important aspect was to look at the protein from a holistic perspective. Rather than looking at an isolated peptide to confirm a modification, it is important to compare it with other peptides of similar sequence but different modifications. The error tolerant task proved especially interesting for participants, as some of them were dealing with similar problems. At the end we introduced a slightly different approach, in which cross-analysis of several samples can answer the question. As an example, participants were asked to predict the primary structure of a protein, indicating the lengths of its N- and C-terminal domains. To answer this, they had to analyse 4 independent search results and extract the relevant information. Volunteers who still had some strength to work carried out one more exercise, trying to predict potential components of a protein complex. In the first analysis they examined proteins specifically bound to the bait. In the second analysis, a potential interactor was immobilised and the flow-through fraction was analysed. Proteins identified in both experiments (and absent from the control samples) were considered potential candidates.
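The selection rule used in the protein complex exercise (identified in both experiments, absent from the controls) amounts to simple set operations. A minimal sketch follows, with hypothetical protein identifiers and a hypothetical function name; it says nothing about how the hit lists themselves were filtered.

```python
def candidate_interactors(first_hits, second_hits, control_hits):
    """Proteins found in both experiments and in none of the control samples."""
    return (set(first_hits) & set(second_hits)) - set(control_hits)


# Hypothetical identifiers, for illustration only.
first_experiment = ["PROT_A", "PROT_B", "PROT_C", "KERATIN"]
second_experiment = ["PROT_B", "PROT_C", "PROT_D", "KERATIN"]
controls = ["KERATIN", "PROT_D"]

print(sorted(candidate_interactors(first_experiment, second_experiment, controls)))
```

Subtracting the controls last removes common contaminants that would otherwise pass the cross-identification step.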

The next day of the meeting was in principle dedicated to quantitative analysis. It began with an introduction to label-free quantitation. We showed that some phenomena in biology are easier to observe from a global point of view than by looking at individual proteins. Special attention was paid to ensuring that participants understood how quantitation is performed at the level of the mass spectrometer, depending on the labelling method, and how to prepare a sample for a high quality quantitative experiment. We stressed that the most important factor in these approaches is reproducible sample preparation, because in every analysis a normalisation procedure is essential for pinpointing differential proteins. The more accurate we are, the less normalisation is required and the more sensitive the method is. The second aspect that needs to be taken into account is that we need at least some statistics: it is impossible to obtain high quality results without confirming an observation in a number of replicates. At present many mass spectrometry laboratories use their own, very often home-made, software, so we focused on explaining the principles of label-free quantitation rather than limiting the scope to our own solutions. Nevertheless, after the lecture participants carried out a short data analysis of a label-free quantitative experiment. Because of the time needed to perform all of the steps, some of them had been prepared in advance. The next lecture introduced the concept of labelling methods and described those most commonly used: ICAT, SILAC, iTRAQ, mTRAQ etc. For these approaches more dedicated software is available, and quantitation is also possible in Mascot; participants analysed data produced with one of these methods. They had to pinpoint proteins that differed between a control cell line and stressed cells, in each case analysing two biological replicates in parallel.
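The two principles stressed above, normalisation between runs and confirmation in replicates, can be illustrated with a short Python sketch. It is not the software used in the course. It assumes that each run is a dictionary of positive protein intensities, that most proteins do not change (so that dividing by the run median is a reasonable normalisation), and it uses a deliberately strict, hypothetical criterion: a protein is reported only if every replicate of one condition differs from every replicate of the other by at least a chosen log2 threshold.

```python
import math
from statistics import mean, median


def median_normalise(run):
    """Divide all intensities of one run by the run median (corrects loading)."""
    m = median(run.values())
    return {protein: value / m for protein, value in run.items()}


def differential_proteins(control_runs, treated_runs, min_log2=1.0):
    """Mean log2 change of proteins that separate cleanly in all replicates."""
    control = [median_normalise(r) for r in control_runs]
    treated = [median_normalise(r) for r in treated_runs]
    # Only proteins quantified in every run are compared.
    shared = set.intersection(*(set(r) for r in control + treated))
    hits = {}
    for protein in sorted(shared):
        c = [math.log2(r[protein]) for r in control]
        t = [math.log2(r[protein]) for r in treated]
        up = min(t) - max(c) >= min_log2
        down = min(c) - max(t) >= min_log2
        if up or down:
            hits[protein] = mean(t) - mean(c)
    return hits


# Hypothetical intensities; the treated runs were loaded about twice as heavily.
control_runs = [
    {"A": 100, "B": 200, "C": 400, "D": 800, "E": 1600},
    {"A": 110, "B": 190, "C": 420, "D": 780, "E": 1500},
]
treated_runs = [
    {"A": 200, "B": 400, "C": 800, "D": 1600, "E": 12800},
    {"A": 210, "B": 410, "C": 790, "D": 1580, "E": 12000},
]

print(differential_proteins(control_runs, treated_runs))
```

Without the normalisation step every protein in this example would appear roughly twofold up, purely because of the difference in loading. With only two replicates per condition a formal significance test has little power, which is why the sketch falls back on a separation criterion.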

On the last day of the meeting there was one lecture, entitled “Emerging MS applications”. It presented several other applications of mass spectrometry that were beyond the curriculum. The first was MSI (mass spectrometry imaging), a relatively new application in which tissue is cut into a slice (as for microscopy) and analysed directly by the mass spectrometer. The result is a spatial distribution of masses within the tissue, which can be used to monitor changes in the tissue at the molecular level. More detailed applications of top-down sequencing were also discussed, as were non-routine methods such as hydrogen-deuterium exchange. The rest of the day was reserved for individual interactions with participants, who discussed their experiments and how they should be adjusted.

Assessment of the results & impact of the event on the future direction of the field

The primary goal of the school was to familiarise researchers with mass spectrometry as a technique in modern molecular biology, and with its advantages and limitations. To achieve the maximum impact, we organised the course especially for those who are already involved in proteomic experiments but do not work directly in mass spectrometry laboratories. Three main layers could be identified in the course curriculum: (i) introductory lectures and exercises on mass spectrometry based proteomics, (ii) exhaustive exercises in data analysis, ensuring that after the course participants would be capable of performing mass spectrometry data analysis on their own, (iii) interactions with participants to discuss their scientific projects. Initially the main focus was on the second layer, as it usually causes the most misunderstandings. To get participants more involved, we asked them to bring their own samples so that, instead of working only on course materials, they would be dealing with their own data. By the end, they were analysing the data very efficiently. Participants who had previously cooperated with us learned how to prepare samples to meet the demands of mass spectrometry and how to analyse the data efficiently, extracting valuable information while staying alert to false positives. In the first layer we put the main emphasis on the drawbacks of each technique, so that participants planning mass spectrometry experiments will know which equipment fits their needs best, what is doable and how to begin. Surprisingly, the third layer exceeded our expectations, as it attracted a lot of interest. Many participants had very exciting projects investigating completely different problems, ranging from the analysis of S-glycosylation, through the prediction of protein complexes, to the analysis of proteins that incorporate exogenous amino acids from phagocytosed bacteria.
In our opinion there were also a few spectacular examples of the course's impact: some participants decided to change their experiments significantly, and one decided not to proceed with his samples after realising that his experiment was not properly designed. We have also begun a scientific cooperation with 3 participants. At the end of the meeting it was suggested that a similar course be organised, with a limited number of participants and a focus on phosphorylation analysis.

Programme

Monday
13:00 – 15:00 Participants arrival and registration
15:00 – 15:15 Inauguration of the course
15:15 – 16:30 Inauguration lecture
16:30 – 17:00 Establishing working groups, general affairs of the meeting, safety issues
17:00 Get together party

Tuesday
09:00 – 10:00 Introduction to mass spectrometry
10:00 – 11:00 General sample preparation methods for MS measurements. Protein precipitation and solubilisation techniques.
11:00 – 11:20 Coffee break
11:20 – 13:00 Fundamentals of mass spectrometry part I
13:00 – 14:00 Lunch Break
14:00 – 15:00 Protein identification by mass spectrometry
15:00 – 16:00 Introduction to Mascot – MS data analysis software
16:00 – 16:20 Coffee Break
16:20 – 18:00 Protein identification from low complexity samples

Wednesday
09:00 – 11:00 Fundamentals of mass spectrometry part II
11:00 – 11:20 Coffee Break
11:20 – 12:20 Interpretation of MS and MS/MS spectra
12:20 – 13:20 Search engines
13:20 – 14:00 Lunch break
14:00 – 16:00 Protein identification from highly complex samples
16:00 – 16:20 Coffee break
16:20 – 17:30 Identification of post-translational modifications
17:30 – 18:00 Discovery of protein complexes

Thursday
09:00 – 10:00 Introduction to label-free protein quantitation
10:00 – 11:00 Differences in protein profiles originating from technical and biological variability or originating from different cell conditions.
11:00 – 11:20 Coffee Break
11:20 – 13:00 Application of statistical methods in proteome data analysis
13:00 – 14:00 Lunch Break
14:00 – 16:00 Quantitative analysis of label-free samples
16:00 – 16:20 Coffee Break
16:20 – 18:30 Quantitative analysis of iTRAQ labelled samples

Friday
09:00 – 10:00 Data extraction and protein interactions network building
10:00 – 10:30 MALDI imaging
10:30 – 11:30 Individual discussions – part I
11:30 – 12:30 Brown bag lunch
12:30 – 13:00 Individual discussions – part II
13:00 – 13:20 Thank God – it’s over.