The standard service provided by the CPC includes data pre-processing (feature extraction) to assess the quality of the acquired data. It does not include statistical analysis, although this can be requested separately.
Data pre-processing
The distinct metabolite signals have to be detected, extracted and collated into a single matrix so that the results can be compared, analysed and interpreted. This process of rendering raw analytical data into a consolidated matrix of features and intensities is called feature extraction. Because of the size and complexity of the datasets, automated computational tools are required.
- NMR spectra are processed using in-house Matlab scripts
- UPLC-MS raw data are processed using Progenesis QI
The resulting matrix is used to assess the analytical quality of the data.
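The collation step described above can be illustrated with a minimal Python sketch. It is not the in-house Matlab code or the Progenesis QI workflow. The sample names, feature labels, intensities and the `collate` helper are all hypothetical, and filling undetected features with 0.0 is an assumption made only to keep the example short.

```python
# Hypothetical per-sample feature lists: feature label -> intensity.
# All labels and values are invented for illustration only.
samples = {
    "sample_A": {"feat_1": 120.0, "feat_2": 30.5},
    "sample_B": {"feat_1": 98.0, "feat_3": 12.0},
}


def collate(samples, missing=0.0):
    """Return (sample_ids, feature_ids, matrix) with one row per sample.

    Features that were not detected in a sample are filled with `missing`
    (an assumption; real pipelines may treat missing values differently).
    """
    sample_ids = sorted(samples)
    # Union of every feature label seen in any sample, in a stable order.
    feature_ids = sorted({f for feats in samples.values() for f in feats})
    matrix = [
        [samples[s].get(f, missing) for f in feature_ids]
        for s in sample_ids
    ]
    return sample_ids, feature_ids, matrix


sample_ids, feature_ids, matrix = collate(samples)
for sample_id, row in zip(sample_ids, matrix):
    print(sample_id, row)
```

Each row of the resulting matrix corresponds to one sample and each column to one feature, which is the layout assumed by the analyses described below.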
Statistical analysis
When hundreds of samples are analysed, the matrices produced will be extremely large. Comparing the intensities of all the extracted features across all samples requires the application of sophisticated mathematical tools. Such comparisons can be made using classical univariate statistical approaches, but because each feature is tested separately, those approaches are likely to produce false positives. Multivariate statistical approaches are preferred in the metabonomics field, as they can detect correlation patterns between related metabolites and reduce the risk of false positive associations.
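The false positive problem of feature-by-feature testing can be made concrete with a short arithmetic sketch in Python. The number of features (5000) and the significance level (0.05) are illustrative assumptions, not CPC figures, and the family-wise calculation assumes the tests are independent, which is rarely true for correlated metabolite signals.

```python
def expected_false_positives(n_features, alpha):
    """Expected number of features flagged by chance alone when no feature
    truly differs between groups and each is tested at level alpha."""
    return n_features * alpha


def familywise_error_rate(n_features, alpha):
    """Probability of at least one false positive across n_features tests,
    assuming the tests are independent (a simplifying assumption)."""
    return 1.0 - (1.0 - alpha) ** n_features


# Illustrative values only.
efp = expected_false_positives(5000, 0.05)
fwer = familywise_error_rate(100, 0.05)
print(f"Expected false positives among 5000 features: {efp:.0f}")
print(f"Chance of at least one false positive in 100 tests: {fwer:.3f}")
```

With these assumed values, about 250 features would be expected to appear significant by chance even if no real difference existed.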
Several methods exist for the multivariate statistical analysis of metabolic profiling data matrices. In all of them, the complex dataset is simplified and the sources of variation in the study are highlighted. Unsupervised analyses, such as principal component analysis (PCA), do not require any metadata related to the study and are useful for detecting trends, correlated variables and outliers. The identification of the features responsible for the differences between the study groups is usually accomplished by applying supervised analysis methods, such as orthogonal projections to latent structures (OPLS), in which models of the data are built using the relevant metadata.
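As an illustration of the unsupervised case, the sketch below computes the first principal component of a small feature matrix using only the Python standard library. It is a teaching example under stated assumptions, not the method used by the CPC: the data are invented, the matrix is mean-centred without any further scaling, and the leading component is found by power iteration rather than by the decomposition routines a statistics package would normally use.

```python
import math

# Invented matrix: 4 samples (rows) x 3 features (columns).
data = [
    [1.0, 2.0, 0.5],
    [2.0, 4.1, 0.4],
    [3.0, 6.0, 0.6],
    [4.0, 8.2, 0.5],
]


def mean_centre(matrix):
    """Subtract each column mean so every feature has mean zero."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in matrix]


def covariance(centred):
    """Sample covariance matrix (features x features) of centred data."""
    n, p = len(centred), len(centred[0])
    return [
        [sum(row[i] * row[j] for row in centred) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_component(cov, n_iter=500):
    """Leading eigenvector of cov by power iteration (unit length).

    The start vector is arbitrary; power iteration fails if it happens to
    be exactly orthogonal to the leading eigenvector.
    """
    p = len(cov)
    v = [float(j + 1) for j in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance in the data
        v = [x / norm for x in w]
    return v


def scores(centred, loading):
    """Project each sample onto the component to obtain its score."""
    return [sum(x * l for x, l in zip(row, loading)) for row in centred]


centred = mean_centre(data)
loading = first_component(covariance(centred))
pc1_scores = scores(centred, loading)
print("PC1 loadings:", [round(x, 3) for x in loading])
print("PC1 scores:", [round(x, 3) for x in pc1_scores])
```

In this invented dataset the first two features rise together across the samples, so they dominate the loadings of the first component, while the scores order the samples along that shared trend. No group labels or other metadata are used at any point.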