Multi-block, Multi-set, Multi-level and Data Fusion Methods
Course Description
This course considers a variety of model forms that can be used on data sets that do not fit the conventional 2-way approaches, such as Principal Components Analysis (PCA) and Partial Least Squares (PLS) regression, or multi-way approaches, such as Parallel Factor Analysis (PARAFAC) and Multi-way (unfold) PCA (MPCA). This includes instances where data is in distinctly different blocks that share a common mode, in which case the multi-block variants of PCA and PLS may be appropriate. Another case is semi-batch data, where processes are run for periods of time and then “reset”, e.g., when catalysts are regenerated or the process equipment is cleaned. In these cases, tools such as Simultaneous Components Analysis (SCA) or Multi-level SCA (MLSCA) may be used to understand the differences between runs and the variation within runs. Multi-level PLS variants are also potentially useful. Finally, the course considers instances where data sets consisting of blocks with different numbers of modes must be fused. For example, a 3-way data set might share a mode with a 2-way data set, such as when a number of batch data records must be related to multiple quality parameters. In these instances, models based on coupled matrix and tensor factorizations (CMTF) may be applied. The course includes hands-on computer time using MATLAB and PLS_Toolbox so that participants can better understand the differences between the various options.
Prerequisites
Linear Algebra for Chemometricians, MATLAB for Chemometricians, Chemometrics I - PCA, and Chemometrics II - Regression and PLS, or equivalent experience. Introduction to Multi-way Analysis may also be useful.
Course Outline
1. Introduction
1.1 Definition of multi-block, multi-set and multi-level
1.2 Goals of Data Fusion
2. Review of Multivariate and Multi-way Models
2.1 Principal Components Analysis (PCA)
2.2 Partial Least Squares (PLS) Regression
2.3 Multi-way (unfold) PCA
2.4 Parallel Factor Analysis (PARAFAC)
3. Multi-set and Multi-Level Models
3.1 Simultaneous Components Analysis (SCA)
3.2 ANOVA SCA (ASCA)
3.3 Multi-level SCA (MLSCA)
3.4 Multi-level PLS (MLPLS)
3.5 Example and hands-on exercises
4. Data Fusion Models
4.1 Coupled Matrix-Tensor Factorizations (CMTF)
4.2 Approaches for identifying CMTF models
4.3 Alternating Least Squares (ALS)
4.4 Direct optimization
4.5 Examples and hands-on exercises
5. Conclusions