@ALDAOperator(genericExecutionMode=ALL, level=STANDARD, allowBatchMode=false, shortDescription="This class implements the Karhunen-Loeve transformation, also known as PCA.") public class PCA extends MTBOperator
Given a data matrix $A$ where each column contains a data vector, first the covariance matrix of the data, i.e., $A\cdot A^T$, is calculated. Then the eigenvalues $\lambda_i$ and eigenvectors $v_i$ of this matrix are computed according to $A\cdot A^T\cdot v_i = \lambda_i\cdot v_i$.
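If the dimensionality of the data is larger than the number of available samples, i.e., the input data matrix has more rows than columns, the calculations are simplified by using the matrix $A^T\cdot A$ instead of the covariance matrix, which is the larger of the two in this case. For the eigenvalues $\mu_i$ and eigenvectors $u_i$ of this matrix the following equation holds: $A^T\cdot A\cdot u_i = \mu_i\cdot u_i$. Multiplying from the left by $A$ gives $A\cdot A^T\cdot (A\cdot u_i) = \mu_i\cdot (A\cdot u_i)$, so each $A\cdot u_i$ is an eigenvector of the covariance matrix with the same eigenvalue.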
Note that if no proper sub-space can be determined, e.g., because only a single data item is provided, no transformation is applied and the output data is identical to the input data.
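The shortcut for data with more dimensions than samples can be checked numerically. The following is a minimal sketch, not the code of this class: the class name `GramTrickSketch` and all of its methods are hypothetical, and the matrix is assumed to be stored as `a[row][column]` with one sample per column. It lifts an eigenvector of the small matrix $A^T\cdot A$ to the data space and measures how well the result satisfies the eigenvalue equation of $A\cdot A^T$.

```java
/** Illustrative sketch of the small-matrix shortcut; not the MiToBo code. */
final class GramTrickSketch {

    /** Computes m * v for a matrix stored as m[row][col]. */
    static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < v.length; c++) {
                out[r] += m[r][c] * v[c];
            }
        }
        return out;
    }

    /** Computes m^T * v without building the transposed matrix. */
    static double[] multiplyTransposed(double[][] m, double[] v) {
        double[] out = new double[m[0].length];
        for (int r = 0; r < m.length; r++) {
            for (int c = 0; c < m[0].length; c++) {
                out[c] += m[r][c] * v[r];
            }
        }
        return out;
    }

    /** Maps an eigenvector u of A^T A to the unit-length vector A u. */
    static double[] liftEigenvector(double[][] a, double[] u) {
        double[] v = multiply(a, u);
        double norm = 0.0;
        for (double x : v) {
            norm += x * x;
        }
        norm = Math.sqrt(norm);
        if (norm == 0.0) {
            return v; // eigenvalue zero: A u vanishes, nothing to normalize
        }
        for (int i = 0; i < v.length; i++) {
            v[i] /= norm;
        }
        return v;
    }

    /** Largest absolute entry of (A A^T) v - mu v, evaluated as A (A^T v). */
    static double residual(double[][] a, double mu, double[] v) {
        double[] aatv = multiply(a, multiplyTransposed(a, v));
        double worst = 0.0;
        for (int i = 0; i < v.length; i++) {
            worst = Math.max(worst, Math.abs(aatv[i] - mu * v[i]));
        }
        return worst;
    }

    public static void main(String[] args) {
        // Three-dimensional data, two samples (columns); A^T A = diag(3, 2).
        double[][] a = {{1, 1}, {1, -1}, {1, 0}};
        double[] v = liftEigenvector(a, new double[] {1, 0});
        System.out.println("residual = " + residual(a, 3.0, v));
    }
}
```

In the example the small matrix is only 2x2 while the covariance matrix would be 3x3; the saving grows with the gap between dimensionality and sample count.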
Modifier and Type | Class and Description |
---|---|
static class | PCA.ReductionMode: Available modes for determining the sub-space dimensionality. |
Modifier and Type | Field and Description |
---|---|
protected Jama.Matrix | C: Covariance matrix calculated from mean-free data. |
private int | componentNum: Number of sub-space components in mode ReductionMode.NUMBER_COMPONENTS. |
protected int | dataDim: Dimensionality of the input data. |
private double[][] | dataset: Input data with each column containing a data vector. |
protected double[] | eigenVals: Set of computed eigenvalues. |
protected Jama.Matrix | eigenVects: Matrix of eigenvectors, each column containing a vector. |
private boolean | isMeanFree: Flag for indicating if input data is already mean-free. |
protected double[] | mean: Average vector of input dataset. |
protected double[][] | meanfreeData: Normalized, i.e., mean-free, dataset. |
protected Jama.Matrix | meanfreeDataMatrix: Normalized, i.e., mean-free, data matrix. |
private PCA.ReductionMode | mode: Mode for dimension reduction, i.e., how to determine the sub-space dimensionality. |
protected Jama.Matrix | P_t: The final transformation matrix to be used for dimension reduction. |
private double | percentageVar: Variance fraction for automatic dimension selection in mode ReductionMode.PERCENTAGE_VARIANCE. |
private double[][] | resultData: Resulting data set with each column containing a data vector. |
protected int | sampleCount: Number of data samples in input data. |
protected int | subDim: Dimensionality of the sub-space as either specified by the user or automatically determined based on the percentage of variance. |
Constructor and Description |
---|
PCA(): Default constructor. |
Modifier and Type | Method and Description |
---|---|
protected void | calculateCovarianceMatrixAndEigenstuff(): Calculates covariance matrix and eigenvalues and -vectors. |
protected void | calculateMeanFreeData(): Computes the average data vector and makes data mean-free. |
protected void | determineSubspaceDimension(): Determines desired sub-space dimensionality according to selected mode. |
protected void | doDimensionReduction(): Does the actual dimension reduction by data projection into sub-space. |
protected void | examineDataset(): Extracts number of samples and their dimension from dataset. |
String | getDocumentation() |
double[] | getEigenvalues(): Get calculated eigenvalues in ascending order. |
double[][] | getEigenvects(): Get calculated eigenvectors, one vector per column, in ascending order. |
double[][] | getResultData(): Get the transformed dataset. |
protected void | operate(): This method does the actual work. |
void | setDataset(double[][] ds): Specify an input dataset. |
void | setMeanFreeData(boolean b): Set flag to indicate if data is already mean-free. |
void | setNumberOfComponents(int compNum): Number of sub-space components if reduction mode is NUMBER_COMPONENTS. |
void | setPercentageOfVariance(double p): Fraction of variance to be represented in the sub-space if the reduction mode is PERCENTAGE_VARIANCE. |
void | setReductionMode(PCA.ReductionMode rm): Specify the mode for selecting the sub-space dimensionality. |
Methods inherited from class MTBOperator: readResolve
Methods inherited from class de.unihalle.informatik.Alida.operator.ALDOperator: addOperatorExecutionProgressEventListener, addParameter, addParameter, addParameterUnconditioned, fieldContained, fireOperatorExecutionProgressEvent, getALDPortHashAccessKey, getConstructionMode, getHidingMode, getInactiveParameterNames, getInInoutNames, getInInoutNames, getInNames, getInOutNames, getMissingRequiredInputs, getName, getNumParameters, getOutInoutNames, getOutNames, getParameter, getParameterDescriptor, getParameterDescriptorUnconditioned, getParameterNames, getParameterUnconditioned, getSupplementalNames, getVerbose, getVersion, handleOperatorExecutionProgressEvent, hasInOutParameters, hasParameter, isAnnotatedParameter, isConfigured, print, print, print, printInterface, printInterface, readHistory, reinitializeParameterDescriptors, removeOperatorExecutionProgressEventListener, removeParameter, runOp, runOp, runOp, setConstructionMode, setConstructionMode, setConstructionMode, setHidingMode, setName, setParameter, setParameterUnconditioned, setVerbose, toStringVerbose, unconfiguredItems, validate, validateCustom, validateGeneric, writeHistory, writeHistory, writeHistory
@Parameter(label="Dataset", required=true, dataIOOrder=-1, direction=IN, description="Dataset.") private double[][] dataset
@Parameter(label="Is data mean-free?", required=true, direction=IN, dataIOOrder=1, mode=ADVANCED, description="Set to true, if data is already mean-free.") private boolean isMeanFree
@Parameter(label="Reduction Mode", required=true, dataIOOrder=2, direction=IN, description="Mode.") private PCA.ReductionMode mode
@Parameter(label="Number of Components", required=true, direction=IN, dataIOOrder=3, description="Number of components, i.e., sub-space dimensionality.") private int componentNum
Number of sub-space components in mode ReductionMode.NUMBER_COMPONENTS.
@Parameter(label="Variance fraction", required=true, direction=IN, dataIOOrder=4, description="Percentage of data variance to be contained in sub-space.") private double percentageVar
Variance fraction for automatic dimension selection in mode ReductionMode.PERCENTAGE_VARIANCE.
@Parameter(label="Result Dataset", required=true, direction=OUT, description="Result dataset.") private transient double[][] resultData
protected int dataDim
protected int sampleCount
protected transient double[] mean
protected transient double[][] meanfreeData
protected transient Jama.Matrix meanfreeDataMatrix
protected transient Jama.Matrix C
The scaling by the number of samples is omitted here as this is just a constant factor in eigenvalue and -vector calculations.
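Centering and the unscaled covariance matrix can be sketched as follows. This is an illustration under assumptions, not the code of this class: the class `MeanFreeCovarianceSketch` and its method names are hypothetical, plain arrays are used instead of `Jama.Matrix`, and the data is assumed to be laid out as `data[dimension][sample]`, i.e., one data vector per column.

```java
/** Illustrative sketch of centering and unscaled covariance; not the MiToBo code. */
final class MeanFreeCovarianceSketch {

    /** Average data vector of a dataset stored as data[dimension][sample]. */
    static double[] mean(double[][] data) {
        double[] mean = new double[data.length];
        for (int d = 0; d < data.length; d++) {
            double sum = 0.0;
            for (double value : data[d]) {
                sum += value;
            }
            mean[d] = sum / data[d].length;
        }
        return mean;
    }

    /** Returns a mean-free copy of the data; the input is left untouched. */
    static double[][] subtractMean(double[][] data, double[] mean) {
        double[][] centered = new double[data.length][];
        for (int d = 0; d < data.length; d++) {
            centered[d] = new double[data[d].length];
            for (int s = 0; s < data[d].length; s++) {
                centered[d][s] = data[d][s] - mean[d];
            }
        }
        return centered;
    }

    /** C = A * A^T for mean-free A, without division by the sample count. */
    static double[][] unscaledCovariance(double[][] a) {
        int dim = a.length;
        double[][] c = new double[dim][dim];
        for (int i = 0; i < dim; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = 0.0;
                for (int s = 0; s < a[i].length; s++) {
                    sum += a[i][s] * a[j][s];
                }
                c[i][j] = sum;
                c[j][i] = sum; // the matrix is symmetric
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2, 3}, {2, 4, 6}};
        double[][] c = unscaledCovariance(subtractMean(data, mean(data)));
        System.out.println(java.util.Arrays.deepToString(c));
    }
}
```

Dividing by the number of samples would multiply every eigenvalue by the same constant and leave the eigenvectors unchanged, so the ratios between eigenvalues are the same with or without the scaling.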
protected transient double[] eigenVals
Note that the values are in ascending order.
protected transient Jama.Matrix eigenVects
The vectors are sorted according to their eigenvalues, i.e., the vector corresponding to the largest eigenvalue can be found in the last column.
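Because the eigenvalues are stored in ascending order, selecting a sub-space dimension from a variance fraction means accumulating from the end of the array. The sketch below shows one plausible selection rule and is not taken from this class: the class `SubspaceDimensionSketch` and its method are hypothetical, the fraction is assumed to lie in [0, 1], and the rule (the smallest number of leading components whose eigenvalue sum reaches the fraction) is an assumption.

```java
/** Illustrative sketch of variance-based dimension selection; not the MiToBo code. */
final class SubspaceDimensionSketch {

    /**
     * Returns the smallest number of components, taken from the largest
     * eigenvalue downwards, whose eigenvalue sum reaches the given fraction
     * of the total. Eigenvalues are expected in ascending order.
     */
    static int dimensionForVarianceFraction(double[] ascendingEigenvalues,
                                            double fraction) {
        double total = 0.0;
        for (double value : ascendingEigenvalues) {
            total += value;
        }
        if (total <= 0.0) {
            return 0; // no variance to represent
        }
        double accumulated = 0.0;
        int count = 0;
        // Walk backwards: the largest eigenvalue sits at the last index.
        for (int i = ascendingEigenvalues.length - 1; i >= 0; i--) {
            accumulated += ascendingEigenvalues[i];
            count++;
            if (accumulated / total >= fraction) {
                break;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[] eigenvalues = {1.0, 2.0, 3.0, 4.0};
        System.out.println(dimensionForVarianceFraction(eigenvalues, 0.65));
    }
}
```

For the eigenvalues 1, 2, 3, 4 and a fraction of 0.65, the two largest eigenvalues account for 7 of 10, so the method returns 2.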
protected transient int subDim
protected transient Jama.Matrix P_t
This matrix is already transposed, i.e., each row contains a sub-space basis vector and the number of rows is equal to the dimension of the sub-space.
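How a transposed transformation matrix of this shape can be built and applied is sketched below. This is an illustration under assumptions, not the code of this class: the class `ProjectionSketch` and its methods are hypothetical, plain arrays replace `Jama.Matrix`, and placing the basis vector with the largest eigenvalue in the first row is an assumption about the row order.

```java
/** Illustrative sketch of sub-space projection; not the MiToBo code. */
final class ProjectionSketch {

    /**
     * Builds a k x dim matrix whose rows are the last k columns of
     * eigenVects (stored as eigenVects[dimension][index], sorted by
     * ascending eigenvalue), starting with the largest eigenvalue.
     */
    static double[][] transformationMatrix(double[][] eigenVects, int k) {
        int dim = eigenVects.length;
        int count = eigenVects[0].length;
        double[][] pT = new double[k][dim];
        for (int r = 0; r < k; r++) {
            int column = count - 1 - r;
            for (int d = 0; d < dim; d++) {
                pT[r][d] = eigenVects[d][column];
            }
        }
        return pT;
    }

    /** Projects mean-free data (dim x samples) to the sub-space (k x samples). */
    static double[][] project(double[][] pT, double[][] meanFree) {
        int samples = meanFree[0].length;
        double[][] result = new double[pT.length][samples];
        for (int r = 0; r < pT.length; r++) {
            for (int s = 0; s < samples; s++) {
                double sum = 0.0;
                for (int d = 0; d < meanFree.length; d++) {
                    sum += pT[r][d] * meanFree[d][s];
                }
                result[r][s] = sum;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] eigenVects = {{1, 0}, {0, 1}};
        double[][] meanFree = {{-1, 0, 1}, {-2, 0, 2}};
        double[][] reduced = project(transformationMatrix(eigenVects, 1), meanFree);
        System.out.println(java.util.Arrays.deepToString(reduced));
    }
}
```

Storing the basis vectors as rows means the projection is a single product of the transformation matrix with the mean-free data matrix, and the result again holds one data vector per column.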
public PCA() throws de.unihalle.informatik.Alida.exceptions.ALDOperatorException
Throws: de.unihalle.informatik.Alida.exceptions.ALDOperatorException
public void setDataset(double[][] ds)
ds - Dataset to process.
public void setMeanFreeData(boolean b)
b - If true, the input data is assumed to be mean-free already.
public void setReductionMode(PCA.ReductionMode rm)
rm - Mode for dimension reduction.
public void setNumberOfComponents(int compNum)
compNum - Number of components, i.e., eigenvectors, to use.
public void setPercentageOfVariance(double p)
p - Fraction of variance to represent in sub-space.
public double[][] getResultData()
public double[] getEigenvalues()
public double[][] getEigenvects()
protected void operate()
operate in class de.unihalle.informatik.Alida.operator.ALDOperator
protected void examineDataset()
protected void calculateMeanFreeData()
protected void calculateCovarianceMatrixAndEigenstuff()
protected void determineSubspaceDimension()
protected void doDimensionReduction()
public String getDocumentation()
getDocumentation in class de.unihalle.informatik.Alida.operator.ALDOperator
Copyright © 2010–2020 Martin Luther University Halle-Wittenberg, Institute of Computer Science, Pattern Recognition and Bioinformatics. All rights reserved.