Does variance maximization always work?
N=250
Ca<-cbind(rnorm(N,mean=5,sd=6.85),rnorm(N,mean=8,sd=0.3))
Cb<-cbind(rnorm(N,mean=11,sd=7.5),rnorm(N,mean=2,sd=0.6))
X<-rbind(Ca,Cb)
Theta=0.26
R=matrix(c(cos(Theta),-sin(Theta),sin(Theta),cos(Theta)),2,2)
X=X%*%R
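Note that matrix() fills column-major, so $R = \begin{pmatrix}\cos\theta & \sin\theta \\ -\sin\theta & \cos\theta\end{pmatrix}$; right-multiplying the row vectors of X by this matrix rotates every sample counter-clockwise by $\theta = 0.26$ rad.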
cat(" Glimpse of Dataset X: \n")
print(X[1:5,])
cat("\n Dimension of Dataset: \t Samples:",dim(X)[1],"\t Features:",dim(X)[2])
class=c(rep(1,N),rep(2,N))
colvec = c("darkseagreen3","coral3")[class]
pchs= c(24,22)[class]
plot(X,col="black",bg=colvec,pch=pchs,xlab="Feature1",ylab="Feature2",main="Scatter plot of Data")
pc=prcomp(X, center=TRUE, scale=FALSE, retx=TRUE)
cat("\n Directions/Eigenvectors: \n")
print(pc$rotation)
cat("\n Principal Components: \n")
print(pc$x[1:5,])
cat("\n Variance along principal components: ")
cat(pc$sdev^2)
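These variances are simply the eigenvalues of the sample covariance matrix, which is easy to confirm:
# pc$sdev^2 should match eigen(cov(X))$values (both use the n-1 convention)
cat("\n Eigenvalues of cov(X):", eigen(cov(X))$values, "\n")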
xm=min(X[,1])
ym=min(X[,2])
ProjPC1=cbind(pc$x[,1],rep(ym,nrow(X)))
plot(X,col="black",bg=colvec,pch=pchs,xlab="Feature1",ylab="Feature2",main="Projection of Data along PC1")
points(ProjPC1,col="black",bg=colvec,pch=pchs)
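For contrast, a minimal sketch reusing pc from above: projecting onto PC2, the minimum-variance direction, keeps the two classes well apart, so the direction of maximum variance is not necessarily the most discriminative one.
# Project onto PC2 instead: the low-variance direction separates the classes
ProjPC2=cbind(pc$x[,2],rep(ym,nrow(X)))
plot(X,col="black",bg=colvec,pch=pchs,xlab="Feature1",ylab="Feature2",main="Projection of Data along PC2")
points(ProjPC2,col="black",bg=colvec,pch=pchs)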
Project onto a lower dimension that is a linear combination of the original features and also retains class information.
LDA Objective
Find a direction $v$, a linear combination of the original features, that maximizes class separability of the projected data.
Projection of sample $x_i$: $y_i = v^\top x_i$.
Let $\mu_1, \mu_2$ be the class means and $S_1, S_2$ the class covariance matrices; define the within-class scatter $S_w = S_1 + S_2$ and the between-class scatter $S_b = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^\top$.
Maximize the Fisher Discriminant Ratio (FDR) on the projected data:
$$ J(v) = \frac{v^\top S_b v}{v^\top S_w v} $$
Solution: $v$ is given by the eigenvector of $S_w^{-1} S_b$ corresponding to the largest eigenvalue.
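Why an eigenvector? Setting the gradient of $J(v)$ to zero supplies the missing step:
$$ \frac{\partial J}{\partial v} = \frac{2\,S_b v\,(v^\top S_w v) - 2\,S_w v\,(v^\top S_b v)}{(v^\top S_w v)^2} = 0 \;\Longrightarrow\; S_b v = J(v)\, S_w v \;\Longrightarrow\; S_w^{-1} S_b\, v = J(v)\, v $$
so every stationary $v$ is an eigenvector of $S_w^{-1} S_b$ with eigenvalue $J(v)$, and the eigenvector with the largest eigenvalue maximizes the ratio.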
N=nrow(X)
nclass=length(unique(class))
mu0<-colMeans(X)
S<-list()
Mu<-list()
Sw=0
for(i in 1:nclass)
{
wi=X[which(class==i),]
Ni=nrow(wi)
Mu[[i]]<-colMeans(wi)
S[[i]]<-cov(wi)
cat("\n\n Class ",i,": ")
cat("\n Mean Mu: ",Mu[[i]])
cat("\n Covariance Matrix: \n")
print(S[[i]])
Sw=Sw+S[[i]]
}
MuDiff=Mu[[1]]-Mu[[2]]
Sb=outer(MuDiff,MuDiff)
cat("\n Within cluster scatter Sw: \n")
print(Sw)
cat("\n Between cluster scatter Sb: \n")
print(Sb)
J=solve(Sw)%*%Sb
eigJ=eigen(J)
cat(" Eigenvalues of J: ",eigJ$values)
cat("\n Eigenvectors of J: \n\n")
print(eigJ$vectors)
v=eigJ$vectors[,1,drop=FALSE]
cat("\n LDA direction v= \n")
print(v)
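As a cross-check, a sketch assuming the MASS package is available: MASS::lda normalizes its discriminant differently, so the two directions should agree only up to sign and scale.
library(MASS)   # assumes MASS is installed
fit <- lda(X, grouping = factor(class))
cat("\n MASS::lda scaling (first discriminant):\n")
print(fit$scaling[,1])
cat("\n Elementwise ratio v / scaling (roughly constant if the directions match):\n")
print(v[,1] / fit$scaling[,1])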
Eigenvalues of $J = S_w^{-1} S_b$
In the two-class case $S_b = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^\top$, where $\mu_1, \mu_2$ are the class means.
Rank of $S_b$ is 1 (outer product of two vectors).
Rank of $S_w^{-1} S_b$ is also at most 1, so at most 1 eigenvalue of $J$ will be non-zero.
Between-class scatter matrix for $c$ classes: $S_b = \sum_{i=1}^{c} P_i (\mu_i - \mu_0)(\mu_i - \mu_0)^\top$.
Additional constraint on the global mean: $\mu_0 = \sum_{i=1}^{c} P_i \mu_i$, so the centered means $\mu_i - \mu_0$ are linearly dependent.
Rank of $S_b$ is at most $c - 1$, so at most $c - 1$ eigenvalues of $J$ will be non-zero.
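A quick numerical check of this rank argument on the two-class example above (Sb and J are still in memory):
# Sb is an outer product of two vectors, so both it and J have rank 1
cat("\n rank(Sb) =", qr(Sb)$rank, "\t rank(J) =", qr(J)$rank, "\n")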
First LDA component
y=X%*%v
cat("\n LDA projection y= \n ")
cat(y[1:8],sep="\n")
ldaMat=cbind(y,rep(0,nrow(X)))
plot(ldaMat,col="black",bg=colvec,pch=pchs,ylab="",xlab="",main="LDA projection")
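To quantify how much better this direction is than the maximum-variance one, a small sketch (the helper fdr below is ad hoc, not part of the notes) comparing the Fisher ratio along PC1 with the ratio along the LDA direction:
# Fisher ratio of a 1-D projection z: (gap between class means)^2 / (sum of class variances)
fdr <- function(z, cls) {
  m <- tapply(z, cls, mean)
  s2 <- tapply(z, cls, var)
  as.numeric((m[1] - m[2])^2 / (s2[1] + s2[2]))
}
cat("\n FDR along PC1:", fdr(pc$x[,1], class),
    "\n FDR along LDA direction:", fdr(y[,1], class), "\n")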
cat("IRIS dataset\n")
head(iris)
X<-iris[,-5]
class=as.numeric(iris$Species)
cat("\n Samples: ",dim(X)[1],"\t Features: ",dim(X)[2],"\t Classes: ",levels(iris$Species))
colvec = c("coral3","darkseagreen3","darkgoldenrod2")[class]
pchs= c(22,23,24)[class]
pairs(X, col=colvec, pch=pchs)
N=nrow(X)
nclass=length(unique(class))
mu0<-colMeans(X)
cat("\n Global mean: ", mu0)
S<-list()
Mu<-list()
P<-list()
for(i in 1:nclass)
{
wi=X[which(class==i),]
Ni=nrow(wi)
P[[i]]<-Ni/N
Mu[[i]]<-colMeans(wi)
S[[i]]<-cov(wi)
cat("\n\n Class ",i,": ")
cat("\n Prior P: ",P[[i]])
cat("\n Mean Mu: ",Mu[[i]])
cat("\n Covariance Matrix: \n")
print(S[[i]])
}
Sw=0
Sb=0
for(i in 1:nclass)
{
mui0=Mu[[i]]-mu0
Sw=Sw+ P[[i]]*S[[i]]
Sb=Sb+ P[[i]]*outer(mui0,mui0)
}
J=solve(Sw)%*%Sb
eigJ=eigen(J)
cat(" Eigenvalues of J: ",eigJ$values)
cat("\n Eigenvectors of J: \n\n")
print(eigJ$vectors)
v=eigJ$vectors[,1:2,drop=FALSE]
cat("\n dim X= ",dim(X)," dim v= ",dim(v))
y=as.matrix(X)%*%v
plot(y,col="black",bg=colvec,pch=pchs,xlab="LDA component 1",ylab="LDA component 2",main="LDA projection")
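The rank bound can be verified on iris: with $c = 3$ classes, at most two eigenvalues of $J$ should be non-zero. Re() is a precaution here, since eigen() on the non-symmetric J can return values with tiny imaginary parts.
# Expect at most c - 1 = 2 non-negligible eigenvalues for 3 classes
ev <- Re(eigJ$values)
cat("\n Non-negligible eigenvalues (>1e-8):", sum(abs(ev) > 1e-8), "of", length(ev), "\n")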
library(mlbench)
data(Ionosphere)
Dataset<-Ionosphere
cat("\n Predict the onset of diabetes in female Pima Indians from medical record data.")
cat("\n Dimension of dataset: ",dim(Dataset))
cat("\n Classes: ",levels(Dataset$Class))
head(Dataset)
class=as.numeric(Dataset$Class)
X<-Dataset[,-c(1,2,ncol(Dataset))]   # drop V1 (a factor), V2 (constant zero, would make Sw singular) and the Class column
X<-as.matrix(as.data.frame(lapply(X, as.numeric)))
colvec = c("cyan3","plum3")[class]
pchs= c(22,24)[class]
pairs(X[,1:4], col=colvec, pch=pchs)
show=5
N=nrow(X)
nclass=length(unique(class))
mu0<-colMeans(X)
cat("\n Global mean: ", mu0[1:show])
S<-list()
Mu<-list()
P<-list()
for(i in 1:nclass)
{
wi=X[which(class==i),]
Ni=nrow(wi)
P[[i]]<-Ni/N
Mu[[i]]<-colMeans(wi)
S[[i]]<-cov(wi)
cat("\n\n Class ",i,": ")
cat("\n Prior P: ",P[[i]])
cat("\n Mean Mu: ",Mu[[i]][1:show])
cat("\n Covariance Matrix: \n")
print(S[[i]][1:show,1:show])
}
Sw=0
Sb=0
for(i in 1:nclass)
{
mui0=Mu[[i]]-mu0
Sw=Sw+ P[[i]]*S[[i]]
Sb=Sb+ P[[i]]*outer(mui0,mui0)
}
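Before inverting Sw it is worth confirming it is well-conditioned; dropping the constant V2 column earlier is what makes the inversion possible at all. A quick check:
# kappa() estimates the condition number; a huge value would make solve(Sw) unreliable
cat("\n Condition number of Sw:", kappa(Sw), "\n")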
J=solve(Sw)%*%Sb
eigJ=eigen(J)
cat(" Eigenvalues of J: \n",eigJ$values[1:show])
cat("\n Eigenvectors of J: \n\n")
print(eigJ$vectors[1:show,1:show])
v=eigJ$vectors[,1:2,drop=FALSE]
cat("\n dim X= ",dim(X)," dim v= ",dim(v))
y=as.matrix(X)%*%v
plot(y,col="black",bg=colvec,pch=pchs,xlab="LDA component 1",ylab="LDA component 2",main="LDA projection")
# For comparison: use the between-class scatter Sb alone, ignoring the within-class scatter Sw
J=Sb
eigJ=eigen(J)
cat(" Eigenvalues of J: \n",eigJ$values[1:show])
cat("\n Eigenvectors of J: \n\n")
print(eigJ$vectors[1:show,1:show])
v=eigJ$vectors[,1:2,drop=FALSE]
cat("\n dim X= ",dim(X)," dim v= ",dim(v))
y=as.matrix(X)%*%v
plot(y,col="black",bg=colvec,pch=pchs,xlab="component 1",ylab="component 2",main="Projection using Sb only")
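To compare the two projections numerically rather than visually, a sketch (the sep helper is ad hoc) that recomputes both leading directions and their Fisher ratios; the Sw-aware direction should separate the classes at least as well:
# Leading direction of each criterion; Re() guards against tiny imaginary parts
vW <- Re(eigen(solve(Sw)%*%Sb)$vectors[,1])   # full LDA: Sw^{-1} Sb
vB <- Re(eigen(Sb)$vectors[,1])               # between-class scatter only
sep <- function(z, cls) {
  m <- tapply(z, cls, mean)
  s2 <- tapply(z, cls, var)
  as.numeric((m[1] - m[2])^2 / (s2[1] + s2[2]))
}
cat("\n FDR, Sw^{-1}Sb direction:", sep(drop(X %*% vW), class),
    "\n FDR, Sb-only direction:", sep(drop(X %*% vB), class), "\n")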