Short Term Course on Machine Learning for Practitioners

Ankita Mandal and Aparajita Khan

Indian Statistical Institute

E-mail: amandal@isical.ac.in, aparajitak_r@isical.ac.in

November 19, 2019

Linear Discriminant Analysis

Does variance maximization always work?

In [34]:
N=250
# Two Gaussian classes: well separated along Feature2,
# but with large variance along Feature1
Ca<-cbind(rnorm(N,mean=5,sd=6.85),rnorm(N,mean=8,sd=0.3))
Cb<-cbind(rnorm(N,mean=11,sd=7.5),rnorm(N,mean=2,sd=0.6))
X<-rbind(Ca,Cb)
# Rotate the data by Theta radians
Theta=0.26
R=matrix(c(cos(Theta),-sin(Theta),sin(Theta),cos(Theta)),2,2)
X=X%*%R
cat(" Glimpse of Dataset X: \n")
print(X[1:5,])
cat("\n Dimension of Dataset: \t Samples:",dim(X)[1],"\t Features:",dim(X)[2])
class=c(rep(1,N),rep(2,N))
colvec = c("darkseagreen3","coral3")[class]
pchs= c(24,22)[class]
plot(X,col="black",bg=colvec,pch=pchs,xlab="Feature1",ylab="Feature2",main="Scatter plot of Data")
 Glimpse of Dataset X: 
          [,1]      [,2]
[1,]  9.578163 10.674361
[2,] -1.815248  8.438180
[3,] -3.762399  7.089188
[4,]  3.367641  9.667955
[5,] -4.584371  6.764112

 Dimension of Dataset: 	 Samples: 500 	 Features: 2

PCA on Data

In [35]:
pc=prcomp(X, center=TRUE, scale=FALSE, retx=TRUE)
cat("\n Directions/Eigenvectors: \n")
print(pc$rotation)
cat("\n Principal Components: \n")
print(pc$x[1:5,])
cat("\n Variance along principal components: ")
cat(pc$sdev^2)
 Directions/Eigenvectors: 
            PC1        PC2
[1,] -0.9936509  0.1125072
[2,] -0.1125072 -0.9936509

 Principal Components: 
           PC1       PC2
[1,] -3.408595 -3.396739
[2,]  8.164065 -2.456597
[3,] 10.250624 -1.335238
[4,]  2.875724 -3.095451
[5,] 11.103951 -1.104704

 Variance along principal components: 58.1352 8.055956
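As a quick sanity check (a minimal sketch, assuming X and pc from the cells above): the variances reported along the principal components are exactly the eigenvalues of the sample covariance matrix of X.

# Check: PC variances equal the eigenvalues of cov(X)
cat("\n Eigenvalues of cov(X): ",eigen(cov(X))$values)
cat("\n pc$sdev^2:             ",pc$sdev^2)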
In [36]:
ym=min(X[,2])
ProjPC1=cbind(pc$x[,1],rep(ym,nrow(X)))   # PC1 scores, drawn at the bottom of the plot
plot(X,col="black",bg=colvec,pch=pchs,xlab="Feature1",ylab="Feature2",main="Projection of Data along PC1")
points(ProjPC1,col="black",bg=colvec,pch=pchs)

Variance is not necessarily a measure of class separability!

Instead, project onto a lower-dimensional space that is a linear combination of the original features and also preserves class information.

LDA Objective

Find a direction v, a linear combination of the original d features, that maximizes the class separability of the projected data.

Projection of a sample $x \in \mathbb{R}^d$:

$$y = v^T x \tag{1}$$

Two Classes

Class $\omega_1$ with mean $\mu_1$ and covariance $\Sigma_1$.
Class $\omega_2$ with mean $\mu_2$ and covariance $\Sigma_2$.

Let $\tilde{\mu}_1$ and $\tilde{\sigma}_1^2$ be the mean and variance of the projected data for the points in class $\omega_1$, and $\tilde{\mu}_2$ and $\tilde{\sigma}_2^2$ be the mean and variance of the projected data for the points in class $\omega_2$.

Maximize the Fisher Discriminant Ratio (FDR) on y:

$$\text{maximize} \;\; \text{FDR} = \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{\sigma}_1^2 + \tilde{\sigma}_2^2} \tag{2}$$

$$\tilde{\mu}_1 = v^T \mu_1, \qquad \tilde{\mu}_2 = v^T \mu_2$$

$$\tilde{\sigma}_1^2 = \sum_{y \in \omega_1} (y - \tilde{\mu}_1)^2 = \sum_{x \in \omega_1} (v^T x - v^T \mu_1)^2 = \sum_{x \in \omega_1} v^T (x - \mu_1)(x - \mu_1)^T v = v^T \Sigma_1 v$$

$$\tilde{\sigma}_2^2 = v^T \Sigma_2 v$$

$$(\tilde{\mu}_1 - \tilde{\mu}_2)^2 = (v^T \mu_1 - v^T \mu_2)^2 = v^T (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T v = v^T S_b v$$

$$\tilde{\sigma}_1^2 + \tilde{\sigma}_2^2 = v^T \Sigma_1 v + v^T \Sigma_2 v = v^T S_w v$$
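Before solving the maximization, it helps to evaluate the Fisher ratio for a few candidate directions. A minimal sketch, assuming the two-class X and class from the cells above; fdr() is a helper defined here purely for illustration:

# Fisher Discriminant Ratio of the data projected on direction v
fdr <- function(v, X, class)
{
    v  <- v/sqrt(sum(v^2))            # unit direction
    y1 <- drop(X[class==1,]%*%v)      # projections of class 1
    y2 <- drop(X[class==2,]%*%v)      # projections of class 2
    (mean(y1)-mean(y2))^2/(var(y1)+var(y2))
}
cat(" FDR along Feature1:",fdr(c(1,0),X,class))
cat("\n FDR along Feature2:",fdr(c(0,1),X,class))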

LDA Objective

$$\text{maximize} \;\; \frac{(\tilde{\mu}_1 - \tilde{\mu}_2)^2}{\tilde{\sigma}_1^2 + \tilde{\sigma}_2^2} = \frac{v^T S_b v}{v^T S_w v} \;\;\Longleftrightarrow\;\; \text{maximize} \;\; v^T S_w^{-1} S_b v \tag{3}$$

The solution for v in (3) is given by the eigenvector of $J = S_w^{-1} S_b$ corresponding to the largest eigenvalue.

LDA: Project X on the leading eigenvector(s) of the matrix $S_w^{-1} S_b$

In [37]:
N=nrow(X)
nclass=length(unique(class))
mu0<-colMeans(X)
S<-list()
Mu<-list()
Sw=0
for(i in 1:nclass)
{
    wi=X[which(class==i),]
    Ni=nrow(wi)
    Mu[[i]]<-colMeans(wi)
    S[[i]]<-cov(wi)
    cat("\n\n Class ",i,": ")
    cat("\n Mean Mu: ",Mu[[i]])
    cat("\n Covariance Matrix: \n")
    print(S[[i]])
    Sw=Sw+S[[i]]
}
MuDiff=Mu[[1]]-Mu[[2]]
Sb=outer(MuDiff,MuDiff)
cat("\n Within cluster scatter Sw: \n")
print(Sw)
cat("\n Between cluster scatter Sb: \n")
print(Sb)
 Class  1 : 
 Mean Mu:  3.475713 9.177823
 Covariance Matrix: 
         [,1]      [,2]
[1,] 45.21225 12.152184
[2,] 12.15218  3.353724


 Class  2 : 
 Mean Mu:  9.671021 4.653571
 Covariance Matrix: 
         [,1]      [,2]
[1,] 50.75330 13.138161
[2,] 13.13816  3.785352

 Within cluster scatter Sw: 
         [,1]      [,2]
[1,] 95.96555 25.290345
[2,] 25.29034  7.139075

 Between cluster scatter Sb: 
          [,1]      [,2]
[1,]  38.38184 -28.02913
[2,] -28.02913  20.46886
In [38]:
J=solve(Sw)%*%Sb
eigJ=eigen(J)
cat(" Eigenvalues of J: ",eigJ$values)
cat("\n Eigenvectors of J: \n\n")
print(eigJ$vectors)
v=eigJ$vectors[,1,drop=FALSE]
cat("\n LDA direction v= \n")
print(v)
 Eigenvalues of J:  80.34611 7.105427e-15
 Eigenvectors of J: 

           [,1]       [,2]
[1,]  0.2593221 -0.5897541
[2,] -0.9657909 -0.8075829

 LDA direction v= 
           [,1]
[1,]  0.2593221
[2,] -0.9657909

Eigenvalues of J

$$\text{maximize} \;\; v^T J v, \qquad J = S_w^{-1} S_b, \;\; S_b = (\mu_1 - \mu_2)(\mu_1 - \mu_2)^T$$

The rank of $S_b$ is 1 (it is the outer product of two vectors).

The rank of $J$ is therefore also at most 1, so at most one eigenvalue of $J$ is non-zero:

$$\lambda_2(J) = 0$$
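This is easy to confirm numerically (a small check, assuming Sw and Sb from the two-class cells above):

# Sb is an outer product, hence rank 1; J inherits that rank
cat(" rank(Sb):",qr(Sb)$rank)
cat("\n Eigenvalues of J:",eigen(solve(Sw)%*%Sb)$values)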

For M classes

Between-class scatter matrix

$$S_b = \sum_{i=1}^{M} P(\omega_i)\,(\mu_i - \mu_0)(\mu_i - \mu_0)^T \tag{4}$$

$S_b$ is the sum of $M$ matrices of rank 1.

Additional constraint on the means $\mu_i$: the global mean is

$$\mu_0 = \frac{1}{M} \sum_{i=1}^{M} \mu_i$$

so the $M$ vectors $(\mu_i - \mu_0)$ sum to zero and are linearly dependent.

Hence the rank of $S_b$ is at most $(M-1)$, and at most $(M-1)$ eigenvalues of $J$ will be non-zero.

LDA extracts at most $(M-1)$ components.

First LDA component

$$y = X v \tag{5}$$

In [39]:
y=X%*%v
cat("\n LDA projection y= \n ")
cat(y[1:8],sep="\n")
ldaMat=cbind(y,rep(0,nrow(X)))
plot(ldaMat,col="black",bg=colvec,pch=pchs,ylab="",xlab="",main="LDA projection")
 LDA projection y= 
 -7.825371
-8.620251
-7.822347
-8.463919
-7.721546
-8.4255
-8.196046
-8.422273

LDA on IRIS dataset

In [40]:
cat("IRIS dataset\n")
head(iris)
X<-iris[,-5]
class=as.numeric(iris$Species)
cat("\n Samples: ",dim(X)[1],"\t Features: ",dim(X)[2],"\t Classes: ",levels(iris$Species))
colvec = c("coral3","darkseagreen3","darkgoldenrod2")[class]
pchs= c(22,23,24)[class]
pairs(X, col=colvec, pch=pchs)
IRIS dataset
A data.frame: 6 × 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
 Samples:  150 	 Features:  4 	 Classes:  setosa versicolor virginica
In [41]:
N=nrow(X)
nclass=length(unique(class))
mu0<-colMeans(X)
cat("\n Global mean: ", mu0)
S<-list()
Mu<-list()
P<-list()
for(i in 1:nclass)
{
    wi=X[which(class==i),]
    Ni=nrow(wi)
    P[[i]]<-Ni/N
    Mu[[i]]<-colMeans(wi)
    S[[i]]<-cov(wi)
    cat("\n\n Class ",i,": ")
    cat("\n Prior P: ",P[[i]])
    cat("\n Mean Mu: ",Mu[[i]])
    cat("\n Covariance Matrix: \n")
    print(S[[i]])
}
Sw=0
Sb=0
for(i in 1:nclass)
{
    mui0=Mu[[i]]-mu0
    Sw=Sw+ P[[i]]*S[[i]]
    Sb=Sb+ P[[i]]*outer(mui0,mui0)
}
 Global mean:  5.843333 3.057333 3.758 1.199333

 Class  1 : 
 Prior P:  0.3333333
 Mean Mu:  5.006 3.428 1.462 0.246
 Covariance Matrix: 
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length   0.12424898 0.099216327  0.016355102 0.010330612
Sepal.Width    0.09921633 0.143689796  0.011697959 0.009297959
Petal.Length   0.01635510 0.011697959  0.030159184 0.006069388
Petal.Width    0.01033061 0.009297959  0.006069388 0.011106122


 Class  2 : 
 Prior P:  0.3333333
 Mean Mu:  5.936 2.77 4.26 1.326
 Covariance Matrix: 
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length   0.26643265  0.08518367   0.18289796  0.05577959
Sepal.Width    0.08518367  0.09846939   0.08265306  0.04120408
Petal.Length   0.18289796  0.08265306   0.22081633  0.07310204
Petal.Width    0.05577959  0.04120408   0.07310204  0.03910612


 Class  3 : 
 Prior P:  0.3333333
 Mean Mu:  6.588 2.974 5.552 2.026
 Covariance Matrix: 
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length   0.40434286  0.09376327   0.30328980  0.04909388
Sepal.Width    0.09376327  0.10400408   0.07137959  0.04762857
Petal.Length   0.30328980  0.07137959   0.30458776  0.04882449
Petal.Width    0.04909388  0.04762857   0.04882449  0.07543265
In [42]:
J=solve(Sw)%*%Sb
eigJ=eigen(J)
cat(" Eigenvalues of J: ",eigJ$values)
cat("\n Eigenvectors of J: \n\n")
print(eigJ$vectors)
v=eigJ$vectors[,1:2,drop=FALSE]
cat("\n dim X= ",dim(X)," dim v= ",dim(v))
y=as.matrix(X)%*%v
plot(y,col="black",bg=colvec,pch=pchs,xlab="LDA component 1",ylab="LDA component 2",main="LDA projection")
 Eigenvalues of J:  31.54809 0.2796832 -1.181696e-14 -4.918204e-16
 Eigenvectors of J: 

           [,1]         [,2]       [,3]       [,4]
[1,] -0.2087418 -0.006531964  0.4723122  0.8121103
[2,] -0.3862037 -0.586610553  0.1293130 -0.4010705
[3,]  0.5540117  0.252561540  0.2009089 -0.4082187
[4,]  0.7073504 -0.769453092 -0.8484309  0.1139158

 dim X=  150 4  dim v=  4 2
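As a cross-check, MASS::lda fits the same discriminant subspace directly; its directions agree with the eigenvectors above up to sign and scaling, since lda normalizes by the within-class variance. A sketch, assuming the MASS package is installed:

library(MASS)
ld<-lda(Species~.,data=iris)             # LDA via MASS
print(ld$scaling)                        # columns LD1, LD2
yl<-as.matrix(iris[,-5])%*%ld$scaling    # project on LD1/LD2
plot(yl,col="black",bg=colvec,pch=pchs,xlab="LD1",ylab="LD2",main="MASS::lda projection")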

LDA on Ionosphere dataset

In [43]:
library(mlbench)
In [44]:
data(Ionosphere)
Dataset<-Ionosphere
cat("\n Predict the onset of diabetes in female Pima Indians from medical record data.")
cat("\n Dimension of dataset: ",dim(Dataset))
cat("\n Classes: ",levels(Dataset$Class))
head(Dataset)
class=as.numeric(Dataset$Class)
 Classify radar returns from the ionosphere as good or bad.
 Dimension of dataset:  351 35
 Classes:  bad good
A data.frame: 6 × 35 (columns V11–V25 omitted)
  V1 V2      V3       V4       V5       V6       V7       V8      V9      V10 ⋯      V26      V27      V28      V29      V30      V31      V32      V33      V34 Class
1  1  0 0.99539 -0.05889  0.85243  0.02306  0.83398 -0.37708 1.00000  0.03760 ⋯ -0.51171  0.41078 -0.46168  0.21266 -0.34090  0.42267 -0.54487  0.18641 -0.45300  good
2  1  0 1.00000 -0.18829  0.93035 -0.36156 -0.10868 -0.93597 1.00000 -0.04549 ⋯ -0.26569 -0.20468 -0.18401 -0.19040 -0.11593 -0.16626 -0.06288 -0.13738 -0.02447   bad
3  1  0 1.00000 -0.03365  1.00000  0.00485  1.00000 -0.12062 0.88965  0.01198 ⋯ -0.40220  0.58984 -0.22145  0.43100 -0.17365  0.60436 -0.24180  0.56045 -0.38238  good
4  1  0 1.00000 -0.45161  1.00000  1.00000  0.71216 -1.00000 0.00000  0.00000 ⋯  0.90695  0.51613  1.00000  1.00000 -0.20099  0.25682  1.00000 -0.32382  1.00000   bad
5  1  0 1.00000 -0.02401  0.94140  0.06531  0.92106 -0.23255 0.77152 -0.16399 ⋯ -0.65158  0.13290 -0.53206  0.02431 -0.62197 -0.05707 -0.59573 -0.04608 -0.65697  good
6  1  0 0.02337 -0.00592 -0.09924 -0.11949 -0.00763 -0.11824 0.14706  0.06637 ⋯ -0.01535 -0.03240  0.09223 -0.07859  0.00732  0.00000  0.00000 -0.00039  0.12011   bad
In [45]:
X<-Dataset[,-c(1,2,ncol(Dataset))]   # drop factor columns V1, V2 and the Class label
X<-as.matrix(as.data.frame(lapply(X, as.numeric)))
colvec = c("cyan3","plum3")[class]
pchs= c(22,24)[class]
pairs(X[,1:4], col=colvec, pch=pchs)
In [46]:
show=5
N=nrow(X)
nclass=length(unique(class))
mu0<-colMeans(X)
cat("\n Global mean: ", mu0[1:show])
S<-list()
Mu<-list()
P<-list()
for(i in 1:nclass)
{
    wi=X[which(class==i),]
    Ni=nrow(wi)
    P[[i]]<-Ni/N
    Mu[[i]]<-colMeans(wi)
    S[[i]]<-cov(wi)
    cat("\n\n Class ",i,": ")
    cat("\n Prior P: ",P[[i]])
    cat("\n Mean Mu: ",Mu[[i]][1:show])
    cat("\n Covariance Matrix: \n")
    print(S[[i]][1:show,1:show])
}
Sw=0
Sb=0
for(i in 1:nclass)
{
    mui0=Mu[[i]]-mu0
    Sw=Sw+ P[[i]]*S[[i]]
    Sb=Sb+ P[[i]]*outer(mui0,mui0)
}
 Global mean:  0.6413419 0.04437188 0.6010679 0.115889 0.5500951

 Class  1 : 
 Prior P:  0.3589744
 Mean Mu:  0.2965558 -0.02978048 0.242786 0.02420738 0.2539843
 Covariance Matrix: 
            V3          V4           V5           V6          V7
V3  0.43426932  0.04002177  0.096522312 -0.039307735  0.08547374
V4  0.04002177  0.43514427 -0.024668382 -0.187270821 -0.02095688
V5  0.09652231 -0.02466838  0.474206315 -0.001605633  0.17609616
V6 -0.03930773 -0.18727082 -0.001605633  0.413213757  0.03180918
V7  0.08547374 -0.02095688  0.176096162  0.031809176  0.39324439


 Class  2 : 
 Prior P:  0.6410256
 Mean Mu:  0.834422 0.0858972 0.8017057 0.1672307 0.7159171
 Covariance Matrix: 
            V3           V4           V5           V6          V7
V3 0.040399539  0.004447374  0.030413686  0.003430856  0.03138531
V4 0.004447374  0.056824996 -0.009134075  0.038050828 -0.02593230
V5 0.030413686 -0.009134075  0.045009860 -0.013583327  0.04757136
V6 0.003430856  0.038050828 -0.013583327  0.093826708 -0.04520063
V7 0.031385313 -0.025932297  0.047571364 -0.045200631  0.08284582
In [47]:
J=solve(Sw)%*%Sb
eigJ=eigen(J)
cat(" Eigenvalues of J: \n",eigJ$values[1:show])
cat("\n Eigenvectors of J: \n\n")
print(eigJ$vectors[1:show,1:show])
v=eigJ$vectors[,1:2,drop=FALSE]
cat("\n dim X= ",dim(X)," dim v= ",dim(v))
y=as.matrix(X)%*%v
plot(y,col="black",bg=colvec,pch=pchs,xlab="LDA component 1",ylab="LDA component 2",main="LDA projection")
 Eigenvalues of J: 
 1.427015+0i -3.742737e-16+0i 2.943462e-16+0i -2.47757e-17+2.59723e-16i -2.47757e-17-2.59723e-16i
 Eigenvectors of J: 

             [,1]           [,2]           [,3]                  [,4]
[1,] 0.3362541+0i  0.36027319+0i  0.20668966+0i  0.1756372+0.2198439i
[2,] 0.2199430+0i -0.26590017+0i -0.69460556+0i  0.2086648+0.3554162i
[3,] 0.2752621+0i  0.32155974+0i -0.09041056+0i -0.4578804+0.0000000i
[4,] 0.1502149+0i -0.01342564+0i  0.03440644+0i  0.0481045-0.1046714i
[5,] 0.1745621+0i -0.27856409+0i  0.23615969+0i  0.0210181-0.2209353i
                      [,5]
[1,]  0.1756372-0.2198439i
[2,]  0.2086648-0.3554162i
[3,] -0.4578804+0.0000000i
[4,]  0.0481045+0.1046714i
[5,]  0.0210181+0.2209353i

 dim X=  351 32  dim v=  32 2
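Because $J = S_w^{-1} S_b$ is not symmetric and all but one of its eigenvalues are numerically zero here, eigen() returns complex output; the imaginary parts are round-off noise, and with two classes only the first eigenvector carries discriminant information. A sketch that keeps only the real parts before projecting (assuming eigJ, X, colvec, pchs from the cell above):

# Imaginary parts are numerical noise on the zero eigenvalues;
# for 2 classes only column 1 is meaningful
v<-Re(eigJ$vectors[,1:2,drop=FALSE])
y<-X%*%v
plot(y,col="black",bg=colvec,pch=pchs,xlab="LDA component 1",ylab="LDA component 2",main="LDA projection (real parts)")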
In [48]:
# For comparison: use the between-class scatter Sb alone,
# ignoring the within-class scatter Sw
J=Sb
eigJ=eigen(J)
cat(" Eigenvalues of J: \n",eigJ$values[1:show])
cat("\n Eigenvectors of J: \n\n")
print(eigJ$vectors[1:show,1:show])
v=eigJ$vectors[,1:2,drop=FALSE]
cat("\n dim X= ",dim(X)," dim v= ",dim(v))
y=as.matrix(X)%*%v
plot(y,col="black",bg=colvec,pch=pchs,xlab="Component 1",ylab="Component 2",main="Projection on Sb eigenvectors")
 Eigenvalues of J: 
 0.4261552 1.110223e-16 4.093895e-17 6.303159e-18 3.229158e-18
 Eigenvectors of J: 

            [,1]        [,2]       [,3]        [,4]        [,5]
[1,] -0.39523901  0.91857832  0.0000000  0.00000000  0.00000000
[2,] -0.08500316 -0.03657452  0.5908987 -0.27483204 -0.04258471
[3,] -0.41070967 -0.17671708 -0.3598403  0.07666403  0.13004968
[4,] -0.10509750 -0.04522057  0.1583930  0.26162740 -0.06208194
[5,] -0.33944102 -0.14605214 -0.1426606 -0.30307327 -0.23768824

 dim X=  351 32  dim v=  32 2
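To see why the within-class scatter matters, compare the Fisher ratio along the first direction of $S_w^{-1} S_b$ with that of $S_b$ alone (a sketch reusing X, class, Sw and Sb from the cells above; fisher1 is a helper defined here for illustration):

# Fisher ratio of a 1-D projection for two classes
fisher1 <- function(y,class)
{
    y1<-y[class==1]; y2<-y[class==2]
    (mean(y1)-mean(y2))^2/(var(y1)+var(y2))
}
vJ<-Re(eigen(solve(Sw)%*%Sb)$vectors[,1])   # LDA direction
vB<-Re(eigen(Sb)$vectors[,1])               # Sb-only direction
cat(" Fisher ratio, Sw^-1 Sb direction:",fisher1(drop(X%*%vJ),class))
cat("\n Fisher ratio, Sb-only direction: ",fisher1(drop(X%*%vB),class))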