When you need a friend, MatchIt.

I’ve recently fallen in love with a great R package called MatchIt, and I’m going to talk about it at an upcoming MinneAnalytics conference, Big Data Tech 2017.

MatchIt was created by Ho, Imai, King, and Stuart, and it grew out of political science research, where it is used to preprocess observational data before estimating the effect of a treatment. What makes it so useful elsewhere is that the tool finds a “like” sub-population between two different samples, thereby recreating similar distributions and attributes across multiple populations. Not surprisingly, this has a direct impact on retail testing, where each test store needs comparable control stores.

A mocked-up version of my code to perform the matching can be found below. Here, I’m identifying five “matched” control stores for each test store. My poor choice of blog post title refers to the tool “finding a friend,” that is, finding a match for each specified test store.

If you’d like to see the original documentation by the authors, a great reference can be found here: https://www.jstatsoft.org/article/view/v042i08/v42i08.pdf. It includes a deep description of what’s happening within the tool. Everything made a lot more sense to me after reading the doc.

UPDATE: I’ve found a much better parameter for the MatchIt script when using it for matched-pair purposes. Setting distance = "mahalanobis" matches each test store to its nearest neighbors by Mahalanobis distance on the covariates themselves, instead of matching on the propensity score, which is the default. I much prefer my results with the Mahalanobis setting. The model code below has been updated to reflect this addition.
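
To make that distinction concrete, here is a minimal sketch of what "nearest neighbors by Mahalanobis distance" means, using only base R and made-up data. The `stores` data frame, its column names, and the choice to estimate the covariance from all stores are assumptions for illustration only; MatchIt's internal calculation may differ in details such as which units feed the covariance matrix.

```r
# Toy data: 20 hypothetical stores with two numeric covariates.
set.seed(42)
stores <- data.frame(
  store_id = 1:20,
  metric1  = rnorm(20, mean = 100, sd = 15),
  metric2  = rnorm(20, mean = 50,  sd = 5)
)

covs       <- c("metric1", "metric2")
test_store <- stores[1, ]    # pretend store 1 is the test store
controls   <- stores[-1, ]   # everything else is a candidate control

# Covariance of the covariates (assumption: estimated from all stores).
S <- cov(stores[, covs])

# stats::mahalanobis() returns the *squared* distance from each row to `center`.
d2 <- mahalanobis(as.matrix(controls[, covs]),
                  center = unlist(test_store[, covs]),
                  cov    = S)

# The five controls closest to the test store in covariate space.
nearest5 <- controls$store_id[order(d2)[1:5]]
print(nearest5)
```

Because the distance is scaled by the covariance matrix, a covariate measured in large units (such as sales dollars) does not automatically dominate one measured in small units, which is part of why this tends to produce intuitive store pairings.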


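#Load dplyr (filter, bind_rows) and MatchIt (matchit)
library(dplyr)
library(MatchIt)

#Assign "test / control" flag
testdlrsDF <- filter(subsetDF, Focus.Market != "CONTROL")
controldlrsDF <- filter(subsetDF, Focus.Market == "CONTROL")
testdlrsDF$treatment <- 1
controldlrsDF$treatment <- 0
#Stack test and control rows; bind_rows keeps every row, whereas union would drop duplicates
pairsDF <- bind_rows(testdlrsDF, controldlrsDF)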


#apply Match-Pair Assignment
m.out1 <- matchit(treatment ~ latitude + metric1 + metric2 + dim1 + dim2,
                  data = pairsDF, method = "nearest", distance = "mahalanobis",
                  ratio = 5, replace = TRUE)

m.out2 <- m.out1$match.matrix


#construct output dataframe
pairsdlr <- data.frame(TestDealerNum = integer(), ControlDealerNum = integer())
newRow1 <- data.frame(TestDealerNum = pairsDF[row.names(m.out2),]$DealerNum, ControlDealerNum = pairsDF[m.out2[,1],]$DealerNum)
newRow2 <- data.frame(TestDealerNum = pairsDF[row.names(m.out2),]$DealerNum, ControlDealerNum = pairsDF[m.out2[,2],]$DealerNum)
newRow3 <- data.frame(TestDealerNum = pairsDF[row.names(m.out2),]$DealerNum, ControlDealerNum = pairsDF[m.out2[,3],]$DealerNum)
newRow4 <- data.frame(TestDealerNum = pairsDF[row.names(m.out2),]$DealerNum, ControlDealerNum = pairsDF[m.out2[,4],]$DealerNum)
newRow5 <- data.frame(TestDealerNum = pairsDF[row.names(m.out2),]$DealerNum, ControlDealerNum = pairsDF[m.out2[,5],]$DealerNum)
#note: actual analysis used 10 samples instead of 5...

pairsdlr <- dplyr::bind_rows(pairsdlr, newRow1, newRow2, newRow3,newRow4,newRow5)

#write data
write.csv(pairsdlr, "matchpairs.csv")
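
The five near-identical newRow lines can be collapsed so that the same code works for any ratio (5, 10, and so on). The following is a sketch only, assuming the match matrix has one row per test unit (named by its row name in the data) and one column per matched control, as in the script above. Here `mm` and `toyDF` are hypothetical stand-ins for m.out1$match.matrix and pairsDF so that the snippet runs without MatchIt installed.

```r
# Hypothetical stand-in for pairsDF: row names "1".."7", dealers 101..107.
toyDF <- data.frame(
  DealerNum = 101:107,
  treatment = c(1, 1, 0, 0, 0, 0, 0),
  row.names = as.character(1:7)
)

# Hypothetical stand-in for m.out1$match.matrix:
# test unit "1" matched to controls 4, 5, 6; test unit "2" to 5, 6, 7.
mm <- matrix(c("4", "5", "6",
               "5", "6", "7"),
             nrow = 2, byrow = TRUE,
             dimnames = list(c("1", "2"), NULL))

# as.vector() walks the matrix column by column, so repeat the
# test row names once per column to keep the pairs aligned.
test_rows <- rep(rownames(mm), times = ncol(mm))
ctrl_rows <- as.vector(mm)

# Drop slots where no control was found (NA entries).
keep <- !is.na(ctrl_rows)

pairs_long <- data.frame(
  TestDealerNum    = toyDF[test_rows[keep], "DealerNum"],
  ControlDealerNum = toyDF[ctrl_rows[keep], "DealerNum"]
)
print(pairs_long)
```

With the real objects, replacing `mm` with m.out1$match.matrix and `toyDF` with pairsDF should give the same pairs as the newRow1 through newRow5 approach, without needing to edit the code when the ratio changes.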