February 2015 – Peter Eliason

LinkedIn API

TBD – I will add more once I learn more! (14 Feb 2015).

Intro to Test vs. Control

One of the beautiful things of CRM analytics is the ability to design controlled experiments. Random sampling of a homogeneous group of individuals allows CRM professionals to test the impact of a particular CRM offer or message against another offer or no offer, and then after applying some statistics conclude whether or not that offer or message drove a desired behavior, like an open, click, or conversion. This process is generally referred to as Test versus Control.

One potential pitfall that muddies the analytical waters is dealing with customers that potentially “cross-over” from the Control side to the Test side during the campaign. Take, for example, a customer who was assigned to the Control group but somehow ended up redeeming an offer that was assigned to the Test group.

*Gasp!* What do we do now? Our experiment is ruined!

Well, not to fear (sort of). One widely accepted method to handle this problem is to exclude these Control Redeemers, as they are called, from the analysis entirely. The decision to exclude them is supported by the fact that these customers are marked as “dirty” for somehow getting their hands on an offer that was intended for someone else. Herein lies the issue!

These “Control Redeemers” tend to be more engaged customers.

Therefore, I believe that excluding them creates an unfair bias.
It seems that this bias is magnified with longer campaigns.
In the below, “Best” refers to the best, most engaged customers, and “Sec.” refers to secondary customers, or customers who are not as likely to redeem anyway.

The longer a campaign, the higher the impact from Control Redeemers.

I noticed this pattern within a recent campaign I analyzed and was having a hard time explaining why results appeared so much stronger as the campaign carried on.
To support my hypothesis of the Control Redeemer Effect, I conducted Sales Lift calculations using two methods:
- Calculating Test vs. Control lift including Control Redeemers in the control group.
- Calculating Test vs. Control lift excluding Control Redeemers in the control group.
For a 4-week campaign, the lift is 1.2 times higher when excluding Control Redeemers.
For a 52-week campaign the lift is 3.7 times higher when excluding Control Redeemers.

Conclusion

I felt my hypothesis had been validated. As the length of the campaign increased, it allowed highly-engaged customers more of an opportunity to get their hands on a coupon (either from a friend/family member, from generous Test group customers who leave their coupons in public places, or from gracious sales associates at the store), and therefore dilute the overall sales of the control group.

There are several ways to mitigate this Control Redeemer issue.

Assign unique offer codes to customers. This way coupon sharing is impossible, and Test vs. Control groups stay cleaner.
Stratify Test vs. Control results by some sort of “Customer Value” segmentation. Some companies have a “Metals” or “Value” segmentation which ranks customers from Best to Worst. Stratifying results by this segmentation would alleviate some of the pressure.
Consider replacing the “control redeemer” with a composite of match-pair control customers (from a pool of control customers who have not redeemed), matching on keys from an arbitrary pre-period (say 1 month, 3 months, or a year depending on average purchase cycles). Note: this option is likely going to be analysis-heavy.
If neither of these above methods are feasible, then ultimately, the “real” answer of “campaign lift” is probably somewhere in between the “including control redeemers” and “excluding control redeemers” methods.

Please let me know if you have other thoughts or ideas on this measurement methodology topic!

Appendix

Sales Lift calculation (not standard, by any means, but a common one):

$left { frac{[TestGroupCampaignSales]}{[TestGroupTargetedCounts]}-frac{[ControlGroupCampaignSales]}{[ControlGroupTargetedCounts]} right } ast [TestGroupTargetedCounts]= [SalesLift]$

In the above, we include Control Redeemers in the total sales lift. To exclude Control Redeemers, we use the below (note: I had to remove the vowels from a few terms below because there is a text limit to the equation editor):

$left { frac{[TestGrpCampaignSales]}{[TestGrpTargetedCnts]}-frac{[CtlGrpCampaignSales] - [CtlRedeemerSales]}{[CtlGrpTargetedCnts]-[CtlRedeemerCnts]} right } ast [TestGrpTargetedCnts]=[SlsLift]$

Twitter API

I’m finding out that the Twitter API has been fairly consistent between Python environments (my home and my work machines – both running Windows 7). However, I’m running into significant issues with twitter data encoding on my work machine. By encoding, I’m talking about which characters are interpretable or printable. In terms of Python IDE, I’m running Canopy at home and Spyder on my work machine.

For collecting Tweets, I have been using Tweepy. There are many other classes available to extract tweets from Twitter’s API, but Tweepy was very easy to install and manage. Like I mentioned above, the problem now is extracting the data from the JSON file created by Tweepy so that it’s available to my work machine. Once I am able to figure out how to extract data on my work machine, I will post the code for both environments.

The picture below has been particularly helpful in extracting data from the Twitter JSON record. It’s a bit dated, but much of the record structure is still true. I found the image from another post by Mike Teczno.

Here is the map of a tweet, published by Raffi Krikorian:

Context Filters with Tableau, Important for Top N Filters

This took me FOREVER to finally figure out, so I wanted to share a method to avoid a common mistake when trying to use Tableau’s Top N or Bottom N filter. The issue is that often times when the Top filter is applied, it applies the filter against the entire, unfiltered source data, while the user is likely expecting the Top N (or Bottom N) to be select only after all the other filters have already been applied. Here are the steps I’ve taken in some sample data with the ultimate goal of selecting the “Top 3 Markets in Texas.”

Step 1: our original data.

Here, I’ve taken a list of customers by state.

Step 2: Filter the top 3 markets.

Right-click on LocationMetro > Filter > Top tab. Then select “By Field” and enter 3.

Step 3: Results – top 3 markets overall (still need to filter on Texas).

Step 4: Filter on Texas.

Wait! Our results have only 1 Market? I wanted 3 markets!

Step 5: Apply Context Filter on State

In order to preserve our “Top 3” filter, we must add a Context Filter. A Context Filter will apply the filter FIRST, prior to any other filters on the page.

What was happening in Step 4, was that the worksheet was choosing the “Top 3” markets out of all of the states first, and then applied the Texas filter.

Step 6: Make Sure your Context Filter didn’t reset. In this example, make sure Texas is the only state selected.

In my experience, Tableau often resets all of the filters in the context filter, which requires the user to go back a re-select the filters. In this case, all the states were selected again, so I had to go back and unselect them all and then choose Texas.

We’re done! Our chart now shows the Top 3 Markets in Texas!

Happy filtering!

Parsing Exact Target (aka SalesForce Marketing Cloud) List Detective Import Error Files

When sending emails from ExactTarget (aka: SalesForce Marketing Cloud) — you know, we’ll just call it “ET” for short — we occasionally receive error logs which contain ET subscriber keys where a small subset of targeted customers match profiles of other customers who have unsubscribed or been flagged as undeliverable. Our goal is to process these log files and mark these customers as “undeliverable” in our parent system. Since the log files are not formatted for data import, we need to use a parsing tool to extract the subscriber keys. I chose Python.

In the code below, I enter the path of the log file which was emailed to me from ET, and then I use re (regular expressions) to find all instances that match the subscriber key format, which is “[varchars-varchars-varchars-varchars-varchars]”.

Your version of subscriber keys will undoubtedly look different than mine, but you should be able to modify the regular expression in the re.compile() function to search for the right format. More info about Python’s regular expression class.

Let me know what you think!

I have included sample files with fake data for reference:
Input File (9989FFA6-BD29-8DBA-B712-C6E8ED32F0X9 Results 988623122.txt)
Output File (cleanFile.txt)

[su_heading]Python Code[/su_heading]


# -*- coding: utf-8 -*-
"""
Created on Wed Feb  4 13:59:50 2015

@author: peliason

Purpose: Parse out subscriber key from ET List detective.

"""

from __future__ import division, print_function



##PARAMETERS
inputFile = "C://Users//peliason//Documents//reference//python//ET parse subscriber//for post//9989FFA6-BD29-8DBA-B712-C6E8ED32F0X9 Results 988623122.txt"

outputFile = "C://Users//peliason//Documents//reference//python//ET parse subscriber//for post/cleanFile.txt"
###END PARAMETERS





import re
import codecs


#initialize output list
found = []

#this is my search function
subscriberKeySearch = re.compile(r'[([A-Za-z0-9_] - [A-Za-z0-9_] - [A-Za-z0-9_] - [A-Za-z0-9_] - [A-Za-z0-9_] )]')

## Open the file with read only permit
f = codecs.open(inputFile, 'r','utf-16')

##IMPORTANT: if your input file is utf-8, then you should be able to run this 
#   file open code instead (without the codes class):
#   f = open(inputFile, 'r')


## Read the first line 
line = f.readline()

## till the file is empty
while line:
    
    found.extend(subscriberKeySearch.findall(line))
    
    line = f.readline()
f.close()

#write the list to the outputFile
myOutput = open(outputFile, 'w')
for ele in found:
    myOutput.write(ele 'n')
    
myOutput.close()