COVID-19 Weekly Averages

EDIT (JULY 2020): I AM NO LONGER REFRESHING THIS DATA, AS THERE ARE FINALLY MORE OPTIONS TO VISUALLY OBSERVE THE TRENDS. I WILL KEEP THE POST HERE IN ORDER TO REMIND PEOPLE ABOUT FALLACIES IN DATA COLLECTION.

Lots of data has been published regarding COVID-19, but one frustrating factor (in my opinion) has been that the data is often reported at the daily level, which makes analysis difficult. The way that counties collect and publish data varies greatly, so we tend to see repeatable patterns of new cases being highest on Fridays and lowest on Mondays. These fluctuations distract from the overall trend, so one way to deal with them is to use moving averages or weekly averages.

By doing so, we can get a better sense of how cases and deaths are trending week over week.
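As a rough sketch of what that smoothing looks like in code (using hypothetical daily counts, not the workbook's actual data), a 7-day rolling mean or a weekly average can be computed with pandas:

import pandas as pd

# Hypothetical daily new-case counts (illustrative only)
daily = pd.Series(
    [120, 95, 80, 150, 210, 230, 60, 130, 100, 85, 160, 220, 240, 70],
    index=pd.date_range("2020-04-01", periods=14, freq="D"),
    name="new_cases",
)

# A 7-day moving average smooths out the weekday reporting cycle
smoothed = daily.rolling(window=7).mean()

# Alternatively, collapse the daily series into true weekly averages
weekly = daily.resample("W").mean()

print(smoothed.tail())
print(weekly)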

In this dashboard, it is important to remember the issues regarding data collection:

  1. Testing is not administered at random, so the number of cases is skewed by test-kit availability and by which individuals are experiencing symptoms (people not experiencing symptoms are likely NOT being tested, so the true number of cases is unknown).
  2. Deaths are not adjusted for co-morbidities. We’ve seen that COVID-19 has largely affected populations that are already at risk (the elderly, people with chronic health issues). The death counts here attribute 100% of each death to COVID-19, which is not entirely accurate. The data is still useful but must be considered in this context.

Unfortunately, Tableau Public does NOT allow for real-time data updating, so the only time that this data can be updated is when I re-publish the report using an updated data set.

For anyone interested in refreshing this viz with new data, you may do so by downloading the viz and then refreshing it with the latest data available from the NY Times GitHub repository.
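If it helps, here is a minimal sketch of pulling the latest county-level file with pandas; it assumes the NY Times repository still publishes a us-counties.csv at this path, so verify the URL before relying on it:

import pandas as pd

# Assumed location of the NY Times county-level data (check the repo for the current path)
URL = "https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-counties.csv"

counties = pd.read_csv(URL, parse_dates=["date"])

# Save a local copy, then point the downloaded Tableau workbook at this file
counties.to_csv("us-counties-latest.csv", index=False)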

Context Filters with Tableau, Important for Top N Filters

This took me FOREVER to finally figure out, so I wanted to share a method to avoid a common mistake when trying to use Tableau’s Top N or Bottom N filter. The issue is that, oftentimes, when the Top filter is applied, it is applied against the entire, unfiltered source data, while the user is likely expecting the Top N (or Bottom N) to be selected only after all the other filters have already been applied. Here are the steps I’ve taken with some sample data, with the ultimate goal of selecting the “Top 3 Markets in Texas.”

 

Step 1: our original data.

Here, I’ve taken a list of customers by state.

[Screenshot: customers in all markets]

Step 2: Filter the top 3 markets.

Right-click on LocationMetro > Filter > Top tab. Then select “By Field” and enter 3.

[Screenshot: top 3 metro areas]

 

Step 3: Results – top 3 markets overall (still need to filter on Texas).

[Screenshot: result – top 3 metro areas]

 

Step 4: Filter on Texas.

Wait! Our results show only 1 market? I wanted 3 markets!

[Screenshot: select TX]

Step 5: Apply Context Filter on State

In order to preserve our “Top 3” filter, we must add a Context Filter. A Context Filter is applied FIRST, prior to any other filters on the sheet.

What was happening in Step 4 was that the worksheet was choosing the “Top 3” markets across all of the states first, and then applying the Texas filter.
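To make the order of operations concrete, here is a small pandas sketch with made-up numbers (Tableau isn’t running pandas, of course; this just mimics the two filter orders):

import pandas as pd

# Hypothetical customer counts by state and market (not the real workbook data)
df = pd.DataFrame({
    "State":  ["TX", "TX", "TX", "TX", "CA", "CA", "NY"],
    "Market": ["Dallas", "Houston", "Austin", "El Paso",
               "Los Angeles", "San Francisco", "New York"],
    "Customers": [120, 110, 90, 40, 300, 250, 400],
})

# Without a context filter: the Top 3 is computed across ALL states first,
# then the Texas filter is applied -- only TX rows that happened to make the
# overall Top 3 survive (with this toy data, none do).
top3_overall = df.nlargest(3, "Customers")
without_context = top3_overall[top3_overall["State"] == "TX"]

# With State as a context filter: filter to Texas FIRST,
# then take the Top 3 within what remains.
with_context = df[df["State"] == "TX"].nlargest(3, "Customers")

print(without_context)  # fewer than 3 markets
print(with_context)     # Dallas, Houston, Austin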

 

[Screenshot: click Add to Context]

 

Step 6: Make sure your Context Filter didn’t reset. In this example, make sure Texas is the only state selected.

In my experience, Tableau often resets the selections in the filter you add to context, which requires the user to go back and re-select the values. In this case, all the states were selected again, so I had to go back, unselect them all, and then choose Texas.

 

[Screenshot: ensure the proper filter is applied]

 

We’re done! Our chart now shows the Top 3 Markets in Texas!

Happy filtering!

Python and Web Scraping (using Scrapy)

Certainly the most extensible scripting language I have ever used, Python lets the user build powerful programs ranging from web crawling to text mining to machine learning. With invaluable packages like NumPy and SciPy, Python can tackle complex modeling tasks, while other packages such as BeautifulSoup and Scrapy allow for thorough data collection through web crawling and scraping.

In the Tableau Project below, I have provided an example (with code included on the second tab) of how web crawling and data collection work, by taking a snapshot of listings for my old motorcycle model and comparing prices across two different markets. The data was scraped using Scrapy and exported into a CSV file, which I imported into Tableau.

[Embedded Tableau Public viz]

[su_heading]Here is the Spider code:[/su_heading]

# Note: this spider was written against the older Scrapy 0.x / Python 2 API
# (BaseSpider and HtmlXPathSelector have since been replaced by scrapy.Spider
# and response.xpath in newer Scrapy releases).
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from craigslist_mcy.items import CraigslistMcyItem
import re, string


class MySpider2(BaseSpider):
    name = "craigmcy2"
    allowed_domains = ["craigslist.org"]
    # Search results for "vulcan 900" in two markets (the second Phoenix URL is an additional page of results)
    start_urls = ["http://minneapolis.craigslist.org/search/mca?query=vulcan 900",
                  "http://phoenix.craigslist.org/search/mca?query=vulcan 900",
                  "http://phoenix.craigslist.org/search/mca?query=vulcan 900&s=100"]

    def parse(self, response):
        hxs = HtmlXPathSelector(response)

        # Each search result is rendered as a <p class="row"> element
        titles = hxs.select("//p[@class='row']")
        items = []
        for title in titles:
            item = CraigslistMcyItem()
            item["title"] = title.select("span[@class='txt']/span[@class='pl']/a/text()").extract()
            item["link"] = title.select("span[@class='txt']/span[@class='pl']/a/@href").extract()
            item["postedDt"] = title.select("span[@class='txt']/span[@class='pl']/time/@datetime").extract()
            item["price"] = title.select("a[@class='i']/span[@class='price']/text()").extract()
            item["debug"] = ""  # blank for now... before, it was: title.select("a[@class='i']").extract()
            # Pull the market (city) out of the page's <title> text
            item["location"] = re.split(r'[\s"] ', string.strip(str(hxs.select("//title/text()").extract())))
            items.append(item)
        return items

[su_heading]Items code:[/su_heading]

from scrapy.item import Item, Field

class CraigslistMcyItem(Item):
  title = Field()
  link = Field()
  postedDt = Field()
  price = Field()
  debug = Field()
  location = Field()

[su_heading]Run code (aka “Main”):[/su_heading]


import os

# Invoke the Scrapy command-line tool from this script:
# 'scrapy list' prints the available spiders (the Windows 'pause' keeps the
# console open), then 'scrapy crawl' runs the spider and exports the scraped
# items to a CSV file for import into Tableau.
os.system('scrapy list & pause')
os.system('scrapy crawl craigmcy2 -o craigslist_peter.csv')


Grad School Progress

The field of analytics is constantly evolving. I have enrolled in Northwestern University’s Master of Science in Predictive Analytics program (in Evanston, IL) to gain a fresh perspective on today’s top methodologies, tools, and business case studies. You can track my grad school progress with a Gantt chart that I created using Tableau. I will keep it up-to-date until I’ve earned my degree (expected 2016).