For an NLP project, I used Latent Dirichlet Allocation to group 100,000 Amazon music reviews into a set of distinct topics. I also ran a sentiment analysis of the reviews, each of which was scored on a 1-5 scale (1-2 stars were treated as negative, 4-5 stars as positive, and neutral 3-star reviews were removed). For the sentiment analysis, I experimented with a support vector machine classifier, Naive Bayes, and regularized logistic regression.
I used linear mixed-effects models to evaluate the extent to which policy measures intended to slow the spread of COVID helped explain variation in fatality rates across Midwestern counties.
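The labeling rule described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function names and the sample reviews are hypothetical, and only the star-to-label mapping comes from the description.

```python
from typing import List, Optional, Tuple


def star_to_sentiment(stars: int) -> Optional[str]:
    """Map a 1-5 star rating to a sentiment label.

    Returns None for 3-star reviews (and any other value),
    which signals that the review should be dropped.
    """
    if stars in (1, 2):
        return "negative"
    if stars in (4, 5):
        return "positive"
    return None


def label_reviews(reviews: List[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Return (text, label) pairs, skipping reviews with no label."""
    labeled = []
    for text, stars in reviews:
        label = star_to_sentiment(stars)
        if label is not None:
            labeled.append((text, label))
    return labeled


# Hypothetical sample data, for illustration only.
sample = [
    ("Loved this album", 5),
    ("It was fine", 3),
    ("Terrible sound quality", 1),
    ("Pretty good", 4),
    ("Not for me", 2),
]
labeled = label_reviews(sample)
```

The resulting (text, label) pairs are the kind of input a binary classifier such as an SVM, Naive Bayes, or logistic regression would then be trained on.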
I took raw data for all of Giannis’ career shots and built a variety of shot charts looking at how his shot selection and shooting efficiency have changed as his career has progressed.
A (mostly visual) exploratory analysis detailing the ways crime has shifted (and in some cases has not shifted) in Milwaukee since spring 2020. As part of this, I developed a model to forecast crime under more “typical” circumstances based on past trends, seasonality, and the actual weather conditions during COVID, to serve as a more nuanced point of comparison than a simple year-over-year benchmark. Includes a lot of maps and time series visualizations.
I built a Naive Bayes classifier to predict the locations of tweets from their text. This project involved processing, tokenizing, and vectorizing the raw text of tweets from a training file, then using the result to build a model that correctly classifies about 68% of the locations in an unseen test dataset.
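As a rough sketch of the general technique (not the project's actual code), a multinomial Naive Bayes text classifier with Laplace smoothing can be written with the standard library alone. The tokenizer, the class name, the smoothing parameter `alpha`, and the toy tweets below are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed scheme)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes over word counts, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing strength
        self.class_counts = Counter()           # documents seen per label
        self.word_counts = defaultdict(Counter) # word counts per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration.
train_texts = [
    "bucks game downtown tonight",
    "brewers tailgate at the lakefront",
    "deep dish pizza by the bean",
    "cubs win at wrigley",
]
train_labels = ["Milwaukee", "Milwaukee", "Chicago", "Chicago"]

model = NaiveBayesText().fit(train_texts, train_labels)
prediction = model.predict("heading to the brewers game")
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, and the smoothing term keeps a single unseen word from zeroing out a class.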
I downloaded a dataset of assessed values for Milwaukee single-family homes (I know these don’t always accurately capture market value, but it’s the best free, public dataset covering all city parcels) and joined in a variety of Census economic and demographic data as well as crime statistics. I then built a regression model to 1) examine which variables appeared most influential in explaining property values and 2) give a sense of which neighborhoods had median values above or below what the model expected.
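The second goal, comparing neighborhoods against model expectations, amounts to looking at residuals grouped by neighborhood. The sketch below is a deliberately simplified stand-in: it fits ordinary least squares with a single hypothetical predictor (square footage) rather than the many Census and crime variables described above, and the neighborhood names and values are made up.

```python
from collections import defaultdict
from statistics import mean, median


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def median_residual_by_group(groups, x, y):
    """Median of (actual - predicted) per group.

    Positive values mean the group sits above what the model expects;
    negative values mean it sits below.
    """
    intercept, slope = fit_simple_ols(x, y)
    residuals = defaultdict(list)
    for g, xi, yi in zip(groups, x, y):
        residuals[g].append(yi - (intercept + slope * xi))
    return {g: median(r) for g, r in residuals.items()}


# Invented data: square footage and assessed value (in $1,000s).
neighborhoods = ["A", "A", "A", "B", "B", "B"]
sqft = [1000, 1500, 2000, 1000, 1500, 2000]
value = [120, 170, 220, 80, 130, 180]

gaps = median_residual_by_group(neighborhoods, sqft, value)
```

In this toy data, homes in neighborhood A are assessed above the fitted line and homes in B below it, which is the kind of gap the neighborhood comparison is meant to surface.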