My analysis work spans a wide array of domains and methods, but focuses on bringing both traditional and contemporary tools of statistics and data science, as well as domain-specific techniques from public policy and digital analytics, to bear on pressing problems.
I have experience building and evaluating models using linear and logistic regression, time series modeling, clustering, association rules, network and path analysis, predictive modeling with machine learning algorithms (random forests, support vector machines, etc.), and natural language processing (including language modeling, POS tagging, word sense disambiguation, and machine translation). I have worked with both "big" and small data, including datasets with n ≥ 10^7 and very large datasets stored in cloud data warehouses such as Amazon Redshift and Google BigQuery.
My public policy background is particularly applicable to analysis projects: disentangling interrelated factors to make causal inferences is an essential public policy task, and the same skill applies to a broad range of data problems in other domains.
I am especially interested in the analysis of open datasets. I think it is both fascinating and important for practitioners to make visible the trends, patterns, and insights lying dormant within publicly available datasets on pressing issues. For more examples of this work, see the Open Data section of this site.