One of the most common questions I get is how I found my way into statistics and data science. Honestly, it wasn’t on purpose, nor was it something I ever imagined. It just happened to work out that way, and I have found that the job can be quite interesting and fulfilling. Still, since I hadn’t taken a stats class until my forties, picturing me as a data scientist may seem wild, so allow me to explain.
When I was in grad school, my research interests were in Online Learning and Self-Efficacy (i.e., confidence), both of which required a deeper understanding of survey design and measurement. Anyone who has done much survey research will tell you that it is relatively easy to conduct a survey, but incredibly difficult to do it well. Fortunately, many universities offer courses aimed at deepening those skills, but they often require additional coursework in statistics as a prerequisite. While I could have taken another class in social statistics, I wanted to see what it would be like to take a class in applied statistics. I was also curious whether I could hang in a class with people from the hard sciences. Turns out I could, and I lucked out with a great professor who further sparked my interest. Inspired to learn more, I took another course in Applied Statistics (Sampling Methods) the following semester, in addition to the Survey Design class I had originally wanted to take. By that point, I was fully invested in learning as much as I could, and I followed up with coursework in Regression Methods and Design of Experiments. In these classes I learned to use scripting languages, like Python and R, to clean, visualize, and model data. Once I had those skills, things really started to take off.
The coursework where we were required to write in R helped me tremendously. First off, I am a strong believer that “writing is thinking.” When you use a scripting language for statistical analysis, you literally have to write out your models, which reinforces understanding. Since R is a vector-based language, it works like a really big calculator, which makes it great for modeling and visualizing data, as well as for extracting, transforming, and loading (ETL) it. Statistics and Machine Learning can often seem like computer magic to some people. I can assure you that they’re not. Most of the time the math is based on relatively simple concepts, and we let the software do the heavy lifting with respect to calculation. Using a scripting language like R also lets you create projects that are replicable, transparent, and easy to share.
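To make the “it’s not magic” point concrete, here is a minimal sketch of what writing out a model looks like. It is in Python rather than R, it uses only the standard library, and the numbers are made up purely for illustration; the `fit_line` helper and the variable names are my own, not something from a particular course or package. It fits a straight line by ordinary least squares, which boils down to a few averages and sums.

```python
from statistics import mean

# Made-up illustrative data (not from a real project):
# hours studied and the resulting exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]


def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    x_bar, y_bar = mean(x), mean(y)
    # How x and y vary together, and how x varies on its own.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


intercept, slope = fit_line(hours, scores)
print(f"score = {intercept:.1f} + {slope:.1f} * hours")
# prints: score = 46.2 + 5.6 * hours
```

Every step is sitting right there in the script, so anyone can rerun it, check it, or swap in their own data. In practice you would let a statistics package handle this, but the idea underneath is the same.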
By the end of grad school, I had leveraged these skills into part-time work in statistics and data science, which brought in some extra income while I did something I enjoyed. Some of the projects I worked on ended up being a lot of fun and were very well received, which led to more and more work. Then came a global pandemic that upended how many people viewed their work-life balance, so I decided to find a full-time position as a data scientist, and I haven’t looked back since.
Reflecting on my transition into statistics and data science, a few lessons stand out. The most important one is the value of finding data projects to work on. There is a reason why educational theorists emphasize the importance of Project Based Learning (PBL). Being able to investigate a problem by finding, cleaning, transforming, and analyzing the data, and then communicating its story, is the most valuable experience you can have if you are looking into a career in data science. This is where a scripting language comes in. While menu-driven statistical programs like SPSS and Minitab used to be the norm, languages such as Python, R, and SQL are the standard now because of their flexibility and the fact that most of them are completely open source. Finally, I wasn’t prepared for how much my experience teaching and presenting would help me as a data scientist. Many people are scared of math, and many statisticians aren’t the best at communicating with non-statisticians. So, if you are able to tell the story of the data, you can bring a lot of value to an organization.
Below are some resources I have found particularly useful along the way. If you have any questions or advice about data science, please leave them in the comments below!
Thanks for reading!