Meet the Data Scientist: Stuart Horsman

Meet Stuart Horsman, among the first to earn the CCP: Data Scientist distinction.

Big Data success requires professionals who can prove their mastery of the tools and techniques of the Hadoop stack. However, experts predict a major shortage of advanced analytics skills over the next few years. At Cloudera, we're drawing on our industry leadership and our early body of real-world experience to address the Big Data talent gap with the Cloudera Certified Professional (CCP) program.

As part of this blog series, we'll introduce the proud few who have earned the CCP: Data Scientist distinction. Featured today is CCP-01, Stuart Horsman. You can start your own journey to data science and CCP:DS with Cloudera's new Data Science Challenge on "Detecting Anomalies in Medicare Claims."

What’s your current role?

I am a Systems Engineer with Cloudera, based in Sydney, Australia.

Prior to taking CCP:DS, what was your experience with Big Data, Hadoop, and data science?

I was the Business Development Manager for Big Data at Oracle, focused on building Oracle's Big Data business in Australia and New Zealand. Central to that offering is the Oracle Big Data Appliance, which runs CDH, along with Oracle's statistical and data mining capabilities, which integrate R, the Oracle RDBMS, and Hadoop. I wore a number of hats and handled a variety of responsibilities: creating and presenting marketing content, providing architectural consultation to customers, running proofs of concept (POCs), and writing data processing pipelines on Hadoop.

What’s most interesting about data science, and what made you want to become a data scientist?

I studied Economics and Statistics at Lancaster University in the U.K. and was mostly interested in applying statistical analysis to macroeconomics. After graduation, I started my career as a software programmer before moving into systems and database administration. I discovered Hadoop through my work as a DBA.

Machine learning felt like a natural extension of Hadoop's capabilities (in my opinion, Big Data is all about machine learning), and as I became interested in it, I was reintroduced to statistics. I had forgotten how much I enjoyed digging into large data sets and performing advanced analyses. My background as a programmer and administrator helped me put machine learning techniques into practice and ignited my interest in data science.

I earned the CCP:DS credential with the first class of participants in the Web Analytics Challenge, which focused on classification, clustering, and collaborative filtering. I'm proud to carry this distinction, but it just makes me hungrier to learn more about data science. Shortly after becoming CCP-01, I joined the Cloudera team in Sydney. Working with Cloudera's customers has been the best next step in my journey, because I get to think through real Big Data challenges in the wild, gain hands-on experience with them, and help organizations solve them. I can't imagine a better way to continue my data science learning path.

How did you prepare for the Data Science Essentials exam and CCP:DS? What advice would you give to aspiring data scientists?

People contact me all the time to ask how I passed the Data Science Essentials exam and earned the top score on the Data Science Challenge. In all honesty, a deep interest in and enthusiasm for the subject matter make all the difference.

I read a number of books, the most useful of which were Machine Learning for Hackers by Drew Conway and John Myles White and An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. I also completed a Coursera course on machine learning. Cloudera offers both Data Analyst Training, for gaining experience with ecosystem tools like Impala, Hive, and Pig, and an Introduction to Data Science course, which serves as a great on-ramp to machine learning and recommender systems.

When all was said and done, I had been preparing for CCP:DS for two years without even knowing it. Ultimately, there are no shortcuts; dedicated study and practice always yield the best results. By the time I took the written exam, I had all the experience I needed to answer the questions. It was certainly a challenge, but I was prepared. When I read the requirements for the practicum, I knew I had the techniques down cold and was eager to get started.

I'd recommend using the study guide and the Data Science Challenge Solution Kit, and then study, study, study (and study some more).

Since becoming a CCP:DS in November 2013, what has changed in your career and/or in your life?

Cloudera hired me! It’s great to be working at one of the most exciting companies in the world and helping to change the way businesses store, process, and analyze all their data. I’m a member of an amazing team.

Why should aspiring data scientists consider taking CCP:DS?

Why should you take the CCP:DS Challenge? You might land your dream job like I did!

Seriously, I have had (and continue to have) lots of employers and recruiters ask about my current employment status. There's huge demand in the market right now for people who understand scalable programming for machine learning and who have experience applying these techniques in a business context. CCP:DS provides proof of the experience and expertise needed to help close the talent gap.
