Think You Know What It Takes To Lead A Rewarding Data Science Career? Think Again
The content of this article was presented in a guest lecture at Melbourne’s RMIT University in May 2023 for the AI/DS Professional course under the title “The 3 Things I Wish I Learned About Data Science In University”. The course has over 200 students enrolled from across the Master of Data Science, Master of Artificial Intelligence and Bachelor of Data Science programmes.
My formal training in the sciences, Computer Science to be more specific, has been invaluable to my pursuit of a career at the intersection of data science and product management. The training has ingrained in me the thought process and the behaviour of a scientist and given me the theoretical foundation to stay on top of the ever-evolving suite of techniques and algorithms for working with data. I had everything I needed to lead a rewarding data science career, or so I thought.
Once I entered the workforce and started to move between jobs and companies over the course of almost two decades, I discovered a different side of the field. My lack of appreciation for, and competence in, a number of skills was holding back my job satisfaction and career progression as a data science professional. Do not get me wrong. Taking a scientific approach to decision making and being more data-informed are more critical than ever. However, for a data science professional to be impactful, that alone is simply not enough.
the biggest successes stemmed not simply from technical excellence but from softer factors such as a deep understanding of business problems [1]
The gist of this article is to introduce the 3 categories of know-how that I have come to realise are immensely valuable in a cross-functional environment for ensuring and maximising the impact of data science work. In the next two sections, we will explore the significance of these 3 categories of know-how as they underpin 1 of the 3 pillars of the data science field, which is problem framing and disaggregation. From experience, this pillar and the associated skills are not typically covered in science degrees, and I argue that they ought to be given more attention, especially in data science related programmes.
A relook at data science skills, holistically
The excitement we see today with generative AI (more to come on the topic in future articles) reminds me of the rise to popularity that the field of data science experienced throughout the 2000s and early 2010s. Articles like this one, ‘Data Scientist: The Sexiest Job of the 21st Century’, which was published in 2012, ushered in the beginning of a hype cycle. And spoiler alert, the profession remains ‘sexy’ a decade later.
workers are clamoring to become data scientists or at least label themselves as such…many people say they are data scientists, but may have simply taken some online programming courses and don’t know what they don’t know. [2]
The interest in data science has brought with it opportunities. At the same time, the hype surrounding the field has made it hard to sift through what it is and what it isn’t and, more importantly, which skills are needed to lead a rewarding data science career. Is it just about learning Python/R and all the software tools available under the sun? Or is it about doing a degree in data science?
the CEO…was especially proud that his firm had hired 1,000 data scientists, each…cost of $250,000 a year…became apparent that the new hires were not delivering…discovered that they were not…data scientists [3]
It is fair to say that in this day and age, being fluent in “doing science” and using software tools for data analysis are merely prerequisites. It may not even matter whether you acquired those skills by doing a degree or by teaching yourself. To truly understand the full breadth of skills needed for the data science profession, we need to look at the field more holistically. In 2010, Mike Loukides shared his thoughts on all things data science and, amongst his notes, he wrote “Data scientists combine entrepreneurship with patience…and the ability to iterate over a solution….They can tackle all aspects…from initial data collection and data conditioning to drawing conclusions. They can…come up with new ways to view the problem…work with very broadly defined problems”.
There may be an element of bias here but I see 3 pillars of the field in his words. His mention of the ability to iterate conjures up the test-and-learn mindset that comes with using the scientific method. A data scientist’s ability to “tackle all aspects” of working with data encompasses the techniques and algorithms and the corresponding software tools, which I simply refer to as the data analysis toolkit. Lastly, and perhaps the missing piece, is the ability both to look at problems in new ways and to deal with broad problems, which sounds to me like problem framing. As the saying goes, “A problem well-framed is a problem half-solved.”
To validate the 3 pillars, we pulled the definition of data science from 3 dictionaries and did a manual clustering of the key terms, which we performed together with the students during the guest lecture. During the exercise, the students’ attention was immediately drawn to terms like “mathematics”, “statistics”, “programming”, “machine learning” and handling “large amount of data”. This was not surprising, as these are the core topics of any data science programme, and they mapped nicely to the data analysis toolkit. The mentions of the scientific method and of being scientific in the approach to analysis also stood out to the students. The least intuitive pillar for the students was problem framing. Terms like “valuable predictive information”, “useful information” and “learn about…behaviour” in the definitions essentially point to the need for data science professionals to stay close to the problem and the outcome. As I have always said, data science is a field that demands proximity to the problems.
we think that data patterns are unusual and therefore meaningful. Patterns are, in fact, inevitable and therefore meaningless….good data scientists are not seduced by discovered patterns because they don’t put data before theory [4]
Problem framing is all about ensuring that we solve for the right problems so that outcomes can be achieved or goals met with the least amount of wastage. There are many resources out there about problem framing and disaggregation which will not be repeated here. By and large, it is an extremely collaborative and humbling exercise, where we challenge assumptions and navigate through constraints. Ensuring that you can be understood by others in the cross-functional team during the process is clearly a critical quality. These are essentially the 3 categories of know-how put forward in this article, to outline the qualities required in a data scientist to be able to lead or at least participate productively in problem framing and disaggregation.
A data scientist needs more to stand out
The scientific method offers a systematic approach for gathering evidence and coming up with answers to discrete, testable questions. As an example, say we found, through regression studies or online controlled experiments, that the number of debtors a small business has is a key factor that drives its decision to pay or not for an e-invoicing product. The data scientist uses a combination of items from their toolkit to prepare the data, perform regression analysis, set up A/B tests and so on to come up with the answer. One thing, however, is missing. Why was the question worth answering in the first place? How would the answer be used?
Coming up with the answers may be an intellectually invigorating exercise but in isolation, they are not worth much. Often, the answers from data science are part of solving bigger problems and achieving business outcomes. For example, answering the question of whether there is any causal relationship between debtor number and willingness to pay can be a piece of causal-predictive analysis work that is part of an opportunity solution tree or a driver/lever/hypothesis tree to improve the uptake of an e-invoicing product.
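To make the "discrete, testable question" concrete, here is a minimal sketch of how the debtor question could be checked with a two-proportion z-test. Everything specific in it is an assumption made for illustration: the counts are invented, the split into "few" and "many" debtors is arbitrary, and the function name `two_proportion_z_test` is my own. A real study would also need to deal with confounders before claiming anything causal.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups convert at the same rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: small businesses that paid for e-invoicing,
# grouped by whether they have few or many debtors.
few_paid, few_total = 40, 500
many_paid, many_total = 70, 500

z, p_value = two_proportion_z_test(few_paid, few_total, many_paid, many_total)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The code answers one narrow question. It says nothing about why the question matters or what anyone would do with the answer, which is exactly the gap that problem framing is meant to fill.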
If you are ‘lucky’, you may work in teams where the problem framing and disaggregation have been done for you. As a data scientist, you get discrete, well-formed questions like the one above to work on. This is one extreme. I have also experienced the other extreme, where you are entrusted with broad, complex or wicked problems to solve, like driving up conversion or stopping the decline in subscription renewal. This latter extreme is more common than you think. Obviously, which one you get depends a lot on the maturity of the executives and the product teams in the organisation.
This is potentially where the disconnect is. Science training focuses on helping students get better at “doing science” and for that to happen, the emphasis tends to be on the scientific method and iterating on the solution. The process often assumes a rather discrete problem and the rest of the work becomes a search for a satisfactory or optimal solution. Hence, when big, amorphous problems or goals land on a data scientist especially in a less mature organisation, they struggle. The ability to (re-)frame the problems, disaggregate them into smaller constituents and ultimately, come up with testable hypotheses for them to do their magic becomes essential.
with the growing popularity and diversity of data science, institutions have created dedicated data science programs [5]
As the technology sector and many others enter a period of renewed emphasis on return on investment and doing more with less, anyone who can contribute over and above what is expected to maximise a project's chances of success can prove to be invaluable. Problem framing and the 3 underpinning qualities introduced in this article have been consistently identified as critical skills for data scientists in a professional setting [6][7][8], capable of making or breaking projects. From my experience working as a data scientist as well as hiring and serving alongside other data professionals, those who exhibit these skills are more highly valued and enjoy better job satisfaction and career progression. For training providers, I believe these skills are entirely teachable and should be taught more formally. As the demand for data scientists remains high, the field slowly converges and more courses and degrees are offered, we should seize this opportunity to evolve and bolster our curriculums to better prepare the next generation of data science professionals.