The phenomenon known as big data has been growing exponentially.
Reports suggest that, by 2020, there will be around 40 trillion gigabytes of data, and the big data analytics market is set to reach $103 billion by 2023!
This growth has been further fueled by how economical it has become to store and analyze petabyte- and zettabyte-scale data. Once the data is stored, organizations need someone to crunch the numbers and draw actionable insights from it. This is where Data Science comes into the picture.
With the growing amount of data, the need for qualified data scientists is also growing. No wonder the role of Data Scientist has been termed the sexiest job of the 21st century. Demand for the role is only going to increase from here on, and if various research reports are anything to go by, we already have a shortage of Data Science talent. Demand for the skill has far surpassed supply.
The growing interest in the field also fuels a lot of misconceptions. While there is already a lot of literature about what Data Science means, in this blog, we will take a look at what Data Science is not.
Data Science is NOT AI/ML
Quite often, the terms data science, artificial intelligence, and machine learning are used interchangeably. However, they are not the same, even though there is a lot of overlap between the three. Data Science is the more general field, in which a lot of human intelligence has to work in tandem with machines. Fraud detection and churn prediction are cases in point. AI and ML are more niche and are used for repetitive tasks that, as of now, do not require human intervention. AI use cases such as chatbots and ML use cases such as recommendation engines illustrate the difference.
Data Science is NOT a silver bullet
Organizations often make the mistake of thinking that Data Science is the panacea for all the problems they are facing. As Simon Sinek said, one needs to first understand the “why”. Before starting their data science initiatives, organizations need to define their business case and the problem they plan to solve using data science. That definition then determines which data attributes to collect and how to analyze them. Without a clear problem statement, investment in skills, domain knowledge, and top management involvement, there is a high chance that it will end up being another failed initiative.
Data Science is NOT just about Visualization
Visualization of data, powered by tools such as Power BI or Tableau, is all well and good, but it should aid the process of decision-making; only then can it have a direct bearing on the bottom line. Data Science, by its very meaning, implies that human intelligence is imperative for drawing insights from data. Visualization aids that, but data science is not only about visualization. Visualization is about reporting the various data points available to the organization. It cannot be either predictive or prescriptive unless a Data Scientist with domain understanding adds value.
Data Science is NOT just R or Python
There is a common perception in the market that people with expertise in tools and languages such as Python, R, SPSS, etc. are data scientists. This is very far from the truth. These are merely tools for doing data analysis. One needs to know the problem statement to know which algorithm, model, or methodology will best suit the purpose. For example, for an OTT provider, a Data Scientist is expected to draw a decision tree to understand what customers are watching on an OTT channel, create clusters to understand the demographics, and run a regression to understand cause-and-effect relationships.
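To make the point concrete, here is a minimal sketch of the regression step in plain Python. The function name `fit_line` and the viewing figures are made up for illustration; they do not come from any real OTT provider. Note also that a regression on its own measures association, so deciding whether the relationship is truly causal still takes domain knowledge.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y move together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: episodes released per week vs. average hours watched.
episodes_released = [1, 2, 3, 4]
hours_watched = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(episodes_released, hours_watched)
print(f"hours_watched ~= {slope:.2f} * episodes_released + {intercept:.2f}")
```

The code is the easy part. Choosing regression over a decision tree or clustering, and choosing which variables to feed it, is what depends on the problem statement.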
Data Science is NOT only about Data
Data ingestion plays a significant role in data science. To get a complete view, organizations need to analyze data available internally as well as data collected from sources outside the organization. The internal data may be lying in silos across various systems such as ERP, CRM, etc. For a complete analysis, all the data needs to be collected and reconciled. Like all the others, though, this is a crucial subset of Data Science and not data science per se. Depending on the end goal and the analysis to be conducted, the tools, as well as the data needed for those tools, will change.
One needs to acknowledge that Data Management is a huge ask. There are several tools available in the market that can help organizations manage their data, including cleansing, scrubbing, and de-duplication. Data management is essential for data science and is an integral part of the process, but needless to say, it is not data science in its entirety. Data Scientists are aided by the quality of the data, which is enhanced through robust data management.
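As a rough illustration of what cleansing and de-duplication involve, here is a small sketch in plain Python. The record layout (`name`, `email`), the function name `clean_records`, and the rule of treating the email address as the unique key are all assumptions made for this example; real data management tools apply far richer matching rules.

```python
def clean_records(records):
    """Normalize names and emails, drop records without an email,
    and keep only the first record seen for each email address."""
    seen_emails = set()
    cleaned = []
    for record in records:
        # Collapse repeated whitespace and standardize capitalization.
        name = " ".join(record.get("name", "").split()).title()
        email = record.get("email", "").strip().lower()
        # Skip unusable rows (no email) and duplicates of earlier rows.
        if not email or email in seen_emails:
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Hypothetical customer rows, as they might arrive from a CRM and an ERP export.
raw = [
    {"name": "  asha   rao ", "email": "Asha.Rao@Example.com "},
    {"name": "Asha Rao", "email": "asha.rao@example.com"},
    {"name": "Ben Lee", "email": ""},
    {"name": "carla diaz", "email": "carla@example.com"},
]

customers = clean_records(raw)
for customer in customers:
    print(customer)
```

Clean input like this makes the later analysis more trustworthy, but it does not by itself produce any insight.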
Data Science is NOT Statistics
An ace Data Scientist needs to possess three critical skills: 1) strong programming knowledge and skills to run the models, 2) an understanding of statistics to apply the correct statistical tools, and 3) strong domain knowledge to arrive at the correct inference. Many believe that having any one of these skills alone can make you a Data Scientist, which is not true.
How many such myths did you believe about data science?