Wednesday, 3 August 2022

Do You Love Data? The 60-Year Story of Data Careers! What Every Computer-Literate Person Should Know, from Early “Database Management” to Contemporary “Data Thinking”, in Simple Terms! Read this!!

Dear Students and Friends

Recently, many students have been aspiring to CSE, CSE (Data Science), CSE (AI-ML), etc. Yesterday I was talking to a parent. He is not computer literate, but he is betting big on Data Science for his ward’s software career. He was bullish that if his ward does a BTech in Data Science, he will get a huge package!! I asked a simple question: “What is Data Science, sir?” Chuckling, he sincerely said he doesn’t know anything beyond this word! Moreover, he was not expecting this question from me!

It is not just one parent; many educated parents are falling into this domain-specific trap. Last year I made a YouTube video explaining the difference between CSE, CSE (AI-ML), IT and CSE (Data Science): https://www.youtube.com/watch?v=puRhpSSzHJ4 . I am happy to see more than 15000 views of this video. However, I thought I would write a special article on data-oriented careers. Please read this!

Computer Science is no longer a branch of other disciplines. (Earlier it was part of EEE, which then specialized into ECE, CSE, etc., but that is an old story.) Similarly, the Data World is another dedicated discipline within computer science. People who take up data-related careers need different qualities and traits.

For example, if I want to take up a core algorithm development job using some programming language, I require a lot of IQ and mathematical and logical aptitude. If I have to do a testing job, I need a lot of creativity and patience. If I have to do a software project management job, I require a lot of EQ and SQ. If I have to do an IT infrastructure job, I require critical thinking, problem-solving and decision-making skills. If I have to do Data Family jobs, I require analytical thinking, visualization capability, mathematical modelling, articulation and storytelling capability!

The Data Family is huge! Let me try to explain concepts with a few examples.

Database: Anywhere you need to store, analyze and retrieve data, you use a database. A phone book is a database. The transactions in your bank account are a database. When you go to the doctor, your diagnosis and treatment are stored in a database. Your company’s sales records are a database.

Datawarehouse: Databases provide real-time data, whereas warehouses store data to be accessed for big analytical queries including historical data. Data warehouses are primarily designed to facilitate searches and analyses and usually contain large amounts of historical data.

Datamart: A data mart is very similar to a Datawarehouse. However, unlike a Datawarehouse, its scope of visibility is limited. A data mart supplies the subject-oriented data necessary to support a specific business unit.
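
To make the three terms above a little more concrete, here is a minimal sketch in Python using the built-in sqlite3 module. The table name, column names and numbers are made up purely for illustration. A real Datawarehouse or Datamart is usually a separate, much larger system; this only shows the kind of question each one answers.

```python
import sqlite3

# Database: a place to store and retrieve records.
# The "sales" table and its columns are hypothetical, chosen only for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, unit TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (unit, year, amount) VALUES (?, ?, ?)",
    [
        ("retail", 2020, 100.0),
        ("retail", 2021, 150.0),
        ("wholesale", 2020, 400.0),
        ("wholesale", 2021, 500.0),
    ],
)

# Datawarehouse-style question: an analytical query over historical data.
yearly_totals = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()

# Datamart-style view: the same data, limited to one business unit.
conn.execute(
    "CREATE VIEW retail_mart AS SELECT year, amount FROM sales WHERE unit = 'retail'"
)
retail_rows = conn.execute(
    "SELECT year, amount FROM retail_mart ORDER BY year"
).fetchall()

print(yearly_totals)
print(retail_rows)
```

Here the view named retail_mart plays the role of the Datamart: the retail team sees only its own slice, while the yearly totals query looks across the whole history.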

Data Lake: A data lake stores an organization’s raw and processed (unstructured and structured) data at both large and small scales. Unlike a Datawarehouse or Database, a Data Lake captures anything the organization deems valuable for future use including images, videos, PDFs, etc. However, the technology used in a Data Lake is much more complex than in a Datawarehouse.

Big Data: Big Data, in the simplest of words, is a huge amount of data. A Data Lake is a repository for Big Data.

Datamining, Data Analytics/Data Analysis, Data Science, Data Thinking: Data mining is the process of extracting useful information, patterns, and trends from huge databases. It is a subset of Data Analytics/Data Analysis. Data Analysis is a subset of Data Science. Data Science combined with Design Thinking is now termed Data Thinking. Data Thinking is the new philosophy adopted in the data world. It is a human-centered way of dealing with data strategy (empathy, synthesis, ideation, prototyping and testing).

Let me share a simple example to understand these technical terms better:

ABC marriage bureau stores the information of all prospective brides and grooms in a Database. However, if you add their horoscopes, their family histories and all other relevant information, then it is called a Datawarehouse. If we want the details of only those brides and grooms who belong to one sect within a particular caste, it is called a Datamart. If we add the entire social media details of all brides and grooms belonging to one particular city, their Facebook and Instagram photos, their educational degree proofs in PDF format, their employment details and certificates as multiple images and so on, it is called a Data Lake. If we store all city profiles of this size in one repository, then it is called Big Data.

If we want to get meaningful information about a particular family or surname, it is called Data Mining. If we want to produce some visualizations or analysis, using statistical models, of a particular family and its upbringing over a period of time, then it is called data analysis/data analytics. If we want to know how this family is going to behave or be financially positioned in the next 10 years, or how its lifestyle will change, then we have to use advanced statistical tools, and it is called Data Science. If we want to understand why one particular segment of grooms is not getting married, or why a particular segment of couples is divorcing, then we have to look at that data with an empathetic view, create an analysis, generate ideas to solve the problem and develop a few solutions. This is the Design Thinking way of dealing with data, and this kind of view is now called Data Thinking.
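
The difference between data analysis (describing what has happened) and Data Science (estimating what may happen) can be sketched in a few lines of Python. The income figures below are invented for illustration, and a straight-line trend is only the simplest possible stand-in for the advanced statistical tools mentioned above.

```python
from statistics import mean

# Hypothetical yearly income of one family (made-up numbers, in lakhs).
years = [2018, 2019, 2020, 2021, 2022]
income = [10.0, 12.0, 14.0, 16.0, 18.0]

# Data analysis: summarize what has already happened.
average_income = mean(income)
total_growth = income[-1] - income[0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of a straight line y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Data Science (in miniature): use the past to estimate the future.
slope, intercept = fit_line(years, income)
forecast_2032 = slope * 2032 + intercept

print(average_income, total_growth, forecast_2032)
```

The first two numbers only describe the past five years, while the last one is an estimate for ten years later, which is the kind of question the Data Science example above is asking.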

When Data Science meets Design Thinking, it is called Data Thinking. Design Thinking is human-centered and associated with art and creativity, while Data Science is a rigorous, quantitative discipline.

I hope you now understand how to develop a data-oriented career. Do you? Make the transition from Database Management to Data Thinking! Will you?

Ravi Saripalle
