Dear Students and Friends,
Recently, many students have been aspiring for CSE, CSE (Data Science), CSE (AI-ML),
etc. Yesterday I was talking to a parent. He is not computer literate, but he is
betting big on Data Science for his ward’s software career. He was bullish that
if his ward does a BTech in Data Science, the ward will get a huge package! I
asked a simple question: what is Data Science, sir? Chuckling, he sincerely said
that he doesn’t know anything beyond this word! Moreover, he was not expecting
this question from me!
It is not just one parent; many educated parents are falling into this
domain-specific trap. Last year I made a YouTube video explaining the difference
between CSE, CSE (AI-ML), IT and CSE (Data Science): https://www.youtube.com/watch?v=puRhpSSzHJ4 .
I am happy to see that this video has more than 15,000 views. However, I
thought I would also write a special article on Data-oriented careers. Please
read this!
Computer Science is no longer a branch of other disciplines (earlier it was part
of EEE, which then specialized into ECE, CSE, etc.; that is an old story).
Similarly, the Data World is now a dedicated discipline within Computer Science.
People who take up Data-related careers need different qualities and traits.
For example, if I want to take up a core algorithm development job using
some programming language, I require a lot of IQ and mathematical and logical
aptitude. If I have to do a Testing job, I need a lot of creativity
and patience. If I have to do a software project management job, I
require a lot of EQ and SQ. If I have to do an IT Infrastructure
related job, I require critical thinking, problem-solving and
decision-making skills. If I have to do Data Family related jobs, I
require analytical thinking, visualization capability, mathematical modelling,
articulation and storytelling capability!
The Data Family is huge! Let me try to explain the concepts with a few examples.
Database:
Anywhere you need to store, analyze and retrieve data, you use a database. A
phone book is a database. The transactions in your bank account are stored in a
database. When you go to the doctor, your diagnosis and treatment are stored in
a database. Your company’s sales records are a database.
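To make the phone book idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, the function names and the contact entries are all made up for illustration; a real phone book application would look different.

```python
import sqlite3

# In-memory database, so this sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE phone_book (name TEXT PRIMARY KEY, phone TEXT)")

# Store: add a few made-up entries (hypothetical names and numbers).
conn.executemany(
    "INSERT INTO phone_book (name, phone) VALUES (?, ?)",
    [("Asha", "555-0101"), ("Bala", "555-0102"), ("Chitra", "555-0103")],
)


def lookup(name):
    """Retrieve: return the phone number for a name, or None if absent."""
    row = conn.execute(
        "SELECT phone FROM phone_book WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


def count_entries():
    """Analyze: a very simple summary, the number of stored contacts."""
    return conn.execute("SELECT COUNT(*) FROM phone_book").fetchone()[0]


print(lookup("Bala"))
print(count_entries())
```

The same three actions (store, retrieve, analyze) apply to bank transactions or sales records; only the tables and the questions change.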
Data Warehouse:
Operational databases serve current, real-time data, whereas data warehouses
store data, including historical data, for large analytical queries. Data
warehouses are primarily designed to facilitate searches and analyses and
usually contain large amounts of historical data.
Data Mart: A data mart is very similar to a data warehouse. However, unlike a
data warehouse, its scope is limited. A data mart supplies the subject-oriented
data necessary to support a specific business unit.
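The relationship between a warehouse and a mart can be sketched in a few lines of Python with sqlite3. This is a deliberate simplification: here the "mart" is just a filtered view over a single "warehouse" table, whereas real data marts are often separate stores. The table, view, business units and figures are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "Warehouse": historical sales for every business unit (made-up rows).
conn.execute(
    "CREATE TABLE sales_warehouse (year INTEGER, unit TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales_warehouse VALUES (?, ?, ?)",
    [
        (2021, "retail", 120.0),
        (2022, "retail", 150.0),
        (2021, "wholesale", 300.0),
        (2022, "wholesale", 280.0),
    ],
)

# "Data mart": the same data, limited to what one business unit needs.
conn.execute(
    "CREATE VIEW retail_mart AS "
    "SELECT year, amount FROM sales_warehouse WHERE unit = 'retail'"
)

# Analytical query on the warehouse: company-wide total per year.
warehouse_totals = conn.execute(
    "SELECT year, SUM(amount) FROM sales_warehouse "
    "GROUP BY year ORDER BY year"
).fetchall()

# The retail team queries only its own mart.
mart_rows = conn.execute(
    "SELECT year, amount FROM retail_mart ORDER BY year"
).fetchall()

print(warehouse_totals)
print(mart_rows)
```

The retail team never sees the wholesale rows, which is the "limited scope" described above.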
Data Lake: A data lake stores an organization’s raw and
processed (unstructured and structured) data at both large and small scales.
Unlike a data warehouse or database, a data lake captures anything the
organization deems valuable for future use, including images, videos, PDFs, etc.
However, the technology used in a data lake is much more complex than in a
data warehouse.
Big Data: Big Data, in the simplest of words, is huge amounts of
data. A data lake is a repository for Big Data.
Data Mining, Data Analytics/Data Analysis, Data Science, Data Thinking:
Data mining is the process of extracting useful information, patterns, and trends
from huge databases. It is a subset of Data Analytics/Data Analysis. Data
Analysis is a subset of Data Science. Data Science + Design Thinking is now
termed Data Thinking. Data Thinking is the new philosophy adopted in the
data world. It is a human-centered way of dealing with data strategy (empathy,
synthesis, ideation, prototyping and testing).
Let me share a simple example to understand these technical terms better:
ABC marriage bureau stores all prospective brides’ and grooms’ information in a
Database. If you add their horoscopes, their family history and all other
relevant information, then it is called a Data Warehouse.
If we want the details of brides and grooms from only one sect in a particular
caste, it is called a Data Mart. If we add the entire social media details of all
brides and grooms belonging to one particular city, their Facebook and
Instagram photos, their educational degree proofs in PDF format, their
employment details and certificates in multiple images and so on, it is called
a Data Lake. If we store all city profiles of this size in one
repository, then it is called Big Data.
If we want to get meaningful information from the details of a particular family
or surname, it is called Data Mining. If we want to produce some
visualizations or analysis, using statistical models, of a particular family
and their upbringing over a period of time, then it is called Data
Analysis/Data Analytics. If we want to know how this family is going
to behave or be financially positioned in the next 10 years, or how they will
change their lifestyle, etc., then we have to use advanced statistical tools, and
it is called Data Science. If we want to understand why one particular
segment of grooms is not getting married or why a particular segment of couples
is divorcing, then we have to look at that data with an empathetic view,
create an analysis, generate ideas to solve the problem, develop a few
solutions, etc. This is the Design Thinking way of dealing with data.
This kind of view is called Data Thinking.
When Data Science meets Design Thinking, it is called Data Thinking. Design
Thinking employs human-centeredness, associated with art and creativity, while
Data Science is a rigorous, quantitative discipline.
I hope you now understand how to develop a Data-oriented career. Do you?
Transition from Database Management to Data Thinking! Will you?
Ravi Saripalle