Understanding Data Science

Here is my understanding of Data Science, the role of a Data Scientist, and Machine Learning for Data Science.

Companies, government agencies, and research firms have access to huge volumes of data, both from their internal applications and from external sources such as social media and third-party applications. The challenge is figuring out what to do with that data. The insights gained from proper data analysis have major implications for businesses, government agencies, and research firms. Proper data analysis takes careful consideration, appropriate infrastructure, and an understanding of data science.

WHAT IS DATA SCIENCE?
Data science is more than just an industry buzzword. It is the application of advanced processes and technologies to extract knowledge or insights from large volumes of disparate datasets. Both scientific research and commercial operations rely on understanding and processing large amounts of data. Earlier approaches were more basic, relying purely on statistical analysis and data mining, whereas data science enables advanced machine learning algorithms, along with predictive and prescriptive analytics, to give a much richer view of more complex data. When these techniques are applied to datasets on a powerful hardware and software analytics platform, large volumes of data can be processed very quickly to accelerate business decision-making and results.

Data science is the process of using data to draw conclusions, forecast trends, and predict outcomes. Most companies already use data science to some extent through traditional Business Intelligence (BI) and reporting tools, but these tools have limitations. First, they can handle only relational (structured) data. Second, the data must be of a manageable size.

The biggest challenge with traditional BI tools is that they can’t interpret unstructured data, such as text from social media and other external sources. Lately, social media has become a major source of data for understanding the world and market trends and for making informed decisions.
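To make the structured versus unstructured distinction concrete, here is a minimal Python sketch that turns free-text posts into a small frequency table that a reporting tool could then chart. The posts are made up for illustration, and counting hashtags with a regular expression is only one simple assumption about what structure to extract; real social media analysis needs considerably more (data collection, language handling, sentiment, and so on).

```python
import re
from collections import Counter

# Hypothetical sample posts; in practice these would come from a social media export.
posts = [
    "Loving the new #coffee blend at the downtown store! #coffee #morning",
    "Long queue again... #service",
    "Best #morning routine: run, then #coffee",
]

# A hashtag is assumed to be '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")

def hashtag_counts(texts):
    """Turn free-text posts into a structured table of hashtag frequencies."""
    counts = Counter()
    for text in texts:
        # Lower-case so '#Coffee' and '#coffee' count as the same tag.
        counts.update(tag.lower() for tag in HASHTAG.findall(text))
    return counts

for tag, n in hashtag_counts(posts).most_common():
    print(f"{tag}\t{n}")
```

The output is ordinary rows and columns (tag, count), which is exactly the kind of relational data traditional BI tools can work with.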

When you have a dataset, such as a big table of numbers and descriptions, you really have to understand what is going on behind those numbers and what they represent about the world. Data science combines computing, statistics, and domain knowledge to draw useful conclusions from data, using computation as the primary tool.

Data Science has three core activities.

1. Identify Patterns: Exploration is figuring out what patterns exist in the data. When you have many observations about some phenomenon, what can you conclude about the phenomenon itself? Often, instead of just looking at large tables of numbers, we draw data visualizations, because it’s much easier to interpret a lot of information at once when it’s portrayed visually.
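A minimal Python sketch of exploration, assuming a small made-up dataset of units sold on weekdays versus weekends: group the observations, compute simple summary statistics, and print a crude text bar as a stand-in for a real chart (in practice you would use a plotting library).

```python
from statistics import mean, median

# Hypothetical observations: (day_type, units_sold)
observations = [
    ("weekday", 120), ("weekday", 135), ("weekday", 128), ("weekday", 140),
    ("weekend", 180), ("weekend", 175), ("weekend", 190),
]

def summarize_by_group(rows):
    """Group values by key and return count, mean and median for each group."""
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return {
        key: {"n": len(vals), "mean": mean(vals), "median": median(vals)}
        for key, vals in groups.items()
    }

for group, stats in summarize_by_group(observations).items():
    # A crude text "bar chart" stands in for a real visualization.
    bar = "#" * round(stats["mean"] / 10)
    print(f"{group:8} n={stats['n']} mean={stats['mean']:.1f} {bar}")
```

Even this rough view suggests weekends sell more; whether that pattern is real or just chance is the question the next activity addresses.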

2. Quantifying Patterns: Once we’ve found a pattern, we need to perform statistical inference, and that’s because some patterns are there just by chance and some are there because they’re a reflection of some underlying process that’s really interesting about the world. So the goal of statistical inference is to quantify whether the patterns that we observe during the exploration phase are reliable. If we collected more data, would we see this pattern again or not? The primary tool we have is randomization because by simulating random processes, we can see what kinds of patterns appear just by chance. If the pattern we observe is not the kind of thing that could just appear by chance, then we can conclude that it’s because of some robust or reliable pattern in the underlying phenomenon we want to study.
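The randomization idea can be sketched as a permutation test, one standard way of doing this. The sketch below uses made-up weekday and weekend sales figures (hypothetical) and asks: if the weekday/weekend labels were shuffled at random, how often would a difference in means at least as large as the observed one appear? A small proportion suggests the pattern is unlikely to be pure chance. Here only one of the 35 ways of splitting the seven values into groups of four and three is as extreme as the real split, so the estimate should land close to 1/35, about 0.03.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_shuffles=10_000, seed=0):
    """Estimate how often a difference in means at least as large as the
    observed one appears when group labels are assigned at random."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # break any real link between label and value
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Observed effect size and the estimated p-value.
    return observed, extreme / n_shuffles

weekday = [120, 135, 128, 140]
weekend = [180, 175, 190]
observed, p_value = permutation_test(weekday, weekend)
print(f"observed difference = {observed:.1f}, estimated p-value = {p_value:.4f}")
```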

3. Prediction: Finally, we perform prediction. This is where we have partial information about something we want to know, and we want to make a guess about what we don’t know yet. Here we make informed, quantitative guesses using a discipline called machine learning. Normally when we write programs, we focus on the particular logic of what the computer should do. Machine learning, by contrast, is about not programming every detail, but instead using data to make decisions or choices within the program. So when we write a program to, for instance, recognize speech, automatically translate languages, or control a car or a robot, we don’t write down every detail of what to do; instead, we use examples from the world to help the computer learn automatically how to behave.
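As a minimal illustration of learning from examples instead of hand-written rules, here is a 1-nearest-neighbour classifier in Python. The customer features (monthly visits, support tickets) and the "stays"/"churns" labels are hypothetical, and nearest neighbour is just one of many learning techniques, chosen because it is short enough to read in full.

```python
import math

# Hypothetical labelled examples: (monthly_visits, support_tickets) -> outcome
training_examples = [
    ((20, 0), "stays"),
    ((18, 1), "stays"),
    ((15, 2), "stays"),
    ((2, 5), "churns"),
    ((3, 4), "churns"),
    ((1, 3), "churns"),
]

def predict(features, examples):
    """1-nearest-neighbour: copy the label of the most similar known example.
    No rule for 'churn' is written by hand; the examples decide."""
    _, label = min(examples, key=lambda ex: math.dist(features, ex[0]))
    return label

print(predict((17, 1), training_examples))
print(predict((2, 3), training_examples))
```

This should print "stays" and then "churns". Adding more labelled examples changes the predictions without changing a line of the program logic. Real use would also scale the features so that no single one dominates the distance.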

What does a Data Scientist do?
Data scientists transform data and build models on top of it. Typically, the majority of a data scientist’s time is spent dealing with data: munging it and doing what we call feature engineering, which means taking raw input data and transforming it into a form that can actually provide value. A great deal of a data scientist’s time is spent cleaning data and preparing it until it is trustworthy.
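A minimal Python sketch of cleaning and feature engineering, assuming hypothetical raw records like those in a messy CSV export. It drops rows that cannot be parsed, normalizes names and types, and derives a new feature (`tenure_days`, a name chosen for illustration) from the raw signup date.

```python
from datetime import date

# Hypothetical raw records: inconsistent casing, stray whitespace,
# missing values, and numbers stored as text.
raw_rows = [
    {"customer": " Alice ", "signup": "2023-01-15", "spend": "120.50"},
    {"customer": "BOB", "signup": "2023-03-02", "spend": ""},
    {"customer": "carol", "signup": "not available", "spend": "80"},
    {"customer": "Dave", "signup": "2022-11-30", "spend": "45.00"},
]

def clean(rows, as_of):
    """Drop unusable rows, normalize types, and derive a new feature."""
    cleaned = []
    for row in rows:
        try:
            signup = date.fromisoformat(row["signup"].strip())
            spend = float(row["spend"])
        except ValueError:
            continue  # skip rows whose date or spend cannot be parsed
        cleaned.append({
            "customer": row["customer"].strip().title(),
            "spend": spend,
            # Engineered feature: days between signup and the reference date.
            "tenure_days": (as_of - signup).days,
        })
    return cleaned

for record in clean(raw_rows, as_of=date(2023, 6, 1)):
    print(record)
```

Silently dropping bad rows is a design choice made here for brevity; in practice you would usually count or log them so you know how much data was lost.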

The work also involves accessing data from multiple sources and working out what transformations the data needs. Sometimes it is a big data problem, where you have to process the data with various algorithms before you can make sense of it. Day to day, the job is about making sense of that data.
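Combining sources usually means joining records on a shared key. Here is a minimal Python sketch, assuming two hypothetical extracts (a CRM customer list and web orders) that share a customer ID. It joins them, aggregates revenue per region, and counts orders that have no matching customer instead of silently dropping them.

```python
# Hypothetical extracts from two separate systems that share a customer ID.
crm_customers = {
    101: {"name": "Alice", "region": "North"},
    102: {"name": "Bob", "region": "South"},
    103: {"name": "Carol", "region": "North"},
}
web_orders = [
    {"customer_id": 101, "amount": 30.0},
    {"customer_id": 103, "amount": 12.5},
    {"customer_id": 101, "amount": 20.0},
    {"customer_id": 999, "amount": 8.0},  # no matching CRM record
]

def revenue_by_region(customers, orders):
    """Join orders to customers on ID, then sum revenue per region.
    Orders with an unknown customer ID are counted, not silently lost."""
    totals, unmatched = {}, 0
    for order in orders:
        customer = customers.get(order["customer_id"])
        if customer is None:
            unmatched += 1
            continue
        region = customer["region"]
        totals[region] = totals.get(region, 0.0) + order["amount"]
    return totals, unmatched

totals, unmatched = revenue_by_region(crm_customers, web_orders)
print(totals, "unmatched orders:", unmatched)
```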

If you listen carefully to the data, it talks; it will tell you stories. It is very interesting to watch user trends and see what users are actually doing, because the conclusions then rest on evidence rather than on someone’s opinion in the business.

The most common programming languages for data science are R and Python. The presentation layer can be any visualization tool, such as Microsoft Power BI, Tableau, SAP BusinessObjects, IBM Cognos, or Oracle BI. These tools help transform and present the data. For machine learning and modeling, common choices are Azure Machine Learning, Apache Spark, or again R and Python.

What is Machine Learning? How does it help the Data Science?
Machine learning takes this one step further. If you have very large numbers of calculations to run, or large amounts of collected data to mine, machine learning can help. It is the practice of having machines learn the way humans do, from experience, which here means data. Learning algorithms process large sets of training data to draw conclusions about the world. Based on these conclusions, the machine can then process new information and make informed decisions. These programs adjust their own internal parameters as they learn from data, rather than being reprogrammed by hand. Machine learning applications are valuable to businesses for a few reasons.

i. Automate processes – This is perhaps the first thing people think of with machine learning. The computer can learn to automate workflows and clerical tasks. With Robotic Process Automation (RPA) and AI, you can reduce your personnel’s clerical workload and free them up for more strategic tasks.
ii. Identify Patterns – Machine learning algorithms can comb incredible quantities of data and make connections no human brain could detect. They can be used to solve problems, identify market trends, redesign your business model, or even diagnose diseases. They are most effective when directed to solve a specific problem or task.
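The learn-then-apply cycle described above (learn from training data, then handle new information) can be sketched in Python with a train/test split. The data is synthetic and hypothetical (sales roughly equal to 5 + 2 × advertising spend, plus noise), and a least-squares line stands in for a learning algorithm. The key point is that the model is judged only on rows it never saw while learning.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x, learned from training data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def mean_absolute_error(model, xs, ys):
    """Average size of the prediction errors on the given rows."""
    a, b = model
    return sum(abs((a + b * x) - y) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: advertising spend (x) vs. sales (y), about y = 5 + 2x plus noise.
rng = random.Random(42)
data = [(x, 5 + 2 * x + rng.gauss(0, 1)) for x in range(50)]
rng.shuffle(data)

# Hold out 20% of the rows: the model never sees them while learning.
split = int(0.8 * len(data))
train, test = data[:split], data[split:]

model = fit_line([x for x, _ in train], [y for _, y in train])
print("learned a, b:", model)
print("error on unseen data:",
      mean_absolute_error(model, [x for x, _ in test], [y for _, y in test]))
```

If the error on the held-out rows is much worse than on the training rows, the model has memorized its examples rather than learned a pattern that generalizes.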

SO HOW CAN YOU IMPLEMENT DATA SCIENCE AND MACHINE LEARNING?
Define your problem – Data science and machine learning work best when used as a targeted approach. Don’t get bogged down with too much data before you know what you’re looking for. Come to the table with a specific problem or question. Then you can gather the relevant data and ignore the noise.

Gather the data – Once you have identified your problem, it’s time to collect the data. You may already have this in your database. If so, great. If you find that you don’t have what you need, it’s time to look at ways to gather your relevant data. Send out customer surveys, add pertinent form questions on subscriptions, etc. The more specific, the better.

Process/analyze the data – Data scientists are trained to design algorithms and processes to extract what you need from your data sets. If you decide to hire one, it’s important to have the infrastructure in place to support them. You will get the most useful insight if you aren’t asking them to design your data warehouse from the ground up or take care of any IT needs. Let them do what they’ve been trained to do.

Make strategic changes – Use the insights you have learned from your analysis to effect data-driven changes. Relaunch your marketing campaign toward a more receptive audience. Restructure your operational processes. Redesign your business processes. Trust the data.