Data science, also known as data-driven science, is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge or insights from data in various forms, both structured and unstructured; in this respect it is similar to data mining.
Data Science: Dealing with both structured and unstructured data, data science is a field that comprises everything related to data cleansing, preparation, and analysis. ... Big Data, a buzzword used to describe immense volumes of both structured and unstructured data, inundates a business on a day-to-day basis.
DESCRIPTIVE STATISTICS AND PROBABILITY DISTRIBUTIONS:
Introduction about Statistics
Different Types of Variables
Measures of Central Tendency with examples
Measures of Dispersion
Probability & Distributions
Probability Basics
Binomial Distribution and its properties
Poisson distribution and its properties
Normal distribution and its properties
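The measures and distributions listed above can be computed with Python's standard library. A minimal sketch, assuming a small made-up sample; binomial_pmf and poisson_pmf are illustrative helper names, while statistics.NormalDist is part of the standard library (Python 3.8+):

```python
import math
import statistics

# Made-up sample data, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(data)      # 5
median = statistics.median(data)  # 4.5 (average of the two middle values)
mode = statistics.mode(data)      # 4 (most frequent value)

# Measures of dispersion
data_range = max(data) - min(data)      # 7
pop_var = statistics.pvariance(data)    # 4 (divides by n)
sample_var = statistics.variance(data)  # 32/7 (divides by n - 1)
pop_sd = statistics.pstdev(data)        # 2.0

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam): e^(-lam) lam^k / k!."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Normal distribution: P(Z <= 1.96) for a standard normal variable.
std_normal = statistics.NormalDist(mu=0, sigma=1)

print(binomial_pmf(2, 4, 0.5))  # 0.375: two heads in four fair coin tosses
print(poisson_pmf(0, 2.0))      # e^-2: no events when the mean rate is 2
print(std_normal.cdf(1.96))     # about 0.975
```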
INFERENTIAL STATISTICS AND TESTING OF HYPOTHESIS
Sampling methods
Different methods of estimation
Testing of Hypothesis & Tests
Analysis of Variance
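One-way analysis of variance compares the spread of the group means with the spread inside each group. A minimal sketch, assuming groups are plain Python lists and using a hypothetical helper one_way_anova_f; it returns only the F statistic and its degrees of freedom, so the p-value still has to come from an F table or a library routine such as scipy.stats.f_oneway:

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Made-up example: weekly sales under three hypothetical store layouts.
layout_a, layout_b, layout_c = [1, 2, 3], [4, 5, 6], [7, 8, 9]
print(one_way_anova_f([layout_a, layout_b, layout_c]))  # (27.0, 2, 6)
```

Here F = 27 on (2, 6) degrees of freedom is far above the 5% critical value of about 5.14, so the hypothesis of equal group means would be rejected.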
PREDICTIVE MODELING STEPS AND METHODOLOGY WITH LIVE EXAMPLE:
Data Preparation
Exploratory Data Analysis
Model Development
Model Validation
Model Implementation
MULTIPLE LINEAR REGRESSION
Linear Regression - Introduction - Applications
Assumptions of Linear Regression
Building Linear Regression Model
Understanding standard metrics (Variable significance, R-square/Adjusted R-Square, Global hypothesis, etc.)
Validation of Linear Regression Models (Re-running Vs. Scoring)
Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, drivers, etc.)
Interpretation of Results - Business Validation - Implementation on new data
Real-time case study from the manufacturing and telecom industries: estimating future revenue using the models
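With a single predictor, the least-squares coefficients, R-square and adjusted R-square have closed forms. A minimal sketch, assuming one predictor (p = 1) and made-up data; a full multiple regression would solve the normal equations for several predictors, and all helper names here are hypothetical. The score function illustrates scoring, i.e. applying saved coefficients to new data instead of re-running the fit:

```python
def fit_simple_ols(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def score(x_new, b0, b1):
    """Scoring: apply previously estimated coefficients to new records."""
    return [b0 + b1 * xi for xi in x_new]

def r_squared(y, y_hat):
    """R-square = 1 - SS_residual / SS_total."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, p):
    """Adjusted R-square = 1 - (1 - R^2)(n - 1)/(n - p - 1) for p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Made-up data: x could be, e.g., marketing spend and y revenue.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_simple_ols(x, y)
r2 = r_squared(y, score(x, b0, b1))
adj_r2 = adjusted_r_squared(r2, n=len(x), p=1)
print(f"y = {b0:.2f} + {b1:.2f} x, R2 = {r2:.4f}, adj R2 = {adj_r2:.4f}")
print(score([6, 7], b0, b1))  # predictions for new, unseen x values
```

Adjusted R-square is always below R-square when p >= 1 and R-square < 1, because it charges for each extra predictor.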
LOGISTIC REGRESSION - INTRODUCTION - APPLICATIONS
Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models
Building Logistic Regression Model
Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, etc.)
Validation of Logistic Regression Models (Re-running Vs. Scoring)
Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, drivers, etc.)
Interpretation of Results - Business Validation - Implementation on new data
Real-time case study: predicting customer churn in the banking and retail industries
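The concordance (c-statistic), Gini, KS and misclassification metrics can all be computed from actual 0/1 outcomes and predicted probabilities. A minimal sketch with made-up churn scores; the helper names are hypothetical, tied pairs count as half-concordant, and Gini is taken as 2c - 1:

```python
from itertools import groupby

def concordance(y_true, scores):
    """c-statistic: share of (event, non-event) pairs in which the event has the
    higher predicted probability; tied pairs count as half."""
    events = [s for y, s in zip(y_true, scores) if y == 1]
    non_events = [s for y, s in zip(y_true, scores) if y == 0]
    total = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                total += 1.0
            elif e == ne:
                total += 0.5
    return total / (len(events) * len(non_events))

def gini_coefficient(y_true, scores):
    """Gini = 2 * c - 1."""
    return 2 * concordance(y_true, scores) - 1

def ks_statistic(y_true, scores):
    """Largest gap between the cumulative share of events and of non-events
    when records are ranked from highest to lowest score."""
    n_events = sum(1 for y in y_true if y == 1)
    n_non_events = len(y_true) - n_events
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    cum_events = cum_non_events = 0
    ks = 0.0
    # Group tied scores so a threshold never splits records with equal scores.
    for _score, group in groupby(ranked, key=lambda pair: pair[0]):
        for _s, y in group:
            if y == 1:
                cum_events += 1
            else:
                cum_non_events += 1
        ks = max(ks, abs(cum_events / n_events - cum_non_events / n_non_events))
    return ks

def misclassification_rate(y_true, scores, cutoff=0.5):
    """Share of records whose predicted class (score >= cutoff) differs from the actual class."""
    wrong = sum((s >= cutoff) != (y == 1) for y, s in zip(y_true, scores))
    return wrong / len(y_true)

# Made-up churn flags (1 = churned) and model probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(concordance(y_true, scores))             # 8 of 9 pairs concordant
print(gini_coefficient(y_true, scores))        # 7/9
print(ks_statistic(y_true, scores))            # 2/3
print(misclassification_rate(y_true, scores))  # 2 of 6 records at cut-off 0.5
```

Changing the cutoff argument is how different probability cut-offs can be compared.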
PARTIAL LEAST SQUARES REGRESSION
Partial Least Squares Regression - Introduction - Applications
Difference between Linear Regression and Partial Least Squares Regression
Building PLS Model
Understanding standard metrics (Variable significance, R-square/Adjusted R-Square, Global hypothesis, etc.)
Interpretation of Results - Business Validation - Implementation on new data
Real-time example: identifying the key factors driving revenue
VARIABLE REDUCTION TECHNIQUES
FACTOR ANALYSIS
PRINCIPAL COMPONENT ANALYSIS
Assumptions of PCA
Working Mechanism of PCA
Types of Rotations
Standardization
Positives and Negatives of PCA
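On standardized data, PCA is an eigen-decomposition of the correlation matrix. For two variables this has a closed form: the eigenvalues of [[1, r], [r, 1]] are 1 + r and 1 - r, with eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2). A minimal sketch, assuming exactly two numeric variables and a hypothetical helper pca_two_variables; more variables would need a general eigen-solver such as numpy.linalg.eigh:

```python
import math
import statistics

def standardize(values):
    """Standardization: convert to z-scores (mean 0, sample standard deviation 1)."""
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mu) / sd for v in values]

def pca_two_variables(x, y):
    """PCA of two standardized variables via the closed-form eigen-decomposition
    of their 2 x 2 correlation matrix [[1, r], [r, 1]]."""
    zx, zy = standardize(x), standardize(y)
    n = len(zx)
    r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)  # Pearson correlation
    h = 1 / math.sqrt(2)
    if r >= 0:
        eigenvalues = (1 + r, 1 - r)
        loadings = ((h, h), (h, -h))
    else:
        eigenvalues = (1 - r, 1 + r)
        loadings = ((h, -h), (h, h))
    # Total variance of two standardized variables is 2.
    explained = tuple(ev / 2 for ev in eigenvalues)
    # First principal component score for each record.
    pc1_scores = [a * loadings[0][0] + b * loadings[0][1] for a, b in zip(zx, zy)]
    return eigenvalues, explained, loadings, pc1_scores

# Made-up, strongly correlated variables.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 4.0, 8.0, 9.0]
eigenvalues, explained, loadings, pc1 = pca_two_variables(x, y)
print(explained)  # share of total variance captured by PC1 and PC2
```

Without the standardization step, the variable with the larger scale would dominate the first component.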
CHAID
CART
DIFFERENCE BETWEEN CHAID AND CART
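CART grows binary splits that minimise an impurity measure such as Gini impurity, whereas CHAID chooses (possibly multiway) splits using chi-square tests. A minimal sketch of the CART side only, assuming one numeric predictor and made-up churn labels; gini and best_split are hypothetical helper names:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions (0 = pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(x, y):
    """CART-style binary split on one numeric predictor: try midpoints between
    sorted distinct values and keep the one with the lowest weighted Gini."""
    best_threshold, best_impurity = None, gini(y)
    values = sorted(set(x))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity

# Made-up predictor (number of complaints) and churn outcome.
complaints = [1, 2, 3, 10, 11, 12]
outcome = ["stay", "stay", "stay", "churn", "churn", "churn"]
print(gini(outcome))                    # 0.5 before splitting
print(best_split(complaints, outcome))  # (6.5, 0.0): a perfectly pure split
```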
RANDOM FOREST
Decision tree vs. Random Forest
Data Preparation
Missing data imputation
Outlier detection
Handling imbalanced data
Random Record selection
Random Forest R parameters
Random Variable selection
Optimal number of variables selection
Calculating Out Of Bag (OOB) error rate
Calculating Out of Bag Predictions
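The random record selection, random variable selection and out-of-bag (OOB) steps above can be sketched without a full tree learner. A minimal sketch, assuming the per-tree OOB predictions are supplied by some tree model that is not shown; all helper names are hypothetical. In R's randomForest package the corresponding parameters are ntree and mtry, and floor(sqrt(p)) is that package's default mtry for classification:

```python
import math
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Random record selection: draw n row indices with replacement."""
    return [rng.randrange(n) for _ in range(n)]

def oob_indices(n, in_bag):
    """Rows never drawn into a tree's bootstrap sample are out-of-bag for that tree."""
    drawn = set(in_bag)
    return [i for i in range(n) if i not in drawn]

def random_features(p, rng, mtry=None):
    """Random variable selection: pick mtry of the p predictors
    (default floor(sqrt(p)), a common classification default)."""
    if mtry is None:
        mtry = max(1, math.isqrt(p))
    return sorted(rng.sample(range(p), mtry))

def oob_error(y, per_tree_oob_preds):
    """Majority-vote each row over only the trees for which it was OOB,
    then return the share of those OOB predictions that are wrong.

    per_tree_oob_preds: one dict per tree, mapping OOB row index -> predicted
    label (hypothetical input; a real forest would produce it from its trees)."""
    votes = {}
    for preds in per_tree_oob_preds:
        for i, label in preds.items():
            votes.setdefault(i, Counter())[label] += 1
    if not votes:
        return float("nan")
    oob_predictions = {i: c.most_common(1)[0][0] for i, c in votes.items()}
    wrong = sum(pred != y[i] for i, pred in oob_predictions.items())
    return wrong / len(oob_predictions)

rng = random.Random(42)
n_rows, n_predictors = 1000, 9
in_bag = bootstrap_indices(n_rows, rng)
oob = oob_indices(n_rows, in_bag)
print(len(oob) / n_rows)                   # expected value is about 0.368 for large n
print(random_features(n_predictors, rng))  # 3 of the 9 predictor indices
```

Because each record is scored only by trees that never saw it, the OOB error acts as a built-in validation estimate.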
A COUPLE OF REAL-TIME USE CASES FROM THE TELECOM AND RETAIL INDUSTRIES: IDENTIFYING CHURN
UNSUPERVISED TECHNIQUES:
SEGMENTATION FOR MARKETING ANALYSIS
Need for segmentation
Criterion of segmentation
Types of distances
Clustering algorithms
Hierarchical clustering
K-means clustering
Deciding number of clusters
Case study
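K-means alternates between assigning each point to its nearest centroid (Euclidean distance) and moving each centroid to the mean of its members. A minimal sketch, assuming 2-D points and user-supplied starting centroids so the result is deterministic; kmeans and within_cluster_ss are hypothetical names. The within-cluster sum of squares is the quantity usually plotted against k in the elbow method for deciding the number of clusters:

```python
import math

def kmeans(points, initial_centroids, max_iter=100):
    """Lloyd's k-means with Euclidean distance and user-supplied starting centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, clusters

def within_cluster_ss(centroids, clusters):
    """Total squared Euclidean distance of points to their own centroid."""
    return sum(math.dist(p, c) ** 2 for c, members in zip(centroids, clusters) for p in members)

# Made-up customers described by (visits per month, spend in hundreds).
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, initial_centroids=[(1, 1), (8, 8)])
print([len(c) for c in clusters])  # [3, 3]
print(within_cluster_ss(centroids, clusters))
```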
BUSINESS RULES CRITERIA
REAL TIME USE CASE TO IDENTIFY THE MOST VALUABLE REVENUE GENERATING CUSTOMERS.
TIME SERIES COMPONENTS (TREND, SEASONALITY, CYCLICITY AND LEVEL) AND DECOMPOSITION
BASIC TECHNIQUES
Averages
Smoothing, etc.
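A minimal sketch of the two basic techniques, assuming a plain list of observations: a trailing moving average and simple exponential smoothing, where the last smoothed value serves as the one-step-ahead forecast. The function names and the choice to initialise the smoothed series with the first observation are assumptions:

```python
def moving_average(series, window):
    """Trailing simple moving average; the first value averages observations 0..window-1."""
    return [sum(series[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(series))]

def simple_exponential_smoothing(series, alpha):
    """s_t = alpha * y_t + (1 - alpha) * s_(t-1), initialised with s_0 = y_0 (an assumption).
    The last smoothed value is the one-step-ahead forecast."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

monthly_sales = [120, 130, 125, 140, 150, 145]  # made-up data
print(moving_average(monthly_sales, 3))
forecast = simple_exponential_smoothing(monthly_sales, alpha=0.3)[-1]
print(round(forecast, 2))
```

A larger alpha tracks recent changes faster; a smaller alpha smooths more heavily.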
ADVANCED TECHNIQUES
AR Models
ARIMA
UCM (Unobserved Components Models)
Hybrid Model
UNDERSTANDING FORECASTING ACCURACY - MAPE, MAD, MSE, ETC.
A COUPLE OF USE CASES: FORECASTING FUTURE PRODUCT SALES
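The accuracy measures above compare actual values A_t with forecasts F_t: MAPE = (100/n) x sum |A_t - F_t| / |A_t|, MAD = (1/n) x sum |A_t - F_t| and MSE = (1/n) x sum (A_t - F_t)^2. A minimal sketch with made-up numbers; MAD is read here as the mean absolute deviation of the forecast errors, and MAPE is undefined when any actual value is zero:

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent (undefined if any actual value is 0)."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def mad(actual, forecast):
    """Mean absolute deviation of the forecast errors."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mse(actual, forecast):
    """Mean squared error; penalises large errors more heavily than MAD."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)

actual = [100, 200, 400]  # made-up unit sales
forecast = [110, 180, 400]
print(mape(actual, forecast))  # about 6.67 (%)
print(mad(actual, forecast))   # 10.0
print(mse(actual, forecast))   # about 166.67
```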
TEXT ANALYTICS:
GATHERING TEXT DATA FROM WEB AND OTHER SOURCES
PROCESSING RAW WEB DATA
COLLECTING TWITTER DATA WITH TWITTER API
NAIVE BAYES ALGORITHM
Assumptions of Naïve Bayes
Processing of Text data
Handling Standard and Text data
Building Naïve Bayes Model
Understanding standard model metrics
Validation of the Models (Re-running Vs. Scoring)
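A multinomial Naive Bayes text classifier combines a class prior with per-word likelihoods, assuming words are conditionally independent given the class. A minimal sketch, assuming Laplace smoothing (alpha = 1), log-probabilities to avoid numeric underflow, a simple lowercase word tokenizer and a tiny made-up training set; all names are hypothetical:

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Processing of text data: lowercase and keep alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log priors and per-class word counts (multinomial model)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    log_priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    return {"log_priors": log_priors, "word_counts": word_counts,
            "vocabulary": vocabulary, "alpha": alpha}

def predict_naive_bayes(model, text):
    """Pick the class with the highest log P(class) + sum of log P(word | class)."""
    v = len(model["vocabulary"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, log_prior in model["log_priors"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = log_prior
        for token in tokenize(text):
            if token not in model["vocabulary"]:
                continue  # words never seen in training are ignored
            # Laplace smoothing keeps unseen (word, class) pairs from zeroing the product.
            score += math.log((counts[token] + alpha) / (total + alpha * v))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny made-up training set of hospital reviews.
docs = ["great service friendly staff", "loved the quick service",
        "terrible wait rude staff", "awful experience long wait"]
labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(docs, labels)
print(predict_naive_bayes(model, "friendly quick service"))  # pos
print(predict_naive_bayes(model, "rude long wait"))          # neg
```

Scoring new text with the stored model, rather than retraining, corresponds to the scoring side of validation.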
SENTIMENT ANALYSIS
Goal Setting
Text Preprocessing
Parsing the content
Text refinement
Analysis and Scoring
USE CASE FROM THE HEALTHCARE INDUSTRY: IDENTIFYING PATIENT SENTIMENT TOWARD A SPECIFIED HOSPITAL FROM TWITTER DATA
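The preprocessing, parsing, refinement and scoring steps can be illustrated with a lexicon-based scorer. A minimal sketch, assuming a tiny hypothetical lexicon and stopword list and tweets already collected as strings (the Twitter API call is not shown); production work would use a curated lexicon or a trained model such as a Naive Bayes classifier:

```python
import re

# Hypothetical, tiny lexicon and stopword list for illustration only.
POSITIVE = {"good", "great", "caring", "clean", "helpful"}
NEGATIVE = {"bad", "rude", "dirty", "slow", "painful"}
STOPWORDS = {"the", "a", "an", "was", "were", "is", "and", "at", "very"}

def preprocess(tweet):
    """Text preprocessing, parsing and refinement of one raw tweet."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[@#]\w+", " ", text)       # drop mentions and hashtags
    tokens = re.findall(r"[a-z']+", text)      # parse into word tokens
    return [t for t in tokens if t not in STOPWORDS]  # remove stopwords

def sentiment_score(tweet):
    """Scoring: number of positive words minus number of negative words."""
    tokens = preprocess(tweet)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def sentiment_label(score):
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

tweets = [
    "The nurses were very helpful and caring @CityHospital https://t.co/x",
    "Rude staff and a slow, dirty ward",
]
for tweet in tweets:
    s = sentiment_score(tweet)
    print(s, sentiment_label(s))
```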
LIVE CONNECTIVITY FROM R TO TABLEAU
GENERATING THE REPORTS AND CHARTS