Thursday 7 September 2017

Data Science Training in Hyderabad

Data science, also known as data-driven science, is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge or insights from data in various forms, whether structured or unstructured. It is similar to data mining.

The following course content covers the skills within Data Science, such as machine learning, data mining, analytics, data cleaning, visualization, web scraping, using APIs to build data products, artificial intelligence, and much more.

Data Science: Dealing with both structured and unstructured data, Data Science is a field that comprises everything related to data cleansing, preparation, and analysis. ... A buzzword used to describe immense volumes of data, both structured and unstructured, Big Data inundates a business on a day-to-day basis.



DATA SCIENCE CONTENT:

 DESCRIPTIVE STATISTICS AND PROBABILITY DISTRIBUTIONS:

  • Introduction to Statistics
  • Different Types of Variables
  • Measures of Central Tendency with examples
  • Measures of Dispersion
  • Probability & Distributions
  • Probability Basics
  • Binomial Distribution and its properties
  • Poisson distribution and its properties
  • Normal distribution and its properties
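As a minimal illustration of the descriptive measures and distributions above, here is a Python sketch using only the standard library. The sample data and the helper names (`binomial_pmf`, `poisson_pmf`, `normal_pdf`) are hypothetical, chosen only to show the formulas.

```python
import math
import statistics

# Hypothetical sample data, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Measures of central tendency
mean = statistics.mean(data)        # arithmetic mean
median = statistics.median(data)    # average of the two middle values for even n
mode = statistics.mode(data)        # most frequent value

# Measures of dispersion
data_range = max(data) - min(data)
pop_var = statistics.pvariance(data)    # population variance (divides by n)
sample_var = statistics.variance(data)  # sample variance (divides by n - 1)
pop_sd = statistics.pstdev(data)        # population standard deviation

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the Normal(mu, sigma^2) distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

print(mean, median, mode, data_range, pop_var, pop_sd)
print(binomial_pmf(2, 4, 0.5), poisson_pmf(0, 2.0), normal_pdf(0.0))
```

For this sample the mean is 5, the median 4.5 and the mode 4, while the population variance is 4 (standard deviation 2).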

    INFERENTIAL STATISTICS AND TESTING OF HYPOTHESIS:

  • Sampling methods
  • Different methods of estimation
  • Testing of Hypothesis & Tests
  • Analysis of Variance
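Analysis of Variance compares the spread between group means with the spread within groups. Below is a minimal Python sketch of the one-way ANOVA F statistic; the function name `one_way_anova_f` and the three groups are hypothetical. It stops at the F statistic. The p-value comes from the F distribution with (k - 1, n - k) degrees of freedom, which libraries such as SciPy (`scipy.stats.f_oneway`) compute for you.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)

    # Between-group sum of squares: distance of each group mean from the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical groups, e.g. sales under three different promotions
f, df1, df2 = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f, df1, df2)
```

Here the group means (2, 5, 8) are far apart relative to the within-group spread, so F is large (27 with 2 and 6 degrees of freedom).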

    COVARIANCE & CORRELATION
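Covariance measures how two variables move together, and the Pearson correlation rescales it to the range -1 to +1. A minimal Python sketch follows; the helper names and data are hypothetical.

```python
import math

def covariance(x, y):
    """Sample covariance (divides by n - 1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)

def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]   # y = 2x, so the correlation should be +1
print(covariance(x, y), pearson_r(x, y))
```

Covariance depends on the units of x and y, which is why correlation is usually preferred when comparing the strength of relationships.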

    PREDICTIVE MODELING STEPS AND METHODOLOGY WITH A LIVE EXAMPLE:

  • Data Preparation
  • Exploratory Data Analysis
  • Model Development
  • Model Validation
  • Model Implementation

    SUPERVISED TECHNIQUES:

    MULTIPLE LINEAR REGRESSION

  • Linear Regression - Introduction - Applications
  • Assumptions of Linear Regression
  • Building Linear Regression Model
  • Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
  • Validation of Linear Regression Models (Re-running vs. Scoring)
  • Standard Business Outputs (Decile Analysis, Error distribution (histogram), Model equation, Drivers, etc.)
  • Interpretation of Results - Business Validation - Implementation on new data
  • Real-time case study from the Manufacturing and Telecom industries: estimating future revenue using the models
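To make model building and R-square concrete, here is a minimal Python sketch of multiple linear regression fitted through the normal equations (X'X)b = X'y, solved by Gauss-Jordan elimination. The function names and data are hypothetical; the data was generated from y = 1 + 2·x1 + 3·x2 without noise, so the fit should recover those coefficients with R-square = 1. Statistical packages (for example `statsmodels.api.OLS` in Python, or `lm` in R) use more numerically stable methods and also report variable significance and the global F-test.

```python
def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.
    X holds one row of predictor values per record; an intercept column is added."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [X'X | X'y]
    aug = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(p):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][p] / aug[i][i] for i in range(p)]  # [intercept, b1, b2, ...]

def r_squared(X, y, coef):
    """R-square and Adjusted R-square of a fitted linear model."""
    preds = [coef[0] + sum(c * v for c, v in zip(coef[1:], row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - f) ** 2 for t, f in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    r2 = 1 - ss_res / ss_tot
    n, k = len(y), len(coef) - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalises extra predictors
    return r2, adj_r2

# Hypothetical data: two predictors per record
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]
coef = fit_ols(X, y)
r2, adj_r2 = r_squared(X, y, coef)
print(coef, r2, adj_r2)
```

With real (noisy) data R-square falls below 1, and Adjusted R-square falls further when predictors add little explanatory power.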

    LOGISTIC REGRESSION - INTRODUCTION - APPLICATIONS

  • Linear Regression Vs. Logistic Regression Vs. Generalized Linear Models
  • Building Logistic Regression Model
  • Understanding standard model metrics (Concordance, Variable significance, Hosmer-Lemeshow Test, Gini, KS, Misclassification, etc.)
  • Validation of Logistic Regression Models (Re-running vs. Scoring)
  • Standard Business Outputs (Decile Analysis, ROC Curve, Probability Cut-offs, Lift charts, Model equation, Drivers, etc.)
  • Interpretation of Results - Business Validation - Implementation on new data
  • Real-time case study to predict customer churn in the Banking and Retail industries
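Several of the logistic regression metrics listed above are computed from the model's predicted probabilities rather than from the model itself. The Python sketch below assumes the probabilities are already available (the scores and labels are hypothetical) and computes concordance, Gini (2 × AUC - 1), the misclassification rate at a probability cut-off, and the KS statistic. The KS helper assumes there are no tied scores.

```python
def concordance_stats(y_true, scores):
    """Concordance, AUC and Gini from all (positive, negative) pairs."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    concordant = tied = 0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1   # positive case ranked above the negative case
            elif p == n:
                tied += 1
    pairs = len(pos) * len(neg)
    auc = (concordant + 0.5 * tied) / pairs  # equals the area under the ROC curve
    return concordant / pairs, auc, 2 * auc - 1

def misclassification_rate(y_true, scores, cutoff=0.5):
    """Share of records whose predicted class (score >= cutoff) is wrong."""
    preds = [1 if s >= cutoff else 0 for s in scores]
    return sum(p != y for p, y in zip(preds, y_true)) / len(y_true)

def ks_statistic(y_true, scores):
    """Max gap between cumulative % of positives and negatives, ranked by score.
    Assumes scores are distinct."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    cum_pos = cum_neg = 0
    ks = 0.0
    for _, y in ranked:
        if y == 1:
            cum_pos += 1
        else:
            cum_neg += 1
        ks = max(ks, abs(cum_pos / n_pos - cum_neg / n_neg))
    return ks

# Hypothetical churn labels (1 = churned) and predicted probabilities
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
print(concordance_stats(y_true, scores))
print(misclassification_rate(y_true, scores, cutoff=0.5), ks_statistic(y_true, scores))
```

Moving the `cutoff` changes the misclassification rate but not concordance, Gini or KS, which depend only on how the scores rank the records.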

    PARTIAL LEAST SQUARES REGRESSION

  • Partial Least Squares Regression - Introduction - Applications
  • Difference between Linear Regression and Partial Least Squares Regression
  • Building a PLS Model
  • Understanding standard metrics (Variable significance, R-square/Adjusted R-square, Global hypothesis, etc.)
  • Interpretation of Results - Business Validation - Implementation on new data
  • Real-time example: identifying the key factors driving revenue

    VARIABLE REDUCTION TECHNIQUES

    FACTOR ANALYSIS

    PRINCIPAL COMPONENT ANALYSIS

  • Assumptions of PCA
  • Working Mechanism of PCA
  • Types of Rotations
  • Standardization
  • Positives and Negatives of PCA
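The working mechanism of PCA, standardization followed by an eigen-decomposition of the covariance matrix, can be shown for the simple case of two variables, where the eigenvalues have a closed form. This is a minimal Python sketch; the function names and data are hypothetical, and for more than two variables you would use a linear-algebra routine such as `numpy.linalg.eigh` instead.

```python
import math

def standardize(col):
    """Rescale a variable to mean 0 and (sample) standard deviation 1."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]

def pca_2d(x, y):
    """PCA for two variables: eigen-decomposition of the 2x2 covariance matrix
    of the standardized data. Returns eigenvalues (largest first),
    explained-variance ratios and the first component's loadings."""
    zx, zy = standardize(x), standardize(y)
    n = len(zx)
    a = sum(v * v for v in zx) / (n - 1)              # var(zx), equal to 1
    c = sum(v * v for v in zy) / (n - 1)              # var(zy), equal to 1
    b = sum(u * v for u, v in zip(zx, zy)) / (n - 1)  # cov(zx, zy) = correlation
    # Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]]
    half_trace = (a + c) / 2
    disc = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    lam1, lam2 = half_trace + disc, half_trace - disc
    # Eigenvector for lam1 is (b, lam1 - a); fall back to an axis when b is ~0
    if abs(b) > 1e-12:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    total = lam1 + lam2
    return (lam1, lam2), (lam1 / total, lam2 / total), (vx / norm, vy / norm)

eigenvalues, explained, loading = pca_2d([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(eigenvalues, explained, loading)
```

With two perfectly correlated variables the first component explains all of the variance, which is the situation where PCA reduces variables most effectively. With uncorrelated variables each component explains half, and there is nothing to gain.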

    SUPERVISED TECHNIQUES - CLASSIFICATION:

    CHAID

    CART

    DIFFERENCE BETWEEN CHAID AND CART

    RANDOM FOREST

  • Decision tree vs. Random Forest
  • Data Preparation
  • Missing data imputation
  • Outlier detection
  • Handling imbalanced data
  • Random Record selection
  • Random Forest R parameters
  • Random Variable selection
  • Selecting the optimal number of variables
  • Calculating the Out-of-Bag (OOB) error rate
  • Calculating Out-of-Bag predictions
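Random record selection, random variable selection and the Out-of-Bag (OOB) error rate fit together as follows. This Python sketch is deliberately simplified. Each tree is a single split (a decision stump) chosen by misclassification count rather than by Gini impurity, and one randomly chosen variable is used per tree. The function names and the data are hypothetical. `n_trees` and the single random variable loosely mirror the `ntree` and `mtry` arguments of R's `randomForest` package.

```python
import random
from collections import Counter

def fit_stump(X, y, feature):
    """Best single split on one feature, judged by misclassification count."""
    values = sorted(set(row[feature] for row in X))
    majority = Counter(y).most_common(1)[0][0]
    best = (sum(lab != majority for lab in y), feature, float("inf"), majority, majority)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2  # midpoint between consecutive distinct values
        left = [lab for row, lab in zip(X, y) if row[feature] <= threshold]
        right = [lab for row, lab in zip(X, y) if row[feature] > threshold]
        left_cls = Counter(left).most_common(1)[0][0]
        right_cls = Counter(right).most_common(1)[0][0]
        errors = sum(l != left_cls for l in left) + sum(r != right_cls for r in right)
        if errors < best[0]:
            best = (errors, feature, threshold, left_cls, right_cls)
    return best[1:]  # (feature, threshold, left class, right class)

def predict_stump(stump, row):
    feature, threshold, left_cls, right_cls = stump
    return left_cls if row[feature] <= threshold else right_cls

def random_forest_oob(X, y, n_trees=25, seed=0):
    """Bagged stumps with random variable selection; returns (OOB error, records scored)."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    oob_votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]  # random record selection (bootstrap)
        feature = rng.randrange(n_features)             # random variable selection
        stump = fit_stump([X[i] for i in in_bag], [y[i] for i in in_bag], feature)
        for i in set(range(n)) - set(in_bag):          # records this tree never saw
            oob_votes[i][predict_stump(stump, X[i])] += 1
    scored = [i for i in range(n) if oob_votes[i]]
    errors = sum(oob_votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return errors / len(scored), len(scored)

# Hypothetical, well-separated two-class data (e.g. 0 = stays, 1 = churns)
X = [(1, 2), (2, 1), (3, 3), (4, 2), (5, 4), (11, 12), (12, 11), (13, 14), (14, 13), (15, 15)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
oob_error, n_scored = random_forest_oob(X, y)
print(oob_error, n_scored)
```

Each record is scored only by the trees whose bootstrap sample left it out (about a third of them on average). That is why the OOB error works as a built-in validation estimate without a separate hold-out set.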

    A COUPLE OF REAL-TIME USE CASES FROM THE TELECOM AND RETAIL INDUSTRIES: IDENTIFYING CHURN

    UNSUPERVISED TECHNIQUES:

    SEGMENTATION FOR MARKETING ANALYSIS

  • Need for segmentation
  • Criteria for segmentation
  • Types of distances
  • Clustering algorithms
  • Hierarchical clustering
  • K-means clustering
  • Deciding number of clusters
  • Case study
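K-means clustering alternates two steps: assign each record to its nearest centroid, then move each centroid to the mean of its records. The minimal Python sketch below uses Euclidean distance and takes the starting centroids explicitly, so the result is reproducible. The function name and the customer points are hypothetical. The returned inertia (within-cluster sum of squares) is what an elbow plot uses when deciding the number of clusters.

```python
import math

def kmeans(points, k, init_idx, max_iter=100):
    """Lloyd's k-means with Euclidean distance, starting from points[init_idx]."""
    centroids = [points[i] for i in init_idx]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia

# Hypothetical customers described by (visits per month, spend in hundreds)
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
for k, init in [(1, (0,)), (2, (0, 3))]:
    labels, centroids, inertia = kmeans(points, k, init)
    print(k, labels, round(inertia, 3))
```

Going from one to two clusters cuts the inertia sharply, which suggests two natural segments. Swapping `math.dist` for another distance (for example Manhattan) changes which records count as "close".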

    BUSINESS RULES CRITERIA

    REAL-TIME USE CASE: IDENTIFYING THE MOST VALUABLE REVENUE-GENERATING CUSTOMERS

    TIME SERIES ANALYSIS:

    TIME SERIES COMPONENTS (TREND, SEASONALITY, CYCLICITY AND LEVEL) AND DECOMPOSITION

    BASIC TECHNIQUES

  • Averages
  • Smoothing, etc.
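The two basic techniques above can be sketched in a few lines of Python. The function names and the sales series are hypothetical. A moving average forecasts the next period as the mean of the last few observations. Simple exponential smoothing updates a level as level = alpha · y + (1 - alpha) · previous level and uses that level as the next forecast.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    return sum(series[-window:]) / window

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing.
    Returns len(series) + 1 forecasts: forecasts[t] predicts series[t]
    (forecasts[0] is just the first observation), and the last one is
    the forecast for the next, unseen period."""
    level = series[0]            # initialise the level with the first observation
    forecasts = [level]
    for value in series[1:]:
        forecasts.append(level)  # forecast made before seeing `value`
        level = alpha * value + (1 - alpha) * level
    forecasts.append(level)
    return forecasts

sales = [10, 12, 14, 16]  # hypothetical monthly sales
print(moving_average_forecast(sales, window=2))
print(exponential_smoothing(sales, alpha=0.5))
```

A larger `alpha` reacts faster to recent changes. Neither method models trend or seasonality explicitly, so both lag behind a steadily rising series like this one, which is where the advanced techniques come in.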

    ADVANCED TECHNIQUES

  • AR Models
  • ARIMA
  • UCM
  • Hybrid Model

    UNDERSTANDING FORECASTING ACCURACY - MAPE, MAD, MSE, ETC.
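The accuracy measures named above compare actual values with forecasts. Below is a minimal Python sketch with hypothetical numbers. MAPE is the mean absolute error as a percentage of the actual value (undefined if any actual value is zero). MAD is the mean absolute deviation of the forecast errors. MSE is the mean squared error, which penalises large misses more heavily.

```python
def forecast_accuracy(actual, forecast):
    """Return (MAPE in %, MAD, MSE) for paired actual and forecast values."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n  # fails if an actual is 0
    mad = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    return mape, mad, mse

actual = [100, 200, 400]    # hypothetical sales
forecast = [110, 180, 400]  # hypothetical forecasts
print(forecast_accuracy(actual, forecast))
```

Here both misses are 10% of the actual value, so MAPE is about 6.67%. MSE weights the larger absolute miss of 20 four times as heavily as the miss of 10.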

    A COUPLE OF USE CASES: FORECASTING FUTURE PRODUCT SALES

    TEXT ANALYTICS:

    GATHERING TEXT DATA FROM WEB AND OTHER SOURCES

    PROCESSING RAW WEB DATA

    COLLECTING TWITTER DATA WITH THE TWITTER API

    NAIVE BAYES ALGORITHM

  • Assumptions of Naïve Bayes
  • Processing of Text data
  • Handling Standard and Text data
  • Building Naïve Bayes Model
  • Understanding standard model metrics
  • Validation of the Models (Re-running vs. Scoring)
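A Naïve Bayes text classifier treats each word as conditionally independent given the class. It adds the log prior of the class to the log probability of each word under that class and picks the highest total. The Python sketch below is a minimal multinomial Naïve Bayes with Laplace (add-one) smoothing. The function names and the tiny training set are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lower-case the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())

def train_nb(docs, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict_nb(model, doc):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # log prior
        for word in tokenize(doc):
            if word in vocab:  # words never seen in training are ignored
                # Laplace smoothing keeps unseen (word, class) pairs from zeroing the score
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labelled training texts
docs = ["great service and friendly staff", "very good care",
        "terrible wait and rude staff", "bad experience very slow"]
labels = ["positive", "positive", "negative", "negative"]
model = train_nb(docs, labels)
print(predict_nb(model, "friendly and good staff"), predict_nb(model, "rude and slow"))
```

Working with log probabilities avoids numeric underflow when a document contains many words.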

    SENTIMENT ANALYSIS

  • Goal Setting
  • Text Preprocessing
  • Parsing the content
  • Text refinement
  • Analysis and Scoring
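The preprocessing, refinement and scoring steps above can be illustrated with a simple lexicon-based scorer. In this Python sketch, the lexicon, stop-word list and negation rule are all hypothetical simplifications for illustration; real projects use a curated sentiment lexicon or a trained model. Collecting the tweets through the Twitter API is not shown.

```python
import re

# Hypothetical, tiny lexicon and word lists, for illustration only.
LEXICON = {"good": 1, "great": 2, "friendly": 1, "clean": 1,
           "bad": -1, "rude": -2, "dirty": -1, "slow": -1}
STOPWORDS = {"the", "a", "an", "and", "was", "is", "at", "to", "of"}
NEGATIONS = {"not", "never", "no"}

def preprocess(tweet):
    """Text refinement: lower-case, drop URLs and @mentions, strip '#', keep words."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = text.replace("#", " ")
    return [w for w in re.findall(r"[a-z']+", text) if w not in STOPWORDS]

def sentiment_score(tweet):
    """Sum of lexicon scores; a negation word flips the next scored word."""
    score, negate = 0, False
    for word in preprocess(tweet):
        if word in NEGATIONS:
            negate = True
        elif word in LEXICON:
            score += -LEXICON[word] if negate else LEXICON[word]
            negate = False
    return score

def to_label(score):
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

for tweet in ["Great care at @CityHospital, nurses were friendly! http://t.co/x",
              "The wait was not good and the staff were rude #disappointed"]:
    print(to_label(sentiment_score(tweet)), sentiment_score(tweet))
```

The simple negation rule catches "not good" but not sarcasm or longer-range negation, which is one reason model-based approaches such as the Naïve Bayes classifier are often preferred.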

    USE CASE FROM THE HEALTH CARE INDUSTRY: IDENTIFYING PATIENT SENTIMENT ABOUT A SPECIFIC HOSPITAL FROM TWITTER DATA

    VISUALIZATION USING TABLEAU:

    LIVE CONNECTIVITY FROM R TO TABLEAU

    GENERATING THE REPORTS AND CHARTS


    5+ REAL-TIME PROJECTS USING DIFFERENT USE CASES