Data Science & Machine Learning: A (Quick) Introduction to the Course


Webinar by Prof. Philippe Rigollet

The goal of data science is to turn data into information, and information into insight. In the era of big data, data science can help increase revenue, provide insight, improve operations, open markets, process information, optimize costs, and much more. With enough data, user behavior can be characterized as well.

Methods such as deep learning and clustering are all part of the data science ecosystem. Many of these methods serve the same pipeline. They can be grouped into the following categories:

  1. Making sense of unstructured data | Clustering – PCA
    Principal Component Analysis (PCA)
  • Tool for dimension reduction
  • Useful to visualize data (in dimension at most 3)
  • Helps gain insight and select important variables
  • Very efficient algorithmically
  • Implemented in virtually all statistical software
    Other examples of dimension reduction techniques
  • Isomap, Sammon mapping, locally linear embedding, and t-SNE
    Clustering
    Example: Netflix data
    The data set is a very large collection of movies rated by millions of users. After applying dimension reduction, we can see which cluster each movie falls into.
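As a rough illustration of the two ideas above, the sketch below reduces a tiny, made-up ratings table to one dimension with PCA and then splits the movies into two clusters. Everything in it is an assumption for illustration: the `ratings` values are invented (not Netflix data), the first principal component is found by plain power iteration, and clustering uses a two-cluster k-means on the projected scores. In practice a statistical library would do this work.

```python
import math

def center(rows):
    """Subtract each column's mean so every variable has mean zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of rows that are already centered."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=200):
    """Leading eigenvector of cov (first principal component), by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def two_means_1d(scores, iters=20):
    """k-means with k=2 on scalars; returns a 0/1 label for each score."""
    centers = [min(scores), max(scores)]
    labels = [0] * len(scores)
    for _ in range(iters):
        labels = [0 if abs(s - centers[0]) <= abs(s - centers[1]) else 1
                  for s in scores]
        for k in (0, 1):
            members = [s for s, lab in zip(scores, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels

# Hypothetical data: one row per movie, one column per user rating (1 to 5).
ratings = [
    [5, 4, 1], [4, 5, 1], [5, 5, 2],   # liked by users 1 and 2
    [1, 2, 5], [2, 1, 4], [1, 1, 5],   # liked by user 3
]

centered = center(ratings)
pc1 = top_component(covariance(centered))
# Project each movie onto the first principal component (3 dimensions -> 1).
scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
labels = two_means_1d(scores)
print(labels)
```

Keeping two or three components instead of one would give coordinates that can be plotted directly, which is the visualization use mentioned in the list above.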
  2. Prediction to decisions | Classification | Deep Learning – Lasso
    Prediction relies on a prediction formula. Where does that formula come from? It is fitted to the data using regression, either linear or nonlinear.

Cookies collect metadata, which can also be used for prediction.
Automatic variable selection can be done with the Lasso algorithm.
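To make the variable selection idea concrete, here is a minimal sketch of the Lasso fitted by coordinate descent, assuming the objective (1/(2n))·||y − Xβ||² + λ·||β||₁. The data `X` and `y`, the penalty `lam`, and the function names are all hypothetical; the columns of `X` were chosen to be orthogonal so the result is easy to check by hand. The point to notice is that the weakly relevant third variable ends up with a coefficient of exactly zero, which is what "selection" means here.

```python
def soft_threshold(rho, lam):
    """Shrink rho toward zero by lam; values within [-lam, lam] become exactly 0."""
    if rho > lam:
        return rho - lam
    if rho < -lam:
        return rho + lam
    return 0.0

def lasso(X, y, lam, sweeps=50):
    """Coordinate descent for (1/(2n))*||y - X b||^2 + lam*||b||_1."""
    n, d = len(X), len(X[0])
    beta = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            # Correlation of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(d) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta

# Hypothetical data: y = 2*x1 - 1*x2 + 0.1*x3, with orthogonal columns.
X = [
    [1, 1, 1],
    [1, -1, -1],
    [-1, 1, -1],
    [-1, -1, 1],
]
y = [1.1, 2.9, -3.1, -0.9]

beta = lasso(X, y, lam=0.25)
print(beta)
```

A larger `lam` sets more coefficients to zero and shrinks the surviving ones further, so the penalty acts as a dial between fitting the data closely and keeping few variables.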
However, real data does not always offer a clean cut to predict from. This is where deep learning comes in: it finds a representation of the data in which things are nicer.

Deep learning learns this representation from the data, so it needs much more data. It is delicate to fit and works well for images and speech.
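The idea of "a representation where things are nicer" can be illustrated with the classic XOR problem, where no single straight line separates the two classes in the raw inputs. The sketch below is only an illustration under stated assumptions: the hidden-layer weights are hand-picked rather than learned, whereas a real deep learning model would learn them from (much more) data. After the hidden layer, a simple linear read-out is enough.

```python
def relu(z):
    """Rectified linear unit: negative values are clipped to zero."""
    return max(0.0, z)

def hidden(x1, x2):
    """Hand-picked hidden layer: maps the raw input to a new two-number representation."""
    return relu(x1 + x2), relu(x1 + x2 - 1.0)

def predict(x1, x2):
    """Linear read-out on the hidden representation."""
    h1, h2 = hidden(x1, x2)
    return h1 - 2.0 * h2

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_targets = [0, 1, 1, 0]

outputs = [predict(a, b) for a, b in inputs]
for (a, b), out in zip(inputs, outputs):
    # Show the raw input, its hidden representation, and the prediction.
    print((a, b), hidden(a, b), out)
```

In the hidden representation the four points become (0, 0), (1, 0), (1, 0), and (2, 1), and the linear formula h1 − 2·h2 matches the XOR targets exactly.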

  3. From data to decisions | Causal inference – Graphical models, A/B Testing

How do we make a precise decision?
One approach is to turn the data into a graphical model in which each node is a movie and nodes are related to each other.
The connections between nodes are learned from the data.
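A very rough sketch of "connections learned from the data" follows. It is a deliberate simplification and not a full graphical model: it links two movies whenever the correlation of their ratings is strong, while a proper graphical model encodes conditional dependence given the other nodes. The movie names, ratings, and the `threshold` value are all hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length rating lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def build_graph(ratings_by_movie, threshold=0.8):
    """Return the set of edges (movie pairs) whose |correlation| meets the threshold."""
    titles = sorted(ratings_by_movie)
    edges = set()
    for i, first in enumerate(titles):
        for second in titles[i + 1:]:
            r = pearson(ratings_by_movie[first], ratings_by_movie[second])
            if abs(r) >= threshold:
                edges.add((first, second))
    return edges

# Hypothetical ratings: each list holds the same five users' ratings of one movie.
ratings_by_movie = {
    "movie_a": [5, 4, 1, 2, 5],
    "movie_b": [4, 5, 1, 1, 5],
    "movie_c": [3, 5, 4, 2, 1],
    "movie_d": [3, 4, 4, 2, 2],
}

edges = build_graph(ratings_by_movie)
print(sorted(edges))
```

With these made-up numbers, only the pairs whose ratings move together are connected, so the graph separates into two small groups of related movies.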

“Imitate the superficial exterior of a process or system without having any understanding of the underlying substance” – Cargo Cult