Azure Machine Learning - Data Camp

These are some notes that I typed up during an Azure Machine Learning Data Camp that I attended at the Microsoft learning center in the Aon building. It was entertaining, informative, and all around a great session; unfortunately, it was entirely useless for my job, since it has nothing to do with the work that I perform. On the plus side, I was able to recommend it to my co-workers on the Analytics team.

Ross Loforte, Data Scientist, Microsoft https://github.com/melzoghbi/datacamp

Data science is about using data to make decisions that drive actions. It involves data selection, preprocessing, transformation, data mining, and delivering value from the data (interpretation and evaluation).
The challenges have been complex overhead, expensive startup costs, and steep learning curves. The goal of Azure Machine Learning Services is to provide a simpler and more scalable approach.
Machine Learning is using known data to develop a model to predict unknown data.
- Known Data: Big enough archive, previous observations, past data
- Model: Known data + algorithms (ML algorithms)
- Unknown Data: Missing, unseen, not existing, future data
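
To make the "known data + algorithm = model" idea concrete, here is a minimal sketch in plain Python. It is not from the session: the data points are invented, the helper names `fit_line` and `predict` are hypothetical, and ordinary least squares is just one simple algorithm chosen for illustration.

```python
# Known data: past observations as (input, observed value) pairs.
# The numbers are invented for illustration.
known_data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def fit_line(points):
    """Algorithm: ordinary least squares for a single input feature."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the fitted model to an input it has never seen."""
    slope, intercept = model
    return slope * x + intercept


# Model = known data + algorithm.
model = fit_line(known_data)

# Unknown data: an input with no recorded outcome yet.
prediction = predict(model, 5.0)
print(prediction)
```

The model here is nothing more than the two numbers `fit_line` returns; a real project would swap in a richer algorithm and far more data, but the three roles stay the same.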
Microsoft's usage of ML: 1997 Hotmail spam filters, 2008 Bing Maps routing, 2009 Bing search, 2010 Kinect
Azure Machine Learning's goal: make machine learning accessible to every enterprise, data scientist, developer, information worker, consumer, and device anywhere in the world.
Expect that 75% of the time will go into building the model and testing the validity of its predictions.
Cycle of steps for building a machine learning solution:
- Frame the problem
- Get and Prepare Data
- Develop the Model
  - Analysis / Metric Definition
  - Feature Engineering
  - Model Training
  - Parameter Tuning
  - Evaluation
- Deploy Model
- Evaluate / Track Performance
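
The cycle above can be sketched end to end in a few lines of plain Python. This is an illustration under assumptions, not what was shown in the session: the data is synthetic, the function names (`make_data`, `knn_predict`, `accuracy`) are hypothetical, and k-nearest neighbors stands in for whatever algorithm a real project would pick. Feature engineering, deployment, and ongoing tracking are left out.

```python
import random

# Frame the problem: given two measurements, predict a yes/no label (1 or 0).


def make_data(n, seed=0):
    """Get and prepare data: synthetic points, invented for this sketch."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label else -2.0
        rows.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return rows


def knn_predict(train, point, k):
    """Model: majority vote among the k nearest training points."""
    nearest = sorted(
        train,
        key=lambda row: (row[0][0] - point[0]) ** 2 + (row[0][1] - point[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, rows, k):
    """Metric definition: fraction of rows predicted correctly."""
    hits = sum(knn_predict(train, features, k) == label for features, label in rows)
    return hits / len(rows)


data = make_data(300)
train, validation, test = data[:200], data[200:250], data[250:]

# Parameter tuning: choose k using the validation split only.
best_k = max([1, 3, 5, 7], key=lambda k: accuracy(train, validation, k))

# Evaluation: report accuracy on data held out from training and tuning.
test_accuracy = accuracy(train, test, best_k)
print(best_k, test_accuracy)
```

Keeping a separate test split matters because the validation split has already been used to pick `k`; scoring on it again would give an overly optimistic number.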
The ML algorithm defines how your model will react
- Written in R or Python
Classification: answer a yes/no question
Regression: predict a numeric value
Anomaly Detection: flag values that fall outside of expected bounds
Clustering: unsupervised learning that groups like data
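
As a rough illustration of the last two task types, here is a small sketch in plain Python. Everything in it is an assumption made for the example: the readings are invented, `find_anomalies` and `cluster_1d` are hypothetical helper names, the "bounds" are taken to be two standard deviations from the mean, and the clustering is a bare-bones one-dimensional k-means. Classification and regression are omitted to keep it short.

```python
from statistics import mean, stdev


def find_anomalies(values, threshold=2.0):
    """Anomaly detection: flag values far from the mean.

    "Far" is assumed to mean more than `threshold` standard deviations.
    """
    center = mean(values)
    spread = stdev(values)
    return [v for v in values if abs(v - center) > threshold * spread]


def cluster_1d(values, centers, rounds=10):
    """Clustering: a minimal 1-D k-means that groups like values."""
    centers = list(centers)
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups


# Invented sensor readings with one obvious outlier.
readings = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 25.0]
anomalies = find_anomalies(readings)
print(anomalies)

# Invented sizes that fall into two natural groups; no labels are supplied.
sizes = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]
groups = cluster_1d(sizes, centers=[0.0, 10.0])
print(groups)
```

Note that neither function is given labels: the anomaly check derives its bounds from the data itself, and the clustering only needs starting guesses for the group centers.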