Azure Machine Learning - Data Camp
These are some notes that I typed up during an Azure Machine Learning Data Camp that I attended at the Microsoft learning center in the Aon building. It was entertaining, informative, and all around a great session. Unfortunately, it was entirely useless for my job, since it has nothing to do with the work that I perform. On the plus side, I was able to recommend it to my co-workers on the Analytics team.
Ross Loforte, Data Scientist, Microsoft https://github.com/melzoghbi/datacamp
-
Data science is about using data to make decisions that drive actions. It involves data selection, preprocessing, transformation, data mining, and delivering value from the data (interpretation and evaluation).
-
The challenges so far have been complex overhead, expensive startup costs, and steep learning curves. The goal of Azure Machine Learning Services is to provide a simpler and more scalable method.
-
Machine Learning uses known data to develop a model that predicts unknown data.
- Known Data: Big enough archive, previous observations, past data
- Model: Known data + algorithms (ML algorithms)
- Unknown Data: Missing, unseen, not existing, future data
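The three pieces above can be sketched in a few lines of plain Python. This is only an illustration, not anything shown at the session: the temperature and sales numbers are made up, and the "algorithm" is a simple nearest-neighbour lookup chosen because it makes "known data + algorithm = model" very literal.

```python
# Known data: hypothetical past observations of (temperature in C, ice creams sold).
known_data = [(15, 120), (20, 200), (25, 310), (30, 420)]


def predict(temperature):
    """Algorithm: answer with the outcome of the closest known observation."""
    nearest = min(known_data, key=lambda row: abs(row[0] - temperature))
    return nearest[1]


# Unknown data: a temperature that is not in the archive.
print(predict(27))  # the closest known temperature is 25, so this prints 310
```

A bigger archive of known data gives the lookup more neighbours to choose from, which is why "big enough" matters.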
-
Microsoft's usage of ML: 1997 Hotmail spam filters, 2008 Bing maps routing, 2009 Bing search, 2010 Kinect
-
Azure Machine Learning's goal - Make machine learning accessible to every enterprise, data scientist, developer, information worker, consumer, and device anywhere in the world.
-
Expect 75% of the time to be spent building the model and testing the validity of its predictions.
-
Cycle of steps for building a machine learning solution
- Frame the problem
- Get and Prepare Data
- Develop the Model
- Analysis / Metric Definition
- Feature Engineering
- Model Training
- Parameter Tuning
- Evaluation
- Deploy Model
- Evaluate / Track Performance
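To make the cycle more concrete, here is a toy walk-through in plain Python. Everything in it is an assumption for illustration: the messages and labels are invented, the single feature (number of exclamation marks) and the threshold model are deliberately simplistic, and deployment and tracking are left out.

```python
# Frame the problem: decide whether a message is spam.

# Get and prepare data: hypothetical labelled messages; None marks a missing record.
raw = [
    ("see you at noon", False),
    ("thanks!", False),
    (None, True),
    ("WIN NOW!!!!!", True),
    ("great job!!", False),
    ("FREE!!!!!!!", True),
    ("act fast!!!!", True),
    ("ok!", False),
]
cleaned = [(text, label) for text, label in raw if text is not None]


# Feature engineering: reduce each message to one number.
def feature(text):
    return text.count("!")


rows = [(feature(text), label) for text, label in cleaned]
train, test = rows[:5], rows[5:]  # hold out the last rows for evaluation


# Metric definition: fraction of correct predictions for a given threshold.
def accuracy(threshold, data):
    hits = sum((count >= threshold) == label for count, label in data)
    return hits / len(data)


# Model training / parameter tuning: for this one-parameter model both steps
# amount to picking the threshold that scores best on the training rows.
best_threshold = max(range(0, 9), key=lambda t: accuracy(t, train))

# Evaluation: score the chosen threshold on rows it has not seen.
test_accuracy = accuracy(best_threshold, test)
print(best_threshold, test_accuracy)
```

In a real project the last two list items would follow: publish the model somewhere it can be called, then keep measuring the same metric on new data.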
-
The ML algorithm defines how your model will react
- Written in R or Python
-
Classification: predict a category, for example answer yes/no
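A minimal sketch of a yes/no classifier, assuming made-up data (hours studied versus passing an exam) and a nearest-centre rule picked for brevity. It is not the algorithm Azure ML uses.

```python
from statistics import mean

# Hypothetical labelled examples: (hours studied, passed the exam?)
examples = [(1, False), (2, False), (3, False), (6, True), (7, True), (8, True)]

# Typical value for each answer, learned from the known data.
yes_center = mean(x for x, label in examples if label)
no_center = mean(x for x, label in examples if not label)


def classify(hours):
    """Answer yes (True) if the input is closer to the typical 'yes' example."""
    return abs(hours - yes_center) < abs(hours - no_center)


print(classify(5), classify(2.5))  # prints: True False
```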
-
Regression: predict a numeric value
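As a rough illustration of predicting a numeric value, the sketch below fits a straight line by ordinary least squares in plain Python. The sizes and prices are invented, and a single-feature linear fit is just the simplest regression to show.

```python
from statistics import mean

# Hypothetical known data: (size in square metres, price in thousands).
data = [(50, 150), (60, 180), (80, 240), (100, 300)]
xs = [x for x, _ in data]
ys = [y for _, y in data]

# Ordinary least squares for one feature: slope and intercept of the best-fit line.
x_bar, y_bar = mean(xs), mean(ys)
slope = sum((x - x_bar) * (y - y_bar) for x, y in data) / sum((x - x_bar) ** 2 for x in xs)
intercept = y_bar - slope * x_bar


def predict(size):
    """Predict a price for a size that was not in the known data."""
    return intercept + slope * size


print(predict(70))
```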
-
Anomaly Detection: flag values that fall outside of expected bounds
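One simple way to get "bounds" is to take the mean of normal readings plus or minus a few standard deviations. The sketch below assumes that approach, with invented sensor readings and an arbitrary width of three standard deviations.

```python
from statistics import mean, stdev

# Hypothetical readings taken while everything was working normally.
normal_readings = [20, 21, 19, 20, 22, 20, 21, 19]

# Bounds: mean plus or minus three standard deviations (the factor 3 is an assumption).
center = mean(normal_readings)
spread = stdev(normal_readings)
lower, upper = center - 3 * spread, center + 3 * spread


def is_anomaly(value):
    """Flag a value that falls outside the bounds learned from normal data."""
    return value < lower or value > upper


new_readings = [20, 35, 21, 5]
print([v for v in new_readings if is_anomaly(v)])  # prints: [35, 5]
```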
-
Clustering: unsupervised learning that groups like data together
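To show what grouping without labels looks like, here is a stripped-down k-means with two clusters on one-dimensional data. The values are made up, and starting the two centres at the minimum and maximum is a simplification chosen to keep the example deterministic.

```python
def two_means(values, rounds=10):
    """Split values into two groups around two centres (1-D k-means, k = 2)."""
    low, high = min(values), max(values)  # simplistic starting centres
    group_low, group_high = list(values), []
    for _ in range(rounds):
        # Assignment step: each value joins the closer centre.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not group_low or not group_high:
            break  # all values ended up in one group; nothing more to update
        # Update step: move each centre to the mean of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high


# No labels are supplied; the groups come from the data alone.
print(two_means([1, 2, 3, 10, 11, 12]))  # prints: ([1, 2, 3], [10, 11, 12])
```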