Azure Machine Learning - Data Camp

Never Stop Learning

These are some notes that I typed up during an Azure Machine Learning - Data Camp that I attended at the Microsoft learning center in the Aon building. It was an entertaining, informative, and all-around great session; unfortunately, it was of no use for my own job, since it has nothing to do with the work that I perform. On the plus side, I was able to recommend it to my co-workers on the Analytics team.

Ross Loforte, Data Scientist, Microsoft

  • Data science is about using data to make decisions that drive actions. It involves data selection, preprocessing, transformation, data mining, and delivering value from the data (interpretation and evaluation).
  • The challenges have been complex overhead, expensive startup costs, and steep learning curves. The goal of Azure Machine Learning Services is to provide a simpler and more scalable approach.
  • Machine learning uses known data to develop a model that predicts unknown data.
    • Known Data: Big enough archive, previous observations, past data
    • Model: Known data + algorithms (ML algorithms)
    • Unknown Data: Missing, unseen, not existing, future data
  • Microsoft’s usage of ML: 1997 Hotmail spam filters, 2008 Bing Maps routing, 2009 Bing search, 2010 Kinect
  • Azure Machine Learning’s goal: make machine learning accessible to every enterprise, data scientist, developer, information worker, consumer, and device anywhere in the world.
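
The "known data + algorithm = model, then predict unknown data" idea from the notes above can be sketched in a few lines of plain Python. This is only an illustration, not anything shown in the session or part of Azure Machine Learning: the data values are made up, the algorithm is ordinary least squares on a single input, and the names train and predict are hypothetical.

```python
# Known data: past observations as (input, outcome) pairs.
# These values are invented for illustration only.
known_data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]


def train(pairs):
    """Algorithm: ordinary least squares for one input feature.

    Returns the model as a (slope, intercept) tuple.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    variance = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the trained model to an input it has never seen."""
    slope, intercept = model
    return slope * x + intercept


# Model: known data + algorithm.
model = train(known_data)

# Unknown data: an input outside the archive of past observations.
prediction = predict(model, 10.0)
print(prediction)
```

The made-up observations follow y = 2x + 1 exactly, so the fitted line reproduces that relationship; real data would be noisy, and the prediction would only be an estimate.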

  • Expect to spend 75% of the time building the model and testing the validity of its predictions.
  • Cycle of steps for building a machine learning solution
    • Frame the problem
    • Get and Prepare Data
    • Develop the Model
      • Analysis / Metric Definition
      • Feature Engineering
      • Model Training
      • Parameter Tuning
      • Evaluation
    • Deploy Model
    • Evaluate / Track Performance
  • The ML algorithm defines how your model will react
    • Algorithms are written in R or Python
  • Classification: predict a category, such as a yes/no answer
  • Regression: predict a numeric value
  • Anomaly Detection: flag a value that falls outside expected bounds
  • Clustering: unsupervised learning that groups similar data
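
To make the four task types above concrete, here is a rough sketch in plain Python. Each function is a deliberately tiny stand-in chosen for illustration (nearest-mean classification, nearest-neighbour averaging, a standard-deviation bound, and gap-based grouping); none of them is the method presented in the session or the algorithm Azure Machine Learning uses, and all names and numbers are hypothetical.

```python
from statistics import mean, stdev


def classify(value, yes_examples, no_examples):
    """Classification: answer yes (True) or no (False).

    Picks whichever labeled group has the closer average.
    """
    return abs(value - mean(yes_examples)) < abs(value - mean(no_examples))


def regress(x, pairs, k=2):
    """Regression: predict a numeric value.

    Averages the outcomes of the k known inputs closest to x.
    """
    nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
    return mean([y for _, y in nearest])


def is_anomaly(value, history, max_deviations=3.0):
    """Anomaly detection: is the value outside the expected bounds?

    Bounds are the historical mean plus or minus a number of
    sample standard deviations.
    """
    return abs(value - mean(history)) > max_deviations * stdev(history)


def cluster(values, max_gap):
    """Clustering: group like data without any labels.

    Sorts the values and starts a new group whenever the gap to the
    previous value exceeds max_gap.
    """
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= max_gap:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


print(classify(8, yes_examples=[9, 10, 11], no_examples=[1, 2, 3]))
print(regress(2.5, [(1, 10), (2, 20), (3, 30), (4, 40)]))
print(is_anomaly(100, [10, 11, 9, 10, 12, 8]))
print(cluster([1, 2, 10, 11, 12, 30], max_gap=2))
```

Note that only the clustering function works without labeled examples or known outcomes, which is what makes it unsupervised.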