Rovani's Sandbox

Azure Machine Learning - Data Camp

These are notes I typed up during an Azure Machine Learning Data Camp that I attended at the Microsoft learning center in the Aon building. It was entertaining, informative, and an all-around great session. Unfortunately, it was entirely useless for my job, since machine learning has nothing to do with the work that I perform. On the plus side, I was able to recommend it to my co-workers on the Analytics team.

Ross Loforte, Data Scientist, Microsoft https://github.com/melzoghbi/datacamp

  • Data science is about using data to make decisions that drive actions. It involves data selection, preprocessing, transformation, data mining, and delivering value from the data (interpretation and evaluation).

  • The traditional challenges have been extremely complex overhead, expensive startup costs, and steep learning curves. The goal of Azure Machine Learning Services is to provide a simpler and more scalable approach.

  • Machine Learning is using known data to develop a model to predict unknown data.

    • Known Data: Big enough archive, previous observations, past data
    • Model: Known data + algorithms (ML algorithms)
    • Unknown Data: Missing, unseen, not existing, future data
  • Microsoft's usage of ML: 1997 Hotmail spam filters, 2008 Bing maps routing, 2009 Bing search, 2010 Kinect

  • Azure Machine Learning's goal: make machine learning accessible to every enterprise, data scientist, developer, information worker, consumer, and device anywhere in the world.

  • Expect to spend about 75% of the time building the model and testing the validity of its predictions.

  • Cycle of steps for building a machine learning solution:

    • Frame the problem
    • Get and Prepare Data
    • Develop the Model
      • Analysis / Metric Definition
      • Feature Engineering
      • Model Training
      • Parameter Tuning
      • Evaluation
    • Deploy Model
    • Evaluate / Track Performance
  • The ML algorithm defines how your model will react

    • Written in R or Python
  • Classification: answer a yes/no question

  • Regression: predict a numeric value

  • Anomaly Detection: find values that fall outside of expected bounds

  • Clustering: unsupervised learning that groups like data
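
To make the four task types above a little more concrete, here is a minimal Python sketch of each one on toy data. None of this came from the session and none of it uses Azure Machine Learning. The function names, the made-up numbers, and the deliberately simple techniques (a least-squares line, a nearest-centroid rule, a z-score cutoff, and a two-cluster k-means) are all my own assumptions, picked only because they fit in a few lines of standard-library Python.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx


def fit_centroids(values, labels):
    """Classification: learn the mean value of each labeled class."""
    return {lab: mean(v for v, l in zip(values, labels) if l == lab)
            for lab in set(labels)}


def classify(centroids, value):
    """Answer with the label whose centroid is closest to the value."""
    return min(centroids, key=lambda lab: abs(centroids[lab] - value))


def find_anomalies(values, z_limit=2.0):
    """Anomaly detection: flag values more than z_limit standard
    deviations away from the mean."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) > z_limit * sigma]


def two_means(values, rounds=10):
    """Clustering: unsupervised 1-D k-means with k = 2.
    Assumes the data holds at least two distinct values."""
    lo, hi = min(values), max(values)
    for _ in range(rounds):
        near_lo = [v for v in values if abs(v - lo) <= abs(v - hi)]
        near_hi = [v for v in values if abs(v - lo) > abs(v - hi)]
        lo, hi = mean(near_lo), mean(near_hi)
    return near_lo, near_hi


# Known data (hypothetical): hours studied and the resulting exam scores.
slope, intercept = fit_line([1, 2, 3, 4, 5], [52, 54, 56, 58, 60])
predicted_score = slope * 6 + intercept  # unknown data: a future student

# Known data (hypothetical): labeled readings, then one unseen reading.
centroids = fit_centroids([1, 2, 8, 9], ["no", "no", "yes", "yes"])
answer = classify(centroids, 7)

outliers = find_anomalies([10, 11, 9, 10, 12, 10, 11, 50])
groups = two_means([1, 2, 3, 10, 11, 12])

print(predicted_score, answer, outliers, groups)
```

The first two examples follow the "known data + algorithm = model" pattern from earlier in the notes: the model is fit on past observations and then asked about a value it has not seen. Clustering is the odd one out, since it gets no labels and has to find the groups on its own.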