Machine Learning with Azure ML Studio

Directions on How to Build the Predictive Model In Microsoft Azure ML

  • Sign in to Microsoft Azure using your login credentials in the Azure portal 
  • Create a workspace for you to store your work
    • In the upper-left corner of Azure portal, select + Create a resource.
    • Use the search bar to type Machine Learning.
    • Select Machine Learning.
    • In the Machine Learning pane, select Create to begin.
    • Provide the following information to configure your new workspace:
      • Subscription – Select the Azure subscription that you would like to use.
      • Resource group – Create a name for your resource group which will hold resources for your Azure solution.
      • Workspace name – Create a unique name that identifies your workspace.
      • Region – Select the region closest to your users to reduce latency
      • Storage account – created by default
      • Key Vault – created by default
      • Application insights – created by default
    • When you have completed configuring the workspace, select Review + Create.
    • Review the settings and make any additional changes or corrections. Lastly, select Create. When the workspace deployment has completed, you will see the message “Your deployment is complete”. Please see the visual below as a reference. 
  • To launch your workspace, click Go to resource
  • Next, click the blue Launch Studio button under Manage your Machine Learning Lifecycle. Now you are ready to begin!
  • Click on Experiments in the left panel
  • Click on NEW in the lower left corner 
  • Select Blank Experiment. The new experiment is created with a default name. You can change the name at the top of the page. 
  • Upload the data above into ML Studio
    • Drag the datasets onto the experiment canvas. (We uploaded preprocessed data.)
    • If you would like to see what the data looks like, click on the output port at the bottom of the dataset and select Visualize. Given this data, we are going to try to predict whether the IoT sensors have communication errors. 
  • Next, prepare the data
    • Remove unnecessary columns/data
      • Type “Select Columns” in the search box and select the Select Columns in Dataset module, then drag and drop it onto the canvas. This allows you to exclude any columns that you do not want in the model. 
      • Connect Select Columns in Dataset to the dataset on the canvas.
    • Choose and Apply a Learning Algorithm
      • Click on Data Transformation in the left column 
        • Next, click on the drop down Manipulation 
        • Drag Edit Metadata onto the canvas (use this to change the metadata associated with columns in the dataset; this changes the metadata inside Azure Machine Learning that tells downstream components how to use the selected columns).
      • Split the data 
        • Then, click on the drop down Sample and Split.
        • Choose Split Data, add it to the canvas, and connect it to Edit Metadata.
        • Click on Split Data, find the Fraction of rows in the first output dataset setting, and set it to 0.80. You are splitting the data to train the model using 80% of the data and test the model using 20% of the data.
  • Then train the model 
    • Choose the drop down under Machine Learning
    • Choose the drop down under Initialize Model
    • Choose the drop down under Anomaly Detection 
    • Click on PCA-Based Anomaly Detection, add it to the canvas, and connect it to Split Data.  
    • Choose the drop down under Machine Learning
    • Choose the drop down under Initialize Model
    • Choose the drop down under Anomaly Detection 
    • Click on One-Class Support Vector Machine, add it to the canvas, and connect it to Split Data.  
    • Choose the drop down under Machine Learning
    • Then, choose the drop down under Train
    • Click on Tune Model Hyperparameters, add it to the canvas, and connect it to Split Data.
    • Choose the drop down under Machine Learning
    • Then, choose the drop down under Train
    • Click on Train Anomaly Detection Model
  • Then score the model 
    • Choose the drop down under Machine Learning
    • Then, choose the drop down under Score
    • Click on Score Model
  • Normalize the data
    • Choose the drop down under Data Transformation
    • Then, choose the drop down under Scale and Reduce
    • Click on Normalize Data
  • Evaluate the model – this will compare the One-Class SVM and PCA-based anomaly detectors. (A rough scikit-learn analogue of this workflow is sketched after this list.)
    • Choose the drop down under Machine Learning
    • Then, choose the drop down under Evaluate
    • Click on Evaluate Model
  • Click Run at the bottom of the screen to run the experiment. Below is how the model should look. Please click on the link to use our experiment (Experiment Name: IOT Anomaly Detection) for further reference. This link requires that you have an Azure ML account. To access the gallery, click the following public link: https://gallery.cortanaintelligence.com/Experiment/IOT-Anomaly-Detection
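
For readers who prefer code to canvas modules, here is a rough, hypothetical scikit-learn analogue of the workflow above: an 80/20 split, normalization, a One-Class SVM, a PCA reconstruction-error detector, and a simple comparison of the two. It is only a sketch; the file name, column names and parameters are placeholders, not the settings used in the Azure ML experiment.

```python
# Hypothetical scikit-learn analogue of the Azure ML Studio experiment above.
# File name, column names and parameters are placeholders.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.svm import OneClassSVM
from sklearn.metrics import roc_auc_score

df = pd.read_csv("iot_sensor_readings.csv")      # placeholder file name
X = df.drop(columns=["comm_error"])              # placeholder label column
y = df["comm_error"]                             # 1 = communication error

# Split Data: 80% for training, 20% for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

# Normalize Data
scaler = MinMaxScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# One-Class SVM detector (negate scores so higher = more anomalous)
ocsvm = OneClassSVM(kernel="rbf", nu=0.05).fit(X_train_s)
svm_scores = -ocsvm.decision_function(X_test_s)

# PCA-based detector: score points by reconstruction error
pca = PCA(n_components=0.95).fit(X_train_s)
recon = pca.inverse_transform(pca.transform(X_test_s))
pca_scores = np.sum((X_test_s - recon) ** 2, axis=1)

# Evaluate Model: compare the two detectors on the held-out 20%
print("One-Class SVM AUC:", roc_auc_score(y_test, svm_scores))
print("PCA-based AUC:    ", roc_auc_score(y_test, pca_scores))
```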

Derek Moore, Erica Davis, and Hank Galbraith, authors.

Information Technology Management Strategies in Industry Series

Use cases for Information Technology in the energy and manufacturing industries continue to expand beyond typical financials, asset management and plant management applications. Now IT is being used more often to drive goals such as resourcefulness and efficiency in current business processes and products.

Starting next month and running through May, I will be releasing a series of blogs addressing IT management strategies in areas of business and industry in manufacturing and energy. The series will include:

  • Information Technology Management Strategies for Energy Management
  • Information Technology Management Strategies for Customer Service 
  • Information Technology Management Strategies for the Internet of Things and Smart Cities Initiatives 
  • Information Technology Management Strategies for Data Analysis in Manufacturing

The main objective of this series is to apply IT management knowledge to experiences in energy, manufacturing and customer service. With the advent of new innovative solutions in areas such as data science and machine learning, there are more opportunities than ever to make these industries more resourceful, efficient and effective in their business processes and beyond.

I will post links to the blogs from this page for easy reference in the future.

Data Science Project tools for 2022

Software that will be used

Azure IoT Hub

Python

R Studio

Azure ML Studio

SAS Enterprise Miner

Python SciKit-Learn for Machine Learning (ML)

Python SciPy and NumPy for data analytics (DA)

Python Parallel Processing and Distributed Computing

Hardware that will be used

Raspberry Pi 3

Arduino and Sensors

Mathematical Modeling techniques

PCA

SVM

Cluster Analysis

Deep Learning/NN

Data Pipelines

Kubernetes

Scala/Spark

Azure Event Hubs

Kafka

DataSciCon.Tech 2017 Review

Saturday, December 2nd, 2017

DataSciCon.Tech is a data science conference held in Atlanta, Georgia, from Wednesday, November 29th to Friday, December 1st, that includes both workshops and conference lectures. It took place at the Global Learning Center on the campus of Georgia Tech. This was the first year of the conference, and I attended to get a sense of the data science scene in Atlanta. Overall, the experience was very enlightening and introduced me to the dynamic and intensive work being conducted in the area of data science.


Keynote speaker Rob High, CTO of IBM Watson, discussing IBM Watson and Artificial Intelligence (DataSciCon.Tech 2017).

DataSciCon.Tech Workshops

Four workshop tracks were held Wednesday: Introduction to Machine Learning with Python and TensorFlow, Tableau Hands-on Workshop, Data Science for Discovery, Innovation and Value Creation, and Data Science with R Workshop. I elected to attend the Machine Learning with Python and TensorFlow track. TensorFlow is an open-source software library for numerical computation using data flow graphs for Machine Learning.

To prepare for the conference, I installed the TensorFlow module downloaded from https://www.tensorflow.org/install. In addition to TensorFlow, I downloaded Anaconda (https://www.anaconda.com/), a great Python development environment for those practicing data science programming, which includes many of the Python data science packages such as NumPy and SciKit-Learn.

Among the predictive and classification modeling techniques discussed in the workshop:

  • Neural Networks
  • Naive Bayes
  • Linear Regression
  • k-nearest neighbor (kNN) analysis

These modeling techniques are popular for classifying data and predictive analysis. Few training sessions on Python, SciKit-Learn or NumPy go into these algorithms in detail due to the varying math backgrounds of the audience members. For the course, we used Jupyter Notebook, a web-based Python development environment that allows you to share and present your code and results using web services. Jupyter Notebook can also be hosted in Microsoft Azure, as well as in other cloud platforms such as Anaconda Cloud and AWS. To host a Python Jupyter Notebook in Azure, sign in at https://notebooks.azure.com.
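
As a rough illustration of the kind of exercise worked through in the Jupyter Notebook sessions, the sketch below fits one of the techniques listed above (k-nearest neighbors) with SciKit-Learn. The iris dataset and parameters are stand-ins, not the workshop material itself.

```python
# Minimal kNN classification sketch; the iris data is a placeholder dataset.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit a 5-nearest-neighbor classifier and score it on the held-out data
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))
```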

TensorFlow

TensorFlow has a series of functions that use neural networks and machine learning to test, train and score models. The advantage of TensorFlow is its ability to train models faster than other modules, which matters because training models is a process-intensive operation. It is particularly powerful on the Graphics Processing Unit (GPU) architecture popular for Machine Learning and Deep Learning.
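
As a minimal sketch of training, testing and scoring a model with TensorFlow, the snippet below fits a tiny neural network on the MNIST digits. Note that it uses the current tf.keras API rather than the graph-style code demonstrated at the 2017 workshop, and the network and dataset are placeholders rather than anything from the conference.

```python
# Minimal tf.keras sketch: train, then evaluate (score) a small classifier.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0   # scale pixel values to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=2)                # train
loss, acc = model.evaluate(x_test, y_test)           # test / score
print("Test accuracy:", acc)
```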

Download TensorFlow from http://tensorflow.org. The website also includes a neural network TensorFlow sandbox at http://playground.tensorflow.org.


Source: http://playground.tensorflow.org (DataSciCon.Tech).

DataSciCon.Tech Sessions

I’m going to break down the sessions I attended into the main topics that were covered, so this is a very high-level, hundred-foot point of view of the topics covered at the conference. My plan is to create a few more blogs on these topics that will go into my work as an aspiring data scientist/data architect. All the information in this blog is based on information presented at the DataSciCon.Tech 2017 conference.

Machine Learning and Artificial Intelligence

The conference emphasized Artificial Intelligence and Machine Learning pretty heavily. Artificial Intelligence was discussed more in terms of theory and direct applications than design and development. There were a few demonstrations of the popular IBM Watson Artificial Intelligence system, but I want to focus this blog primarily on Machine Learning, as it’s something that interests me and other data architects. Artificial Intelligence and Machine Learning are both based on computerized learning algorithms. Machine Learning uses past data to learn, predict events or identify anomalies.

Another key takeaway from the conference is the number of open-source projects and companies that have produced software modules, libraries and packages devoted to the use and implementation of Machine Learning in business applications. I strongly recommend that anyone interested in learning more look into the software solutions discussed in this blog and how they can be implemented.

For those who are new to the concept of Machine Learning (like me), essentially it is defined as follows:

Machine Learning is a subset of Artificial Intelligence that focuses on creating models that learn and predict events based on past data without a human computer programmer having to change code to adapt to new events.  An example would be a spam filter learning new exploits and then blocking those exploits.
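
As a toy illustration of that definition, the sketch below trains a Naive Bayes "spam filter" with SciKit-Learn on a handful of made-up, labeled messages and then classifies a new one. Adapting to new exploits only requires retraining on new data, not changing the code; the messages and labels here are placeholders, not real spam data.

```python
# Toy "spam filter": learns from past labeled messages, no hand-written rules.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

past_messages = ["win a free prize now", "meeting moved to 3pm",
                 "claim your free reward", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

filter_model = make_pipeline(CountVectorizer(), MultinomialNB())
filter_model.fit(past_messages, labels)              # learn from past data

print(filter_model.predict(["free prize waiting"]))  # likely -> ['spam']
```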
