Directions on How to Build the Predictive Model In Microsoft Azure ML
Sign in to Microsoft Azure using your login credentials in the Azure portal
Create a workspace for you to store your work
In the upper-left corner of the Azure portal, select + Create a resource.
Use the search bar to type Machine Learning.
Select Machine Learning.
In the Machine Learning pane, select Create to begin.
Provide the following information to configure your new workspace:
Subscription – Select the Azure subscription that you would like to use.
Resource group – Enter a name for the resource group that will hold the resources for your Azure solution.
Workspace name – Create a unique name that identifies your workspace.
Region – Select the region closest to your users to reduce latency.
Storage account – created by default
Key Vault – created by default
Application insights – created by default
When you have completed configuring the workspace, select Review + Create.
Review the settings and make any additional changes or corrections, then select Create. When the workspace deployment has completed, you will see the message “Your deployment is complete”. Please see the visual below as a reference.
To launch your workspace, click Go to resource.
Next, click the blue Launch Studio button under Manage your Machine Learning Lifecycle. Now you are ready to begin!
Click on Experiments in the left panel
Click on NEW in the lower left corner
Select Blank Experiment. The new experiment is created with a default name. You can change the name at the top of the page.
Upload the data above into ML Studio.
Drag the datasets onto the experiment canvas. (We uploaded preprocessed data.)
If you would like to see what the data looks like, click the output port at the bottom of the dataset and select Visualize. Given this data, we are going to try to predict whether the IoT sensors have communication errors.
Next, prepare the data
Remove unnecessary columns/data
Type “Select Columns” in the Search box, select the Select Columns in Dataset module, and drag and drop it onto the canvas. This allows you to exclude any columns that you do not want in the model.
Connect Select Columns in Dataset to the Data on the canvas.
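Outside the studio, the same column filtering can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not what the Select Columns in Dataset module runs internally; apart from normality, which the Methodology section names, the column names and values are hypothetical placeholders rather than the real dataset schema.

```python
import csv
import io

def select_columns(csv_text, exclude):
    """Return (kept column names, rows as dicts) with excluded columns dropped."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in exclude]
    rows = [{name: row[name] for name in kept} for row in reader]
    return kept, rows

# Hypothetical sample: column names are illustrative, not the real schema.
sample = "sourceID,timestamp,value,normality\ns1,1001,21.5,0\ns2,1002,99.0,2\n"
columns, rows = select_columns(sample, exclude={"timestamp"})
print(columns)  # ['sourceID', 'value', 'normality']
```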
Choose and Apply a Learning Algorithm
Click on Data Transformation in the left column
Next, open the Manipulation drop down
Drag Edit Metadata onto the canvas. Use this module to change the metadata associated with columns in the dataset; this changes the metadata inside Azure Machine Learning that tells downstream components how to use the selected columns.
Split the data
Then, open the Sample and Split drop down.
Choose Split Data, add it to the canvas, and connect it to Edit Metadata.
Click Split Data, find Fraction of rows in the first output dataset, and set it to 0.80. This splits the data so that 80% of the rows are used to train the model and 20% to test it.
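Conceptually, a randomized 80/20 split shuffles the rows and cuts the list at the chosen fraction. The sketch below assumes a fixed seed (42, an arbitrary choice) so the split is reproducible; the function name split_rows is hypothetical, and Azure's Split Data module may differ in its details.

```python
import random

def split_rows(rows, fraction=0.80, seed=42):
    """Shuffle rows and return (first output, second output), e.g. train/test."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    cut = int(round(len(shuffled) * fraction))
    return shuffled[:cut], shuffled[cut:]

train, test = split_rows(range(100), fraction=0.80)
print(len(train), len(test))  # 80 20
```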
Then train the model
Choose the drop down under Machine Learning
Choose the drop down under Initialize Model
Choose the drop down under Anomaly Detection
Click on PCA-Based Anomaly Detection, add it to the canvas, and connect it to Split Data.
Choose the drop down under Machine Learning
Choose the drop down under Initialize Model
Choose the drop down under Anomaly Detection
Click on One-Class Support Vector Machine, add it to the canvas, and connect it to Split Data.
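The idea behind a PCA-based detector is that normal rows lie close to a low-dimensional subspace learned from training data, so rows with a large reconstruction error are flagged as anomalies. The sketch below is a simplified, standard-library-only illustration of that idea, assuming a single principal component found by power iteration; it is not the Azure module's implementation, which has its own parameters and scoring details, and the One-Class SVM is not sketched here. The training points are made up.

```python
import math
import random

def fit_pca_detector(train, iters=200, seed=0):
    """Return (mean, first principal component) estimated from training rows."""
    n, d = len(train), len(train[0])
    mean = [sum(row[j] for row in train) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in train]
    # Sample covariance matrix of the centered training data.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration converges to the eigenvector with the largest eigenvalue.
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def anomaly_score(x, mean, pc):
    """Squared reconstruction error after projecting x onto the component."""
    c = [xi - mi for xi, mi in zip(x, mean)]
    t = sum(ci * pi for ci, pi in zip(c, pc))
    return sum((ci - t * pi) ** 2 for ci, pi in zip(c, pc))

# Hypothetical training rows lying near the line y = 2x.
train = [[x, 2 * x + (0.1 if x % 2 else -0.1)] for x in range(10)]
mean, pc = fit_pca_detector(train)
print(anomaly_score([3, 6], mean, pc))  # near 0: fits the learned pattern
print(anomaly_score([3, 0], mean, pc))  # large: far from the learned line
```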
Choose the drop down under Machine Learning
Then, choose the drop down under Train
Click on Tune Model Hyperparameters and add this to the canvas and connect with the Split Data.
Choose the drop down under Machine Learning
Then, choose the drop down under Train
Click on Train Anomaly Detection Model
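Tune Model Hyperparameters automates a parameter sweep: it trains the model with different parameter settings and keeps the best-scoring one. As a rough sketch of an entire-grid sweep, the Python below tries every combination in a grid; the parameter names (components, threshold) and the toy scoring function are hypothetical stand-ins for a real model and a real validation metric such as AUC.

```python
import itertools

def grid_sweep(grid, evaluate):
    """Try every parameter combination in `grid`; return (best_params, best_score)."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # higher is better
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_metric(params):
    """Hypothetical validation metric that peaks at components=2, threshold=0.5."""
    return -(params["components"] - 2) ** 2 - (params["threshold"] - 0.5) ** 2

grid = {"components": [1, 2, 3], "threshold": [0.25, 0.5, 0.75]}
best, score = grid_sweep(grid, toy_metric)
print(best)  # {'components': 2, 'threshold': 0.5}
```

A full grid grows multiplicatively with each added parameter, which is why random sweeps are often preferred for larger search spaces.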
Then score the model
Choose the drop down under Machine Learning
Then, choose the drop down under Score
Click on Score Model
Normalize the data
Choose the drop down under Data Transformation
Then, choose the drop down under Scale and Reduce
Click on Normalize Data
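Normalize Data rescales numeric columns so that features measured on very different scales contribute comparably. Two common transformations, z-score and min-max, are sketched below in plain Python; the function names are hypothetical, the z-score here uses the sample standard deviation, and Azure's own options may compute these details differently.

```python
import statistics

def zscore(values):
    """Center on the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def minmax(values):
    """Rescale linearly into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

readings = [10.0, 20.0, 30.0]
print(minmax(readings))  # [0.0, 0.5, 1.0]
print(zscore(readings))  # [-1.0, 0.0, 1.0]
```

In a real pipeline, the statistics would be computed on the training split and reused for the test split, so that no information from the test rows leaks into training.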
Evaluate the model – this will compare the one-class SVM and PCA-based anomaly detectors.
Choose the drop down under Machine Learning
Then, choose the drop down under Evaluate
Click on Evaluate Model
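Comparing the two detectors typically comes down to metrics such as AUC. AUC can be read as the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal row, which the short Python sketch below computes directly by comparing every pair. The labels and detector scores are made-up values for illustration, not results from this experiment.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: share of (anomaly, normal) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomaly and one normal row")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [0, 0, 0, 1, 1]                # 1 = anomaly
pca_scores = [0.1, 0.2, 0.3, 0.8, 0.9]  # made-up detector outputs
svm_scores = [0.1, 0.7, 0.3, 0.6, 0.9]
print(roc_auc(labels, pca_scores))  # 1.0
print(roc_auc(labels, svm_scores))  # 0.8333333333333334
```

The pairwise loop is quadratic in the number of rows; for large datasets a rank-based formulation gives the same value more efficiently.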
Click Run at the bottom of the screen to run the experiment. Below is how the model should look. Please click the link to use our experiment (Experiment Name: IOT Anomaly Detection) for further reference. This link requires an Azure ML account. To access the gallery, click the following public link: https://gallery.cortanaintelligence.com/Experiment/IOT-Anomaly-Detection
Part two of a series of LinkedIn articles based on Cognitive Computing and Artificial Intelligence Applications
Background
Several high-profile ransomware incidents have called attention to IoT network security. Security vulnerability assessment and penetration testing have become increasingly important to sound network design. Most of this assessment and testing takes place at the software and hardware level; however, a broader approach is vital to the security of IoT networks. Protocol and traffic analysis is important for structured, dedicated IoT networks, since communications and endpoints are tracked and managed. Understanding all the risks posed to these types of networks allows for a more complete risk management plan and strategy. Besides network challenges, such networks face challenges of scalability, operability, and channels, as well as in the information they transmit and collect. In IoT networks, the search for vulnerabilities spans the network architecture, endpoint devices, and services, where services include the hardware, software, and processes that make up the overall IoT architecture. Building a threat assessment or map as part of an overall security plan, and updating it on a scheduled basis, allows security professionals and stakeholders to manage all possible threats to the architecture. Whenever possible, simulating possible attack vectors, studying the behavior of those attacks, and then building models will strengthen the overall security management plan.
Open ports, SQL injection flaws, unencrypted services, insecure network interfaces, buffer overflow risks, lack of firewall protocols, authorization settings, and web interface insecurity are among the types of vulnerabilities found in IoT networks and devices.
Where is an impending attack located? Is it occurring at the device, server, or service? Is it occurring where the data is stored or while the data is in transit? What types of attacks can be identified? Types of attacks include distributed denial of service, man-in-the-middle, ransomware, botnets, spoofing, account penetrations, etc.
Business Use Case
For this business use case research study, a fictional company was created. The company is a national farmland and agricultural cooperative that supplies food to local and state markets. Part of the company’s IT infrastructure is an IoT network that uses endpoint devices to monitor and control temperature, humidity, and moisture across the company’s large agricultural farmlands. This network has over 2,000 IoT devices in operation on 800 acres. Any intrusion into the network by a rogue service or bad actor could jeopardize the timely delivery of high-quality fresh produce. The network design in the simulation below is a conceptual model of this agricultural network. Our team created the simulated network using Cisco Packet Tracer, a tool that lets users build a computer network and simulate packet traffic through it at multiple OSI layers.
Simulated data was generated using the Packet Tracer simulator. The simulated network below uses multiple routers, switches, servers, and IoT devices exchanging packets of protocols such as TCP, UDP, RIPv4, and ICMP.
Network Simulation
Below is a simulation of packet routing throughout the IoT network.
Problem Statement
Our fictional company will be the basis of our team’s mock network for monitoring intrusions and anomalies. Being a simulated IoT network, it contains only a few dozen IoT-enabled sensors and devices, such as sprinklers, temperature and water level sensors, and drains. Since our model is designed for large-scale IoT deployments, it will be trained on publicly available data, while the simulated data will be used to score the model’s accuracy. The simulation can generate the types of threats that would create anomalies. It is important to distinguish between an attack and a known issue or event (see part one of this research for IoT communication issues). The company is aware of those miscommunications and has open work orders for them. The goal is for our model to detect an actual attack on the IP network by a bad actor. Although miscommunication is technically an anomaly, it is known to the IT staff and should not raise an alarm. Miscommunicating devices are fairly easy to detect, but for a machine learning or deep learning model, separating them from attacks can be trickier. Raising a security alarm for the daily miscommunication issues that originate at the endpoints would produce a prevalence of false positives (FP) in the model’s confusion matrix.
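To make the false-positive concern concrete, the sketch below counts confusion-matrix cells for two stand-in alarm rules on a hypothetical event stream: one alarms on every anomaly, including the known miscommunications, and one suppresses them. The event names and both rules are illustrative assumptions, not the team's model.

```python
from collections import Counter

def confusion(actual_attack, predicted_alarm):
    """Count TP/FP/FN/TN for paired boolean labels and alarm decisions."""
    counts = Counter(zip(actual_attack, predicted_alarm))
    return {"TP": counts[(True, True)], "FP": counts[(False, True)],
            "FN": counts[(True, False)], "TN": counts[(False, False)]}

# Hypothetical event stream: "miscommunication" is a known issue, not an attack.
events = ["normal", "miscommunication", "miscommunication", "ddos",
          "normal", "miscommunication", "scan"]
is_attack = [e in {"ddos", "scan"} for e in events]

naive_alarm = [e != "normal" for e in events]  # alarms on every anomaly
filtered_alarm = [e not in {"normal", "miscommunication"} for e in events]

naive = confusion(is_attack, naive_alarm)
filtered = confusion(is_attack, filtered_alarm)
print(naive)     # {'TP': 2, 'FP': 3, 'FN': 0, 'TN': 2}
print(filtered)  # {'TP': 2, 'FP': 0, 'FN': 0, 'TN': 5}
fpr = naive["FP"] / (naive["FP"] + naive["TN"])  # false positive rate: 3 / 5
```

Both rules catch the two attacks, but the naive rule turns every known miscommunication into a false alarm.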
A running simulation
Project Significance and Implementation
In today’s age of modern technology and the internet, it is increasingly difficult to protect enterprise networks against malicious attacks. Not only are malicious actors becoming more advanced in their methods of attack, but the number of IoT devices operating in business environments is also ever increasing. It needs to be a top priority for any business to create an IT strategy that protects the company’s technical architecture and core intellectual property. When assessing all potential security weaknesses, you must decompose the network model and define trust zones within the IoT architecture.
This application was designed to use Microsoft Azure Machine Learning to analyze and detect anomalies in large data sets collected from all devices on the business’ network. In an actual implementation of this application, there would be a constant flow of data through our predictive model to classify traffic as Normal, Incorrect Setup, Distributed Denial of Service (DDoS), Data Type Probing, Scan Attack, or Man in the Middle. By iteratively retraining the model with a supervised learning method, the application would become increasingly cognitive and accurate at identifying these network traffic patterns. If this system were fully implemented, each classification would also need a corresponding action. For instance, if the model detected a DDoS attack coming from a certain device, the application would automatically send shutdown commands to the device, isolating it from the network and containing the attack. When these actions occur, logs would be recorded and notifications automatically sent to the appropriate IT administrators and management teams, so that quick and effective action could be taken. Applications like the one we designed are already used by companies in all sectors around the world. CrowdStrike, for instance, is a cybersecurity company that produces information security applications with machine learning capabilities. Cybersecurity companies such as CrowdStrike have grown more popular over the past few years as the number of cyber attacks has increased. We have seen firsthand how advanced these attacks can be, with data breaches at the US federal government, Equifax, Facebook, and more. The need for advanced information security applications grows daily, not just for large companies but for small- to mid-sized companies as well. While outsourcing information security is an easy choice for some companies, others may not have the budget for such technology.
That is where our application illustrates the low barrier to entry that machine learning platforms such as Microsoft Azure ML or IBM Watson make possible. Products like these provide relatively easy interfaces that let IT security administrators take matters into their own hands and design their own anomaly detection applications. In conclusion, our IoT Network Anomaly Detection Application is an example of how a company could design and implement its own advanced cybersecurity defense applications. This would better enable any company to protect its network devices and intellectual property against ever-growing malicious attacks.
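A response layer of this kind could be as simple as a lookup from predicted class to actions. In the hypothetical playbook below, only the DDoS entry (isolate the device, log, notify administrators) follows the text; the other entries, the action names, and the respond function are placeholders, and a real system would call device-management and alerting services instead of returning tuples.

```python
# Hypothetical playbook: class names follow the text; actions are placeholders.
PLAYBOOK = {
    "Normal": [],
    "Incorrect Setup": ["log"],  # known issue: record it, no alarm
    "DDoS": ["isolate_device", "log", "notify_admins"],
    "Data Type Probing": ["log", "notify_admins"],
    "Scan Attack": ["log", "notify_admins"],
    "Man in the Middle": ["isolate_device", "log", "notify_admins"],
}

def respond(device_id, predicted_class, playbook=PLAYBOOK):
    """Return the ordered (action, device_id) steps for one classified event."""
    # An unrecognized class is escalated rather than silently ignored.
    actions = playbook.get(predicted_class, ["log", "notify_admins"])
    return [(action, device_id) for action in actions]

print(respond("sensor-0042", "DDoS"))
# [('isolate_device', 'sensor-0042'), ('log', 'sensor-0042'), ('notify_admins', 'sensor-0042')]
```

Keeping the mapping as data rather than branching code makes it easy for administrators to adjust responses without touching the classifier.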
Methodology
For this project, our team acquired public data from Google, Kaggle, and Amazon. Preprocessed data was selected for the IoT anomaly detection model, and preprocessed data from the Google open data repository was collected to train and test the models. RStudio was used for the initial data analysis, to compute Receiver Operating Characteristic (ROC) curves and the Area Under the Curve (AUC) and to evaluate the sensitivity and specificity of the models in predicting the response variable. In R, predictive performance was compared across logistic regression, random forest, and gradient boosting models. In the preprocessed data, a label variable (normality) was used for training and testing. After the initial data discovery stage, the data was processed in Azure ML using support vector machine and principal component analysis pipelines for anomaly detection. The response variable has the following values:
Normal – 0
Wrong Setup – 1
DDoS – 2
Data Type Probing – 3
Scan Attack – 4
Man in the Middle – 5
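Before training binary anomaly detectors or computing ROC and AUC, the multi-class response is often collapsed into normal versus anomalous. The sketch below assumes that approach, which the article does not spell out, and assumes code 3 corresponds to Data Type Probing, the sixth class named in the Project Significance section; the dictionary and function names and the sample codes are made up.

```python
from collections import Counter

# Response codes; code 3 = Data Type Probing is an assumption.
NORMALITY_CODES = {
    0: "Normal",
    1: "Wrong Setup",
    2: "DDoS",
    3: "Data Type Probing",  # assumed
    4: "Scan Attack",
    5: "Man in the Middle",
}

def to_binary(code):
    """0 for normal traffic, 1 for any anomaly (known issues included)."""
    return 0 if code == 0 else 1

codes = [0, 0, 2, 1, 0, 5, 4, 0]  # hypothetical sample of the response column
class_counts = Counter(NORMALITY_CODES[c] for c in codes)
binary = [to_binary(c) for c in codes]
print(binary)  # [0, 0, 1, 1, 0, 1, 1, 0]
```

Checking the class counts first is worthwhile, since attack classes are usually rare relative to normal traffic.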
The preprocessed intrusion detection dataset for network-based IoT devices includes data from ultrasonic sensors built with Arduino microcontrollers and NodeMCU, a low-cost open source IoT platform that runs on the ESP8266 Wi-Fi module used to send data.
The following table represents data from the Ethernet frame that carries the TCP/IP packet transmitted from a source device to a destination device for network communication. The dataset was preprocessed for use in a network-based intrusion detection system.
Source: Google.com
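Because the dataset's features come from Ethernet frames, it may help to see where the basic header fields sit. The Python sketch below parses the 14-byte Ethernet II header (destination MAC, source MAC, EtherType) with the standard struct module; it assumes the capture has already stripped the preamble and frame check sequence, ignores 802.1Q VLAN tags, and the function names and sample frame bytes are invented.

```python
import struct

def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes as a colon-separated MAC address."""
    return ":".join(f"{b:02x}" for b in raw)

def parse_ethernet_header(frame: bytes) -> dict:
    """Split an Ethernet II frame into header fields and payload."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than a 14-byte Ethernet II header")
    # Network byte order: 6-byte destination, 6-byte source, 2-byte EtherType.
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return {
        "dst": format_mac(dst),
        "src": format_mac(src),
        "ethertype": ethertype,  # 0x0800 means the payload is an IPv4 packet
        "payload": frame[14:],
    }

# Invented broadcast frame carrying the first two bytes of an IPv4 header.
frame = bytes.fromhex("ffffffffffff" "001122334455" "0800") + b"\x45\x00"
header = parse_ethernet_header(frame)
print(header["src"], hex(header["ethertype"]))  # 00:11:22:33:44:55 0x800
```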
In the next article, we’ll be exploring the R code and Azure ML trained anomaly detection models in greater depth.