Anomaly and Intrusion Detection in IoT Networks with Enterprise Scale Endpoint Communication

This is part one of a series of articles to be published on LinkedIn based on a classroom project for ISM 647: Cognitive Computing and Artificial Intelligence Applications taught by Dr. Hamid R. Nemati at the University of North Carolina at Greensboro Bryan School of Business and Economics.

The Internet of Things (IoT) continues to be one of the most innovative and exciting areas of technology in the last decade. IoT devices are a collection of endpoints that reside in the physical world and collect data from the environment around them or through mechanical, electrical, thermodynamic or hydrological processes. These environments could be the human body, geological areas, the atmosphere, etc. The networking of IoT devices has been prevalent in many industries for years, including the gas, oil and utilities industries. As companies demand higher sample read rates from sensors, meters and other IoT devices, and as bad actors from foreign and domestic sources become more prevalent and brazen, these networks have become vulnerable to security threats due to their increasing ubiquity and evolving role in industry. In addition, these networks are prone to read rate fluctuations that can produce false positives for anomaly and intrusion detection systems when devices are deployed at enterprise scale and send TCP/IP transmissions of data upstream to central office locations. This paper focuses on developing an application that uses cognitive computing and artificial intelligence to achieve better anomaly and intrusion detection in enterprise scale IoT applications.

This project uses automated machine learning to develop a cognitive application that addresses possible security threats in high volume IoT networks such as utility, smart city and manufacturing networks. These networks have high communication read success rates across hundreds of thousands to millions of IoT sensors; however, they may still have issues such as:

  1. Noncommunication or missing/gap communication.
  2. Maintenance Work Orders
  3. Alarm Events (Tamper/Power outages)

In large scale IoT networks, such interruptions are a normal part of business operations. Noncommunication typically occurs because devices fail or get swapped out under a legitimate work order. Weather events and people can also cause issues with the endpoint device itself: power outages can cause connected routers to fail, and devices can be tampered with, for example by someone attempting a hardwire bypass or removing a meter.

The scope of this project is to build machine learning models that address IP specific attacks on the IoT network, such as DDoS attacks originating inside or outside the networking infrastructure. These models should be intelligent enough to distinguish network attacks (true positives) from communication issues (true negatives); a minimal classifier sketch follows the list below. Network communication typical for such an IoT network includes:

  1. Short range: Wi-Fi, Zigbee, Bluetooth, Z-Wave, NFC.
  2. Long range: 2G, 3G, 4G, LTE, 5G.
  3. Protocols: IPv4/IPv6, SLIP, uIP, RLP, TCP/UDP.
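
To make the attack-versus-communication-issue distinction concrete, here is a minimal sketch of a supervised classifier built with scikit-learn. The file name, feature columns and labels are hypothetical placeholders for the simulated traffic the project will generate; any production model would be trained on labeled data from the actual network.

```python
# Minimal sketch: classifying IoT endpoint events as network attacks vs. benign
# communication issues. The CSV layout and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Hypothetical columns: packets_per_min, read_success_rate, retry_count,
# alarm_flag, work_order_open, label ("attack" or "comm_issue")
df = pd.read_csv("iot_endpoint_events.csv")

X = df.drop(columns=["label"])
y = df["label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)

# Report precision/recall so false positives caused by benign read-rate
# fluctuations are visible, not just overall accuracy.
print(classification_report(y_test, clf.predict(X_test)))
```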

Eventually, as such machine learning and deep learning models expand, these types of communications will also be monitored.

Scope of Project

This project focuses on complex IoT systems typical of multi-tier architectures within corporations. As part of the research into the analytical properties of IT systems, it concentrates on the characteristics of operations that begin with the collection of data through transactions or data sensing and end with storage in data warehouses, repositories, billing, auditing and other systems of record. Examples include:

  1. Building a simulator application in Cisco Packet Tracer for a mock IoT network.
  2. Creating a Machine Learning anomaly detection model in Azure.
  3. Generating and collecting simulated and actual TCP/IP network traffic data from open data repositories in order to train and score the team's machine learning model.

Other characteristics of the IT systems that will be researched as part of this project include systems that perform the following:

  1. Collect, store, aggregate and transport large data sets
  2. Require application integration, such as web services, remote API calls, etc.
  3. Are beyond a single stack solution.

Next: Business Use Cases and IoT security

Derek Moore, Erica Davis, and Hank Galbraith, authors.

Deep Learning, Oracle Database Performance and the Future of Autonomous Databases

“The goal is to have databases in the Cloud run autonomously. The Cloud should be about scale, elasticity, statelessness, ease of operation and interoperability. Cloud infrastructures are about moving processes into microservices and the agile deployment of business services. Deep Learning has the potential to give databases an innovative and powerful level of autonomy in a multitenant environment, allowing DBAs the freedom to offer expertise in system architecture and design…”

Introduction

This article details initial research performed using deep learning algorithms to detect anomalies in Oracle performance. It does not serve as a “deep” dive into deep learning and machine learning algorithms. There are many good resources available from experts on the subject matter, and I strongly recommend that those interested in learning more about these topics check out the list of references at the end of this article. Mathematical terminology is used throughout this article (it's almost impossible to avoid), but I have kept the descriptions brief; it's best that readers interested in these topics seek out the rich resources available online to get a better breadth of information on individual subjects.

 

In this final article on Oracle performance tuning and machine learning, I will discuss the application of deep learning models to predicting performance and detecting anomalies in Oracle. Deep Learning is a branch of Machine Learning that uses intensive Artificial Intelligence (AI) techniques to learn iteratively from data while deploying optimization and minimization functions. Applications for these techniques include natural language processing, image recognition, self-driving cars, and anomaly and fraud detection. With the number of applications for deep learning models growing substantially in the last few years, it was only a matter of time before it found its way into relational databases. Relational databases have become the workhorses of the IT industry and still generate massive amounts of revenue. Many data-driven applications still use some type of relational database, even with the growth of Hadoop and NoSQL databases. It has been a business goal of Oracle Corporation, one of the largest relational database software companies in the world, to create database services that are easier to manage, secure and operate.

As I mentioned in my previous article, Oracle Enterprise Edition has a workload data repository that it already uses to produce excellent analysis of performance and workload. Microsoft SQL Server also has a warehouse that can store performance data, but I decided to devote my research to Oracle.

For this analysis, the focus was specifically on the Oracle Program Global Area (PGA).

Oracle Program Global Area

 

The Program Global Area (PGA) is a private memory region in the database that contains information for server processes. Each user session gets a private memory region within the PGA. Oracle reads and writes information to the PGA based on requests from server processes. The PGA performance metrics accessed for this article are based on Oracle Automatic Shared Memory Management (ASMM).

As a DBA, when troubleshooting PGA performance, I typically look at the PGA advisor, a series of modules that collect monitoring and performance data from the PGA. It recommends how large the PGA should be in order to fulfill process requests for private memory, based on the Cache Hit Percentage value.
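
As an illustration of how that advisor data can be pulled into Python for analysis, the sketch below queries Oracle's V$PGA_TARGET_ADVICE view with cx_Oracle. The connection credentials and DSN are placeholders.

```python
# Sketch: reading Oracle's PGA advisor data (V$PGA_TARGET_ADVICE) into a
# DataFrame for analysis. Connection user/password/DSN are placeholders.
import cx_Oracle
import pandas as pd

conn = cx_Oracle.connect("perf_user", "password", "dbhost/ORCLPDB1")

query = """
    SELECT pga_target_for_estimate,
           pga_target_factor,
           estd_pga_cache_hit_percentage,
           estd_overalloc_count
      FROM v$pga_target_advice
     ORDER BY pga_target_factor
"""

advice = pd.read_sql(query, conn)
print(advice)   # estimated cache hit percentage vs. candidate PGA sizes
conn.close()
```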

 

Methodology

 

The database was staged in a Microsoft Azure virtual machine processing large scale data from a data generator. Other data was compiled from public portals such as the EIA (U.S. Energy Information Administration) and PJM Interconnection, an eastern regional transmission organization.

Tools used to perform the analysis include SAS Enterprise Miner, Azure Machine Learning Studio, and the scikit-learn and TensorFlow machine learning libraries. I focused my research on a few popular techniques that I continue to study. These include:

  • Recurrent Neural Networks
  • Autoencoders
  • K-Nearest Neighbors
  • Naïve Bayes
  • Principal Component Analysis
  • Decision Trees
  • Support Vector Machines
  • Convolutional Neural Network
  • Random Forest

For this research into databases, I focused primarily on SVM, PCA and CNN. The first step was to look at variable worth (the variables that had the greatest weight in the model) across the data points in each sample.

 

[Figure: variable worth of the PGA performance metrics]

 

 

The analysis examined Oracle performance data on process memory within the Program Global Area (PGA) of the database.

Once the data was collected, cleaned, imputed and partitioned, Azure ML Studio was used to build two types of classifiers for anomaly detection.

 

Support Vector Machine (SVM): Implements a one-class binary classifier where the training data consists of examples of only one class (normal data). The model attempts to separate the collection of training data from the origin with maximum margin.
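
Outside of Azure ML Studio, the same one-class idea can be sketched with scikit-learn's OneClassSVM; the hyperparameters and file names below are illustrative assumptions, not tuned settings.

```python
# Sketch: one-class SVM trained only on "normal" PGA samples, then used to
# flag anomalies in new observations. Hyperparameters are illustrative.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

X_normal = np.loadtxt("pga_normal_samples.csv", delimiter=",")   # training: normal data only
X_new = np.loadtxt("pga_recent_samples.csv", delimiter=",")      # scoring: recent observations

scaler = StandardScaler().fit(X_normal)
ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
ocsvm.fit(scaler.transform(X_normal))

# predict() returns +1 for inliers (normal) and -1 for anomalies
labels = ocsvm.predict(scaler.transform(X_new))
print("anomalies flagged:", int((labels == -1).sum()))
```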

 

Principal Component Analysis (PCA): Creates a subspace spanned by the orthonormal eigenvectors associated with the top eigenvalues of the data covariance matrix, which is then used to approximate the classifier.
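
A comparable PCA sketch scores new observations by their reconstruction error in the subspace of the top components; the number of components and the threshold percentile are assumptions.

```python
# Sketch: PCA-based anomaly scoring. Fit a low-dimensional subspace on normal
# data, then flag points with unusually large reconstruction error.
# n_components and the percentile threshold are illustrative choices.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_normal = np.loadtxt("pga_normal_samples.csv", delimiter=",")
X_new = np.loadtxt("pga_recent_samples.csv", delimiter=",")

scaler = StandardScaler().fit(X_normal)
pca = PCA(n_components=5).fit(scaler.transform(X_normal))

def reconstruction_error(X):
    Z = pca.transform(scaler.transform(X))        # project into the subspace
    X_hat = pca.inverse_transform(Z)              # map back to feature space
    return np.mean((scaler.transform(X) - X_hat) ** 2, axis=1)

threshold = np.percentile(reconstruction_error(X_normal), 99)
anomalies = reconstruction_error(X_new) > threshold
print("anomalies flagged:", int(anomalies.sum()))
```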

 

For prediction, I compared Artificial Neural Networks and Regression models.  For Deep Learning, I researched the use of CNN specifically for anomaly detection.

 

Deep Learning and Oracle Database Performance Tuning

My article Using Machine Learning and Data Science for Performance Tuning in Oracle discusses the use of Oracle's Automatic Workload Repository, a data warehouse which stores snapshots of views for SQL, O/S and system state, active session history, and many other areas of system performance. Standard data science methods require a strong understanding of business processes through qualitative and quantitative methods, cleaning data to find outliers and missing values, and applying data partitioning strategies to get better validation and scoring of models. As a final step, a review of the results is required to test the hypothesis and determine accuracy.

 

Deep Learning has changed these methodologies a bit by applying artificial intelligence to building models. These models learn by training iteratively as data moves from input to output through hidden layers with activation functions. The hidden layers in this article are convolutional and are specific to spatial approximations such as convolution, pooling and fully connected layers (FCL). This has opened many opportunities to automate steps usually performed by hand in data science workflows. Data that would once have required interpretation by a human operator can now be interpreted using deep neural networks at much higher rates than a human operator could possibly achieve.

 

Deep Learning is a subset of Machine Learning that is loosely based on how neurons learn in the brain. Neural networks have been around for decades but have only recently gained popularity in information technology for their ability to identify and classify images. Image data has exploded with the increase in social media platforms, digital images and image data storage. Imaging data, along with text data, has a multitude of applications in the real world, so there is no shortage of work being done in this area. The latest wave of popularity of neural networks can be attributed to AlexNet, a deep neural network that won the ImageNet classification challenge by achieving low error rates on the ImageNet dataset.

 

With anomaly detection, the idea is to train a deep learning model to detect anomalies without overfitting the data. As the model iterates through the layers of a deep neural network, cost functions help determine how closely it is classifying real-world data. The model should have no prior knowledge of the processes and should be trained iteratively on the data, with the cost functions computed from input arrays and the activation functions of previous layers [7].

 

Anomaly detection is the process of detecting outliers in data streams such as financial transactions and network traffic. For the purposes of this article, it can also be applied to deviations in system performance.

 

Predictive Analysis versus Anomaly Detection

Using predictive analytics to model targets through supervised learning techniques is most useful for capacity planning and for aggregated analysis of resource consumption and database performance. For the model, we analyzed regression and neural network models to determine how well each one scored based on inputs from PGA metrics.

Predictive analysis requires cleansing of data, supervised and unsupervised classification, imputation and variable worth selection to create the model. Most applications are scored best with linear or logistic regression. In the analysis of PGA performance, I found that a logistic regression model scored better than an artificial neural network for predictive ability.

 

[Figure: comparison of the logistic regression and neural network model scores]
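
For readers who want to reproduce that kind of comparison outside Azure ML Studio, here is a minimal scikit-learn sketch that scores a logistic regression model against a small neural network; the input file, feature columns and binary target are hypothetical stand-ins for the PGA metrics.

```python
# Sketch: comparing logistic regression against a small neural network on
# PGA-derived features. File and column names are placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

df = pd.read_csv("pga_metrics_labeled.csv")            # hypothetical extract with a binary "target"
X, y = df.drop(columns=["target"]), df["target"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=1)

models = {
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "neural network": make_pipeline(
        StandardScaler(), MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=1)
    ),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```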

 

In my previous article, I mentioned the role that machine learning and data science can play in Oracle performance data.

  1. Capacity Planning and IT Asset Planning.
  2. Performance Management
  3. Business Process Analysis

The fourth application for data science and machine learning in Oracle is anomaly detection, which means applying the kinds of algorithms mostly used in image recognition, language processing and credit fraud detection to performance data. It is possibly a less efficient way of detecting performance problems in Oracle. Chasing accuracy in the algorithm presents a risk in itself, since such models can lead to the overfitting and high dimensionality that you want to avoid in deep neural networks. Accuracy comparable to what a human operator can achieve works better, because you do not want the process to overthink things. The result of an overfitted model is a lot of false positives. You want the most accurate signs of an anomaly, not a model that is oversensitive. Deep Learning techniques also consume intense resources to generate output from a neural network, and most business scale applications require GPUs to build models efficiently.

Convolutional Neural Networks

 

Convolutional Neural Networks (CNN) are designed for high dimensional data such as images and signals. They are used for computer vision as well as network intrusion detection and anomaly detection. Oracle performance data is stored as normal text (ASCII) and contains metrics on many different scales, such as seconds versus bytes of memory. Using a mathematical normalization formula, the text data can be converted into vector arrays that can be mapped, pooled and compressed. Convolutional Neural Networks are good at distinguishing features in an image matrix, and computationally it is efficient to represent images as multi-dimensional arrays.

 

The first step is to normalize the PGA data, which contains multiple scales and features.  Below is a sample of the data.

[Figure: sample of the PGA performance data]

 

Normalizing the data can be done with the following formula [8]:

x' = (x − x_min) / (x_max − x_min)

 

The second step is to convert this data into an image-like format. This requires building a multidimensional array of all the features. The array can be filtered by removing small variances and nonlinear features to generate an overall neutral vector. The goal is to normalize the data and arrange it into a multidimensional array, as sketched below.
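
Here is a rough sketch of those two steps under stated assumptions: per-feature min-max normalization followed by stacking fixed-size windows of samples into arrays a CNN can consume. The file name, window size and feature count are illustrative.

```python
# Sketch: min-max normalize PGA metrics, then stack fixed-size windows of
# samples into 2-D arrays ("images") for a CNN. Sizes are illustrative.
import numpy as np

X = np.loadtxt("pga_metrics.csv", delimiter=",")      # shape: (n_samples, n_features)

# Min-max normalization per feature: x' = (x - min) / (max - min)
X_min, X_max = X.min(axis=0), X.max(axis=0)
X_norm = (X - X_min) / (X_max - X_min + 1e-9)         # small epsilon avoids divide-by-zero

window = 32                                           # samples per "image"
n_windows = X_norm.shape[0] // window
images = X_norm[: n_windows * window].reshape(n_windows, window, X_norm.shape[1], 1)
print(images.shape)                                   # (n_windows, 32, n_features, 1) for CNN input
```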

 

CNN is often used to classify the MNIST data, a set of handwritten digits containing 60,000 training images and 10,000 testing images. Researchers have used CNN to achieve an error rate on the MNIST data of less than 1%.

 

Convolutional Neural Networks have five basic components: an input layer, a convolution layer, a pooling layer, a fully connected layer and an output layer. Below is a visual of how a CNN works to recognize an image of a bird versus an image of a cat.

 

[Figure: CNN layers classifying an image of a bird versus an image of a cat]

The activation function used is the popular rectified linear unit (ReLU), which is typical for CNNs. Other popular activation functions include the logistic sigmoid and the hyperbolic tangent. ReLU is defined as y = x for positive values and y = 0 for negative values. It works well as an activation function for CNNs due to its simplicity and because it reduces the time it takes to iterate through the neural network. A minimal sketch of this architecture follows.
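
Below is a minimal Keras sketch of that five-component layout with ReLU activations. The layer sizes and the assumed input shape (the windowed arrays built earlier) are illustrative choices, not the exact network used in this research.

```python
# Sketch: a small CNN with the five components described above. Layer sizes
# are illustrative and assume inputs shaped like (32, n_features, 1).
import tensorflow as tf
from tensorflow.keras import layers, models

n_features = 20   # placeholder for the number of PGA metrics

model = models.Sequential([
    layers.Input(shape=(32, n_features, 1)),                        # input layer
    layers.Conv2D(16, (3, 3), activation="relu", padding="same"),   # convolution + ReLU
    layers.MaxPooling2D((2, 2)),                                     # pooling
    layers.Flatten(),
    layers.Dense(64, activation="relu"),                             # fully connected
    layers.Dense(1, activation="sigmoid"),                           # output: anomaly probability
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```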

 

 

 

Comparing Support Vector Machines (SVM) and Principal Component Analysis (PCA)

Support Vector Machines (SVM) are good at finding large margin classifications and identifying vectors of data that are related, and they have features for dealing with outliers built in. SVM is a feature-rich supervised machine learning technique used to classify observations by their coordinates. I compared SVM with principal component analysis (PCA), which approximates the data by creating subspaces spanned by the orthonormal eigenvectors associated with the top eigenvalues of the data covariance matrix. PCA based methods help remove the redundancy and reduce the dimensionality that is persistent in performance data. Once the data was split into training and testing sets, we used SVM and PCA to optimize across multiple dimensions in the data.

 

 

Evaluation of Machine Learning Models for Oracle Workloads

For this test, we compared regression models and artificial neural networks (ANN). Deep learning of patterns associated with anomalies within a database requires AI style learning techniques. Candidate classifiers for performance metrics that could improve the accuracy of an Oracle anomaly detection system include ANN, naive Bayes, k-nearest neighbors and other general algorithms.

 

There are several evaluation measures that can be used when assessing anomaly detection models (a short scoring sketch follows the list):

 

  • ROC curve
  • Area under the ROC curve (AUC)
  • Precision-recall curve
  • Mean average precision (mAP)
  • Classification accuracy
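
As a short illustration, the sketch below computes the ROC curve, its area, and mean average precision with scikit-learn; the label and score arrays are placeholders for a held-out test set and a model's anomaly scores.

```python
# Sketch: ROC curve, AUC and mean average precision for an anomaly detector.
# y_true and scores are placeholders for held-out labels and model scores.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score, average_precision_score

y_true = np.load("holdout_labels.npy")      # 1 = anomaly, 0 = normal
scores = np.load("model_scores.npy")        # higher = more anomalous

fpr, tpr, _ = roc_curve(y_true, scores)     # points that can be plotted as an ROC chart
print("AUC:", roc_auc_score(y_true, scores))
print("mean average precision:", average_precision_score(y_true, scores))
```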

 

Below is an ROC chart used to score the PCA and SVM models. ROC charts plot the false positive rate against the true positive rate. When comparing the PCA and SVM models, PCA had a higher true positive rate.

[Figure: ROC chart comparing the PCA and SVM models]

Summary:  The Future of Autonomous Databases

Oracle has released its first deep learning database, marketed as “the world's first self-driving database”. Oracle has announced 18c as a new autonomous database that requires no human labor for daily operational tasks, provides more security, and automates most database processes. The database will self-tune, self-upgrade and self-patch, all while maintaining 99.995% availability with machine learning. For many companies, especially those working on cloud and PaaS infrastructures, this will mean lower costs. With Exadata, this would include compression techniques that add further benefits for very large, enterprise level workloads.

 

Will there be more databases that are completely run by Artificial Intelligence and Deep Learning algorithms? As a DBA, I know that maintaining a database can be arduous, but many of my DBA colleagues enjoy the respect and prestige of database management and tuning. With the role of the DBA evolving rapidly, autonomous databases may give DBAs the freedom to provide database design and development to corporate teams.

 

It remains to be seen whether databases as a service (DBaaS) will reach full autonomy, though it is bound to happen before automobiles become level 5 autonomous. Selecting a service on such a platform could require only minimal configuration, and then you're done; everything else is taken care of. There would be no operator, either in the hosted environment or on premise, and no one would ever touch the database for any reason except application and software development.

 

In summary, this is a very high-level article on techniques for using deep learning and machine learning on Oracle performance data. I hope that this cursory introduction will inspire DBAs and operators to do their own research and add these techniques to their toolbox.

 

References

1. http://deeplearning.net/reading-list/
2. https://www.analyticsvidhya.com/
3. http://www.kdnuggets.com/
4. http://www.ieee.org/
5. https://www.computer.org/
6. https://www.udacity.com/course/deep-learning-nanodegree--nd101
7. https://www.fast.ai
8. Kehe Wu, Zuge Chen, and Wei Li. “A Novel Intrusion Detection Model for Massive Network Using Convolutional Neural Networks.” IEEE Access. Received July 29, 2018.
9. Sheraz Naseer, Yasir Saleem, Shezad Khalid, Muhammad Khawar Bashir, Jihun Han, Muhammad Munwar Iqbal, and Kijun Han. “Enhanced Network Anomaly Detection Based on Deep Neural Networks.” IEEE Access. Received June 3, 2018; accepted July 16, 2018.
10. Dr. Adrian Rosebrock. https://www.pyimagesearch.com
11. U.S. Energy Information Administration. https://www.eia.gov/
12. PJM Interconnection. https://www.pjm.com/markets-and-operations.aspx
13. Oracle Corporation. https://www.oracle.com/index.html

Big Data as the Next Major Utility: Musings on the Future of Autonomous Vehicles and CASE.

“Big Data” is everywhere. It powers business solutions and drives economic opportunity. Is it possible that “Big Data” will become the next major utility? By utility, I don't mean its usefulness to businesses; can data be a utility like electricity, gas or water, distributed reliably through major cities to meet customer demand? With Smart City initiatives, that certainly appears to be becoming more of a reality, but smart city programs do not necessarily build the B2C model that major utilities do. Autonomous vehicles (AV) and Machine Learning (ML) may fill the gap that makes “Big Data” a utility. One possible business model includes customers who pay for how much data they use and when they use it. Since AV technology will rely on data from internal and external sensors to evaluate road conditions and anomalies, the utility business model may come into play as a way to pay for such computation and classification. Machine learning algorithms will help create reinforcement of anomaly and object detection scenarios for AVs.

Currently, cars on the market include Advanced Driver Assistance Systems (ADAS), driver assist technology such as accident avoidance sensors, drowsiness warnings, pedestrian detection, and lane departure warnings. Today's driver-less cars are actually vehicles retrofitted with components that allow drivers to remove their hands from the steering wheel. To have fully autonomous vehicles, there must be a supply of historical and near real-time data to train the ML models that will guide future AVs. Like the generation of electrical power from a turbine, there has to be a supply and distribution approach to ML systems that continuously provides reinforcement learning to AVs. The generation of AV data must be ongoing every hour of the day, for years, in order to continuously train the ML models and build reliability into future AV algorithms and models.

The future of Autonomous Vehicles

CASE stands for Connected, Autonomous, Shared, Electrification (Vehicles). In many regards, it's the evolution of modern transportation: a vehicle that doesn't need a human operator but transports people or goods to different destinations effectively, safely and efficiently with little or no impact on the environment. Not only will this vehicle transport, it will also serve as a data collector and generator that could be used to determine road conditions, connect with businesses and establish business to customer or customer to business relationships.

The development of AVs must be based on electrification (electric vehicles). Direct digital control and feedback systems for electrical consumption are ideal for clean and efficient generation of power. The autonomous capabilities of vehicles would control not only direction and speed but also the granularity of electrical consumption needed by the AV, at a level that would be imperceptible to a live human operator. Metrics could then be displayed to the passenger, owner or manufacturer of the AV as feedback on its efficiency.

The main focus of the future generation of fully autonomous vehicles will be the ability to keep a driver safe and successfully navigate any condition or obstacle as the AV transports its passengers to their destination, from leaving their home, to getting into the vehicle, to walking into the destination. Services will be available that allow AVs to follow exact directions to a business and navigate to approved parking spaces. Most interfacing will be conducted through the passengers' smart phones.

Here is an example. David picks up his smart phone and clicks on an app to request a reservation at a restaurant for his wedding anniversary. The service request is paired with an AV smart phone application that also sends the request to the cloud and the restaurant reservation API. The ML system in the cloud then directs the AV to navigate to the restaurant and park in a designated parking space (no valet needed). When dinner is complete, David clicks on the app to have the AV pick him and his wife up and return home.

Future autonomous vehicles will not have manual overrides or speed up to make it to that movie on time.

In order for autonomous vehicles to build trust within the driving community, they must maintain consistent patterns and make decisions that ensure the safety and comfort of all passengers. What you don't want is for the AV to speed up suddenly to make a light, or to make sharp, quick turns to avoid oncoming traffic. This means the automobile needs AI and machine learning capabilities that obey all traffic laws and make correct predictions about any anomaly or object. Future AVs and CASE vehicles will not have steering wheels or brake pedals, because those represent a manual override, which in turn erodes trust with the occupants.

Future generations of AVs should not have steering wheels. Most modern cars rely on a steering system that includes a “rack and pinion” assembly by which a live operator (driver) can turn the car right or left when needed. Removing the steering mechanism will allow for passenger only occupancy and create a system that is principally controlled by computerized systems instead of mechanisms that require human intervention. In the event that the vehicle requires override control by an operator, that operator will be in a vehicle control and command operations center (VOC) manned by trained commercial drivers. Such command operations centers could be third-party, provided by the manufacturer of the vehicle, or run by a municipality.

Future autonomous vehicles will be fully connected mobile platforms.

Think of a smart phone and everything that it does. Now, imagine an autonomous vehicle as essentially a large smart phone that can transport passengers who are connected to what's happening outside the car. These riders will expect to map the course to their destination through connected devices, data, cloud computing and sensors, information that can then be shared with businesses and users before, during and after they reach their destination. The applications for such connectivity are tremendous.

The impact of Big Data on autonomous vehicles.

As 5G wireless networks come online, smart cities and autonomous vehicles will make full use of data flowing to the cloud and back. 5G will facilitate unprecedented communication speed from the vehicle to the outside world, allowing the sensing and tracking of nearly 5,000 GB of data per vehicle per day and making vehicles more efficient and safe. New computer processor architectures will test, train and build Machine Learning and Deep Learning models faster than in the past and help train AVs to become better equipped for conditions in cities and on highways.

Maintaining a competitive advantage has become an important business strategy.

One of the things I love about data science and data analytics is that most of the innovation in this area has been shared in open data and open source communities. Internet sites like Kaggle, Amazon and Google have offered public data to anyone wanting to perform Machine Learning, Predictive Analysis and Deep Learning (see my review of DataSciCon.Tech). Open source software and platforms have grown quickly as well.

This is not the case for vendors invested in the future of AVs. The data collected from sensors and IoT devices in the vehicle, as well as in big data cloud systems, is a well guarded secret. Development SDKs for AV technology are accessible only to clients of the AV manufacturers and their partners. What this will mean for the future of AV innovation is still up for debate; however, companies certainly have the right to safeguard their proprietary research in this area. It's not completely known what impact this strategy will have on the long-term adoption of AVs.