Anomaly and Intrusion Detection in IoT Networks with Enterprise Scale Endpoint Communication

This is part one of a series of articles to be published on LinkedIn based on a classroom project for ISM 647: Cognitive Computing and Artificial Intelligence Applications taught by Dr. Hamid R. Nemati at the University of North Carolina at Greensboro Bryan School of Business and Economics.

The Internet of Things (IoT) continues to be one of the most innovative and exciting areas of technology of the last decade. IoT devices are a collection of endpoints that reside in the physical world and collect data from the environment around them or from mechanical, electrical, thermodynamic or hydrological processes. These environments can be the human body, geological areas, the atmosphere, and so on. Networked IoT devices have been prevalent in many industries for years, including the gas, oil and utilities industries. As companies demand higher sample read rates from sensors, meters and other IoT devices, and as bad actors from foreign and domestic sources become more prevalent and brazen, these networks have become vulnerable to security threats due to their increasing ubiquity and evolving role in industry. In addition, these networks are prone to read rate fluctuations that can produce false positives in anomaly and intrusion detection systems when devices are deployed at enterprise scale and send TCP/IP transmissions of data upstream to central office locations. This paper focuses on developing an application for anomaly detection that uses cognitive computing and artificial intelligence to achieve better anomaly and intrusion detection in enterprise scale IoT applications.

This project uses automated machine learning to develop a cognitive application that addresses possible security threats in high volume IoT networks such as utility, smart city and manufacturing networks. These networks have high communication read success rates across hundreds of thousands to millions of IoT sensors; however, they still experience issues such as:

  1. Noncommunication or missing/gap communication.
  2. Maintenance Work Orders
  3. Alarm Events (Tamper/Power outages)

In large scale IoT networks, such interruptions are a normal part of business operations. Noncommunication typically occurs because devices fail or get swapped out under a legitimate work order. Weather and people can also cause issues at the endpoint device itself: power outages can cause connected routers to fail, and devices can be tampered with, for example when someone attempts a hardwire bypass or removes a meter.

The scope of this project is to build machine learning models that address IP specific attacks on the IoT network, such as DDoS attacks originating inside or outside the networking infrastructure. These models should be intelligent enough to distinguish network attacks (true positives) from communication issues (true negatives). Network communication typical for such an IoT network includes:

  1. Short range: Wi-Fi, Zigbee, Bluetooth, Z-Wave, NFC.
  2. Long range: 2G, 3G, 4G, LTE, 5G.
  3. Protocols: IPv4/IPv6, SLIP, uIP, RLP, TCP/UDP.

Eventually, as such machine learning and deep learning models expand, these types of communications will also be monitored.

Scope of Project

This project will focus on complex IoT systems typical in multi-tier architectures within corporations. As part of the research into the analytical properties of IT systems, this project will focus primarily on the characteristics of operations that begin with the collection of data through transactions or data sensing, and end with storage in data warehouses, repositories, billing, auditing and other systems of record. Examples include:

  1. Building a simulator application in Cisco Packet Tracer for a mock IoT network.
  2. Creating a Machine Learning anomaly detection model in Azure.
  3. Generating and collecting simulated and actual TCP/IP network traffic data from open data repositories in order to train and score the team's machine learning model (a minimal sketch of this step follows the list).
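Purely as an illustrative sketch (the project itself targets an Azure Machine Learning model, and the feature set below is hypothetical), the same train-and-score idea can be prototyped in Python with scikit-learn on simulated traffic features:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-endpoint features derived from simulated TCP/IP traffic:
# packets per minute, mean packet size (bytes), and distinct destination IPs.
rng = np.random.default_rng(42)
normal_traffic = np.column_stack([
    rng.normal(60, 5, 1000),      # steady read-rate traffic
    rng.normal(512, 30, 1000),    # typical packet size
    rng.normal(3, 1, 1000),       # small number of upstream destinations
])

model = IsolationForest(contamination=0.01, random_state=42).fit(normal_traffic)

# A DDoS-like burst: very high packet rate spread across many destinations.
suspect = np.array([[5000, 200, 250]])
print(model.predict(suspect))  # -1 flags a possible attack, +1 looks like normal traffic
```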

Other characteristics of the IT systems that will be researched as part of this project include systems that perform the following:

  1. Collect, store, aggregate and transport large data sets
  2. Require application integration, such as web services, remote API calls, etc.
  3. Are beyond a single stack solution.

Next: Business Use Cases and IoT security

Derek Moore, Erica Davis, and Hank Galbraith, authors.

Deep Learning, Oracle Database Performance and the Future of Autonomous Databases

“The goal is to have databases in the Cloud run autonomously. The Cloud should be about scale, elasticity, statelessness, ease of operation and interoperability. Cloud infrastructures are about moving processes into microservices and the agile deployment of business services. Deep Learning has the potential to give databases an innovative and powerful level of autonomy in a multitenant environment, allowing DBAs the freedom to offer expertise in system architecture and design…”.

Introduction

This article details initial research performed using deep learning algorithms to detect anomalies in Oracle performance. It does not serve as a “deep” dive into deep learning and machine learning algorithms. There are many good resources available from experts on the subject matter, and I strongly recommend that those interested in learning more check out the list of references at the end of this article. Mathematical terminology is used throughout this article (it's almost impossible to avoid), but I have kept the descriptions brief; it's best that readers interested in these topics seek out the rich resources available online to get a better breadth of information on individual subjects.

 

In this final article on Oracle performance tuning and machine learning, I will discuss the application of deep learning models to predicting performance and detecting anomalies in Oracle. Deep Learning is a branch of Machine Learning that uses intensive Artificial Intelligence (AI) techniques to learn from data iteratively, while deploying optimization and minimization functions. Applications for these techniques include natural language processing, image recognition, self-driving cars, and anomaly and fraud detection. With the number of applications for deep learning models growing substantially in the last few years, it was only a matter of time before they found their way into relational databases. Relational databases have become the workhorses of the IT industry and still generate massive amounts of revenue. Many data-driven applications still use some type of relational database, even with the growth of Hadoop and NoSQL databases. It has been a business goal of Oracle Corporation, one of the largest relational database software companies in the world, to create database services that are easier to manage, secure and operate.

As I mentioned in my previous article, Oracle Enterprise Edition has a workload data repository that it already uses to produce useful performance and workload analysis. Microsoft SQL Server also has a warehouse that can store performance data, but I decided to devote my research to Oracle.

For this analysis, the focus was specifically on the Oracle Program Global Area (PGA).

Oracle Program Global Area

 

The Program Global Area (PGA) is a private memory region in the database that contains information for server processes. Each user session gets a private memory region within the PGA. Oracle reads and writes information to the PGA based on requests from server processes. The PGA performance metrics accessed for this article are based on Oracle Automatic Shared Memory Management (ASMM).

As a DBA, when troubleshooting PGA performance, I typically look at the PGA advisor, a series of modules that collect monitoring and performance data from the PGA. It recommends how large the PGA should be in order to fulfill process requests for private memory, based on the Cache Hit Percentage value.

 

Methodology

 

The database was staged on a Microsoft Azure virtual machine processing large scale data from a data generator. Other data was compiled from public portals such as the EIA (U.S. Energy Information Administration) and PJM Interconnection, an eastern regional transmission organization.

Tools used to perform the analysis include SAS Enterprise Miner, Azure Machine Learning Studio, and the SciKit-Learn and TensorFlow machine learning libraries. I've focused my research on a few popular techniques that I continue to study. These include:

  • Recurrent Neural Networks
  • Autoencoders
  • K-Nearest Neighbors
  • Naïve Bayes
  • Principal Component Analysis
  • Decision Trees
  • Support Vector Machines
  • Convolutional Neural Network
  • Random Forest

For this research into databases, I focused primarily on SVM, PCA and CNN. The first step was to look at variable worth (the variables that had the greatest weight on the model) for the data points in each sample.

 

[Figure: analysis of Oracle performance data on process memory within the Program Global Area (PGA) of the database.]

Once the data was collected, cleaned, imputed and partitioned, Azure ML studio was used to build two types of classifiers for anomaly detection.

 

Support Vector Machine (SVM): implements a one-class classifier where the training data consists of examples of only one class (normal data). The model attempts to separate the collection of training data from the origin with maximum margin.
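As an illustration only (not the exact Azure ML One-Class SVM configuration used in the project), the same approach can be sketched with scikit-learn; the PGA feature names and values below are hypothetical:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

# Hypothetical PGA metrics: each row is one snapshot
# (PGA allocated MB, cache hit %, active sessions).
normal_snapshots = np.array([
    [512.0, 98.5, 40],
    [530.0, 97.9, 44],
    [505.0, 98.8, 38],
    [520.0, 98.2, 41],
])

scaler = StandardScaler().fit(normal_snapshots)
model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
model.fit(scaler.transform(normal_snapshots))

# Score a new snapshot: +1 means "looks normal", -1 means "possible anomaly".
new_snapshot = np.array([[900.0, 72.0, 120]])
print(model.predict(scaler.transform(new_snapshot)))
```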

 

Principal Component Analysis (PCA): creates a subspace spanned by the orthonormal eigenvectors associated with the top eigenvalues of the data covariance matrix, which approximates the normal data for classification.
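A common way to turn PCA into an anomaly detector is to flag points with a large reconstruction error after projecting onto the top components. This is a minimal sketch of that idea on hypothetical, already normalized PGA metrics, not the Azure ML module used in the project:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical matrix of normalized PGA metrics (rows = snapshots, columns = features).
X_train = np.random.rand(200, 6)

# Keep the top components that explain most of the variance.
pca = PCA(n_components=3).fit(X_train)

def reconstruction_error(X):
    """Distance between each point and its projection onto the PCA subspace."""
    X_proj = pca.inverse_transform(pca.transform(X))
    return np.linalg.norm(X - X_proj, axis=1)

# Snapshots whose error greatly exceeds the training distribution are flagged.
threshold = np.percentile(reconstruction_error(X_train), 99)
X_new = np.random.rand(5, 6)
print(reconstruction_error(X_new) > threshold)
```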

 

For prediction, I compared Artificial Neural Networks and Regression models.  For Deep Learning, I researched the use of CNN specifically for anomaly detection.

 

Deep Learning and Oracle Database Performance Tuning

My article Using Machine Learning and Data Science for Performance Tuning in Oracle discusses the use of Oracle's Automatic Workload Repository (AWR), a data warehouse that stores snapshots of views for SQL, O/S and system state, and active session history, among many other areas of system performance. Standard data science methods require a strong understanding of business processes through qualitative and quantitative methods, cleaning data to find outliers and missing values, and applying data partitioning strategies to get better validation and scoring of models. As a final step, a review of the results is required to determine the model's testing accuracy against the hypothesis.

 

Deep Learning has changed these methodologies a bit by applying artificial intelligence to model building. These models learn by training iteratively as data moves through hidden layers with activation functions from input to output. The hidden layers discussed in this article are convolutional and are specific to spatial approximations, namely convolution, pooling and fully connected layers (FCL). This has opened opportunities to automate many of the steps typically used in data science models. Data that would otherwise require interpretation by a human operator can now be interpreted by deep neural networks at much higher rates than a human operator could possibly achieve.

 

Deep Learning is a subset of Machine Learning that is loosely based on how neurons learn in the brain. Neural networks have been around for decades but have only recently gained popularity in information technology for their ability to identify and classify images. Image data has exploded with the increase in social media platforms, digital images and image data storage. Imaging data, along with text data, has a multitude of applications in the real world, so there is no shortage of work being done in this area. The latest popularity of neural networks can be attributed to AlexNet, a deep neural network that won the ImageNet classification challenge by achieving low error rates on the ImageNet dataset.

 

With anomaly detection, the idea is to train a deep learning model to detect anomalies without overfitting the data. As the model iterates through the layers of a deep neural network, cost functions help determine how closely it is classifying real-world data. The model should have no prior knowledge of the processes and should be trained iteratively on the data, with cost functions computed from the input arrays and the activation functions of previous layers [7].

 

Anomaly detection is the process of detecting outliers in data streams such as financial transactions and network traffic. For the purposes of this article, it is applied to deviations in system performance.

 

Predictive Analysis versus Anomaly Detection

Using predictive analytics to model targets through supervised learning techniques is most useful in planning for capacity and performing aggregated analysis of resource consumption and database performance.  For the model, we analyzed regression and neural network models to determine how well each one scored based on inputs from PGA metrics.

Predictive analysis requires cleansing of data, supervised and unsupervised classification, imputation and variable worth selection to create a model. Most applications can be scored well with linear or logistic regression. In the analysis of PGA performance, I found that a logistic regression model scored better than an artificial neural network for predictive ability.
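A rough sketch of that comparison, using scikit-learn with synthetic data in place of the actual PGA metrics and target (which are not reproduced here):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical data: PGA metrics as features, a binary "performance problem" flag as the target.
rng = np.random.default_rng(0)
X = rng.random((500, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.1, 500) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
ann = MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0).fit(X_train, y_train)

print("logistic regression AUC:", roc_auc_score(y_test, logit.predict_proba(X_test)[:, 1]))
print("neural network AUC:     ", roc_auc_score(y_test, ann.predict_proba(X_test)[:, 1]))
```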

 


 

In my previous article, I mentioned the role that machine learning and data science can play in Oracle performance data.

  1. Capacity Planning and IT Asset Planning.
  2. Performance Management
  3. Business Process Analysis

The fourth application for data science and machine learning in Oracle is anomaly detection. This means applying artificial intelligence techniques, mostly used in image recognition, language processing and credit fraud detection, to the training of algorithms on performance data. It is also a potentially less efficient way of detecting performance problems in Oracle. Chasing accuracy in the algorithm presents a risk in itself, since such models can produce the overfitting and high dimensionality you want to avoid in deep neural networks. Accuracy comparable to what a human operator can achieve works better, because you don't want the process to overthink things. The result of an overfitted model is a lot of false positives; you want the most accurate signs of an anomaly, not a model that is oversensitive. Deep Learning techniques also consume intense resources to generate output in a neural network, and most business scale applications require GPUs to build them efficiently.

Convolutional Neural Networks

 

Convolutional Neural Networks (CNN) are designed for high dimensional data such as images and signals. They are used for computer vision as well as network intrusion detection and anomaly detection. Oracle performance data is plain text (ASCII) and contains metrics with very different ranges, such as seconds versus bytes of memory. Using a mathematical normalization formula, the text data can be converted into vector arrays that can be mapped, pooled and compressed. Convolutional Neural Networks are good at distinguishing features in an image matrix, and computationally it is efficient to represent images as multi-dimensional arrays.

 

The first step is to normalize the PGA data, which contains multiple scales and features.  Below is a sample of the data.

[Figure: sample of the raw PGA performance data.]

 

Normalizing the data can be done with the following formula[8]:

[Image: normalization formula from reference [8].]
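The image of the formula is not reproduced here; a common choice, and the one assumed in this sketch, is min–max scaling of each feature to the range [0, 1]:

```python
import numpy as np

def min_max_normalize(X):
    """Scale each column (feature) of X to the range [0, 1]."""
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    # Avoid division by zero for constant columns.
    span = np.where(col_max > col_min, col_max - col_min, 1.0)
    return (X - col_min) / span

# Hypothetical PGA sample: seconds, megabytes and counts on very different scales.
sample = np.array([
    [0.5, 512.0, 40.0],
    [1.2, 900.0, 120.0],
    [0.7, 620.0, 55.0],
])
print(min_max_normalize(sample))
```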

 

The second step is to convert this data into an image-like format. This requires building a multidimensional array of all the features. The array can be filtered by removing small variances and nonlinear features to generate an overall neutral vector. The goal is to normalize the data and arrange it as a multidimensional array.
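A minimal sketch of that arrangement, assuming a hypothetical count of normalized metrics and an arbitrary grid size:

```python
import numpy as np

def to_image(feature_vector, height, width):
    """Arrange a normalized feature vector into a 2-D array (a single-channel 'image').

    Pads with zeros when the number of features does not fill the grid exactly.
    """
    padded = np.zeros(height * width)
    padded[: len(feature_vector)] = feature_vector
    return padded.reshape(height, width)

# Hypothetical: 60 normalized PGA metrics arranged as an 8x8 grid for a CNN.
vector = np.random.rand(60)
image = to_image(vector, 8, 8)
print(image.shape)  # (8, 8)
```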

 

CNNs are often used to classify the MNIST dataset, a set of handwritten digits. It contains 60,000 training images and 10,000 testing images. Researchers have used CNNs to achieve an error rate of less than 1% on MNIST.

 

Convolutional Neural Networks have five basic components: an input layer, convolution layers, pooling layers, fully connected layers and an output layer. Below is a visual of how a CNN works to recognize an image of a bird versus an image of a cat.

 

[Figure: CNN layers classifying an image of a bird versus an image of a cat.]

The activation function used is the popular rectified linear unit (ReLU), which is typical for CNNs. Other popular activation functions include the logistic sigmoid and the hyperbolic tangent. ReLU is defined as y = x for positive values and y = 0 for negative values. It works well as an activation function for CNNs due to its simplicity and because it reduces the time it takes to iterate through the neural network.
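To make the five components concrete, here is a minimal sketch using TensorFlow's Keras API on the hypothetical 8x8 metric "images" from the previous step; the layer sizes are illustrative and not the configuration used in this research:

```python
import tensorflow as tf

# Input layer -> convolution -> pooling -> fully connected -> output,
# matching the five basic CNN components described above.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8, 8, 1)),           # 8x8 single-channel "metric image"
    tf.keras.layers.Conv2D(16, (3, 3), activation="relu", padding="same"),
    tf.keras.layers.MaxPooling2D((2, 2)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(32, activation="relu"),      # fully connected layer
    tf.keras.layers.Dense(1, activation="sigmoid"),    # output: anomaly probability
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```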

 

 

 

Comparing Support Vector Machines (SVM) and Principal Component Analysis (PCA)

Support Vector Machines (SVM) are good at finding large margin classifications and identifying vectors of data that are related. A nice property of SVM is that handling of outliers is built in. SVM is a feature-rich supervised machine learning technique used to classify observations by their coordinates. I compared SVM with principal component analysis (PCA). PCA creates subspaces spanned by the orthonormal eigenvectors associated with the top eigenvalues of the data covariance matrix. PCA based methods help remove the redundancy and reduce the dimensionality that is persistent in performance data. Once the data was split into training and testing sets, we used SVM and PCA to optimize across multiple dimensions in the data.

 

 

Evaluation of Machine Learning Models for Oracle Workloads

For this test, we compared neural network regression models and ANNs. Deep learning of patterns concerning anomalies within a database requires AI style learning techniques. Finding the correct classifier for performance metrics to improve the accuracy of an Oracle anomaly detection system can involve ANNs, naive Bayes, k-nearest neighbors and genetic algorithms.

 

There are several classification metrics that can be used when evaluating anomaly detection models:

 

  • ROC curve
  • Area under the ROC curve
  • Precision-recall curve
  • Mean average precision (mAP)
  • Classification accuracy

 

Below is an ROC chart used to score the PCA and SVM models. ROC charts plot false positive rates against true positive rates. When comparing the PCA and SVM models, PCA had a higher true positive rate.

[Figure: ROC chart comparing the PCA and SVM models.]
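For readers who want to reproduce this kind of scoring, ROC curves and the area under them can be computed with scikit-learn; the labels and model scores below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

# Hypothetical anomaly scores from two models on the same labeled test set
# (1 = anomaly, 0 = normal).
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
pca_scores = np.array([0.1, 0.3, 0.8, 0.2, 0.7, 0.9, 0.4, 0.6])
svm_scores = np.array([0.2, 0.4, 0.6, 0.3, 0.5, 0.8, 0.5, 0.4])

for name, scores in [("PCA", pca_scores), ("SVM", svm_scores)]:
    # fpr and tpr are the coordinates an ROC chart plots.
    fpr, tpr, _ = roc_curve(y_true, scores)
    print(name, "AUC =", round(roc_auc_score(y_true, scores), 3))
```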

Summary:  The Future of Autonomous Databases

Oracle has released its first deep learning database, marketed as “the world's first self-driving database”. Oracle has announced 18c as a new autonomous database that requires no human labor for daily operational tasks, provides more security, and automates most database processes. The database will self-tune, self-upgrade and self-patch, all while maintaining 99.995% availability, using machine learning. For many companies, especially those working on cloud and PaaS infrastructures, this will mean lower costs. With Exadata, this would include compression techniques that add further benefits for very large, enterprise level workloads.

 

Will there be more databases run completely by Artificial Intelligence and Deep Learning algorithms? As a DBA, maintaining a database can be arduous, but many of my DBA colleagues enjoy the respect and prestige of database management and database tuning. With the role of the DBA evolving rapidly, autonomous databases may give DBAs the freedom to focus on database design and development for corporate teams.

 

It remains to be seen whether database as a service (DBaaS) will reach full autonomy, though it is bound to happen before automobiles become level 5 autonomous. Selecting a service on such a platform could require only minimal configuration, and you're done; everything else is taken care of. There would be no operator, either in the hosted environment or on premise, nor would anyone ever touch the database for any reason except application and software development.

 

In summary, this is a very high-level article on techniques for using deep learning and machine learning on Oracle performance data. I hope this cursory introduction will inspire DBAs and operators to do their own research and add these techniques to their toolbox.

 

References

 

1. http://deeplearning.net/reading-list/
2. https://www.analyticsvidhya.com/
3. http://www.kdnuggets.com/
4. http://www.ieee.org/
5. https://www.computer.org/
6. https://www.udacity.com/course/deep-learning-nanodegree-nd101se/deep-learning-nanodegree–nd101
7. https://www.fast.ai
8. Wu, Kehe; Chen, Zuge; Li, Wei. “A Novel Intrusion Detection Model for Massive Network Using Convolutional Neural Networks.” IEEE Access, received July 29, 2018.
9. Naseer, Sheraz; Saleem, Yasir; Khalid, Shehzad; Bashir, Muhammad Khawar; Han, Jihun; Iqbal, Muhammad Munwar; Han, Kijun. “Enhanced Network Anomaly Detection Based on Deep Neural Networks.” IEEE Access, received June 3, 2018; accepted July 16, 2018.
10. https://www.pyimagesearch.com, Dr. Adrian Rosebrock.
11. U.S. Energy Information Administration. https://www.eia.gov/
12. PJM Interconnection. https://www.pjm.com/markets-and-operations.aspx
13. Oracle Corporation. https://www.oracle.com/index.html

DataSciCon.Tech 2017 Review

Saturday, December 2nd, 2017

DataSciCon.Tech is a data science conference held in Atlanta, Georgia from Wednesday, November 29th to Friday, December 1st, including both workshops and conference lectures. It took place at the Global Learning Center on the campus of Georgia Tech. This was the first year of the conference, and I attended to get a sense of the data science scene in Atlanta. Overall, the experience was very enlightening and introduced me to the dynamic and intensive work being conducted in the area of data science.


Keynote speaker Rob High, CTO of IBM Watson, discussing IBM Watson and Artificial Intelligence (DataSciCon.Tech 2017).

DataSciCon.Tech Workshops

Four workshop tracks were held Wednesday: Introduction to Machine Learning with Python and TensorFlow, a Tableau hands-on workshop, Data Science for Discovery, Innovation and Value Creation, and a Data Science with R workshop. I elected to attend the Machine Learning with Python and TensorFlow track. TensorFlow is an open source software library for numerical computation using data flow graphs, used for machine learning.

To prepare for the conference, I installed the TensorFlow module downloaded from https://www.tensorflow.org/install. In addition to TensorFlow, I downloaded Anaconda (https://www.anaconda.com/), a great Python development environment for those practicing data science programming; it includes many of the Python data science packages such as NumPy and SciKit-Learn.

Among the predictive and classification modeling techniques discussed in the workshop:

  • Neural Networks
  • Naive Bayes
  • Linear Regression
  • k-nearest neighbor (kNN) analysis

These modeling techniques are popular for classifying data and for predictive analysis. Few training sessions on Python, SciKit-Learn or NumPy go into these algorithms in detail, due to the varying math backgrounds of audience members. For the course, we used Jupyter Notebook, a web-based Python development environment that allows you to share and present your code and results using web services. Jupyter Notebook can also be hosted in Microsoft Azure, as well as other cloud platforms such as Anaconda Cloud and AWS. To host a Python Jupyter Notebook in Azure, sign in at https://notebooks.azure.com.

TensorFlow

TensorFlow provides a series of functions that use neural networks and machine learning to test, train and score models. The advantage of TensorFlow is its ability to train models faster than other modules, which is a big advantage since training models is a process intensive operation. It is particularly powerful on the Graphics Processing Unit (GPU) architecture popular for Machine Learning and Deep Learning.
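As a trivial sketch of that train/score/test loop (not the workshop's exact exercise), a single dense layer can learn a simple linear relationship:

```python
import numpy as np
import tensorflow as tf

# Toy example: learn y = 2x + 1 from noisy samples.
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1).astype("float32")
y = (2.0 * x + 1.0 + np.random.normal(0.0, 0.05, size=x.shape)).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")

model.fit(x, y, epochs=50, verbose=0)        # train
print(model.evaluate(x, y, verbose=0))       # score (mean squared error)
print(model.predict(np.array([[0.5]])))      # test a prediction, roughly 2.0
```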

Download TensorFlow from http://tensorflow.org. The website also includes a neural network TensorFlow sandbox at http://playground.tensorflow.org.

[Figure: TensorFlow neural network playground. Source: http://playground.tensorflow.org (DataSciCon.Tech).]

DataSciCon.Tech Sessions

I’m going to break down the sessions I attended into the main topics that were covered.  So this is a very high level, one hundred foot point-of-view of the topics covered at the conference.  My plan is to create a few more blogs on the topic that will go into my work as an aspiring data scientist/data architect.  All the information in this blog is based on information presented at the DataSciCon.Tech 2017 conference.

Machine Learning and Artificial Intelligence

The conference emphasized Artificial Intelligence and Machine Learning pretty heavily.  Artificial Intelligence was discussed more in theory and direct applications than design and development.  There were a few demonstrations of the popular IBM Watson Artificial Intelligence system; but I want to focus this blog primarily on Machine Learning, as it’s something that interests me and other data architects.  Artificial Intelligence and Machine Learning are both based on computerized learning algorithms.  Machine Learning uses past data to learn, predict events or identify anomalies.

Another key fact presented at the conference is the number of open source projects and companies that have produced software modules, libraries and packages devoted to the use and implementation of Machine Learning in business applications.  I strongly recommend anyone interested in learning more to research the software solutions discussed in this blog and how they can be implemented.

For those who are new to the concept of Machine Learning (like me), essentially it is defined as follows:

Machine Learning is a subset of Artificial Intelligence that focuses on creating models that learn and predict events based on past data without a human computer programmer having to change code to adapt to new events.  An example would be a spam filter learning new exploits and then blocking those exploits.


What Companies Need to Know About Big Data and Social Computing in Information Technology Management

Internet statistics estimate that 500 million tweets are produced per day. That translates to millions of conversations about a vast array of topics. “Big data” is a term that has become more prominent as social media sites such as Twitter, Facebook and Instagram continue to generate large data streams. Consumers produce clickstream data and complete transactions when they visit corporate websites to make purchases, schedule appointments for services or type reviews on Yelp, Amazon and Uber about an experience they've had. With a well-planned IS strategy, this data can be analyzed to gain insight into customers and make the critical strategic decisions necessary to compete. Here are a few things companies should know about “Big Data” and social media computing as a business strategy.

Understand that social media and social networking is more a concept than a platform.

One of the biggest problems with companies adopting social media as part of their IT business strategy is that, for many IT managers, the concept of social media does not extend beyond Twitter and Facebook. There are many platforms through which social media is beneficial to business. Slack and GitHub build on crowdsourcing by supporting project management, software development and agile methodologies, even though those platforms are not primarily used for social media.

As more engineering firms adopt open source solutions, agile and DevOps development companies are deciding to use code development repositories such as GitHub. Microsoft has already adopted GitHub as part of its Visual Studio Team Foundation options for source control. The power of GitHub is evident as global communities of developers use it to make some of the most innovative software products in languages such as Python, Java, C#, Ruby, etc. It has also become a viable social media platform for software engineers who frequently collaborate on sprints. Companies are also turning to solutions such as Slack to build entire global teams of developers who collaborate on projects and sprints.

Social media as an IT business strategy is about understanding its contextual design and how the user interacts with it.  Part of understanding the contextual design of social media includes identifying the actors (primary and secondary) for which the platform are based and how those users interact with it to build relationships and communities.

Context also extends to how a user interfaces with social media.  Take, for example, the device many currently have in their pockets.  Apply classifications of contextual scope to this device and determine all the ways users interact through a platform (tablet, smartphone, computer, etc).  

A method known as the 4-I’s framework¹ is a good model to understand the user interaction in the context of social media.  The method is typically utilized in classifying interactions with information systems as described above.  The 4-I’s include:

  • Inscriptive (inputs)
  • Informative (outputs)
  • Interactive (processing)
  • Isolated (stored data)

This framework is useful for looking at the ways a user can interact with a platform as well as the information exchanged within that platform. Another popular method is the MVC (Model-View-Controller) model, which is used in software analysis and engineering as an architectural pattern for implementing user interfaces by separating the layers of those systems.

Do not dismiss “Big Data” as a gimmick.

The term “Big Data” itself may seem oversold through marketing, but the production of large data sets is very real, very fast and very large, with new data sets being produced every day through public and private portals.

Big data is described as data that has variety (video, text, images, unstructured and structured), volume (over a terabyte, the scale of a brand), velocity (constant production of data streams), and veracity (the data needs to be cleaned and managed).

Information has become more fluid and available to more people, faster and more easily. Although no company should drive business decisions solely by what happens on Twitter or Facebook (or on the Dow), the power of “Big Data” as a tool can help in trend analysis, customer segmentation, and insight into short and long term business decisions.

With “Big Data” companies will be able to:

  • Respond more quickly to the market by making faster decisions.
  • Make patterns more evident in order to change processes and products.
  • Better realize innovations in products and services and bring them to market faster.
  • Build and manage new and current data streams.
  • Create a data analytics ecosystem, making the analysis and aggregation of data a business process all employees can utilize.

For a “Big Data” strategy to be successful, companies must:

  • Create data lakes and systems where raw data can live prior to being transformed for the business intelligence and reporting.
  • Remove data silos where data exists but is only accessible to a few internal stakeholders.  
  • Create a data analytics ecosystem
  • Create hybrid cloud solutions and begin moving applications to the cloud.

Know what association and segmentation analysis are and how to use them to learn about your customers.

With data streams, more coming online every day, new analytical methods can be used to gain insight into what consumers need in products and services. Two popular analytical methods are association analysis and segmentation analysis. In my next blog, I will discuss how these methods give insight into customers to better predict how they shop and which campaign ads are more likely to be successful with consumers.

With the popularity of MapReduce and Hadoop, the business world is seeing an increase in “Big Data” analytics based on clickstream and social media data. Large data sets that would have taken days to analyze can now be processed in minutes.

Conclusion

As data has become more prominent within organizations, and the means of collecting it easier and more ubiquitous, new skills will be necessary in certain roles to take full advantage of this data and drive value. The corporate culture will need to shift toward a data culture, where there is a value quotient to collecting, cleansing, aggregating and analyzing data sources and data repositories. Business leaders must establish new models that take advantage of social media and big data assets.

Works Cited

  1. Pitt, Leyland; Berthon, Pierre; Robson, Karen.  Deciding When to Use Tablets for Business Applications.  MIS Quarterly Executive Volume 10 Number 3 September 2011.

IT Strategies: Applying Data Analytics to Information Technology Management

In this third and final blog on IT Strategies, I look at some examples and techniques of using data analytics in Information Technology Management.  In previous postings, I wrote “information technology is the interaction between people, information and technology”. When planning IT investments, it’s important that business value be the main driver for delivering solutions. When evaluating IT value, a business must look beyond a particular product or service and identify value using the following criteria:

Identification

  • Understand what value is to the business.
  • Have a process to assess and define potential value.

Conversion

  • Find opportunities for IT to build success.
  • Don’t be afraid to revisit business models and business processes.
  • Have a plan to train and hire qualified people (IT and Business).

Realization

  • Create proactive and long-term processes.
  • Create a sustainable knowledge management process.
  • Continuously measure outcomes against expected results.
  • Assess value.

As a practitioner and researcher of information technology management, I am constantly looking for new approaches to bring IT value to my company. Information is mostly about making decisions. The first blogs discussed creating value from IT assets. Data analytics can provide a way to properly quantify that value by analyzing performance, sizing and monitoring data.

Data analytics provides the ability to drive the decision-making process; however, no decision should be made by data analytics alone. When deciding how analytics can impact decisions, there are two specific categories: qualitative and quantitative analytics. Qualitative analysis requires an in-depth understanding of business processes and functions to determine the reasons for certain conditions and events. Quantitative analysis requires statistical, mathematical and computational methods.

In information technology management, data can be generated by multiple systems as well as business workflows, in amounts that can easily fall within the domain of Big Data. Analyzing large and potentially unstructured (“Big Data”) data sets can give crucial insight into data-intensive environments.

Business Analysis Process

I also find it helpful to form a business analysis process as part of the overall strategy of IT systems. The business analysis process includes

  • Problem recognition
  • Review previous problems and findings
  • Modeling
  • Data collection
  • Data analysis
  • Communicating and acting on results
  • Business decisions


Data Analytics Ecosystem


When translating data analytics into action that involves operational and business optimization, one imperative is to develop policies and processes that adhere to data analytics standards and practices. Applying data analytics to only one project or process, while leaving out other areas or steps, tends to weaken the impact or create biases in the end result or deliverable.

The evolution of the business is to create data governance, new business models, policies and procedures that adhere to analytical practices. This is known as the data analytics ecosystem.

In this blog, I use examples from SAS Enterprise Miner®, a data mining and predictive analytics tool. Part of the SAS Enterprise Miner paradigm for data analysis is the SEMMA™ method (a minimal Python sketch of these steps follows the list), which includes:

  1. Sample: create a sample set of data either through random sampling or top tier sampling. Create test, training and validation sets of data.
  2. Explore: use exploratory methods on the data, including descriptive statistics, scatter plots, histograms, etc.
  3. Modify: create imputations or filter the data. Perform cluster, association and segmentation analysis.
  4. Model: model the data using logistic or linear regression, neural networks, and decision trees.
  5. Assess: assess the model by comparing it to other model types and against real data. Determine how close your model is to reality. Test the data using hypothesis testing.
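SEMMA is a SAS Enterprise Miner workflow, but purely as an illustration of the same five steps outside of SAS, here is a rough scikit-learn equivalent on synthetic data:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Sample: hypothetical data set with a binary target and a few missing values.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((300, 4)), columns=["a", "b", "c", "d"])
df.loc[df.sample(frac=0.05, random_state=1).index, "b"] = np.nan
target = (df["a"] + df["c"] > 1.0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(df, target, test_size=0.3, random_state=1)

# Explore: descriptive statistics.
print(X_train.describe())

# Modify: impute missing values.
imputer = SimpleImputer(strategy="mean").fit(X_train)
X_train_i, X_test_i = imputer.transform(X_train), imputer.transform(X_test)

# Model: logistic regression.
model = LogisticRegression().fit(X_train_i, y_train)

# Assess: score the model on held-out data.
print("accuracy:", accuracy_score(y_test, model.predict(X_test_i)))
```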

Information Technology Management

IT strategy involves aligning overall business goals and technology investment.  The first priority is for IT resources, people and functions to be planned around the overall business organization goals.  In order for such alignment to take place, IT managers need to communicate their strategy in business terms.   What makes such efforts inefficient is not making communication and transparency a top priority.

In many companies, funding for strategic initiatives is allocated in stages so their potential value can be reassessed between those stages.  When executives introduce a new business plan to increase market share by 15 percent with a new technology, IT managers must also meet those goals by assessing the quality of the IT infrastructure.

Executives must have confidence that the IT assets that they purchase are sound.  There must be mutual trust, visible business support, and IT staff who are part of the business problem-solving team.   All of these factors are needed to properly determine the business value of IT.

When creating an IT strategy that aligns to business objectives, five themes should be addressed: business improvement, business enabling, business opportunities, opportunity leverage and infrastructure. Research has shown that companies that have a framework for making targeted investments in IT infrastructure will further their overall strategic development and direction. When companies fail to make IT infrastructure investment strategic, they struggle to justify or fund it.

Communication is critical to executives and business decision makers. IT staff typically work across many organizational units and must be effective at translating technical requirements into business requirements and vice versa. Communication has become mission critical in the IT business value proposition. When deciding how to apply data analytics across the organization, IT should work with business leaders by looking at the IT functional areas that produce the most data for the organization. These areas include:

  • business analysis
  • system analysis
  • data management
  • project management
  • architecture
  • application development
  • quality assurance and testing
  • infrastructure
  • application and system support
  • data center operations

IT strategies require full business integration.  When IT managers are proposing new strategies, an executive summary should be the most important part of the proposal, prototype, roadmap, technical architecture document, etc.

Along with IT system metrics, IT managers must also keep in mind business operational metrics, which are based more on labor and time. IT managers need to factor both IT and operational metrics into reports to business stakeholders. There are several ways of reporting IT strategies to the business. Key Performance Indicators (KPIs) are fundamental to business decisions and are used to correlate business performance, such as how often a transaction results in customer satisfaction. Examples of KPIs include:

  • Efficiency rates.
  • Customer satisfaction scores
  • Capacity rates
  • Incident reporting rate
  • Total penalties paid per incident

Balanced Scorecards are strategic initiatives that align business strategy to corporate vision and goals.  It’s typically not the responsibility of IT managers to build scorecards, but rather understand the corporate balanced scorecards when building IT strategies.

Dashboards are visual representations of success, risk, status and failure of business operations.  In a very high paced organization, they allow information to be quickly disseminated and assessed by stakeholders for business decision making.  Dashboards tend to have more quantitative analysis than other types of reporting styles.

System Monitoring

Maintaining system health can be an arduous and time consuming task for system administrators. System administration covers areas such as databases, networks, hardware and software. Aggregating the large volumes of raw data can save time and help administrators respond more quickly to issues. Creating analytical methods around such aggregated data can help determine the present and future value of systems, predict possible failures and security risks, plan budgets for new IT, maintain existing assets, and plan migrations to new platforms such as the cloud. For example, data that tracks storage area network (SAN) usage over a period of time can help create sizing requirements for new systems that will grow at similar rates.
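As a small sketch of the SAN sizing example, with hypothetical monthly usage figures, a linear trend can be fit and projected forward:

```python
import numpy as np

# Hypothetical monthly SAN usage in terabytes over the past year.
months = np.arange(12)
usage_tb = np.array([40, 42, 45, 47, 50, 52, 56, 59, 61, 65, 68, 72])

# Fit a linear trend and project usage 12 months out for sizing a new system.
slope, intercept = np.polyfit(months, usage_tb, 1)
projected = slope * 24 + intercept
print(f"growth ~ {slope:.1f} TB/month, projected usage in 12 months ~ {projected:.0f} TB")
```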

Below are examples of the type of system performance data that can be used when creating data analytics for sizing and performance analysis.

[Figure: CPU utilization based on user, system, wait and idle times.]

[Figure: disk read kilobytes per second versus disk write kilobytes per second.]

Data Analytics

In the past year, I've learned various methods to predict trends and detect anomalies in the data I've received through the operation of IT systems. IT systems are constantly collecting sensing and monitoring data on CPU, networking, applications, etc. that can be used to build strategies for planning IT budgets. The types of methods I used include:

Data Exploration, Cleansing and Sampling

  • Scatter Plots
  • Imputation
  • Filtering
  • Classification
  • Hypothesis Testing
  • Statistics Analysis (descriptive, process control)

Predictive Analysis

  • Logistic/Linear regression
  • Neural Network
  • Probability Distribution

Segmentation Analysis

  • Clustering
  • Association

Model Assessment, Testing and Scoring

  • ROC Charts
  • Lift Charts
  • Model Comparison
  • Data Partitioning (separating data into testing, training and validation sets)

Below are visualizations based on analytical methods I've deployed for information technology management. I recommend researching these methods to get a better understanding of how they work. Much of this work was performed in Microsoft Excel, SAS Enterprise Miner® and Python.

[Figure: linear regression based on input/output (I/O) waits and the number of disk reads.]

[Figure: segmentation analysis based on the number of processes versus CPU utilization rates for various UNIX systems.]

[Figure: statistical process control (SPC) Shewhart analysis of process elapsed time in seconds.]


Above, a receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate for points in a diagnostic test. A ROC curve can diagnose the performance of a model. The baseline is linear, and each model curve demonstrates the trade-off between sensitivity and specificity; more accurate models have curves that follow the left side of the chart toward the upper border. As in the model assessment tool, the data is partitioned into training and validation sets, and the models for each set are assessed for predictability.

[Figure: model scoring for logistic regression.]

[Figure: model comparison using cumulative lift (training and validation data).] Lift measures the effectiveness of a predictive model by comparing results obtained with and without the model applied.

Again, I strongly recommend researching these techniques, since there are many very intelligent people out there whom I consult. Also, if anything I've mentioned is incorrect, please comment.

Recommendations

Below are guidelines and recommendations on how IT departments and IT managers can leverage business and data analytics to drive IT value proposition.

Determine important business metrics and create a metric measurement plan.

IT managers must understand which metrics are most important for their business. Start by having a strong understanding of business scorecards and key performance indicators. This goes beyond just understanding an organization's goals and objectives. IT system metrics are principally designed for IT managers and IT staff; the business understands operational metrics. When deciding which metrics to collect, focus specifically on business level KPIs and balanced scorecards. Understanding what the business wants will drive all further actions in creating IT value for the business. Create a metric measurement plan that formalizes the process and nomenclature of measuring IT metrics, including a process for applying them to business functions.

Create categories for metrics.

Specify categories of metrics to communicate including operational, KPIs, dashboards, tolerances and analytical metrics.

Operational metrics include basic observations in the IT management of specific business functional areas, and reporting is typically revised to pair operational metrics with analytical metrics. Types of operational metrics include measurements of functional area incidents, including the labor and time allocated to those incidents. These metrics tend to be non-technical in nature but have a definite impact on IT management.

Analytical metrics include metrics that are used for statistical analysis, forecasting, prediction and segmentation.  The data collected for these metrics are typically produced by IT systems.

Tolerance threshold metrics measure tolerances of KPIs values.  Tolerance is very similar to the control chart example in the preceding section, except it is used more for business level control limits.

Key Performance Indicators are perhaps the most important way of communicating metrics to business stakeholders.

Build a management report.

Incident management tracks specific events that deviate from the business and operational efficiency of an organization. Server incidents can clearly have a huge business and operational impact, and less empirical incidents such as server performance issues and application response times can also play a role in adverse events. Incident management can include operational metrics and KPIs. For example, the following list describes the types of incidents reported:

  • Total number of incidents.
  • Average time to resolve severity 1 and severity 2 incidents.
  • Number of incidents with customer impact.
  • Incident management labor hours.
  • Total available hours to work on incidents
  • Total labor hours to resolve incidents.

Data analytics can provide supportive evidence of how an incident occurred. More importantly, data analytics can help reduce major incidents by lowering incident costs and resolution time, which helps improve KPI values. Data analytics is typically not appropriate in an incident report itself; however, it allows IT managers to report mitigation and risk factors by rating the level of risk these incidents pose to the business. Analytics can provide more insight into risk management and mitigation.

As mentioned earlier, data analytics can provide supportive evidence of how an incident occurred, but it can also be used to build a risk management plan and scoring system. Since analytics provides huge benefits to IT managers regarding the health of systems and operations, having such information can help lower risks from incidents by allowing IT personnel to respond to problems faster and even predict problems before they occur. This in turn helps improve the KPIs in incident management reporting. Since KPIs work on a scoring system, the IT staff can produce calculations based in part on values produced from such analytics. For example, for metrics A, B, and C, operational KPI scores can be established through the use of proportionality. The table below demonstrates the use of IT metrics in establishing KPI scores.

Reference Number | KPI | Calculation
1 | Number of system incidents | B/A
2 | Number of network incidents | C/A
3 | Incident resolution rate | B/A + C/A

Example of how KPIs are critical to managing and controlling Incident Management.
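Purely as a hypothetical illustration of the proportional scoring in the table, where A is taken to be the total number of incidents and B and C are counts of system and network incidents (the original leaves the metric definitions open):

```python
# Hypothetical counts: A = total incidents, B = system incidents, C = network incidents.
A, B, C = 250, 40, 25

kpis = {
    "Number of system incidents (B/A)": B / A,
    "Number of network incidents (C/A)": C / A,
    "Incident resolution rate (B/A + C/A)": B / A + C / A,
}
for name, score in kpis.items():
    print(f"{name}: {score:.2f}")
```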

Incident management is just one type of management system that can be built for the metric categories where communication with the business should occur. Other management systems include:

  • Event management
  • Access management
  • Service desk management
  • Change management.
  • Release management
  • Configuration management
  • Service level management
  • Availability management
  • Capacity management
  • Continuity management
  • IT financial management

Build an IT governance program for IT business communication.

Having a data and IT governance program will ensure that data is verified and accurate before being sent to the executives.  Establishing such a program will give some formal assurance that information provided by IT comes from validated sources, has been approved, and has accountability.

Communicate effectively with executives with an executive summary and report.

As mentioned earlier, effective and regular communication will help ensure that IT managers will receive proper feedback, align with the business and prevent unexpected surprises when budget time arrives.

Give executives something to be excited about.

Business executives do not respond well to complex technical details. Contrary to popular belief, very few people, especially in executive and mid-level positions, are impressed by wordy technical details about system architecture and applications. They need high level examples that show how the business will grow and achieve project goals using IT management for a business function. This can include bar charts or diagrams, but they must be business related and clearly indicate how they will help achieve business objectives.

Propose a well-planned budget.

A well planned budget consists of replacement costs, unplanned purchases, recurring costs and expenses tracked year round. It's important to have a complete budget that builds out the solution for current and new architecture with an evaluation of the cost differences.

Executives will always ask for more clarity and more relevance.

An IT team may have worked many hours to produce a clean, bound and laminated report delivered with precious care and a bow to business executives, and still it can be rejected, scrutinized or sent back for clarification. This is normal and is to be expected. It is important for IT managers to keep in mind that the goal is always to provide the most factual and relevant information to business decision-makers.

Blog includes excerpts from Analytical Properties of Data-Driven Systems and its uses in Information Technology Management. University of North Carolina at Greensboro Bryan School of Business and Economics, Department of Information System and Supply Chain Management ISM 698-01D 2016.

IT Strategies and Data Analytics

In an extension to my first blog, I research quantitative analysis of enterprise IT functions to demonstrate how to create IT business value. It has to be established that, with so much data being collected from IT systems, IT managers can use this pervasive data to their advantage. Functions such as maintaining health, securing systems, and properly sizing new systems all have an impact on IT budgets.

Data analytics promotes value in IT.  Strategies using data analytics aim to create incremental value that can build on itself.  One of the keys of strategic IT value is to adopt a holistic approach to technology value, ignoring gimmicks, gadgets and marketing and instead looking at innovation as a combination of people, information and technology.  This balanced business strategy involves taking ownership of IT assets. In order for businesses to understand the value of those assets, it is crucial for IT managers to communicate that value.  Data analysis is a part of that communication.  Although data analytics can provide great insight into business technology, it will not always be successful in that goal.  The mission of data analytics as an IT strategy is to experiment often and to not be fearful of failure.

IT strategy involves aligning overall business goals and technology investment.  The first priority is for IT resources, people and functions to be planned around the overall business organization goals.  In order for such alignment to take place, IT managers need to communicate their strategy in business terms.

In many companies, funding for strategic initiatives is allocated in stages so their potential value can be reassessed between those stages.  When executives introduce a new business plan to increase market share by 15 percent with a new technology, IT managers must also meet those goals by assessing the quality of the IT infrastructure.

Executives also must have confidence that the IT assets that they purchase are sound.  There must be mutual trust, visible business support, and IT staff who are part of the business problem-solving team.   All of these factors are needed to properly determine the business value of IT.

One of the principles of business technology innovation is to aim for joint ownership of technology initiatives. The quality of the IT-business relationship is central to delivering quality IT solutions that scale and meet production requirements. Imagine a scenario where IT wasn't aware that a utility would bring 1,000,000 new meters online, each reading electrical data every hour, within two years, and instead sized only for the initial 5,000 meter deployment. This type of scenario would directly result in the utility customer having to upgrade all of their hardware only a year after the full deployment.

Innovations have created new ways of automating analysis to give more visibility into IT infrastructure.  This data can be analyzed using trending and predictive analytics to determine how much growth is needed based on specific targets and parameters.

Ideally, business and IT strategies should complement and support each other.  In order to improve the IT “Value Proposition”, IT projects must stop being considered the responsibility of only IT.  The definition of value must be clearly designed and presented by IT, but there must be a greater understanding that business executives have to take leadership in making technology investments shape and align the business strategy.  IT strategy must always be closely linked with sound business strategy.

Not only should IT and business be aligned, they must also complement each other strongly in order to build the type of relationship essential to achieve business goals.  It is a mistake to consider technology projects solely the responsibility of IT or to make IT solely accountable.  Business and IT must be accountable to each other when implementing and executing IT projects.

When creating an IT Strategy that can align to business objectives, five themes should be addressed.  These include:

  • business improvement
  • business enabling
  • business opportunities
  • opportunity leverage
  • infrastructure.

Research has shown that companies that have a framework for making targeted investments in IT infrastructure will further their overall strategic development and direction. When companies fail to make IT infrastructure investment strategic, they struggle to justify or fund it. In order for IT expenditures to be justified, many companies have concentrated on determining the business value of specific IT project deliverables, because this allows projects that focus on specific business goals to be properly scoped to include IT expenditures.

How a company measures business performance can be an accumulation of metrics from both the business side and the IT side.  Undelivered IT investment remains a big problem for organizations, and many CEOs and CIOs believe that their Return on Investment (ROI) expectations for IT investments have not been met.  Although IT measures can be qualitative, meaning that expertise and knowledge from IT managers and staff contribute to understanding current and future IT growth and capacity, there are also ways to measure value quantitatively to help in decision making.
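
As a minimal, hypothetical sketch (in Python) of one such quantitative measure, the example below computes a simple ROI figure; the function name and the dollar amounts are illustrative assumptions, not figures from a real project.

    def simple_roi(total_benefit, total_cost):
        # Return on investment expressed as a fraction: (benefit - cost) / cost.
        return (total_benefit - total_cost) / total_cost

    # Hypothetical example: a $250,000 IT investment expected to return
    # $310,000 in measurable benefit over the evaluation period.
    roi = simple_roi(total_benefit=310_000, total_cost=250_000)
    print(f"ROI: {roi:.1%}")   # prints "ROI: 24.0%"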

Non-technical communication is critical for executives.  IT staff typically work across many organizational units and must be effective at translating technical requirements into business requirements and vice versa.  Communication has become mission critical in the IT business value proposition.  When deciding how to apply data analytics across the organization, IT should work with business leaders to look at the IT functional areas that produce the most data.  These areas include:

  • business analysis
  • system analysis
  • data management
  • project management
  • architecture
  • application development
  • quality assurance and testing
  • infrastructure
  • application and system support
  • data center operations

IT strategies require full business integration.  When IT managers propose new strategies, the executive summary should be treated as the most important part of the deliverable, whether that is a proposal, prototype, roadmap or technical architecture document.

Along with IT system metrics, IT managers must also keep in mind business operational metrics, which are based more on labor and time.  IT managers need to factor both IT and operational metrics into reports to business stakeholders.  There are several ways of reporting IT strategies to the business.  Key Performance Indicators (KPIs) are fundamental to business decisions and are used to measure and correlate business performance, such as how often a transaction results in a satisfied customer.  Examples of KPIs include (a brief calculation sketch follows the list):

  • Efficiency rates
  • Customer satisfaction scores
  • Capacity rates
  • Incident reporting rate
  • Total penalties paid per incident
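
To illustrate how KPIs like those above might be computed from raw operational data, here is a minimal Python sketch; all of the counts, rates and names are hypothetical placeholders rather than measurements from a real system.

    # Hypothetical monthly operational figures (placeholders for illustration).
    transactions_total = 120_000
    transactions_successful = 117_600
    incidents_reported = 42
    penalties_paid = 8_400.00   # dollars

    # Efficiency rate: share of transactions completed successfully.
    efficiency_rate = transactions_successful / transactions_total

    # Incident reporting rate: incidents per 10,000 transactions.
    incident_rate = incidents_reported / transactions_total * 10_000

    # Total penalties paid per incident.
    penalty_per_incident = penalties_paid / incidents_reported

    print(f"Efficiency rate: {efficiency_rate:.1%}")
    print(f"Incidents per 10,000 transactions: {incident_rate:.2f}")
    print(f"Penalties per incident: ${penalty_per_incident:,.2f}")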

Balanced Scorecards are strategic management tools that align business strategy to corporate vision and goals.  It is typically not the responsibility of IT managers to build scorecards, but rather to understand the corporate balanced scorecards when building IT strategies.

Dashboards are visual representations of the success, risk, status and failure of business operations.  In a fast-paced organization, they allow information to be quickly disseminated and assessed by stakeholders for business decision making.  Dashboards tend to rely on more quantitative analysis than other reporting styles.

IT Governance

In the area of governance, the ISO/IEC 27002 standard from the International Organization for Standardization (ISO) addresses monitoring and the management of information security incidents.  Many of the methods used to collect data about system health can complement adherence to information system security.  Monitors log user access and security events such as unauthorized access to information systems.  Keeping security audit logs synchronized with specific system activity logs can reveal coordinated attacks on the system or denial of service (DoS) attacks, which are common against web applications and application service providers.  Data analytics can help determine whether deviations in system performance are related to security events such as unauthorized access, security threats such as malware, or other security issues, or whether they stem from a functional issue within the system itself.  The boundaries between security and system health are consistently crossed in networking, services and databases, where the integrity and volume of user traffic can be impacted.  Any unauthorized access can impact the availability and integrity of an information system.
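
As a hedged sketch of the kind of correlation described above, the Python example below counts failed-access events per source address in a security audit log and flags addresses whose failure counts deviate sharply from the rest; the log format, field names and threshold are illustrative assumptions, not a prescribed detection method.

    from collections import Counter
    from statistics import mean, pstdev

    # Hypothetical security audit records: (timestamp, source_ip, event).
    audit_log = [
        ("2019-10-01T02:00:01", "10.0.0.5", "AUTH_FAIL"),
        ("2019-10-01T02:00:02", "10.0.0.5", "AUTH_FAIL"),
        ("2019-10-01T02:00:02", "10.0.0.7", "AUTH_OK"),
        # ... many more records in a real system
    ]

    # Count failed-access events per source address.
    failures = Counter(ip for _, ip, event in audit_log if event == "AUTH_FAIL")

    # Flag sources whose failure count sits far above the mean (simple z-score).
    counts = list(failures.values())
    avg, spread = mean(counts), pstdev(counts) or 1.0
    suspects = {ip: n for ip, n in failures.items() if (n - avg) / spread > 3.0}

    print("Possible coordinated or DoS activity from:", suspects)

In practice such a check would run over a rolling time window and be cross-referenced with system activity logs before raising an alert.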

DevOps and Agile Software Development

DevOps is a corporate culture that emphasizes collaboration between developers (typically software developers) and operational business units.  DevOps provides tools and automation that can create a better customer experience by addressing issues and product changes faster.  Information systems can assist this functional area by providing analytics on the readiness of release code in the software development life cycle.

The principles of DevOps are to develop and test against production-like systems, to deploy with reliable and repeatable processes, to monitor and validate operational quality, and to improve the customer feedback loop so that issues are turned around faster.  Part of the power of data analysis is its ability to assist in agile, continuous delivery of software.  Automated testing and feedback combined with data analytical methods can provide highly useful information for the business.  Presenting performance analysis, error logging and customer feedback as dashboards and visualizations helps make the software development life cycle visible to all business stakeholders.  As a rule of thumb, business leaders are not interested in code or complex spreadsheets; they are much more interested in quality scores, key performance indicators (KPIs) and business metrics.

IT Budgets

IT budgets are addressed in two categories: operational costs and strategic investments.  Operational costs are “keep the lights on” expenses that involve running IT like a utility; they include maintenance, computing, storage, networking and support, to name a few examples.  Strategic investment is a balance of initiative spending and coordination with organizational strategic objectives, and it becomes more efficient when coordinated from the corporate level down to the department level.

IT budgets are also about reducing costs.  Many organizations have legacy systems that are not used efficiently and whose requirements create problems for strategic investments in new innovations.  Maintaining an application portfolio is a good way of understanding the risks versus benefits of keeping legacy systems.  Creating a data integration strategy as part of a data analysis ecosystem allows businesses to fully utilize all of their assets.  Many of these legacy systems contain metadata for software that has long since been de-supported.  Part of the power of data analysis services such as online analytical processing (OLAP), business intelligence (BI) and master data management (MDM) is their ability to integrate with legacy systems.

Budgets are a key component of corporate performance management.  The most important thing to understand about IT budgets is that they assist in the establishment of strategic goals.  Systems provide data about the various levels of resource utilization.  An example of a question that a business client might pose to an IT manager is:

What are the annual storage requirements of our Enterprise Billing System?

This question could be answered by tracking the amount of storage consumed throughout the year, based on the number and size of the data sets stored (in megabytes) and the interval of time for which those data sets are retained.  From there, an IT manager can translate that requirement into yearly terms, which in turn gives the budgeting team a metric for how much storage to purchase or maintain each year.
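
Here is a minimal Python sketch of that calculation, assuming hypothetical monthly figures and a one-year retention interval; none of the numbers come from an actual billing system.

    # Hypothetical monthly storage consumed by the billing system, in megabytes.
    monthly_usage_mb = [850, 870, 910, 905, 940, 965,
                        990, 1010, 1040, 1055, 1080, 1110]

    annual_consumption_mb = sum(monthly_usage_mb)
    average_monthly_mb = annual_consumption_mb / len(monthly_usage_mb)

    print(f"Annual storage consumed: {annual_consumption_mb:,} MB")
    print(f"Average consumed per month: {average_monthly_mb:,.0f} MB")
    # The annual figure gives the budgeting team the amount of storage to
    # purchase or maintain each year, assuming data sets are kept for a full year.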

For large corporate firms in utilities, energy and manufacturing, where there can literally be hundreds of servers, there needs to be a more centralized structure for IT operations budgets.  The mandate given to IT managers in centralized IT budget structures is to standardize and streamline processes across hardware and software services.  The introduction of private and public cloud architectures, as well as virtualized architectures, has made this possible.  Another question likely to be posed to IT managers is:

Can our physical servers be migrated to a cloud or virtual infrastructure with higher performance and availability?

Having the right kind of analysis on current systems helps to ensure that dollars are spent appropriately when systems are consolidated or provisioned, and that those systems perform according to business requirements.  IT managers are under pressure from executives to do more with less.  Data analysis has been a catalyst for innovation in cross-delivery business development through the integration of systems and data.  Operational questions regarding IT include:

How much operational labor is expended providing IT services to an organization?

How much of the IT budget is expended implementing changes to infrastructure?

Other budget concerns include transitioning from a physical architecture to a cloud-service-based model.  Typically, with public cloud architecture, the resources are provisioned and managed by a hosting team.  Most cloud providers offer “elastic” solutions, such as Amazon EC2 or Microsoft Azure, which allow companies to use only what they need, so traditional sizing methodologies may not be as appropriate in such architectures.  However, in very data-intensive industries with large-scale architectures and many interacting business and server processes, placing everything in a cloud domain is not only impractical but very expensive and potentially illegal.  For example, in the utilities industry, state regulations may prohibit customer data from being stored off site.  An energy company's proprietary information stored in an international data center that does not recognize the source country's regulatory body could represent a public trust violation.

When migrating from a multi-tier architecture to complete cloud-based services, it is important to understand the types of cost involved.  Cloud-based services typically follow a subscription model, where all of the management, configuration and provisioning (unless self-provisioned) is handled by the hosting company.  A contract specifies the level of service and support, and the cost reflects how many resources the company is utilizing and the level of service it requires to serve its customers.  Payment terms can be yearly or quarterly, and there is usually a renewal date when payment is due [20].
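
As a rough, hypothetical Python sketch of how such a subscription might be budgeted, the example below multiplies assumed resource quantities by assumed unit rates and adds a flat service-tier fee; none of the rates reflect any actual provider's pricing.

    # Hypothetical monthly unit rates (not actual provider pricing).
    rate_per_vcpu = 25.00        # dollars per virtual CPU per month
    rate_per_gb_storage = 0.10   # dollars per GB stored per month
    service_tier_fee = 1_500.00  # flat monthly fee for the chosen support level

    # Assumed resource footprint for the workload being migrated.
    vcpus = 64
    storage_gb = 20_000

    monthly_cost = (vcpus * rate_per_vcpu
                    + storage_gb * rate_per_gb_storage
                    + service_tier_fee)
    annual_cost = monthly_cost * 12

    print(f"Estimated monthly subscription: ${monthly_cost:,.2f}")
    print(f"Estimated annual subscription:  ${annual_cost:,.2f}")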

The IT Value Proposition

IT value measures the worth and effectiveness of business technology solutions.  It is mostly a subjective assessment of how a business measures its assets as they pertain to business goals.  Value in information technology is typically defined in terms of Return on Investment (ROI), Key Performance Indicators (KPIs) and other economic measures.  IT is most valuable when tied to business goals and objectives.  Adding value to IT also includes ensuring that IT assets are part of a data analytics ecosystem, in which those assets generate insight into how businesses produce, collect, store and learn from data.  Data analytics is an important part of the IT value proposition because of the tremendous trove of knowledge and insight that can be gained from it, and a data analytics ecosystem helps create the processes that turn data into actionable business decisions.

Other best practices in IT value include:

  • Evaluating the corporate business model in order to promote innovation.
  • Having strategic themes around data collection, dissemination and analysis.
  • Getting the right people involved, including data scientists, engineers, business analysts and many others.