Anomaly and Intrusion Detection in IoT Networks with Enterprise Scale Endpoint Communication – Pt 2

Derek Moore, Erica Davis, and Hank Galbraith, authors.

Part two of a series of LinkedIn articles based on Cognitive Computing and Artificial Intelligence Applications

Background

Several high-profile ransomware incidents have called attention to IoT network security. Assessing security vulnerabilities and penetration testing have become increasingly important to sound design. Most of this assessment and testing takes place at the software and hardware level. However, a broader approach is vital to the security of IoT networks. Protocol and traffic analysis is especially important to structured, dedicated IoT networks, since communication and endpoints are tracked and managed. Understanding all the risks posed to these types of networks allows for a more complete risk management plan and strategy. Besides network challenges, there are challenges to scalability, operability and channels, as well as to the information being transmitted and collected by such networks. In IoT networks, the search for vulnerabilities spans the network architecture, endpoint devices and services, where services include the hardware, software and processes that make up the overall IoT architecture. Building a threat assessment or map as part of an overall security plan, and updating it on a scheduled basis, allows security professionals and stakeholders to plan for all possible threats to the architecture. Whenever possible, simulating likely attack vectors, understanding the behavior of those attacks and then modeling them helps build out the overall security management plan.

Open ports, SQL injection flaws, unencrypted services, insecure network interfaces, buffer overflow risks, missing firewall protocols, weak authorization settings and insecure web interfaces are among the types of vulnerabilities found in IoT networks and devices.

Where would an impending attack occur? At the device, the server or the service? In the location where the data is stored, or while the data is in transit? What types of attacks can be identified? They include distributed denial of service, man-in-the-middle, ransomware, botnets, spoofing and account penetration, among others.

Business Use Case

For this business use case research study, a fictional company was created. The company is a national farmland and agricultural cooperative that supplies food to local and state markets. Part of the company’s IT infrastructure is an IoT network that uses endpoint devices to monitor and control temperature, humidity and moisture across the company’s large agricultural farmlands. The network has over 2,000 IoT devices in operation across 800 acres. Any intrusion into the network by a rogue service or bad actor could jeopardize the on-time delivery of quality fresh produce. The network design in the simulation below is a concept of this agricultural network. Our team created a simulation network using Cisco Packet Tracer, a tool that allows users to create and simulate packet traffic throughout a computer network at multiple OSI layers.

Simulated data was generated using the Packet Tracer simulator. The simulated network below uses multiple routers, switches, servers and IoT devices, and carries packet types such as TCP, UDP, RIPv4 and ICMP.

Network Simulation

Below is a simulation of packet routing throughout the IoT network.

Cisco Packet Tracer Simulation for IoT network.  Packet logging to test anomaly detection deep learning models.

Problem Statement

Our fictional company is the basis of our team’s mock network for monitoring for intrusions and anomalies. Being a simulated IoT network, it contains only a few dozen IoT-enabled sensors and devices, such as sprinklers, temperature and water level sensors, and drains. Since our model is designed for large-scale IoT deployment, it will be trained on publicly available data, while the simulated data will serve as a way to score the accuracy of the model. The simulation can generate the types of threats that would create anomalies. It is important to distinguish between an attack and a known issue or event (see part one of this research for IoT communication issues). The company is aware of those miscommunications and has open work orders for them. The goal is for our model to detect an actual attack on the IP network by a bad actor. Although a miscommunication is technically an anomaly, it is known to the IT staff and should not raise an alarm. Miscommunicating devices are fairly easy for staff to detect, but for a machine learning or deep learning model they can be trickier. Raising security alarms for routine miscommunication issues originating from the endpoints would produce a prevalence of false positives (FP) in a machine learning confusion matrix.


A running simulation

Project Significance and Implementation

In today’s age of modern technology and the internet, it is becoming increasingly difficult to protect enterprise networks against malicious attacks. Not only are malicious actors becoming more advanced in their attack methodologies, but the number of IoT devices that live and operate in a business environment is ever increasing. It needs to be a top priority for any business to create an IT strategy that protects the company’s technical architecture and core intellectual property. When assessing all potential security weaknesses, you must decompose the network model and define trust zones within the IoT architecture.

This application was designed to use Microsoft Azure Machine Learning to analyze and detect anomalies in large data sets collected from all devices on the business’s network. In an actual implementation, a constant data flow would run through our predictive model to classify traffic as Normal, Incorrect Setup, Distributed Denial of Service (DDOS attack), Data Type Probing, Scan Attack, or Man in the Middle. Using a supervised learning method to iteratively train our model, the application would grow increasingly cognitive and more accurate at identifying these network traffic patterns. In a full implementation, each classification would also trigger an action. For instance, if the model detected a DDOS attack coming from a certain device, the application would automatically send shutdown commands to the device, isolating it from the network and containing the attack. When these actions occur, logs would be taken and notifications automatically sent to the appropriate IT administrators and management teams, so that quick and effective action could be taken (a sketch of this scoring-and-response loop appears at the end of this section).

Applications such as the one we have designed are already being used throughout the world by companies in all sectors. Crowdstrike, for instance, is a cyber technology company that produces information security applications with machine learning capabilities. Cyber technology companies such as Crowdstrike have grown ever more popular over the past few years as the number of cyber attacks has increased. We have seen firsthand how advanced these attacks can be with data breaches at the US federal government, Equifax, Facebook, and more. The need for advanced information security applications is increasing daily, not just for large companies but for small- to mid-sized companies as well. While outsourcing information security is an easy choice for some companies, others may not have the budget to afford it. That is where our application demonstrates the low barrier to entry attainable with machine learning platforms such as Microsoft Azure ML or IBM Watson. Products like these provide relatively easy interfaces for IT security administrators to take action into their own hands and design their own anomaly detection applications. In conclusion, our IoT Network Anomaly Detection Application is an example of how a company could design and implement its own advanced cyber security defense applications, better enabling it to protect its network devices and intellectual property against ever-growing malicious attacks.
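Below is a minimal sketch of the scoring-and-response loop described above. It is illustrative only: the model object, the device_id and feature columns, and the shutdown/notification helpers are hypothetical stand-ins, not an actual Azure ML API.

```python
# Hypothetical scoring-and-response loop; "device_id" and the feature columns
# are assumed names, and the helpers stand in for real network controls.
import pandas as pd

CLASSES = ["Normal", "Incorrect Setup", "DDOS", "Data Type Probing",
           "Scan Attack", "Man in the Middle"]

def shutdown_device(device_id: str) -> None:
    # In a real system this would call network management APIs to isolate the device.
    print(f"Sending shutdown command to {device_id}")

def notify_admins(device_id: str, threat: str) -> None:
    # In a real system this would log the event and page IT administrators.
    print(f"ALERT: {threat} detected on device {device_id}")

def score_and_respond(traffic: pd.DataFrame, model) -> None:
    """Classify each traffic record and isolate devices flagged as DDOS sources."""
    predictions = model.predict(traffic.drop(columns=["device_id"]))
    for device_id, label in zip(traffic["device_id"], predictions):
        if CLASSES[label] == "DDOS":
            shutdown_device(device_id)
            notify_admins(device_id, "DDOS")
```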

Methodology

For this project, our team acquired public data from Google, Kaggle and Amazon. For the IoT model, preprocessed data was selected for the anomaly detection model: preprocessed data from the Google open data repository was collected to test and train the models. RStudio served as the initial data analysis and analytics environment, used to compute Receiver Operating Characteristic (ROC) curves and Area Under the Curve (AUC) and to evaluate the sensitivity and specificity of the models when scoring the predictability of the response variable. In R, predictability was compared between logistic regression, random forest and gradient boosting models (a comparison sketch follows the list below). In the preprocessed data, a predictor (normality) variable was used for training and testing purposes. After the initial data discovery stage, the data was processed by a machine learning model in Azure ML using support vector machine and principal component analysis pipelines for anomaly detection. The response variable has the following values:

  • Normal – 0
  • Wrong Setup – 1
  • DDOS – 2
  • Data Type Probing – 3
  • Scan Attack – 4
  • Man in the Middle – 5
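As a rough Python illustration of the R model comparison described above, the following sketch fits the same three model families with scikit-learn and compares them by AUC. The file name and feature columns are assumptions; only the "normality" response variable comes from the dataset description.

```python
# A minimal model-comparison sketch; "iot_traffic_preprocessed.csv" is a
# hypothetical file containing numeric features plus the "normality" label.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("iot_traffic_preprocessed.csv")
X, y = df.drop(columns=["normality"]), df["normality"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "gradient boosting": GradientBoostingClassifier(random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    # One-vs-rest AUC extends the ROC/AUC comparison to the multiclass response.
    auc = roc_auc_score(y_test, model.predict_proba(X_test), multi_class="ovr")
    print(f"{name}: AUC = {auc:.3f}")
```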

The preprocessed dataset for intrusion detection in network-based IoT devices includes data from ultrasonic sensors driven by Arduino microcontrollers and NodeMCU, a low-cost open-source IoT platform that runs on the ESP8266 Wi-Fi module and is used to send data.

The following table represents data from the Ethernet frame, part of the TCP/IP packet that is transmitted from a source device to a destination device for network communication. The dataset is preprocessed for the network-based intrusion detection system.


Source: Google.com

In the next article, we’ll be exploring the R code and Azure ML trained anomaly detection models in greater depth.

Anomaly and Intrusion Detection in IoT Networks with Enterprise Scale Endpoint Communication

This is part one of a series of articles to be published on LinkedIn based on a classroom project for ISM 647: Cognitive Computing and Artificial Intelligence Applications taught by Dr. Hamid R. Nemati at the University of North Carolina at Greensboro Bryan School of Business and Economics.

The Internet of Things (IoT) continues to be one of the most innovative and exciting areas of technology of the last decade. IoT refers to collections of devices that collect data from the environment around them, whether through mechanical, electrical, thermodynamic or hydrological processes. These environments could be the human body, geological areas, the atmosphere, etc. The networking of IoT devices has been prevalent in many industries for years, including the gas, oil and utilities industries. As companies demand higher sample read rates from sensors, meters and other IoT devices, and as bad actors from foreign and domestic sources become more prevalent and brazen, these networks have become vulnerable to security threats due to their increasing ubiquity and evolving role in industry. In addition, these networks are prone to read rate fluctuations that can produce false positives for anomaly and intrusion detection systems in enterprise-scale deployments of devices sending TCP/IP transmissions of data upstream to central office locations. This paper focuses on developing an application that uses cognitive computing and artificial intelligence to achieve better anomaly and intrusion detection in enterprise-scale IoT applications.

This project uses automated machine learning capabilities to develop a cognitive application that addresses possible security threats in high-volume IoT networks such as utility, smart city and manufacturing networks. These are networks with high communication read success rates across hundreds of thousands to millions of IoT sensors; however, they may still have issues such as:

  1. Noncommunication or missing/gap communication
  2. Maintenance work orders
  3. Alarm events (tamper/power outages)

In large-scale IoT networks, such interruptions are normal to business operations. Noncommunication is typically experienced because devices fail or get swapped out under a legitimate work order. Weather events and people can also cause issues with the endpoint device itself: power outages can cause connected routers to fail, and devices can be tampered with, as when someone attempts a hardwire bypass or removes a meter.

The scope of this project is to build machine learning models that address IP-specific attacks on the IoT network, such as DDoS, from within and external to the networking infrastructure. These models should be intelligent enough to distinguish network attacks (true positives) from communication issues (true negatives); see the sketch after the list below. Typical network communications for such an IoT network include:

  1. Short range: Wi-Fi, Zigbee, Bluetooth, Z-Wave, NFC.
  2. Long range: 2G, 3G, 4G, LTE, 5G.
  3. Protocols: IPv4/IPv6, SLIP, uIP, RLP, TCP/UDP.
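The sketch below illustrates, with made-up labels, how the true positive/true negative distinction above maps onto a confusion matrix: attacks the model catches are true positives, and known communication issues it correctly leaves alone are true negatives.

```python
# Illustrative only: the labels and predictions are made-up stand-ins.
from sklearn.metrics import confusion_matrix

# 1 = IP attack (e.g., DDoS), 0 = known communication issue (work order, outage)
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 1, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"true positives (attacks caught): {tp}")
print(f"true negatives (known issues ignored): {tn}")
print(f"false positives (alarms on known issues): {fp}")
print(f"false negatives (missed attacks): {fn}")
```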

Eventually, as such machine learning and deep learning models expand, these types of communications will also be monitored.

Scope of Project

This project will focus on complex IoT systems typical in multi-tier architectures within corporations. As part of the research into the analytical properties of IT systems, this project will focus primarily on the characteristics of operations that begin with the collection of data through transactions or data sensing, and end with storage in data warehouses, repositories, billing, auditing and other systems of record. Examples include:

  1. Building a simulator application in Cisco Packet Tracer for a mock IoT network.
  2. Creating a Machine Learning anomaly detection model in Azure.
  3. Generating and collecting simulated and actual TCP/IP network traffic data from open data repositories in order to train and score the team's machine learning model.

Other characteristics of the IT systems that will be researched as part of this project include systems that perform the following:

  1. Collect, store, aggregate and transport large data sets
  2. Require application integration, such as web services, remote API calls, etc.
  3. Are beyond a single stack solution.

Next: Business Use Cases and IoT security

Derek Moore, Erica Davis, and Hank Galbraith, authors.

DataSciCon.Tech 2017 Review

Saturday, December 2nd, 2017

DataSciCon.Tech is a data science conference held in Atlanta, Georgia, from Wednesday, November 29th to Friday, December 1st, comprising both workshops and conference lectures. It took place at the Global Learning Center on the campus of Georgia Tech. This was the first year of the conference, and I attended to get a sense of the data science scene in Atlanta. Overall, the experience was very enlightening and introduced me to the dynamic and intensive work being conducted in the area of data science.


Keynote speaker Rob High, CTO of IBM Watson, discussing IBM Watson and Artificial Intelligence (DataSciCon.Tech 2017).

DataSciCon.Tech Workshops

Four workshop tracks were held Wednesday: Introduction to Machine Learning with Python and TensorFlow, Tableau Hands-on Workshop, Data Science for Discovery, Innovation and Value Creation, and Data Science with R Workshop. I elected to attend the Machine Learning with Python and TensorFlow track. TensorFlow is an open source software library for numerical computation using data flow graphs, widely used for machine learning.

To prepare for the conference, I installed the TensorFlow module downloaded from https://www.tensorflow.org/install. In addition to TensorFlow, I downloaded Anaconda (https://www.anaconda.com/), a great Python development environment for those practicing data science programming; it includes many of the Python data science packages, such as NumPy and scikit-learn.

Among the predictive and classification modeling techniques discussed in the workshop:

  • Neural Networks
  • Naive Bayes
  • Linear Regression
  • k-nearest neighbor (kNN) analysis

These modeling techniques are popular for classifying data and predictive analysis. Few training sessions on Python, scikit-learn or NumPy go into these algorithms in detail, given the varied math backgrounds of the audience members. For the course, we used Jupyter Notebook, a web-based Python development environment that allows you to share and present your code and results using web services. Jupyter Notebook can also be hosted in Microsoft Azure, as well as in other cloud platforms such as Anaconda Cloud and AWS. To host a Python Jupyter Notebook in Azure, sign in at https://notebooks.azure.com.
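As a quick taste of the workshop material, here is a small scikit-learn example of one of the techniques listed above (kNN), runnable in a Jupyter Notebook; the bundled iris dataset stands in for the workshop data.

```python
# k-nearest neighbor classification on the iris dataset (a stand-in example).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)  # classify by the 5 nearest neighbors
knn.fit(X_train, y_train)
print(f"kNN test accuracy: {knn.score(X_test, y_test):.2f}")
```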

TensorFlow

TensorFlow provides a series of functions that use neural networks and machine learning to test, train and score models. The advantage of TensorFlow is its ability to train models faster than other modules, a very big advantage since splitting data and training models are process-intensive operations. It is particularly powerful on the Graphics Processing Unit (GPU) architecture popular for machine learning and deep learning.
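For a sense of the workflow, here is a minimal train-and-score sketch using TensorFlow's Keras API (which postdates the 2017 workshop material); the data is randomly generated for illustration.

```python
# Train and score a tiny neural network on synthetic data with tf.keras.
import numpy as np
import tensorflow as tf

X = np.random.rand(500, 8).astype("float32")   # stand-in feature matrix
y = (X.sum(axis=1) > 4.0).astype("float32")    # stand-in binary labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(8,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, validation_split=0.2, verbose=0)  # train/validate split
loss, accuracy = model.evaluate(X, y, verbose=0)            # score the model
print(f"loss: {loss:.3f}, accuracy: {accuracy:.2f}")
```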

Download TensorFlow from http://tensorflow.org. The website also includes a neural network TensorFlow sandbox at http://playground.tensorflow.org.


Source: http://playground.tensorflow.org (DataSciCon.Tech)

DataSciCon.Tech Sessions

I’m going to break down the sessions I attended into the main topics that were covered, so this is a very high-level, hundred-foot point of view of the topics covered at the conference. My plan is to create a few more blogs on these topics that will go into my work as an aspiring data scientist/data architect. All the information in this blog is based on information presented at the DataSciCon.Tech 2017 conference.

Machine Learning and Artificial Intelligence

The conference emphasized Artificial Intelligence and Machine Learning pretty heavily. Artificial Intelligence was discussed more in terms of theory and direct applications than design and development. There were a few demonstrations of the popular IBM Watson Artificial Intelligence system, but I want to focus this blog primarily on Machine Learning, as it’s something that interests me and other data architects. Artificial Intelligence and Machine Learning are both based on computerized learning algorithms. Machine Learning uses past data to learn, predict events or identify anomalies.

Another key fact presented at the conference is the number of open source projects and companies that have produced software modules, libraries and packages devoted to the use and implementation of Machine Learning in business applications.  I strongly recommend anyone interested in learning more to research the software solutions discussed in this blog and how they can be implemented.

For those who are new to the concept of Machine Learning (like me), essentially it is defined as follows:

Machine Learning is a subset of Artificial Intelligence that focuses on creating models that learn and predict events based on past data without a human computer programmer having to change code to adapt to new events.  An example would be a spam filter learning new exploits and then blocking those exploits.


IT Strategies: Applying Data Analytics to Information Technology Management

In this third and final blog on IT strategies, I look at some examples and techniques of using data analytics in information technology management. In previous postings, I wrote that “information technology is the interaction between people, information and technology”. When planning IT investments, it’s important that business value be the main driver for delivering solutions. When evaluating IT value, a business must look beyond a particular product or service and identify value using the following criteria:

Identification

  • Understand what value is to the business.
  • Have a process to assess and define potential value.

Conversion

  • Find opportunities for IT to build success.
  • Don’t be afraid to revisit business models and business processes.
  • Have a plan to train and hire qualified people (IT and Business).

Realization

  • Create proactive and long-term processes.
  • Create a sustainable knowledge management process.
  • Continuously measure outcomes against expected results.
  • Assess value.

As a practitioner and researcher of information technology management, I am constantly looking for new approaches to bring IT value to my company. Information is mostly about making decisions. The first blogs discussed creating value from IT assets. Data analytics provides a way to properly quantify that value by analyzing performance, sizing and monitoring data.

Data analytics can drive the decision-making process. However, no decision should be made by data analytics alone. When deciding how analytics can impact decisions, there are two specific categories: qualitative and quantitative analytics. Qualitative analysis requires an in-depth understanding of business processes and functions to determine the reasons behind certain conditions and events. Quantitative analysis requires statistical, mathematical and computational methods.

In information technology management, data can be generated by multiple systems as well as business workflows, the amount of which can easily fall within the domain of Big Data. Analyzing large and potentially unstructured (“Big Data”) data sets can give crucial insight into data-intensive environments.

Business Analysis Process

I also find it helpful to form a business analysis process as part of the overall strategy for IT systems. The business analysis process includes:

  • Problem recognition
  • Review previous problems and findings
  • Modeling
  • Data collection
  • Data analysis
  • Communicating and acting on results
  • Business decisions


Data Analytics Ecosystem


When turning data analytics into action that involves operational and business optimization, one imperative is to develop policies and processes that adhere to data analytics standards and practices. Applying data analytics to only one project or process, while leaving out other areas or steps, tends to weaken the impact or create biases in the end result or deliverable.

The evolution of business is to create data governance, new business models, and policies and procedures that adhere to analytical practices. This is known as the data analytics ecosystem.

In this blog, I use examples from SAS Enterprise Miner®, a data mining and predictive analytics tool. Part of the SAS Enterprise Miner paradigm for data analysis is the SEMMA™ method, which includes the following steps (a rough Python analogue follows the list):

  1. Sample: Create a sample set of data, either through random sampling or top-tier sampling. Create test, training and validation sets of data.
  2. Explore: Use exploratory methods on the data, including descriptive statistics, scatter plots, histograms, etc.
  3. Modify: Impute or filter data. Perform cluster analysis, association and segmentation.
  4. Model: Model the data using logistic or linear regression, neural networks, and decision trees.
  5. Assess: Assess the model by comparing it to other model types and against real data; determine how close your model is to reality. Test the data using hypothesis testing.
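For readers without SAS, here is a rough Python analogue of the SEMMA steps; the file name and the binary "target" column are assumptions for illustration, and Enterprise Miner performs the equivalent steps through its GUI nodes.

```python
# A SEMMA-style walkthrough: Sample, Explore, Modify, Model, Assess.
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("example.csv")                  # hypothetical numeric dataset
sample = df.sample(frac=0.5, random_state=1)     # 1. Sample: random sampling
print(sample.describe())                         # 2. Explore: descriptive stats

features = sample.drop(columns=["target"])       # 3. Modify: impute missing values
X = pd.DataFrame(SimpleImputer().fit_transform(features), columns=features.columns)
y = sample["target"].reset_index(drop=True)

X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # 4. Model

auc = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1]) # 5. Assess
print(f"validation AUC: {auc:.3f}")
```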

Information Technology Management

IT strategy involves aligning overall business goals and technology investment. The first priority is for IT resources, people and functions to be planned around the overall organizational goals. For such alignment to take place, IT managers need to communicate their strategy in business terms. What makes such efforts inefficient is failing to make communication and transparency a top priority.

In many companies, funding for strategic initiatives is allocated in stages so their potential value can be reassessed between those stages.  When executives introduce a new business plan to increase market share by 15 percent with a new technology, IT managers must also meet those goals by assessing the quality of the IT infrastructure.

Executives must have confidence that the IT assets that they purchase are sound.  There must be mutual trust, visible business support, and IT staff who are part of the business problem-solving team.   All of these factors are needed to properly determine the business value of IT.

When creating an IT strategy that aligns with business objectives, five themes should be addressed: business improvement, business enabling, business opportunities, opportunity leverage and infrastructure. Research has shown that companies with a framework for making targeted investments in IT infrastructure further their overall strategic development and direction. When companies fail to make IT infrastructure investment strategic, they struggle to justify or fund it.

Communication is critical to executives and business decision makers. IT staff typically work across many organizational units and must be effective at translating technical requirements into business requirements and vice versa. Communication has become mission critical in the IT business value proposition. When deciding how to apply data analytics across the organization, IT should work with business leaders to look at the IT functional areas that produce the most data for their organization. These areas include:

  • business analysis
  • system analysis
  • data management
  • project management
  • architecture
  • application development
  • quality assurance and testing
  • infrastructure
  • application and system support
  • data center operations

IT strategies require full business integration.  When IT managers are proposing new strategies, an executive summary should be the most important part of the proposal, prototype, roadmap, technical architecture document, etc.

Along with IT system metrics, IT managers must also keep in mind business operational metrics, which are based more on labor and time. IT managers need to factor both IT and operational metrics into reports to business stakeholders. There are several ways of reporting IT strategies to the business. Key Performance Indicators (KPIs) are fundamental to business decisions and are used to correlate business performance, such as how often a transaction results in customer satisfaction. KPI examples include:

  • Efficiency rates
  • Customer satisfaction scores
  • Capacity rates
  • Incident reporting rate
  • Total penalties paid per incident

Balanced scorecards are strategic tools that align business strategy with corporate vision and goals. It’s typically not the responsibility of IT managers to build scorecards, but rather to understand the corporate balanced scorecards when building IT strategies.

Dashboards are visual representations of success, risk, status and failure of business operations.  In a very high paced organization, they allow information to be quickly disseminated and assessed by stakeholders for business decision making.  Dashboards tend to have more quantitative analysis than other types of reporting styles.

System Monitoring

Maintaining system health can be an arduous and time-consuming task for system administrators. System administration includes areas such as databases, networks, hardware and software. Aggregating large volumes of raw data can save time and help administrators respond more quickly to issues. Building analytical methods on top of such aggregated data can help determine the present and future value of systems, predict possible failures and security risks, plan budgets for new IT, maintain existing assets, and plan migrations to new platforms such as the cloud. For example, data that tracks storage area network (SAN) usage over a period of time can help create sizing requirements for new systems that will grow at similar rates.
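As a small example of the SAN sizing idea, the sketch below fits a linear trend to made-up monthly usage figures and projects capacity a year out.

```python
# Fit a linear growth trend to monthly SAN usage (made-up data, in terabytes)
# and project usage 12 months ahead for sizing purposes.
import numpy as np

months = np.arange(12)  # the past 12 months
usage_tb = np.array([20, 22, 23, 25, 28, 29, 31, 33, 36, 38, 40, 43])

slope, intercept = np.polyfit(months, usage_tb, 1)  # least-squares linear fit
projected = slope * (months[-1] + 12) + intercept   # usage one year out
print(f"growth: {slope:.1f} TB/month; projected usage in 12 months: {projected:.0f} TB")
```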

Below are examples of the type of system performance data that can be used when creating data analytics for sizing and performance analysis.

CPU utilization based on user, system, waits and idle times.

Disks: disk read kilobytes per second versus disk write kilobytes per second.

Data Analytics

In the past year, I’ve learned various methods to predict trends and detect anomalies in the data I’ve received through the operation of IT systems. IT systems constantly collect sensing and monitoring data on CPU, networking, applications, etc. that can be used to build strategies for planning IT budgets. The types of methods I used include:

Data Exploration, Cleansing and Sampling

  • Scatter Plots
  • Imputation
  • Filtering
  • Classification
  • Hypothesis Testing
  • Statistical Analysis (descriptive, process control)

Predictive Analysis

  • Logistic/Linear regression
  • Neural Network
  • Probability Distribution

Segmentation Analysis

  • Clustering
  • Association

Model Assessment, Testing and Scoring

  • ROC Charts
  • Lift Charts
  • Model Comparison
  • Data Partitioning (separating data into testing, training and validation sets)

Below are visualizations based on analytical methods I’ve deployed for information technology management. I recommend researching these methods to get a better understanding of how they work. Much of this work was performed in Microsoft Excel, SAS Enterprise Miner® and Python.


Above, linear regression based on input/output (I/O) waits and the number of disk reads.


Segmentation analysis based on number of processes to CPU utilization rates for various UNIX systems.


Statistical process control (SPC) Shewhart analysis of process elapsed time in seconds.


Above, a receiver operating characteristic (ROC) curve plots the true positive rate against the false positive rate for points in a diagnostic test. A ROC curve can diagnose the performance of a model. The baseline is linear, and each model curve demonstrates the trade-off between sensitivity and specificity; more accurate models have curves that follow the left side of the chart toward the upper border. As in the model assessment tool, the data is partitioned into training and validation sets, and the models for each set are assessed for predictability.
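For anyone who wants to reproduce this kind of chart outside SAS, here is a sketch that builds a ROC curve with scikit-learn and matplotlib on synthetic data.

```python
# Plot a ROC curve for a logistic regression model on synthetic data.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
fpr, tpr, _ = roc_curve(y_valid, model.predict_proba(X_valid)[:, 1])

plt.plot(fpr, tpr, label=f"model (AUC = {auc(fpr, tpr):.2f})")
plt.plot([0, 1], [0, 1], "--", label="baseline")  # linear baseline (chance)
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```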


Model scoring for logistic regression


Model comparison using cumulative lift (training and validation data). Lift measures the effectiveness of a predictive model by comparing the results obtained with and without the model.

Again, I strongly recommend researching these techniques, since there are many super intelligent people out there whose work I consult. Also, if there is anything I’ve mentioned that is incorrect, please comment.

Recommendations

Below are guidelines and recommendations on how IT departments and IT managers can leverage business and data analytics to drive IT value proposition.

Determine important business metrics and create a metric measurement plan.

IT managers must understand which metrics are most important for their business. Start with a strong understanding of business scorecards and key performance indicators. This goes beyond just understanding an organization’s goals and objectives. IT system metrics are principally designed for IT managers and IT staff; the business understands operational metrics. When deciding which metrics to collect, focus specifically on business-level KPIs and balanced scorecards. Understanding what the business wants will drive all further actions in creating IT value for the business. Create a metric measurement plan that formalizes the process and nomenclature of measuring IT metrics, including a process for applying them to business functions.

Create categories for metrics.

Specify categories of metrics to communicate, including operational metrics, KPIs, dashboards, tolerance thresholds and analytical metrics.

Operational metrics include basic observations in the IT management of specific business functional areas, and are typically revised to pair operational metrics with analytical metrics. Types of operational metrics include measurements of functional area incidents, including labor and time allocation for those incidents. These metrics tend to be non-technical in nature but have a definite impact on IT management.

Analytical metrics are used for statistical analysis, forecasting, prediction and segmentation. The data collected for these metrics is typically produced by IT systems.

Tolerance threshold metrics measure tolerances of KPI values. Tolerance is very similar to the control chart example in the preceding section, except it is used more for business-level control limits.

Key Performance Indicators are perhaps the most important way of communicating metrics to business stakeholders.

Build a management report.

Incident management tracks specific events that deviate from the business and operational efficiency of an organization. Adverse server events can have a huge business and operational impact, and less empirical incidents, such as server performance issues and slow application response times, can also play a role in adverse events. Incident management can include operational metrics and KPIs. For example, the following list describes the types of incidents reported:

  • Total number of incidents
  • Average time to resolve severity 1 and severity 2 incidents
  • Number of incidents with customer impact
  • Incident management labor hours
  • Total available hours to work on incidents
  • Total labor hours to resolve incidents

Data analytics can provide supportive evidence of how an incident occurred. More importantly, it can help reduce major incidents by lowering incident costs and time and by improving KPI values. Data analytics is typically not appropriate in an incident report itself; however, it allows IT managers to report mitigation and risk factors by rating the level of risk these incidents pose to the business. Analytics can provide more insight into risk management and mitigation.

As mentioned earlier, data analytics can provide supportive evidence of how an incident occurred, but it can also be used to build a risk management plan and scoring system. Since analytics provides huge benefits to IT managers regarding the health of systems and operations, having such information can help lower risks from incidents by allowing IT personnel to respond to problems faster and even predict problems before they occur. This in turn helps improve the KPIs in incident management reporting. Since KPIs work on a scoring system, IT staff can produce calculations based in part on values produced from such analytics. For example, for metrics A, B, and C, operational KPI scores can be established through the use of proportionality. The table below demonstrates the use of IT metrics in establishing KPI scores (an illustrative calculation follows the table).

Reference Number | KPI | Calculation
1 | Number of system incidents | B/A
2 | Number of network incidents | C/A
3 | Incident resolution rate | (B/A + C/A)

Example of how KPIs are critical to managing and controlling Incident Management.
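The arithmetic is straightforward; the sketch below computes the three scores, assuming A is the total number of incidents and B and C are the system and network incident counts (the counts themselves are made up).

```python
# Proportional KPI scores from the table above; A, B, C are made-up counts.
A, B, C = 120, 30, 45  # total, system, and network incidents for the period

kpis = {
    "Number of system incidents": B / A,
    "Number of network incidents": C / A,
    "Incident resolution rate": B / A + C / A,
}
for name, score in kpis.items():
    print(f"{name}: {score:.2f}")
```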

Incident management is just one type of management system that can be built for metric categories where communication with the business on metrics should occur. Other management systems include:

  • Event management
  • Access management
  • Service desk management
  • Change management
  • Release management
  • Configuration management
  • Service level management
  • Availability management
  • Capacity management
  • Continuity management
  • IT financial management

Build an IT governance program for IT business communication.

Having a data and IT governance program will ensure that data is verified and accurate before being sent to executives. Establishing such a program gives formal assurance that information provided by IT comes from validated sources, has been approved, and has accountability.

Communicate effectively with executives with an executive summary and report.

As mentioned earlier, effective and regular communication will help ensure that IT managers will receive proper feedback, align with the business and prevent unexpected surprises when budget time arrives.

Give executives something to be excited about.

Business executives do not respond well to complex technical details. Contrary to popular belief, very few people, especially those in executive and mid-level positions, are impressed by wordy technical details about system architecture and applications. They need high-level examples that show how the business will grow and achieve project goals using IT management for a business function. This can include bar charts or diagrams, but they must be business related and clearly indicate how they would achieve business objectives.

Propose a well-planned budget.

A well-planned budget consists of replacement costs, unplanned purchases, recurring costs and year-round expense tracking. It’s important to have a complete budget that builds out the solution for current and new architecture with an evaluation of the cost differences.

Executives will always ask for more clarity and more relevance.

An IT team may have worked many hours to produce a clean, bound and laminated report delivered with precious care and a bow to business executives, and still it can be rejected, scrutinized or sent back for clarification. This is normal and to be expected. It is important for IT managers to keep in mind that the goal is always to provide the most factual and relevant information to business decision-makers.

Blog includes excerpts from Analytical Properties of Data-Driven Systems and Its Uses in Information Technology Management, University of North Carolina at Greensboro Bryan School of Business and Economics, Department of Information Systems and Supply Chain Management, ISM 698-01D, 2016.