Excerpt of a submission to the Southern Data Science Conference, which will be held in Atlanta, GA. This proposal is the first in a series of proposals in IoT and Analytics research that will be posted on Data Flux. For more info go to https://www.southerndatascience.com/submission-guidline
Internet-enabled devices and the Internet of Things (IoT) are becoming a major component of networked computing systems. Such systems leverage “big data” processes that collect, clean, analyze and model large data streams. This project demonstrates techniques and strategies for maintaining baselines for IoT system performance metrics. Statistics and probability are fundamental to statistical process control (SPC) and quality improvement in engineering systems. Machine learning (ML) can find anomalies and patterns in the performance of IoT systems by using large datasets to learn and predict events. The purpose of this project is to compare and contrast these two strategies qualitatively and quantitatively while providing guidance for IoT system optimization and monitoring.
The project uses statistical techniques such as normalization, hypothesis testing and error minimization, and ML strategies such as regression modeling, neural networks and classification. Business applications for this project include system sizing, system health checks, and baseline performance monitoring. IoT systems must meet business as well as technical requirements to perform in the real world. This project analyzes a series of metrics across multiple layers in an IoT architecture. The Open Systems Interconnection model (OSI model) will serve as the framework for dividing the IoT architecture into layers. Quantitative and qualitative analysis of results will allow businesses to determine the scale, performance, accessibility and availability of these networked systems.
PROBLEM AND MOTIVATION
Rapid advancements in IoT and “big data” analytics have created opportunities in performance measurement of multi-tiered architectures. These architectures use a variety of platforms, including physical, virtual, and cloud, to deliver a complete end-to-end business solution. As the market to industrialize IoT platforms continues to expand, information technology (IT) systems will play a crucial role in collecting, aggregating, and analyzing data from these new endpoints. IT and business will need to become more aligned in corporate practices and strategies with IoT. IT managers, in turn, will need to rely on analytics-based system performance models that demonstrate system capabilities in order to satisfy service-level and reliability requirements.
Information systems log and monitor all aspects of utilization, throughput, resource management and user access. Evaluating and modeling performance will require benchmarks for IoT components such as internet-connected physical endpoints, cloud-based services, aggregation systems, networks and collection systems. This latest effort compares SPC and ML models that extend beyond basic performance metrics for utilization, throughput and resource management to areas such as anomaly detection, process control, and forecasting.
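As a rough illustration of the SPC side of that comparison, the sketch below derives control limits from a baseline period and flags later readings that fall outside them. The readings, the function names (`control_limits`, `out_of_control`) and the three-sigma multiplier are assumptions made for this example, not values or code from the project. It also uses the plain sample standard deviation, a simplification of the moving-range estimate often used for individuals charts.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) control limits from baseline samples.

    k is the sigma multiplier; 3.0 is a common default, assumed here.
    """
    center = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the baseline
    return center - k * sigma, center, center + k * sigma


def out_of_control(samples, limits):
    """Return the indices of samples that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


# Hypothetical CPU-utilization readings (percent) from a stable period.
baseline = [41.0, 43.5, 39.8, 42.1, 40.7, 44.0, 41.9, 40.2, 42.8, 41.4]
limits = control_limits(baseline)

# New readings scored against the baseline; 78.3 is a deliberate spike.
new_readings = [42.0, 40.9, 78.3, 41.7]
flagged = out_of_control(new_readings, limits)
print(flagged)
```

The same pattern would apply to any of the metrics named above (throughput, resource usage, access counts), with one set of limits maintained per metric.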
APPROACH AND UNIQUENESS
This project collected IT system performance data from network monitoring tools, database monitoring tools, web logs, file system logs, and sensors and Internet-enabled devices in large multi-tiered systems to demonstrate IoT systems. The project approach includes:
- Build ML and SPC models using IoT system performance data in Azure ML Studio, SAS Enterprise Miner and Python Scikit-Learn.
- Train and score ML and SPC models.
- Build an IoT prototype system using Raspberry Pi, Microsoft Azure IoT Suite, and distributed parallel processing in Python.
- Train and score models for prototype performance data.
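
The train-and-score steps above can be sketched in miniature as follows. This is only an assumption-laden illustration: it swaps in a hand-written least-squares line fit from the Python standard library in place of the Azure ML Studio, SAS Enterprise Miner and scikit-learn models named in the list, and the message-rate and latency figures are invented for the example. The helper names `fit_line` and `rmse` are likewise hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Train: ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rmse(model, xs, ys):
    """Score: root-mean-square error of the model on the given data."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(squared) / len(squared)) ** 0.5


# Hypothetical training data: messages per second vs. latency in ms.
train_load = [100, 200, 300, 400, 500]
train_latency = [12.1, 14.0, 15.9, 18.2, 19.8]

# Hypothetical held-out data, e.g. collected later from a prototype.
test_load = [600, 700]
test_latency = [22.0, 24.1]

model = fit_line(train_load, train_latency)
error = rmse(model, test_load, test_latency)
print(f"slope={model[0]:.4f} intercept={model[1]:.2f} rmse={error:.3f}")
```

Scoring on held-out data rather than on the training set is what makes the resulting error usable for comparing an ML model against an SPC baseline fitted on the same period.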