Accelerating the
Industrial Internet of Things

Advanced Edge Analytics, IOT & More...

Published on 11/17/2016 | Operations


Gus Segura

Director, Principal Data Science Engineer, Big Data, Cloud Architecture, Analytics, Business Intelligence, Data Warehouse, Data Mining, Recommendation, Relevance, Sentiment, Classification, Clustering, Churn, Retention, Marketing, Monetize, Channel, Attribution, ROI, Financial, Banking, Time Value Money, Risk, Fraud, Statistics, Session Event, Machine Learning, Algorithms, Metrics, Customer Segmentation, Advertising, Incentive grouping, HR, Insurance, Applied Mathematics, Data Model, Data Quality, ETL, Modeling, Data Conditioning, Security: SIEM, HIPAA, ePHI, MOLAP/OLAP/ROLAP, Reporting, Visualization. AWS Solution Architect, MS Azure, HPE Proliant / Moonshot, HPE OneView, ArcSight, Op Analytics. Salesforce, PeopleSoft. TPM, Agile, Scrum, CSM, PSM.



What is Edge Analytics? Edge analytics is an approach to data collection and analysis in which an automated analytical computation is performed on data at a sensor, network switch or other device instead of waiting for the data to be sent back to a centralized data store.  

Edge Analytics is normally associated with IoT. That's changing: we will also need to include the more traditional types of data sources not normally linked to an IoT device. There are limitations and considerations. The cloud is still a "Big" part of the solution. However, you need to be creative with your architecture, analytics and feature requirements.

Technology Stack

Does your stack change? Hopefully, not much. If you're lucky (or if you have planned appropriately), most of the code you have written specifically for the cloud will run in an edge system. A few years ago, we decided: "Where it makes sense, we need to make sure that the solutions we build to scale very well in the cloud also scale down into a micro environment. Again, sensibly."

We also knew that we would balance ... some things in the micro environment. Yet what we gained in the ability to scale in both directions was incalculable and of great benefit. We started investing more in technologies like Docker, VMware, MongoDB, Redis and JavaScript (as a lightweight analytics language).

Simple Example

Consider an IoT device (Temperature: 98.6) streaming [via Kafka or MQTT] -> Redis, memcache or MongoDB. Pause there and think about the great metrics we can gather: min, max, average, standard deviation, variance, rolling average, maybe even a boolean [alarm when a reading falls outside the rolling average], and more.

[Note: Some of the smarter devices will store min, max and average at the device, but over what time window? ]
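The metrics listed above can be kept up to date one reading at a time, without storing the full history. The following is a minimal sketch, not a reference implementation: it assumes readings arrive as plain floats (the Kafka or MQTT consumer is left out), uses Welford's method for the variance, and treats the `window` size and the `alarm_delta` threshold as hypothetical tuning parameters.

```python
import math
from collections import deque


class RunningStats:
    """Streaming min, max, mean, variance, and a rolling mean over recent readings."""

    def __init__(self, window=5, alarm_delta=2.0):
        self.count = 0
        self.minimum = math.inf
        self.maximum = -math.inf
        self.mean = 0.0
        self._m2 = 0.0                       # running sum of squared deviations (Welford)
        self.recent = deque(maxlen=window)   # last `window` readings only
        self.alarm_delta = alarm_delta       # hypothetical threshold, tune per device

    @property
    def rolling_mean(self):
        # NaN until at least one reading has been seen.
        return sum(self.recent) / len(self.recent) if self.recent else math.nan

    @property
    def variance(self):
        # Population variance of every reading seen so far.
        return self._m2 / self.count if self.count else 0.0

    @property
    def stddev(self):
        return math.sqrt(self.variance)

    def update(self, value):
        """Fold in one reading; return True if it strays from the rolling mean."""
        # Compare against the rolling mean of the readings seen before this one.
        alarm = bool(self.recent) and abs(value - self.rolling_mean) > self.alarm_delta
        self.count += 1
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)
        self.recent.append(value)
        return alarm


stats = RunningStats(window=3, alarm_delta=2.0)
alarms = [stats.update(t) for t in [98.6, 98.7, 98.5, 103.0]]
```

Only the last reading (103.0) raises the alarm here, because it sits more than 2.0 away from the rolling mean of the three readings before it. The state is a handful of numbers plus a small buffer, which is what makes this kind of computation a plausible fit for a constrained edge device.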

Complex analytics at or very near IOT Device

What if we wanted something like a rolling average over a custom time interval? [When you're doing root cause analytics, you will normally scale the time window up and down, depending on where and how often the failure has occurred. This could require considerable resources to re-compute.]
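As a rough illustration of a rolling average over a custom time interval, the sketch below keeps timestamped samples in a queue and evicts anything older than the chosen window. It assumes timestamps are in seconds and arrive in non-decreasing order; the class and function names are hypothetical. Re-running the same buffered samples with a different `window_seconds` is the "scale time up and down" step, and it shows why the raw samples must still be available locally for that to work.

```python
from collections import deque


class TimeWindowAverage:
    """Rolling average over the last `window_seconds` of timestamped readings."""

    def __init__(self, window_seconds):
        if window_seconds <= 0:
            raise ValueError("window_seconds must be positive")
        self.window_seconds = window_seconds
        self._samples = deque()   # (timestamp, value) pairs, oldest first
        self._total = 0.0         # sum of the values currently in the window

    def add(self, timestamp, value):
        """Add one reading and return the average over the current window."""
        self._samples.append((timestamp, value))
        self._total += value
        cutoff = timestamp - self.window_seconds
        # Evict samples at or before the cutoff; the newest sample always stays.
        while self._samples[0][0] <= cutoff:
            _, old_value = self._samples.popleft()
            self._total -= old_value
        return self._total / len(self._samples)


def rolling_averages(samples, window_seconds):
    """Recompute the rolling average series for a different window size."""
    window = TimeWindowAverage(window_seconds)
    return [window.add(ts, value) for ts, value in samples]


readings = [(0, 10.0), (10, 20.0), (20, 30.0), (30, 40.0)]
short = rolling_averages(readings, 15)   # narrow window: reacts quickly
long = rolling_averages(readings, 60)    # wide window: smooths more
```

The running total avoids re-summing the window on every reading, at the cost of slow floating-point drift on very long streams; periodically re-summing the queue would be one way to correct for that.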

Are your data transfer costs out of control? Have you considered filtering garbage data out before the transfer? Do you need to process something in the sub-second response domain [high-speed testing]?
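Filtering garbage data before the transfer can be as simple as a validity check applied to each batch at the edge. The sketch below is one possible shape for that check, under stated assumptions: readings are dictionaries with a `temperature` field, and the plausible range (`low`, `high`) is a made-up example that would depend on the actual sensor.

```python
import math


def is_valid(reading, low=-40.0, high=150.0):
    """Return True if the reading carries a finite temperature within range."""
    value = reading.get("temperature")
    # Reject missing values, non-numbers, and booleans (which are ints in Python).
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value) and low <= value <= high


def filter_batch(readings, low=-40.0, high=150.0):
    """Split a batch into the readings worth sending and a count of dropped ones."""
    kept = [r for r in readings if is_valid(r, low, high)]
    return kept, len(readings) - len(kept)


batch = [
    {"temperature": 98.6},
    {"temperature": None},          # sensor returned nothing
    {"temperature": float("nan")},  # corrupted value
    {"temperature": 9999.0},        # outside the plausible range
    {},                             # field missing entirely
    {"temperature": 99.1},
]
kept, dropped = filter_batch(batch)
```

Sending the `dropped` count along with the kept readings is a cheap way to let the cloud side notice a misbehaving sensor without paying to transfer the bad data itself.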

All these analytics can be, and are, performed in the cloud. Yet what if you had the power to move some of them closer to the device and act on the data earlier, before a data collection, aggregation and analytics step that includes "All" of your data from "Everywhere"?

Limitations - Considerations

Yes, you can perform complex analytics at the edge, including running R and Python data science tools. The data sets on the edge are tiny compared to the big picture, which may include constantly changing data from 1000s of devices across your complete architecture. The code that performs the analysis should run fast and produce results as quickly as the data changes. Yet that doesn't mean you have to use a completely different tool set. You should be able to migrate a MapReduce, PySpark, R, etc. job to the edge if you're careful about the datasets and limitations.

The following are a few limitations of Edge Analytics to consider:

Data Sets – The data sets are local to the field devices and NOT global.

Compute – There are typically limitations on CPU [GPU Acceleration].

Memory – Usually finite on the edge network resource.

Network bandwidth – Device Messaging vs. Analytics Interaction.

and more …

“Tread Lightly in both Directions”... You would not want to saturate your edge network at the expense of losing a critical data message from a device. You need to balance resources carefully. Note: You can always access the aggregate metrics from the cloud if your computation needs that resource.

Final Thoughts

Edge Analytics is not for everyone. A lot of IoT projects are just fine sending everything to the Cloud and using traditional analytics. However, is your action window smaller? Are you collocated with your production? Do you have constraints or issues with resources, data corruption, security and others? If so, you may want to consider moving some of the traditional "Big Data Analytics" from the cloud to the Edge.

This article was originally posted on LinkedIn.
