This document describes how the Simple IoT project fulfills the basic requirements as described in the top level README.
IoT systems are inherently distributed, with data that needs to be synchronized between a number of different systems, including:

- cloud instances
- edge devices
- browser and mobile user interfaces

Typically, the cloud instance stores all the system data, and the edge, browser, and mobile devices access a subset of it.
Any `siot` app can function as a standalone, client, server, or both. As an example, `siot` can function both as an edge app (client) and a cloud app (server).
We also need the concept of a lean client where an effort is made to minimize the application size to facilitate updates over IoT cellular networks where data is expensive.
In an IoT system, data from sensors is continually streaming, so we need some type of messaging system to transfer the data between various instances in the system. This project uses NATS.io for messaging. Some reasons:
For systems that only need to send one value several times a day, CoAP is probably a better solution than NATS. Initially we are focusing on systems that send more data -- perhaps 5-30MB/month. There is no reason we can't support CoAP as well in the future.
Where possible, modifying data (especially nodes) should be initiated over NATS vs. direct DB calls. This ensures anything in the system can have visibility into data changes. Eventually, we may want to hide DB operations that do writes to force them to be initiated through a NATS message.
As we work on IoT systems, data structures (types) tend to emerge. Common data structures allow us to develop common algorithms and mechanisms to process data. Instead of defining a new data type for each type of sensor, define one type that will work with all sensors. Then the storage (both static and time-series), synchronization, charting, and rule logic can stay the same, and adding functionality to the system typically only involves changing the edge application and the frontend UI. Everything between these two endpoints can stay the same. This is a very powerful and flexible model, as it is trivial to support new sensors and applications.
The core data structures are currently defined in the `data` directory for Go code, and the `frontend/src/Data` directory for Elm code. The fundamental data structures for the system are `Nodes`, `Points`, and `Edges`. A `Node` can have one or more `Points`. A `Point` can represent a sensor value or a configuration parameter for the node. With sensor values and configuration represented as `Points`, it becomes easy to use both sensor data and configuration in rules or equations because the mechanism to use both is the same. Additionally, if all `Point` changes are recorded in a time series database (for instance InfluxDB), you automatically have a record of all configuration and sensor changes for a `Node`.
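As an illustration, the core types might look roughly like the following Go sketch. This is a simplified assumption for this document; the authoritative definitions in the `data` directory contain additional fields.

```go
package data

import "time"

// Point is a sample of sensor data or a configuration parameter.
// Using one structure for both means rules, storage, and sync logic
// do not need to special-case configuration vs. sensor values.
type Point struct {
	Type  string    // what the point represents, e.g. "temperature" or "description"
	Key   string    // optional key to distinguish points of the same type
	Value float64   // numeric payload (sensor reading, setting, boolean as 0/1)
	Text  string    // textual payload (names, descriptions, etc.)
	Time  time.Time // when the point was last updated
}

// Node is a device, user, rule, group, or other entity in the system.
type Node struct {
	ID     string  // unique ID (serial number or UUID)
	Type   string  // node type, e.g. "device", "user", "rule", "group"
	Points []Point // sensor values and configuration for this node
}

// Edge describes a parent/child relationship between two nodes,
// allowing nodes to be arranged in a graph/hierarchy.
type Edge struct {
	ID     string  // unique ID for the edge
	Up     string  // ID of the parent node
	Down   string  // ID of the child node
	Points []Point // metadata further describing the relationship
	Hash   []byte  // hash used to detect differences during synchronization
}
```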
Treating most data as `Points` also has another benefit in that we can easily simulate a device -- simply provide a UI or write a program to modify any point, and we can shift from working on real data to simulating scenarios we want to test.
`Edges` are used to describe the relationships between nodes as a graph. Nodes can have parents or children and thus be represented in a hierarchy. To add structure to the system, you simply add nested `Nodes`. The `Node` hierarchy can represent the physical structure of the system, or it could also contain virtual `Nodes`. These virtual nodes could contain logic to process data from sensors. Several examples of virtual nodes:

- a `Node` that converts motor current readings into pump events

Edges can also contain metadata (`Value`, `Text` fields) that further describe the relationship between nodes. Some examples:
Being able to arrange nodes in an arbitrary hierarchy also opens up some interesting possibilities, such as creating virtual nodes that have a number of children that are collecting data. The parent virtual nodes could have rules or logic that operate off data from child nodes. In this case, the virtual parent nodes might be a town or city, service provider, etc., and the child nodes are physical edge nodes collecting data, users, etc.
The same Simple IoT application can run in both the cloud and device instances. The node tree in a device would then become a subset of the nodes in the cloud instance. Changes can be made to nodes in either the cloud or device, and data is synchronized in both directions.
The following diagram illustrates how nodes might be arranged in a typical system.
A few notes about this structure of data:
The distributed parts of the system include the following instances:
As this is a distributed system where nodes may be created on any number of connected systems, node IDs need to be unique. A unique serial number or UUID is recommended.
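For example, a node ID could be generated with a UUID library (the `github.com/google/uuid` package is used here purely as an illustration; a guaranteed-unique hardware serial number works just as well):

```go
package main

import (
	"fmt"

	"github.com/google/uuid"
)

func main() {
	// Generate a random (version 4) UUID to use as a node ID. Devices
	// with a guaranteed-unique serial number could use that instead.
	nodeID := uuid.New().String()
	fmt.Println("new node ID:", nodeID)
}
```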
See research for information on techniques that may be applicable to this problem.
Typically, configuration is modified through a user interface either in the cloud, or with a local UI (ex: touchscreen LCD) at an edge device. Rules may also eventually change values that need to be synchronized. As mentioned above, the configuration of a `Node` will be stored as `Points`. Typically the UI for a node will present fields for the needed configuration based on the `Node` `Type`, whether it be a user, rule, group, edge device, etc.
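For example, a temperature sensor node's configuration might be represented as points like the following sketch (the point type names are assumptions for illustration, and the `Point` type is the one sketched earlier):

```go
// exampleConfig returns illustrative configuration for a temperature sensor
// node. The point type names ("description", "period") are made up for this
// example.
func exampleConfig() []Point {
	return []Point{
		// Human-readable name shown in the UI.
		{Type: "description", Text: "Pump house temperature sensor", Time: time.Now()},
		// Sample period in seconds.
		{Type: "period", Value: 60, Time: time.Now()},
	}
}
```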
In the system, the `Node` configuration will be relatively static, but the points in a node may change often as sensor values change, thus we need to optimize for efficient synchronization of points. We can't afford the bandwidth to send the entire node data structure any time something changes.
As IoT systems are fundamentally distributed systems, the question of synchronization needs to be considered. The client (edge), server (cloud), and UI (frontend) can all be considered independent systems and can make changes to the same node.
Although multiple systems may be updating a node at the same time, it is very rare that multiple systems will update the same node point at the same time. The reason for this is that a point typically only has one source. A sensor point will only be updated by an edge device that has the sensor. A configuration parameter will only be updated by a user, and there are relatively few admin users, and so on. Because of this, we can assume there will rarely be collisions in individual point changes, and thus this issue can be ignored. The point with the latest timestamp is the version to use.
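A minimal sketch of this last-write-wins merge, assuming the `Point` type from the earlier sketch and that points are identified by their `Type` and `Key`:

```go
// mergePoint applies an incoming point to a node's existing points using
// last-write-wins: for a given Type/Key, the point with the newest
// timestamp is kept.
func mergePoint(points []Point, in Point) []Point {
	for i, p := range points {
		if p.Type == in.Type && p.Key == in.Key {
			// Only replace the stored point if the incoming one is newer.
			if in.Time.After(p.Time) {
				points[i] = in
			}
			return points
		}
	}
	// No existing point of this type/key -- add it.
	return append(points, in)
}
```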
Point changes are handled by sending points to a NATS topic for a node any time it changes. There are three primary instance types:
With Point Synchronization, each instance is responsible for updating the node data in its local store.
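A rough sketch of what publishing a point change over NATS could look like with the `nats.go` client. The subject name (`node.<id>.points`) and the JSON encoding are assumptions for illustration; the actual Simple IoT subjects and encoding may differ.

```go
package main

import (
	"encoding/json"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

// Point mirrors the simplified sketch above.
type Point struct {
	Type  string    `json:"type"`
	Value float64   `json:"value"`
	Time  time.Time `json:"time"`
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Close()

	nodeID := "123e4567-e89b-12d3-a456-426614174000" // example node ID

	p := Point{Type: "temperature", Value: 23.5, Time: time.Now()}
	payload, err := json.Marshal(p)
	if err != nil {
		log.Fatal(err)
	}

	// Publish the point change on a per-node subject. Any instance
	// interested in this node can subscribe and update its local store.
	if err := nc.Publish("node."+nodeID+".points", payload); err != nil {
		log.Fatal(err)
	}
}
```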
Sending points over NATS will handle 99% of data synchronization needs, but there are a few cases this does not cover:
There are two types of data: sample data (sensor values that may change frequently) and config data (settings that change relatively rarely).
Any node that produces sample data should send values every 10m, even if the value is not changing. There are several reasons for this:
Config data is not sent periodically. To manage synchronization of config data, each `edge` will have a `Hash` field. The edge `Hash` field is a hash of:

- the node points (except for repetitive or high-rate sample points)
- the child edge `Hash` fields

We store the hash in the `edge` structures because nodes (such as users) can exist in multiple places in the tree.
The points are sorted by timestamp and child nodes are sorted by hash so that the order is consistent when the hash is computed.
This is essentially a Merkle Tree -- see research.
Comparing the node `Hash` field allows us to detect node differences. We then compare the node points and child nodes to determine the actual differences. Any time a node point (except for sample data) is modified, the node's `Hash` field is updated, and the `Hash` fields in parents, grand-parents, etc. are also computed and updated. This may seem like a lot of overhead, but if the database is local and the graph is reasonably constructed, then each update might require reading a dozen or so nodes and perhaps writing 3-5 nodes. Additionally, non-sample-data changes are relatively infrequent.
Initially, synchronization between edge and cloud instances is supported. The edge device will contain an "upstream" node that defines a connection to another instance's NATS server -- typically in the cloud. The edge node is responsible for synchronizing all state using the following algorithm:
The md5 algorithm is used to compute the hash fields because it is relatively efficient to compute and reasonably small. While sha256 might be more secure, the application of this hash is not security, but rather verifiability. The failure mode is that two different trees will generate the same hash. Because the root hash is always changing, this is not really a problem, as the next change to the tree will likely trigger a new hash -- usually within a short amount of time.
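A sketch of how an edge hash might be computed, following the sorting rules described above. The field names and hashing details are assumptions for illustration, not the actual implementation.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
	"sort"
	"time"
)

// Point is a minimal assumed shape for this illustration.
type Point struct {
	Type string
	Time time.Time
}

// edgeHash computes an md5 hash over the node's point timestamps (sorted by
// time) and the hashes of child edges (sorted by hash value) so the result
// does not depend on storage order.
func edgeHash(points []Point, childHashes [][]byte) []byte {
	sort.Slice(points, func(i, j int) bool {
		return points[i].Time.Before(points[j].Time)
	})
	sort.Slice(childHashes, func(i, j int) bool {
		return string(childHashes[i]) < string(childHashes[j])
	})

	h := md5.New()
	buf := make([]byte, 8)
	for _, p := range points {
		binary.BigEndian.PutUint64(buf, uint64(p.Time.UnixNano()))
		h.Write(buf)
		h.Write([]byte(p.Type))
	}
	for _, ch := range childHashes {
		h.Write(ch)
	}
	return h.Sum(nil)
}

func main() {
	pts := []Point{{Type: "description", Time: time.Now()}}
	fmt.Printf("edge hash: %x\n", edgeHash(pts, nil))
}
```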
Node additions are detected in real-time by sending the points for the new node as well as points for the edge that adds the node to the tree.
Node copies are similar to an add, but only the edge points are sent.
Node deletions are recorded by setting a tombstone point in the edge above the node to true. If a node is deleted, this information needs to be recorded, otherwise the synchronization process will simply re-create the deleted node if it exists on another instance.
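As a sketch, a deletion could be recorded by adding a tombstone point to the edge above the node (the point type name "tombstone" and the boolean-as-value encoding are assumptions for illustration, using the types sketched earlier):

```go
// markDeleted records a node deletion by setting a tombstone point on the
// edge above the node. Synchronization then propagates the tombstone rather
// than re-creating the node from another instance.
func markDeleted(e *Edge) {
	e.Points = append(e.Points, Point{
		Type:  "tombstone", // illustrative type name
		Value: 1,           // 1 = deleted, 0 = present
		Time:  time.Now(),
	})
}
```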
Move is just a combination of Copy and Delete.
If any real-time data is lost in any of the above operations, the catch-up synchronization will propagate any node changes.
Much of the frontend architecture is already defined by the Elm architecture. The current frontend is based on the elm-spa.dev project, which defines the data/page model. Data is fetched using REST APIs, but eventually we would like to use the same synchronization method that is used in edge devices to make the web UI more real-time.
We'd like to keep the UI optimistic if possible.