Data

Data Structures

As a client developer, you will mainly work with two data structures: NodeEdge and Point. A Node can be considered a collection of Points.

These data structures describe most data that is stored and transferred in a Simple IoT system.

The core data structures are currently defined in the data directory for Go code, and frontend/src/Api directory for Elm code.
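For orientation, here is a rough Go sketch of the general shape of these two types. The field set is a simplified assumption for illustration only. The real definitions in the data directory carry additional fields and may name them differently.

```go
package main

import (
	"fmt"
	"time"
)

// Point is a simplified, hypothetical stand-in for a Simple IoT point.
type Point struct {
	Type   string    // what the point represents, e.g. "temperature" or "description"
	Key    string    // distinguishes multiple points of the same Type
	Time   time.Time // when the point was generated
	Value  float64   // numeric payload (sensor readings, numeric settings)
	Text   string    // text payload (names, descriptions)
	Origin string    // who generated the point; blank means the owning node
}

// NodeEdge is a simplified, hypothetical stand-in that pairs a node's points
// with the points on the edge linking it to one parent.
type NodeEdge struct {
	ID         string
	Type       string
	Parent     string
	Points     []Point // describe the node itself
	EdgePoints []Point // describe the node's relationship to Parent
}

func main() {
	now := time.Now()
	n := NodeEdge{
		ID:     "sensor-1",
		Type:   "device",
		Parent: "root",
		Points: []Point{
			{Type: "description", Text: "boiler room", Time: now},
			{Type: "temperature", Value: 21.5, Time: now},
		},
		EdgePoints: []Point{{Type: "tombstone", Value: 0, Time: now}},
	}
	fmt.Printf("%s has %d points and %d edge points\n", n.ID, len(n.Points), len(n.EdgePoints))
}
```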

A Point can represent a sensor value or a configuration parameter for the node. Because sensor values and configuration are both represented as Points, it is easy to use either in rules or equations: the mechanism for accessing them is the same. Additionally, if all Point changes are recorded in a time-series database (for instance, InfluxDB), you automatically have a record of all configuration and sensor changes for a node.
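To illustrate the point about rules, the sketch below looks up a sensor reading and a configuration setpoint through the same helper. The point type names temperature and tempLimit, the latest helper, and the overTemp rule are all hypothetical names chosen for this example.

```go
package main

import "fmt"

// Point is a minimal, hypothetical stand-in for a Simple IoT point.
type Point struct {
	Type  string
	Value float64
}

// latest returns the value of the newest point of the given type.
// Points are assumed to be ordered oldest to newest.
func latest(points []Point, typ string) (float64, bool) {
	for i := len(points) - 1; i >= 0; i-- {
		if points[i].Type == typ {
			return points[i].Value, true
		}
	}
	return 0, false
}

// overTemp compares a sensor reading against a configuration parameter.
// Both are fetched the same way because both are Points.
func overTemp(points []Point) bool {
	temp, okTemp := latest(points, "temperature") // sensor value
	limit, okLimit := latest(points, "tempLimit") // configuration parameter
	return okTemp && okLimit && temp > limit
}

func main() {
	points := []Point{
		{Type: "tempLimit", Value: 80},
		{Type: "temperature", Value: 72},
		{Type: "temperature", Value: 85},
	}
	fmt.Println("over temperature:", overTemp(points))
}
```

If every Point is also appended to a time-series store, the same slice of points doubles as a history of both configuration and sensor changes.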

Treating most data as Points has another benefit: we can easily simulate a device. Simply provide a UI or write a program that modifies any point, and we can shift from working on real data to simulating scenarios we want to test.
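A simulator is just another producer of Points. This sketch generates a ramp of temperature points that downstream logic could consume exactly as it would consume real readings. The simulateRamp function and the Origin value "sim" are hypothetical choices for illustration.

```go
package main

import "fmt"

// Point is a minimal, hypothetical stand-in for a Simple IoT point.
type Point struct {
	Type   string
	Value  float64
	Origin string
}

// simulateRamp produces n temperature points rising from start by step.
// Setting Origin marks these points as simulated in any recorded history.
func simulateRamp(start, step float64, n int) []Point {
	points := make([]Point, 0, n)
	for i := 0; i < n; i++ {
		points = append(points, Point{
			Type:   "temperature",
			Value:  start + step*float64(i),
			Origin: "sim",
		})
	}
	return points
}

func main() {
	for _, p := range simulateRamp(20, 2.5, 4) {
		fmt.Printf("%s=%.1f (origin %s)\n", p.Type, p.Value, p.Origin)
	}
}
```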

Edges are used to describe the relationships between nodes as a directed acyclic graph.

[Figure: dag]
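Because edges carry the parent/child relationships, the hierarchy can be rebuilt from a list of edges. The following is a rough sketch with a hypothetical Edge type. Since a node may appear under more than one parent (for example, a user in two groups), the walk tracks visited nodes so that shared nodes are reported once.

```go
package main

import (
	"fmt"
	"sort"
)

// Edge is a hypothetical parent-to-child link.
type Edge struct {
	Parent string
	Child  string
}

// descendants returns every node reachable below root, each listed once,
// sorted for stable output.
func descendants(edges []Edge, root string) []string {
	children := map[string][]string{}
	for _, e := range edges {
		children[e.Parent] = append(children[e.Parent], e.Child)
	}
	seen := map[string]bool{}
	var walk func(id string)
	walk = func(id string) {
		for _, c := range children[id] {
			if !seen[c] {
				seen[c] = true
				walk(c)
			}
		}
	}
	walk(root)
	out := make([]string, 0, len(seen))
	for id := range seen {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

func main() {
	edges := []Edge{
		{"root", "groupA"},
		{"root", "groupB"},
		{"groupA", "alice"},
		{"groupB", "alice"}, // alice is a member of both groups
		{"groupB", "bob"},
	}
	fmt.Println(descendants(edges, "root"))
}
```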

Nodes can have parents or children and thus be represented in a hierarchy. To add structure to the system, you simply add nested Nodes. The Node hierarchy can represent the physical structure of the system, or it could also contain virtual Nodes. These virtual nodes could contain logic to process data from sensors. Several examples of virtual nodes:

  • a pump Node that converts motor current readings into pump events
  • Nodes that implement moving averages, scaling, etc. on sensor data
  • Nodes that combine data from multiple sensors
  • Nodes that implement custom logic for a particular application
  • a component in an edge device, such as a cellular modem
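As an example of the moving-average case, a virtual node might smooth a child's sensor readings. The movingAverage function below is a hypothetical illustration, not part of Simple IoT. It keeps a running sum and emits one average for each full window.

```go
package main

import "fmt"

// movingAverage returns the mean of each consecutive window of size n.
// It returns nil if n is not positive or is larger than the input.
func movingAverage(values []float64, n int) []float64 {
	if n <= 0 || n > len(values) {
		return nil
	}
	out := make([]float64, 0, len(values)-n+1)
	sum := 0.0
	for i, v := range values {
		sum += v
		if i >= n {
			sum -= values[i-n] // drop the value that just left the window
		}
		if i >= n-1 {
			out = append(out, sum/float64(n))
		}
	}
	return out
}

func main() {
	readings := []float64{10, 12, 11, 15, 14}
	fmt.Println(movingAverage(readings, 3))
}
```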

Like Nodes, Edges also contain a Point array that further describes the relationship between Nodes. Some examples:

  • the role the user plays in the node (viewer, admin, etc.)
  • the order of notifications when sequencing notifications through a node's users
  • whether the node is enabled or disabled; for instance, we may want to disable a Modbus IO node that is not currently functioning
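Because these per-relationship settings live on the edge, a parent can read them without touching the child node's own points. In this sketch, the point types role and disabled, the Child type, and the helper functions are illustrative names, not confirmed Simple IoT point types or APIs.

```go
package main

import "fmt"

// Point is a minimal, hypothetical stand-in for a Simple IoT point.
type Point struct {
	Type  string
	Value float64
	Text  string
}

// Child pairs a child node ID with the points on the edge to its parent.
type Child struct {
	ID         string
	EdgePoints []Point
}

// edgePoint returns the first edge point of the given type.
func edgePoint(c Child, typ string) (Point, bool) {
	for _, p := range c.EdgePoints {
		if p.Type == typ {
			return p, true
		}
	}
	return Point{}, false
}

// activeChildren skips children whose edge marks them disabled (Value 1).
func activeChildren(children []Child) []string {
	var ids []string
	for _, c := range children {
		if p, ok := edgePoint(c, "disabled"); ok && p.Value == 1 {
			continue
		}
		ids = append(ids, c.ID)
	}
	return ids
}

func main() {
	children := []Child{
		{ID: "modbus-1", EdgePoints: []Point{{Type: "disabled", Value: 1}}},
		{ID: "modbus-2"},
		{ID: "alice", EdgePoints: []Point{{Type: "role", Text: "admin"}}},
	}
	fmt.Println(activeChildren(children))
	if p, ok := edgePoint(children[2], "role"); ok {
		fmt.Println("alice role:", p.Text)
	}
}
```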

Being able to arrange nodes in an arbitrary hierarchy also opens up some interesting possibilities, such as creating virtual nodes that have a number of children collecting data. The parent virtual nodes can have rules or logic that operate on data from the child nodes. In this case, the virtual parent nodes might be a town or city, a service provider, etc., and the child nodes are physical edge nodes collecting data, users, etc.
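A virtual parent such as a town might roll up the latest readings from its children. This is a minimal sketch; the Reading type and the choice of count, mean, and peak as the summary are hypothetical.

```go
package main

import "fmt"

// Reading is a hypothetical latest value reported by one child node.
type Reading struct {
	NodeID string
	Value  float64
}

// summarize computes the count, mean, and peak of the child readings,
// the kind of roll-up a virtual parent node might publish as its own points.
func summarize(children []Reading) (count int, mean, peak float64) {
	for i, r := range children {
		if i == 0 || r.Value > peak {
			peak = r.Value
		}
		mean += r.Value
		count++
	}
	if count > 0 {
		mean /= float64(count)
	}
	return count, mean, peak
}

func main() {
	count, mean, peak := summarize([]Reading{
		{"station-1", 3.2},
		{"station-2", 4.8},
		{"station-3", 2.0},
	})
	fmt.Printf("stations=%d mean=%.2f peak=%.1f\n", count, mean, peak)
}
```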

Node Topology changes

Nodes can exist in multiple locations in the tree. This allows us to do things like include a user in multiple groups.

Add

Node additions are communicated in real time by sending the points for the new node, as well as the points for the edge that connects the new node to the tree.

Copy

Node copies are similar to additions, but only the edge points are sent, because the node itself already exists.

Delete

Node deletions are recorded by setting a tombstone point in the edge above the node to true. The deletion must be recorded this way; otherwise, the synchronization process would simply re-create the deleted node if it still exists on another instance.

Move

Move is just a combination of Copy and Delete.
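The four topology operations can be sketched in terms of which points each one sends. The Update message type, the "parent->child" edge target format, and encoding the tombstone as Value 0 or 1 are all hypothetical choices for illustration; the actual transport and point layout in Simple IoT may differ.

```go
package main

import "fmt"

// Point is a minimal, hypothetical stand-in for a Simple IoT point.
type Point struct {
	Type  string
	Value float64
	Text  string
}

// Update is a hypothetical message carrying points for one node or edge.
type Update struct {
	Target string // a node ID, or "parent->child" for an edge
	Points []Point
}

func edgeTarget(parent, child string) string { return parent + "->" + child }

// addNode sends the new node's points plus the points for its edge.
func addNode(id, parent string, nodePoints []Point) []Update {
	return []Update{
		{Target: id, Points: nodePoints},
		{Target: edgeTarget(parent, id), Points: []Point{{Type: "tombstone", Value: 0}}},
	}
}

// copyNode places an existing node under another parent: edge points only.
func copyNode(id, newParent string) []Update {
	return []Update{{Target: edgeTarget(newParent, id), Points: []Point{{Type: "tombstone", Value: 0}}}}
}

// deleteNode sets the tombstone on the edge above the node.
func deleteNode(id, parent string) []Update {
	return []Update{{Target: edgeTarget(parent, id), Points: []Point{{Type: "tombstone", Value: 1}}}}
}

// moveNode is a copy to the new parent followed by a delete from the old one.
func moveNode(id, oldParent, newParent string) []Update {
	return append(copyNode(id, newParent), deleteNode(id, oldParent)...)
}

func main() {
	msgs := addNode("pump-1", "site-A", []Point{{Type: "description", Text: "well pump"}})
	msgs = append(msgs, moveNode("pump-1", "site-A", "site-B")...)
	for _, m := range msgs {
		fmt.Printf("%s: %+v\n", m.Target, m.Points)
	}
}
```

Copying before deleting means the node is always reachable from at least one parent during a move.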

If any real-time data is lost during any of the above operations, the catch-up synchronization will propagate the node changes.

Tracking who made changes

The Point type has an Origin field that is used to track who generated the point. If the node that owns the point generated it, Origin can be left blank. This saves bandwidth, especially for sensor data, which is generated by the client managing the node. The Origin field serves several purposes:

  • track who made changes for auditing and debugging purposes. If a rule or some process other than the owning node modifies a point, the Origin should always be populated. Tests that generate points should generally set the origin to "test".
  • eliminate echoes when a client is subscribed to a subject and also publishes to the same subject. With the Origin field, the client can determine whether it authored a point it receives and, if so, simply drop it. See the client documentation for more discussion of echoes.
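Echo suppression can be sketched as a simple filter. This assumes the client stamps its own ID into Origin whenever it writes a point for a node it does not own (for example, a rule client); the dropEchoes function and the client IDs are hypothetical.

```go
package main

import "fmt"

// Point is a minimal, hypothetical stand-in for a Simple IoT point.
type Point struct {
	Type   string
	Value  float64
	Origin string // blank when generated by the node that owns the point
}

// dropEchoes removes points this client authored itself, so a client that
// publishes and subscribes on the same subject does not reprocess its writes.
func dropEchoes(clientID string, points []Point) []Point {
	var out []Point
	for _, p := range points {
		if p.Origin == clientID {
			continue
		}
		out = append(out, p)
	}
	return out
}

func main() {
	incoming := []Point{
		{Type: "setpoint", Value: 22, Origin: "rule-1"}, // our own write coming back
		{Type: "temperature", Value: 21.7},              // from the owning node
		{Type: "setpoint", Value: 23, Origin: "test"},   // from another source
	}
	for _, p := range dropEchoes("rule-1", incoming) {
		fmt.Printf("%s=%v origin=%q\n", p.Type, p.Value, p.Origin)
	}
}
```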

Converting Nodes to other data structures

Nodes and Points are convenient for storage and synchronization, but cumbersome to work with in application code, so we typically convert them to another data structure. data.Decode, data.Encode, and data.MergePoints can be used to convert between Node data structures and your own custom structs, much like the Go json package.
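To show the idea behind tag-driven decoding, here is a toy decoder that fills struct fields from points using a point struct tag. It is a hypothetical simplification written for illustration. It is not how data.Decode is implemented, and the tag name, the Pump type, and the field conversion rules are assumptions.

```go
package main

import (
	"fmt"
	"reflect"
)

// Point is a minimal, hypothetical stand-in for a Simple IoT point.
type Point struct {
	Type  string
	Value float64
	Text  string
}

// Pump is a hypothetical application type; tags map fields to point types.
type Pump struct {
	Description string  `point:"description"`
	MaxCurrent  float64 `point:"maxCurrent"`
	Enabled     bool    `point:"enabled"`
}

// decode copies point values into the struct that out points to.
// string fields take Point.Text, float64 fields take Point.Value, and bool
// fields are true when Value is nonzero. Points with no matching field are
// ignored; if several points share a type, the last one wins.
func decode(points []Point, out any) error {
	v := reflect.ValueOf(out)
	if v.Kind() != reflect.Pointer || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("decode: need pointer to struct, got %T", out)
	}
	s := v.Elem()
	t := s.Type()
	for i := 0; i < t.NumField(); i++ {
		tag := t.Field(i).Tag.Get("point")
		if tag == "" {
			continue
		}
		for _, p := range points {
			if p.Type != tag {
				continue
			}
			f := s.Field(i)
			switch f.Kind() {
			case reflect.String:
				f.SetString(p.Text)
			case reflect.Float64:
				f.SetFloat(p.Value)
			case reflect.Bool:
				f.SetBool(p.Value != 0)
			}
		}
	}
	return nil
}

func main() {
	points := []Point{
		{Type: "description", Text: "well pump"},
		{Type: "maxCurrent", Value: 12.5},
		{Type: "enabled", Value: 1},
		{Type: "futureSetting", Value: 3}, // no matching field: ignored
	}
	var p Pump
	if err := decode(points, &p); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", p)
}
```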

Evolvability

One important consideration in data design is whether the system can be easily changed. In a distributed system, different versions of the software may be running at the same time and using the same data. One version may use or store additional information that the other does not. In this case, it is very important that the other version does not delete this data, which could easily happen if it decodes the data into a type and then re-encodes and stores it.

With the Node/Point system, we don't have to worry about this issue because Nodes are only updated by sending Points. It is not possible to delete a Node Point, so if one version writes a Point the other is not using, that Point is still transferred, stored, synchronized, etc., and simply ignored by versions that don't use it. This is another case where SIOT solves a hard problem that typically requires quite a bit of care and effort.
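The contrast with decode-then-re-encode can be sketched as a point-by-point merge. This assumes points are identified by Type and Key, and that an update replaces the matching point or is appended, with nothing ever removed. The applyPoints function and the boost point type are hypothetical, and the real merge logic in Simple IoT may differ in detail.

```go
package main

import "fmt"

// Point is a minimal, hypothetical stand-in for a Simple IoT point.
type Point struct {
	Type  string
	Key   string
	Value float64
}

// id identifies a point within a node.
func id(p Point) string { return p.Type + "/" + p.Key }

// applyPoints merges updates into stored: a matching point is replaced,
// a new point is appended, and nothing is ever removed. stored is not modified.
func applyPoints(stored, updates []Point) []Point {
	out := append([]Point{}, stored...)
	index := map[string]int{}
	for i, p := range out {
		index[id(p)] = i
	}
	for _, u := range updates {
		if i, ok := index[id(u)]; ok {
			out[i] = u
		} else {
			index[id(u)] = len(out)
			out = append(out, u)
		}
	}
	return out
}

func main() {
	// Written by a newer version; the older version knows nothing about "boost".
	stored := []Point{{Type: "setpoint", Value: 20}, {Type: "boost", Value: 1}}
	// The older version sends only the point it understands.
	updated := applyPoints(stored, []Point{{Type: "setpoint", Value: 22}})
	fmt.Println(updated)
}
```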