Multi-interface Mesh Networking

 

Early in our development process, it became clear that a flexible, robust mesh network would be an extremely useful feature of the DroneLink platform.  Two of the primary drivers of this were:

  • Supporting multiple boats on the water at the same time (4 is our current record, plus support equipment)
  • Extending the range of our lake testing beyond 100m from the base station
This article outlines the background and operation of our solution, referred to as DroneMesh.  As prior reading, you'll need to understand the concepts in DroneNode Modules and Loosely Coupled Pub/Sub Messaging.

For general background see Wikipedia Wireless Mesh Network.  

Use Cases

Planning ahead, we also mapped out the various use cases we were likely to encounter on the way to attempting the actual Microtransat challenge:
  • Short-range (<50 m):
    • Bench testing - all devices located within the vicinity of a single building and/or all within WiFi range of a single router
    • Lake testing: Very close to shore/launch-point testing
  • Medium-range (<200 m):
    • Typical range for most lake tests
    • Line-of-sight radio practical at fairly high data rates, 433MHz or 915MHz viable
  • Long-range (<20 km):
    • Typical range for early sea trials
    • Line-of-sight radio is still possible at a low data rate - may require a lower frequency (433MHz) for range and directional base-station antenna
    • GSM connection may be possible
  • Very long range (over the horizon, international water):
    • No line of sight radio possible
    • Only satellite (Iridium) comms are practical
Note: We have tested WiFi range between an ESP32 and a portable 4G WiFi router and achieved approx. 50m max range

Requirements

DroneMesh sits at OSI layers 3-4 (see OSI Model) with the following summarised requirements:

  • Allow DroneLink nodes to act as routers and thereby form an ad-hoc mesh

  • Manage multiple network interfaces, each with different capabilities (e.g. point to point, broadcast) - OSI layers 1-2

  • Deal with node and/or interface failures, planned restarts, and poor link quality (e.g. high packet loss from poor radio reception)

  • Seamlessly transmit DroneLink pub/sub messages over the mesh network

  • Feed back a level of route info/status to “server” nodes for analysis/troubleshooting (web UI, etc)

  • Low overhead such that it can be implemented on the ESP32 with minimal resource consumption

  • Provide an estimate of link quality/bandwidth

  • Support differing packet priorities (relative to link quality/speed)

  • Support fire-and-forget vs reliable packet delivery

  • Minimise packet duplication across multiple interfaces / network paths


High-level Design Choices

To meet the broad requirements with minimum complexity, DroneMesh is implemented as a datagram-based, session-less network, which means that we can ignore packet Segmentation (layer 4) or Session management (layer 5).

Packet Sizes

Packet sizes are deliberately limited, with a fixed maximum packet size of 60 bytes (including header and CRC) to ensure they can be transmitted over the RFM69 radios without segmentation. This makes the network extremely inefficient at transferring large files/data structures, but that is a rare use case for our application and a worthwhile trade-off for a simpler implementation.

Addressing

Node addressing is also kept extremely simple and utilises the same 1-byte address space as defined for DroneLink (1-254 are usable addresses). Within the scope of our project, we do not expect to exceed this address space, but it would be a consideration if other teams/individuals were to use our platform in the future.

Types of network interface

At a minimum, we'd like the following interfaces to be supported:
  • WiFi (UDP)
  • RFM69 on either 915MHz or 433MHz
  • Point-to-point using a serial interface (wired link, point-to-point telemetry radio, transparent GSM, etc)
  • Iridium satellite radio

Decomposing the challenge by OSI Layers

Creating a mesh network from scratch was a significant challenge, which we tackled by breaking the problem down into a few building blocks based on the associated OSI layers:
  1. Point-to-point or broadcast interfaces with the associated physical interface and data link layer (transmission framing, and packet management) - OSI layers 1-2
  2. Packet management including addressing, routing, and traffic control - OSI Layer 3
  3. Optional reliable transport - OSI Layer 4

Within our overall system architecture, the DroneMesh system manages Mesh Routing via a set of Network Interfaces (group 5 below), seamlessly transferring DroneLink pub/sub messages to the internal message bus (group 4 below):


Physical Interfaces and Data Link Layer

Within the DroneNode firmware, network interfaces are implemented as a more specialised form of DroneModule that inherits from a NetworkInterface base class.  Network interface modules are responsible for:
  • Specific hardware interface to the ESP32 (via WiFi, serial, SPI, etc) 
  • Transmission framing including error detection
  • Transmit buffer  (messages queued for transmission)
  • Interface state (up/down)
  • Option to enable/disable the interface at run-time
When NetworkInterface modules are instantiated, they register with the core DroneMesh Router and thus participate in the mesh.
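
The registration pattern described above could be sketched roughly as follows. This is a minimal illustration, not the DroneNode code: the class shapes, method names (registerInterface, sendPacket, isUp) and the LoopbackInterface stand-in are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: names and signatures are illustrative, not the DroneNode API.
class NetworkInterface;

class MeshRouter {
public:
    void registerInterface(NetworkInterface* iface) { interfaces_.push_back(iface); }
    std::size_t interfaceCount() const { return interfaces_.size(); }
    // Offer a packet to every interface that is up; returns how many accepted it.
    std::size_t broadcast(const uint8_t* packet, std::size_t len);
private:
    std::vector<NetworkInterface*> interfaces_;
};

class NetworkInterface {
public:
    // Interfaces register with the router as soon as they are constructed.
    explicit NetworkInterface(MeshRouter& router) { router.registerInterface(this); }
    virtual ~NetworkInterface() = default;

    void setEnabled(bool enabled) { enabled_ = enabled; }  // run-time enable/disable
    bool isUp() const { return enabled_ && linkUp_; }

    // Hardware-specific: frame the packet and queue it for transmission.
    virtual bool sendPacket(const uint8_t* packet, std::size_t len) = 0;

protected:
    bool linkUp_ = false;  // set by the concrete driver once the hardware is ready
private:
    bool enabled_ = true;
};

inline std::size_t MeshRouter::broadcast(const uint8_t* packet, std::size_t len) {
    std::size_t accepted = 0;
    for (NetworkInterface* iface : interfaces_) {
        if (iface->isUp() && iface->sendPacket(packet, len)) ++accepted;
    }
    return accepted;
}

// Stand-in interface for experimentation: it only counts the packets it is given.
class LoopbackInterface : public NetworkInterface {
public:
    explicit LoopbackInterface(MeshRouter& router) : NetworkInterface(router) { linkUp_ = true; }
    bool sendPacket(const uint8_t*, std::size_t) override { ++sent; return true; }
    int sent = 0;
};
```

Keeping the up/down state and the enable switch in the base class means the router can skip a disabled interface without knowing anything about the hardware behind it.
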

RFM69 Interface

As an example, the RFM69 interface implements:
  • SPI connection between the ESP32 and the RFM69HW hardware module (using the RadioHead library)
  • Transmission framing - wrapping a DroneMesh packet in a special start byte and a trailing 8-bit CRC
  • Error checking - on packet receive, the size and CRC are validated before passing the packet onto the Mesh router
  • Transmission queueing - new packets are only transmitted once the hardware has completed the transmission of previous packets
All packets are transmitted as broadcast to all receivers within reception, as that is the basic operation of the RFM69 radios.  There is a hard-coded network identifier and encryption key shared between all DroneNodes.

The RFM69 module also exposes parameters to set the operating frequency and transmit power.
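
To make the framing and error-checking steps concrete, here is a small sketch of wrapping a packet in a start byte and a trailing 8-bit CRC, then validating it on receive. The start byte value (0xFE) and the CRC polynomial (0x07, initial value 0) are placeholders chosen for the example; the source does not state which values DroneMesh uses.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption for illustration only: the real start byte is not given in the article.
constexpr uint8_t kStartByte = 0xFE;

// Bitwise CRC-8, polynomial 0x07, initial value 0x00 (an assumed choice of CRC).
uint8_t crc8(const uint8_t* data, std::size_t len) {
    uint8_t crc = 0;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x80) ? static_cast<uint8_t>((crc << 1) ^ 0x07)
                               : static_cast<uint8_t>(crc << 1);
        }
    }
    return crc;
}

// Wrap a packet as [start byte][packet bytes][crc].
// Returns the frame length, or 0 if the output buffer is too small.
std::size_t frameWrap(const uint8_t* packet, std::size_t len,
                      uint8_t* out, std::size_t outCap) {
    if (len == 0 || len + 2 > outCap) return 0;
    out[0] = kStartByte;
    for (std::size_t i = 0; i < len; ++i) out[1 + i] = packet[i];
    out[len + 1] = crc8(packet, len);
    return len + 2;
}

// Validate a received frame and copy out the packet.
// Returns the packet length, or 0 if the size, start byte or CRC check fails.
std::size_t frameUnwrap(const uint8_t* frame, std::size_t frameLen,
                        uint8_t* out, std::size_t outCap) {
    if (frameLen < 3 || frame[0] != kStartByte) return 0;
    const std::size_t len = frameLen - 2;
    if (len > outCap) return 0;
    if (crc8(frame + 1, len) != frame[frameLen - 1]) return 0;
    for (std::size_t i = 0; i < len; ++i) out[i] = frame[1 + i];
    return len;
}
```

A frame that fails any check is simply discarded here; with a lossy radio link that is the cheapest response, and reliable delivery (where requested) is left to the layer above.
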


Packet Management - Routing and traffic control

With the individual network interfaces defined, we can focus on the most complex problems - mesh routing and traffic control.  For this application, I selected the Babel routing protocol as the primary inspiration, but with a number of modifications including support for multiple interfaces/physical layers, a simpler addressing scheme, and a suitable link metric algorithm.

The following illustrates the routing problem for a conceptual multi-node network with limited transmission ranges (the large dashed circles).  This is fairly representative of a selection of boats on a lake where intermediate nodes are acting as mesh routers to bridge the short transmission range of the radio links.










The Router takes care of the following core tasks:
  • Node discovery
  • Routing table maintenance
  • Processing received packets - either for local consumption or to transmit on the next hop to their destination
  • Maintaining the transmit queue, including rate limiting and packet prioritisation
  • Transmitting outbound packets to their next hop
  • Maintaining routing statistics

Node Discovery

To populate and maintain the routing tables of their peers, each node transmits a "Hello" packet every ~5 seconds on each active interface.  Nodes receiving the Hello packets will re-transmit them to other nodes on their active interfaces after updating a link metric value, thus "flooding" the network with the latest info about the source node.  The Hello packet also contains the uptime of the source node, which helps identify when a node is restarted and thus force a routing update.


Routing Table Maintenance

Hello packets are used to construct and maintain the routing table on each node.  Each routing entry stores:
  • Target node address
  • When the node was last heard (any packet type)
  • When a Hello packet for this node was last transmitted 
  • The latest metric received in a Hello packet
  • The last received uptime for the node
  • The last routing sequence number heard from this node (used to avoid loops - see Babel protocol)
  • The next hop to reach this node - the node address to transmit a packet via to reach the target
  • Active interface - which interface did we use to last communicate with this node
  • Link statistics related to this node
  • Time we last received an Ack from this node
  • A sequencer map - a form of bitmask used to track what sequence numbers we've received in a circular array, helps to avoid processing duplicate packets received via multiple transmission paths
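
As an illustration of duplicate suppression per source node, the sketch below uses a sliding-window bitmask over the 8-bit sequence number. Note that this is a swapped-in variant (similar to an anti-replay window), not the circular-array sequencer map itself, and the 32-packet window size is an assumption.

```cpp
#include <cstdint>

// Sliding-window duplicate filter over 8-bit sequence numbers.
// Hypothetical sketch: the 32-packet window is an assumed size.
class SequenceWindow {
public:
    // Returns true if seq has not been seen before and should be processed.
    bool accept(uint8_t seq) {
        if (!started_) {
            started_ = true;
            latest_ = seq;
            seen_ = 1u;
            return true;
        }
        // Distance ahead of the newest packet, modulo 256.
        const uint8_t ahead = static_cast<uint8_t>(seq - latest_);
        if (ahead == 0) return false;  // repeat of the newest packet
        if (ahead < 128) {
            // Newer packet: slide the window forward and mark it as seen.
            seen_ = (ahead >= 32) ? 0u : (seen_ << ahead);
            seen_ |= 1u;
            latest_ = seq;
            return true;
        }
        // Older packet: the bit index is how far it lags the newest one.
        const uint8_t behind = static_cast<uint8_t>(latest_ - seq);
        if (behind >= 32) return false;  // too old to judge, so drop it
        const uint32_t bit = 1u << behind;
        if (seen_ & bit) return false;   // already received via another path
        seen_ |= bit;
        return true;
    }

private:
    bool     started_ = false;
    uint8_t  latest_ = 0;
    uint32_t seen_ = 0;  // bit i set => sequence (latest_ - i) was received
};
```

One window per routing entry would cost only six bytes or so, which fits the low-overhead requirement, at the price of dropping packets that arrive more than 32 sequence numbers late.
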
Routes are updated according to the Babel protocol, with a slight modification that uses the node uptime as an override: if the uptime in a Hello packet is lower than the last received value, the node must have restarted, so the new route is considered valid irrespective of the sequence number.
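
A heavily simplified sketch of that update rule is shown below. It is not Babel's full feasibility condition: it accepts a Hello if the sequence number is newer, if the sequence number is equal and the metric is better, or if the uptime has gone backwards. The struct fields, the "lower metric is better" convention and the 8-bit sequence width are assumptions for the example.

```cpp
#include <cstdint>

// Hypothetical, trimmed-down routing entry; field names are illustrative.
struct RouteEntry {
    bool     valid = false;
    uint8_t  nextHop = 0;
    uint8_t  seq = 0;       // last routing sequence number heard
    uint8_t  metric = 255;  // assumed: lower is better
    uint32_t uptime = 0;    // last reported uptime of the target node
};

struct Hello {
    uint8_t  via;     // neighbour the Hello arrived from
    uint8_t  seq;
    uint8_t  metric;  // metric after adding the cost of the last link
    uint32_t uptime;
};

// True if a is newer than b in wrap-around 8-bit sequence space.
inline bool seqNewer(uint8_t a, uint8_t b) {
    const uint8_t diff = static_cast<uint8_t>(a - b);
    return diff != 0 && diff < 128;
}

// Returns true if the Hello replaced the stored route.
bool updateRoute(RouteEntry& route, const Hello& hello) {
    // Uptime going backwards means the node restarted: accept unconditionally.
    const bool restarted = route.valid && hello.uptime < route.uptime;
    const bool newer     = seqNewer(hello.seq, route.seq);
    const bool better    = hello.seq == route.seq && hello.metric < route.metric;
    if (route.valid && !restarted && !newer && !better) return false;

    route.valid   = true;
    route.nextHop = hello.via;
    route.seq     = hello.seq;
    route.metric  = hello.metric;
    route.uptime  = hello.uptime;
    return true;
}
```

Without the uptime override, a rebooted node restarting its sequence numbers from zero would look "older" than its previous incarnation and could be ignored until the sequence space wrapped around.
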

An example routing table summary is shown below:




Processing Received Packets

Packet processing is relatively straightforward.  From the point of view of the processing node: if the destination address is "our" address, then process the packet.  Processing means either passing to the DroneLink messaging sub-system or to a specific low-level handler for things like route statistics requests, trace routes, etc.

If the packet is not addressed to us, then look up the next hop in our local routing table for the destination node and re-transmit the packet using the appropriate interface.
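
The deliver-or-forward decision can be sketched in a few lines. The types and names here are illustrative, and dropping a packet when no route is known is an assumption; the article does not say what the router does in that case.

```cpp
#include <array>
#include <cstdint>

enum class Action { DeliverLocal, Forward, Drop };

struct Decision {
    Action  action;
    uint8_t nextHop;     // only meaningful for Forward
    uint8_t ifaceIndex;  // only meaningful for Forward
};

// Hypothetical minimal routing entry for this example.
struct Route {
    bool    valid = false;
    uint8_t nextHop = 0;
    uint8_t ifaceIndex = 0;
};

// One slot per possible 1-byte node address.
using RoutingTable = std::array<Route, 256>;

Decision routePacket(uint8_t ourAddress, uint8_t destination, const RoutingTable& table) {
    if (destination == ourAddress) return {Action::DeliverLocal, 0, 0};
    const Route& r = table[destination];
    if (!r.valid) return {Action::Drop, 0, 0};  // assumed behaviour when no route exists
    return {Action::Forward, r.nextHop, r.ifaceIndex};
}
```

Because the address space is a single byte, a flat 256-entry table indexed by destination gives constant-time lookups without any hashing.
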


Maintaining the Transmit Queue

We need to maintain a (short, currently max 24 packets) transmit queue for the following reasons:
  • Some interfaces have higher bandwidth than others and require intermediate buffering
  • Reliable delivery (see details below) requires storing a packet for potential re-transmission until an Ack is received
  • The internal rate at which DroneLink pub/sub parameters get updated can be in the kHz range, whereas mesh packet bandwidth is practically limited to <50Hz, thus there is a need for rate limiting
Space in the transmit buffer is given first to reliable delivery packets, ordered by packet priority (highest first).  A separate hash-table structure stores the last transmit time of each parameter address, limiting the maximum transmission rate based on packet priority:
  • Critical priority packets can be transmitted at up to 1Hz
  • High up to 0.5Hz
  • Medium up to 0.25Hz
  • Low up to 0.1Hz
Packets that are attempted at a higher rate are dropped.
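
A rate limiter along these lines might look like the sketch below, using the rates listed above converted to minimum intervals. The numeric priority encoding, the 32-bit key used to identify a parameter address and the use of std::unordered_map are assumptions for illustration.

```cpp
#include <cstdint>
#include <unordered_map>

// Assumed numeric encoding of the 2-bit priority field.
enum class Priority : uint8_t { Low = 0, Medium = 1, High = 2, Critical = 3 };

// Minimum spacing between transmissions for each priority.
inline uint32_t minIntervalMs(Priority p) {
    switch (p) {
        case Priority::Critical: return 1000;   // 1 Hz
        case Priority::High:     return 2000;   // 0.5 Hz
        case Priority::Medium:   return 4000;   // 0.25 Hz
        case Priority::Low:      return 10000;  // 0.1 Hz
    }
    return 10000;
}

class RateLimiter {
public:
    // key identifies a parameter address; nowMs is a monotonic millisecond clock.
    // Returns false if the packet arrives too soon and should be dropped.
    bool allow(uint32_t key, Priority p, uint32_t nowMs) {
        auto it = lastTx_.find(key);
        // Unsigned subtraction stays correct when the millisecond counter wraps.
        if (it != lastTx_.end() && nowMs - it->second < minIntervalMs(p)) {
            return false;
        }
        lastTx_[key] = nowMs;
        return true;
    }

private:
    std::unordered_map<uint32_t, uint32_t> lastTx_;  // key -> last transmit time
};
```

On the ESP32, nowMs would typically come from millis(), and the key could be built by packing the node, channel and parameter bytes into a single integer.
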


Outbound Transmission

Outbound transmission is handled by the network interface modules.  The DroneMesh router assembles the packet and hands it to the interface to be wrapped in any transmission framing before being transmitted.


Routing Statistics

Each router instance maintains route-level statistics (e.g. average Ack time) as well as overall statistics such as packet transmit/receive rate, error rates, transmit queue size and choke rates.

There are also management packets that support trace routes and the gathering of remote routing table info, allowing a server or Web UI to visualise the network operation.  An example visualisation of a live mesh is shown below:



A more complex example with 9 active nodes and node 59 acting as a bridge between two different physical network interfaces (WiFi and RFM69):





Mesh Packet Structure

DroneMesh packets have a common 7-byte header structure:
  • Packed 1-byte value containing:
    • 1-bit for unicast or broadcast
    • 1-bit for guaranteed delivery or fire-and-forget
    • 6-bits for packet size -1 (i.e. represents values of 1-64 bytes)
  • 1-byte Transmitting node address
  • 1-byte Source node address
  • 1-byte Next node address
  • 1-byte Destination node address
  • 1-byte Sequence number (incremented on each packet transmitted from a source, used to ignore duplicate reception)
  • Packed 1-byte value containing:
    • 2-bit payload priority
    • 6-bit Payload type
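
The header layout above maps naturally onto a pack/unpack pair. In this sketch the byte order follows the list, but the bit positions inside the two packed bytes (flags and priority in the high bits, size and type in the low six bits) are assumptions, as are the struct and function names.

```cpp
#include <cstdint>

struct MeshHeader {
    bool    broadcast;   // 1 bit: broadcast vs unicast
    bool    guaranteed;  // 1 bit: guaranteed delivery vs fire-and-forget
    uint8_t size;        // 1..64, stored on the wire as size - 1
    uint8_t txNode;      // transmitting node
    uint8_t srcNode;     // original source
    uint8_t nextNode;    // next hop
    uint8_t destNode;    // final destination
    uint8_t seq;         // per-source sequence number
    uint8_t priority;    // 0..3
    uint8_t type;        // 0..63
};

constexpr int kHeaderSize = 7;

// Assumed bit layout: flags/priority in the top bits, size/type in the low 6 bits.
void packHeader(const MeshHeader& h, uint8_t out[kHeaderSize]) {
    out[0] = static_cast<uint8_t>((h.broadcast ? 0x80 : 0) |
                                  (h.guaranteed ? 0x40 : 0) |
                                  ((h.size - 1) & 0x3F));
    out[1] = h.txNode;
    out[2] = h.srcNode;
    out[3] = h.nextNode;
    out[4] = h.destNode;
    out[5] = h.seq;
    out[6] = static_cast<uint8_t>(((h.priority & 0x03) << 6) | (h.type & 0x3F));
}

MeshHeader unpackHeader(const uint8_t in[kHeaderSize]) {
    MeshHeader h;
    h.broadcast  = (in[0] & 0x80) != 0;
    h.guaranteed = (in[0] & 0x40) != 0;
    h.size       = static_cast<uint8_t>((in[0] & 0x3F) + 1);
    h.txNode     = in[1];
    h.srcNode    = in[2];
    h.nextNode   = in[3];
    h.destNode   = in[4];
    h.seq        = in[5];
    h.priority   = static_cast<uint8_t>(in[6] >> 6);
    h.type       = static_cast<uint8_t>(in[6] & 0x3F);
    return h;
}
```

Storing size minus one lets six bits cover 1 to 64 bytes, since a zero-length packet never needs to be represented.
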

Packet types 

These are the packet types defined/implemented so far; there is headroom to define others if needed:
  • 0: Hello
  • 1/2: Subscription Request/Response - establish a DroneLink subscription 
  • 3/4: Traceroute request/response - trace network path to/from a target node
  • 5/6: Routing entry request/response - get info about a specific routing entry on a target node
  • 7: DroneLink message - always sent by guaranteed/reliable delivery
  • 10/11: Filesystem file request/response: Get header info for an entry in the filesystem
  • 12/13: Request/response to resize a file in target filesystem (or set to zero to delete)
  • 14/15: Read file request/response: Read a data block from a file
  • 16/17: Write file request/response: Write a data block to a file
  • 20/21: Router info request/response - high level router statistics
  • 22/23: Firmware start request/response - place target node into firmware download mode
  • 24: Firmware write - write a block of data to the firmware (guaranteed)
  • 25: Firmware rewind - Request re-transmission of a data block from the transmitting server

Reliable Transport

Some packets (e.g. DroneLink messages) are marked for guaranteed delivery.  In this situation, the router will wait for an acknowledgement packet (Ack) to be received from the next hop before considering the packet delivered.  Timeouts and retry limits are used to ensure a packet doesn't get re-transmitted indefinitely.

Waiting for Acks can place a significant additional load on the network, but it's a key feature when dealing with lossy radio links.
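
The Ack, timeout and retry-limit logic for a single queued packet could be sketched as a small state machine. The 500ms timeout and the limit of 5 attempts are made-up values for the example, and the struct and function names are hypothetical.

```cpp
#include <cstdint>

// Assumed values for illustration; the article does not state the real ones.
constexpr uint32_t kAckTimeoutMs = 500;
constexpr uint8_t  kMaxAttempts  = 5;

enum class TxState { Waiting, Acked, Failed };

struct PendingPacket {
    uint8_t  seq;
    uint8_t  nextHop;
    uint32_t lastSentMs;
    uint8_t  attempts;
    TxState  state;
};

// Record the first transmission of a guaranteed-delivery packet.
PendingPacket sendReliable(uint8_t seq, uint8_t nextHop, uint32_t nowMs) {
    return PendingPacket{seq, nextHop, nowMs, 1, TxState::Waiting};
}

// Call when an Ack arrives; only an Ack from the next hop for this sequence counts.
void onAck(PendingPacket& p, uint8_t fromNode, uint8_t seq) {
    if (p.state == TxState::Waiting && fromNode == p.nextHop && seq == p.seq) {
        p.state = TxState::Acked;
    }
}

// Call periodically; returns true when the packet should be re-transmitted now.
bool pollRetry(PendingPacket& p, uint32_t nowMs) {
    if (p.state != TxState::Waiting) return false;
    if (nowMs - p.lastSentMs < kAckTimeoutMs) return false;  // still within the timeout
    if (p.attempts >= kMaxAttempts) {
        p.state = TxState::Failed;  // give up and free the queue slot
        return false;
    }
    ++p.attempts;
    p.lastSentMs = nowMs;
    return true;
}
```

Because the Ack is hop-by-hop rather than end-to-end, each router only has to hold a packet until its immediate neighbour confirms receipt, which keeps the transmit queue short.
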


Result & Lessons Learnt

Implementing and hardening the DroneMesh protocol has been extremely challenging.  A year after the basics were first proven, I'm still uncovering issues, making performance enhancements and extending the supported interfaces.  That said, it's delivered well against the original objectives and we rely on it heavily when lake testing.

The next extension is likely to be the addition of Serial over GSM modules, to allow for long-range communication over the cellular network.  This is something Josh is evaluating at a proof-of-concept level before we look to add it to DroneMesh as an additional interface. 


Power Management

During short and medium range testing, we are generally not out on the lake for more than 6hrs, so the power requirements of the mesh are easily accommodated by the boats' on-board LiPo batteries.  As we approach endurance testing of several days or weeks, we will be implementing several power management strategies:
  • ESP32 WiFi radio consumes a significant amount of current (>120mA) - once out of range we can switch this off via the management system, reducing the ESP32 power draw to approx 20mA
  • ESP32 power consumption can further be reduced by dynamically reducing CPU frequency from normal 240MHz down to approx 80MHz
  • We can also put the ESP32 to sleep a lot of the time, as the main processing loop can be executed in just a few milliseconds.  However, we currently have I2C bus issues when returning from sleep - this needs further investigation
  • RFM69 radios are very low power to start with and we can reduce their transmit frequency and transmit power if required using our management interface
  • Long-range operation will rely entirely on satellite comms, where we will only power the modem up a few times an hour to send/receive.  All other radios will be shut down in that mode.


Satellite Radio

Iridium

We had originally planned to use Iridium satellite communications for long-range sea trials and the eventual Microtransat attempt.  We've conducted a proof of concept and understand the basic mechanisms, but are a long way off using it in the wild, not least because of the ~10p per packet cost.  

Given the very low data rates and expensive packet fee, we may not integrate it as a DroneMesh interface, but rather create a custom one-way transport for logging purposes only.  This may also include compressing a number of data items into each satellite packet to make efficient use of the bandwidth.  An interesting problem for the future! 

Swarm

As an alternative to Iridium, the Swarm satellite communications network has launched in the last few years and grown rapidly (it has since been acquired by SpaceX).  It now appears to have a more cost-effective offering that includes UK coverage.

Swarm vs Iridium RockBLOCK:

  • On a hardware level, Swarm M138 modules are now considerably cheaper:
    • Swarm M138 approx £150
    • RockBLOCK Mk2 module for Iridium approx £300
  • Power consumption:
    • Swarm: 80uA in sleep, 1A in transmit
    • RockBLOCK: 40mA in low power, 470mA max (leveraging internal super capacitor to smooth the demand when doing burst transmission)
  • Network connectivity is comparable - the server side of both offers webhook or REST integration
  • Iridium has the edge in uplink packet sizes:
    • 192 bytes for Swarm
    • 340 bytes for Iridium.
  • Swarm is the clear winner on data costs, even after converting to cost per byte transmitted:
    • Swarm is $5/month for up to 750 packets/month, or less than 1p per packet
    • Iridium is a pay-as-you-go credit plan, which at similar volume is £90 for 1000 packets, or 9p per packet.  

Given the above, we will acquire and test a Swarm module before finalising the satellite radio setup for sea trials!

Thanks to Fr on the Microtransat forum for suggesting we take another look at Swarm :)

Source Code

If you're interested in reviewing the source code, it's available in the DroneNode GitHub repo.
Some relevant files to start with are:
  • src/DroneLinkManager.h
  • src/DroneLinkMsg.h
  • src/DroneMeshMsg.h
  • src/droneModules/NetworkInterfaceModule.h
  • src/droneModules/RFM69TelemetryModule.h




