Verifying SmartMesh IP >99.999% Data Reliability for Industrial Internet of Things Applications

Author Contact Information

Ross Yu

The Industrial Internet of Things (IoT) requires industrial wireless sensor networks (WSNs) with stringent reliability and security.1 Because such networks must operate reliably for more than ten years without intervention, industrial WSNs must cope with environmental conditions that change severely over time. They must also be scalable and flexible so that they can support growing business needs and data traffic over a significant period of time.

SmartMesh wireless mesh network products from Analog Devices are engineered and rigorously tested for Industrial IoT applications, delivering >99.999% data reliability in some of the harshest environments. Before each new SmartMesh software release is declared production quality, Analog Devices accumulates over a million node-hours of network operation with a minimum of five nines (>99.999%) of data reliability.
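A node-hour is one node operating for one hour, so the total is the sum of nodes multiplied by hours across all test runs. The sketch below shows that bookkeeping only; the `NetworkRun` record and the run sizes are illustrative assumptions, not Analog Devices' actual test plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkRun:
    """One test network run (hypothetical record, for illustration only)."""
    nodes: int
    hours: float


def node_hours(runs):
    """Total node-hours: each node contributes one node-hour per hour of operation."""
    return sum(run.nodes * run.hours for run in runs)


# Assumed example: twenty runs, each a 100-node network operating for 500 hours.
runs = [NetworkRun(nodes=100, hours=500)] * 20
total = node_hours(runs)
print(f"{total:,.0f} node-hours")  # 1,000,000 node-hours
```

Under these assumed sizes, a single run contributes 50,000 node-hours, which gives a sense of how much parallel testing a million node-hours represents.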

This paper focuses on the methods Analog Devices uses to verify data reliability through radio hardware qualification, automated network test methodology, and systematic network testing. Performance statistics from a live production network are also reviewed. The topic of network security is addressed in another article and is not covered here.2

Radio Hardware Qualification Testing

The performance of a WSN is a function of both the underlying radio hardware and the protocol running on that chip. SmartMesh radios, such as the LTC5800, undergo rigorous testing to confirm their operating parameters. The results are confirmed across multiple production lots before publication of the data sheet, which includes all relevant specifications for the hardware. Consistent with Analog Devices’ focus on the industrial market, design qualification of the hardware includes operational network testing known as highly accelerated lifetime tests (HALT), in which a live network runs while the hardware is subjected to extreme conditions. These conditions include cold thermal step stress, hot thermal step stress, voltage margining, rapid thermal transitions, vibration step stress, combined thermal and vibration stress, and extended temperature tests. (See Figure 1.)

Automated Network Test Methodology

To assure in-service reliability, testing must comprehensively cover the situations a network will encounter throughout many years of operation. Analog Devices makes extensive use of test automation to facilitate hundreds of network tests, each verifying a unique set of test conditions. To do so, it uses a network test bed (see Figure 2) composed of banks of hundreds of wireless nodes that can be readily configured into any number of test networks, large or small. A centralized test server can quickly commission entire co-located networks, run multiple system tests, and recommission nodes for the next set of tests by programming each wireless node through its application programming interface (API). Automation makes full regression testing practical, ensuring that existing functions and behaviors are preserved in subsequent software releases.
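As a rough illustration of how a shared pool of nodes might be carved into co-located test networks, consider the sketch below. The function name, node identifiers, and network sizes are hypothetical; the actual commissioning happens through each node's API and is not shown here.

```python
def partition_nodes(node_ids, network_sizes):
    """Assign nodes from a shared pool to test networks of the requested sizes."""
    if sum(network_sizes) > len(node_ids):
        raise ValueError("not enough nodes in the pool for this test plan")
    networks, start = [], 0
    for size in network_sizes:
        networks.append(node_ids[start:start + size])
        start += size
    # Second return value: nodes left unassigned, available for other tests.
    return networks, node_ids[start:]


# Assumed pool of 300 nodes split into three test networks.
pool = [f"node-{i:03d}" for i in range(300)]
networks, spare = partition_nodes(pool, [100, 100, 32])
print([len(n) for n in networks], len(spare))  # [100, 100, 32] 68
```

Recommissioning for the next set of tests would then amount to calling the same function with a different list of sizes.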

The test bed has a dense, noisy RF environment because each network under test is immersed in a sea of wireless traffic from the other networks operating simultaneously. This network traffic, along with nearby Wi-Fi routers, Bluetooth and cellular radios, creates an elevated RF noise floor representative of an extremely challenging RF environment.

Figure 1. SmartMesh Nodes Operating in a Temperature Chamber.

Figure 2. Test Automation—By Instrumenting Hundreds of Wireless Nodes with an Automated Test Fixture, a Test Plan of Hundreds of Tests Can Be Executed in Days Instead of Months.

Systematic Network Testing

Using the network test bed, reliability is verified on hundreds of network topologies. For example, the network in Table 1 was set up to benchmark a typical 100-node, 4-hop network. Each node generated 2 data packets per minute, and the network was run for a minimum of 500 hours (about 21 days). The environment simulates a typical commercial or light industrial setting, with electronic equipment, metal structures, and people moving throughout the building. In spite of the lossy RF environment, the network, through packet retries and path and channel diversity, achieved better than 99.999% data reliability while sending over 43 million data packets.
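To see why retries can turn a lossy link into a highly reliable one, consider a simplified single-link model in which every transmission attempt succeeds independently with the same probability. Independence is an assumption made here for illustration (path and channel diversity are what make it plausible), the function names are hypothetical, and real link behavior is more complex.

```python
import math


def delivery_probability(p_attempt: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p_attempt) ** attempts


def attempts_needed(p_attempt: float, target: float) -> int:
    """Smallest number of independent attempts whose combined success reaches `target`."""
    if not 0.0 < p_attempt < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("probabilities must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_attempt))


# Assumed example: a lossy link where only 70% of individual attempts succeed.
n = attempts_needed(0.70, 0.99999)
print(n)  # 10
print(delivery_probability(0.70, n))
```

In this toy model, a link that loses 30% of individual attempts still clears five nines after ten attempts, because the chance that every attempt fails shrinks geometrically.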

The network test bed independently verifies the SmartMesh networking software’s built-in reliability metrics by counting the packets injected at each node’s API port and those successfully received at the gateway node’s API. The built-in statistics are available to the user through a software API at the gateway node, enabling developers and users to assess SmartMesh reliability in their own application during initial assessment and over the life of the network.
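The independent check amounts to comparing what was injected against what arrived. A minimal sketch follows, assuming each packet can be identified by a (node ID, sequence number) pair; that tagging scheme and the `DeliveryLedger` class are hypothetical, and no SmartMesh API calls are shown.

```python
class DeliveryLedger:
    """Tracks packets injected at nodes and packets seen at the gateway."""

    def __init__(self):
        self.sent = set()
        self.received = set()

    def record_sent(self, node_id, seq):
        self.sent.add((node_id, seq))

    def record_received(self, node_id, seq):
        self.received.add((node_id, seq))

    def missing(self):
        """Packets that were injected but never reached the gateway."""
        return sorted(self.sent - self.received)

    def reliability(self):
        """Fraction of injected packets that were received."""
        if not self.sent:
            raise ValueError("no packets recorded as sent")
        return len(self.sent & self.received) / len(self.sent)


ledger = DeliveryLedger()
for seq in range(5):
    ledger.record_sent("node-17", seq)
    if seq != 3:  # simulate one lost packet
        ledger.record_received("node-17", seq)

print(ledger.missing())      # [('node-17', 3)]
print(ledger.reliability())  # 0.8
```

Keeping the identity of each missing packet, rather than only a count, is what would let an engineer trace an individual loss back to a node and a point in time.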

To attain >99.999% data reliability, Analog Devices engineers troubleshoot every packet transmission error during system test, no matter how rare. To monitor and capture such errors when they do occur, each node’s API port, CLI port, and SPI flash programming port are connected in the network test bed, enabling an Analog Devices engineer to monitor each node and debug low-level software as a message propagates across the mesh network.

Additionally, the network test bed is instrumented to gather detailed performance metrics, including average node current consumption, data throughput and network latency (the time it takes a message to propagate through multiple nodes in the mesh network). The network test bed injects sensor data from every wireless node into the network to measure latency and to characterize the gateway node’s ability to handle the traffic. These tests are repeated with optional network configurations, such as a low latency mode or with more bidirectional network traffic.
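Latency in this sense is the difference between the time a packet is injected at a node and the time it arrives at the gateway. The sketch below summarizes a set of such timestamp pairs; the function name and the sample values are made up for illustration and are not measured SmartMesh data.

```python
import statistics


def latency_summary(samples):
    """Summarize (injected_at, received_at) timestamp pairs, in seconds."""
    latencies = sorted(rx - tx for tx, rx in samples)
    if not latencies:
        raise ValueError("no latency samples")
    return {
        "mean": statistics.mean(latencies),
        "median": statistics.median(latencies),
        "max": latencies[-1],
    }


# Made-up timestamps: one packet every 30 seconds with varying delivery delay.
samples = [(0.0, 0.5), (30.0, 31.0), (60.0, 60.75), (90.0, 92.0)]
summary = latency_summary(samples)
print(summary)  # {'mean': 1.0625, 'median': 0.875, 'max': 2.0}
```

Reporting the maximum alongside the mean matters in a multi-hop mesh, where a packet that needs several retries can take far longer than the typical case.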

Finally, the network test bed facilitates stress testing to verify the network’s ability to handle problem conditions gracefully. These tests systematically introduce churn in various nodes within the network under test, such as deactivating nodes to verify that the remaining neighboring nodes never lose a packet. Other stress tests invoke widespread node failures to stress the gateway node’s ability to reroute traffic and heal the network when large portions of the network are compromised. Such stress testing verifies an industrial WSN’s ability to cope with these contingencies, since such networks are often entrusted to monitor and control business-critical systems.
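The idea behind a churn test can be illustrated with a toy connectivity model: deactivate each node in turn and check that every remaining node still has some route to the gateway. This models link connectivity only, not packet loss or SmartMesh's actual routing, and the topology and function names are assumptions.

```python
from collections import deque


def reachable_from(gateway, links, down=frozenset()):
    """Return the nodes that can still reach the gateway when `down` nodes are off."""
    seen = {gateway}
    queue = deque([gateway])
    while queue:
        current = queue.popleft()
        for neighbor in links.get(current, ()):
            if neighbor not in seen and neighbor not in down:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def survives_single_failures(gateway, links):
    """True if deactivating any one node leaves all other nodes connected."""
    nodes = set(links) - {gateway}
    for victim in nodes:
        expected = (nodes - {victim}) | {gateway}
        if reachable_from(gateway, links, down={victim}) != expected:
            return False
    return True


# Toy topology (an assumption): symmetric links, two upstream neighbors per node.
links = {
    "gw": ["a", "b"],
    "a": ["gw", "b", "c", "d"],
    "b": ["gw", "a", "c", "d"],
    "c": ["a", "b", "d"],
    "d": ["a", "b", "c"],
}
print(survives_single_failures("gw", links))  # True
```

A simple chain (gateway to one node to another) would fail this check, which is the intuition behind giving every node more than one path toward the gateway.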

Table 1. Example of Network Test Bed Results for a 100-Node Network
Number of Nodes: 100
Mesh Network Depth: 4 Hops from Furthest Node to Gateway
Packet Generation Rate: 1 Data Packet/30 Seconds from Each Node
Number of Data Packets Sent: 43,792,812 Over 27 Days
Number of Data Packets Received at Final Destination: 43,792,756 (99.99987% Data Reliability)
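The reliability percentage in Table 1 is the number of packets received divided by the number sent, and the count of nines follows from the loss fraction. The sketch below reproduces that arithmetic with exact fractions; the function names are illustrative, and the definition of nines used (the largest n for which the loss is below 10^-n) is an assumption chosen to match the ">99.999%" usage in this article.

```python
from fractions import Fraction


def data_reliability(sent: int, received: int) -> Fraction:
    """Exact fraction of sent packets that arrived at the destination."""
    if sent <= 0 or not 0 <= received <= sent:
        raise ValueError("need sent > 0 and 0 <= received <= sent")
    return Fraction(received, sent)


def count_nines(reliability: Fraction) -> int:
    """Largest n such that the loss fraction is below 10**-n."""
    loss = 1 - reliability
    if loss == 0:
        raise ValueError("no losses observed; the number of nines is unbounded")
    n = 0
    while loss < Fraction(1, 10 ** (n + 1)):
        n += 1
    return n


# Packet counts from Table 1.
r = data_reliability(sent=43_792_812, received=43_792_756)
print(f"{float(r) * 100:.5f}%")  # 99.99987%
print(count_nines(r))            # 5
```

With 56 packets lost out of roughly 43.8 million, the loss fraction is a little over one in a million, which is why the result clears five nines but not six.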

A Production Network in Analog Devices’ Wafer Fab Facility

SmartMesh IP has been deployed at Analog Devices’ wafer fabrication (fab) facility in Silicon Valley to monitor pressure for hundreds of specialty gas cylinders used in the various etching and cleaning stages of wafer fabrication. Previously, each cylinder’s pressure was checked manually three times a day, for a total of 4 hours of manual work per day. A SmartMesh IP network was deployed to automate the measurements and send the readings directly to the facility’s control center software. In the gas bunker, thirty-two wireless nodes were deployed to measure each cylinder’s tank pressure and regulated pressure. Each wireless node is connected to a pair of cylinders, so each node transmits a total of 4 data packets every 30 seconds.

RF conditions in the fab are typical of an industrial environment, with wireless nodes surrounded by metal and concrete and with work crews and equipment moving in the area throughout the day. The network has been in continuous operation for over 83 days, has sent over 26 million data packets, and has achieved over 7 nines (>99.99999%) of reliability.

Table 2. Network Statistics—SmartMesh IP Network at Analog Devices' Wafer Fab Facility
Number of Nodes: 32
Mesh Network Depth: 4 Hops from Furthest Node to Gateway
Packet Generation Rate: 4 Data Packets/30 Seconds from Each Node
Number of Data Packets Sent: 26,137,382 Over 83 Days
Number of Data Packets Received at Final Destination: 26,137,381 (99.999996% Data Reliability; Seven Nines of Reliability)

Figure 3. Toxic Gas Cabinets at Analog Devices’ Wafer Fab are Closely Monitored to Ensure Uptime.

Figure 4. Dense Metal and Concrete—Wireless Nodes Must Perform Reliably Even When Located Among Metal Equipment and Gas Distribution Pipes.

Conclusion

Wireless sensor networks used for Industrial Internet of Things applications must meet the highest bar for reliability over a long service life. To ensure that networks meet these stringent requirements, system hardware and software must be designed from the ground up for industrial performance and rigorously tested at the component, interface, and network levels, and in-service networks must be operated under stress to confirm that acceptable lifetime reliability metrics can be met. Analog Devices’ SmartMesh networks deliver >99.999% data reliability in rigorous end-to-end testing and in the field. Over 76,000 SmartMesh networks have been deployed worldwide in demanding applications such as data centers, factories, power utilities, fence line security, outdoor environmental monitoring, agricultural applications, mining and tunnels, and industrial process.3