Robust Generalization of Graph Neural Networks for Carrier Scheduling

Daniel F. Perez-Ramirez (ORCID 0000-0002-1322-4367), RISE Computer Science & KTH Royal Institute of Technology, Sweden, daniel.perez@ri.se
Carlos Pérez-Penichet (ORCID 0000-0002-1903-4679), RISE Computer Science, Sweden, carlos.penichet@ri.se
Nicolas Tsiftes (ORCID 0000-0003-3139-2564), RISE Computer Science & Digital Futures, Sweden, nicolas.tsiftes@ri.se
Dejan Kostić (ORCID 0000-0002-1256-1070), KTH Royal Institute of Technology & RISE Computer Science, Sweden, dmk@kth.se
Magnus Boman (ORCID 0000-0001-7949-1815), Karolinska Institutet & MedTechLabs, Sweden, magnus.boman@ki.se
Thiemo Voigt (ORCID 0000-0002-2586-8573), Uppsala University & RISE Computer Science, Sweden, thiemo.voigt@angstrom.uu.se

Abstract.

Battery-free sensor tags are devices that leverage backscatter techniques to communicate with standard IoT devices, thereby augmenting a network’s sensing capabilities in a scalable way. To communicate, a sensor tag relies on an unmodulated carrier provided by a neighboring IoT device, with a schedule coordinating this provisioning across the network. Carrier scheduling, i.e., computing schedules to interrogate all sensor tags while minimizing energy, spectrum utilization, and latency, is an NP-Hard optimization problem. Recent work introduces learning-based schedulers that achieve resource savings over a carefully-crafted heuristic, generalizing to networks of up to 60 nodes. However, we find that their advantage diminishes in networks with hundreds of nodes, and degrades further in larger setups. This paper introduces RobustGANTT, a GNN-based scheduler that improves generalization (without re-training) to networks of up to 1000 nodes ( $\mathbf{100}\boldsymbol{\times}$ training topology sizes). RobustGANTT not only achieves better and more consistent generalization, but also computes schedules requiring up to $\mathbf{2}\boldsymbol{\times}$ fewer resources than existing systems. Our scheduler exhibits average runtimes of hundreds of milliseconds, allowing it to react quickly to changing network conditions. Our work not only improves resource utilization in large-scale backscatter networks, but also offers valuable insights into learning-based scheduling.

Keywords: machine learning, scheduling, graph neural networks, sensor networks, backscatter networks

Pre-print; under review.

CCS Concepts: Computer systems organization → Sensor networks; Computing methodologies → Planning and scheduling; Computing methodologies → Machine learning

1. Introduction

Figure 1. RobustGANTT generates schedules for backscatter networks using a GNN-based Transformer model. Step 1: collect MAC and routing protocol information. Step 2: build the IoT network’s graph representation, only including edges strong enough for carrier provisioning (e.g., -75 dBm). Step 3: generate the schedule through iterative one-shot node classification. Step 4: disseminate the schedule using existing network flooding mechanisms and append it to the IoT device’s normal schedule.

Recent advancements in backscatter communication enable the battery-free operation of sensor devices—termed sensor tags—that perform bi-directional communication with standard Internet of Things (IoT) devices (Kellogg et al., 2016; Ensworth and Reynolds, 2015; Kellogg et al., 2014; Talla et al., 2017; Iyer et al., 2016; Pérez-Penichet et al., 2016). Such sensor tags can be added to an existing network of Commercial Off-The-Shelf (COTS) IoT devices to augment the network’s sensing capabilities without requiring additional modifications to the IoT devices (Pérez-Penichet et al., 2020). However, communication between a sensor tag and its hosting IoT device requires the provision of an unmodulated carrier by a neighboring IoT device. A schedule coordinates this provisioning globally across the network to interrogate all sensor values. Figure 1 shows the high-level procedure of computing a schedule, and its structure. It consists of one or more timeslots $s$ , each assigning one of three possible actions to the IoT devices in the network: provide unmodulated carrier $\mathtt{C}$ , interrogate one of its hosted tags $\mathtt{T}$ , or remain idle $\mathtt{O}$ .

Motivation. Battery-free sensor tags provide a scalable, cost- and energy-efficient way to augment the sensing capabilities of existing IoT networks (Kellogg et al., 2016; Pérez-Penichet et al., 2016, 2020). Their battery-free operation reduces electronic waste, and prevents extensive maintenance costs compared to battery-powered alternatives. It also allows placing sensors in hard-to-reach locations, such as medical implants, moving machinery, or embedded in physical infrastructure. The sensor tags may, e.g., prevent patients from undergoing surgery just to replace the battery of medical implants. Reducing the energy consumption of networks hosting sensor tags is of paramount importance not only for sustainability reasons, but also because such networks are often energy constrained.

Challenges. Carrier scheduling—computing a schedule to interrogate all sensor tags while minimizing energy, spectrum utilization, and latency—is, in general, an NP-Hard Combinatorial Optimization Problem (COP) (Pérez-Penichet et al., 2020). It is similar to the traditional wireless link scheduling, but must consider additional constraints for tag interrogations and resource minimization (see Sec. 3). There are also several symmetries involved, both in permuting the timeslots and in selecting carrier generators (Perez-Ramirez et al., 2023). E.g., in Figure 1, exchanging the timeslots’ order alters neither the number of carriers required, nor the latency to query all tags. Also, for timeslot $s_{3}$ , nodes $v_{2}$ and $v_{3}$ are equally valid carrier providers for $\mathtt{T}_{5}$ . A scheduler must process variable input-output structures: networks of different sizes, and schedules of different lengths. It must also leverage the topological structure of the network to favor using one carrier for multiple concurrent tag interrogations (e.g., timeslots $s_{1}$ and $s_{2}$ in Figure 1). Additionally, it must compute schedules in a timely manner to react to connectivity changes of the IoT network.

Current Learning-based Schedulers exhibit Limited Scalability. In general, it is impractical to compute the analytically optimal schedule for IoT networks of hundreds of nodes and sensor tags. This implies running a Constraint Optimizer (CO) for several hours, most likely yielding an obsolete schedule due to changes in the network’s connectivity. Alternatively, one can use the TagAlong scheduler (Pérez-Penichet et al., 2020), a carefully-crafted heuristic with polynomial runtime. However, its performance is increasingly sub-optimal as the network size increases. Recent work introduces DeepGANTT, a scheduler that learns from optimal schedules of small networks (up to 10 nodes) and scales to networks of up to 60 nodes, while reducing the number of carriers compared to TagAlong (Perez-Ramirez et al., 2023). As we show in Sec. 6.2.2, reducing the number of carriers directly translates to energy savings.

However, DeepGANTT presents two main issues when further scaling the problem to graphs larger than 60 nodes, as depicted in Figure 2. We train eight independent models (in accordance with (Perez-Ramirez et al., 2023)), while fixing the training data, hyperparameters and random seeds. DeepGANTT’s best model (isolated squares in Figure 2) is only marginally better than the heuristic for 100-node topologies. Moreover, while all eight models perform well on the training set, their generalization to larger networks varies significantly. The dashed line in Figure 2 shows how the average performance across the eight models is increasingly worse compared to TagAlong, even for 60-node topologies. We attribute this behavior both to the stochastic training procedure that leads most scheduler models to “bad” local minima, and to the model’s inability to capture the full problem complexity.

Approach. In this paper, we leverage the latest advances in Graph Neural Networks and Machine Learning (ML) to present RobustGANTT, a scheduler for backscatter networks with strong and consistent generalization capabilities. To design RobustGANTT, we set out to explore ML-related training aspects, beginning with our own implementation of DeepGANTT. We train our scheduler with optimal schedules of networks of up to 10 nodes and 14 tags computed by a CO. The use of GNNs in our system design allows the scheduler to process variable input-output structures, and to process much larger, previously unseen topologies without the need for re-training. For designing our system, we investigate three aspects influencing the scheduler’s generalization as follows.

First, we assess the influence of learning rate warmup (Ma and Yarats, 2021), and find it highly beneficial for the model’s ability to compute complete schedules for larger topologies. Second, we explore incorporating Positional Encoding (PE) into the node features to enhance the GNN’s ability to handle symmetries in schedule computation. We find that the node-degree PE offers the best trade-off for achieving good generalization, while avoiding the computation overhead of Eigenvalue Decomposition (EVD)-based methods. Finally, we study the influence of increasing the number of attention heads of the GNN layers to capture more complex topological dependencies among the IoT nodes in the network (Nakkiran et al., 2020).

Contributions. Based on the former, we present RobustGANTT, a novel GNN-based scheduler that generalizes to networks of up to 1000 nodes ( $100\times$ training sizes), far beyond the capabilities of current learning-based systems (Perez-Ramirez et al., 2023), while delivering schedules that require up to $2\times$ less resources than those by the TagAlong heuristic (Pérez-Penichet et al., 2020). Our system exhibits polynomial time complexity, allowing it to react fast to changing network conditions. Figure 2 shows how our scheduler not only outperforms DeepGANTT, but also exhibits consistent generalization across the independently trained models.

To evaluate RobustGANTT’s capabilities on real-life IoT networks, we use it to compute schedules for a testbed with 23 nodes and a varying number of sensor tags. Our system achieves 12% on average and up to 53% savings in energy and spectrum utilization compared to the TagAlong heuristic, which corresponds to up to $1.9\times$ more savings over the DeepGANTT scheduler. Furthermore, thanks to the polynomial time complexity of the model, it exhibits an average runtime of 540 ms for the real IoT network, and achieves up to $2\times$ reduction in 95th percentile runtime against DeepGANTT. These characteristics enable RobustGANTT to compute schedules for IoT networks even under dynamically changing conditions.

We make the following specific contributions:

• We present RobustGANTT, a learning-based scheduler that generalizes without re-training to networks of up to 1000 nodes ( $\mathbf{100}\boldsymbol{\times}$ larger than those used for training), far surpassing existing learning-based schedulers.
• We use RobustGANTT to compute schedules for a real IoT network. Our model achieves 12% on average and up to 53% resource savings compared to TagAlong, which correspond to up to $1.9\times$ more savings than those achieved by DeepGANTT.
• RobustGANTT reduces runtime’s 95th percentile by up to $2\times$ against DeepGANTT, which allows it to react faster to changing network conditions.

The paper is structured as follows. Sec. 2 provides background and related work. Sec. 3 formally describes the scheduling problem. Sec. 4 presents the RobustGANTT scheduler, and Sec. 5 describes our system’s GNN model design. Sec. 6 and Sec. 7 present the evaluation and discussion, respectively. Finally, Sec. 8 concludes the paper.

2. Background and Related Work

Our work draws upon backscatter communication, scheduling for backscatter networks and ML for scheduling.

2.1. Backscatter Communication

Several recent efforts advance backscatter communications and battery-free networks (Kellogg et al., 2016; Talla et al., 2017; Iyer et al., 2016; Ensworth and Reynolds, 2015; Kellogg et al., 2014; Zhang et al., 2017; Majid et al., 2019; Karimi et al., 2017; Geissdoerfer and Zimmerling, 2021; Ahmad et al., 2021; Li et al., 2018b; Guo et al., 2020). While some works focus on monostatic or multi-static backscatter configurations (Yang et al., 2011; Hamouda et al., 2011; Yue et al., 2012; Katanbaf et al., 2021), we focus on networks hosting sensor tags in the bistatic configuration (receiver separated from the carrier generator).

Sensor tags leverage backscatter techniques to perform bidirectional communication with their hosting IoT node over standard physical layer protocols (Kellogg et al., 2014; Ensworth and Reynolds, 2015; Pérez-Penichet et al., 2016; Talla et al., 2017). A tag achieves its low-power operation by offloading the local oscillator to a neighboring IoT node (different from its host), which provides the tag with an unmodulated carrier (Pérez-Penichet et al., 2020). The COTS IoT nodes achieve this by, e.g., using their radio test mode (Pérez-Penichet et al., 2016). An IoT node in the network hosts zero or more sensor tags. Moreover, we assume that a sensor tag is hosted by exactly one IoT node responsible for querying the sensor readings. Each sensor tag is located within decimeters of its hosting IoT node, while the IoT nodes in the network are within meters of each other (see Figure 3).

Node-to-Tag Communication. The host-to-tag communication occurs over a time-slotted channel access mechanism, both because it eases the integration of sensor tags and because such mechanisms are widely used in commodity devices. Both Bluetooth and Zigbee/IEEE 802.15.4 support this in their standards (Bluetooth SIG, 2021; IEEE, 2016). Figure 3 describes the communication between a tag $\mathtt{T}$ and its host $v_{2}$ , when assisted by a neighboring carrier provider IoT node $v_{1}$ . $t_{rx}$ and $t_{tx}$ are the times for the sensor tag to receive the request-to-transmit from its host, and for transmitting the sensor value back, respectively. $t_{cg}$ is the time spent in carrier provisioning for tag-to-host communication. The timeslot is long enough to complete one request-response cycle between a node and a tag, e.g., two consecutive Time-Slotted Channel Hopping (TSCH) timeslots (10 ms each) for transmitting the request to the tag and receiving the response (Pérez-Penichet et al., 2020; Perez-Ramirez et al., 2023). During $t_{req}$ , $v_{2}$ sends a request signal to $v_{1}$ to start carrier provisioning, allowing $v_{2}$ to regulate the frequency of tag interrogation. For example, in a schedule with 10 timeslots (200 ms total duration with TSCH), a node might not want to query its tag $1000\text{ ms}/200\text{ ms}=5$ times per second.
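The timing arithmetic in this paragraph can be checked with a few lines of Python. This is a minimal sketch that assumes, as stated above, that one schedule timeslot spans two 10 ms TSCH timeslots; the constant and function names are illustrative and do not come from the paper.

```python
TSCH_SLOT_MS = 10          # duration of one TSCH timeslot, as stated in the text
TSCH_SLOTS_PER_CYCLE = 2   # one slot for the request, one for the response


def schedule_duration_ms(num_timeslots: int) -> int:
    """Duration of a schedule made of `num_timeslots` request-response cycles."""
    return num_timeslots * TSCH_SLOTS_PER_CYCLE * TSCH_SLOT_MS


def max_rounds_per_second(num_timeslots: int) -> float:
    """Upper bound on how many times per second the whole schedule can repeat."""
    return 1000 / schedule_duration_ms(num_timeslots)


print(schedule_duration_ms(10))    # 200
print(max_rounds_per_second(10))   # 5.0
```

A host that does not need its tag's reading that often simply skips sending the carrier request in some rounds.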

Schedule. A schedule coordinates the interrogation of all sensor tags and the provisioning of unmodulated carriers by the IoT nodes for such purposes. It consists of $L\geq 1$ timeslots, each assigning one of three possible actions to IoT nodes in the network: interrogate one of its tags $\mathtt{T}$ , provide unmodulated carrier $\mathtt{C}$ for neighboring tags, or remain idle $\mathtt{O}$ . We leverage the spatial distribution of nodes and tags to perform concurrent tag interrogations with one carrier provider (see Figure LABEL:subfig:carrier-reuse). There are two constraints for performing tag interrogations (Pérez-Penichet et al., 2020). First, due to the time-slotted channel access control mechanism, a node can interrogate only one of its hosted tags per timeslot. Second, for a tag to communicate with its hosting node, exactly one neighboring IoT node must provide it with an unmodulated carrier. Multiple carriers impinging on a sensor tag cause interference and prevent proper tag interrogations (see Figure LABEL:subfig:carrier-interference).

Resource Efficiency. Two metrics determine a schedule’s resource efficiency: the length of the schedule $L$ and the number of carrier slots $C$ . While $L$ indicates the latency of querying all sensor values, $C$ is directly related to spectrum utilization and energy consumption of the IoT network (see Sec. 6.2.2). Figures LABEL:subfig:schedule-types and LABEL:subfig:carrier-reuse show how resource efficient schedules exploit the topological structure of the network to re-use carrier generating nodes within a timeslot.
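To make the two metrics concrete, the sketch below stores a schedule as a list of timeslots, each holding one role per node, and counts $L$ and $C$. The list-of-lists representation and the four-node example are assumptions made for illustration, not the paper's data format.

```python
from typing import List

# schedule[s][v] is the role of node v in timeslot s: "C", "T" or "O"
Schedule = List[List[str]]


def schedule_length(schedule: Schedule) -> int:
    """L: number of timeslots, i.e. the latency of querying all tags."""
    return len(schedule)


def carrier_slots(schedule: Schedule) -> int:
    """C: total number of carrier-providing roles over all timeslots."""
    return sum(slot.count("C") for slot in schedule)


# Hypothetical 4-node network in which nodes 1 and 2 each host one tag
# and both are neighbors of node 0.
with_reuse = [["C", "T", "T", "O"]]           # one carrier serves both hosts
without_reuse = [["C", "T", "O", "O"],
                 ["C", "O", "T", "O"]]        # one carrier slot per tag

print(schedule_length(with_reuse), carrier_slots(with_reuse))        # 1 1
print(schedule_length(without_reuse), carrier_slots(without_reuse))  # 2 2
```

Re-using the carrier halves both metrics in this toy case, which is the effect the figures referenced above illustrate.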

2.2. Existing Schedulers

A scheduler is a system that receives a description of the IoT network hosting sensor tags, and computes a schedule for interrogating the sensor tags. While recent work explores autonomous scheduling for TDMA-based networks (Duquennoy et al., 2015), carrier scheduling requires more powerful hardware for such purposes. In general, carrier scheduling can be solved analytically by using a CO to obtain the optimal schedule. However, this is only feasible for small IoT networks, since the NP-Hard nature of the problem leads to long runtimes (e.g., up to 10 hours for a 10-node network) that prevent the practical application of the CO.

Alternatively, Pérez-Penichet et al. present TagAlong (Pérez-Penichet et al., 2020), a heuristic algorithm that uses graph coloring to compute schedules. While TagAlong exhibits polynomial runtime, its performance becomes increasingly sub-optimal as the network size increases. Additionally, Pérez-Ramírez et al. present DeepGANTT (Perez-Ramirez et al., 2023), the first ML-based system for carrier scheduling that iteratively builds the schedule timeslot by timeslot. DeepGANTT learns from optimal schedules (computed by a CO) of networks of up to 10 IoT nodes and 14 sensor tags (Perez-Ramirez et al., 2023). It generalizes to networks of up to 60 nodes, achieving significant reduction in the number of carriers required in the schedule against TagAlong. In this work, we advance learning-based scheduling by considering networks of hundreds of nodes, far beyond DeepGANTT’s capabilities.

2.3. Learning-based Scheduling

Several works explore applying ML and GNNs for both COP and scheduling (Vinyals et al., 2015; Vesselinova et al., 2020; Bengio et al., 2021; Dai et al., 2017; Li et al., 2018a; Manchanda et al., 2020; Jeon et al., 2022; Mao et al., 2019), but few explore their usage for backscatter networks (Perez-Ramirez et al., 2023). In this work, we explore GNNs to design a system that generates schedules for backscatter networks consisting of hundreds of nodes.

Graph Neural Networks. GNNs are a flexible ML tool for tackling various inference tasks on graphs, such as node classification (Scarselli et al., 2009; Hamilton, 2020a; Wu et al., 2021). Intuitively, stacking $K$ GNN layers generates node embedding vectors that consider their $K$ -hop neighborhood by utilizing the graph’s structure and the relationships between nodes (Gilmer et al., 2017; Kipf and Welling, 2017). These embeddings are generally processed further with linear layers to produce the final output based on the specific task. For instance, node classification can be achieved by feeding each node embedding vector through a classification layer. For a graph $G=\langle V,E\rangle$ with nodes $v\in V$ and edges $(v,u)\in E$ , at GNN layer $i$ , each node feature vector $h_{v}$ is updated as:

(1)	$\displaystyle h^{i}_{v}=f_{1}\left(h^{i-1}_{v},\,\bigcup_{u\in\mathcal{N}(v)}\left[f_{2}\left(h_{u}^{i-1}\right)\right]\right)\,\text{,}$

where $\mathcal{N}(v)$ is the set of neighbors of node $v$ with $h_{u}$ representing their feature vectors, and $\bigcup$ is a commutative aggregation function. $f_{1},f_{2}$ are non-linear transformations (Gilmer et al., 2017). For attention-based GNNs, additional learnable scaling parameters are included within $\bigcup$ to weight the contributions of neighboring nodes differently.
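The update rule in Eq. 1 can be sketched in plain Python. The concrete choices here are assumptions for illustration only: $f_{2}$ is a fixed scaling, the aggregation $\bigcup$ is an element-wise mean, and $f_{1}$ is a sum followed by a ReLU. A trained GNN would use learnable transformations instead, and an attention-based GNN would additionally weight each neighbor's message.

```python
from typing import Dict, List

Vector = List[float]


def f2(h_u: Vector) -> Vector:
    """Illustrative message function: scale the neighbor's features."""
    return [0.5 * x for x in h_u]


def aggregate(messages: List[Vector], dim: int) -> Vector:
    """Commutative aggregation (element-wise mean): message order is irrelevant."""
    if not messages:
        return [0.0] * dim
    return [sum(m[k] for m in messages) / len(messages) for k in range(dim)]


def f1(h_v: Vector, agg: Vector) -> Vector:
    """Illustrative update: add the aggregate to the node's own features, then ReLU."""
    return [max(0.0, a + b) for a, b in zip(h_v, agg)]


def gnn_layer(h: Dict[int, Vector], neighbors: Dict[int, List[int]]) -> Dict[int, Vector]:
    """One application of Eq. 1 to every node of the graph."""
    dim = len(next(iter(h.values())))
    return {v: f1(h_v, aggregate([f2(h[u]) for u in neighbors[v]], dim))
            for v, h_v in h.items()}


# Path graph 0 - 1 - 2 with two features per node
neighbors = {0: [1], 1: [0, 2], 2: [1]}
h0 = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [-4.0, 0.0]}
h1 = gnn_layer(h0, neighbors)
print(h1)  # {0: [1.0, 1.0], 1: [0.0, 2.0], 2: [0.0, 1.0]}
```

Stacking $K$ such calls lets information travel $K$ hops, and nothing in the layer depends on the number of nodes, which is why the same weights can be applied to larger graphs.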

One key advantage of GNNs is their ability to leverage the structural dependencies within the graph, and their ability to perform inference on new graphs not encountered during training without needing to retrain the model (Hamilton et al., 2017; Veličković et al., 2018; Vesselinova et al., 2020).

PE in GNNs. PE augments each node’s input feature vector with additional information on its structural role in the graph. The intuition is to help subsequent GNN layers better distinguish the nodes involved in symmetries, i.e., to perform injective aggregation of neighboring nodes’ features. Recent works explore PE with both local and global graph properties (Wang et al., 2022; Belkin and Niyogi, 2003; Dwivedi et al., 2023; Lim et al., 2023; Huang et al., 2024; Rampášek et al., 2022). While most focus on using PE to better distinguish different graphs, we are interested in assessing its advantage for effective node classification.

3. Carrier Scheduling Problem

The COP of computing a schedule to interrogate all sensor tags in an IoT network while minimizing both the length of the schedule $L$ and the number of carrier slots $C$ is described as follows. We model the network as an undirected connected graph $G$ , defined by the tuple $G=\langle V_{a},E\rangle$ , where $V_{a}$ is the set of $N$ IoT nodes in the network $V_{a}=\{v_{i}\}_{i=0}^{N-1}$ , and $E$ is the set of edges between the nodes $E=\{\langle u,v\rangle|u,v\in V_{a}\}$ . The connectivity among IoT nodes is determined by the wireless link signal strength, i.e., there is an edge between two nodes only if there is a sufficiently strong wireless signal for providing unmodulated carrier (Pérez-Penichet et al., 2020; Perez-Ramirez et al., 2023). We denote the set of $T$ tags in the network as $N_{t}=\{t_{i}\}_{i=1}^{T}$ , and their respective tag-to-host assignment as $H_{t}:t\in N_{t}\mapsto v\in V_{a}$ . The role of a node $v$ within a timeslot $s$ is indicated by the map $R_{v,s}:v\!\in\!V_{a},s\!\in\![1,L]\mapsto\{\mathtt{C},\mathtt{T},\mathtt{O}\}$ , where $L$ is the schedule length in timeslots. Hence, a timeslot $s_{j}$ consists of an $N$ -dimensional vector containing the roles assigned to every node during timeslot $j$ : $s_{j}=\left[R_{v_{i},j}|v_{i}\in V_{a}\right]^{\top}$ .

For a given problem instance $g=\langle G,N_{t},H_{t}\rangle$ , the carrier scheduling problem is formulated as follows:

(2)	$\displaystyle\min\,$	$\displaystyle\,\left(T\cdot C+L\right)$
(3)	s.t.	$\displaystyle\,\forall t\!\in\!N_{t}\,\,\exists!\,\,s\!\in\![1,L]:\,R_{H_{t},s}=\mathtt{T}$
(4)		$\displaystyle\,\forall s\!\in\![1,L]\,\,\forall t\!\in\!N_{t}\,\|\,R_{H_{t},s}=\mathtt{T}\;\exists!\,v_{j}\!\in\!V_{a}:\,R_{v_{j},s}=\mathtt{C}\wedge(H_{t},v_{j})\in E\,\,\text{,}$

where $C$ is the total number of carriers required in the schedule. Constraints (3) and (4) enforce that tags are interrogated only once in the schedule and that there is exactly one carrier-providing neighbor per tag in each timeslot (to prevent collisions), respectively. The objective function (2) prioritizes reducing $C$ over $L$ because we are most concerned with energy and spectrum efficiency—reducing $C$ often implies a reduction of $L$ , but the converse is not necessarily true (Perez-Ramirez et al., 2023).
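A small checker makes the objective (2) and the constraints (3) and (4) tangible. The sketch below is an illustration under stated assumptions: each timeslot is stored as a pair of the role vector and a map from interrogating host to the tag it queries (the map is added here so that hosts with several tags can be handled), and edges are unordered node pairs.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Roles = List[str]                          # roles[v] in {"C", "T", "O"}
Timeslot = Tuple[Roles, Dict[int, int]]    # (roles, {host node: tag queried})


def is_valid(schedule: List[Timeslot], edges: Set[FrozenSet[int]],
             host: Dict[int, int]) -> bool:
    """Check constraints (3) and (4); `host` maps each tag to its host node."""
    interrogated: List[int] = []
    for roles, queried in schedule:
        # The nodes with role T must be exactly the hosts that query a tag.
        if {v for v, r in enumerate(roles) if r == "T"} != set(queried):
            return False
        for v, tag in queried.items():
            if host.get(tag) != v:
                return False
            # Constraint (4): exactly one carrier-providing neighbor of the host.
            carriers = [u for u, r in enumerate(roles)
                        if r == "C" and frozenset((u, v)) in edges]
            if len(carriers) != 1:
                return False
            interrogated.append(tag)
    # Constraint (3): every tag is interrogated exactly once.
    return sorted(interrogated) == sorted(host)


def objective(schedule: List[Timeslot], num_tags: int) -> int:
    """Objective (2): T * C + L."""
    carriers = sum(roles.count("C") for roles, _ in schedule)
    return num_tags * carriers + len(schedule)


# Star topology: node 0 is connected to nodes 1 and 2.
edges = {frozenset((0, 1)), frozenset((0, 2))}
good = [(["C", "T", "T"], {1: 1, 2: 2})]     # tags 1 and 2 hosted by nodes 1 and 2
collision = [(["T", "C", "C"], {0: 1})]      # tag 1 hosted by node 0, two carriers

print(is_valid(good, edges, {1: 1, 2: 2}), objective(good, 2))  # True 3
print(is_valid(collision, edges, {1: 0}))                       # False
```

Because $C$ is multiplied by $T$, saving one carrier slot outweighs one extra timeslot whenever the network hosts more than one tag.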

Symmetry-Breaking Constraints. Solutions to the carrier scheduling problem are highly symmetrical, which limits effective training of a supervised ML model (Perez-Ramirez et al., 2023). Symmetries arise both from the network topology and from the sensor tags’ distribution among the nodes. E.g., for a star topology hosting one sensor tag in the center node, any of the leaf nodes can be the carrier provider, but the scheduler needs to select only one of these. Additionally, we do not assume any a-priori order for tag interrogations. Hence, any of the $L!$ permutations of a schedule’s timeslots is also a valid schedule with the same length $L$ and number of carrier slots $C$ .

Symmetry-breaking constraints make it possible to efficiently learn the behavior of the optimal scheduler and properly train an ML model (Perez-Ramirez et al., 2023). For the training data generation procedure, we further constrain the optimization objective in Eq. 2 by enforcing two lexicographical minimizations: first of a vector of length $T$ (number of tags) that indicates the timeslot when each tag is interrogated, and then of another length- $T$ vector containing the node that provides the carrier for each tag.
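The effect of the two lexicographical minimizations can be illustrated by brute force on a tiny example. This is only a sketch: the paper enforces them as constraints inside the CO rather than by enumeration, and indexing both vectors by ascending tag ID is an assumption made here.

```python
from itertools import permutations
from typing import Dict, List, Tuple

Timeslot = Dict[int, int]  # {tag ID: node that provides its carrier}


def symmetry_key(schedule: List[Timeslot]) -> Tuple[List[int], List[int]]:
    """(timeslot of each tag, carrier node of each tag), ordered by tag ID."""
    slot_of = {tag: s for s, slot in enumerate(schedule, start=1) for tag in slot}
    carrier_of = {tag: c for slot in schedule for tag, c in slot.items()}
    tags = sorted(slot_of)
    return [slot_of[t] for t in tags], [carrier_of[t] for t in tags]


def canonical(candidates: List[List[Timeslot]]) -> List[Timeslot]:
    """Pick the lexicographically smallest schedule over all timeslot orderings."""
    reordered = [list(p) for sched in candidates for p in permutations(sched)]
    return min(reordered, key=symmetry_key)


# Two equally good schedules that differ only in who provides tag 3's carrier.
a = [{3: 0}, {1: 2, 2: 2}]
b = [{3: 1}, {1: 2, 2: 2}]
print(canonical([a, b]))  # [{1: 2, 2: 2}, {3: 0}]
```

All candidates have the same $L$ and $C$, so the tie-break does not change the cost; it only ensures that the training labels for a given network are unique.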

4. RobustGANTT System Design

We consider networks consisting of COTS wireless IoT devices, or nodes, equipped with radio transceivers that support standard physical layer protocols, such as Bluetooth or IEEE 802.15.4/ZigBee. These nodes perform their regular computation and communication tasks according to their normal schedule (Pérez-Penichet et al., 2020; Duquennoy et al., 2015). The IoT nodes are either battery-powered or connected to mains power. We extend the sensing capabilities of the nodes with battery-free sensor tags (Pérez-Penichet et al., 2016, 2020), which require an additional schedule to coordinate carrier provisioning and tag interrogations. This schedule is appended to the IoT network’s normal schedule.

We base our system design on DeepGANTT and set out to explore ML-related aspects to design a scheduler with better and more robust generalization to larger networks.

4.1. System Description

RobustGANTT resides at the Edge/Cloud, and asynchronously receives requests to compute schedules from one or multiple IoT networks hosting battery-free sensor tags. Note that any scheduler tackling this problem would likewise reside at the Edge/Cloud, given the computational demands of computing schedules. The interaction between RobustGANTT and the IoT network is depicted in Figure 1.

First, the IoT network collects the MAC and routing protocol information to build the network topology $G$ and the tag-to-host mapping $H_{t}$ . In our evaluation in Sec. 6, we use metrics from both TSCH (Duquennoy et al., 2017) and RPL (Winter, 2012), but the process is analogous for other MAC and routing protocols. Upon detection of changes either in the network’s connectivity or in the tag-to-host mapping, the network issues a request to RobustGANTT for computing a new schedule. Next, the scheduler receives the network information $g$ and performs iterative node classification using a GNN model to compute the interrogation schedule timeslot by timeslot. Finally, RobustGANTT delivers the schedule back to the IoT network, where it is disseminated using existing network flooding mechanisms, such as Glossy (Ferrari et al., 2011).

At the core of RobustGANTT lies an attention-based GNN model to perform iterative one-shot node classification. In each iteration $j$ , the GNN model receives as input a node feature matrix $X_{j}\in\mathbb{R}^{N\times D}$ with $D=3$ features per node, and delivers as output the scheduling timeslot $s_{j}\in\mathbb{R}^{N}$ . The resulting $s_{j}$ corresponds to assigning each of the $N$ nodes to one of three possible classes $\{\mathtt{T}$ , $\mathtt{C}$ , $\mathtt{O}\}$ .

RobustGANTT keeps a cached representation of the topology $G$ and the tag-to-host mapping $H_{t}$ that is updated after each iteration. After computing the $j^{\text{th}}$ timeslot $s_{j}$ , the tags assigned to be interrogated are removed from the cached representation of the topology, and a new input feature matrix $X_{j+1}$ is generated to compute the next scheduling timeslot $s_{j+1}$ . Because the GNN is a probabilistic model, RobustGANTT includes a component that checks at each iteration that $s_{j}$ complies with the scheduling constraints. This process is repeated until there are no more tags in the cached topology.
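The iterative procedure described above can be sketched as a loop. Everything model-related here is a placeholder: `stand_in_model` is a hypothetical greedy rule that replaces the GNN so that the example runs, and raising an error when the constraint check fails is an assumption, since the text does not say how the system reacts to a violation.

```python
from typing import Dict, List, Tuple

Neighbors = Dict[int, List[int]]   # adjacency lists, nodes numbered 0..N-1


def stand_in_model(neighbors: Neighbors,
                   remaining: Dict[int, int]) -> Tuple[List[str], Dict[int, int]]:
    """Hypothetical placeholder for the GNN: query the lowest remaining tag ID."""
    tag = min(remaining)
    host = remaining[tag]
    carrier = min(neighbors[host])
    roles = ["O"] * len(neighbors)
    roles[host], roles[carrier] = "T", "C"
    return roles, {host: tag}


def satisfies_constraints(roles: List[str], queried: Dict[int, int],
                          neighbors: Neighbors, remaining: Dict[int, int]) -> bool:
    """Each queried tag is still pending and its host sees exactly one carrier."""
    if not queried:
        return False
    for host, tag in queried.items():
        if roles[host] != "T" or remaining.get(tag) != host:
            return False
        if sum(roles[u] == "C" for u in neighbors[host]) != 1:
            return False
    return True


def compute_schedule(neighbors: Neighbors, host_of: Dict[int, int]) -> List[List[str]]:
    remaining = dict(host_of)          # cached tag-to-host mapping
    schedule: List[List[str]] = []
    while remaining:                   # one iteration per timeslot
        roles, queried = stand_in_model(neighbors, remaining)
        if not satisfies_constraints(roles, queried, neighbors, remaining):
            raise ValueError("predicted timeslot violates the constraints")
        schedule.append(roles)
        for tag in queried.values():   # remove the tags that were just scheduled
            del remaining[tag]
    return schedule


# Path graph 0 - 1 - 2; tags 1 and 2 are hosted by node 1, tag 3 by node 2.
schedule = compute_schedule({0: [1], 1: [0, 2], 2: [1]}, {1: 1, 2: 1, 3: 2})
print(schedule)  # [['C', 'T', 'O'], ['C', 'T', 'O'], ['O', 'C', 'T']]
```

The loop terminates because every accepted timeslot removes at least one tag from the cached mapping.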

4.2. Scheduling Approach

Input Node Feature Matrix. Upon receiving the IoT network information, RobustGANTT builds a graph representation of the topology and parses this information for input to the GNN model. The input node feature matrix to the GNN $X_{j}$ consists of $D=3$ features per node:

(1) Hosted-Tags: the number of tags hosted by the node.
(2) Node-ID: integer identifying the node in the graph.
(3) Min. Tag-ID: integer that represents the minimum tag ID among tags hosted by the node.

Intuitively, Hosted-Tags is decisive for assigning carrier-generating nodes – the node hosting the greatest number of tags in the network should avoid providing unmodulated carriers. Thanks to the symmetry-breaking constraints (§3), including features 2 and 3 provides the scheduler with context on how to prioritize carrier-provider nodes, and with an order to interrogate the tags, respectively. In practice, network operators can exploit this by, e.g., prioritizing IoT nodes connected to mains power as carrier providers, or by prioritizing certain tags to be interrogated early in the schedule, simply by assigning them a lower node/tag-ID.
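A possible construction of the $N\times 3$ feature matrix is sketched below. The column order follows the list above; the value used for Min. Tag-ID when a node hosts no tags (`no_tag_value`, set to 0 here) is an assumption, as the text does not specify it.

```python
from typing import Dict, List


def build_features(num_nodes: int, host_of: Dict[int, int],
                   no_tag_value: int = 0) -> List[List[int]]:
    """Rows are nodes; columns are [Hosted-Tags, Node-ID, Min. Tag-ID]."""
    rows = []
    for v in range(num_nodes):
        tags = [t for t, h in host_of.items() if h == v]
        # `no_tag_value` is a placeholder for nodes without tags (assumption).
        rows.append([len(tags), v, min(tags) if tags else no_tag_value])
    return rows


# Tags 1 and 2 are hosted by node 1, tag 3 by node 2.
X = build_features(3, {1: 1, 2: 1, 3: 2})
print(X)  # [[0, 0, 0], [2, 1, 1], [1, 2, 3]]
```

Rebuilding this matrix after removing the scheduled tags yields the input for the next iteration.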

ML Model Architecture. Figure 5 depicts the system’s ML model. The node feature matrix is first passed through a node-wise embedding layer, followed by a concatenation and layer normalization operation. Subsequently, the hidden representation is passed through a stack of 12 GNN blocks, each containing both a linear layer and a self-attention GNN. We fix 12 as the number of blocks due to its wide application in language modelling with both GPT and BERT (Devlin et al., 2018; Radford et al., 2019, 2018), and its success in learning-based schedulers (Perez-Ramirez et al., 2023). The linear layer is a fully-connected neural network that acts on each node’s intermediate feature vector independently, while the GNN uses a multi-head attention mechanism of $M$ heads for computing message passing operations (Veličković et al., 2018). The structure and skip connections of each GNN block are inspired by the Transformer architecture (Vaswani et al., 2017).

4.3. Model Training

We train RobustGANTT with optimal schedules from relatively small networks that are computed by the optimal scheduler. We then use RobustGANTT to compute schedules for much larger and previously-unseen networks without the need for the scheduler to be re-trained.

As loss function, we select the modified cross-entropy loss that includes both a scaling factor to give more importance to the carrier generator class $\mathtt{C}$ (Perez-Ramirez et al., 2023), and L2 weight regularization (LeCun et al., 1989; Krogh and Hertz, 1991). As optimizer, we use Adam with its default hyperparameters (Kingma and Lei Ba, 2015). We decay the learning rate by 2% every epoch, starting from an initial learning rate $\epsilon_{init}\!=\!10^{-3}$ . We stop model training early after 25 consecutive epochs without improvement in the validation loss, and save the best performing model based on the carrier-class F1-score (Perez-Ramirez et al., 2023).
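The learning rate schedule and the stopping rule can be written down in a few lines. The sketch assumes the 2% decay is multiplicative (the rate is multiplied by 0.98 after each epoch), which the text does not state explicitly; selecting the saved model by carrier-class F1-score is not shown.

```python
from typing import List


def learning_rate(epoch: int, lr_init: float = 1e-3, decay: float = 0.02) -> float:
    """Learning rate at a given epoch, assuming multiplicative 2% decay."""
    return lr_init * (1 - decay) ** epoch


def stopping_epoch(val_losses: List[float], patience: int = 25) -> int:
    """Index of the epoch at which training stops under early stopping."""
    best = float("inf")
    stale = 0                      # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return len(val_losses) - 1     # patience never exhausted


print(learning_rate(0))                              # 0.001
print(stopping_epoch([1.0, 0.9] + [0.95] * 30))      # 26
```

In the second call the best validation loss occurs at epoch 1, so the 25th consecutive epoch without improvement is epoch 26.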

5. System GNN Model Design

We explore ML-related design aspects that provide RobustGANTT with strong and consistent generalization to larger, previously unseen, IoT networks. We believe our findings not only advance carrier scheduling, but also provide insights on designing learning-based schedulers for IoT networks.

Setup. We follow a structured, sequential process in three stages, selecting the best configuration in each stage before transitioning to the next one: i) learning rate warmup, ii) local and global PE, and iii) increasing the number of attention heads. For each stage, we train multiple models according to Sec. 4.3 using the training dataset from Sec. 5.1.1, while fixing the hyperparameter configuration. To mitigate the effect of randomness, we fix the random seeds from software libraries at the application level: Python, PyTorch, and NumPy (Paszke et al., 2019; Foundation, 2024). Since the best performance for a given model configuration may greatly diverge from its average (see Figure 2), we train multiple, but identical, ML models for each configuration to assess the consistency of their performance on larger topologies. However, we are limited to training 4 to 8 models per configuration, since the training and subsequent deployment to larger graphs takes between 10 and 45 hours for a single model, depending on its configuration. Our analysis results in the training of over 50 ML scheduler models.
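Fixing seeds at the application level could look as follows. The exact calls used by the authors are not given in the text, so this is an assumed, minimal version; NumPy and PyTorch are seeded only if they are installed, so that the snippet also runs with the standard library alone.

```python
import importlib.util
import random


def fix_seeds(seed: int) -> None:
    """Seed the Python, NumPy and PyTorch random number generators."""
    random.seed(seed)
    if importlib.util.find_spec("numpy") is not None:
        import numpy as np
        np.random.seed(seed)
    if importlib.util.find_spec("torch") is not None:
        import torch
        torch.manual_seed(seed)


fix_seeds(0)
first = random.random()
fix_seeds(0)
second = random.random()
print(first == second)  # True
```

Seeding makes the library-level random streams repeatable; it does not by itself rule out every other source of run-to-run variation in training.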

After training, we deploy the models to compute schedules for the generalization dataset – previously unseen topologies of larger size than those trained (see Sec. 5.1.2). No re-training is done at this stage. We report mean and percentile statistics across the runs for each model configuration, and select the best one based on the performance metrics from Sec. 5.2.

We highlight the following key findings:

• Warm-up significantly contributes to computing complete and correct schedules for larger topologies.
• Node degree PE allows for a good trade-off to assist in breaking graph symmetries with a low-overhead PE method.
• Using 12 attention heads consistently achieves good generalization performance to larger topologies.

5.1. Datasets

We train all models using the data from Sec. 5.1.1. After training, we compare their performance on the dataset described in Sec. 5.1.2, on which the models are not trained.

5.1.1. Training Dataset

We use artificially generated problem instances (topologies and tag assignments) according to Perez-Ramirez et al. (Perez-Ramirez et al., 2023). The dataset contains 580000 problem instances with networks of 2-10 nodes and 1-14 randomly assigned tags. We use the optimal scheduler to obtain schedules for these problem instances, which implies using a CO to analytically solve the COP described in Sec. 3. We use 80%-20% training and validation data splits.

5.1.2. Generalization Dataset

The generalization dataset consists of larger, previously unseen topologies on which models are not trained. We use performance on this dataset to select the best model configuration when deciding the final ML model. We consider $200$ problem instances (network topologies and tag assignments) for every $(N,T)$ pair from the sets $N\in\{10,20,40,60,80,100\}$ nodes and $T\in\{40,80,160,240\}$ tags, i.e., $6\times 4\times 200=4800$ networks.

5.2. Performance Metrics

In this work, we are interested in the system-related aspects of RobustGANTT. Hence, we consider the following application-related performance metrics in ML model design.

$\mathbf{\Pi}$ —Correctly Computed Schedules. Given a set of IoT networks, $\Pi\in[0,100]\%$ represents the percentage of networks for which RobustGANTT produces a complete schedule – one that interrogates all sensor tags. Since RobustGANTT is a probabilistic model, we must account for cases in which the scheduler cannot produce all the required timeslots to query all sensor values in the network. If RobustGANTT fails to deliver all timeslots, even if it correctly delivered some of them, we consider it a failed schedule.

$\boldsymbol{\Delta}_{\mathbf{C}}$ —Carriers Saved. This metric directly relates to the energy and spectrum utilization of the network. It compares the total number of carrier generator slots $C$ in the schedule generated by the TagAlong heuristic, $C_{ta}$, against the total number of carrier slots in the schedule computed by a learning-based scheduler, $C_{nn}$, as: $\boldsymbol{\Delta}_{\mathbf{C}}=C_{ta}-C_{nn}$ . Positive values therefore indicate savings over TagAlong.

5.3. Results

We describe the considered ML design aspects and their influence in our system’s generalization to larger topologies.

5.3.1. Influence of Warmup

Based on the findings of Ma and Yarats (Ma and Yarats, 2021), we evaluate the influence of learning rate warm-up on the optimization. Warmup involves starting training with a small learning rate $\tilde{\epsilon}\ll\epsilon_{init}$ and gradually increasing $\tilde{\epsilon}$ until it reaches the initial learning rate $\epsilon_{init}$ . Intuitively, warmup provides more stability by regularizing the magnitude of parameter updates in the early stages of training for momentum-based optimizers. Since such optimizers perform parameter updates based on past statistical moments of the gradients, warmup allows the optimizer to estimate these moments before making large jumps in the parameter update, which reduces the variance of the update steps (Ma and Yarats, 2021).

We choose an untuned linear warmup schedule (Ma and Yarats, 2021) due to its simplicity and competitive performance. It requires $2\cdot(1-\beta_{2})^{-1}$ steps so that $\tilde{\epsilon}\approx\epsilon_{init}$ , where $\beta_{2}=0.999$ is Adam’s second moment decay rate (Kingma and Lei Ba, 2015), i.e., 2000 steps. The warm-up update of the learning rate is performed as: $\tilde{\epsilon}=\epsilon_{init}\cdot\min\left(1,\frac{1-\beta_{2}}{2}\cdot i\right)\text{,}$ where $i$ is the mini-batch iteration. We independently train two sets of eight identical models, with and without warmup.
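The warmup rule above is simple enough to write out directly. The following sketch evaluates the learning rate at a given mini-batch iteration; the function name `warmup_lr` is hypothetical, and the sketch deliberately leaves out how warmup is combined with the per-epoch decay from Sec. 4.3, since that interaction is not specified here.

```python
def warmup_lr(step: int, lr_init: float = 1e-3, beta2: float = 0.999) -> float:
    """Untuned linear warmup: lr_init * min(1, (1 - beta2) / 2 * step)."""
    return lr_init * min(1.0, (1.0 - beta2) / 2.0 * step)


# Number of mini-batch iterations until the learning rate reaches lr_init:
# 2 * (1 - beta2)^-1 with beta2 = 0.999.
warmup_steps = round(2.0 / (1.0 - 0.999))
```

With Adam's default second moment decay rate, the learning rate grows linearly over the first `warmup_steps` iterations and stays at `lr_init` afterwards.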

Warmup contributes to higher $\Pi$ values. Without warmup, Figure LABEL:subfig:nowarmup shows how the percentage of correctly computed schedules $\Pi$ deteriorates (with increasing standard error) as the topology size increases. Including warmup significantly reduces the variance in $\Pi$ for the larger topologies, as shown in Figure LABEL:subfig:2H. It also improves the Carriers Saved $\boldsymbol{\Delta}_{C}$ values for the mean and the 25th, 75th, and 95th percentiles in topologies of up to 60 nodes. For 100-node topologies, however, the average $\boldsymbol{\Delta}_{C}$ across the multiple runs is similar, with only marginal improvements when including warmup. Finally, warmup also reduces the standard error of all metrics (vertical lines), regardless of the topology size.

5.3.2. Influence of Positional Encoding

We investigate augmenting the input node features to the GNN with PEs to aid the model in breaking symmetries. Based on the results from Sec. 5.3.1, all models are trained with warmup. We consider three types of PEs considering both local and global graph properties. We train four models for each PE configuration.

Node Degree PE

We include one additional vector in the input node feature matrix that corresponds to the normalized node degree vector. Given the adjacency matrix $\mathbf{A}\in\mathbb{R}^{N\times N}$ of an undirected graph $G=\langle V_{a},E\rangle$ with $|V_{a}|=N$ nodes, where $\mathbf{A}[u,v]=1$ if $\langle u,v\rangle\in E$ and $\mathbf{A}[u,v]=0$ otherwise, the degree of node $u$ is $\tilde{d}_{u}=\sum_{v\in V_{a}}\mathbf{A}[u,v]$ (Hamilton, 2020b). We append the node degree vector $\tilde{d}=[\tilde{d}_{u}/\tilde{d}_{max}]_{u\in V_{a}}^{\top}\in\mathbb{R}^{N}$ as a column to the input node feature matrix $X\in\mathbb{R}^{N\times D}$ , where $\tilde{d}_{max}$ is the degree with highest magnitude. Including node degree PE results in $D=3+1=4$ input features per node.
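As an illustration, the node degree PE can be computed from a plain adjacency matrix as follows. This is a sketch with hypothetical helper names (`node_degree_pe`, `append_column`), and the three base features per node are placeholder values, not the actual input features of the model.

```python
from typing import List


def node_degree_pe(adj: List[List[int]]) -> List[float]:
    """Normalized node degrees: the row sums of A divided by the maximum degree."""
    degrees = [sum(row) for row in adj]
    d_max = max(degrees)
    if d_max == 0:
        # Graph without edges: every normalized degree is zero.
        return [0.0 for _ in degrees]
    return [d / d_max for d in degrees]


def append_column(features: List[List[float]], column: List[float]) -> List[List[float]]:
    """Append one value per node as an extra feature column."""
    return [row + [value] for row, value in zip(features, column)]


# Path graph on three nodes: 0 - 1 - 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
# Placeholder node features with D = 3 columns.
features = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
augmented = append_column(features, node_degree_pe(adj))
```

For the path graph, the middle node has the maximum degree, so its encoding is 1.0 and the two end nodes receive 0.5. Each row of `augmented` then holds $D=4$ features.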

Eigenvalues of Graph Laplacian (Eigvals PE)

We investigate using global properties of the graph as PE. We define the symmetric normalized graph Laplacian as $\mathbf{L}=I-\mathbf{D}^{-\frac{1}{2}}\mathbf{A}\mathbf{D}^{-\frac{1}{2}}$ , where $\mathbf{D}=\operatorname{diag}(\tilde{d}_{1},\dots,\tilde{d}_{N})$ is the diagonal node degree matrix and $I$ is the identity matrix. We perform EVD of $\mathbf{L}$ resulting in $\mathbf{L}=\mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1}$ , where $\mathbf{\Lambda}\in\mathbb{R}^{N\times N}$ is a diagonal matrix containing the eigenvalues $\lambda_{i}\in\mathbb{R}$ of $\mathbf{L}$ , and $\mathbf{V}\in\mathbb{R}^{N\times N}$ is a matrix containing the eigenvectors $\mathbf{v}_{i}\in\mathbb{R}^{N}$ for $i\in V_{a}$ . We augment the node feature matrix with a vector that contains the eigenvalues of the graph $\tilde{\Lambda}=[\lambda_{i}]_{i\in V_{a}}\in\mathbb{R}^{N}$ . We normalize $\tilde{\Lambda}$ using the highest eigenvalue. Including Eigvals PE results in $D=3+1=4$ input features per node.
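The Eigvals PE can be sketched with NumPy as below. The function name `eigvals_pe` is hypothetical, and two details are assumptions of this sketch: $\mathbf{D}$ is built from the unnormalized node degrees, and isolated nodes are given a zero entry in $\mathbf{D}^{-\frac{1}{2}}$.

```python
import numpy as np


def eigvals_pe(adj) -> np.ndarray:
    """Eigenvalues of the symmetric normalized Laplacian, scaled by the largest one."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    # D^{-1/2}; isolated nodes (degree 0) get a zero entry (assumption of this sketch).
    inv_sqrt = np.zeros_like(deg)
    mask = deg > 0
    inv_sqrt[mask] = deg[mask] ** -0.5
    lap = np.eye(len(deg)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    # L is symmetric, so eigvalsh returns real eigenvalues in ascending order.
    eigvals = np.linalg.eigvalsh(lap)
    return eigvals / eigvals.max()


# Path graph on three nodes: 0 - 1 - 2.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
pe = eigvals_pe(path)
```

For the three-node path graph, the normalized Laplacian has eigenvalues 0, 1, and 2, so the scaled vector is approximately `[0, 0.5, 1]`. Unlike the node degree PE, this requires an eigendecomposition of an $N\times N$ matrix for every topology.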

Stable and Expressive Positional Encodings (SPE PE)

While eigenvalues provide an indication of magnitude and transformation strength, eigenvectors contain richer geometric information in their directional properties. However, eigenvectors are not unique and suffer from sign ambiguity, i.e., if $\mathbf{v}$ is an eigenvector, so is $-\mathbf{v}$ . More generally, the EVD itself is not unique: any orthogonal change of basis within an eigenspace of $\mathbf{V}$ yields the same Laplacian $\mathbf{L}$ (Kwak and Hong, 2004).

While early work introduces random eigenvector sign flipping during training to account for sign invariance (Dwivedi et al., 2023; Kreuzer et al., 2021), recent work explores learning the invariances that account for changes in the eigenspace basis $\mathbf{V}$ (Lim et al., 2023; Huang et al., 2024). The goal is to learn a permutation-invariant transformation of $\tilde{\Lambda}$ and $\mathbf{V}$ that accounts for their geometrical significance. We choose the Stable and Expressive PE (SPE) method presented by Huang et al. (Huang et al., 2024) due to its benefits over previous methods (Lim et al., 2023). We construct a PE matrix $\Gamma\in\mathbb{R}^{N\times Z}$ using the first $Z$ smallest Eigenvalues $\hat{\Lambda}=[\tilde{\Lambda}_{i}]_{i\in[0:Z]}\in\mathbb{R}^{Z}$ and Eigenvectors $\hat{\mathbf{V}}=[\mathbf{V}_{[:,i]}]_{i\in[0:Z]}\in\mathbb{R}^{N\times Z}$ as (Huang et al., 2024):

$$\Gamma=\rho\left(\hat{\mathbf{V}}\operatorname{diag}(\phi_{1}(\hat{\Lambda}))\hat{\mathbf{V}}^{\top},\dots,\hat{\mathbf{V}}\operatorname{diag}(\phi_{m}(\hat{\Lambda}))\hat{\mathbf{V}}^{\top}\right)\text{,}\tag{5}$$

where $\rho$ is a permutation invariant function and $\{\phi_{i}\}_{i=1}^{m}$ are $m$ independent linear transformations. We implement $\rho$ using a Graph Isomorphism Network (Xu et al., 2019) and $\phi$ with multi-layer perceptrons, using the same hyperparameters as Huang et al. (Huang et al., 2024). However, as we operate in a supervised setting, the choice of $Z$ is determined by the training graph sizes (topologies of up to 10 nodes). Hence, we choose the first $Z=9$ eigenvalues larger than $0$ and their eigenvectors. Including SPE PE results in $D=3+Z=12$ input features per node.

Node degree PE provides the best trade-off between symmetry-breaking and computational overhead. Figure 7 depicts the model’s performance for different PE methods. While SPE achieves the best carriers saved $\Delta_{C}$ results in topologies of up to 20 nodes (Figure LABEL:subfig:spe-PE), its $\Pi$ value drops significantly as the number of sensor tags increases. Moreover, it is completely unable to compute schedules for topologies of 60 and 100 nodes ( $\Pi=0$ ). Figures LABEL:subfig:eivals-PE and LABEL:subfig:degree-PE, for Eigvals PE and node degree PE respectively, show similar profiles. Notably, node degree PE achieves higher 75th and 95th percentile values for both 60-node and 100-node topologies. Additionally, node degree PE does not incur the expensive computational overhead of computing the EVD. For example, computing the EVD for 100-node topologies takes on average an extra 450 ms on a multi-core processor. Hence, node degree PE represents a good trade-off to improve the performance, while avoiding the EVD computation overhead.

5.3.3. Influence of Attention Heads

We include warmup and node degree PE based on the results from the previous sections. We now evaluate the influence of model complexity by increasing the number of attention heads $M$ in each of the GNN layers. We train eight models for each number-of-heads value in $M=\{4,8,12\}$ , and four 16-head models due to their long runtimes (+40 hours per model). We report average and standard error from performance metrics’ statistics.

12 heads crucial for robust generalization. Figure 8 shows the influence of increasing the number of attention heads in the model. As observed from Figure LABEL:subfig:4H- LABEL:subfig:12H, increasing the attention heads implies an increase in the carriers saved $\Delta_{C}$ performance for all percentiles. While the models from 8 heads and 12 heads exhibit similar performance, the overall stability of 12 heads is better for both percentage of correctly computed schedules $\Pi$ and for pushing the 25th percentile of $\Delta_{C}$ above 0. Increasing the attention heads beyond 12 to 16 yields no benefit. On the contrary: Figure LABEL:subfig:16H shows how the mean and 25th percentile of $\Delta_{C}$ fall below 0.

5.4. Final GNN Model

Our analysis from Sec. 5.3 results in a RobustGANTT model of 12 attention heads with node degree PE that is trained with warmup. It exhibits strong generalization to larger topologies, and its performance is consistent across independently trained models. We train RobustGANTT’s model according to Sec. 4.3 using the dataset described in Sec. 5.1.1. Training the model with a mini-batch size of 1024 requires 22 hours on an NVIDIA A100 GPU.

6. Evaluation

In this section, we compare RobustGANTT’s performance against the DeepGANTT scheduler in terms of resource savings over the TagAlong heuristic (Pérez-Penichet et al., 2020). We use both simulated topologies and a real-life IoT network. The design choice of GNNs allows our scheduler to generalize to larger, previously unseen network topologies without retraining. Hence, no further training of RobustGANTT’s ML model is performed for these experiments. We highlight the following key findings:

• RobustGANTT far surpasses the generalization capabilities of DeepGANTT. It scales to 1000 node topologies, while increasingly saving resources compared to TagAlong without sacrificing latency (Figure 9).
• Both RobustGANTT and DeepGANTT achieve resource savings against TagAlong for the real-life IoT network. However, our scheduler achieves up to $1.9\times$ more energy savings, up to a $5.7\times$ reduction in the schedule’s latency, and up to $2\times$ reduction in 95th percentile runtime compared to DeepGANTT (Figures 11 and 12).
• For the real-life IoT network topology, RobustGANTT achieves an average runtime of 540ms, which allows it to react fast to changing network conditions.

Implementation. We implement RobustGANTT as a Function as a Service with $\sim 2600$ lines of code on a server with an NVIDIA A100 GPU. RobustGANTT’s ML model has $\sim 295$ million parameters and requires $\sim 1.6$ GB of GPU memory in total using single-precision floating point, which allows deploying RobustGANTT on lower-end GPUs.

6.1. Scalability to 1000-node topologies

We evaluate RobustGANTT’s generalization to larger topologies, far exceeding DeepGANTT’s capabilities, while still achieving significant resource savings against TagAlong.

6.1.1. Dataset

We consider 200 problem instances (simulated IoT networks with random sensor tag assignments) for $(N,T)$ pairs from the sets $N\in\{100,500,1000\}$ nodes and $T\in\{250,500,1000,1500\}$ sensor tags.

6.1.2. Performance metrics

Besides $\Pi$ and $\Delta_{C}$ (see Sec. 5.2), we consider a metric related to the schedule length $L$ .

$\boldsymbol{\Delta}_{L}$ —Timeslots Saved. Relates to the latency of querying all sensor tag values in the network. Given a network topology, it compares the length of the schedule produced by TagAlong $L_{ta}$ against the length from the schedule produced by a learning-based scheduler $L_{nn}$ as: $\Delta_{L}=L_{ta}-L_{nn}$ .
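The three schedule-level metrics can be written down compactly. The sketch below uses hypothetical function names and a simplified input format: each schedule is summarized by how many tags it interrogates and how many the network contains. It adopts the convention that positive $\Delta_{C}$ and $\Delta_{L}$ values mean savings relative to TagAlong.

```python
from typing import List, Tuple


def pct_complete(results: List[Tuple[int, int]]) -> float:
    """Pi: percentage of networks whose schedule interrogates every sensor tag.

    Each entry is (tags_interrogated, tags_in_network); a schedule that misses
    even a single tag counts as failed.
    """
    complete = sum(1 for done, total in results if done == total)
    return 100.0 * complete / len(results)


def carriers_saved(c_ta: int, c_nn: int) -> int:
    """Delta_C: carrier slots saved by the learning-based scheduler over TagAlong."""
    return c_ta - c_nn


def timeslots_saved(l_ta: int, l_nn: int) -> int:
    """Delta_L: timeslots saved by the learning-based scheduler over TagAlong."""
    return l_ta - l_nn
```

A negative return value from `timeslots_saved` means that the learned schedule is longer than the one produced by TagAlong.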

6.1.3. Results

Figure LABEL:subfig:robust-vs-deep-CARR depicts the carriers saved $\Delta_{C}$ of both RobustGANTT and DeepGANTT against the TagAlong heuristic. RobustGANTT consistently achieves higher savings with both an increase in the number of nodes and number of sensor tags. Notably, even its 1st percentile lies above zero, i.e., for at least $99\%$ of the cases RobustGANTT achieves savings against TagAlong. Our scheduler achieves on average $12\%$ and up to a $1.4\times$ reduction in the number of carriers compared to TagAlong. Sec. 6.2.2 demonstrates how $\Delta_{C}$ directly translates to energy savings. The DeepGANTT scheduler is, however, only marginally better than TagAlong for 100-node topologies, and increasingly worse for larger networks. Additionally, DeepGANTT’s correctly computed schedules $\Pi$ decreases for 100 nodes, while RobustGANTT’s values are consistently $\Pi=100\%$ .

RobustGANTT computes schedules requiring roughly the same number of timeslots as TagAlong ( $\Delta_{L}\approx 0$ ) as shown in Figure LABEL:subfig:robust-vs-deep-LEN. Hence, our scheduler achieves significant savings in energy and spectrum without a significant increase in the latency to query all sensor tags. Across all topologies considered, RobustGANTT requires on average 1.12 additional timeslots compared to TagAlong. In contrast, DeepGANTT requires on average 20 additional timeslots, and achieves no resource savings for such large topologies. Moreover, Figure LABEL:subfig:robust-vs-deep-LEN shows how DeepGANTT requires increasingly more timeslots than our scheduler.

6.2. Performance for a Real IoT Network

We now evaluate RobustGANTT’s ability to compute schedules for a real-life IoT network.

6.2.1. Testbed

Our experimental setup utilizes an indoor IoT testbed composed of 23 Zolertia Firefly devices running Contiki-NG (Oikonomou et al., 2022) (see Figure 10). These devices employ the RPL routing protocol (Winter, 2012) and communicate via IPv6 over IEEE 802.15.4 TSCH (Duquennoy et al., 2017). Link connectivity data between IoT nodes was gathered at 30-minute intervals across a four-day span, assuming a link exists between a node pair when the signal strength reaches at least $-75\,\mathrm{dBm}$, which is suitable for carrier provisioning. Additionally, we augmented each network topology by randomly assigning simulated tags to achieve various tag densities, defined as $T/N=T/23$ , for $T\in\{46,115,230,460\}$ , with each density configuration tested 100 times.
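The link rule and the tag densities can be illustrated with a short sketch. The helper names and the RSSI values are made up for illustration, and the sketch assumes a single signal-strength reading per node pair. Tag density is taken as tags per node, which yields the densities 2, 5, 10, and 20 reported in the results.

```python
from typing import Dict, Set, Tuple

RSSI_THRESHOLD_DBM = -75.0  # minimum signal strength for a usable link
NUM_NODES = 23              # IoT devices in the testbed


def usable_links(rssi_dbm: Dict[Tuple[int, int], float],
                 threshold: float = RSSI_THRESHOLD_DBM) -> Set[Tuple[int, int]]:
    """Undirected links whose measured signal strength is at least the threshold."""
    return {(min(u, v), max(u, v))
            for (u, v), rssi in rssi_dbm.items()
            if rssi >= threshold}


def tag_density(num_tags: int, num_nodes: int = NUM_NODES) -> float:
    """Average number of sensor tags per IoT node."""
    return num_tags / num_nodes


# Made-up measurements for three node pairs.
measurements = {(1, 2): -60.0, (2, 3): -75.0, (1, 3): -82.5}
links = usable_links(measurements)
densities = [tag_density(t) for t in (46, 115, 230, 460)]
```

Here the pair measured at exactly the threshold is kept, while the pair at -82.5 dBm is dropped from the topology.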

6.2.2. Performance metrics

Besides $\Pi$ , $\Delta_{C}$ (see Sec. 5.2), and $\Delta_{L}$ (see Sec. 6.1.2) we explicitly evaluate energy consumption. Moreover, percentages for $\Delta_{C}$ and $\Delta_{L}$ imply normalization with respect to the heuristic values, e.g., $\Delta_{C\%}=\Delta_{C}/C_{ta}$ .

$\boldsymbol{\Delta}_{\mathbf{E\%}}$ —Energy Saved. We consider the average energy required for querying the tag’s sensor values $\tilde{E}$ . It corresponds to the total energy required to interrogate all sensor tags $E_{tot}$ divided by the number of tags $T$ in the network (Perez-Ramirez et al., 2023):

$$\tilde{E}=\frac{E_{tot}}{T}=P_{tx}t_{tx}+P_{rx}\left(\frac{C}{T}t_{req}+t_{rx}\right)+P_{tx}\left(t_{req}+\frac{C}{T}t_{cg}\right)\text{,}\tag{6}$$

where $P_{tx}$ and $P_{rx}$ correspond to the radio power in transmit and receive mode, respectively. $t_{rx}$ , $t_{tx}$ , $t_{req}$ , and $t_{cg}$ are defined as in Figure 3. Calculating a percentage of energy saved against the TagAlong scheduler corresponds to $\Delta_{E\%}=(\tilde{E}_{ta}-\tilde{E}_{nn})/\tilde{E}_{ta}$ , so that positive values indicate savings. Given a schedule, all values in Eq. 6 except $C$ are constant for calculating both $\tilde{E}_{ta}$ and $\tilde{E}_{nn}$ . Hence, lower values of $C$ directly translate to energy savings.

We adopt $P_{rx}=72\,\mathrm{mW}$ , $P_{tx}=102\,\mathrm{mW}$ based on the Firefly’s reference values. Moreover, we assume $t_{req}=t_{tx}=128\,\mu\mathrm{s}$ , $t_{rx}=256\,\mu\mathrm{s}$ , and $t_{cg}=15.75\,\mathrm{ms}$ (Pérez-Penichet et al., 2020; Perez-Ramirez et al., 2023).
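To make the relation between carriers and energy concrete, Eq. 6 can be evaluated with the parameter values above. This is a sketch with hypothetical function names. The carrier counts in the usage note are illustrative inputs rather than measured results, and the sign of the percentage is chosen so that positive values denote savings over TagAlong.

```python
P_RX_W = 72e-3     # radio power in receive mode (W)
P_TX_W = 102e-3    # radio power in transmit mode (W)
T_REQ_S = 128e-6   # t_req (s)
T_TX_S = 128e-6    # t_tx (s)
T_RX_S = 256e-6    # t_rx (s)
T_CG_S = 15.75e-3  # t_cg (s)


def energy_per_tag(num_carriers: int, num_tags: int) -> float:
    """Average energy (J) to query one tag, following Eq. 6."""
    ratio = num_carriers / num_tags  # C / T
    return (P_TX_W * T_TX_S
            + P_RX_W * (ratio * T_REQ_S + T_RX_S)
            + P_TX_W * (T_REQ_S + ratio * T_CG_S))


def energy_saved_pct(c_ta: int, c_nn: int, num_tags: int) -> float:
    """Energy saved (%) by a schedule with c_nn carriers over one with c_ta carriers."""
    e_ta = energy_per_tag(c_ta, num_tags)
    e_nn = energy_per_tag(c_nn, num_tags)
    return 100.0 * (e_ta - e_nn) / e_ta
```

For example, `energy_saved_pct(20, 10, 46)` evaluates the saving from halving the number of carriers in a network with 46 tags. The result is slightly below 50% because the terms of Eq. 6 that do not depend on $C$ stay the same, while the term scaled by $t_{cg}$ dominates the total.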

6.2.3. Results

For the real-life IoT network, RobustGANTT achieves on average $14\%$ and up to $53\%$ energy savings $\Delta_{E\%}$ (i.e., up to $2\times$ less energy) compared to TagAlong. Even for the highest tag densities $T/N$ considered, our scheduler achieves $44\%$ energy savings. Such savings represent up to $1.9\times$ the savings achieved by DeepGANTT, as shown in Figure LABEL:subfig:pi_perc_energy. Figures LABEL:subfig:pi_diff_carr and LABEL:subfig:pi_perc_energy demonstrate the equivalence between $\Delta_{C\%}$ and $\Delta_{E\%}$ : a reduction in the number of carriers directly translates to energy savings.

In terms of latency to query all sensor values, Figure LABEL:subfig:pi_timeslots_saved shows that DeepGANTT consistently requires on average more timeslots than TagAlong. In contrast, our scheduler requires on average as many timeslots as TagAlong for tag densities 2, 5, and 10. However, it requires on average 8.8 more timeslots for tag density 20. Figure 12 shows the runtime distributions of both schedulers across tag densities. Their profiles are those of heavy-tailed distributions. Both schedulers require roughly the same average runtimes across tag densities. In particular, RobustGANTT’s average runtimes are 120 ms, 260 ms, 540 ms, and 1.2 s for the respective tag densities 2, 5, 10, and 20. However, Figure 12 demonstrates how RobustGANTT reduces the runtime’s 95th percentile by up to a factor of $2\times$ compared to DeepGANTT.

While the real-life network’s size is within DeepGANTT’s proven generalization capabilities (Perez-Ramirez et al., 2023), we demonstrate that our scheduler requires on average up to $1.9\times$ fewer carriers (energy savings), up to $5.7\times$ fewer timeslots (reduction in latency to query all sensor tags), and up to a $2\times$ reduction in 95th percentile runtime to compute the schedule.

7. Discussion

RobustGANTT is a scheduler that far surpasses the generalization capabilities of existing learning-based systems. Our system not only processes much larger IoT network topologies than previously possible, but also delivers more resource-efficient schedules.

Large-scale IoT networks. Our system is designed to reduce energy consumption in IoT networks. This is of paramount importance not only for sustainability reasons, but also because such networks are typically energy constrained. Moreover, ensuring energy savings without increasing querying latency is highly relevant, especially for dense network deployments, since it reduces spectrum utilization.

Serving IoT networks in parallel. The NP-Hard nature of generating resource-efficient schedules requires deploying RobustGANTT at the Edge/Cloud, which is also true for other schedulers (Pérez-Penichet et al., 2020; Perez-Ramirez et al., 2023). However, one does not need to deploy a RobustGANTT scheduler for every IoT network. Rather, one RobustGANTT instance can process requests from multiple IoT networks either in sequence, or by batching those requests and computing their schedules in parallel. The number of requests processed in parallel is, however, limited by the total amount of GPU memory available.

Latency to query all sensor values. RobustGANTT’s schedules require roughly the same number of timeslots as those produced by TagAlong (see Figures LABEL:subfig:robust-vs-deep-LEN and LABEL:subfig:pi_timeslots_saved). This implies that our system does not sacrifice querying latency to achieve its significant energy savings. However, there are cases in which TagAlong schedules are shorter than those from RobustGANTT. We attribute this to the optimization objective (Eq. 2), which prioritizes reducing the number of carriers, since we are most interested in energy savings. Moreover, we do not envision backscatter sensor tags to assist in time-critical settings, but rather in energy-efficient sensing and monitoring.

Dynamic Environments. Our system exhibits average runtimes of hundreds of milliseconds, allowing it to react fast to connectivity changes in the IoT devices. Similarly, adding or removing IoT nodes would trigger a new request to compute a schedule. However, detecting the addition or removal of sensor tags to the IoT nodes is a general problem for the type of backscatter networks considered, and lies outside our scope.

8. Conclusion

We present RobustGANTT, a novel system that leverages the latest advancements in GNNs and ML to schedule communications in an IoT network augmented with backscatter sensor tags. We exploit our design choice of using GNNs to train our scheduler on optimal schedules from small networks of up to 10 nodes, and demonstrate that RobustGANTT can seamlessly generalize without re-training to networks of up to 1000 nodes. Our scheduler surpasses the generalization capabilities of current learning-based systems, while achieving significant savings in energy usage, spectrum utilization, and compute runtime. RobustGANTT facilitates the large-scale integration of IoT networks with sensor tags, and significantly reduces their operational expenses by efficiently utilizing their resources.

Acknowledgements.

This work was financially supported by the Swedish Foundation for Strategic Research (SSF). We acknowledge the usage of High-Performance Computing resources under the EuroHPC JU project No. EHPC-DEV-2023D08-049. We also thank M.Sc. Peder Hårderup for initial concept prototyping of the node degree PE during his M.Sc. thesis at RISE.

References

Ahmad et al. (2021) Abeer Ahmad, Xiao Sha, Milutin Stanaćević, Akshay Athalye, Petar M Djurić, and Samir R Das. 2021. Enabling passive backscatter tag localization without active receivers. In Proc. of the 19th ACM Conference on Embedded Networked Sensor Systems (SenSys). 178–191.
Belkin and Niyogi (2003) Mikhail Belkin and Partha Niyogi. 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation 15, 6 (2003), 1373–1396.
Bengio et al. (2021) Yoshua Bengio, Andrea Lodi, and Antoine Prouvost. 2021. Machine learning for combinatorial optimization: A methodological tour d’horizon. Eur. J. Oper. Res. 290, 2 (apr 2021), 405–421. https://doi.org/10.1016/j.ejor.2020.07.063 arXiv:1811.06128
Bluetooth SIG (2021) Bluetooth SIG. 2021. Bluetooth Core Specification 5.3.
Dai et al. (2017) Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, and Le Song. 2017. Learning Combinatorial Optimization Algorithms over Graphs. In Proc. Advances Neural Inf. Process. Syst. (NIPS), Vol. 2017-Decem. Neural information processing systems foundation, 6349–6359. arXiv:1704.01665
Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
Duquennoy et al. (2015) Simon Duquennoy, Beshr Al Nahas, Olaf Landsiedel, and Thomas Watteyne. 2015. Orchestra: Robust mesh networks through autonomously scheduled TSCH. In Proceedings of the 13th ACM conference on embedded networked sensor systems. 337–350.
Duquennoy et al. (2017) Simon Duquennoy, Atis Elsts, Beshr Al Nahas, and George Oikonomou. 2017. TSCH and 6TiSCH for Contiki: Challenges, Design and Evaluation. In 2017 13th International Conference on Distributed Computing in Sensor Systems (DCOSS). 11–18. https://doi.org/10.1109/DCOSS.2017.29
Dwivedi et al. (2023) Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, and Xavier Bresson. 2023. Benchmarking Graph Neural Networks. Journal of Machine Learning Research 24, 43 (2023), 1–48. http://jmlr.org/papers/v24/22-0567.html
Ensworth and Reynolds (2015) Joshua Ensworth and Matthew S. Reynolds. 2015. Every smart phone is a backscatter reader: Modulated backscatter compatibility with Bluetooth 4.0 Low Energy (BLE) devices. In Proc. Ann. Conf. RFID. IEEE.
Ferrari et al. (2011) Federico Ferrari, Marco Zimmerling, Lothar Thiele, and Olga Saukh. 2011. Efficient network flooding and time synchronization with Glossy. In Proc. 10th ACM/IEEE Int. Conf. Information Processing in Sensor Networks. 73–84.
Foundation (2024) The PyTorch Foundation. 2024. PyTorch Reproducibility. https://pytorch.org/docs/stable/notes/randomness.html
Geissdoerfer and Zimmerling (2021) Kai Geissdoerfer and Marco Zimmerling. 2021. Bootstrapping Battery-free Wireless Networks: Efficient Neighbor Discovery and Synchronization in the Face of Intermittency. In (NSDI’21). 439–455.
Gilmer et al. (2017) Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural Message Passing for Quantum Chemistry. Proc. 34th Int. Conf. Mach. Learn. (ICML) 3 (apr 2017), 2053–2070. arXiv:1704.01212
Guo et al. (2020) Xiuzhen Guo, Longfei Shangguan, Yuan He, Jia Zhang, Haotian Jiang, Awais Ahmad Siddiqi, and Yunhao Liu. 2020. Aloba: Rethinking ON-OFF keying modulation for ambient LoRa backscatter. In Proceedings of the 18th conference on embedded networked sensor systems. 192–204.
Hamilton (2020a) William L Hamilton. 2020a. Graph representation learning. Vol. 14. Morgan & Claypool Publishers.
Hamilton (2020b) William L Hamilton. 2020b. Graph representation learning. Morgan & Claypool Publishers.
Hamilton et al. (2017) William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In Proc. Advances Neural Inf. Process. Syst. (NIPS), Vol. 2017-Decem. Neural information processing systems foundation, 1025–1035.
Hamouda et al. (2011) Essia Hamouda, Nathalie Mitton, and David Simplot-Ryl. 2011. Reader Anti-collision in dense RFID networks with mobile tags. In 2011 IEEE International Conference on RFID-Technologies and Applications. 327–334. https://doi.org/10.1109/RFID-TA.2011.6068657
Huang et al. (2024) Yinan Huang, William Lu, Joshua Robinson, Yu Yang, Muhan Zhang, Stefanie Jegelka, and Pan Li. 2024. On the Stability of Expressive Positional Encodings for Graph Neural Networks. In Proc. 12th International Conference on Learning Representations (ICLR’24). https://openreview.net/forum?id=xAqcJ9XoTf
IEEE (2016) IEEE. 2016. IEEE Standard for Low-Rate Wireless Networks –Amendment 2: Ultra-Low Power Physical Layer.
Iyer et al. (2016) Vikram Iyer et al. 2016. Inter-Technology Backscatter: Towards Internet Connectivity for Implanted Devices. ACM, 356–369. https://doi.org/10.1145/2934872.2934894
Jeon et al. (2022) Wonseok Jeon, Mukul Gagrani, Burak Bartan, Weiliang Will Zeng, Harris Teague, Piero Zappi, and Christopher Lott. 2022. Neural DAG scheduling via one-shot priority sampling. In The Eleventh International Conference on Learning Representations.
Karimi et al. (2017) Y. Karimi, A. Athalye, S. R. Das, P. M. Djurić, and M. Stanaćević. 2017. Design of a backscatter-based Tag-to-Tag system. In 2017 IEEE International Conference on RFID (IEEE RFID). 6–12. https://doi.org/10.1109/RFID.2017.7945579
Katanbaf et al. (2021) Mohamad Katanbaf, Ali Saffari, and Joshua R Smith. 2021. Multiscatter: Multistatic backscatter networking for battery-free sensors. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems. 69–83.
Kellogg et al. (2014) Bryce Kellogg et al. 2014. Wi-Fi Backscatter: Internet Connectivity for RF-powered Devices. In Proc. Special Interest Group Data Commun. (SIGCOMM). ACM, New York, NY, USA, 607–618. https://doi.org/10.1145/2619239.2626319
Kellogg et al. (2016) Bryce Kellogg et al. 2016. Passive Wi-Fi: Bringing Low Power to Wi-Fi Transmissions. In Proc. Symp. Networked Syst. Des. Implementation (NSDI). NSDI, 151–164.
Kingma and Lei Ba (2015) Diederik P Kingma and Jimmy Lei Ba. 2015. Adam: A Method For Stochastic Optimization. In Proc. Int. Conf. Learn. Representations (ICLR). arXiv:1412.6980v9
Kipf and Welling (2017) Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In Proc. 5th Int. Conf. Learn. Representations (ICLR). ICLR. arXiv:1609.02907
Kreuzer et al. (2021) Devin Kreuzer, Dominique Beaini, Will Hamilton, Vincent Létourneau, and Prudencio Tossou. 2021. Rethinking Graph Transformers with Spectral Attention. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 21618–21629. https://proceedings.neurips.cc/paper_files/paper/2021/file/b4fd1d2cb085390fbbadae65e07876a7-Paper.pdf
Krogh and Hertz (1991) Anders Krogh and John Hertz. 1991. A simple weight decay can improve generalization. Advances in neural information processing systems 4 (1991).
Kwak and Hong (2004) Jin Ho Kwak and Sungpyo Hong. 2004. Linear algebra. Springer Science & Business Media.
LeCun et al. (1989) Yann LeCun, John Denker, and Sara Solla. 1989. Optimal brain damage. Advances in neural information processing systems 2 (1989).
Li et al. (2018b) Yan Li, Zicheng Chi, Xin Liu, and Ting Zhu. 2018b. Passive-zigbee: Enabling zigbee communication in iot networks with 1000x+ less power consumption. In Proceedings of the 16th ACM conference on embedded networked sensor systems. 159–171.
Li et al. (2018a) Zhuwen Li, Qifeng Chen, and Vladlen Koltun. 2018a. Combinatorial optimization with graph convolutional networks and guided tree search. In Proc. Advances in Neural Inf. Process. Syst. (NeurIPS). 539–548.
Lim et al. (2023) Derek Lim, Joshua Robinson, Lingxiao Zhao, Tess Smidt, Suvrit Sra, Haggai Maron, and Stefanie Jegelka. 2023. Sign and basis invariant networks for spectral graph representation learning. (2023).
Ma and Yarats (2021) Jerry Ma and Denis Yarats. 2021. On the adequacy of untuned warmup for adaptive optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 8828–8836.
Majid et al. (2019) A. Y. Majid, M. Jansen, G. O. Delgado, K. S. Yildirim, and P. Pawełczak. 2019. Multi-hop Backscatter Tag-to-Tag Networks. In Proc. Int. Conf. Comput. Commun. (INFOCOM). IEEE, 721–729. https://doi.org/10.1109/INFOCOM.2019.8737551
Manchanda et al. (2020) Sahil Manchanda, Akash Mittal, Anuj Dhawan, Sourav Medya, Sayan Ranu, and Ambuj Singh. 2020. Learning Heuristics over Large Graphs via Deep Reinforcement Learning. In Proc. 34th Conf. Neural Inf. Process. Syst. (NIPS). arXiv:1903.03332
Mao et al. (2019) Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, and Mohammad Alizadeh. 2019. Learning scheduling algorithms for data processing clusters. In Proceedings of the ACM special interest group on data communication. 270–288.
Nakkiran et al. (2020) Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, and Ilya Sutskever. 2020. Deep Double Descent: Where Bigger Models and More Data Hurt. In International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=B1g5sA4twr
Oikonomou et al. (2022) George Oikonomou, Simon Duquennoy, Atis Elsts, Joakim Eriksson, Yasuyuki Tanaka, and Nicolas Tsiftes. 2022. The Contiki-NG open source operating system for next generation IoT devices. SoftwareX 18 (2022), 101089. https://doi.org/10.1016/j.softx.2022.101089
Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, et al. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proc. Advances Neural Inf. Process. Syst., H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 8024–8035.
Pérez-Penichet et al. (2016) Carlos Pérez-Penichet, Frederik Hermans, Ambuj Varshney, and Thiemo Voigt. 2016. Augmenting IoT networks with backscatter-enabled passive sensor tags. In Proc. Annu. Int. Conf. Mobile Comput. Netw. (MOBICOM). ACM, 23–27. https://doi.org/10.1145/2980115.2980132
Pérez-Penichet et al. (2020) Carlos Pérez-Penichet, Dilushi Piumwardane, Christian Rohner, and Thiemo Voigt. 2020. A Fast Carrier Scheduling Algorithm for Battery-free Sensor Tags in Commodity Wireless Networks. In Proc. Int. Conf. Comput. Commun. (INFOCOM). IEEE, 994–1003. https://doi.org/10.1109/infocom41043.2020.9155241
Perez-Ramirez et al. (2023) Daniel F. Perez-Ramirez, Carlos Pérez-Penichet, Nicolas Tsiftes, Thiemo Voigt, Dejan Kostić, and Magnus Boman. 2023. DeepGANTT: A Scalable Deep Learning Scheduler for Backscatter Networks. In Proceedings of the 22nd International Conference on Information Processing in Sensor Networks (San Antonio, TX, USA) (IPSN ’23). Association for Computing Machinery, New York, NY, USA, 163–176. https://doi.org/10.1145/3583120.3586957
Pérez-Penichet et al. (2020) Carlos Pérez-Penichet, Dilushi Piumwardane, Christian Rohner, and Thiemo Voigt. 2020. TagAlong: Efficient Integration of Battery-Free Sensor Tags in Standard Wireless Networks. In Proc. 19th ACM/IEEE Int. Conf. Inf. Process. Sensor Netw. (IPSN). Sydney, Australia. https://doi.org/10.1109/IPSN48710.2020.00020
Radford et al. (2018) Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. 2018. Improving language understanding by generative pre-training. (2018).
Radford et al. (2019) Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.
Rampášek et al. (2022) Ladislav Rampášek, Michael Galkin, Vijay Prakash Dwivedi, Anh Tuan Luu, Guy Wolf, and Dominique Beaini. 2022. Recipe for a general, powerful, scalable graph transformer. Advances in Neural Information Processing Systems 35 (2022), 14501–14515.
Scarselli et al. (2009) Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The graph neural network model. IEEE Trans. Neural Netw. 20, 1 (jan 2009), 61–80. https://doi.org/10.1109/TNN.2008.2005605
Talla et al. (2017) Vamsi Talla, Mehrdad Hessar, Bryce Kellogg, Ali Najafi, Joshua R. Smith, and Shyamnath Gollakota. 2017. LoRa Backscatter: Enabling The Vision of Ubiquitous Connectivity. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 1, 3, 105:1–105:24. https://doi.org/10.1145/3130970
Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proc. Advances Neural Inf. Process. Syst. (NIPS), Vol. 30. NIPS, 5999–6009.
Veličković et al. (2018) Petar Veličković, Arantxa Casanova, Pietro Liò, Guillem Cucurull, Adriana Romero, and Yoshua Bengio. 2018. Graph attention networks. In Proc. 6th Int. Conf. Learn. Representations (ICLR). ICLR. arXiv:1710.10903
Vesselinova et al. (2020) Natalia Vesselinova, Rebecca Steinert, Daniel F Perez-Ramirez, and Magnus Boman. 2020. Learning combinatorial optimization on graphs: A survey with applications to networking. IEEE Access 8 (2020), 120388–120416.
Vinyals et al. (2015) Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer Networks. In Proc. Advances Neural Inf. Process. Syst. (NIPS). 2692–2700.
Wang et al. (2022) Haorui Wang, Haoteng Yin, Muhan Zhang, and Pan Li. 2022. Equivariant and stable positional encoding for more powerful graph neural networks. In Proc. 10th International Conference on Learning Representations (ICLR’22).
Winter (2012) T. Winter. 2012. RPL: IPv6 Routing Protocol for Low-Power and Lossy Networks. Retrieved Oct. 2022 from https://www.rfc-editor.org/rfc/rfc6550
Wu et al. (2021) Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2021. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. Learn. Syst. 32, 1 (2021), 4–24. https://doi.org/10.1109/TNNLS.2020.2978386
Xu et al. (2019) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How powerful are graph neural networks?. In Proc. Int. Conf. Learn. Representations (ICLR’19).
Yang et al. (2011) L. Yang, J. Han, Y. Qi, C. Wang, T. Gu, and Y. Liu. 2011. Season: Shelving interference and joint identification in large-scale RFID systems. In Proc. Int. Conf. Comput. Commun. (INFOCOM). IEEE, 3092–3100. https://doi.org/10.1109/INFCOM.2011.5935154
Yue et al. (2012) H. Yue, C. Zhang, M. Pan, Y. Fang, and S. Chen. 2012. A time-efficient information collection protocol for large-scale RFID systems. In Proc. Int. Conf. Comput. Commun. (INFOCOM). IEEE, 2158–2166. https://doi.org/10.1109/INFCOM.2012.6195599
Zhang et al. (2017) Pengyu Zhang, Colleen Josephson, Dinesh Bharadia, and Sachin Katti. 2017. Freerider: Backscatter communication using commodity radios. In Proceedings of the 13th international conference on emerging networking experiments and technologies. 389–401.