Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Robust Generalization of Graph Neural Networks for Carrier Scheduling

Daniel F. Perez-Ramirez 0000-0002-1322-4367 RISE Computer Science &
KTH Royal Institute of Technology
Sweden
daniel.perez@ri.se
Carlos Pérez-Penichet 0000-0002-1903-4679 RISE Computer ScienceSweden carlos.penichet@ri.se Nicolas Tsiftes 0000-0003-3139-2564 RISE Computer Science &
Digital Futures
Sweden
nicolas.tsiftes@ri.se
Dejan Kostić 0000-0002-1256-1070 KTH Royal Institute of Technology & RISE Computer ScienceSweden dmk@kth.se Magnus Boman 0000-0001-7949-1815 Karolinska Institutet & MedTechLabsSweden magnus.boman@ki.se  and  Thiemo Voigt 0000-0002-2586-8573 Uppsala University &
RISE Computer Science
Sweden
thiemo.voigt@angstrom.uu.se
Abstract.

Battery-free sensor tags are devices that leverage backscatter techniques to communicate with standard IoT devices, thereby augmenting a network’s sensing capabilities in a scalable way. For communicating, a sensor tag relies on an unmodulated carrier provided by a neighboring IoT device, with a schedule coordinating this provisioning across the network. Carrier scheduling—computing schedules to interrogate all sensor tags while minimizing energy, spectrum utilization, and latency—is an NP-Hard optimization problem. Recent work introduces learning-based schedulers that achieve resource savings over a carefully-crafted heuristic, generalizing to networks of up to 60 nodes. However, we find that their advantage diminishes in networks with hundreds of nodes, and degrades further in larger setups. This paper introduces RobustGANTT, a GNN-based scheduler that improves generalization (without re-training) to networks up to 1000 nodes (𝟏𝟎𝟎×\mathbf{100}\boldsymbol{\times}bold_100 bold_× training topology sizes). RobustGANTT not only achieves better and more consistent generalization, but also computes schedules requiring up to 𝟐×\mathbf{2}\boldsymbol{\times}bold_2 bold_× less resources than existing systems. Our scheduler exhibits average runtimes of hundreds of milliseconds, allowing it to react fast to changing network conditions. Our work not only improves resource utilization in large-scale backscatter networks, but also offers valuable insights in learning-based scheduling.

machine learning, scheduling, graph neural networks, sensor networks, backscatter networks
copyright: noneconference: ; Pre-Print; Under Review.ccs: Computer systems organization Sensor networksccs: Computing methodologies Planning and schedulingccs: Computing methodologies Machine learning

1. Introduction

Refer to caption
Figure 1. RobustGANTT generates schedules for backscatter networks using a GNN-based Transformer model. Step 1: collect MAC and routing protocol information. Step 2: build the IoT network’s graph representation, only including edges strong enough for carrier provisioning (e.g., -75 dBm). Step 3: generate the schedule through iterative one-shot node classification. Step 4: disseminate the schedule using existing network flooding mechanisms and append it to the IoT device’s normal schedule.

Recent advancements in backscatter communication enable the battery-free operation of sensor devices—termed sensor tags—that perform bi-directional communication with standard Internet of Things (IoT) devices (Kellogg et al., 2016; Ensworth and Reynolds, 2015; Kellogg et al., 2014; Talla et al., 2017; Iyer et al., 2016; Pérez-Penichet et al., 2016). Such sensor tags can be added to an existing network of Commercial Off-The-Shelf (COTS) IoT devices to augment the network’s sensing capabilities without requiring additional modifications to the IoT devices (Pérez-Penichet et al., 2020). However, communication between a sensor tag and its hosting IoT device requires the provision of an unmodulated carrier by a neighboring IoT device. A schedule coordinates this provisioning globally across the network to interrogate all sensor values. Figure 1 shows the high-level procedure of computing a schedule, and its structure. It consists of one or more timeslots s𝑠sitalic_s, each assigning one of three possible actions to the IoT devices in the network: provide unmodulated carrier 𝙲𝙲\mathtt{C}typewriter_C, interrogate one of its hosted tags 𝚃𝚃\mathtt{T}typewriter_T, or remain idle 𝙾𝙾\mathtt{O}typewriter_O.

Motivation. Battery-free sensor tags provide a scalable, cost- and energy-efficient way to augment the sensing capabilities of existing IoT networks (Kellogg et al., 2016; Pérez-Penichet et al., 2016, 2020). Their battery-free operation reduces electronic waste, and prevents extensive maintenance costs compared to battery-powered alternatives. It also allows placing sensors in hard-to-reach locations, such as medical implants, moving machinery, or embedded in physical infrastructure. The sensor tags may, e.g., prevent patients from undergoing surgery just to replace the battery of medical implants. Reducing the energy consumption of networks hosting sensor tags is of paramount importance not only for sustainability reasons, but also because such networks are often energy constrained.

Challenges. Carrier scheduling—computing a schedule to interrogate all sensor tags while minimizing energy, spectrum utilization, and latency—is, in general, an NP-Hard Combinatorial Optimization Problem (COP(Pérez-Penichet et al., 2020). It is similar to the traditional wireless link scheduling, but must consider additional constraints for tag interrogations and resource minimization (see Sec. 3). There are also several symmetries involved, both in permuting the timeslots and in selecting carrier generators (Perez-Ramirez et al., 2023). E.g., in Figure 1, exchanging the timeslots’ order alters neither the number of carriers required, nor the latency to query all tags. Also, for timeslot s3subscript𝑠3s_{3}italic_s start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT, nodes v2subscript𝑣2v_{2}italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT and v3subscript𝑣3v_{3}italic_v start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT are equally valid carrier providers for 𝚃5subscript𝚃5\mathtt{T}_{5}typewriter_T start_POSTSUBSCRIPT 5 end_POSTSUBSCRIPT. A scheduler must process variable input-output structures: networks of different sizes, and schedules of different lengths. It must also leverage the topological structure of the network to favor using one carrier for multiple concurrent tag interrogations (e.g., timeslots s1subscript𝑠1s_{1}italic_s start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and s2subscript𝑠2s_{2}italic_s start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT in Figure 1). Additionally, it must compute schedules in a timely manner to react to connectivity changes of the IoT network.

Current Learning-based Schedulers exhibit Limited Scalability. In general, it is impractical to compute the analytically optimal schedule for IoT networks of hundreds of nodes and sensor tags. This implies running a Constraint Optimizer (CO) for several hours, most likely yielding an obsolete schedule due to changes in the network’s connectivity. Alternatively, one can use the TagAlong scheduler (Pérez-Penichet et al., 2020), a carefully-crafted heuristic with polynomial runtime. However, its performance is increasingly sub-optimal as the network size increases. Recent work introduces DeepGANTT, a scheduler that learns from optimal schedules of small networks (up to 10 nodes) and scales to networks of up to 60 nodes, while reducing the number of carriers compared to TagAlong (Perez-Ramirez et al., 2023). As we show in Sec. 6.2.2, reducing the number of carriers directly translates to energy savings.

Refer to caption
Figure 2. RobustGANTT has better and more consistent generalization to larger topologies (higher is better). We train eight identical scheduler models for both RobustGANTT (stars) and DeepGANTT (Perez-Ramirez et al., 2023) (squares), and compare them against the TagAlong heuristic (Pérez-Penichet et al., 2020) on larger, previously unseen topologies (without re-training). Isolated markers depict the best performing model, markers joined by lines represent the average, and vertical lines depict standard error.

However, DeepGANTT presents two main issues when further scaling the problem to graphs larger than 60 nodes, as depicted in Figure 2. We train eight independent models (in accordance to  (Perez-Ramirez et al., 2023)), while fixing the training data, hyperparameters and random seeds. DeepGANTT’s best model (isolated squares in Figure 2) is only marginally better than the heuristic for 100-node topologies. Moreover, while all eight models perform well on the training set, their generalization to larger networks significantly varies. The dashed line in Figure 2 shows how the average performance across the eight models is increasingly worse compared to TagAlong, even for 60-node topologies. We attribute this behavior both to the stochastic training procedure that leads most scheduler models to ”bad” local minima, and to the model’s inability to capture the full problem complexity.

Approach. In this paper, we leverage the latest advances in Graph Neural Networks and Machine Learning (ML) to present RobustGANTT, a scheduler for backscatter networks with strong and consistent generalization capabilities. To design RobustGANTT, we set out to explore ML-related training aspects, beginning with our own implementation of DeepGANTT. We train our scheduler with optimal schedules of networks of up to 10 nodes and 14 tags computed by a CO. The use of GNNs in our system design allows the scheduler to process variable input-output structures, and to process much larger, previously unseen topologies without the need for re-training. For designing our system, we investigate three aspects influencing the scheduler’s generalization as follows.

First, we assess the influence of warmup (Ma and Yarats, 2021), and prove it highly beneficial for the model’s ability to compute complete schedules for larger topologies. Furthermore, we explore incorporating Positional Encoding (PE) into the node features to enhance the GNN’s ability to handle symmetries in schedule computation. We find that the node-degree PE offers the best trade-off for achieving good generalization, while avoiding the computation overhead of Eigenvalue Decomposition (EVD)-based methods. Finally, we study the influence of increasing the number of attention heads of the GNN layers to capture more complex topological dependencies among the IoT nodes in the network (Nakkiran et al., 2020).

Contributions. Based on the former, we present RobustGANTT, a novel GNN-based scheduler that generalizes to networks of up to 1000 nodes (100×100\times100 × training sizes), far beyond the capabilities of current learning-based systems (Perez-Ramirez et al., 2023), while delivering schedules that require up to 2×2\times2 × less resources than those by the TagAlong heuristic (Pérez-Penichet et al., 2020). Our system exhibits polynomial time complexity, allowing it to react fast to changing network conditions. Figure 2 shows how our scheduler not only outperforms DeepGANTT, but also exhibits consistent generalization across the independently trained models.

To evaluate RobustGANTT’s capabilities on real-life IoT networks, we use it to compute schedules for a testbed with 23 nodes and varying number of sensor tags. Our system achieves 12% on average and up to 53% savings in energy and spectrum utilization compared to the TagAlong heuristic, which corresponds to up to 1.9×1.9\times1.9 × more savings over the DeepGANTT scheduler. Furthermore, thanks to the polynomial time complexity of the model, it exhibits average runtime of 540 ms for the real IoT network, and achieves up to 2×2\times2 × reduction in 95th percentile runtime against DeepGANTT. These characteristics enable RobustGANTT to compute schedules for IoT networks even in dynamic changing conditions.

We make the following specific contributions:

  • We present RobustGANTT, a learning-based scheduler that generalizes without re-training to networks of up to 1000 nodes (𝟏𝟎𝟎×\mathbf{100}\boldsymbol{\times}bold_100 bold_× larger than those used for training), far surpassing existing learning-based schedulers.

  • We use RobustGANTT to compute schedules for a real IoT network. Our model achieves 12% on average and up to 53% resource savings compared to TagAlong, which correspond to up to 1.9×1.9\times1.9 × more savings than those achieved by DeepGANTT.

  • RobustGANTT reduces runtime’s 95th percentile by up to 2×2\times2 × against DeepGANTT, which allows it to react faster to changing network conditions.

The paper is structured as follows. Sec. 2 provides background and related work. Sec. 3 formally describes the scheduling problem. Sec. 4 presents the RobustGANTT scheduler, and Sec. 5 describes our system’s GNN model design. Sec. 6 and Sec. 7 present the evaluation and discussion, respectively. Finally, Sec. 8 concludes the paper.

2. Background and Related Work

Our work draws upon backscatter communication, scheduling for backscatter networks and ML for scheduling.

2.1. Backscatter Communication

Several recent efforts advance backscatter communications and battery-free networks (Kellogg et al., 2016; Talla et al., 2017; Iyer et al., 2016; Ensworth and Reynolds, 2015; Kellogg et al., 2014; Zhang et al., 2017; Majid et al., 2019; Karimi et al., 2017; Geissdoerfer and Zimmerling, 2021; Ahmad et al., 2021; Li et al., 2018b; Guo et al., 2020). While some work focus on monostatic or multi-static backscatter configuration (Yang et al., 2011; Hamouda et al., 2011; Yue et al., 2012; Katanbaf et al., 2021), we focus on networks hosting sensor tags in the bistatic configuration (separated receiver from carrier generator).

Sensor tags leverage backscatter techniques to perform bidirectional communication with their hosting IoT node over standard physical layer protocols (Kellogg et al., 2014; Ensworth and Reynolds, 2015; Pérez-Penichet et al., 2016; Talla et al., 2017). They achieve their low-power operation by offloading the local oscillator to a neighboring IoT node (different from its host), which provides the tag with an unmodulated carrier (Pérez-Penichet et al., 2020). The COTS IoT nodes achieve this by, e.g., using their radio test mode (Pérez-Penichet et al., 2016). An IoT node in the network hosts zero or more sensor tags. Moreover, we assume that a sensor tag is hosted by exactly one IoT node responsible for querying the sensor readings. Sensor tags are located within decimeters range to its hosting IoT node, while the IoT nodes in the network are within meters from each other (see Figure 3).

Refer to caption
Figure 3. Backscatter communication between a tag 𝚃𝚃\mathtt{T}typewriter_T and its host v2subscript𝑣2v_{2}italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT when assisted by neighboring node v1subscript𝑣1v_{1}italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT during a timeslot s𝑠sitalic_s.

Node-to-Tag Communication. The host-to-tag communication occurs over a time-slotted channel access mechanism due to its ease of integration of sensor tags and their widespread use in commodity devices. Both Bluetooth and Zigbee/IEEE 802.15.4 support this in their standards (Bluetooth SIG, 2021; IEEE, 2016). Figure 3 describes the communication between a tag 𝚃𝚃\mathtt{T}typewriter_T and its host v2subscript𝑣2v_{2}italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, when assisted by a neighboring carrier provider IoT node v1subscript𝑣1v_{1}italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT. trxsubscript𝑡𝑟𝑥t_{rx}italic_t start_POSTSUBSCRIPT italic_r italic_x end_POSTSUBSCRIPT and ttxsubscript𝑡𝑡𝑥t_{tx}italic_t start_POSTSUBSCRIPT italic_t italic_x end_POSTSUBSCRIPT are the times for the sensor tag to receive the request-to-transmit from its host, and for transmitting the sensor value back, respectively. tcgsubscript𝑡𝑐𝑔t_{cg}italic_t start_POSTSUBSCRIPT italic_c italic_g end_POSTSUBSCRIPT is the time spent in carrier provisioning for tag-to-host communication. The timeslot is long enough to complete one request-response cycle between a node and a tag—e.g., two consecutive Time-Slotted Channel Hopping (TSCH) timeslots (10 ms each) for both transmitting the request to the tag and receive the response (Pérez-Penichet et al., 2020; Perez-Ramirez et al., 2023). During treqsubscript𝑡𝑟𝑒𝑞t_{req}italic_t start_POSTSUBSCRIPT italic_r italic_e italic_q end_POSTSUBSCRIPT, v2subscript𝑣2v_{2}italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT sends a request signal to v1subscript𝑣1v_{1}italic_v start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT to start carrier provisioning, allowing v2subscript𝑣2v_{2}italic_v start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT to regulate the frequency of tag interrogation—e.g., in a schedule with 10 timeslots (200 ms total duration with TSCH), a node might not want to query its tag 1000 ms/200 ms=41000 ms200 ms41000\text{ ms}/200\text{ ms}=41000 ms / 200 ms = 4 times per second.

Schedule. A schedule coordinates the interrogation of all sensor tags and the provisioning of unmodulated carriers by the IoT nodes for such purposes. It consists of L1𝐿1L\geq 1italic_L ≥ 1 timeslots, each assigning one of three possible actions to IoT nodes in the network: interrogate one of its tags 𝚃𝚃\mathtt{T}typewriter_T, provide unmodulated carrier 𝙲𝙲\mathtt{C}typewriter_C for neighboring tags, or remain idle 𝙾𝙾\mathtt{O}typewriter_O. We leverage the spatial distribution of nodes and tags to perform concurrent tag interrogations with one carrier provider (see Figure LABEL:subfig:carrier-reuse). There are two constraints for performing tag interrogations (Pérez-Penichet et al., 2020). First, due to the time-slotted channel access control mechanism, a node can interrogate only one of its hosting tags per timeslot. Additionally, for a tag to communicate with its hosting node, exactly one neighboring IoT node must provide it with an unmodulated carrier. Multiple impinging carriers on a sensor tag causes interference, and prevents proper tag interrogations (see Figure LABEL:subfig:carrier-interference).

Resource Efficiency. Two metrics determine a schedule’s resource efficiency: the length of the schedule L𝐿Litalic_L and the number of carrier slots C𝐶Citalic_C. While L𝐿Litalic_L indicates the latency of querying all sensor values, C𝐶Citalic_C is directly related to spectrum utilization and energy consumption of the IoT network (see Sec. 6.2.2). Figures LABEL:subfig:schedule-types and LABEL:subfig:carrier-reuse show how resource efficient schedules exploit the topological structure of the network to re-use carrier generating nodes within a timeslot.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Figure 4. Example of schedules for a network topology and efficient carrier re-use for concurrent tag interrogation.

2.2. Existing Schedulers

A scheduler is a system that receives a description of the IoT network hosting sensor tags, and computes a schedule for interrogating the sensor tags. While recent work explores autonomous scheduling for TDMA based networks (Duquennoy et al., 2015), carrier scheduling requires more powerful hardware for such purposes. In general, carrier scheduling can be solved analytically by using a CO to obtain the optimal schedule. However, this is only feasible for small-sized IoT networks, since the NP-Hard nature of the problem prevents the practical application of the CO due to the long runtimes (e.g., up to 10 hours for a 10-node network).

Alternatively, Pérez-Penichet et al. present TagAlong (Pérez-Penichet et al., 2020), a heuristic algorithm that uses graph coloring to compute schedules. While TagAlong exhibits polynomial runtime, its performance becomes increasingly sub-optimal as the network size increases. Additionally, Pérez-Ramírez et al. present DeepGANTT (Perez-Ramirez et al., 2023), the first ML-based system for carrier scheduling that iteratively builds the schedule timeslot by timeslot. DeepGANTT learns from optimal schedules (computed by a CO) of networks of up to 10 IoT nodes and 14 sensor tags (Perez-Ramirez et al., 2023). It generalizes to networks of up to 60 nodes, achieving significant reduction in the number of carriers required in the schedule against TagAlong. In this work, we advance learning-based scheduling by considering networks of hundreds of nodes, far beyond DeepGANTT’s capabilities.

2.3. Learning-based Scheduling

Several works explore applying ML and GNNs for both COP and scheduling (Vinyals et al., 2015; Vesselinova et al., 2020; Bengio et al., 2021; Dai et al., 2017; Li et al., 2018a; Manchanda et al., 2020; Jeon et al., 2022; Mao et al., 2019), but few explore their usage for backscatter networks (Perez-Ramirez et al., 2023). In this work, we explore GNNs to design a system that generates schedules for backscatter networks consisting of hundreds of nodes.

Graph Neural Networks. GNNs are a flexible ML tool for tackling various inference tasks on graphs, such as node classification (Scarselli et al., 2009; Hamilton, 2020a; Wu et al., 2021). Intuitively, stacking K𝐾Kitalic_K GNN layers generates node embedding vectors that consider their K𝐾Kitalic_K-hop neighborhood by utilizing the graph’s structure and the relationships between nodes (Gilmer et al., 2017; Kipf and Welling, 2017). These embeddings are generally processed further with linear layers to produce the final output based on the specific task. For instance, node classification can be achieved by feeding each node embedding vector through a classification layer. For a graph G=V,E𝐺𝑉𝐸G=\langle V,E\rangleitalic_G = ⟨ italic_V , italic_E ⟩ with nodes vV𝑣𝑉v\in Vitalic_v ∈ italic_V and edges (v,u)E𝑣𝑢𝐸(v,u)\in E( italic_v , italic_u ) ∈ italic_E, at GNN layer i𝑖iitalic_i, each node feature vector hvsubscript𝑣h_{v}italic_h start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT is updated as:

(1) hvi=f1(hvi1,u𝒩(v)[f2(hui1)]),subscriptsuperscript𝑖𝑣subscript𝑓1subscriptsuperscript𝑖1𝑣subscript𝑢𝒩𝑣delimited-[]subscript𝑓2superscriptsubscript𝑢𝑖1,h^{i}_{v}=f_{1}\left(\,\,h^{i-1}_{v},\,\,\bigcup_{u\in\mathcal{N}(v)}\left[f_{% 2}\left(h_{u}^{i-1}\right)\right]\,\,\right)\,\text{,}italic_h start_POSTSUPERSCRIPT italic_i end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT = italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( italic_h start_POSTSUPERSCRIPT italic_i - 1 end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_v end_POSTSUBSCRIPT , ⋃ start_POSTSUBSCRIPT italic_u ∈ caligraphic_N ( italic_v ) end_POSTSUBSCRIPT [ italic_f start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ( italic_h start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_i - 1 end_POSTSUPERSCRIPT ) ] ) ,

where 𝒩(v)𝒩𝑣\mathcal{N}(v)caligraphic_N ( italic_v ) is the set of neighbors of node v𝑣vitalic_v with husubscript𝑢h_{u}italic_h start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT representing their feature vectors, and \bigcup is a commutative aggregation function. f1,f2subscript𝑓1subscript𝑓2f_{1},f_{2}italic_f start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT , italic_f start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are non-linear transformations (Gilmer et al., 2017). For attention-based GNNs, additional learnable scaling parameters are included within \bigcup to weight the contributions of neighboring nodes differently.

One key advantage of GNNs is their ability to leverage the structural dependencies within the graph, and their ability to perform inference on new graphs not encountered during training without needing to retrain the model (Hamilton et al., 2017; Veličković et al., 2018; Vesselinova et al., 2020).

PE in GNNs. PE augments each node’s input feature vector with additional information of its structural role in the graph. The intuition is to aid subsequent GNN layers to better distinguish the nodes involved in symmetries—i.e., to perform injective aggregation of neighboring nodes’ features. Recent work explore PE with both local and global graph properties (Wang et al., 2022; Belkin and Niyogi, 2003; Dwivedi et al., 2023; Lim et al., 2023; Huang et al., 2024; Rampášek et al., 2022). While most focus on using PE to better distinguish different graphs, we are interested in assessing their advantage for effective node classification.

3. Carrier Scheduling Problem

The COP of computing a schedule to interrogate all sensor tags in an IoT network while minimizing both the length of the schedule L𝐿Litalic_L and the number of carrier slots C𝐶Citalic_C is described as follows. We model the network as an undirected connected graph G𝐺Gitalic_G, defined by the tuple G=Va,E𝐺subscript𝑉𝑎𝐸G=\langle V_{a},E\rangleitalic_G = ⟨ italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , italic_E ⟩, where Vasubscript𝑉𝑎V_{a}italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT is the set of N𝑁Nitalic_N IoT nodes in the network Va={vi}i=0N1subscript𝑉𝑎superscriptsubscriptsubscript𝑣𝑖𝑖0𝑁1V_{a}=\{v_{i}\}_{i=0}^{N-1}italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT = { italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 0 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_N - 1 end_POSTSUPERSCRIPT, and E𝐸Eitalic_E is the set of edges between the nodes E={u,v|u,vVa}𝐸conditional-set𝑢𝑣𝑢𝑣subscript𝑉𝑎E=\{\langle u,v\rangle|u,v\in V_{a}\}italic_E = { ⟨ italic_u , italic_v ⟩ | italic_u , italic_v ∈ italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT }. The connectivity among IoT nodes is determined by the wireless link signal strength, i.e., there is an edge between two nodes only if there is a sufficiently strong wireless signal for providing unmodulated carrier (Pérez-Penichet et al., 2020; Perez-Ramirez et al., 2023). We denote the set of T𝑇Titalic_T tags in the network as Nt={ti}i=1Tsubscript𝑁𝑡superscriptsubscriptsubscript𝑡𝑖𝑖1𝑇N_{t}=\{t_{i}\}_{i=1}^{T}italic_N start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT = { italic_t start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_T end_POSTSUPERSCRIPT, and their respective tag-to-host assignment as Ht:tNtvVa:subscript𝐻𝑡𝑡subscript𝑁𝑡maps-to𝑣subscript𝑉𝑎H_{t}:t\in N_{t}\mapsto v\in V_{a}italic_H start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT : italic_t ∈ italic_N start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ↦ italic_v ∈ italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT. The role of a node v𝑣vitalic_v within a timeslot s𝑠sitalic_s is indicated by the map Rv,s:vVa,s[1,L]{𝙲,𝚃,𝙾}:subscript𝑅𝑣𝑠formulae-sequence𝑣subscript𝑉𝑎𝑠1𝐿maps-to𝙲𝚃𝙾R_{v,s}:v\!\in\!V_{a},s\!\in\![1,L]\mapsto\{\mathtt{C},\mathtt{T},\mathtt{O}\}italic_R start_POSTSUBSCRIPT italic_v , italic_s end_POSTSUBSCRIPT : italic_v ∈ italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , italic_s ∈ [ 1 , italic_L ] ↦ { typewriter_C , typewriter_T , typewriter_O }, where L𝐿Litalic_L is the schedule length in timeslots. Hence, a timeslot sjsubscript𝑠𝑗s_{j}italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT consists of an N𝑁Nitalic_N-dimensional vector containing the roles assigned to every node during timeslot j𝑗jitalic_j: sj=[Rvi,j|viVa]subscript𝑠𝑗superscriptdelimited-[]conditionalsubscript𝑅subscript𝑣𝑖𝑗subscript𝑣𝑖subscript𝑉𝑎tops_{j}=\left[R_{v_{i},j}|v_{i}\in V_{a}\right]^{\top}italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT = [ italic_R start_POSTSUBSCRIPT italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_j end_POSTSUBSCRIPT | italic_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT ] start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT.

For a given problem instance g=G,Nt,Ht𝑔𝐺subscript𝑁𝑡subscript𝐻𝑡g=\langle G,N_{t},H_{t}\rangleitalic_g = ⟨ italic_G , italic_N start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_H start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ⟩, the carrier scheduling problem is formulated as follows:

(2) min\displaystyle\min\,roman_min (TC+L)𝑇𝐶𝐿\displaystyle\,\left(T\cdot C+L\right)( italic_T ⋅ italic_C + italic_L )
(3) s.t. tNt!s[1,L]:RHt,s=𝚃:for-all𝑡subscript𝑁𝑡𝑠1𝐿subscript𝑅subscript𝐻𝑡𝑠𝚃\displaystyle\,\forall t\!\in\!N_{t}\,\,\exists!\,\,s\!\in\![1,L]:\,R_{H_{t},s% }=\mathtt{T}∀ italic_t ∈ italic_N start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ∃ ! italic_s ∈ [ 1 , italic_L ] : italic_R start_POSTSUBSCRIPT italic_H start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_s end_POSTSUBSCRIPT = typewriter_T
(4) s[1,L]tNt|RHt,s=𝚃!vjVa::for-all𝑠1𝐿for-all𝑡conditionalsubscript𝑁𝑡subscript𝑅subscript𝐻𝑡𝑠𝚃subscript𝑣𝑗subscript𝑉𝑎absent\displaystyle\,\forall s\!\in\![1,L]\,\,\forall t\!\in\!N_{t}\,|\,R_{H_{t},s}=% \mathtt{T}\;\exists!\,v_{j}\!\in\!V_{a}:∀ italic_s ∈ [ 1 , italic_L ] ∀ italic_t ∈ italic_N start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT | italic_R start_POSTSUBSCRIPT italic_H start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_s end_POSTSUBSCRIPT = typewriter_T ∃ ! italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT :
Rvj,s=𝙲(Ht,vj)E,subscript𝑅subscript𝑣𝑗𝑠𝙲subscript𝐻𝑡subscript𝑣𝑗𝐸,\displaystyle\,R_{v_{j},s}=\mathtt{C}\wedge(H_{t},v_{j})\in E\,\,\text{,}italic_R start_POSTSUBSCRIPT italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT , italic_s end_POSTSUBSCRIPT = typewriter_C ∧ ( italic_H start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT , italic_v start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ) ∈ italic_E ,

where C𝐶Citalic_C is the total number of carriers required in the schedule. Constraints (3) and (4) enforce that tags are interrogated only once in the schedule and that there is exactly one carrier-providing neighbor per tag in each timeslot (to prevent collisions), respectively. The objective function (2) prioritizes reducing C𝐶Citalic_C over L𝐿Litalic_L because we are most concerned with energy and spectrum efficiency—reducing C𝐶Citalic_C often implies a reduction of L𝐿Litalic_L, but the converse is not necessarily true (Perez-Ramirez et al., 2023).

Symmetry-Breaking Constraints. Solutions to the carrier scheduling problem are highly symmetrical, which limits effective training of a supervised ML model (Perez-Ramirez et al., 2023). Symmetries arise both from the network topology and from the sensor tags’ distribution among the nodes. E.g., for a star topology hosting one sensor tag in the center node, any of the leaf nodes can be the carrier provider, but the scheduler needs to select only one of these. Additionally, we do not assume any a-priori order for tag interrogations. Hence, any of the L!𝐿L!italic_L ! permutations of a schedule’s timeslots is also a valid schedule with the same length L𝐿Litalic_L and number of carrier slots C𝐶Citalic_C.

Symmetry-breaking constraints allow to efficiently learn the behavior of the optimal scheduler and properly train an ML model (Perez-Ramirez et al., 2023). For the training data generation procedure, we further constrain the optimization objective in Eq. 2 by enforcing two lexicographical minimizations: first of a vector of length T𝑇Titalic_T (number of tags) that indicates the timeslot when each tag is interrogated, and another length-T𝑇Titalic_T vector containing the node that provides the carrier for each tag.

4. RobustGANTT System Design

We consider networks consisting of COTS wireless IoT devices, or nodes, equipped with radio transceivers that support standard physical layer protocols, such as Bluetooth or IEEE 802.15.4/ZigBee. These nodes perform their regular computation and communication tasks according to their normal schedule (Pérez-Penichet et al., 2020; Duquennoy et al., 2015). The IoT nodes are either battery-powered or connected to mains power. We extend the sensing capabilities of the nodes with battery-free sensor tags (Pérez-Penichet et al., 2016, 2020), which require an additional schedule to coordinate carrier provisioning and tag interrogations. This schedule is appended to the IoT network’s normal schedule.

We base our system design on DeepGANTT and set to explore ML related aspects to design a scheduler with better and more robust generalization to larger networks.

Refer to caption
Figure 5. RobustGANTT’s ML model architecture. It receives as input a node-feature matrix X^jsubscript^𝑋𝑗\hat{X}_{j}over^ start_ARG italic_X end_ARG start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT and produces the corresponding schedule timeslot sjsubscript𝑠𝑗s_{j}italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT for every iteration j𝑗jitalic_j. There is a multi-head self-attention GNN in each of RobustGANTT’s layers (orange box). ||||| | represents a concatenation operation. Green boxes represent a non-linear transformation by a single-layer fully-connected neural network.

4.1. System Description

RobustGANTT resides at the Edge/Cloud, and asynchronously receives requests by one or multiple IoT networks hosting battery-free sensor tags to compute schedules. Note that this is also true for any scheduler to tackle this problem due to the computational demands required in computing schedules. The interaction between RobustGANTT and the IoT network is depicted in Figure 1.

First, the IoT network collects the MAC and routing protocol information to build the network topology G𝐺Gitalic_G and the tag-to-host mapping Htsubscript𝐻𝑡H_{t}italic_H start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT. In our evaluation in Sec. 6, we use metrics from both TSCH (Duquennoy et al., 2017) and RPL (Winter, 2012), but the process is analogous for other physical layer and routing protocols. Upon detection of changes either in the network’s connectivity or in the tag-to-host mapping, the network issues a request to RobustGANTT for computing a new schedule. Next, the scheduler receives the network information g𝑔gitalic_g and performs iterative node classification using a GNN model to compute the interrogation schedule timeslot by timeslot. Finally, RobustGANTT delivers the schedule back to the IoT network, where it is disseminated using existing network flooding mechanisms, such as Glossy (Ferrari et al., 2011).

At the core of RobustGANTT lies an attention-based GNN model to perform iterative one-shot node classification. In each iteration j𝑗jitalic_j, the GNN model receives as input a node feature matrix XjN×Dsubscript𝑋𝑗superscript𝑁𝐷X_{j}\in\mathbb{R}^{N\times D}italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_D end_POSTSUPERSCRIPT with D=3𝐷3D=3italic_D = 3 features per node, and delivers as output the scheduling timeslot sjNsubscript𝑠𝑗superscript𝑁s_{j}\in\mathbb{R}^{N}italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT. The resulting sjsubscript𝑠𝑗s_{j}italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT corresponds to assigning each of the N𝑁Nitalic_N nodes to one of three possible classes {𝚃\{\mathtt{T}{ typewriter_T, 𝙲𝙲\mathtt{C}typewriter_C, 𝙾}\mathtt{O}\}typewriter_O }.

RobustGANTT keeps a cached representation of the topology G𝐺Gitalic_G and the tag-to-host mapping Htsubscript𝐻𝑡H_{t}italic_H start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT that is updated after each iteration. After computing the jthsuperscript𝑗thj^{\text{th}}italic_j start_POSTSUPERSCRIPT th end_POSTSUPERSCRIPT timeslot sjsubscript𝑠𝑗s_{j}italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, the tags assigned to be interrogated are removed from the cached representation of the topology, and a new input feature matrix is generated Xj+1subscript𝑋𝑗1X_{j+1}italic_X start_POSTSUBSCRIPT italic_j + 1 end_POSTSUBSCRIPT to compute the next scheduling timeslot sj+1subscript𝑠𝑗1s_{j+1}italic_s start_POSTSUBSCRIPT italic_j + 1 end_POSTSUBSCRIPT. Being a probabilistic model, RobustGANTT has a component for checking that sjsubscript𝑠𝑗s_{j}italic_s start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT complies with the scheduling constraints at each iteration. This process is repeated until there are no more tags in the cached topology.

4.2. Scheduling Approach

Input Node Feature Matrix. Upon receiving the IoT network information, RobustGANTT builds a graph representation of the topology and parses this information for input to the GNN model. The input node feature matrix to the GNN Xjsubscript𝑋𝑗X_{j}italic_X start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT consists of D=3𝐷3D=3italic_D = 3 features per node:

  1. (1)

    Hosted-Tags: the number of tags hosted by the node.

  2. (2)

    Node-ID: integer identifying the node in the graph.

  3. (3)

    Min. Tag-ID: integer that represents the minimum tag ID among tags hosted by the node.

Intuitively, Hosted-Tags is decisive for assigning carrier-generating nodes – the node hosting the greatest number of tags in the network should avoid providing unmodulated carriers. Thanks to the symmetry-breaking constraints (§3), including features 2 and 3 provides the scheduler with context on how to prioritize carrier-provider nodes, and with an order to interrogate the tags, respectively. In practice, network operators can exploit this by, e.g., prioritizing IoT nodes connected to mains power as carrier providers, or by prioritizing certain tags to be interrogated early in the schedule, simply by assigning them a lower node/tag-ID.

ML Model Architecture. Figure 5 depicts the system’s ML model. The node feature matrix is first passed through a node-wise embedding layer, followed by a concatenation and layer normalization operation. Subsequently, the hidden representation is passed through a stack of 12 GNN layers, each containing both a linear activation and self-attention GNN. We fix 12 as the number of layers due to its wide application in language modelling with both GPT and BERT (Devlin et al., 2018; Radford et al., 2019, 2018), and its success in learning-based schedulers (Perez-Ramirez et al., 2023). The linear layer is a fully-connected neural network that acts on each node intermediate feature vector independently, while the GNN uses a multi-head attention mechanism of M𝑀Mitalic_M heads for computing message passing operations (Veličković et al., 2018). The structure and skip connections of each GNN-Block is inspired by the Transformer architecture (Vaswani et al., 2017).

4.3. Model Training

We train RobustGANTT with optimal schedules from relatively small networks that are computed by the optimal scheduler. We then use RobustGANTT to compute schedules for much larger and previously-unseen networks without the need for the scheduler to be re-trained.

As loss function, we select the modified cross-entropy loss that includes both a scaling factor to give more importance to the carrier generator class 𝙲𝙲\mathtt{C}typewriter_C (Perez-Ramirez et al., 2023), and L2 weight regularization (LeCun et al., 1989; Krogh and Hertz, 1991). As optimizer, we use Adam with its default hyperparameters (Kingma and Lei Ba, 2015). We use learning rate decay by 2% every epoch, with an initial learning rate ϵinit=103subscriptitalic-ϵ𝑖𝑛𝑖𝑡superscript103\epsilon_{init}\!=\!10^{-3}italic_ϵ start_POSTSUBSCRIPT italic_i italic_n italic_i italic_t end_POSTSUBSCRIPT = 10 start_POSTSUPERSCRIPT - 3 end_POSTSUPERSCRIPT. We early stop model training after 25 consecutive epochs without minimization of the validation loss, and save the best performing model based on the carrier-class F1-score (Perez-Ramirez et al., 2023).

5. System GNN Model Design

We explore ML-related design aspects that provide RobustGANTT with strong and consistent generalization to larger, previously unseen, IoT networks. We believe our findings not only advance carrier scheduling, but also provide insights on designing learning-based schedulers for IoT networks.

Setup. We undergo a structured and sequential process in three stages, selecting the best configuration in each stage before transitioning to the next one: i) learning rate warmup, ii) local and global PE, and iii) increasing the number of attention heads. For each stage, we train multiple models according to Sec. 4.3 using the training dataset from Sec. 5.1.1, while fixing the hyperparameter configuration. To mitigate the effect of randomness, we fix the random seeds from software libraries at the application level: Python, PyTorch, and NumPy (Paszke et al., 2019; Foundation, 2024). Since the best performance for a given model configuration may greatly diverge from its average (see Figure 2), we train multiple, but identical, ML models for each configuration to assess their performance consistency to larger topologies. However, we are limited to training 4-8 models per configuration, since the training and subsequent deployment to larger graphs takes between 10-45 hours for a single model, depending on its configuration. Our analysis results in the training of over 50 ML scheduler models.

After training, we deploy the models to compute schedules for the generalization dataset – previously unseen topologies of larger size than those trained (see Sec. 5.1.2). No re-training is done at this stage. We report mean and percentile statistics across the runs for each model configuration, and select the best one based on the performance metrics from Sec. 5.2.

We highlight the following key findings:

  • Warm-up significantly contributes to computing complete and correct schedules for larger topologies.

  • Node degree PE allows for a good trade-off to assist in breaking graph symmetries with a low-overhead PE method.

  • 12 attention heads consistently achieves good generalization performance to larger topologies.

5.1. Datasets

We train all models using the data fom Sec. 5.1.1. After training, their performance is compared on the dataset described in Sec. 5.1.2, on which the models are not trained.

5.1.1. Training Dataset

We use artificially generated problem instances (topologies and tag assignments) according to Perez-Ramirez et al. (Perez-Ramirez et al., 2023). The dataset contains 580000 problem instances with networks of 2-10 nodes and 1-14 tags that are randomly assigned. We use the optimal scheduler to obtain schedules for these problem instances. This implies using a CO to solve analytically the COP described in Sec. 3. We use 80%-20% training and validation data splits.

5.1.2. Generalization Dataset

Consists of larger and previously unseen topologies on which models are not trained. We select the best performing model configuration in this dataset when deciding the final ML model. We consider 200200200200 problem instances (network topologies and tag assignments) for every (N,T)𝑁𝑇(N,T)( italic_N , italic_T ) pair from the sets N{10,20,40,60,80,100}𝑁1020406080100N\in\{10,20,40,60,80,100\}italic_N ∈ { 10 , 20 , 40 , 60 , 80 , 100 } nodes and T{40,80,160,240}𝑇4080160240T\in\{40,80,160,240\}italic_T ∈ { 40 , 80 , 160 , 240 } tags—i.e., 4800480048004800 networks.

5.2. Performance Metrics

In this work, we are interested in the system-related aspects of RobustGANTT. Hence, we consider the following application-related performance metrics in ML model design.

𝚷𝚷\mathbf{\Pi}bold_Π—Correctly Computed Schedules. Given a set of IoT networks, Π[0,100]%Πpercent0100\Pi\in[0,100]\%roman_Π ∈ [ 0 , 100 ] % represents the percentage of networks for which RobustGANTT produces a complete schedule – one that interrogates all sensor tags. Since RobustGANTT is a probabilistic model, we must account for cases in which the scheduler cannot produce all the required timeslots to query all sensor values in the network. If RobustGANTT fails to deliver all timeslots, even if it correctly delivered some of them, we consider it a failed schedule.

𝚫𝐂subscript𝚫𝐂\boldsymbol{\Delta}_{\mathbf{C}}bold_Δ start_POSTSUBSCRIPT bold_C end_POSTSUBSCRIPT—Carriers Saved. This metric directly relates to the energy and spectrum utilization of the network. It compares the total number of carrier generator slots C𝐶Citalic_C from the schedule generated by the TagAlong heuristic Ctasubscript𝐶𝑡𝑎C_{ta}italic_C start_POSTSUBSCRIPT italic_t italic_a end_POSTSUBSCRIPT against the total number of carrier slots from the schedule computed by a learning-based scheduler Cnnsubscript𝐶𝑛𝑛C_{nn}italic_C start_POSTSUBSCRIPT italic_n italic_n end_POSTSUBSCRIPT as: 𝚫𝐂=CnnCtasubscript𝚫𝐂subscript𝐶𝑛𝑛subscript𝐶𝑡𝑎\boldsymbol{\Delta}_{\mathbf{C}}=C_{nn}-C_{ta}bold_Δ start_POSTSUBSCRIPT bold_C end_POSTSUBSCRIPT = italic_C start_POSTSUBSCRIPT italic_n italic_n end_POSTSUBSCRIPT - italic_C start_POSTSUBSCRIPT italic_t italic_a end_POSTSUBSCRIPT.

5.3. Results

We describe the considered ML design aspects and their influence in our system’s generalization to larger topologies.

5.3.1. Influence of Warmup

Based on the findings from Ma et al. (Ma and Yarats, 2021), we evaluate the influence of learning rate warm-up on the optimization. It involves starting training with a small learning rate ϵ~ϵinitmuch-less-than~italic-ϵsubscriptitalic-ϵ𝑖𝑛𝑖𝑡\tilde{\epsilon}\ll\epsilon_{init}over~ start_ARG italic_ϵ end_ARG ≪ italic_ϵ start_POSTSUBSCRIPT italic_i italic_n italic_i italic_t end_POSTSUBSCRIPT and gradually increase ϵ~~italic-ϵ\tilde{\epsilon}over~ start_ARG italic_ϵ end_ARG until reaching the initial learning rate ϵinitsubscriptitalic-ϵ𝑖𝑛𝑖𝑡\epsilon_{init}italic_ϵ start_POSTSUBSCRIPT italic_i italic_n italic_i italic_t end_POSTSUBSCRIPT. Intuitively, warmup provides more stability by regularizing the magnitude of parameter updates in early stages of training for momentum-based optimizers. Since such optimizers perform the parameter updates considering past statistical moments of the gradients, warmup allows the optimizer to calculate moments’ statistics before performing big jumps in the parameter update, which reduces variance of the update steps (Ma and Yarats, 2021).

We choose an untunned linear warmup schedule (Ma and Yarats, 2021) due to its simplicity and competitive performance. It requires 2(1β2)12superscript1subscript𝛽212*(1-\beta_{2})^{-1}2 ∗ ( 1 - italic_β start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ) start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT steps so that ϵ~ϵinit~italic-ϵsubscriptitalic-ϵ𝑖𝑛𝑖𝑡\tilde{\epsilon}\approx\epsilon_{init}over~ start_ARG italic_ϵ end_ARG ≈ italic_ϵ start_POSTSUBSCRIPT italic_i italic_n italic_i italic_t end_POSTSUBSCRIPT, where β2=0.999subscript𝛽20.999\beta_{2}=0.999italic_β start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT = 0.999 is Adam’s second moment decay rate (Kingma and Lei Ba, 2015). The warm-up update of the learning rate is performed as: ϵ~=ϵinitmin(1,1β22i),~italic-ϵsubscriptitalic-ϵ𝑖𝑛𝑖𝑡11subscript𝛽22𝑖,\tilde{\epsilon}=\epsilon_{init}*\min\left(1,\frac{1-\beta_{2}}{2}*i\right)% \text{,}over~ start_ARG italic_ϵ end_ARG = italic_ϵ start_POSTSUBSCRIPT italic_i italic_n italic_i italic_t end_POSTSUBSCRIPT ∗ roman_min ( 1 , divide start_ARG 1 - italic_β start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_ARG start_ARG 2 end_ARG ∗ italic_i ) , where i𝑖iitalic_i is the mini-batch iteration. We independently train two sets of eight identical models, with and without warmup.

Refer to caption
(a)
Refer to caption
(b)
Figure 6. Warm-up proven crucial for more stable performance of percentage of correctly computed schedules 𝐒𝐜subscript𝐒𝐜\mathbf{S_{c}}bold_S start_POSTSUBSCRIPT bold_c end_POSTSUBSCRIPT for larger topologies. Performance over multiple runs (higher is better). Vertical lines represent the standard error of the mean of the respective metric.

Warmup contributes to higher ΠΠ\Piroman_Π values. Without warmup, Figure LABEL:subfig:nowarmup shows how the performance from the percentage of correctly computed schedules ΠΠ\Piroman_Π deteriorates (also with increasing std-err) as the topology size increases. Including warmup significantly mitigated the variance in ΠΠ\Piroman_Π for the larger topologies, as shown in Figure LABEL:subfig:2H. Moreover, it improves Carriers Saved 𝚫Csubscript𝚫𝐶\boldsymbol{\Delta}_{C}bold_Δ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT values for the 25th, mean, 75th and 95th percentiles in topologies of up to 60 nodes. However, the average performance of 𝚫Csubscript𝚫𝐶\boldsymbol{\Delta}_{C}bold_Δ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT across the multiple runs is similar for 100 node topologies, with only marginal improvements when including warmup. Moreover, including warm-up also reduced the standard error of all metrics (vertical lines), regardless of the topology size.

5.3.2. Influence of Positional Encoding

We investigate augmenting the input node features to the GNN with PEs to aid the model in breaking symmetries. Based on the results from Sec. 5.3.1, all models are trained with warmup. We consider three types of PEs considering both local and global graph properties. We train four models for each PE configuration.

Node Degree PE

We include one additional vector in the input node feature matrix that corresponds to the normalized node degree vector. Given the adjacency matrix 𝐀N×N𝐀superscript𝑁𝑁\mathbf{A}\in\mathbb{R}^{N\times N}bold_A ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_N end_POSTSUPERSCRIPT of an undirected graph G=Va,E𝐺subscript𝑉𝑎𝐸G=\langle V_{a},E\rangleitalic_G = ⟨ italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT , italic_E ⟩ with |Va|=Nsubscript𝑉𝑎𝑁|V_{a}|=N| italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT | = italic_N nodes, where 𝐀[u,v]=1𝐀𝑢𝑣1\mathbf{A}[u,v]=1bold_A [ italic_u , italic_v ] = 1 if u,vE𝑢𝑣𝐸\langle u,v\rangle\in E⟨ italic_u , italic_v ⟩ ∈ italic_E and A[u,v]=0𝐴𝑢𝑣0A[u,v]=0italic_A [ italic_u , italic_v ] = 0 otherwise, the degree of node u𝑢uitalic_u is d~u=vVaA[u,v]subscript~𝑑𝑢subscript𝑣subscript𝑉𝑎A𝑢𝑣\tilde{d}_{u}=\sum_{v\in V_{a}}\textbf{A}[u,v]over~ start_ARG italic_d end_ARG start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT = ∑ start_POSTSUBSCRIPT italic_v ∈ italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUBSCRIPT A [ italic_u , italic_v ] (Hamilton, 2020b). We append the node degree vector d~=[d~u/d~max]uVaN~𝑑superscriptsubscriptdelimited-[]subscript~𝑑𝑢subscript~𝑑𝑚𝑎𝑥𝑢subscript𝑉𝑎topsuperscript𝑁\tilde{d}=[\tilde{d}_{u}/\tilde{d}_{max}]_{u\in V_{a}}^{\top}\in\mathbb{R}^{N}over~ start_ARG italic_d end_ARG = [ over~ start_ARG italic_d end_ARG start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT / over~ start_ARG italic_d end_ARG start_POSTSUBSCRIPT italic_m italic_a italic_x end_POSTSUBSCRIPT ] start_POSTSUBSCRIPT italic_u ∈ italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUBSCRIPT start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT as a column to the input node feature matrix XN×D𝑋superscript𝑁𝐷X\in\mathbb{R}^{N\times D}italic_X ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_D end_POSTSUPERSCRIPT, where d~maxsubscript~𝑑𝑚𝑎𝑥\tilde{d}_{max}over~ start_ARG italic_d end_ARG start_POSTSUBSCRIPT italic_m italic_a italic_x end_POSTSUBSCRIPT is the degree with highest magnitude. Including node degree PE results in D=3+1=4𝐷314D=3+1=4italic_D = 3 + 1 = 4 input features per node.

Eigenvalues of Graph Laplacian (Eigvals PE)

We investigate using global properties of the graph as PE. We define the symmetric normalized graph Laplacian as 𝐋=I𝐃12𝐀𝐃12𝐋𝐼superscript𝐃12superscript𝐀𝐃12\mathbf{L}=I-\mathbf{D}^{-\frac{1}{2}}\mathbf{A}\mathbf{D}^{-\frac{1}{2}}bold_L = italic_I - bold_D start_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT bold_AD start_POSTSUPERSCRIPT - divide start_ARG 1 end_ARG start_ARG 2 end_ARG end_POSTSUPERSCRIPT, where 𝐃=diag(d~)𝐃diag~𝑑\mathbf{D}=\operatorname{diag}(\tilde{d})bold_D = roman_diag ( over~ start_ARG italic_d end_ARG ) is the diagonal node degree matrix and I𝐼Iitalic_I is the identity matrix. We perform EVD of 𝐋𝐋\mathbf{L}bold_L resulting in 𝐋=𝐕𝚲𝐕1𝐋𝐕𝚲superscript𝐕1\mathbf{L}=\mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1}bold_L = bold_V bold_Λ bold_V start_POSTSUPERSCRIPT - 1 end_POSTSUPERSCRIPT, where 𝚲N×N𝚲superscript𝑁𝑁\mathbf{\Lambda}\in\mathbb{R}^{N\times N}bold_Λ ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_N end_POSTSUPERSCRIPT is a diagonal matrix containing the eigenvalues λisubscript𝜆𝑖\lambda_{i}\in\mathbb{R}italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R of 𝐋𝐋\mathbf{L}bold_L, and 𝐕N×N𝐕superscript𝑁𝑁\mathbf{V}\in\mathbb{R}^{N\times N}bold_V ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_N end_POSTSUPERSCRIPT is a matrix containing the eigenvectors 𝐯iNsubscript𝐯𝑖superscript𝑁\mathbf{v}_{i}\in\mathbb{R}^{N}bold_v start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT for iVa𝑖subscript𝑉𝑎i\in V_{a}italic_i ∈ italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT. We first augment the node feature matrix with a vector that contains the eigenvalues of the graph Λ~=[λi]iVaN~Λsubscriptdelimited-[]subscript𝜆𝑖𝑖subscript𝑉𝑎superscript𝑁\tilde{\Lambda}=[\lambda_{i}]_{i\in V_{a}}\in\mathbb{R}^{N}over~ start_ARG roman_Λ end_ARG = [ italic_λ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] start_POSTSUBSCRIPT italic_i ∈ italic_V start_POSTSUBSCRIPT italic_a end_POSTSUBSCRIPT end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N end_POSTSUPERSCRIPT. We normalize Λ^^Λ\hat{\Lambda}over^ start_ARG roman_Λ end_ARG using the highest eigenvalue. Including Eigvals PE results in D=3+1=4𝐷314D=3+1=4italic_D = 3 + 1 = 4 input features per node.

Stable and Expressive Positional Encodings (SPE PE)

While eigenvalues provide an indication of magnitude and transformation strength, eigenvectors contain richer geometric information in the directional properties. Eigenvectors are not unique, and suffer from sign invariance—i.e., if 𝐯𝐯\mathbf{v}bold_v is an eigenvector, so is 𝐯𝐯-\mathbf{v}- bold_v. Geometrically, this means that they are nontrivial solutions for finding the EVD: any orthogonal change of basis of 𝐕𝐕\mathbf{V}bold_V yields the same Laplacian 𝐋𝐋\mathbf{L}bold_L (Kwak and Hong, 2004).

While early work introduces random eigenvector sign flipping during training to account for sign invariance (Dwivedi et al., 2023; Kreuzer et al., 2021), recent work explores learning the invariances that account for changes in the eigenspace basis 𝐕𝐕\mathbf{V}bold_V (Lim et al., 2023; Huang et al., 2024). The goal is to learn a permutation-invariant transformation of Λ~~Λ\tilde{\Lambda}over~ start_ARG roman_Λ end_ARG and 𝐕𝐕\mathbf{V}bold_V that accounts for their geometrical significance. We choose the Stable and Expressive PE (SPE) method presented by Huang et al. (Huang et al., 2024) due to its benefits over previous methods (Lim et al., 2023). We construct a PE matrix ΓN×ZΓsuperscript𝑁𝑍\Gamma\in\mathbb{R}^{N\times Z}roman_Γ ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_Z end_POSTSUPERSCRIPT using the first Z𝑍Zitalic_Z smallest Eigenvalues Λ^=[Λ~i]i[0:Z]Z^Λsubscriptdelimited-[]subscript~Λ𝑖𝑖delimited-[]:0𝑍superscript𝑍\hat{\Lambda}=[\tilde{\Lambda}_{i}]_{i\in[0:Z]}\in\mathbb{R}^{Z}over^ start_ARG roman_Λ end_ARG = [ over~ start_ARG roman_Λ end_ARG start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] start_POSTSUBSCRIPT italic_i ∈ [ 0 : italic_Z ] end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_Z end_POSTSUPERSCRIPT and Eigenvectors 𝐕^=[𝐕[:,i]]i[0:Z]N×Z^𝐕subscriptdelimited-[]subscript𝐕:𝑖𝑖delimited-[]:0𝑍superscript𝑁𝑍\hat{\mathbf{V}}=[\mathbf{V}_{[:,i]}]_{i\in[0:Z]}\in\mathbb{R}^{N\times Z}over^ start_ARG bold_V end_ARG = [ bold_V start_POSTSUBSCRIPT [ : , italic_i ] end_POSTSUBSCRIPT ] start_POSTSUBSCRIPT italic_i ∈ [ 0 : italic_Z ] end_POSTSUBSCRIPT ∈ blackboard_R start_POSTSUPERSCRIPT italic_N × italic_Z end_POSTSUPERSCRIPT as (Huang et al., 2024):

(5) Γ=ρ(𝐕^diag(ϕ1(Λ^))𝐕^,,𝐕^diag(ϕm(Λ^))𝐕^),Γ𝜌^𝐕diagsubscriptitalic-ϕ1^Λsuperscript^𝐕top^𝐕diagsubscriptitalic-ϕ𝑚^Λsuperscript^𝐕top,\Gamma=\rho\left(\hat{\mathbf{V}}\operatorname{diag}(\phi_{1}(\hat{\Lambda}))% \hat{\mathbf{V}}^{\top},\dots,\hat{\mathbf{V}}\operatorname{diag}(\phi_{m}(% \hat{\Lambda}))\hat{\mathbf{V}}^{\top}\right)\text{,}roman_Γ = italic_ρ ( over^ start_ARG bold_V end_ARG roman_diag ( italic_ϕ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ( over^ start_ARG roman_Λ end_ARG ) ) over^ start_ARG bold_V end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT , … , over^ start_ARG bold_V end_ARG roman_diag ( italic_ϕ start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( over^ start_ARG roman_Λ end_ARG ) ) over^ start_ARG bold_V end_ARG start_POSTSUPERSCRIPT ⊤ end_POSTSUPERSCRIPT ) ,

where ρ𝜌\rhoitalic_ρ is a permutation invariant function and {ϕi}i=1msuperscriptsubscriptsubscriptitalic-ϕ𝑖𝑖1𝑚\{\phi_{i}\}_{i=1}^{m}{ italic_ϕ start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT } start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_m end_POSTSUPERSCRIPT are m𝑚mitalic_m independent linear transformations. We implement ρ𝜌\rhoitalic_ρ using a Graph Isomorphism Network (Xu et al., 2019) and ϕitalic-ϕ\phiitalic_ϕ with multi-layer perceptrons, using the same hyperparameters as Huang et al. (Huang et al., 2024). However, as we operate on a supervised setting, the choice of Z𝑍Zitalic_Z is determined by the training graph sizes (topologies up to 10 nodes). Hence, we choose the first Z=9𝑍9Z=9italic_Z = 9 eigenvalues larger than 00 and their eigenvectors. Including SPE PE results in D=3+Z=12𝐷3𝑍12D=3+Z=12italic_D = 3 + italic_Z = 12 input features per node.

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Figure 7. Node degree PE achieved best trade-off between generalization performance and low computational overhead. Graphs demonstrate influence of different PEs by showing performance over multiple runs. Vertical lines depict the standard error of the respective metric.

Node degree PE provides the best trade-off between symmetry-breaking and computational overhead. Figure 7 depicts the model’s performance for different PE methods. While SPE achieved the best carrier saved ΔCsubscriptΔ𝐶\Delta_{C}roman_Δ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT results in topologies up to 20 nodes (Figure LABEL:subfig:spe-PE), its ΠΠ\Piroman_Π value significantly reduces for an increasing number of sensor tags. Moreover, it is completely unable to compute schedules for topologies of 60 and 100 nodes (Π=0Π0\Pi=0roman_Π = 0). Moreover, Figures LABEL:subfig:eivals-PE and LABEL:subfig:degree-PE for Eigvals PE and node degree PE show similar profiles. Notably, node degree PE achieves higher 75th and 95th percentile values for both 60 node and 100 node topologies. Additionally, node degree PE does not incur in the expensive computation overhead of estimating the EVD. E.g., it takes on average 450 ms extra to compute the EVD on a multi-core processor for 100 node topologies. Hence, node degree PE represents a good trade-off to improve the performance, while avoiding the EVD computation overhead.

5.3.3. Influence of Attention Heads

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 8. Robust and consistent generalization achieved with 12 Heads. Graphs depict the influence of the number of attention heads by showing performance over multiple runs. Vertical lines depict the standard error of the respective metric.

We include warmup and node degree PE based on the results from the previous sections. We now evaluate the influence of model complexity by increasing the number of attention heads M𝑀Mitalic_M in each of the GNN layers. We train eight models for each number-of-heads value in M={4,8,12}𝑀4812M=\{4,8,12\}italic_M = { 4 , 8 , 12 }, and four 16-head models due to their long runtimes (+40 hours per model). We report average and standard error from performance metrics’ statistics.

12 heads crucial for robust generalization. Figure 8 shows the influence of increasing the number of attention heads in the model. As observed from Figure LABEL:subfig:4HLABEL:subfig:12H, increasing the attention heads implies an increase in the carriers saved ΔCsubscriptΔ𝐶\Delta_{C}roman_Δ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT performance for all percentiles. While the models from 8 heads and 12 heads exhibit similar performance, the overall stability of 12 heads is better for both percentage of correctly computed schedules ΠΠ\Piroman_Π and for pushing the 25th percentile of ΔCsubscriptΔ𝐶\Delta_{C}roman_Δ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT above 0. Increasing the attention heads beyond 12 to 16 yields no benefit. On the contrary: Figure LABEL:subfig:16H shows how the mean and 25th percentile of ΔCsubscriptΔ𝐶\Delta_{C}roman_Δ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT fall below 0.

5.4. Final GNN Model

Our analysis from Sec. 5.3 results in a RobustGANTT model of 12 attention heads with node degree PE that is trained with warmup. It exhibits strong generalization to larger topologies, and its performance is consistent across independently trained models. We train RobustGANTT’s model according to Sec. 4.3 using the dataset described in Sec. 5.1.1. Training the model with a mini-batch size of 1024 requires 22 hours on an NVIDIA A100 GPU.

6. Evaluation

In this section, we compare RobustGANTT’s performance against the DeepGANTT scheduler in terms of resource savings over the TagAlong heuristic (Pérez-Penichet et al., 2020). We use both simulated topologies and a real-life IoT network. The design choice of GNNs allows our scheduler to generalize to larger, previously unseen network topologies without retraining. Hence, no further RobustGANTT’s ML model training is performed for these experiments. We highlight the following key findings:

  • RobustGANTT far surpasses the generalization capabilities of DeepGANTT. It scales to 1000 node topologies, while increasingly saving resources compared to TagAlong without sacrificing latency (Figure 9).

  • Both RobustGANTT and DeepGANTT achieve resource savings against TagAlong for the real-life IoT network. However, our scheduler achieves up to 1.9×1.9\times1.9 × more energy savings, up to a 5.7×5.7\times5.7 × reduction in the schedule’s latency, and up to 2×2\times2 × reduction in 95th percentile runtime compared to DeepGANTT (Figures 11 and 12).

  • For the real-life IoT network topology, RobustGANTT achieves an average runtime of 540ms, which allows it to react fast to changing network conditions.

Implementation. We implement RobustGANTT as Function as a Service with 2600similar-toabsent2600\sim 2600∼ 2600 lines of code in a server with an A100 NVIDIA GPU. In general, RobustGANTT’s ML model has 295similar-toabsent295\sim 295∼ 295 million parameters, requiring 1.6similar-toabsent1.6\sim 1.6∼ 1.6 GB GPU memory in total using single-point precision, which allows deploying RobustGANTT in lower-end GPUs.

6.1. Scalability to 1000-node topologies

Refer to caption
(a)
Refer to caption
(b)
Figure 9. RobustGANTT achieves increasing carrier savings with increasing topology sizes compared to the heuristic, while utilizing roughly the same timeslots. Comparison of DeepGANTT (Perez-Ramirez et al., 2023) and our scheduler against the TagAlong heuristic.

We evaluate RobustGANTT’s generalization to larger topologies, far exceeding DeepGANTT’s capabilities, while still achieving significant resource savings against TagAlong.

6.1.1. Dataset

We consider 200 problem instances (simulated IoT networks with random sensor tag assignments) for (N,T)𝑁𝑇(N,T)( italic_N , italic_T ) pairs from the sets N{100,500,1000}𝑁1005001000N\in\{100,500,1000\}italic_N ∈ { 100 , 500 , 1000 } nodes and T{250,500,1000,1500}𝑇25050010001500T\in\{250,500,1000,1500\}italic_T ∈ { 250 , 500 , 1000 , 1500 } sensor tags.

6.1.2. Performance metrics

Besides ΠΠ\Piroman_Π and ΔCsubscriptΔ𝐶\Delta_{C}roman_Δ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT (see Sec. 5.2), we consider a metric related to the schedule length L𝐿Litalic_L.

𝚫Lsubscript𝚫𝐿\boldsymbol{\Delta}_{L}bold_Δ start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT—Timeslots Saved. Relates to the latency of querying all sensor tag values in the network. Given a network topology, it compares the length of the schedule produced by TagAlong Ltasubscript𝐿𝑡𝑎L_{ta}italic_L start_POSTSUBSCRIPT italic_t italic_a end_POSTSUBSCRIPT against the length from the schedule produced by a learning-based scheduler Lnnsubscript𝐿𝑛𝑛L_{nn}italic_L start_POSTSUBSCRIPT italic_n italic_n end_POSTSUBSCRIPT as: ΔL=LtaLnnsubscriptΔ𝐿subscript𝐿𝑡𝑎subscript𝐿𝑛𝑛\Delta_{L}=L_{ta}-L_{nn}roman_Δ start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT = italic_L start_POSTSUBSCRIPT italic_t italic_a end_POSTSUBSCRIPT - italic_L start_POSTSUBSCRIPT italic_n italic_n end_POSTSUBSCRIPT.

6.1.3. Results

Figure LABEL:subfig:robust-vs-deep-CARR depicts the carriers saved ΔCsubscriptΔ𝐶\Delta_{C}roman_Δ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT of both RobustGANTT and DeepGANTT against the TagAlong heuristic. RobustGANTT consistently achieves higher savings with both an increase in the number of nodes and number of sensor tags. Notably, even its 1st percentile lies above zero, i.e., for at least 99%percent9999\%99 % of the cases RobustGANTT achieves savings against TagAlong. Our scheduler achieves on average 12%percent1212\%12 % and up to a 1.4×1.4\times1.4 × reduction in the number of carriers compared to TagAlong. Sec. 6.2.2 demonstrates how ΔCsubscriptΔ𝐶\Delta_{C}roman_Δ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT directly translates to energy savings. The DeepGANTT scheduler is, however, only marginally better than TagAlong for 100-node topologies, and increasingly worse for larger networks. Additionally, DeepGANTT’s correctly computed schedules ΠΠ\Piroman_Π decreases for 100 nodes, while RobustGANTT’s values are consistently Π=100%Πpercent100\Pi=100\%roman_Π = 100 %.

RobustGANTT computes schedules requiring roughly the same number of timeslots as TagAlong (ΔL0subscriptΔ𝐿0\Delta_{L}\approx 0roman_Δ start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT ≈ 0) as shown in Figure LABEL:subfig:robust-vs-deep-LEN. Hence, our scheduler achieves significant savings in energy and spectrum without a significant reduction in the latency to query all sensor tags. Across all topologies considered, RobustGANTT requires on average 1.12 additional timeslots compared to TagAlong. In contrast, DeepGANTT requires on average 20 additional timeslots, and achieves no resource savings for such large topologies. Moreover, Figure LABEL:subfig:robust-vs-deep-LEN shows how DeepGANTT requires increasingly more timeslots than our scheduler.

6.2. Performance for a Real IoT Network

Refer to caption
Figure 10. Floor layout and device configuration of our testbed. We use a real IoT network consisting of Zolertia Fireflies, and collect network metrics over a period of four days.

We now evaluate RobustGANTT’s ability to compute schedules for a real-life IoT network.

6.2.1. Testbed

Our experimental setup utilizes an indoor IoT testbed composed of 23 Zolertia Firefly devices running Contiki-NG (Oikonomou et al., 2022) (see Figure 10). These devices employ the RPL routing protocol (Winter, 2012) and communicate via IPv6 over IEEE 802.15.4 TSCH (Duquennoy et al., 2017). Link connectivity data between IoT nodes was gathered at 30-minute intervals across a four-day span, assuming a link exists between node pairs when the signal strength reaches at least 75 dtimes-75d-75\text{\,}\mathrm{d}start_ARG - 75 end_ARG start_ARG times end_ARG start_ARG roman_d end_ARGBm, suitable for carrier provisioning. Additionally, we enhanced each network topology by assigning simulated tags randomly to achieve various densities, defined as N/T=23/T𝑁𝑇23𝑇N/T=23/Titalic_N / italic_T = 23 / italic_T, for T{46,115,230,460}𝑇46115230460T\in\{46,115,230,460\}italic_T ∈ { 46 , 115 , 230 , 460 }, with each density configuration tested 100 times.

6.2.2. Performance metrics

Besides ΠΠ\Piroman_Π, ΔCsubscriptΔ𝐶\Delta_{C}roman_Δ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT (see Sec. 5.2), and ΔLsubscriptΔ𝐿\Delta_{L}roman_Δ start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT (see Sec. 6.1.2) we explicitly evaluate energy consumption. Moreover, percentages for ΔCsubscriptΔ𝐶\Delta_{C}roman_Δ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT and ΔLsubscriptΔ𝐿\Delta_{L}roman_Δ start_POSTSUBSCRIPT italic_L end_POSTSUBSCRIPT imply normalization with respect to the heuristic values, e.g., ΔC%=ΔC/CtasubscriptΔpercent𝐶subscriptΔ𝐶subscript𝐶𝑡𝑎\Delta_{C\%}=\Delta_{C}/C_{ta}roman_Δ start_POSTSUBSCRIPT italic_C % end_POSTSUBSCRIPT = roman_Δ start_POSTSUBSCRIPT italic_C end_POSTSUBSCRIPT / italic_C start_POSTSUBSCRIPT italic_t italic_a end_POSTSUBSCRIPT.

𝚫𝐄%subscript𝚫percent𝐄\boldsymbol{\Delta}_{\mathbf{E\%}}bold_Δ start_POSTSUBSCRIPT bold_E % end_POSTSUBSCRIPT—Energy Saved. We consider the average energy required for querying the tag’s sensor values E~~𝐸\tilde{E}over~ start_ARG italic_E end_ARG. It corresponds to the total energy required to interrogate all sensor tags Etotsubscript𝐸𝑡𝑜𝑡E_{tot}italic_E start_POSTSUBSCRIPT italic_t italic_o italic_t end_POSTSUBSCRIPT divided by the number of tags T𝑇Titalic_T in the network (Perez-Ramirez et al., 2023):

(6) E~~𝐸\displaystyle\tilde{E}\!\!\!over~ start_ARG italic_E end_ARG =\displaystyle\!\!\!=\!\!\!= EtotT=Ptxttx+Prx(CTtreq+trx)+Ptx(treq+CTtcg),subscript𝐸𝑡𝑜𝑡𝑇subscript𝑃𝑡𝑥subscript𝑡𝑡𝑥subscript𝑃𝑟𝑥𝐶𝑇subscript𝑡𝑟𝑒𝑞subscript𝑡𝑟𝑥subscript𝑃𝑡𝑥subscript𝑡𝑟𝑒𝑞𝐶𝑇subscript𝑡𝑐𝑔,\displaystyle\!\!\!\frac{E_{tot}}{T}\!=\!P_{tx}t_{tx}\!+\!P_{rx}\left(\frac{C}% {T}t_{req}\!+\!t_{rx}\right)\!+\!P_{tx}\left(t_{req}\!+\!\frac{C}{T}t_{cg}% \right)\!\text{,}divide start_ARG italic_E start_POSTSUBSCRIPT italic_t italic_o italic_t end_POSTSUBSCRIPT end_ARG start_ARG italic_T end_ARG = italic_P start_POSTSUBSCRIPT italic_t italic_x end_POSTSUBSCRIPT italic_t start_POSTSUBSCRIPT italic_t italic_x end_POSTSUBSCRIPT + italic_P start_POSTSUBSCRIPT italic_r italic_x end_POSTSUBSCRIPT ( divide start_ARG italic_C end_ARG start_ARG italic_T end_ARG italic_t start_POSTSUBSCRIPT italic_r italic_e italic_q end_POSTSUBSCRIPT + italic_t start_POSTSUBSCRIPT italic_r italic_x end_POSTSUBSCRIPT ) + italic_P start_POSTSUBSCRIPT italic_t italic_x end_POSTSUBSCRIPT ( italic_t start_POSTSUBSCRIPT italic_r italic_e italic_q end_POSTSUBSCRIPT + divide start_ARG italic_C end_ARG start_ARG italic_T end_ARG italic_t start_POSTSUBSCRIPT italic_c italic_g end_POSTSUBSCRIPT ) ,

where both Prxsubscript𝑃𝑟𝑥P_{rx}italic_P start_POSTSUBSCRIPT italic_r italic_x end_POSTSUBSCRIPT and Prxsubscript𝑃𝑟𝑥P_{rx}italic_P start_POSTSUBSCRIPT italic_r italic_x end_POSTSUBSCRIPT correspond to the radio power at transmit and receive mode, respectively. trxsubscript𝑡𝑟𝑥t_{rx}italic_t start_POSTSUBSCRIPT italic_r italic_x end_POSTSUBSCRIPT, ttxsubscript𝑡𝑡𝑥t_{tx}italic_t start_POSTSUBSCRIPT italic_t italic_x end_POSTSUBSCRIPT, treqsubscript𝑡𝑟𝑒𝑞t_{req}italic_t start_POSTSUBSCRIPT italic_r italic_e italic_q end_POSTSUBSCRIPT, and tcgsubscript𝑡𝑐𝑔t_{cg}italic_t start_POSTSUBSCRIPT italic_c italic_g end_POSTSUBSCRIPT are defined as in Figure 3. Calculating a percentage of energy saved against the TagAlong scheduler corresponds to ΔE%=(E~nnE~ta)/E~tasubscriptΔpercent𝐸subscript~𝐸𝑛𝑛subscript~𝐸𝑡𝑎subscript~𝐸𝑡𝑎\Delta_{E\%}=(\tilde{E}_{nn}-\tilde{E}_{ta})/\tilde{E}_{ta}roman_Δ start_POSTSUBSCRIPT italic_E % end_POSTSUBSCRIPT = ( over~ start_ARG italic_E end_ARG start_POSTSUBSCRIPT italic_n italic_n end_POSTSUBSCRIPT - over~ start_ARG italic_E end_ARG start_POSTSUBSCRIPT italic_t italic_a end_POSTSUBSCRIPT ) / over~ start_ARG italic_E end_ARG start_POSTSUBSCRIPT italic_t italic_a end_POSTSUBSCRIPT. Given a schedule, all values in Eq. 6 except C𝐶Citalic_C are constant for calculating both E~tasubscript~𝐸𝑡𝑎\tilde{E}_{ta}over~ start_ARG italic_E end_ARG start_POSTSUBSCRIPT italic_t italic_a end_POSTSUBSCRIPT and E~nnsubscript~𝐸𝑛𝑛\tilde{E}_{nn}over~ start_ARG italic_E end_ARG start_POSTSUBSCRIPT italic_n italic_n end_POSTSUBSCRIPT. Hence, lower values of C𝐶Citalic_C directly translates to energy savings.

We adopt Prx=72mWsubscript𝑃𝑟𝑥72𝑚𝑊P_{rx}=72mWitalic_P start_POSTSUBSCRIPT italic_r italic_x end_POSTSUBSCRIPT = 72 italic_m italic_W, Ptx=102mWsubscript𝑃𝑡𝑥102𝑚𝑊P_{tx}=102mWitalic_P start_POSTSUBSCRIPT italic_t italic_x end_POSTSUBSCRIPT = 102 italic_m italic_W based on the Firefly’s reference values. Moreover, we assume treq=ttx=128μssubscript𝑡𝑟𝑒𝑞subscript𝑡𝑡𝑥128𝜇𝑠t_{req}=t_{tx}=128\mu sitalic_t start_POSTSUBSCRIPT italic_r italic_e italic_q end_POSTSUBSCRIPT = italic_t start_POSTSUBSCRIPT italic_t italic_x end_POSTSUBSCRIPT = 128 italic_μ italic_s, trx=256μssubscript𝑡𝑟𝑥256𝜇𝑠t_{rx}=256\mu sitalic_t start_POSTSUBSCRIPT italic_r italic_x end_POSTSUBSCRIPT = 256 italic_μ italic_s, and tcg=15.75mssubscript𝑡𝑐𝑔15.75𝑚𝑠t_{cg}=15.75msitalic_t start_POSTSUBSCRIPT italic_c italic_g end_POSTSUBSCRIPT = 15.75 italic_m italic_s (Pérez-Penichet et al., 2020; Perez-Ramirez et al., 2023).

Refer to caption
(a)
Refer to caption
(b)
Refer to caption
(c)
Refer to caption
(d)
Figure 11. RobustGANTT achieves more resource savings than DeepGANTT against TagAlong for a real IoT network, while requiring roughly as many timeslots. Performance comparison for a real-life IoT network topology of both RobustGANTT and DeepGANTT against the TagAlong heuristic. Boxplots depict the mean and interquartile range. Bar plot depict the mean and std-dev.
Refer to caption
Figure 12. RobustGANTT exhibits up to 2×2\times2 × reduction in 95th percentile runtime values compared to DeepGANTT. Runtimes for different tag densities N/T𝑁𝑇N/Titalic_N / italic_T.

6.2.3. Results

For the real-life IoT network, RobustGANTT achieves on average 14%percent1414\%14 % and up to 53%percent5353\%53 % energy savings ΔE%subscriptΔpercent𝐸\Delta_{E\%}roman_Δ start_POSTSUBSCRIPT italic_E % end_POSTSUBSCRIPT (i.e., up to 2×2\times2 × less energy) compared to TagAlong. Even for the highest tag densities N/T𝑁𝑇N/Titalic_N / italic_T considered, our scheduler achieves 44%percent4444\%44 % energy savings. Such savings represent up to 1.9×1.9\times1.9 × the savings achieved by DeepGANTT, as shown in Figure LABEL:subfig:pi_perc_energy. Figures LABEL:subfig:pi_diff_carr and LABEL:subfig:pi_perc_energy demonstrate the equivalence between ΔC%subscriptΔpercent𝐶\Delta_{C\%}roman_Δ start_POSTSUBSCRIPT italic_C % end_POSTSUBSCRIPT and ΔE%subscriptΔpercent𝐸\Delta_{E\%}roman_Δ start_POSTSUBSCRIPT italic_E % end_POSTSUBSCRIPT: a reduction in the number of carriers directly translates to energy savings.

In terms of latency to query all sensor values, Figure LABEL:subfig:pi_timeslots_saved shows how DeepGANTT always requires on average more timeslots than TagAlong. In contrast, our scheduler requires on average as many timeslots as TagAlong for tag densities 2.0 and 5.0, and 10. However, it requires on average 8.8 more timeslots for tag density 20. Figure 12 shows the runtime distributions of both schedulers across tag densities. Their profiles are those of heavy-tailed distributions. Both schedulers require roughly the same average runtimes across tag densities. In particular, RobustGANTT’s average runtimes are 120 ms, 260 ms, 540 ms, and 1.2 sec for the respective tag densities 2, 5, 10, and 20. However, Figure 12 demonstrates how RobustGANTT reduces the runtime’s 95th percentile up to a factor of 2×2\times2 × compared to DeepGANTT.

While the real-life network’s size is within DeepGANTT’s proven generalization capabilities (Perez-Ramirez et al., 2023), we demonstrate that our scheduler requires on average up to 1.9×1.9\times1.9 × less carriers (energy savings), up to 5.7×5.7\times5.7 × less timeslots (reduction in latency to query all sensor tags), and up to a 2×2\times2 × reduction in 95th percentile runtime to compute the schedule.

7. Discussion

RobustGANTT is a scheduler that far surpasses the generalization capabilities of existing learning-based systems. Our system can not only processes much larger IoT network topologies than previously possible, but also delivers more resource-efficient schedules.

Large-scale IoT networks. Our system is designed to reduce energy consumption in IoT networks. This is of paramount importance not only for sustainability reasons, but also because such networks are typically energy constrained. Moreover, ensuring energy savings without increasing querying latency is highly relevant, specially for dense network deployments, since it reduces spectrum utilization.

Serving IoT networks in parallel. The NP-Hard nature of generating resource-efficient schedules requires deploying RobustGANTT at the Edge/Cloud, which is also true for other schedulers (Pérez-Penichet et al., 2020; Perez-Ramirez et al., 2023). However, one does not require deploying a RobustGANTT scheduler for every IoT network. Rather, one RobustGANTT instance can process requests from multiple IoT networks either in sequence, or by batching those requests and computing their schedules in parallel. However, the number of requests processed in parallel is limited by the total the amount of GPU memory available.

Latency to query all sensor values. RobustGANTT’s schedules require roughly the same number of timeslots as those produced by TagAlong (see Figures LABEL:subfig:robust-vs-deep-LEN and  LABEL:subfig:pi_timeslots_saved). This implies that our system does not sacrifice querying latency to achieve its significant energy savings. However, there are cases in which TagAlong schedules are shorter than those from RobustGANTT. We attribute this to the optimization objective (Eq. 2), which prioritizes reducing the number of carriers, since we are most interested in energy savings. Moreover, we do not envision backscatter sensor tags to assist in time-critical settings, but rather in energy-efficient sensing and monitoring.

Dynamic Environments. Our system exhibits average runtimes of hundreds of milliseconds, allowing it to react fast to connectivity changes in the IoT devices. Similarly, adding or removing IoT nodes would trigger a new request to compute a schedule. However, detecting the addition or removal of sensor tags to the IoT nodes is a general problem for the type of backscatter networks considered, and lies outside our scope.

8. Conclusion

We present RobustGANTT, a novel system that leverages the latest advancements in GNNs and ML to schedule communications in an IoT network augmented with backscatter sensor tags. We exploit our system design choice of using GNN to train our scheduler using optimal schedules from small networks of up to 10 nodes, and demonstrate that RobustGANTT can seamlessly generalize without re-training to networks of up to 1000 nodes. Our scheduler surpasses the generalization capabilities of current learning-based systems, while achieving significant savings in energy usage, spectrum utilization, and compute runtime. RobustGANTT facilitates the large-scale integration of IoT networks with sensor tags, and significantly reduces their operational expenses by efficiently utilizing their resources.

Acknowledgements.
This work was financially supported by the Swedish Foundation for Strategic Research (SSF). We acknowledge the usage of High-Performance Computing resources under the EuroHPC JU project No. EHPC-DEV-2023D08-049. We also thank M.Sc. Peder Hårderup for initial concept prototyping of the node degree PE during his M.Sc. thesis at RISE.

References

  • (1)
  • Ahmad et al. (2021) Abeer Ahmad, Xiao Sha, Milutin Stanaćević, Akshay Athalye, Petar M Djurić, and Samir R Das. 2021. Enabling passive backscatter tag localization without active receivers. In Proc. of the 19th ACM Conference on Embedded Networked Sensor Systems (SenSys). 178–191.
  • Belkin and Niyogi (2003) Mikhail Belkin and Partha Niyogi. 2003. Laplacian eigenmaps for dimensionality reduction and data representation. Neural computation 15, 6 (2003), 1373–1396.
  • Bengio et al. (2021) Yoshua Bengio, Andrea Lodi, and Antoine Prouvost. 2021. Machine learning for combinatorial optimization: A methodological tour d’horizon. Eur. J. Oper. Res. 290, 2 (apr 2021), 405–421. https://doi.org/10.1016/j.ejor.2020.07.063 arXiv:1811.06128
  • Bluetooth SIG (2021) Bluetooth SIG. 2021. Bluetooth Core Specification 5.3.
  • Dai et al. (2017) Hanjun Dai, Elias B. Khalil, Yuyu Zhang, Bistra Dilkina, and Le Song. 2017. Learning Combinatorial Optimization Algorithms over Graphs. In Proc. Advances Neural Inf. Process. Syst. (NIPS), Vol. 2017-Decem. Neural information processing systems foundation, 6349–6359. arXiv:1704.01665
  • Devlin et al. (2018) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).
  • Duquennoy et al. (2015) Simon Duquennoy, Beshr Al Nahas, Olaf Landsiedel, and Thomas Watteyne. 2015. Orchestra: Robust mesh networks through autonomously scheduled TSCH. In Proceedings of the 13th ACM conference on embedded networked sensor systems. 337–350.
  • Duquennoy et al. (2017) Simon Duquennoy, Atis Elsts, Beshr Al Nahas, and George Oikonomou. 2017. TSCH and 6TiSCH for Contiki: Challenges, Design and Evaluation. In 2017 13th International Conference on Distributed Computing in Sensor Systems (DCOSS). 11–18. https://doi.org/10.1109/DCOSS.2017.29
  • Dwivedi et al. (2023) Vijay Prakash Dwivedi, Chaitanya K. Joshi, Anh Tuan Luu, Thomas Laurent, Yoshua Bengio, and Xavier Bresson. 2023. Benchmarking Graph Neural Networks. Journal of Machine Learning Research 24, 43 (2023), 1–48. http://jmlr.org/papers/v24/22-0567.html
  • Ensworth and Reynolds (2015) Joshua Ensworth and Matthew S. Reynolds. 2015. Every smart phone is a backscatter reader: Modulated backscatter compatibility with Bluetooth 4.0 Low Energy (BLE) devices. In Proc. Ann. Conf. RFID. IEEE.
  • Ferrari et al. (2011) Federico Ferrari, Marco Zimmerling, Lothar Thiele, and Olga Saukh. 2011. Efficient network flooding and time synchronization with Glossy. In Proc. 10th ACM/IEEE Int. Conf. Information Processing in Sensor Networks. 73–84.
  • Foundation (2024) The PyTorch Foundation. 2024. PyTorch Reproducibility. https://pytorch.org/docs/stable/notes/randomness.html
  • Geissdoerfer and Zimmerling (2021) Kai Geissdoerfer and Marco Zimmerling. 2021. Bootstrapping Battery-free Wireless Networks: Efficient Neighbor Discovery and Synchronization in the Face of Intermittency. In (NSDI’21). 439–455.
  • Gilmer et al. (2017) Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural Message Passing for Quantum Chemistry. Proc. 34th Int. Conf. Mach. Learn. (ICML) 3 (apr 2017), 2053–2070. arXiv:1704.01212
  • Guo et al. (2020) Xiuzhen Guo, Longfei Shangguan, Yuan He, Jia Zhang, Haotian Jiang, Awais Ahmad Siddiqi, and Yunhao Liu. 2020. Aloba: Rethinking ON-OFF keying modulation for ambient LoRa backscatter. In Proceedings of the 18th conference on embedded networked sensor systems. 192–204.
  • Hamilton (2020a) William L Hamilton. 2020a. Graph representation learning. Vol. 14. Morgan & Claypool Publishers.
  • Hamilton (2020b) William L Hamilton. 2020b. Graph representation learning. Morgan & Claypool Publishers.
  • Hamilton et al. (2017) William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Inductive Representation Learning on Large Graphs. In Proc. Advances Neural Inf. Process. Syst. (NIPS), Vol. 2017-Decem. Neural information processing systems foundation, 1025–1035.
  • Hamouda et al. (2011) Essia Hamouda, Nathalie Mitton, and David Simplot-Ryl. 2011. Reader Anti-collision in dense RFID networks with mobile tags. In 2011 IEEE International Conference on RFID-Technologies and Applications. 327–334. https://doi.org/10.1109/RFID-TA.2011.6068657
  • Huang et al. (2024) Yinan Huang, William Lu, Joshua Robinson, Yu Yang, Muhan Zhang, Stefanie Jegelka, and Pan Li. 2024. On the Stability of Expressive Positional Encodings for Graph Neural Networks. In Proc. 12th International Conference on Learning Representations (ICLR’24). https://openreview.net/forum?id=xAqcJ9XoTf
  • IEEE (2016) IEEE. 2016. IEEE Standard for Low-Rate Wireless Networks –Amendment 2: Ultra-Low Power Physical Layer.
  • Iyer et al. (2016) Vikram Iyer et al. 2016. Inter-Technology Backscatter: Towards Internet Connectivity for Implanted Devices. ACM, 356–369. https://doi.org/10.1145/2934872.2934894
  • Jeon et al. (2022) Wonseok Jeon, Mukul Gagrani, Burak Bartan, Weiliang Will Zeng, Harris Teague, Piero Zappi, and Christopher Lott. 2022. Neural DAG scheduling via one-shot priority sampling. In The Eleventh International Conference on Learning Representations.
  • Karimi et al. (2017) Y. Karimi, A. Athalye, S. R. Das, P. M. Djurić, and M. Stanaćević. 2017. Design of a backscatter-based Tag-to-Tag system. In 2017 IEEE International Conference on RFID (IEEE RFID). 6–12. https://doi.org/10.1109/RFID.2017.7945579
  • Katanbaf et al. (2021) Mohamad Katanbaf, Ali Saffari, and Joshua R Smith. 2021. Multiscatter: Multistatic backscatter networking for battery-free sensors. In Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems. 69–83.
  • Kellogg et al. (2014) Bryce Kellogg et al. 2014. Wi-Fi Backscatter: Internet Connectivity for RF-powered Devices. In Proc. Special Interest Group Data Commun. (SIGCOMM). ACM, New York, NY, USA, 607–618. https://doi.org/10.1145/2619239.2626319
  • Kellogg et al. (2016) Bryce Kellogg et al. 2016. Passive Wi-Fi: Bringing Low Power to Wi-Fi Transmissions. In Proc. Symp. Networked Syst. Des. Implementation (NSDI). NSDI, 151–164.
  • Kingma and Lei Ba (2015) Diederik P Kingma and Jimmy Lei Ba. 2015. Adam: A Method For Stochastic Optimization. In Proc. Int. Conf. Learn. Representations (ICLR). arXiv:1412.6980v9
  • Kipf and Welling (2017) Thomas N. Kipf and Max Welling. 2017. Semi-Supervised Classification with Graph Convolutional Networks. In Proc. 5th Int. Conf. Learn. Representations (ICLR). ICLR. arXiv:1609.02907
  • Kreuzer et al. (2021) Devin Kreuzer, Dominique Beaini, Will Hamilton, Vincent Létourneau, and Prudencio Tossou. 2021. Rethinking Graph Transformers with Spectral Attention. In Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, and J. Wortman Vaughan (Eds.), Vol. 34. Curran Associates, Inc., 21618–21629. https://proceedings.neurips.cc/paper_files/paper/2021/file/b4fd1d2cb085390fbbadae65e07876a7-Paper.pdf
  • Krogh and Hertz (1991) Anders Krogh and John Hertz. 1991. A simple weight decay can improve generalization. Advances in neural information processing systems 4 (1991).
  • Kwak and Hong (2004) Jin Ho Kwak and Sungpyo Hong. 2004. Linear algebra. Springer Science & Business Media.
  • LeCun et al. (1989) Yann LeCun, John Denker, and Sara Solla. 1989. Optimal brain damage. Advances in neural information processing systems 2 (1989).
  • Li et al. (2018b) Yan Li, Zicheng Chi, Xin Liu, and Ting Zhu. 2018b. Passive-zigbee: Enabling zigbee communication in iot networks with 1000x+ less power consumption. In Proceedings of the 16th ACM conference on embedded networked sensor systems. 159–171.
  • Li et al. (2018a) Zhuwen Li, Qifeng Chen, and Vladlen Koltun. 2018a. Combinatorial optimization with graph convolutional networks and guided tree search. In Proc. Advances in Neural Inf. Process. Syst. (NeurIPS). 539–548.
  • Lim et al. (2023) Derek Lim, Joshua Robinson, Lingxiao Zhao, Tess Smidt, Suvrit Sra, Haggai Maron, and Stefanie Jegelka. 2023. Sign and basis invariant networks for spectral graph representation learning. (2023).
  • Ma and Yarats (2021) Jerry Ma and Denis Yarats. 2021. On the adequacy of untuned warmup for adaptive optimization. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 35. 8828–8836.
  • Majid et al. (2019) A. Y. Majid, M. Jansen, G. O. Delgado, K. S. Yildirim, and P. Pawełłzak. 2019. Multi-hop Backscatter Tag-to-Tag Networks. In Proc. Int. Conf. Comput. Commun. (INFOCOM). IEEE, 721–729. https://doi.org/10.1109/INFOCOM.2019.8737551
  • Manchanda et al. (2020) Sahil Manchanda, Akash Mittal, Anuj Dhawan, Sourav Medya, Sayan Ranu, and Ambuj Singh. 2020. Learning Heuristics over Large Graphs via Deep Reinforcement Learning. In Proc. 34th Conf. Neural Inf. Process. Syst. (NIPS). arXiv:1903.03332
  • Mao et al. (2019) Hongzi Mao, Malte Schwarzkopf, Shaileshh Bojja Venkatakrishnan, Zili Meng, and Mohammad Alizadeh. 2019. Learning scheduling algorithms for data processing clusters. In Proceedings of the ACM special interest group on data communication. 270–288.
  • Nakkiran et al. (2020) Preetum Nakkiran, Gal Kaplun, Yamini Bansal, Tristan Yang, Boaz Barak, and Ilya Sutskever. 2020. Deep Double Descent: Where Bigger Models and More Data Hurt. In International Conference on Learning Representations (ICLR). https://openreview.net/forum?id=B1g5sA4twr
  • Oikonomou et al. (2022) George Oikonomou, Simon Duquennoy, Atis Elsts, Joakim Eriksson, Yasuyuki Tanaka, and Nicolas Tsiftes. 2022. The Contiki-NG open source operating system for next generation IoT devices. SoftwareX 18 (2022), 101089. https://doi.org/10.1016/j.softx.2022.101089
  • Paszke et al. (2019) Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, and et al. 2019. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Proc. Advances Neural Inf. Process. Syst., H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., 8024–8035.
  • Pérez-Penichet et al. (2016) Carlos Pérez-Penichet, Frederik Hermans, Ambuj Varshney, and Thiemo Voigt. 2016. Augmenting IoT networks with backscatter-enabled passive sensor tags. In Proc. Annu. Int. Conf. Mobile Comput. Netw. (MOBICOM). ACM, 23–27. https://doi.org/10.1145/2980115.2980132
  • Pérez-Penichet et al. (2020) Carlos Pérez-Penichet, Dilushi Piumwardane, Christian Rohner, and Thiemo Voigt. 2020. A Fast Carrier Scheduling Algorithm for Battery-free Sensor Tags in Commodity Wireless Networks. In Proc. Int. Conf. Comput. Commun. (INFOCOM). IEEE, 994–1003. https://doi.org/10.1109/infocom41043.2020.9155241
  • Perez-Ramirez et al. (2023) Daniel F. Perez-Ramirez, Carlos Pérez-Penichet, Nicolas Tsiftes, Thiemo Voigt, Dejan Kostić, and Magnus Boman. 2023. DeepGANTT: A Scalable Deep Learning Scheduler for Backscatter Networks. In Proceedings of the 22nd International Conference on Information Processing in Sensor Networks (San Antonio, TX, USA) (IPSN ’23). Association for Computing Machinery, New York, NY, USA, 163–176. https://doi.org/10.1145/3583120.3586957
  • Pérez-Penichet et al. (2020) Carlos Pérez-Penichet, Dilushi Piumwardane, Christian Rohner, and Thiemo Voigt. 2020. TagAlong: Efficient Integration of Battery-Free Sensor Tags in Standard Wireless Networks. In Proc. 19th ACM/IEEE Int. Conf. Inf. Process. Sensor Netw. (IPSN). Sydney, Australia. https://doi.org/10.1109/IPSN48710.2020.00020
  • Radford et al. (2018) Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. 2018. Improving language understanding by generative pre-training. (2018).
  • Radford et al. (2019) Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9.
  • Rampášek et al. (2022) Ladislav Rampášek, Michael Galkin, Vijay Prakash Dwivedi, Anh Tuan Luu, Guy Wolf, and Dominique Beaini. 2022. Recipe for a general, powerful, scalable graph transformer. Advances in Neural Information Processing Systems 35 (2022), 14501–14515.
  • Scarselli et al. (2009) Franco Scarselli, Marco Gori, Ah Chung Tsoi, Markus Hagenbuchner, and Gabriele Monfardini. 2009. The graph neural network model. IEEE Trans. Neural Netw. 20, 1 (jan 2009), 61–80. https://doi.org/10.1109/TNN.2008.2005605
  • Talla et al. (2017) Vamsi Talla, Mehrdad Hessar, Bryce Kellogg, Ali Najafi, Joshua R. Smith, and Shyamnath Gollakota. 2017. LoRa Backscatter: Enabling The Vision of Ubiquitous Connectivity. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 1, 3, 105:1–105:24. https://doi.org/10.1145/3130970
  • Vaswani et al. (2017) Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proc. Advances Neural Inf. Process. Syst. (NIPS), Vol. 2017-Decem. NIPS, 5999–6009.
  • Veličković et al. (2018) Petar Veličković, Arantxa Casanova, Pietro Liò, Guillem Cucurull, Adriana Romero, and Yoshua Bengio. 2018. Graph attention networks. In Proc. 6th Int. Conf. Learn. Representations (ICLR). ICLR. arXiv:1710.10903
  • Vesselinova et al. (2020) Natalia Vesselinova, Rebecca Steinert, Daniel F Perez-Ramirez, and Magnus Boman. 2020. Learning combinatorial optimization on graphs: A survey with applications to networking. IEEE Access 8 (2020), 120388–120416.
  • Vinyals et al. (2015) Oriol Vinyals, Google Brain, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer Networks. In Proc. Advances Neural Inf. Process. Syst. (NIPS). 2692–2700.
  • Wang et al. (2022) Haorui Wang, Haoteng Yin, Muhan Zhang, and Pan Li. 2022. Equivariant and stable positional encoding for more powerful graph neural networks. In Proc. 10th International Conference on Learning Representations (ICLR’22).
  • Winter (2012) T. Winter. 2012. RPL: IPv6 Routing Protocol for Low-Power and Lossy Networks. Retrieved Oct. 2022 from https://www.rfc-editor.org/rfc/rfc6550
  • Wu et al. (2021) Zonghan Wu, Shirui Pan, Fengwen Chen, Guodong Long, Chengqi Zhang, and Philip S. Yu. 2021. A Comprehensive Survey on Graph Neural Networks. IEEE Trans. Neural Netw. 32, 1 (2021), 4–24. https://doi.org/10.1109/TNNLS.2020.2978386
  • Xu et al. (2019) Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How powerful are graph neural networks?. In Proc. Int. Conf. Learn. Representations (ICLR’19).
  • Yang et al. (2011) L. Yang, J. Han, Y. Qi, C. Wang, T. Gu, and Y. Liu. 2011. Season: Shelving interference and joint identification in large-scale RFID systems. In Proc. Int. Conf. Comput. Commun. (INFOCOM). IEEE, 3092–3100. https://doi.org/10.1109/INFCOM.2011.5935154
  • Yue et al. (2012) H. Yue, C. Zhang, M. Pan, Y. Fang, and S. Chen. 2012. A time-efficient information collection protocol for large-scale RFID systems. In Proc. Int. Conf. Comput. Commun. (INFOCOM). IEEE, 2158–2166. https://doi.org/10.1109/INFCOM.2012.6195599
  • Zhang et al. (2017) Pengyu Zhang, Colleen Josephson, Dinesh Bharadia, and Sachin Katti. 2017. Freerider: Backscatter communication using commodity radios. In Proceedings of the 13th international conference on emerging networking experiments and technologies. 389–401.