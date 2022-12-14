Following the evaluation setups, this section covers the experimental result analysis for the proposed GBCSS, compared with other benchmarks. Qualitative discussions regarding GBCSS with some state-of-the-art solutions are also included in this section.

For the learning-based solutions (GNN and FA), an offline training stage was carried out first. The trained GNN and FA policy were then used to produce statistical results (i.e. the metrics \(E_{saving}\) and \(\Lambda _{\%}\) with respect to \(N_{sc}\)) on the validation dataset. Finally, the two day samples in the test dataset are used to emulate online cell switching deployment, which yields \(P_{tot}\) throughout the day (24 h). Unless otherwise stated, the results for each \(N_{sc}\) case are generated using the GNN trained with the dataset generated for that case. Note that during the online execution phase, the learning models could be updated with the latest collected data to further improve their performance; however, such online model updating is beyond the scope of this paper.

Before presenting the results for each metric, it is important to analyze the convergence behavior of the GNN training. Using the configured GNN setups, the value of the loss function defined in Eq. (18) was recorded during training. Across the 8 considered \(N_{sc}\) cases, the GNN model converged within the first 20 epochs in 7 cases; the fastest case converged after 5 epochs and the slowest after around 55 epochs. Since the loss records of all 8 \(N_{sc}\) cases cannot be presented clearly in a single graph, only these key statistics are reported.
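The convergence criterion itself is not stated here. As a minimal sketch, assuming a hypothetical plateau rule (the relative change of the Eq. (18) loss stays below a tolerance `rel_tol` for `patience` consecutive epochs; both names and their defaults are illustrative), the convergence epoch could be extracted from a recorded loss curve as follows:

```python
from typing import Optional, Sequence


def convergence_epoch(losses: Sequence[float], rel_tol: float = 1e-3,
                      patience: int = 5) -> Optional[int]:
    """Return the 1-indexed epoch from which the loss has plateaued.

    losses[k] is the training loss after epoch k + 1. The loss counts as
    plateaued once `patience` consecutive epoch-to-epoch relative changes are
    all below `rel_tol` (a hypothetical criterion). Returns None otherwise.
    """
    calm = 0
    for k in range(1, len(losses)):
        prev, curr = losses[k - 1], losses[k]
        rel_change = abs(prev - curr) / max(abs(prev), 1e-12)
        if rel_change < rel_tol:
            calm += 1
            if calm == patience:
                # losses[k - patience] (epoch k - patience + 1) starts the plateau
                return k - patience + 1
        else:
            calm = 0
    return None


# Toy loss curve (not data from the paper): the plateau starts at epoch 4.
print(convergence_epoch([1.0, 0.5, 0.3, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2]))  # prints 4
```

A tighter `rel_tol` or a longer `patience` reports later convergence epochs, so any such summary depends on the criterion chosen.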

Statistical results from validation set

Figure 3 shows the metrics \(E_{saving}\), \(\Lambda _{\%}\), and \(\eta _{\%}\) with respect to \(N_{sc}\), each averaged over the 4 day samples in the validation dataset. Note that the ES algorithm has only been executed for \(N_{sc} \in \{4, 8, 12, 16 \}\) because of its prohibitive runtime: its computational complexity is \(O(2^{\textbf{N}})\), so its processing time doubles with every additional SC. In contrast, GBCSS learns a sub-optimal solution that approximates the optimum as closely as possible while maintaining a much lower computational complexity of \(O(\textbf{N})\).
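To illustrate why the ES runtime doubles with every additional SC, the following minimal sketch enumerates every on/off pattern of the SCs with `itertools.product`, keeping the MC always on. The `objective` and `feasible` callbacks, and the toy loads, savings, and MC capacity in the usage example, are hypothetical placeholders rather than the paper's Eqs. (11) and (12).

```python
from itertools import product
from typing import Callable, Optional, Tuple

Pattern = Tuple[int, ...]  # one entry per SC: 1 = switched off, 0 = on


def exhaustive_search(n_sc: int,
                      objective: Callable[[Pattern], float],
                      feasible: Callable[[Pattern], bool]
                      ) -> Tuple[Optional[Pattern], float, int]:
    """Evaluate all 2**n_sc SC on/off patterns (MC assumed always on) and
    return the best feasible pattern, its objective, and the number of
    patterns evaluated."""
    best, best_val, evaluated = None, float("-inf"), 0
    for pattern in product((0, 1), repeat=n_sc):
        evaluated += 1
        if feasible(pattern):
            val = objective(pattern)
            if val > best_val:
                best, best_val = pattern, val
    return best, best_val, evaluated


# Toy instance (hypothetical numbers): offloaded load must fit the MC.
loads = [30, 50, 40]    # traffic each SC would offload to the MC
savings = [2, 1, 3]     # power saved by switching each SC off
capacity = 80           # spare MC capacity
best, val, n = exhaustive_search(
    3,
    objective=lambda p: sum(s for s, off in zip(savings, p) if off),
    feasible=lambda p: sum(l for l, off in zip(loads, p) if off) <= capacity,
)
print(best, val, n)  # prints (1, 0, 1) 5 8
```

Because `product` yields `2**n_sc` patterns, adding one SC doubles `evaluated`, which is the \(O(2^{\textbf{N}})\) growth described above.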

The metric \(E_{saving}\) is the optimization objective of the cell switching problem defined in Eq. (11) and is therefore an essential metric to consider. Figure 3a shows that the total daily energy saved increases with \(N_{sc}\) for all cell switching methods. Deploying more SCs increases the total power consumption, but it also creates more opportunities for offloading and switching off SCs when the MC has sufficient resources to take over their traffic, and hence larger energy savings.
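Equation (11) is not reproduced in this section. Assuming \(E_{saving}\) is the All-on power minus the method's power, summed over the time slots of a day and multiplied by the slot duration, a minimal sketch of the computation is shown below; the slot duration `slot_hours` and the toy power traces are illustrative assumptions.

```python
from typing import Sequence


def energy_saving(p_all_on: Sequence[float], p_method: Sequence[float],
                  slot_hours: float) -> float:
    """Energy saved (Wh) over a day relative to All-on, assuming
    E_saving = sum_t (P_all_on[t] - P_method[t]) * slot_hours."""
    if len(p_all_on) != len(p_method):
        raise ValueError("both power traces must cover the same time slots")
    return sum((a - m) * slot_hours for a, m in zip(p_all_on, p_method))


# Toy 4-slot traces in W (not data from the paper), 0.5 h per slot.
p_all_on = [100.0, 120.0, 150.0, 90.0]
p_method = [80.0, 120.0, 155.0, 70.0]
print(energy_saving(p_all_on, p_method, slot_hours=0.5))  # prints 17.5
```

A negative per-slot term, as in the third slot, corresponds to a switching decision that consumes more power than All-on, a situation discussed for the busy hours in the test set results.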

For \(N_{sc} \in \{4, 8, 12, 16\}\), the ES algorithm saves the most energy among the considered solutions, and this can be expected to hold for larger \(N_{sc}\) values if ES were executed. GBCSS saves less energy than ES: for \(N_{sc} \in \{4, 8, 12, 16\}\), it achieves 53.97%, 63.04%, 66.82%, and 60.08% of the \(E_{saving}\) of ES, respectively, resulting in an overall \(E_{saving}\) performance of about 62% for these 4 \(N_{sc}\) cases. Moreover, the slope of the GBCSS \(E_{saving}\) curve clearly increases for \(N_{sc} \in \{24, 28, 32\}\), indicating that the GNN saves more energy when a large number of SCs is deployed. This aspect is discussed in detail, with further supporting results, in the one-day performance analysis.

Interestingly, the \(E_{saving}\) of the FA benchmark is clearly larger than that of GBCSS for most of the considered \(N_{sc}\) cases, the exceptions being \(N_{sc} = 8\) and 12, where both solutions achieve similar \(E_{saving}\). Relative to FA, GBCSS achieves between 62.28% and 103.61% of the \(E_{saving}\), with an average of 86.60% over all \(N_{sc}\) cases. This suggests that the FA benchmark outperforms GBCSS in terms of raw energy saving.

However, it is equally important to consider the metric \(\Lambda _{\%}\), which represents the optimization constraint defined in Eq. (12) and indicates how much of the original traffic load without cell switching (i.e. All-on) is preserved by each cell switching solution. By definition, the maximum value of \(\Lambda _{\%}\) is 100%, which means that all of the original traffic load is preserved after cell switching.
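Equation (12) is not reproduced here. The sketch below assumes a simple model consistent with the description: all loads share a common unit, the traffic of a switched-off SC can only be offloaded to the MC, and traffic exceeding a hypothetical MC capacity `mc_capacity` is lost. Under these assumptions, \(\Lambda _{\%}\) is the served fraction of the All-on load.

```python
from typing import Sequence


def relative_traffic_load(mc_load: float, sc_loads: Sequence[float],
                          switched_off: Sequence[bool],
                          mc_capacity: float) -> float:
    """Percentage of the All-on traffic load still served after switching.

    Assumed model: offloaded SC traffic goes to the MC only, and anything
    beyond mc_capacity is dropped.
    """
    original = mc_load + sum(sc_loads)
    if original == 0:
        return 100.0  # nothing to serve, so nothing is lost
    offloaded = sum(load for load, off in zip(sc_loads, switched_off) if off)
    kept_on_sc = sum(load for load, off in zip(sc_loads, switched_off) if not off)
    served_by_mc = min(mc_load + offloaded, mc_capacity)
    return 100.0 * (served_by_mc + kept_on_sc) / original


# Toy example: switching off two SCs pushes the MC past its capacity.
print(relative_traffic_load(40.0, [20.0, 30.0, 10.0], [True, True, False], 80.0))  # prints 90.0
```

With `mc_capacity=100.0` the same decision keeps \(\Lambda _{\%}\) at 100%, illustrating that it is MC overload, rather than switching itself, that reduces the preserved traffic.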

Figure 3b shows this metric, with a red dashed reference line for the All-on method representing the upper bound. ES achieves \(\Lambda _{\%} =100\%\) for \(N_{sc} \in \{4, 8, 12, 16\}\), and it is reasonable to assume that this holds for the other \(N_{sc}\) cases as well. In comparison, the proposed GBCSS achieves an average \(\Lambda _{\%}\) of 99.63% over all 8 \(N_{sc}\) cases, with a maximum of 99.88% and a minimum of 99.31%. This suggests that the GNN learns to preserve user QoS as much as possible while reducing the HetNet unit's energy consumption.

Figure 3 Statistical results from the validation set for different \(N_{sc}\): (a) Total energy saved \(E_{saving}\). (b) Relative traffic load \(\Lambda _{\%}\). (c) Relative energy efficiency \(\eta _{\%}\). ES is not executed for \(N_{sc} > 16\) due to its prohibitive runtime.

In contrast, the \(\Lambda _{\%}\) of FA decreases from 99.77% for \(N_{sc} = 4\) to 78.30% for \(N_{sc} = 32\). This means that, compared to GBCSS, the extra energy saved by the FA benchmark in Fig. 3a comes at the cost of about 21% of the original traffic load, and hence of user QoS, in the worst case. The reason is that applying the offline-trained FA algorithm to online decision making leads to much more frequent cell switching decisions, which overload the MC and thus degrade user QoS, since according to the problem formulation only the MC can take over the traffic load of a SC.

Considering both energy consumption and traffic load, Fig. 3c shows the normalized daily energy efficiency \(\eta _{\%}\) of the considered cell switching solutions with respect to All-on. The ES algorithm clearly achieves the highest \(\eta _{\%}\), with an average of 13.74% over its \(N_{sc}\) cases and a maximum energy efficiency gain of 16.25% over All-on. By contrast, the \(\eta _{\%}\) of FA drops continuously and even falls below that of All-on, because a large proportion of the original traffic load is sacrificed to achieve higher power savings. GBCSS achieves an average and maximum \(\eta _{\%}\) of 8.50% and 10.41%, respectively, relative to All-on. For \(N_{sc} \in \{ 4, 8, 12, 16\}\), the \(\eta _{\%}\) trend of GBCSS is similar to that of ES in Fig. 3c, and overall the GNN attains about 62% of the energy efficiency gain of ES in these cases. Moreover, assuming that ES would retain its average \(\eta _{\%}\) of 13.74% for \(N_{sc} \in \{20, 24, 28, 32\}\), the GNN achieves at most 75.76% of the energy efficiency gain of ES.
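The exact normalization of \(\eta _{\%}\) is not given in this section. As a minimal sketch, assuming energy efficiency is served traffic per unit energy and \(\eta _{\%}\) is the percentage gain of a method's efficiency over that of All-on (so negative values mean less efficient than All-on, as observed for FA at large \(N_{sc}\)), the metric and the 75.76% ratio quoted above can be computed as follows; the toy traffic and energy values are illustrative.

```python
def energy_efficiency(served_traffic: float, energy_wh: float) -> float:
    """Assumed definition: served traffic per unit of consumed energy."""
    return served_traffic / energy_wh


def relative_efficiency_gain(eta_method: float, eta_all_on: float) -> float:
    """Assumed eta_%: percentage gain over All-on (negative means worse)."""
    return 100.0 * (eta_method / eta_all_on - 1.0)


# Toy values (not from the paper): 90% of the traffic served with 75% of the energy.
eta_all_on = energy_efficiency(100.0, 100.0)
eta_method = energy_efficiency(90.0, 75.0)
print(round(relative_efficiency_gain(eta_method, eta_all_on), 2))  # prints 20.0

# Ratio of the GBCSS maximum eta_% (10.41%) to the assumed ES average (13.74%).
print(round(100.0 * 10.41 / 13.74, 2))  # prints 75.76
```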

Nevertheless, the FA benchmark still outperforms the proposed GBCSS for \(N_{sc} = 4\), with an \(\eta _{\%}\) around 2.5% larger, as shown in Fig. 3c. A potential reason is that the GNN cannot approximate the optimal solution any further once the gradient of the loss function in Eq. (18) becomes too small, because learning to always keep the MC switched on already yields a large improvement in \({\mathscr{L}}\) during GBCSS training. The FA benchmark avoids this situation because the action for the MC is predefined to be always ON. However, the relative underperformance of the GNN in this case can be regarded as insignificant, since the overall energy saved is low when only 4 SCs are deployed.

Test set performance results

The one-day power consumption results generated with the test dataset are presented for 3 \(N_{sc}\) cases (i.e. \(N_{sc} \in \{ 4, 12, 32\}\)), which represent scenarios with a small, medium, and large number of deployed SCs among the considered \(N_{sc}\) cases. The results of the node size generalization test for the GNN are also covered in this section.

Figure 4 One-day performance results for the workday sample (Nov. 15th, 2013) in the test set with respect to power consumption for different \(N_{sc}\).

Performance comparison on workday samples

Figure 4 shows the power consumption per time slot of GBCSS and the other benchmarks throughout a workday (from 00:00 to 23:59) for the three \(N_{sc}\) cases. For the same computational complexity reason as in the statistical analysis, the ES algorithm is not executed for \(N_{sc} = 32\).

According to Eqs. (2) and (3), the power consumption is a linear function of the traffic load \(\lambda\) when no BS is put into sleep mode. Therefore, the daily traffic load trend of a HetNet unit can be inferred from the power consumption trend of the All-on method. Figure 4 shows that the HetNet unit's power consumption is relatively low before dawn, when only a small number of users are active, while the traffic load starts to rise around 8 a.m. and peaks before midday, leading to a period of increased power consumption with less potential for power saving. Later, the traffic load starts to decline more significantly in the late afternoon (4 p.m.), opening another period for energy efficiency optimization through cell switching.
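Equations (2) and (3) and the parameters of Table 1 are not reproduced in this section. The sketch below assumes an EARTH-style linear load-dependent BS power model, \(P = P_0 + \Delta_p P_{max} \lambda\) for an active BS and a constant \(P_{sleep}\) for a sleeping one, to show why the All-on power trace follows the traffic trend; the class name and all parameter values are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BSPowerProfile:
    """Assumed linear load-dependent power model for one BS."""
    p0: float        # static power of an active BS (W)
    delta_p: float   # slope of the load-dependent part
    p_max: float     # maximum RF output power (W)
    p_sleep: float   # power drawn in sleep mode (W)

    def power(self, load: float, asleep: bool = False) -> float:
        """Input power (W) for a normalized load in [0, 1]."""
        if asleep:
            return self.p_sleep
        return self.p0 + self.delta_p * self.p_max * load


def all_on_power(profiles: Sequence[BSPowerProfile],
                 loads: Sequence[float]) -> float:
    """Total power with no BS asleep: an affine function of the loads."""
    return sum(p.power(load) for p, load in zip(profiles, loads))


# Hypothetical profiles (not Table 1): one MC and two SCs.
mc = BSPowerProfile(p0=100.0, delta_p=4.0, p_max=20.0, p_sleep=50.0)
sc = BSPowerProfile(p0=5.0, delta_p=8.0, p_max=0.5, p_sleep=3.0)
profiles = [mc, sc, sc]
for scale in (0.25, 0.5, 1.0):
    loads = [0.8 * scale, 0.6 * scale, 0.4 * scale]
    print(scale, all_on_power(profiles, loads))
```

Doubling every load doubles only the load-dependent part of the total, so the All-on curve rises and falls with the traffic while the static terms set its floor.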

As shown in Fig. 4a, all 3 cell switching solutions significantly reduce the power consumption from 0 a.m. to 8 a.m. During this period, the power consumption of GBCSS closely mirrors that of the ES algorithm. During the high-traffic hours, GBCSS tends to follow the All-on strategy, which is suboptimal for this period. From 4 p.m. until midnight, the GNN also learns to reduce the HetNet unit's power consumption, but relative to the optimal results computed via ES, the reduction is less significant than before dawn. The FA benchmark also mirrors the behavior of ES over the day and overall outperforms GBCSS, especially after 4 p.m. Moreover, during the busy hours between 9 a.m. and 4 p.m., the power consumption of the FA benchmark falls below that of ES in some time slots. Because ES produces the optimal cell switching decisions for power saving while maintaining the original traffic load of the HetNet unit, it can be inferred that FA's additional power saving comes at the expense of user QoS.

For the \(N_{sc} = 12\) case in Fig. 4b, the ES algorithm behaves as in the \(N_{sc} = 4\) case, but with a larger gap to the power consumption of All-on, suggesting a larger potential for energy efficiency optimization. GBCSS likewise produces results consistent with those in Fig. 4a, and its performance after 4 p.m. improves compared to the \(N_{sc} = 4\) case. However, the results of the FA benchmark begin to fluctuate more significantly in Fig. 4b, with clearly lower power consumption than ES during the busy hours. Combined with the results in Fig. 3b, this means that the FA benchmark starts to output more decisions that sacrifice user QoS.

As for the \(N_{sc} = 32\) case in Fig. 4c, the fluctuations in the results of the FA benchmark worsen further, as the number of decisions sacrificing user QoS continues to rise. A plausible explanation for this trend is that the FA benchmark uses linear function approximation to represent the value function, which may not be expressive enough for more complex scenarios. In contrast, GBCSS produces much more stable results that are consistent with those for \(N_{sc} = 4 \,{\text {and}}\, 12\). Moreover, GBCSS starts to switch off SCs during the busy hours, so that its power consumption in this period falls below that of All-on for \(N_{sc} = 32\), as shown in Fig. 4c. This is much closer to the strategy produced by ES in Fig. 4a,b. As discussed in the previous section, the main reason may be that the loss function in Eq. (18) cannot be significantly reduced when \(N_{sc}\) is small. Moreover, as the ES results show, cell switching during periods of intensive traffic yields only marginal power consumption improvements for small \(N_{sc}\). In contrast, a larger \(N_{sc}\) offers more potential for a significant loss reduction during the busy hours. This can be regarded as an advantage to exploit, because the envisioned ultra-dense HetNet deployments beyond 5G will involve very large numbers of SCs, where the GNN may have great potential for approximating the optimal cell switching decisions. All the results presented in this section so far are consistent with the findings in Fig. 3.
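The exact features and update rule of the FA benchmark are not described in this section. For orientation only, the sketch below shows generic linear function approximation with semi-gradient Q-learning, where each action value is a dot product \(Q(s, a) = \mathbf{w}_a^{\top} \phi(s)\); the feature map \(\phi\), the step size `alpha`, and the discount `gamma` are assumptions. Because \(Q\) is linear in \(\phi(s)\), it cannot capture interactions between features unless they are hand-crafted into \(\phi\), which is the expressiveness limitation mentioned above.

```python
import numpy as np


class LinearQ:
    """Q(s, a) = w[a] . phi(s), trained with semi-gradient Q-learning."""

    def __init__(self, n_features: int, n_actions: int,
                 alpha: float = 0.1, gamma: float = 0.9):
        self.w = np.zeros((n_actions, n_features))
        self.alpha = alpha  # step size (assumed)
        self.gamma = gamma  # discount factor (assumed)

    def q_values(self, phi: np.ndarray) -> np.ndarray:
        """Estimated value of every action for feature vector phi."""
        return self.w @ phi

    def act(self, phi: np.ndarray) -> int:
        """Greedy action (exploration omitted for brevity)."""
        return int(np.argmax(self.q_values(phi)))

    def update(self, phi: np.ndarray, action: int, reward: float,
               phi_next: np.ndarray) -> float:
        """One TD update; the gradient of w[a] . phi w.r.t. w[a] is phi."""
        target = reward + self.gamma * self.q_values(phi_next).max()
        td_error = target - self.q_values(phi)[action]
        self.w[action] += self.alpha * td_error * phi
        return float(td_error)


agent = LinearQ(n_features=2, n_actions=2)
phi = np.array([1.0, 0.0])
td = agent.update(phi, action=1, reward=1.0, phi_next=np.zeros(2))
print(td, agent.w[1], agent.act(phi))
```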

Additionally, Fig. 4 shows that GBCSS and the FA benchmark sometimes consume more power than the All-on method during the busy hours for \(N_{sc} = 4\) and 12. This is counter-intuitive, since switching off some BSs would be expected to reduce power consumption compared with keeping all SCs on. However, considering Eq. (2) together with the parameters in Table 1, certain cell switching decisions can increase the overall power consumption through offloading to the MC. For example, under the experimental configuration, switching off a half-loaded femto BS reduces its power consumption by 2.1 W, but the MC taking over the offloaded traffic (assuming sufficient resources) increases its power consumption by 47 W, resulting in a net power consumption gain of -44.9 W. A formal mathematical proof using the same power model and BS power profiles can be found in ref. 12.
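Writing \(\Delta P\) for the net reduction in power consumption caused by this single decision (notation introduced here for illustration), the numbers above combine as:

\[
\Delta P = \underbrace{2.1\,\text{W}}_{\text{femto BS switched off}} - \underbrace{47\,\text{W}}_{\text{extra MC power}} = -44.9\,\text{W},
\]

so the decision raises the total power consumption by 44.9 W even though a BS has been put to sleep.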

In summary, the proposed GBCSS closely approximates the optimal cell switching decisions computed by the ES algorithm when the total traffic load of the HetNet unit is low, but tends to generate suboptimal strategies during periods of intensive traffic. Nevertheless, this suboptimal busy-hour strategy improves as \(N_{sc}\) becomes larger (Fig. 4c), where the GNN starts to mirror the behavior of ES in Fig. 4a,b. The one-day performance evaluation on a workday thus produces results that closely correspond to the statistical results from the validation dataset.

Figure 5 One-day performance results for the holiday sample (Jan. 1st, 2014) in the test set with respect to power consumption for different \(N_{sc}\).

Performance comparison on holiday samples

Under the same setup, Fig. 5 shows the power consumption of the different cell switching solutions on New Year's Day (2014/01/01). The trend in the figures reflects New Year's Eve celebrations, which lead to a large number of active users and hence high power consumption during the early hours after midnight. In comparison, the overall power consumption during the daytime is more stable than in the workday sample in Fig. 4.

Furthermore, the cell switching solutions clearly achieve significant power savings during the daytime. This resembles the two power-saving periods in Fig. 4, suggesting that on such a holiday, mobile service requests during the usual busy hours are less intensive than on a workday. Moreover, in Fig. 5a, the power consumption of both GNN and FA is nearly identical to the optimal results of the ES benchmark. In addition, the GNN makes no decisions that raise the power consumption above that of All-on, and FA also performs significantly better in this regard. Combined with the results in Fig. 4, this suggests that the learning-based solutions capture the power saving potential better during low-activity periods than during high-activity periods.

Other results in Fig. 5 are highly comparable to the findings in Fig. 4; for example, the FA benchmark results fluctuate with a magnitude that increases with \(N_{sc}\), while the GNN results are more stable. As these aspects have already been discussed for the workday case, they are not elaborated further here.

Generalization capability on node size

A remarkable feature of GNN models is their node size invariance: as long as data with a similar underlying topology can be expressed using the same graph representation, a GNN model trained on data of node size \(i\) can be directly used to produce results for node size \(j\) (\(i \ne j\)). This feature greatly boosts the generalization capability of GNN models compared with other ML models, leading to a significant cost reduction when deploying GNN models to different scenarios for a given task.
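The node size invariance can be illustrated with a minimal message-passing sketch in NumPy. Because the weight matrices act only on feature dimensions, the same parameters apply to a graph with any number of nodes. The star topology (node 0 as the MC, linked to every SC), the 2-dimensional node features, the single mean-aggregation layer, and the sigmoid switch-off readout are all hypothetical choices, not the GBCSS architecture.

```python
import numpy as np


def star_adjacency(n_sc: int) -> np.ndarray:
    """Hypothetical HetNet graph: node 0 is the MC, nodes 1..n_sc are SCs
    linked only to the MC."""
    adj = np.zeros((n_sc + 1, n_sc + 1))
    adj[0, 1:] = 1.0
    adj[1:, 0] = 1.0
    return adj


def gnn_layer(x, adj, w_self, w_neigh):
    """Mean-aggregation message passing; weights depend on feature sizes only."""
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ x) / np.maximum(deg, 1.0)
    return np.tanh(x @ w_self + neigh_mean @ w_neigh)


def switch_off_probs(x, adj, params):
    """Per-node switch-off probability from a one-layer GNN with sigmoid output."""
    w_self, w_neigh, w_out = params
    h = gnn_layer(x, adj, w_self, w_neigh)
    return 1.0 / (1.0 + np.exp(-(h @ w_out)))


rng = np.random.default_rng(0)
feat_dim, hidden = 2, 8
# Randomly initialized here; a deployed model would use trained weights.
params = (rng.normal(size=(feat_dim, hidden)),
          rng.normal(size=(feat_dim, hidden)),
          rng.normal(size=(hidden, 1)))

# The same parameters serve graphs with 4 and 12 SCs without modification.
for n_sc in (4, 12):
    x = rng.random((n_sc + 1, feat_dim))  # hypothetical features, e.g. load and BS type
    print(n_sc, switch_off_probs(x, star_adjacency(n_sc), params).shape)
```

Only the adjacency matrix and feature matrix change with \(N_{sc}\); the parameter tuple `params` is reused unchanged, which is what allows a model trained at one node size to be evaluated at another.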

Therefore, this section presents a node size generalization test of the proposed GBCSS, using the workday sample in the test dataset. Two GNN models, trained with data of \(N_{sc} = 4\) and 32, respectively, are applied in this test, and both are tested on \(N_{sc} = 12\) to give a clearer comparison. Because RL-based solutions require a predefined feature space and/or action space, which the model cannot naturally extend to a different number of SCs without being redesigned and retrained, the FA benchmark is not applicable in this evaluation.

The one-day power consumption results of this test are shown in Fig. 6. They show that both models, trained with node sizes larger and smaller than that of the test case, can be directly used in the \(N_{sc} = 12\) scenario. For the two lower-traffic periods, 0 a.m. to 8 a.m. and after 4 p.m., both models generate results comparable to those of the same-node-size scenario in Fig. 4b. Furthermore, each model retains some details of what it learned in its original node size scenario. For example, the GNN model trained with \(N_{sc} = 4\) produces some sub-optimal decisions that lead to higher power consumption around 9 a.m., similar to Fig. 4a, while the GNN model trained with \(N_{sc} = 32\) tends to produce high power consumption around 0 a.m., corresponding to its behavior in Fig. 4c. Unfortunately, for \(N_{sc} = 12\), the model trained with \(N_{sc} = 32\) does not keep its strategy of switching off some SCs for power saving during the busy hours as in Fig. 4c, but instead mirrors All-on, similar to Fig. 4b. The reason may again lie in the characteristics of the learned loss function: as discussed for the workday case, a smaller \(N_{sc}\) leads to insignificant loss improvement from cell switching during busy hours.

The node size generalization test results suggest that models trained with one node size can be directly applied to a similar scenario with another node size. Although the performance may not be optimal, this feature can greatly reduce the cost of model transfer, as the whole GNN model can be used directly without any preparatory steps. After the transfer, the model can be updated with data collected in the new scenario to learn its underlying patterns and improve performance.