Is there any load balancing or link aggregation possible for local Internet breakout with multiple Internet WAN links?
When an SD-WAN Edge is connected to two Internet WAN links, a common question is: “Can the SD-WAN Edge perform any sort of load balancing or aggregation between the two Internet WAN links when doing local Internet breakout?” Before answering this question, some technical background is worth mentioning.
First, when the SD-WAN Edge is responsible for local Internet breakout and its interface is assigned a public IP address, the SD-WAN Edge typically performs source NAT (more precisely, PAT). NAT is a configurable option called “NAT Direct Traffic” under the interface settings. The following screen capture is from GE3 of Edge-1, where “NAT Direct Traffic” is enabled:

Edge-1 has “NAT Direct Traffic” enabled for both “GE3 – 98.1.2.19” and “GE4 – 184.1.2.27”. As a result, when a client machine accesses the Internet, its source IP address is translated (PAT) to either 98.1.2.19 or 184.1.2.27, depending on whether GE3 or GE4 is selected. Although the SD-WAN Edge is a “Per-Packet” device, “Per-Packet” load balancing across multiple links does not apply to local Internet breakout when NAT is enabled. “Per-Packet” load balancing happens when both ends are SD-WAN devices, so that traffic sent and received over the WAN links is encapsulated in the SD-WAN overlay tunnel. With local Internet breakout and NAT, there is no overlay tunnel between the SD-WAN Edge and the destination.
For example, suppose a client machine accesses the web site https://www.vmware.com and GE4 is selected for the corresponding traffic flow. The web server www.vmware.com then sees the TCP connection(s) coming from the GE4 IP address 184.1.2.27. The SD-WAN Edge must keep using 184.1.2.27 for the existing flows to www.vmware.com. It cannot switch the NATted IP to GE3 (98.1.2.19) in the middle of a flow that is bound to 184.1.2.27, because that would break the flow and confuse the web server. As a result, a single flow cannot be load balanced across multiple links for local Internet breakout when NAT is enabled.
It follows that the only load balancing possible for local Internet breakout with NAT enabled is per-flow load balancing. The following sections and tests examine whether flow-based load balancing occurs and, if so, under what conditions.
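The flow-pinning behaviour described above can be illustrated with a toy model. This is only a sketch of the idea, not how the Edge is implemented: the `FlowNatTable` class, the flow key layout, and the client port number are all hypothetical, while the interface names and public IPs come from the Edge-1 example.

```python
from typing import Dict, Tuple

# WAN interfaces and public IPs taken from the Edge-1 example in the text.
WAN_LINKS = {"GE3": "98.1.2.19", "GE4": "184.1.2.27"}

# Hypothetical flow key: (client IP, client port, server IP, server port).
FlowKey = Tuple[str, int, str, int]


class FlowNatTable:
    """Toy per-flow NAT table: a flow keeps the link chosen for its first packet."""

    def __init__(self) -> None:
        self._bindings: Dict[FlowKey, str] = {}

    def translate(self, flow: FlowKey, preferred_link: str) -> Tuple[str, str]:
        # setdefault stores preferred_link only when the flow is new;
        # an existing flow ignores it and stays on its original link.
        link = self._bindings.setdefault(flow, preferred_link)
        return link, WAN_LINKS[link]


table = FlowNatTable()
flow = ("192.168.200.100", 51000, "43.254.254.14", 443)  # port 51000 is made up

first = table.translate(flow, preferred_link="GE4")
# Even if GE3 would be preferred later, the existing flow stays on GE4.
later = table.translate(flow, preferred_link="GE3")
print(first, later)
```

In this model only a new flow (a new key) can land on a different link, which is why per-flow balancing is the only option left once NAT is involved.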
Edge Configuration for Test 3.1 to Test 3.x
The business policy remains the same as in the previous test: link steering is configured as Auto. The following diagram recaps the business policy configuration:

During Test 3.1 to Test 3.3, no extra latency, jitter, or packet loss is introduced on GE3 (98.1.2.19) or GE4 (184.1.2.27). The configured bandwidth of GE3 (98.1.2.19) is 5Mbps/5Mbps, and the configured bandwidth of GE4 (184.1.2.27) is 10Mbps/10Mbps. The following figure summarizes the status of these two Internet WAN links:

The following figure is the Edge-1 Overview reflecting the Internet WAN links status:

Test 3.1 – GE3 (98.1.2.19) and GE4 (184.1.2.27) are both GREEN, when higher bandwidth link upstream is utilized
Test 3.1 focuses on WAN link upstream utilization. The primary objective is to check whether Edge-1 performs some sort of load balancing when the upstream direction of the higher bandwidth link, GE4 (184.1.2.27), is fully utilized, for example by letting new flows use the other GREEN link with lower bandwidth, GE3 (98.1.2.19).
In Test 3.1, wordpress05 (43.254.254.14) and wordpress06 (24.12.0.14) both run iperf3 (https://github.com/esnet/iperf) in server mode on port 5201. iperf3 is used to drive a particular WAN link to full utilization.
To start, Client100 (192.168.200.100) acts as an iperf3 client and sends upload traffic destined to wordpress05 (43.254.254.14) for 600s with the command “/usr/local/bin/iperf3 -c 43.254.254.14 -t 600”.
Client100 starts iperf3 to generate upstream traffic:

On wordpress05 (43.254.254.14), the iperf3 server starts to receive the traffic. The screen capture below shows that the connection is from 184.1.2.27:20002, which means Edge-1 selected GE4 (184.1.2.27) for this iperf3 session:

Selecting GE4 (184.1.2.27) for this flow is expected. From section 2, we know that when both WAN links are GREEN, the one with the higher bandwidth is selected.
The transport live monitoring of Edge-1 shows that the upstream side of the GE4 (184.1.2.27) WAN link is fully utilized:

While the iperf3 to wordpress05 (43.254.254.14) is still running, start another iperf3 to generate upstream traffic to wordpress06 (24.12.0.14) from Client100:

On the iperf3 server side, the screen capture below shows that the flow is coming from 184.1.2.27:20002, which means Edge-1 selected GE4 (184.1.2.27) as the WAN link to reach the wordpress06 (24.12.0.14) iperf3 server:

The transport live monitoring of Edge-1 shows that only GE4 (184.1.2.27) is being used and GE3 (98.1.2.19) is idle:

To further confirm, while iperf3 is still running, open a browser on Client100 (192.168.200.100) and access the web service at wordpress05 (43.254.254.14). The following is the access log at wordpress05 (43.254.254.14):

The web access log shows that the traffic is from 184.1.2.27, which means Edge-1 selected GE4 (184.1.2.27) as the WAN link for the web access flow from Client100 (192.168.200.100) while the GE4 (184.1.2.27) upstream bandwidth was fully utilized.
The result of Test 3.1 shows that full upstream utilization does not cause new flows to use the other Internet WAN link. In other words, when both Internet WAN links are GREEN and the upstream of the larger bandwidth link is fully utilized, the SD-WAN Edge continues to select the larger bandwidth link for new local Internet breakout flows.
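Each test identifies the selected WAN link from the source IP seen at the server, whether in the iperf3 server output or in the web access log. A minimal sketch of that lookup follows. The function names are hypothetical, the sample log line is made up for illustration, and the parser assumes the client IP is the first whitespace-separated field, as in the common web server log layout.

```python
from typing import Optional

# NAT (PAT) addresses of the two Edge-1 WAN interfaces, from the text.
NAT_IP_TO_LINK = {"98.1.2.19": "GE3", "184.1.2.27": "GE4"}


def link_for_source(source_ip: str) -> Optional[str]:
    """Return the WAN link whose NAT address matches the observed source IP."""
    return NAT_IP_TO_LINK.get(source_ip)


def link_for_access_log_line(line: str) -> Optional[str]:
    """Assumes the client IP is the first field of the access log line."""
    fields = line.split()
    return link_for_source(fields[0]) if fields else None


# Made-up sample line, only to show the expected shape of the input.
sample = '184.1.2.27 - - [01/Jan/2024:00:00:00 +0000] "GET / HTTP/1.1" 200 512'
print(link_for_access_log_line(sample))
```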
Test 3.2 – GE3 (98.1.2.19) and GE4 (184.1.2.27) are both GREEN, when higher bandwidth link downstream is utilized
Test 3.2 is similar to Test 3.1, but it focuses on WAN link downstream utilization instead of upstream. As before, wordpress05 (43.254.254.14) and wordpress06 (24.12.0.14) both run iperf3 (https://github.com/esnet/iperf) in server mode on port 5201, and iperf3 is used to drive the WAN link to full utilization.
To start, Client100 (192.168.200.100) acts as an iperf3 client and requests download traffic from wordpress05 (43.254.254.14) for 600s with the command “/usr/local/bin/iperf3 -c 43.254.254.14 -R -t 600”. The “-R” parameter means reverse mode: the server sends and the client receives, so the traffic flows downstream towards the client.
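The iperf3 client commands in these tests differ only in the server address, the duration, and whether reverse mode is used. A small helper can build them consistently. This is a sketch: the function name `iperf3_client_args` is hypothetical, and the optional `bitrate` argument maps to iperf3's standard “-b” target bitrate option, which can be used when a link should be only partly loaded.

```python
import shlex
from typing import List, Optional

IPERF3 = "/usr/local/bin/iperf3"  # binary path used throughout the tests


def iperf3_client_args(
    server: str,
    seconds: int,
    reverse: bool = False,
    bitrate: Optional[str] = None,
) -> List[str]:
    """Build an iperf3 client command line as an argument list."""
    args = [IPERF3, "-c", server]
    if reverse:
        args.append("-R")          # server sends, client receives (downstream)
    if bitrate is not None:
        args += ["-b", bitrate]    # target bitrate, e.g. "3M"
    args += ["-t", str(seconds)]   # test duration in seconds
    return args


# Upstream test (Test 3.1 style) and downstream test (Test 3.2 style).
print(shlex.join(iperf3_client_args("43.254.254.14", 600)))
print(shlex.join(iperf3_client_args("43.254.254.14", 600, reverse=True)))
```

The returned list can be passed to `subprocess.run(args, check=True)` to actually start the client.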

At the iperf3 server side, the screen capture shows the connection is from 184.1.2.27:20002:

Edge-1 selected GE4 (184.1.2.27) for this iperf3 flow from Client100 (192.168.200.100) to wordpress05 (43.254.254.14). Selecting GE4 (184.1.2.27) is expected because both WAN links are GREEN and GE4 (184.1.2.27) has the larger bandwidth (10Mbps > 5Mbps).
The transport live monitoring of Edge-1 shows that the iperf3 traffic is downstream and uses GE4 (184.1.2.27):

The next step checks whether downstream utilization affects which WAN link Edge-1 selects for a new local Internet breakout flow. While the iperf3 session between Client100 (192.168.200.100) and wordpress05 (43.254.254.14) is still running and generating downstream traffic on GE4 (184.1.2.27), Client100 (192.168.200.100) starts a new iperf3 session to wordpress06 (24.12.0.14) with “/usr/local/bin/iperf3 -c 24.12.0.14 -R -t 180”:

On the iperf3 server, wordpress06 (24.12.0.14), the screen capture below shows the connection is from 98.1.2.19:20002:

This result differs from the upstream test in Test 3.1: Edge-1 picked GE3 (98.1.2.19) for the new iperf3 flow while the GE4 (184.1.2.27) downstream bandwidth was fully utilized.
Let’s check the transport monitoring page to confirm:

On the transport live monitoring page, both GE4 (184.1.2.27) and GE3 (98.1.2.19) are now fully utilized. This result demonstrates that when the downstream bandwidth of the larger bandwidth link is fully utilized, the SD-WAN Edge selects the remaining GREEN WAN link for new local Internet breakout flows.
To further confirm, an additional test is conducted. In this test, iperf3 between Client100 (192.168.200.100) and wordpress05 (43.254.254.14) is running, so the GE4 (184.1.2.27) downstream is fully utilized, but iperf3 between Client100 (192.168.200.100) and wordpress06 (24.12.0.14) is not running, so there is no traffic on GE3 (98.1.2.19). In this situation, Client100 uses a web browser to access the wordpress05 web service. The following is the web access log of wordpress05:

The wordpress05 web access log shows that Edge-1 selected GE3 (98.1.2.19) for the new flow created when Client100 (192.168.200.100) accessed the wordpress05 web service. The conclusion of Test 3.2 is that when both Internet WAN links are GREEN and the WAN link with the larger bandwidth is fully utilized in the downstream direction, the SD-WAN Edge selects the remaining GREEN WAN link for newly created local Internet breakout flows.
Test 3.3 – GE3 (98.1.2.19) and GE4 (184.1.2.27) are both GREEN, the higher bandwidth link downstream is partly utilized
Test 3.1 and Test 3.2 show that, for local Internet breakout, the SD-WAN Edge selects the other GREEN WAN link (the one that does not have the largest bandwidth) for flow-based load balancing only when the larger bandwidth link carries downstream traffic that consumes its downstream bandwidth. Test 3.3 continues with downstream utilization, but deliberately does not consume the downstream bandwidth completely.
The objective of Test 3.3 is to confirm whether the SD-WAN Edge picks the WAN link with the largest “remaining downstream bandwidth”. The following examples explain the meaning of “remaining downstream bandwidth” and the expected WAN link selection.
Test 3.3.1 – GE3 (98.1.2.19) and GE4 (184.1.2.27) are both GREEN, GE4 (184.1.2.27) configured with 10Mbps and consumed 3.2Mbps, GE3 (98.1.2.19) configured with 5Mbps with 0Mbps consumed
In Test 3.3.1, all bandwidth figures are downstream bandwidth unless specified as upstream.
The formula for calculating the “remaining downstream bandwidth” is as follows:
remaining downstream bandwidth = configured/discovered downstream bandwidth – consumed downstream bandwidth
In Test 3.3.1:
GE4 (184.1.2.27) configured downstream bandwidth = 10Mbps
GE4 (184.1.2.27) consumed downstream bandwidth = 3.2Mbps
GE4 (184.1.2.27) remaining downstream bandwidth = GE4 (184.1.2.27) configured downstream bandwidth – GE4 (184.1.2.27) consumed downstream bandwidth
GE4 (184.1.2.27) remaining downstream bandwidth = 10Mbps – 3.2Mbps = 6.8Mbps
GE3 (98.1.2.19) configured downstream bandwidth = 5Mbps
GE3 (98.1.2.19) consumed downstream bandwidth = 0Mbps
GE3 (98.1.2.19) remaining downstream bandwidth = GE3 (98.1.2.19) configured downstream bandwidth – GE3 (98.1.2.19) consumed downstream bandwidth
GE3 (98.1.2.19) remaining downstream bandwidth = 5Mbps – 0Mbps = 5Mbps
Since 6.8Mbps > 5Mbps, that is, GE4 (184.1.2.27) remaining downstream bandwidth > GE3 (98.1.2.19) remaining downstream bandwidth, the expectation is that Edge-1 will select GE4 (184.1.2.27) for a new local Internet breakout flow in this situation.
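The arithmetic above can be written as a short calculation. This is only a worked example of the formula with the Test 3.3.1 figures. The function name is hypothetical, and how the Edge behaves when both links have equal remaining bandwidth is not covered by these tests.

```python
def remaining_downstream_mbps(configured_mbps: float, consumed_mbps: float) -> float:
    """remaining = configured/discovered downstream - consumed downstream."""
    return configured_mbps - consumed_mbps


# Test 3.3.1 figures from the text (all downstream, in Mbps).
ge4 = remaining_downstream_mbps(configured_mbps=10.0, consumed_mbps=3.2)
ge3 = remaining_downstream_mbps(configured_mbps=5.0, consumed_mbps=0.0)

# The link with the larger remaining downstream bandwidth is the expected pick.
# Equal values are not covered by the tests; this sketch falls back to GE3 then.
expected = "GE4" if ge4 > ge3 else "GE3"
print(f"GE4 remaining: {ge4:.1f} Mbps, GE3 remaining: {ge3:.1f} Mbps, expected link: {expected}")
```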
To check whether this expectation is correct, Client100 (192.168.200.100) starts an iperf3 flow to wordpress05 (43.254.254.14) with the command “/usr/local/bin/iperf3 -c 43.254.254.14 -R -b 3M -t 600”. The “-b 3M” option makes iperf3 limit the download speed to around 3Mbps.

The iperf3 server screen capture shows the connection is from 184.1.2.27:20002:

This means Edge-1 selected GE4 (184.1.2.27) as the WAN link for this iperf3 flow. This is expected because there is no traffic on either link when the flow starts. The transport live monitoring confirms that iperf3 is consuming about 3.26Mbps of downstream bandwidth on GE4 (184.1.2.27):

While iperf3 is running and consuming 3.26Mbps of downstream bandwidth on GE4 (184.1.2.27), open a browser on Client100 (192.168.200.100) and access the web service at wordpress05 (43.254.254.14). Here is the web service access log:

The wordpress05 (43.254.254.14) access log shows that Edge-1 selected GE4 (184.1.2.27) for the newly created web access flow towards wordpress05 (43.254.254.14). This result matches the expectation because the GE4 (184.1.2.27) remaining downstream bandwidth is 10Mbps – 3.26Mbps = 6.74Mbps, which is larger than the 5Mbps of GE3 (98.1.2.19).
Test 3.3.2 – GE3 (98.1.2.19) and GE4 (184.1.2.27) are both GREEN, GE4 (184.1.2.27) configured with 10Mbps and consumed 6.4Mbps, GE3 (98.1.2.19) configured with 5Mbps with 0Mbps consumed
The difference between Test 3.3.2 and Test 3.3.1 is that more downstream bandwidth is consumed on GE4 (184.1.2.27). In Test 3.3.2:
GE4 (184.1.2.27) configured downstream bandwidth = 10Mbps
GE4 (184.1.2.27) consumed downstream bandwidth = 6.4Mbps
GE4 (184.1.2.27) remaining downstream bandwidth = GE4 (184.1.2.27) configured downstream bandwidth – GE4 (184.1.2.27) consumed downstream bandwidth
GE4 (184.1.2.27) remaining downstream bandwidth = 10Mbps – 6.4Mbps = 3.6Mbps
GE3 (98.1.2.19) configured downstream bandwidth = 5Mbps
GE3 (98.1.2.19) consumed downstream bandwidth = 0Mbps
GE3 (98.1.2.19) remaining downstream bandwidth = GE3 (98.1.2.19) configured downstream bandwidth – GE3 (98.1.2.19) consumed downstream bandwidth
GE3 (98.1.2.19) remaining downstream bandwidth = 5Mbps – 0Mbps = 5Mbps
Since 5Mbps > 3.6Mbps, that is, GE3 (98.1.2.19) remaining downstream bandwidth > GE4 (184.1.2.27) remaining downstream bandwidth, the expectation is that Edge-1 will select GE3 (98.1.2.19) for a new local Internet breakout flow in this situation.
To check whether this expectation is correct, Client100 (192.168.200.100) starts an iperf3 flow to wordpress05 (43.254.254.14) with the command “/usr/local/bin/iperf3 -c 43.254.254.14 -R -b 6M -t 600”. The “-b 6M” option makes iperf3 limit the download speed to around 6Mbps:

The following is the iperf3 screen capture:

The iperf3 server screen capture shows that the connection is from 184.1.2.27:20002, that is, GE4 (184.1.2.27). This is expected because when the iperf3 flow starts, there is no traffic on either GE3 (98.1.2.19) or GE4 (184.1.2.27), so GE4 (184.1.2.27) is selected because it has the larger remaining downstream bandwidth. The transport live monitoring below shows that, with iperf3 running, about 6.4Mbps of GE4 (184.1.2.27) bandwidth is consumed:

With iperf3 running, GE3 (98.1.2.19) now has the larger remaining downstream bandwidth because 5Mbps > 3.6Mbps (10Mbps – 6.4Mbps = 3.6Mbps). While iperf3 is running, open a web browser on Client100 (192.168.200.100) and access the web service of wordpress05 (43.254.254.14). The following is the web service access log from wordpress05:

The wordpress05 web access log shows that the request comes from 98.1.2.19, which means Edge-1 selected GE3 (98.1.2.19) as the Internet WAN link for the new local Internet breakout flow. This confirms the expectation: when the web access flow was created, iperf3 was still running and consuming about 6.4Mbps of GE4 (184.1.2.27) bandwidth, which left GE3 (98.1.2.19) with the larger remaining downstream bandwidth, so Edge-1 selected GE3 (98.1.2.19) for the new flow.
Conclusion of Test 3.1 to Test 3.3
The conclusion is:
For a new local Internet breakout flow with link steering set to Auto, when the SD-WAN Edge has multiple GREEN WAN links, the WAN link with the highest remaining downstream bandwidth is selected. The “remaining downstream bandwidth” is calculated as:
Remaining downstream bandwidth = configured/discovered downstream bandwidth – consumed downstream bandwidth
Configured/discovered upstream bandwidth and consumed upstream bandwidth are not considered when selecting the WAN link for a new local Internet breakout flow.
In addition, the above conclusion applies to traffic in any service class (Real Time, Transactional, Bulk).
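The observed behaviour across Test 3.1 to Test 3.3 can be replayed against a small model of this conclusion. This is a sketch of the rule as inferred from the tests, not the Edge's actual algorithm. The `WanLink` class and `pick_link` function are hypothetical, and the utilization figures are approximations of the test conditions (for example, “fully utilized” is modelled as consumed equal to configured).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WanLink:
    name: str
    down_capacity_mbps: float
    down_used_mbps: float = 0.0
    up_capacity_mbps: float = 0.0
    up_used_mbps: float = 0.0  # recorded, but deliberately ignored by pick_link

    @property
    def remaining_down_mbps(self) -> float:
        return self.down_capacity_mbps - self.down_used_mbps


def pick_link(links: List[WanLink]) -> str:
    """Pick the GREEN link with the most remaining downstream bandwidth."""
    return max(links, key=lambda link: link.remaining_down_mbps).name


def edge1(ge4_down_used: float = 0.0, ge4_up_used: float = 0.0) -> List[WanLink]:
    # GE3 is 5/5 Mbps and idle, GE4 is 10/10 Mbps, as configured in the tests.
    return [
        WanLink("GE3", 5.0, 0.0, 5.0, 0.0),
        WanLink("GE4", 10.0, ge4_down_used, 10.0, ge4_up_used),
    ]


scenarios: Dict[str, List[WanLink]] = {
    "Test 3.1 (GE4 upstream saturated)": edge1(ge4_up_used=10.0),
    "Test 3.2 (GE4 downstream saturated)": edge1(ge4_down_used=10.0),
    "Test 3.3.1 (GE4 downstream 3.26 Mbps)": edge1(ge4_down_used=3.26),
    "Test 3.3.2 (GE4 downstream 6.4 Mbps)": edge1(ge4_down_used=6.4),
}

results = {name: pick_link(links) for name, links in scenarios.items()}
for name, link in results.items():
    print(f"{name}: new flow uses {link}")
```

With these inputs the model picks GE4, GE3, GE4, and GE3 respectively, which matches the links observed in the four tests.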