Securing BGP on the host with origin validation

Vincent Bernat

An increasingly popular design for a datacenter network is BGP on the host: each host ships with a BGP daemon to advertise the IPs it handles and receives the routes to its fellow servers. Compared to an L2-based design, it is very scalable, resilient, cross-vendor and safe to operate.1 Take a look at “L3 routing to the hypervisor with BGP” for a usage example.

Figure: spine-leaf fabric with two spine routers, six leaf routers and nine physical hosts. All links have a BGP session established over them. Some of the servers have a speech balloon stating the IP prefix they want to handle.
BGP on the host with a spine-leaf IP fabric. A BGP session is established over each link and each host advertises its own IP prefixes.

While routing on the host eliminates the security problems related to Ethernet networks, a server may announce any IP prefix. In the above picture, two of them are announcing 2001:db8:cc::/64. This could be a legit use of anycast or a prefix hijack. BGP offers several solutions to improve this aspect and one of them is to reuse the features around the RPKI.

Short introduction to the RPKI

On the Internet, BGP mostly relies on trust. This contributes to various incidents due to operator errors, like the one that affected Cloudflare a few months ago, or to malicious attackers, like the hijack of Amazon DNS to steal cryptocurrency wallets. RFC 7454 explains the best practices to avoid such issues.

IP addresses are allocated by five Regional Internet Registries (RIR). Each of them maintains a database of the assigned Internet resources, notably the IP addresses and the associated AS numbers. These databases may not be entirely reliable but are widely used to build ACLs to ensure peers only announce the prefixes they are expected to. Here is an example of ACLs generated by bgpq3 when peering directly with Apple:2

$ bgpq3 -l v6-IMPORT-APPLE -6 -R 48 -m 48 -A -J -E AS-APPLE
policy-options {
 policy-statement v6-IMPORT-APPLE {
replace:
  from {
    route-filter 2403:300::/32 upto /48;
    route-filter 2620:0:1b00::/47 prefix-length-range /48-/48;
    route-filter 2620:0:1b02::/48 exact;
    route-filter 2620:0:1b04::/47 prefix-length-range /48-/48;
    route-filter 2620:149::/32 upto /48;
    route-filter 2a01:b740::/32 upto /48;
    route-filter 2a01:b747::/32 upto /48;
  }
 }
}

The RPKI (RFC 6480) adds public-key cryptography on top of it to sign the authorization for an AS to be the origin of an IP prefix. Such a record is a Route Origin Authorization (ROA). You can browse the databases of these ROAs through RIPE’s RPKI Validator instance:

Figure: screenshot from an instance of RPKI Validator showing the validity of 85.190.88.0/21 for AS 64476.
RPKI validator shows one ROA for 85.190.88.0/21

BGP daemons do not have to download the databases or to check digital signatures to validate the received prefixes. Instead, they offload these tasks to a local RPKI validator implementing the “RPKI-to-Router Protocol” (RTR, RFC 6810).

For more details, have a look at “RPKI and BGP: our path to securing Internet Routing.”
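
To make the outcome of origin validation more concrete, here is a minimal Python sketch of the procedure described in RFC 6811, using only the standard library. The `Roa` class and the `validate_origin` function are names made up for this illustration, and the sample ROA uses a documentation prefix with a private AS number. A real router receives the validated ROA payloads from its RTR cache instead of a hard-coded list.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    prefix: ipaddress.IPv6Network  # prefix covered by the ROA
    max_length: int                # longest prefix length allowed
    asn: int                       # AS authorized to originate the prefix


def validate_origin(roas, route, origin_asn):
    """Return "valid", "invalid" or "unknown" (NotFound in RFC 6811)."""
    net = ipaddress.ip_network(route)
    # A ROA covers a route when the route is equal to the ROA prefix
    # or more specific than it.
    covering = [
        roa for roa in roas
        if roa.prefix.version == net.version and net.subnet_of(roa.prefix)
    ]
    if not covering:
        return "unknown"
    for roa in covering:
        if roa.asn == origin_asn and net.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA, but none matches both origin and length.
    return "invalid"


roas = [Roa(ipaddress.ip_network("2001:db8:aa::/64"), 128, 65005)]
print(validate_origin(roas, "2001:db8:aa::1/128", 65005))  # valid
print(validate_origin(roas, "2001:db8:aa::/64", 65006))    # invalid
print(validate_origin(roas, "2001:db8:99::/64", 65005))    # unknown
```

A route is invalid only when at least one ROA covers it and none of them matches. A route without any covering ROA stays unknown, and it is up to the routing policy to decide what to do with it.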

Using origin validation in the datacenter

While it is possible to create our own RPKI for use inside the datacenter, we can take a shortcut and use a validator that implements RTR, like GoRTR, and accepts another source of truth. Let’s work on the following topology:

Figure: spine-leaf fabric with two spine routers, six leaf routers and nine physical hosts. All links have a BGP session established over them. Three of the physical hosts are validators and RTR sessions are established between them and the top-of-the-rack routers, except their own top-of-the-rack routers.
BGP on the host with prefix validation using RTR. Each server has its own AS number. The leaf routers establish RTR sessions to the validators.

Let’s assume we have a place to maintain a mapping between the private AS numbers used by each host and the allowed prefixes:3

ASN       Allowed prefixes
AS 65005  2001:db8:aa::/64
AS 65006  2001:db8:bb::/64, 2001:db8:11::/64
AS 65007  2001:db8:cc::/64
AS 65008  2001:db8:dd::/64
AS 65009  2001:db8:ee::/64, 2001:db8:11::/64
AS 65010  2001:db8:ff::/64

From this table, we build a JSON file for GoRTR, assuming each host can announce the provided prefixes or longer ones (like 2001:db8:aa::42:d9ff:fefc:287a/128 for AS 65005):

{
  "roas": [
    {
      "prefix": "2001:db8:aa::/64",
      "maxLength": 128,
      "asn": "AS65005"
    }, {
      "…": "…"
    }, {
      "prefix": "2001:db8:ff::/64",
      "maxLength": 128,
      "asn": "AS65010"
    }, {
      "prefix": "2001:db8:11::/64",
      "maxLength": 128,
      "asn": "AS65006"
    }, {
      "prefix": "2001:db8:11::/64",
      "maxLength": 128,
      "asn": "AS65009"
    }
  ]
}
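
Such a file is easy to generate from the mapping. The following Python sketch shows one possible way to do it. The `ALLOWED` dictionary and the `build_roas` function are illustrative names, and the mapping is hard-coded here while a real deployment would likely read it from an inventory.

```python
import json

# Mapping between the AS number of each host and its allowed prefixes,
# copied from the table above.
ALLOWED = {
    65005: ["2001:db8:aa::/64"],
    65006: ["2001:db8:bb::/64", "2001:db8:11::/64"],
    65007: ["2001:db8:cc::/64"],
    65008: ["2001:db8:dd::/64"],
    65009: ["2001:db8:ee::/64", "2001:db8:11::/64"],
    65010: ["2001:db8:ff::/64"],
}


def build_roas(allowed, max_length=128):
    """Build a document using the same layout as the JSON file above."""
    roas = [
        {"prefix": prefix, "maxLength": max_length, "asn": f"AS{asn}"}
        for asn, prefixes in sorted(allowed.items())
        for prefix in prefixes
    ]
    return {"roas": roas}


if __name__ == "__main__":
    print(json.dumps(build_roas(ALLOWED), indent=2))
```

With this mapping, the result contains eight entries for seven distinct prefixes, as 2001:db8:11::/64 is shared by two AS numbers.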

This file is deployed to all validators and served by a web server. GoRTR is configured to fetch it and update it every 10 minutes:

$ gortr -refresh=600 \
>       -verify=false -checktime=false \
>       -cache=http://127.0.0.1/rpki.json
INFO[0000] New update (7 uniques, 8 total prefixes). 0 bytes. Updating sha256 hash  -> 68a1d3b52db8d654bd8263788319f08e3f5384ae54064a7034e9dbaee236ce96
INFO[0000] Updated added, new serial 1

The refresh time could be lowered, but GoRTR can also be notified of an update with the SIGHUP signal. Clients are then immediately notified of the change.
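
A deployment script could combine both steps: publish the new file, then signal GoRTR. The Python sketch below is only an illustration under a few assumptions: `publish` and `notify` are hypothetical helper names, the path and the PID in the final comment are placeholders, and the way to find the PID of GoRTR (service manager, PID file) is left out.

```python
import json
import os
import signal
import tempfile


def publish(path, document):
    """Write the JSON document to a temporary file, then rename it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(document, tmp, indent=2)
    # mkstemp creates the file with mode 0600: make it readable by the
    # web server serving it.
    os.chmod(tmp_path, 0o644)
    # The rename is atomic on POSIX: a reader gets either the old file
    # or the new one, never a partially written one.
    os.replace(tmp_path, path)


def notify(pid):
    """Send SIGHUP to the process with the given PID (Unix only)."""
    os.kill(pid, signal.SIGHUP)


# Hypothetical usage, with a placeholder path and PID:
#   publish("/var/www/html/rpki.json", {"roas": []})
#   notify(12345)
```

Renaming the file into place avoids serving a truncated JSON document if GoRTR fetches it while the update is in progress.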

The next step is to configure the leaf routers to validate the received prefixes using the farm of validators. Most vendors support RTR:

Platform       Over TCP?  Over SSH?
Juniper Junos  ✔️
Cisco IOS XR   ✔️         ✔️
Cisco IOS XE   ✔️
Cisco IOS      ✔️
Arista EOS     ✔️
BIRD           ✔️         ✔️
FRR            ✔️         ✔️
GoBGP          ✔️

Configuring Junos

Junos only supports plain-text TCP. First, let’s configure the connections to the validation servers:

routing-options {
    validation {
        group RPKI {
            session validator1 {
                hold-time 60;         # session is considered down after 1 minute
                record-lifetime 3600; # cache is kept for 1 hour
                refresh-time 30;      # cache is refreshed every 30 seconds
                port 8282;
            }
            session validator2 { /* OMITTED */ }
            session validator3 { /* OMITTED */ }
        }
    }
}

By default, at most two sessions are randomly established at the same time. This provides a good way to load-balance them among the validators while maintaining good availability. The second step is to define the policy for route validation:

policy-options {
    policy-statement ACCEPT-VALID {
        term valid {
            from {
                protocol bgp;
                validation-database valid;
            }
            then {
                validation-state valid;
                accept;
            }
        }
        term invalid {
            from {
                protocol bgp;
                validation-database invalid;
            }
            then {
                validation-state invalid;
                reject;
            }
        }
    }
    policy-statement REJECT-ALL {
        then reject;
    }
}

The policy statement ACCEPT-VALID turns the validation state of a prefix from unknown to valid if the ROA database says it is valid. It also accepts the route. If the prefix is invalid, the prefix is marked as such and rejected. We have also prepared a REJECT-ALL statement to reject everything else, notably unknown prefixes.

A ROA only certifies the origin of a prefix. A malicious actor can therefore prepend the expected AS number to the AS path to circumvent the validation. For example, AS 65007 could announce 2001:db8:bb::/64, a prefix allocated to AS 65006, by advertising it with the AS path 65007 65006. To avoid that, we define an additional policy statement to reject AS paths with more than one ASN:4

policy-options {
    as-path EXACTLY-ONE-ASN "^.$";
    policy-statement ONLY-DIRECTLY-CONNECTED {
        term exactly-one-asn {
            from {
                protocol bgp;
                as-path EXACTLY-ONE-ASN;
            }
            then next policy;
        }
        then reject;
    }
}

The last step is to configure the BGP sessions:

protocols {
    bgp {
        group HOSTS {
            local-as 65100;
            type external;
            # export [ … ];
            import [ ONLY-DIRECTLY-CONNECTED ACCEPT-VALID REJECT-ALL ];
            enforce-first-as;
            neighbor 2001:db8:42::a10 {
                peer-as 65005;
            }
            neighbor 2001:db8:42::a12 {
                peer-as 65006;
            }
            neighbor 2001:db8:42::a14 {
                peer-as 65007;
            }
        }
    }
}

The import policy rejects any AS path longer than one AS, accepts any validated prefix and rejects everything else. The enforce-first-as directive is also pretty important: it ensures the first (and, here, only) AS in the AS path matches the peer AS. Without it, a malicious neighbor could inject a prefix using an AS different from its own, defeating our purpose.5
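
The combined effect of these checks can be illustrated with a small Python model. It is only a sketch of the logic, not of how Junos implements it: `import_decision`, `roa_state`, the `ROAS` excerpt and the returned strings are invented for the illustration. Running the first-AS check before the import policy is an assumption of the model. It only affects the reported reason, as a route is accepted only when all checks pass.

```python
import ipaddress

# (prefix, maximum length, authorized origin AS): an excerpt of the
# mapping used in this article.
ROAS = [
    (ipaddress.ip_network("2001:db8:aa::/64"), 128, 65005),
    (ipaddress.ip_network("2001:db8:bb::/64"), 128, 65006),
]


def roa_state(net, origin):
    """Validation state of a prefix for a given origin AS."""
    covering = [roa for roa in ROAS if net.subnet_of(roa[0])]
    if not covering:
        return "unknown"
    matched = any(
        asn == origin and net.prefixlen <= max_length
        for _, max_length, asn in covering
    )
    return "valid" if matched else "invalid"


def import_decision(peer_as, as_path, prefix):
    """Return the decision and the check responsible for it."""
    net = ipaddress.ip_network(prefix)
    if not as_path or as_path[0] != peer_as:
        return "reject: enforce-first-as"
    if len(as_path) != 1:
        return "reject: ONLY-DIRECTLY-CONNECTED"
    state = roa_state(net, as_path[-1])  # the origin is the last AS
    if state == "valid":
        return "accept: ACCEPT-VALID"
    if state == "invalid":
        return "reject: ACCEPT-VALID"
    return "reject: REJECT-ALL"


print(import_decision(65006, [65006], "2001:db8:bb::42/128"))
print(import_decision(65005, [65005, 65006], "2001:db8:bb::/64"))
print(import_decision(65005, [65006], "2001:db8:bb::/64"))
print(import_decision(65005, [65005], "2001:db8:99::/64"))
```

Only the first announcement is accepted. The second one is the forged AS path described earlier, the third one uses an AS different from the peer AS and the last one has no matching ROA.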

Let’s check the state of the RTR sessions and the database:

> show validation session
Session                                  State   Flaps     Uptime #IPv4/IPv6 records
2001:db8:4242::10                        Up          0   00:16:09 0/9
2001:db8:4242::11                        Up          0   00:16:07 0/9
2001:db8:4242::12                        Connect     0            0/0

> show validation database
RV database for instance master

Prefix                 Origin-AS Session                                 State   Mismatch
2001:db8:11::/64-128       65006 2001:db8:4242::10                       valid
2001:db8:11::/64-128       65006 2001:db8:4242::11                       valid
2001:db8:11::/64-128       65009 2001:db8:4242::10                       valid
2001:db8:11::/64-128       65009 2001:db8:4242::11                       valid
2001:db8:aa::/64-128       65005 2001:db8:4242::10                       valid
2001:db8:aa::/64-128       65005 2001:db8:4242::11                       valid
2001:db8:bb::/64-128       65006 2001:db8:4242::10                       valid
2001:db8:bb::/64-128       65006 2001:db8:4242::11                       valid
2001:db8:cc::/64-128       65007 2001:db8:4242::10                       valid
2001:db8:cc::/64-128       65007 2001:db8:4242::11                       valid
2001:db8:dd::/64-128       65008 2001:db8:4242::10                       valid
2001:db8:dd::/64-128       65008 2001:db8:4242::11                       valid
2001:db8:ee::/64-128       65009 2001:db8:4242::10                       valid
2001:db8:ee::/64-128       65009 2001:db8:4242::11                       valid
2001:db8:ff::/64-128       65010 2001:db8:4242::10                       valid
2001:db8:ff::/64-128       65010 2001:db8:4242::11                       valid

  IPv4 records: 0
  IPv6 records: 18

Here is an example of accepted route:

> show route protocol bgp table inet6 extensive all
inet6.0: 11 destinations, 11 routes (8 active, 0 holddown, 3 hidden)
2001:db8:bb::42/128 (1 entry, 0 announced)
        *BGP    Preference: 170/-101
                Next hop type: Router, Next hop index: 0
                Address: 0xd050470
                Next-hop reference count: 4
                Source: 2001:db8:42::a12
                Next hop: 2001:db8:42::a12 via em1.0, selected
                Session Id: 0x0
                State: <Active NotInstall Ext>
                Local AS: 65006 Peer AS: 65000
                Age: 12:11
                Validation State: valid
                Task: BGP_65000.2001:db8:42::a12+179
                AS path: 65006 I
                Accepted
                Localpref: 100
                Router ID: 1.1.1.1

A rejected route would look similar, with the reason “rejected by import policy” shown in the details and an invalid validation state.

Configuring BIRD

BIRD supports both plain-text TCP and SSH. Let’s configure it to use SSH. We need to generate keypairs for both the leaf router and the validators (they can all share the same keypair). We also have to create a known_hosts file for BIRD:

(validatorX)$ ssh-keygen -qN "" -t rsa -f /etc/gortr/ssh_key
(validatorX)$ echo -n "validatorX:8283 " ; \
>             cat /etc/gortr/ssh_key.pub
validatorX:8283 ssh-rsa AAAAB3[…]Rk5TW0=
(leaf1)$ ssh-keygen -qN "" -t rsa -f /etc/bird/ssh_key
(leaf1)$ echo 'validator1:8283 ssh-rsa AAAAB3[…]Rk5TW0=' >> /etc/bird/known_hosts
(leaf1)$ echo 'validator2:8283 ssh-rsa AAAAB3[…]Rk5TW0=' >> /etc/bird/known_hosts
(leaf1)$ cat /etc/bird/ssh_key.pub
ssh-rsa AAAAB3[…]byQ7s=
(validatorX)$ echo 'ssh-rsa AAAAB3[…]byQ7s=' >> /etc/gortr/authorized_keys

GoRTR needs additional flags to allow connections over SSH:

$ gortr -refresh=600 -verify=false -checktime=false \
>     -cache=http://127.0.0.1/rpki.json \
>     -ssh.bind=:8283 \
>     -ssh.key=/etc/gortr/ssh_key \
>     -ssh.method.key=true \
>     -ssh.auth.user=rpki \
>     -ssh.auth.key.file=/etc/gortr/authorized_keys
INFO[0000] Enabling ssh with the following authentications: password=false, key=true
INFO[0000] New update (7 uniques, 8 total prefixes). 0 bytes. Updating sha256 hash  -> 68a1d3b52db8d654bd8263788319f08e3f5384ae54064a7034e9dbaee236ce96
INFO[0000] Updated added, new serial 1

Then, we can configure BIRD to use these RTR servers:

roa6 table ROA6;
template rpki VALIDATOR {
   roa6 { table ROA6; };
   transport ssh {
     user "rpki";
     remote public key "/etc/bird/known_hosts";
     bird private key "/etc/bird/ssh_key";
   };
   refresh keep 30;
   retry keep 30;
   expire keep 3600;
}
protocol rpki VALIDATOR1 from VALIDATOR {
   remote validator1 port 8283;
}
protocol rpki VALIDATOR2 from VALIDATOR {
   remote validator2 port 8283;
}

Unlike Junos, BIRD doesn’t have a feature to only use a subset of validators. Therefore, we only configure two of them. As a safety measure, if both connections become unavailable, BIRD will keep the ROAs for one hour.

We can query the state of the RTR sessions and the database:

> show protocols all VALIDATOR1
Name       Proto      Table      State  Since         Info
VALIDATOR1 RPKI       ---        up     17:28:56.321  Established
  Cache server:     rpki@validator1:8283
  Status:           Established
  Transport:        SSHv2
  Protocol version: 1
  Session ID:       0
  Serial number:    1
  Last update:      before 25.212 s
  Refresh timer   : 4.787/30
  Retry timer     : ---
  Expire timer    : 3574.787/3600
  No roa4 channel
  Channel roa6
    State:          UP
    Table:          ROA6
    Preference:     100
    Input filter:   ACCEPT
    Output filter:  REJECT
    Routes:         9 imported, 0 exported, 9 preferred
    Route change stats:     received   rejected   filtered    ignored   accepted
      Import updates:              9          0          0          0          9
      Import withdraws:            0          0        ---          0          0
      Export updates:              0          0          0        ---          0
      Export withdraws:            0        ---        ---        ---          0

> show route table ROA6
Table ROA6:
    2001:db8:11::/64-128 AS65006  [VALIDATOR1 17:28:56.333] * (100)
                                  [VALIDATOR2 17:28:56.414] (100)
    2001:db8:11::/64-128 AS65009  [VALIDATOR1 17:28:56.333] * (100)
                                  [VALIDATOR2 17:28:56.414] (100)
    2001:db8:aa::/64-128 AS65005  [VALIDATOR1 17:28:56.333] * (100)
                                  [VALIDATOR2 17:28:56.414] (100)
    2001:db8:bb::/64-128 AS65006  [VALIDATOR1 17:28:56.333] * (100)
                                  [VALIDATOR2 17:28:56.414] (100)
    2001:db8:cc::/64-128 AS65007  [VALIDATOR1 17:28:56.333] * (100)
                                  [VALIDATOR2 17:28:56.414] (100)
    2001:db8:dd::/64-128 AS65008  [VALIDATOR1 17:28:56.333] * (100)
                                  [VALIDATOR2 17:28:56.414] (100)
    2001:db8:ee::/64-128 AS65009  [VALIDATOR1 17:28:56.333] * (100)
                                  [VALIDATOR2 17:28:56.414] (100)
    2001:db8:ff::/64-128 AS65010  [VALIDATOR1 17:28:56.333] * (100)
                                  [VALIDATOR2 17:28:56.414] (100)

As in the Junos case, a malicious actor could try to work around the validation by building an AS path where the last AS number is the legitimate one. BIRD is flexible enough to allow us to use any AS to check the IP prefix. Instead of checking the origin AS, we ask it to check the peer AS with this function, without looking at the AS path:

function validated(int peeras) {
   if (roa_check(ROA6, net, peeras) != ROA_VALID) then {
      print "Ignore invalid ROA ", net, " for ASN ", peeras;
      reject;
   }
   accept;
}
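
To see why checking the peer AS defeats a forged AS path, here is a short Python sketch comparing both approaches. It only models the "valid or not" part of the check. The `ROA6` list and the `roa_valid` helper are made up for this example and are not BIRD functions.

```python
import ipaddress

# (prefix, maximum length, authorized AS): an excerpt of the ROA table.
ROA6 = [
    ("2001:db8:aa::/64", 128, 65005),
    ("2001:db8:bb::/64", 128, 65006),
]


def roa_valid(prefix, asn):
    """True if a ROA authorizes this AS to announce this prefix."""
    net = ipaddress.ip_network(prefix)
    return any(
        net.subnet_of(ipaddress.ip_network(roa_prefix))
        and net.prefixlen <= max_length
        and asn == roa_asn
        for roa_prefix, max_length, roa_asn in ROA6
    )


# The host with AS 65005 forges an AS path ending with the legitimate
# origin of 2001:db8:bb::/64.
peer_as = 65005
as_path = [65005, 65006]
prefix = "2001:db8:bb::/64"

print(roa_valid(prefix, as_path[-1]))  # check on the origin AS: True
print(roa_valid(prefix, peer_as))      # check on the peer AS: False
```

The check on the origin AS is fooled by the forged path, while the check on the peer AS is not, as the peer AS comes from the session configuration and not from the announcement.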

The BGP instance is then configured to use the above function as the import policy:

protocol bgp PEER1 {
   local as 65100;
   neighbor 2001:db8:42::a10 as 65005;
   connect delay time 30;
   ipv6 {
      import keep filtered;
      import where validated(65005);
      # export …;
   };
}

You can view the rejected routes with show route filtered, but BIRD does not store information about the validation state in the routes. You can also watch the logs:

2019-07-31 17:29:08.491 <INFO> Ignore invalid ROA 2001:db8:bb::40/126 for ASN 65005

Currently, BIRD does not reevaluate the filters when the ROAs are updated. There is work in progress to fix this. If this feature is important to you, have a look at FRR instead: it also supports the RTR protocol and triggers a soft reconfiguration of the BGP sessions when ROAs are updated.

Update (2021-03)

From version 2.0.8, BIRD reevaluates the filters when the ROAs are updated. You need to replace import keep filtered with import table yes in the BGP instance configuration. You can also drop the connect delay time directive in the proposed configuration. Its purpose was to ensure the ROAs are loaded before the BGP connection is established.


  1. Notably, the data flow and the control plane are separated. A node can remove itself by notifying its peers without losing a single packet. ↩︎

  2. People often use AS sets, like AS-APPLE in this example, as they are convenient if you have multiple AS numbers or customers. However, there is currently nothing preventing a rogue actor from adding arbitrary AS numbers to their AS set. ↩︎

  3. We are using 16-bit AS numbers for readability. Because we need to assign a different AS number to each host in the datacenter, in an actual deployment, we would use 32-bit AS numbers. ↩︎

  4. This restriction also prevents the peer from prepending its own ASN to deprioritize a path. A modern alternative is to use the graceful shutdown community. ↩︎

  5. Cisco routers and FRR enforce the first AS by default. It is a tunable value to allow the use of route servers: they distribute prefixes on behalf of other routers. ↩︎