The peer manager is responsible for establishing connections with peers.
It defines when a node should dial peers and which peers it should dial.
The peer manager is not an implementation abstraction of the p2p layer,
but a role that is played by the PEX reactor.

The ensurePeersRoutine is a persistent routine intended to ensure that a node
is connected to MaxNumOutboundPeers outbound peers.
This routine is continuously executed by regular nodes, i.e. nodes not
operating in seed mode, as part of the PEX reactor implementation.

The logic defining when the node should dial peers, for selecting peers to dial
and for actually dialing them is implemented in the ensurePeers method.
This method is invoked periodically by the ensurePeersRoutine, every
ensurePeersPeriod, which defaults to 30 seconds.

A node is expected to dial peers whenever the number of outbound peers is lower
than the configured MaxNumOutboundPeers parameter.
The current number of outbound peers is retrieved from the switch, using the
NumPeers method, which also reports the number of nodes to which the switch
is currently dialing.
If the number of outbound peers plus the number of dialing routines equals
MaxNumOutboundPeers, nothing is done.
Otherwise, the ensurePeers method will attempt to dial node addresses in
order to reach the target number of outbound peers.

Once it is determined that the node needs additional outbound peers, the node queries
the address book for candidate addresses.
This is done using the PickAddress method,
which returns an address selected at random from the address book, with some bias
towards new or old addresses.
When the node has up to 3 outbound peers, the adopted bias is towards old
addresses, i.e., addresses of peers that are believed to be “good”.
When the node has 5 or more outbound peers, the adopted bias is towards new
addresses, i.e., addresses of peers about which the node has not yet collected
much information.
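
As a rough illustration of this bias, the sketch below maps the number of outbound peers to the percentage chance of picking a new address. The function name newAddressBias and the linear formula (10% plus 10% per outbound peer, capped at 90%) are assumptions made for illustration; the text above only fixes the direction of the bias.

```go
package main

// newAddressBias returns the percentage bias (0-100) towards "new" addresses
// to use when picking from the address book. Hypothetical helper: the linear
// formula is an assumption, chosen so that few outbound peers favor "old"
// addresses and many outbound peers favor "new" ones.
func newAddressBias(numOutbound int) int {
	if numOutbound < 0 {
		numOutbound = 0
	}
	if numOutbound > 8 {
		numOutbound = 8 // cap so that old addresses always keep some chance
	}
	return numOutbound*10 + 10
}
```

Under this assumption, a node with 3 outbound peers would pick a new address 40% of the time, while a node with 5 outbound peers would do so 60% of the time.
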
So, the more outbound peers a node has, the less conservative it will be when
selecting new peers.

The selected peer addresses are then dialed in parallel, by starting a dialing
routine per peer address.
Dialing a peer address can fail for multiple reasons.
The node might have attempted to dial the peer too many times.
In this case, the peer address is marked as bad and removed from the address book.
The node might have attempted and failed to dial the peer recently
and the exponential backoffDuration has not yet passed.
Or the current connection attempt might fail, which is registered in the address book.
None of these errors are explicitly handled by the ensurePeers method, which
also does not wait until the connections are established.

The third step of the ensurePeers method is to ensure that the address book
has enough addresses.
This is done, first, by reinstating banned peers
whose ban period has expired.
Then, the node randomly selects a connected peer, which can be either an
inbound or outbound peer, and requests addresses from it
using the PEX protocol.
Last, and this action is only performed if the node could not retrieve any new
address to dial from the address book, the node dials the configured seed nodes
in order to establish a connection to at least one of them.
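
The three steps described above can be summarized in the following sketch. The peerHooks struct, its field names, and the cap on pick attempts are assumptions introduced for illustration; they do not reproduce the actual switch or address book interfaces, and the sketch performs the third step unconditionally.

```go
package main

// peerHooks bundles the operations this sketch needs. It is a hypothetical,
// simplified stand-in for the switch and the address book, not their real API.
type peerHooks struct {
	NumPeers         func() (outbound, inbound, dialing int)
	PickAddress      func(biasTowardsNew int) (addr string, ok bool)
	DialAsync        func(addr string) // starts a dialing routine and returns
	ReinstateBanned  func()            // restores peers whose ban has expired
	RequestAddresses func()            // PEX request to one random connected peer
	DialSeeds        func()            // dials the configured seed nodes
}

// ensurePeers runs one round of the procedure and returns how many addresses
// it started dialing.
func ensurePeers(h peerHooks, maxOutbound, biasTowardsNew int) int {
	outbound, _, dialing := h.NumPeers()
	numToDial := maxOutbound - (outbound + dialing)
	if numToDial <= 0 {
		return 0 // target reached (counting pending dials): nothing to do
	}

	// Step 1: pick distinct candidates. Bounding the attempts at 3x the
	// number of missing peers is an assumption of this sketch.
	seen := make(map[string]bool)
	var toDial []string
	for i := 0; i < 3*numToDial && len(toDial) < numToDial; i++ {
		addr, ok := h.PickAddress(biasTowardsNew)
		if !ok {
			break // the address book has nothing to offer
		}
		if !seen[addr] {
			seen[addr] = true
			toDial = append(toDial, addr)
		}
	}

	// Step 2: start one dialing routine per address, without waiting.
	for _, addr := range toDial {
		h.DialAsync(addr)
	}

	// Step 3: replenish the address book.
	h.ReinstateBanned()
	h.RequestAddresses()
	if len(toDial) == 0 {
		h.DialSeeds() // last resort: no address could be picked
	}
	return len(toDial)
}
```

Passing the operations as function values keeps the sketch independent of any concrete switch or address book type, which also makes each round easy to exercise with fakes.
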
As described above, seed nodes are actually the last source of peer addresses
for regular nodes.
They are contacted by a node when, after an invocation of the ensurePeers
method, no suitable peer address to dial is retrieved from the address book
(e.g., because it is empty).

Once a connection with a seed node is established, the node immediately
sends a PEX request to it, since the seed node is
added as an outbound peer.
When the corresponding PEX response is received, the addresses provided by the
seed node are added to the address book.
As a result, in the next invocation of the ensurePeers method, the node
should be able to dial some of the peer addresses provided by the seed node.

However, as observed in this issue,
it can take some time, up to ensurePeersPeriod (30 seconds by default), between when the
node receives new peer addresses and when it dials the received addresses.
To avoid this delay, which can be particularly relevant when the node has no
peers, a node immediately attempts to dial peer addresses when they are
received from a peer that is locally configured as a seed node.
This was implemented in a rough way, leading to the inconsistencies described in
this issue,
which were fixed by this PR.
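
The fast dialing behavior can be sketched as follows. The function handlePexAddrs, its parameters, and the rule that addresses rejected by the address book are not dialed are assumptions made for illustration, not the reactor's actual code.

```go
package main

// handlePexAddrs processes the addresses received in a PEX response from the
// peer src. Every accepted address is stored in the address book; when src is
// one of the locally configured seed nodes, each accepted address is also
// dialed right away instead of waiting for the next ensurePeers round.
// addToBook and dialAsync are hypothetical stand-ins for the address book and
// the dialing routine.
func handlePexAddrs(
	src string,
	addrs []string,
	seeds map[string]bool,
	addToBook func(addr, src string) bool,
	dialAsync func(addr string),
) (added, dialed int) {
	fromSeed := seeds[src]
	for _, addr := range addrs {
		if !addToBook(addr, src) {
			continue // rejected addresses are neither stored nor dialed
		}
		added++
		if fromSeed {
			dialAsync(addr)
			dialed++
		}
	}
	return added, dialed
}
```
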
When the PEX reactor is started, the ensurePeersRoutine is created and it
runs throughout the operation of a node, periodically invoking the ensurePeers
method.
However, if the node already has some peers (inbound or outbound) or is
dialing some addresses when the persistent routine is started, the
first invocation of ensurePeers is delayed by a random amount of time between 0
and ensurePeersPeriod.
The node configuration can contain a list of persistent peers.
Those peers have preferential treatment compared to regular peers and the node
is always trying to connect to them.
Moreover, these peers are not removed from the address book in the case of
multiple failed dial attempts.

On startup, the node immediately tries to dial the configured persistent peers
by calling the switch’s DialPeersAsync method.
This is not done in the p2p package, but it is part of the procedure to set up a node.
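
The startup step can be sketched as follows. The comma-separated list format, the helper names, and the dialPeersAsync callback (a stand-in for the switch's DialPeersAsync method) are assumptions made for illustration.

```go
package main

import "strings"

// splitPeers parses a comma-separated list of peer addresses, trimming
// whitespace and dropping empty entries.
func splitPeers(list string) []string {
	var peers []string
	for _, p := range strings.Split(list, ",") {
		if p = strings.TrimSpace(p); p != "" {
			peers = append(peers, p)
		}
	}
	return peers
}

// dialPersistentPeers hands the configured persistent peers to the switch
// during node setup. dialPeersAsync is a hypothetical stand-in for the
// switch's DialPeersAsync method; it is expected to return without waiting
// for the connections to be established.
func dialPersistentPeers(list string, dialPeersAsync func(peers []string) error) error {
	peers := splitPeers(list)
	if len(peers) == 0 {
		return nil // nothing configured
	}
	return dialPeersAsync(peers)
}
```
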
TODO: the handling of persistent peers should be described in more detail.