diff --git a/SWIPs/swip-39.md b/SWIPs/swip-39.md
new file mode 100644
index 0000000..e5ef694
--- /dev/null
+++ b/SWIPs/swip-39.md
@@ -0,0 +1,424 @@
---
SWIP: 39
title: Balanced Neighbourhood Registry aka Smart Neighbourhood Management
author: Viktor Trón (@zelig)
discussions-to: https://discord.gg/Q6BvSkCv
status: Draft
type: Standards Track
category: Core
created: 2025-07-21
---

# Balanced Neighbourhood Registry aka Smart Neighbourhood Management

## Abstract

This SWIP introduces a systematic way for node operators to enter the Swarm network so that they form a _balanced subnetwork_.
In the context of this SWIP, _balance_ means that the nodes participating in the subnetwork are dispersed as evenly as possible across the Swarm address space.

## Motivation

### Balance and area of responsibility

The most obvious use case for a balanced subnetwork is a _decentralised service network_ _(DSN)_, a set of nodes that commit to collectively perform some task. Instances of this task submitted by the users of the DSN
are best thought of as a partially ordered set of _input/output jobs_. These jobs are assigned to the service nodes in the DSN based on whether the _job ID_ falls within the node's _area of responsibility_. Execution is load-balanced as long as:

- jobs are of comparable complexity,
- job IDs are random and uniform within the address space (the hash of their description), and
- nodes' areas of responsibility are address ranges of equal size.

_Areas of responsibility_ are defined by _proximity_: an area is a contiguous range of addresses that are close to each other and to the node's ID (node overlay addresses live in the same address space), with logarithmic distance as the metric.

The design achieves

- fairness,
- bounded cost of operation, and
- resistance to manipulation.
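
To make proximity concrete, here is a minimal Python sketch. It assumes 256-bit (32-byte) overlay addresses and job IDs, and takes proximity to be the number of leading bits two addresses share (the proximity order of Swarm's Kademlia topology, i.e. a logarithmic XOR distance). The names `proximity`, `in_area_of_responsibility` and `ADDRESS_BYTES` are illustrative only and are not defined by this SWIP.

```python
ADDRESS_BYTES = 32  # assumption: 256-bit Swarm addresses


def proximity(a: bytes, b: bytes) -> int:
    """Number of leading bits shared by a and b (their proximity order)."""
    if len(a) != len(b):
        raise ValueError("addresses must have equal length")
    for i, (x, y) in enumerate(zip(a, b)):
        diff = x ^ y
        if diff:
            # leading zero bits in the first differing byte of the XOR
            return 8 * i + (8 - diff.bit_length())
    return 8 * len(a)  # identical addresses


def in_area_of_responsibility(node: bytes, job_id: bytes, depth: int) -> bool:
    """A job is in the node's area at `depth` if both agree on the first `depth` bits."""
    return proximity(node, job_id) >= depth


node = bytes([0xA0]) + bytes(ADDRESS_BYTES - 1)    # 1010 0000 ...
job_id = bytes([0xBF]) + bytes(ADDRESS_BYTES - 1)  # 1011 1111 ...
print(proximity(node, job_id))                     # 3
print(in_area_of_responsibility(node, job_id, 3))  # True
print(in_area_of_responsibility(node, job_id, 4))  # False
```

Because job IDs are uniformly distributed hashes, each of the $2^d$ areas defined by a $d$-bit prefix receives roughly the same share of jobs, which is why equal-size areas of responsibility balance the load.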

### Further support when applied to the current postage redistribution game

#### Sybil attacks

The neighbourhood sybil attack is when the same operator runs several nodes (or runs one client node, but plays with several) in the same neighbourhood. This allows them to share storage without replication and still get paid.
To mitigate this, we resort to the rather weak incentive of additive stake as a proof of redundancy. If stake is variable and linearly proportional to earnings, then, mutatis mutandis, due to the added operational costs, it is always more economical for one operator in a neighbourhood to run just one node with all the stake than several nodes.
Random neighbourhood assignment makes it impractical (expensive) for any operator to place several nodes in the same storage neighbourhood. The proposed scheme thus solves the problem of enforcing "one operator, one node in a neighbourhood".


#### Fixed stake

Variable stake is not really compatible with random assignment. If a candidate node is assigned a neighbourhood with high stake density, it earns less with the same stake, which is unfair. Fixed stake across neighbourhoods, on the other hand, implies no a priori (dis)advantage. Uniform prices could and should allow for changes over time.

#### Shadow world fabrication attack

In order to control the stamp at game time, attackers must invest the same amount of stamp resources as the entire swarm's used capacity. Assuming that the average utilisation rate over a relevant period is $0d$.
$$
NH(p,d)=\left\{a\in\mathbb{\Sigma}^{256} \,\mid\,a[0:d]=p[0:d]\right\}
$$
Given a set of nodes $S$, a node $n_i\in S$ is _unique at depth_ $u_i$ if $u_i$ is the smallest integer such that no other node falls in its neighbourhood (designated by its overlay $o_i$ at depth $u_i$):
$$
\forall 0\leq j AA["$j=1, d=0, c=''$"]
    AA --> BB{"$V(j)=\varnothing?$"}
    BB -->|Yes| CC{"$F(j)=1?$"}
    BB -->|No| DD["$w=V(j)$"]
    DD --> CC
    CC -->|Yes| FF["$c=c\parallel \neg w[d]$"]
    FF --> X[(end)]
    CC -->|No| B{"$k|Yes| C["$j=Left(j)$"]
    B -->|No| D["$k=k-F(Left(j))$
$j=Right(j)$"]
    C --> E["$d++$"]
    D --> E
    E --> AA
```

## Deregistration and Rebalancing

Nodes are free to deregister at any time.
If the sister node exists, removal proceeds directly and the invariant remains satisfied.

If removal would leave both children of the parent empty, _rebalancing_ is required. A donor pair is selected using the same rank-based traversal over $F(c)$. From the selected pair, one of the two nodes is chosen and removed. This donor node is reinserted into the commit queue and assigned to the empty pair.

The original node is removed only after the donor has been successfully reassigned, ensuring that the invariant is never violated. So that the rebalancing cannot be manipulated (i.e., so that nobody can choose which node is reinserted into the deregistrant's neighbourhood), the donor must be selected using proper randomness that is not known at the time of deregistration.

Let $\mathbb{F}(1)=N-2^{d}$ denote the number of neighbourhoods that are currently full (doubly filled). To select a donor, a node computes
$$
k_i = \rho_i \bmod \left(N-2^{d}\right)
$$

A node can be allocated to a cell $j=c_{k_i}$ only if $\mathrm{Free}(j)$ holds.
The assigned index is determined by descending the trie. At node index $j$, $F(\mathrm{Left}(j))$ denotes the number of free slots in the left subtree. If $k < F(\mathrm{Left}(j))$, the traversal continues to the left child. Otherwise, it continues to the right child with updated rank $k \mapsto k - F(\mathrm{Left}(j))$.

## Specification

### Registration

An initially empty list (the _commit queue_) of _entry struct_ elements holds the current committers. Each struct records the Ethereum address of the node and the block height at which the address registered.

## Data Structure

The assignment structure is implemented as an implicit complete binary trie over the index space. Each node $v$ of the trie corresponds to a contiguous interval of indices. Each subtrie maintains two quantities.
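
The rank-based descent described under Deregistration and Rebalancing can be sketched in Python as follows. This is a minimal illustration, not the contract implementation. It assumes the index arithmetic given for the IBT (root at index $1$, $\mathrm{Left}(i)=2i$, $\mathrm{Right}(i)=2i+1$), places the leaves at depth $d$ (indices $2^d$ to $2^{d+1}-1$, leaf $2^d+p$ standing for prefix $p$), and assumes that a leaf counts as one free slot exactly when $\mathrm{Free}$ holds for it. The leaf case of $F$ and the helper names `build_counts`, `select` and `pick` are hypothetical.

```python
from __future__ import annotations


def build_counts(free_leaves: list[bool]) -> list[int]:
    """Build the implicit trie of free-slot counts F (F[0] is unused).

    Leaves hold 1 if free, else 0 (assumed leaf case); internal nodes hold
    F(i) = F(Left(i)) + F(Right(i)).
    """
    n = len(free_leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("number of leaves must be a power of two")
    F = [0] * (2 * n)
    for p, free in enumerate(free_leaves):
        F[n + p] = int(free)
    for i in range(n - 1, 0, -1):  # fill internal nodes bottom-up
        F[i] = F[2 * i] + F[2 * i + 1]
    return F


def select(F: list[int], k: int) -> int:
    """Descend from the root to the leaf index holding the k-th (0-based) free slot."""
    n = len(F) // 2
    if not 0 <= k < F[1]:
        raise ValueError("rank out of range")
    j = 1
    while j < n:  # stop once j is a leaf
        if k < F[2 * j]:
            j = 2 * j          # go left
        else:
            k -= F[2 * j]      # skip the free slots of the left subtree
            j = 2 * j + 1      # go right
    return j


def pick(F: list[int], rho: int) -> int:
    """Map a random value rho to the prefix of a free leaf via k = rho mod F(1)."""
    if F[1] == 0:
        raise ValueError("no free slots")
    n = len(F) // 2
    return select(F, rho % F[1]) - n  # Pos(j) = j - 2**d


F = build_counts([False, True, True, False])  # d = 2: prefixes 01 and 10 are free
print(F[1])          # 2
print(select(F, 0))  # 5
print(pick(F, 7))    # 2
```

Each step only reads one stored count and moves to a child, so a selection touches $d+1$ trie nodes. For donor selection during rebalancing, the same descent would run over counts of full (doubly filled) cells instead of free ones.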

### Counting free neighbourhoods for candidate assignment

The first quantity is the number of free slots in the subtree rooted at index $i$; it tracks the number of candidate neighbourhoods available for assignment.
$$
F: \mathbb{N}\to\mathbb{N}\\
F(i) = \begin{cases}
F(Left(i)) + F(Right(i))&\text{if } Depth(i) 1
    1 --> 2
    1 --> 3
    2 --> 4
    2 --> 5
    3 --> 6
    3 --> 7
    4 --> 8
    4 --> 9
    5 --> 10
    5 --> 11
    6 --> 12
    6 --> 13
    7 --> 14
    7 --> 15
```

The implicit binary structure means that the represented tree can be traversed using basic arithmetic on the indices:

$$
\begin{array}{l|l|l}
\mathrm{description} & \mathrm{notation} & \mathrm{definition}\\\hline
\text{parent of }i& \mathrm{Parent}(i) & \lfloor i/2\rfloor\\
\text{left child of }i&\mathrm{Left}(i) & 2i\\
\text{right child of }i& \mathrm{Right}(i) & 2i+1\\
\text{sister of }i& \mathrm{Sister}(i) & \mathrm{Left}(\mathrm{Parent}(i)) + \mathrm{Right}(\mathrm{Parent}(i)) - i\\
\text{depth of }i& \mathrm{Depth}(i) & \lfloor\log_2 i\rfloor\\
\text{position of }i& \mathrm{Pos}(i) & i \bmod 2^{\mathrm{Depth}(i)}
\end{array}
$$

When the index structure is used as a map, the rule of inheritance allows looking up a value that was 'inherited' from an earlier stage (i.e., inserted at a shallower depth). Let $V$ be the lookup function of a map over the above index structure; the inheriting lookup $V!$ is then

$$
V!(i)=\begin{cases}
V!(\mathrm{Parent}(i)) &\text{if } V(i)=\varnothing\text{ and }i>1\\
V(i) &\text{otherwise}
\end{cases}
$$

We can define the predicate _not assigned_ as follows:
$$
NA(i) \leftrightarrow V!(i) = \varnothing .
$$
This allows us to define free and fully assigned neighbourhoods:
$$
\mathrm{Free}(i) \leftrightarrow NA(\mathrm{Left}(i)) \lor NA(\mathrm{Right}(i))
$$
and
$$
\mathrm{Full}(i) \leftrightarrow \neg NA(\mathrm{Left}(i)) \land \neg NA(\mathrm{Right}(i))
$$

All data structure operations enforce the condition

$$
\forall i, \mathrm{Depth}(i)< d\longrightarrow V(i)\neq \varnothing \lor V(i+1)\neq \varnothing
$$

which ensures that every prefix of length $d-1$ contains at least one node. This invariant constrains the admissible states of the system.

The implicit binary trie (IBT) is used to

- assign neighbourhoods to new applicants,
- find candidate donors for rebalancing, and
- find the closest node to an address.

### Further endpoints

A public read-only endpoint exists for querying neighbourhoods as well as nodes. Accessors for $d$ and $N$ return the current neighbourhood depth and the current number of assigned neighbourhoods. A public accessor for $A_d$ returns, for a neighbourhood between $0$ and $2^d-1$ inclusive, the overlay of the node assigned to that neighbourhood. Another endpoint returns the closest node for any address $a$, so that the network service can find the nodes responsible for (i.e., closest to) any address in the space shared by overlays:

$$
g(a)=O![a\gg(255-d)]
$$

### Changes to the bee client

A new endpoint must be added to the Bee client so that a node that is not yet registered can register and be assigned a neighbourhood. Once the neighbourhood is known, the client can mine the nonce needed to place the overlay in the required neighbourhood.

### Migration

Since the upgrade introduces a new staking contract, a stake migration will be needed.
Before the change, simplifying the staking contract is recommended, above all to allow a fixed stake, which realigns storage redundancy with monetary incentives: with a fixed amount staked per node, total stake is linearly proportional to the number of nodes, so neighbourhoods can be compared by their node counts. In particular, random balanced assignment then makes sense in terms of incentives (expected revenue).

### Putting a node in each neighbourhood

## Contract