Manual remote bootstrap when a majority of peers fail
When a RAFT peer fails, YugabyteDB executes an automatic remote bootstrap to create a new peer from the remaining ones.
If a majority of RAFT peers fail for a given tablet, we have to manually execute the equivalent of a remote bootstrap. We
can get a list of the affected tablets from the yb-master admin GUI at yb-master-ip:7000/tablet-replication, or with yb-admin.
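For example, to map a table to its tablet UUIDs with yb-admin, something like the following can be used; the master addresses and the ysql.yugabyte / example_table names below are placeholders for this sketch, not values from this scenario:
$ yb-admin -master_addresses master1:7100,master2:7100,master3:7100 \
    list_tablets ysql.yugabyte example_table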
Assuming we have a cluster where:
- Replication factor is 3
- A given tablet with UUID TABLET1 has 3 tablet peers: 1 in good working order, referred to as NODE_GOOD, and two broken peers, referred to as NODE_BAD1 and NODE_BAD2
- We will be copying some tablet-related data from the good peer to each of the bad peers until we've restored a majority of them
These are the steps to follow in such a scenario (a consolidated shell sketch follows the list):
- On the NODE_GOOD TS, create an archive of the wals (RAFT data), rocksdb (regular RocksDB data), intents (transactions data), and snapshots directories for TABLET1
- Copy these archives over to NODE_BAD1, onto the same drive where TABLET1 currently has its RAFT and RocksDB data
- Stop the bad TS, say NODE_BAD1, as we will be changing file system data underneath it
- Remove the old wals, rocksdb, intents, and snapshots data for TABLET1 from NODE_BAD1
- Unpack the data we copied over from NODE_GOOD into the corresponding (now empty) directories on NODE_BAD1
- Restart NODE_BAD1, so it can bootstrap TABLET1 using this new data
- Restart NODE_GOOD, so it can properly observe the changed state and data on NODE_BAD1
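Below is a minimal shell sketch of these steps. It reuses the example tablet UUID, table UUID, and /mnt/d0 data directory from the Note further down; the archive names, the scp destination, and the yb-tserver systemd unit name are illustrative assumptions, so adapt them to how your cluster is actually deployed.

# On NODE_GOOD: package the four directories for TABLET1
$ TABLET=c08596d5820a4683a96893e092088c39      # TABLET1 (example UUID from the Note below)
$ TABLE=2fa481734909462385e005ba23664537       # owning table (example UUID)
$ DATA=/mnt/d0/yb-data/tserver                 # under one of the --fs_data_dirs paths
$ tar -czf /tmp/${TABLET}-wals.tgz -C ${DATA}/wals/table-${TABLE} tablet-${TABLET}
$ tar -czf /tmp/${TABLET}-rocksdb.tgz -C ${DATA}/data/rocksdb/table-${TABLE} \
      tablet-${TABLET} tablet-${TABLET}.intents tablet-${TABLET}.snapshots
$ scp /tmp/${TABLET}-*.tgz NODE_BAD1:/mnt/d0/

# On NODE_BAD1: stop the TS, replace the tablet data, restart
# (re-set TABLET, TABLE, and DATA as above; tablet-meta and consensus-meta are left untouched)
$ sudo systemctl stop yb-tserver               # assumption: the TS runs as a systemd service
$ rm -rf ${DATA}/wals/table-${TABLE}/tablet-${TABLET} \
         ${DATA}/data/rocksdb/table-${TABLE}/tablet-${TABLET} \
         ${DATA}/data/rocksdb/table-${TABLE}/tablet-${TABLET}.intents \
         ${DATA}/data/rocksdb/table-${TABLE}/tablet-${TABLET}.snapshots
$ tar -xzf /mnt/d0/${TABLET}-wals.tgz -C ${DATA}/wals/table-${TABLE}
$ tar -xzf /mnt/d0/${TABLET}-rocksdb.tgz -C ${DATA}/data/rocksdb/table-${TABLE}
# (if the TS runs as a dedicated user such as yugabyte, restore ownership on the extracted directories)
$ sudo systemctl start yb-tserver

# On NODE_GOOD: restart its TS as well
$ sudo systemctl restart yb-tserver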
At this point, NODE_BAD2 should be automatically fixed and removed from its quorum, as the tablet now has a majority of healthy peers.
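To verify, the same yb-master page mentioned above can be re-checked (fetched here with curl purely for illustration); once the quorum is healthy again, TABLET1 should eventually stop being reported there:
$ curl -s http://yb-master-ip:7000/tablet-replication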
Note
Normally, when we try to find tablet data, we use a find command across the --fs_data_dirs paths.
In this example, assume that's set to /mnt/d0 and our tablet UUID is c08596d5820a4683a96893e092088c39:
$ find /mnt/d0/ -name '*c08596d5820a4683a96893e092088c39*'
/mnt/d0/yb-data/tserver/wals/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39
/mnt/d0/yb-data/tserver/tablet-meta/c08596d5820a4683a96893e092088c39
/mnt/d0/yb-data/tserver/consensus-meta/c08596d5820a4683a96893e092088c39
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.intents
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.snapshots
The data we are interested in here is:
For the raft wals:
/mnt/d0/yb-data/tserver/wals/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39
For the rocksdb regular DB:
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39
For the intents files:
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.intents
For the snapshot files:
/mnt/d0/yb-data/tserver/data/rocksdb/table-2fa481734909462385e005ba23664537/tablet-c08596d5820a4683a96893e092088c39.snapshots