Chef backend ctl scenarios for back up and restore testing

PrajaktaPurohit edited this page Jan 11, 2022 · 6 revisions

Steps taken to test backup and restore

Create an umbrella environment for backup

SCENARIO=chef-backend PLATFORM=ubuntu-18.04 INSTALL_VERSION=14.11.31 UPGRADE_VERSION=14.11.36 BACKEND_VERSION=2.2.0 ENABLE_IPV6=false ENABLE_ADDON_PUSH_JOBS=false ENABLE_GATHER_LOGS_TEST=false ENABLE_PEDANT_TEST=false ENABLE_PSQL_TEST=false ENABLE_SMOKE_TEST=false make apply

Add data

  1. Log in to the front end
  2. chef-server-ctl user-create -f /tmp/admin.pem admin Admin User [email protected] password; chef-server-ctl org-create -f /tmp/test-validator.pem test Test; chef-server-ctl org-user-add test admin;
  3. mkdir ~/.chef; cp /tmp/admin.pem ~/.chef/; vi ~/.chef/knife.rb
  4. export PATH=$PATH:/opt/opscode/embedded/bin; knife ssl fetch; knife node create FOO -d; knife node create Foo -d; knife node create foo -d; knife node create bar -d;
  5. chef-server-ctl user-list; chef-server-ctl org-list; knife node list;
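
Step 3 opens ~/.chef/knife.rb in an editor without recording what went into it. The following is a minimal sketch of a file that would fit the steps above, assuming the `admin` user and `test` org created in step 2. The front-end hostname passed to the function is a placeholder, not a value from this test run.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal knife.rb for the admin user created in step 2.
# Assumption: the front-end hostname is supplied by the caller; the user
# name (admin) and org name (test) come from the commands above.
write_knife_rb() {
  local chef_dir="$1" server_host="$2"
  mkdir -p "$chef_dir"
  cat > "$chef_dir/knife.rb" <<EOF
node_name       'admin'
client_key      '$chef_dir/admin.pem'
chef_server_url 'https://$server_host/organizations/test'
EOF
}

# Example (hypothetical hostname):
#   write_knife_rb "$HOME/.chef" chef-fe.example.com
```

With this file in place, `knife ssl fetch` and the `knife node` commands in step 4 know which server and organization to talk to.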

Take a backup on the follower

  1. Log in to one of the followers
  2. Run chef-server-ctl backup
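
The restore step needs the full path of the tarball that the backup produced. As a small sketch, assuming backups land in /var/opt/chef-backup and follow the `chef-backup-YYYY-MM-DD-HH-MM-SS.tgz` naming seen in the restore command in these notes, the newest one can be picked by name:

```shell
#!/usr/bin/env bash
# Sketch: print the newest backup tarball in a backup directory.
# Assumption: file names embed a sortable timestamp, as in
# chef-backup-2022-01-07-11-52-19.tgz.
latest_backup() {
  local dir="${1:-/var/opt/chef-backup}"
  # The timestamp sorts lexicographically, so the last name is the newest.
  ls "$dir"/chef-backup-*.tgz 2>/dev/null | sort | tail -n 1
}

# Example: latest_backup /var/opt/chef-backup
```

The function prints nothing when the directory holds no matching tarball, which makes it easy to guard a restore call with a non-empty check.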

Do the restore - testing on the same instance.

  1. Run chef-backend-ctl restore /var/opt/chef-backup/chef-backup-2022-01-07-11-52-19.tgz. It failed with the following error:
Would you like to proceed? (y/n)
y
 ✓ Verifying backup has required components
 ✓ Verifying backup has required components
 ✓ Unpacking backup to temporary directory
 ✓ Removing existing data directories
 ✓ Rewriting configuration for current node
 ✓ Restoring configuration
 ✗ Create new Chef Backend cluster
   Restoring PostgreSQL data
   Starting up Chef Backend
 ✓ Cleaning Up Temporary Files

An error occurred during this operation:

Restore failed:

    Timed out waiting for cluster to be ready.
  2. The status after the restore was as follows:
root@ip-10-0-0-181:~# chef-backend-ctl cluster-status
Name           IP          GUID                              Role    PG      ES      Blocked      Eligible
ip-10-0-0-181  10.0.0.181  f27c650f666ae9b96782d71cb5536f35  leader  leader  master  not_blocked  true    
root@ip-10-0-0-181:~# chef-backend-ctl status
Service        Local Status         Time in State  Distributed Node Status     
leaderl        running (pid 10212)  0d 0h 6m 21s   Error: no cluster configured
epmd           running (pid 10061)  0d 0h 6m 32s   Error: no cluster configured
etcd           running (pid 10004)  0d 0h 6m 34s   Error: no cluster configured
postgresql     running (pid 10281)  0d 0h 6m 19s   Error: no cluster configured
elasticsearch  running (pid 10091)  0d 0h 6m 31s   Error: no cluster configured

System  Local Status                                          Distributed Node Status          
disks   /var/log/chef-backend: OK; /var/opt/chef-backend: OK  health: green; healthy nodes: 1/1
root@ip-10-0-0-181:~# 
root@ip-10-0-0-181:~# 
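
In the status output above, every service is running locally while the distributed column reports "Error: no cluster configured". A small filter, sketched here, lists the affected services so the condition can be checked in a script instead of by eye. It only parses text in the layout shown above and makes no assumption about other chef-backend-ctl options.

```shell
#!/usr/bin/env bash
# Sketch: list services whose distributed status reports that no cluster
# is configured. Reads `chef-backend-ctl status` output on stdin.
services_with_cluster_error() {
  awk '/Error: no cluster configured/ { print $1 }'
}

# Example: chef-backend-ctl status | services_with_cluster_error
```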

Tried cluster join from the other follower node

  1. Copy /etc/chef-backend/chef-backend-secrets.json from the first node to /tmp/chef-backend-secrets.json on the joining node.
  2. chef-backend-ctl cleanse
  3. chef-backend-ctl join-cluster --accept-license --yes --quiet 10.0.10.189 -p 10.0.4.86 -s /tmp/chef-backend-secrets.json
  4. This succeeded, and the status was as follows:
root@ip-10-0-4-86:~# 
root@ip-10-0-4-86:~# chef-backend-ctl cluster-status
Name            IP           GUID                              Role      PG        ES          Blocked      Eligible
ip-10-0-4-86    10.0.4.86    8d7db929361e812c3e0964f17b90096a  follower  follower  not_master  not_blocked  true    
ip-10-0-10-189  10.0.10.189  e62b212424b293375261e5d5ce0bf81e  leader    leader    master      not_blocked  true    
root@ip-10-0-4-86:~# chef-backend-ctl status
Service        Local Status         Time in State  Distributed Node Status                     
leaderl        running (pid 13107)  0d 0h 0m 40s   leader: 1; waiting: 0; follower: 1; total: 2
epmd           running (pid 13083)  0d 0h 0m 41s   status: local-only                          
etcd           running (pid 12963)  0d 0h 1m 16s   health: green; healthy nodes: 2/2           
postgresql     running (pid 13246)  0d 0h 0m 36s   leader: 1; offline: 0; syncing: 0; synced: 1
elasticsearch  running (pid 13104)  0d 0h 0m 42s   state: green; nodes online: 2/2             

System  Local Status                                          Distributed Node Status          
disks   /var/log/chef-backend: OK; /var/opt/chef-backend: OK  health: green; healthy nodes: 2/2
root@ip-10-0-4-86:~# 
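
A quick way to confirm a join like the one above is to count the roles in the cluster-status table. The sketch below assumes the column layout shown in the output (Role is the fourth column) and reads the table from stdin.

```shell
#!/usr/bin/env bash
# Sketch: summarize node roles from `chef-backend-ctl cluster-status`.
# Assumption: Role is the fourth whitespace-separated column, as in the
# output above.
cluster_role_summary() {
  awk '$4 == "leader"   { leaders++ }
       $4 == "follower" { followers++ }
       END { printf "leader=%d follower=%d\n", leaders, followers }'
}

# Example: chef-backend-ctl cluster-status | cluster_role_summary
```

For the two-node cluster above, the expected summary is one leader and one follower.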

Tried to connect to the other remaining node

  1. This is the leader node of the previous cluster.
  2. Tried the same steps, but join-cluster failed with the following error:
root@ip-10-0-1-226:~# 
root@ip-10-0-1-226:~# chef-backend-ctl join-cluster --accept-license --yes --quiet 10.0.0.181 -p 10.0.1.226 -s /tmp/chef-backend-secrets.json
Generating configuration...done
Getting peer data from cluster...FAILED

      Failed to open TCP connection to 10.0.0.181:2379 (No route to host - connect(2) for "10.0.0.181" port 2379)

      
      The stack trace can be found in "/tmp/chef-backend-join-cluster-stacktrace.log"

An error occurred during this operation:


      Failed to open TCP connection to 10.0.0.181:2379 (No route to host - connect(2) for "10.0.0.181" port 2379)

      
      The stack trace can be found in "/tmp/chef-backend-join-cluster-stacktrace.log"
root@ip-10-0-1-226:~# 
root@ip-10-0-1-226:~# 
root@ip-10-0-1-226:~# cat /tmp/chef-backend-join-cluster-stacktrace.log
Exception of type Errno::EHOSTUNREACH occurred at 2022-01-11 06:52:24 +0000

/opt/chef-backend/embedded/lib/ruby/2.6.0/net/http.rb:949:in `rescue in block in connect'
/opt/chef-backend/embedded/lib/ruby/2.6.0/net/http.rb:946:in `block in connect'
/opt/chef-backend/embedded/lib/ruby/2.6.0/timeout.rb:93:in `block in timeout'
/opt/chef-backend/embedded/lib/ruby/2.6.0/timeout.rb:103:in `timeout'
/opt/chef-backend/embedded/lib/ruby/2.6.0/net/http.rb:945:in `connect'
/opt/chef-backend/embedded/lib/ruby/2.6.0/net/http.rb:930:in `do_start'
/opt/chef-backend/embedded/lib/ruby/2.6.0/net/http.rb:919:in `start'
/opt/chef-backend/embedded/lib/ruby/2.6.0/net/http.rb:1470:in `request'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/etcd-0.3.0/lib/etcd/client.rb:111:in `api_execute'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:364:in `block (2 levels) in request'
/opt/chef-backend/embedded/lib/ruby/2.6.0/timeout.rb:93:in `block in timeout'
/opt/chef-backend/embedded/lib/ruby/2.6.0/timeout.rb:33:in `block in catch'
/opt/chef-backend/embedded/lib/ruby/2.6.0/timeout.rb:33:in `catch'
/opt/chef-backend/embedded/lib/ruby/2.6.0/timeout.rb:33:in `catch'
/opt/chef-backend/embedded/lib/ruby/2.6.0/timeout.rb:108:in `timeout'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:363:in `block in request'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:284:in `block in with_connection'
/opt/chef-backend/embedded/lib/ruby/2.6.0/timeout.rb:76:in `timeout'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:282:in `with_connection'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:357:in `request'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:329:in `get'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:219:in `members_raw'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/connectors/etcd_connector.rb:170:in `member_client_hosts'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/command/join-cluster.rb:120:in `block in run'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/task_list_renderer.rb:55:in `block (2 levels) in run_tasks'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/task_list_renderer.rb:124:in `do_task'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/task_list_renderer.rb:54:in `block in run_tasks'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/task_list_renderer.rb:53:in `upto'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/task_list_renderer.rb:53:in `each'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/task_list_renderer.rb:53:in `run_tasks'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/command/join-cluster.rb:136:in `run'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/ctl.rb:354:in `run_with_pretty_errors'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/ctl.rb:335:in `run_subcommand'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/lib/libcb/ctl.rb:268:in `run'
/opt/chef-backend/embedded/lib/ruby/gems/2.6.0/gems/libcb-0.1.0/bin/chef-backend-ctl:15:in `<top (required)>'
/usr/bin/chef-backend-ctl:86:in `load'
/usr/bin/chef-backend-ctl:86:in `<main>'
root@ip-10-0-1-226:~# 
root@ip-10-0-1-226:~#
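
The failure above is a plain TCP connection error (Errno::EHOSTUNREACH) against port 2379, the etcd client port, so it can be reproduced without chef-backend-ctl. The sketch below pulls the endpoint out of the error text and probes it with bash's /dev/tcp pseudo-device. The 3-second limit is an arbitrary choice, and the probe assumes bash and the coreutils `timeout` command are available on the node.

```shell
#!/usr/bin/env bash
# Sketch: extract the unreachable host:port from join-cluster error text
# read on stdin.
failed_endpoint() {
  sed -n 's/.*Failed to open TCP connection to \([^ ]*\) .*/\1/p' | head -n 1
}

# Probe a TCP port via bash's /dev/tcp, giving up after 3 seconds.
# Returns 0 when the connection opens, non-zero otherwise.
port_reachable() {
  local host="$1" port="$2"
  timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example:
#   endpoint=$(failed_endpoint < join-error.txt)   # hypothetical file name
#   port_reachable "${endpoint%:*}" "${endpoint#*:}" || echo "unreachable"
```

If the probe also fails from the joining node, the problem is likely in routing or firewall rules between the nodes rather than in the restore itself.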

Status of the FE at this point

root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# chef-server-ctl status
-------------------
 Internal Services 
-------------------
run: bookshelf: (pid 18182) 5093s; run: log: (pid 17978) 5140s
run: haproxy: (pid 18131) 5094s; run: log: (pid 3311) 5194s
run: nginx: (pid 2380) 4703s; run: log: (pid 18113) 5108s
run: oc_bifrost: (pid 18136) 5094s; run: log: (pid 17792) 5174s
run: oc_id: (pid 18159) 5093s; run: log: (pid 17825) 5166s
run: opscode-erchef: (pid 18265) 5092s; run: log: (pid 18075) 5136s
run: redis_lb: (pid 2099) 4757s; run: log: (pid 18321) 5091s
-------------------
 External Services 
-------------------

down: elasticsearch: failed to connect to http://127.0.0.1:9200: 404 "Not Found"

run: postgresql: connected OK to 127.0.0.1:5432
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# chef-server-ctl user-list; chef-server-ctl org-list;
WARN: Server returned error 503 for https://127.0.0.1/users, retrying 1/5 in 4s
WARN: Server returned error 503 for https://127.0.0.1/users, retrying 2/5 in 8s
^CTraceback (most recent call last):
	7: from /usr/bin/chef-server-ctl:180:in `<main>'
	6: from /usr/bin/chef-server-ctl:180:in `load'
	5: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/chef-server-ctl-1.1.0/bin/chef-server-ctl:337:in `<top (required)>'
	4: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/omnibus-ctl-0.6.4/lib/omnibus-ctl.rb:745:in `run'
	3: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/omnibus-ctl-0.6.4/lib/omnibus-ctl.rb:203:in `block in add_command_under_category'
	2: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/chef-server-ctl-1.1.0/plugins/wrap-knife-opc.rb:43:in `block (2 levels) in load_file'
	1: from /opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/omnibus-ctl-0.6.4/lib/omnibus-ctl.rb:237:in `run_command'
/opt/opscode/embedded/lib/ruby/gems/2.7.0/gems/omnibus-ctl-0.6.4/lib/omnibus-ctl.rb:237:in `system': Interrupt

root@ip-10-0-10-216:~# 
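
The 503 responses line up with the `down: elasticsearch` entry in the status output. A filter such as the following sketch makes that check scriptable. It assumes the `run:` / `down:` line prefixes shown above and reads `chef-server-ctl status` output from stdin.

```shell
#!/usr/bin/env bash
# Sketch: print the names of services reported as down by
# `chef-server-ctl status`.
# Assumption: down services appear as "down: <name>: <reason>".
down_services() {
  awk -F': ' '/^down: / { print $2 }'
}

# Example: chef-server-ctl status | down_services
```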

Update the FE

  1. Update chef_backend_members in /etc/opscode/chef-server.rb so that it lists only the working nodes of the new cluster.
  2. Run chef-server-ctl reconfigure.
  3. The status is as follows:
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# chef-server-ctl status
-------------------
 Internal Services 
-------------------
run: bookshelf: (pid 18182) 5372s; run: log: (pid 17978) 5419s
run: haproxy: (pid 18893) 129s; run: log: (pid 3311) 5473s
run: nginx: (pid 18896) 128s; run: log: (pid 18113) 5387s
run: oc_bifrost: (pid 18136) 5373s; run: log: (pid 17792) 5453s
run: oc_id: (pid 18159) 5372s; run: log: (pid 17825) 5445s
run: opscode-erchef: (pid 18265) 5371s; run: log: (pid 18075) 5415s
run: redis_lb: (pid 18888) 129s; run: log: (pid 18321) 5370s
-------------------
 External Services 
-------------------

run: elasticsearch: connected OK to http://127.0.0.1:9200

run: postgresql: connected OK to 127.0.0.1:5432
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# chef-server-ctl user-list; chef-server-ctl org-list;
ERROR: Failed to authenticate to https://127.0.0.1:443 as pivotal with key /tmp/latovip20220111-18917-ywu2fo
Response:  Failed to authenticate as 'pivotal'. Ensure that your node_name and client key are correct.
ERROR: Failed to authenticate to https://127.0.0.1:443 as pivotal with key /tmp/latovip20220111-18921-t26put
Response:  Failed to authenticate as 'pivotal'. Ensure that your node_name and client key are correct.
root@ip-10-0-10-216:~# 
  4. Cleanse and reconfigure: run chef-server-ctl cleanse, make sure the proper chef-server.rb is in place, and then run chef-server-ctl reconfigure.
  5. The status is fine, but the data is missing:
root@ip-10-0-10-216:~# 
root@ip-10-0-10-216:~# chef-server-ctl status
-------------------
 Internal Services 
-------------------
run: bookshelf: (pid 19206) 37s; run: log: (pid 18839) 210s
run: haproxy: (pid 19099) 39s; run: log: (pid 3694) 238s
run: nginx: (pid 19202) 37s; run: log: (pid 19068) 48s
run: oc_bifrost: (pid 19104) 39s; run: log: (pid 18659) 222s
run: oc_id: (pid 19138) 38s; run: log: (pid 18771) 216s
run: opscode-erchef: (pid 19214) 37s; run: log: (pid 18838) 210s
run: redis_lb: (pid 19094) 41s; run: log: (pid 19249) 36s
-------------------
 External Services 
-------------------

run: elasticsearch: connected OK to http://127.0.0.1:9200

run: postgresql: connected OK to 127.0.0.1:5432
root@ip-10-0-10-216:~# chef-server-ctl user-list; chef-server-ctl org-list;
pivotal

root@ip-10-0-10-216:~# 
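
The first step of this section edits chef_backend_members by hand. As a sketch of the same change done from the shell, the function below rewrites that one setting and keeps a `.bak` copy of the file. It assumes the setting sits on a single line in the form `chef_backend_members ["ip", "ip"]`. The IPs in the example are the two nodes of the new cluster shown earlier.

```shell
#!/usr/bin/env bash
# Sketch: replace the chef_backend_members line in a chef-server.rb file.
# Assumption: the setting is on one line; other lines are left untouched.
set_backend_members() {
  local file="$1"; shift
  local list="" ip
  for ip in "$@"; do
    # Build a quoted, comma-separated list: "a", "b"
    list="${list:+$list, }\"$ip\""
  done
  # -i.bak edits in place and keeps the original as <file>.bak
  sed -i.bak "s/^chef_backend_members .*/chef_backend_members [$list]/" "$file"
}

# Example:
#   set_backend_members /etc/opscode/chef-server.rb 10.0.4.86 10.0.10.189
#   chef-server-ctl reconfigure
```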

Restoring backup on a single node

  1. Spin up a bare-metal instance for the FE (install chef-server) and two instances for the backend (install chef-backend).