@@ -174,8 +174,30 @@ is advisable to migrate all of the instances to another machine. See
174174Ceph
175175----
176176
177- The following guide provides a good overview:
178- https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/8/html/director_installation_and_usage/sect-rebooting-ceph
177+ #. Check that the cluster is healthy (e.g. with ``ceph -s``). Where possible, solve
178+ or isolate any issues before the shutdown, e.g. by marking unhealthy OSDs as
179+ 'out' in the cluster.
180+
181+ #. Stop all clients. This includes:
182+
183+ * **All** OpenStack VMs (if their storage is RBD-backed).
184+
185+ * CephFS mounts.
186+
187+ * Ceph-backed OpenStack services such as Glance, Cinder, Manila, and RGW/S3/Swift.
188+
189+ #. Set the ``noout`` flag, so that the cluster does not attempt to redistribute
190+ data when OSDs go down. Use the following command on a MON node:
191+
192+ .. code-block:: console
193+
194+ sudo cephadm shell -- ceph osd set noout
195+
196+ #. Shut down all the nodes, with those holding MON services last.
197+
198+ Note that if Ceph services should not start automatically with the operating
199+ system on the next boot, extra steps are required; these are not described
200+ here.
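
The health check and ``noout`` steps above could be scripted roughly as follows. This is a minimal sketch, not part of the documented procedure: ``CEPH_CMD`` and ``prepare_ceph_shutdown`` are hypothetical names, and refusing to continue on anything other than ``HEALTH_OK`` is a deliberately conservative assumption (the steps above also allow proceeding once unhealthy OSDs have been isolated). It is meant to be run on a MON node only after all clients have been stopped.

```shell
#!/bin/sh
# Assumption: CEPH_CMD is a hypothetical override hook. It defaults to the
# cephadm invocation used in the guide above.
CEPH_CMD="${CEPH_CMD:-sudo cephadm shell -- ceph}"

# Hypothetical helper: set noout only if the cluster reports HEALTH_OK.
prepare_ceph_shutdown() {
    # "ceph health" prints a one-line summary starting with HEALTH_OK,
    # HEALTH_WARN or HEALTH_ERR.
    health=$($CEPH_CMD health) || return 1
    case "$health" in
        HEALTH_OK*) ;;
        *)
            echo "Cluster is not healthy: $health" >&2
            return 1
            ;;
    esac
    # Stop the cluster from redistributing data when OSDs go down.
    $CEPH_CMD osd set noout
}
```

After sourcing the snippet, call ``prepare_ceph_shutdown``; a non-zero exit status means the flag was not set and the cluster state should be inspected by hand.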
179201
180202Shutting down the seed VM
181203-------------------------
@@ -201,6 +223,24 @@ following order:
201223* Shut down seed VM
202224* Shut down Ansible control host
203225
226+ Full startup
227+ -------------
228+
229+ If the entire control plane is powered down, it is best to bring the nodes up
230+ in the reverse order of shutdown:
231+
232+ * Power on Ansible control host
233+ * Power on seed VM (and other service VMs)
234+ * Power on Ceph nodes (if applicable)
235+ * Where possible, start the nodes running MON services first.
236+ * Make sure that all OSD services are back up and running. At this point
237+ it is safe to unset the ``noout`` cluster flag.
238+ * Power on controllers
239+ * Power on network nodes (if separate from controllers)
240+ * Power on monitoring node (if separate from controllers)
241+ * Power on compute nodes
242+ * Power on virtual machines
243+
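
The Ceph part of the startup list above mentions unsetting ``noout`` without giving a command. A minimal sketch, assuming the same cephadm invocation used when the flag was set, might look like this. ``CEPH_CMD`` and ``restore_rebalancing`` are hypothetical names introduced for illustration, and the operator is still expected to read the ``osd stat`` output and confirm that every OSD is up before relying on the result.

```shell
#!/bin/sh
# Assumption: CEPH_CMD is a hypothetical override hook. It defaults to the
# cephadm invocation used when the flag was set; CEPH_CMD=echo gives a dry run.
CEPH_CMD="${CEPH_CMD:-sudo cephadm shell -- ceph}"

# Hypothetical helper: show OSD state, clear noout, then show cluster status.
restore_rebalancing() {
    # Print how many OSDs are up and in, for the operator to review.
    $CEPH_CMD osd stat || return 1
    # Clear the flag that was set before the shutdown.
    $CEPH_CMD osd unset noout || return 1
    # Print the overall cluster status.
    $CEPH_CMD status
}
```

Running ``CEPH_CMD=echo`` before calling ``restore_rebalancing`` prints the Ceph subcommands instead of executing them, which is a cheap way to review the sequence first.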
204244Rebooting a node
205245----------------
206246