Within a Nutanix cluster, an administrator is getting frequent failure alerts of the Cassandra and Stargate services for one of the nodes.
What action will be taken by the cluster?
In a Nutanix cluster, when there are frequent failure alerts of the Cassandra and Stargate services for one of the nodes, the cluster will redirect the I/O path to another Controller Virtual Machine (CVM). This is done to ensure that data access and operations continue smoothly despite the failures on the affected node.
D. When the Nutanix cluster receives frequent failure alerts for the Cassandra and Stargate services on a specific node, it will take the following actions:
- Self-Healing: Nutanix Prism detects the service failures and initiates self-healing processes. The affected services are restarted or repaired automatically.
- Data Redistribution: If the node becomes unresponsive or fails completely, data stored on that node is redistributed to other healthy nodes. This ensures data availability and redundancy.
- VM Failover: VMs running on the failed node are automatically restarted on other nodes. VM failover ensures continuity of services.
- Alerts and Notifications: The administrator receives alerts to stay informed about the situation. Nutanix Prism provides detailed information about the service failures.
B. When a metadata drive fails, the local Cassandra process can no longer access its share of the database and begins a persistent cycle of restarts until its data is available. If Cassandra cannot restart, the Stargate process on that CVM crashes as well. Failure of both processes results in automatic I/O redirection using data path redundancy. During the switching process, the host with the failed SSD may report that the shared storage is unavailable. Guest VM I/O on this host pauses until the storage path is restored. After redirection occurs, VMs can resume read and write I/O. Performance may decrease slightly, because the I/O now travels across the physical network to another CVM rather than across the host's internal network to the local CVM. Because all traffic goes across the 10 GbE network, most workloads will not diminish in a way that is perceivable to users.
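To make the redirection described in this comment concrete, here is a minimal Python sketch of the path choice. It is only an illustration, not Nutanix code: the function name `pick_io_path`, the CVM names, and the health map are all hypothetical, and the real data path redundancy logic lives in the hypervisor and CVM, not in a script like this.

```python
def pick_io_path(local_cvm, stargate_healthy):
    """Return (cvm, path_type) for a host's storage I/O.

    stargate_healthy is a hypothetical map of CVM name -> whether its
    Stargate process is currently serving I/O.
    """
    # Normal case: I/O stays on the host and goes to the local CVM.
    if stargate_healthy.get(local_cvm, False):
        return local_cvm, "local"
    # Local Stargate is down: redirect to any other healthy CVM,
    # which means the I/O now crosses the network.
    for cvm, healthy in stargate_healthy.items():
        if healthy and cvm != local_cvm:
            return cvm, "remote"
    # No healthy CVM at all: storage is unavailable to this host.
    return None, "unavailable"


health = {"cvm-a": False, "cvm-b": True, "cvm-c": True}
print(pick_io_path("cvm-a", health))
```

With the local Stargate marked as failed, the sketch picks another CVM and labels the path as remote, which corresponds to the slight performance decrease mentioned above.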
C. If Cassandra remains in a failed state for more than thirty minutes, the surviving Cassandra nodes detach the failed node from the Cassandra database so that the unavailable metadata can be replicated to the remaining cluster nodes. The process of healing the database takes about 30-40 minutes. If the Cassandra process restarts and remains running for five minutes, the procedure to detach the node is canceled. If the process resumes and is stable after the healing procedure is complete, the node will be automatically added back to the ring. A node can be manually added to the database using the nCLI command.
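The timing rules in this comment can be sketched as a small simulation. This is a rough model under stated assumptions, not the actual Cassandra ring logic: the function name `detach_minute` and the per-minute input format are invented for illustration, and the sketch assumes that a restart lasting less than five minutes does not interrupt the failure timer. Only the two thresholds (thirty minutes and five minutes) come from the text.

```python
DETACH_AFTER_MIN = 30  # from the text: failed for more than thirty minutes
STABLE_AFTER_MIN = 5   # from the text: running for five minutes cancels the detach


def detach_minute(running):
    """Return the minute index at which the node would be detached, or None.

    running is a hypothetical per-minute trace: running[i] is True if the
    Cassandra process was up during minute i.
    """
    failed_for = 0  # minutes counted toward the detach threshold
    up_for = 0      # consecutive minutes up since the last restart
    for minute, is_up in enumerate(running):
        if is_up:
            up_for += 1
            if up_for >= STABLE_AFTER_MIN:
                failed_for = 0    # restart held: pending detach is canceled
            elif failed_for > 0:
                failed_for += 1   # assumption: an unstable restart still counts
        else:
            up_for = 0
            failed_for += 1
        if failed_for > DETACH_AFTER_MIN:
            return minute
    return None


print(detach_minute([False] * 40))                # down the whole time
print(detach_minute([False] * 20 + [True] * 10))  # recovers and stays up
```

In the first trace the node is detached once the failure has lasted more than thirty minutes. In the second the process stays up for five minutes before the threshold is reached, so no detach occurs.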