Automation → Detects failures and triggers mitigation processes dynamically (e.g., spinning up new instances, rerouting traffic, restarting failed services).
Fault Tolerance → Prevents failures from impacting operations by designing systems that continue running despite failures (e.g., using redundant components, error correction, failover mechanisms).
The question specifically asks for a concept that pertains to detecting problems and invoking redundant systems or processes for mitigation.
Automation actively monitors the system, detects issues, and initiates mitigation actions (like provisioning additional resources or rerouting traffic).
Fault Tolerance is about system design, ensuring that failures do not impact operations by having built-in redundancy (e.g., RAID storage, dual power supplies, load-balanced clusters).
Fault Tolerance does not actively detect issues or invoke recovery mechanisms programmatically; it simply ensures the system keeps running despite failures. That is why Automation, not Fault Tolerance, matches what the question describes.
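
The contrast above can be made concrete with a small sketch. This is a toy model rather than any real monitoring or orchestration API: the names `Service`, `RedundantPair`, `Automation`, and `check_and_mitigate` are hypothetical and were chosen only for illustration. `RedundantPair` stands in for fault tolerance (redundancy is part of the design, so requests keep being served while a replica is down), while `Automation` stands in for the detect-then-mitigate loop (it checks health and then provisions a replacement).

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical stand-in for a server or instance."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} is down")
        return f"{self.name} served {request}"


class RedundantPair:
    """Fault tolerance (illustrative): redundancy is built into the design.

    Requests keep succeeding while at least one replica works. Nothing is
    repaired or provisioned; the failed replica simply stays failed.
    """

    def __init__(self, replicas):
        self.replicas = list(replicas)

    def handle(self, request: str) -> str:
        for replica in self.replicas:
            try:
                return replica.handle(request)
            except RuntimeError:
                continue  # fall through to the next redundant replica
        raise RuntimeError("all replicas failed")


class Automation:
    """Automation (illustrative): detect a failure, then invoke mitigation."""

    def __init__(self, services):
        self.services = list(services)
        self.actions = []  # log of mitigation actions taken so far

    def check_and_mitigate(self):
        for i, svc in enumerate(self.services):
            if not svc.healthy:  # detection step
                # Mitigation step: "spin up" a replacement instance.
                self.services[i] = Service(f"{svc.name}-replacement")
                self.actions.append(f"replaced {svc.name}")
        return self.actions


if __name__ == "__main__":
    web1, web2 = Service("web-1"), Service("web-2")
    web1.healthy = False  # simulate a failure

    pair = RedundantPair([web1, web2])
    print(pair.handle("GET /"))  # still served, but web-1 is not repaired

    auto = Automation([web1, web2])
    print(auto.check_and_mitigate())  # failure detected and mitigated
    print([s.name for s in auto.services])
```

In this sketch the redundant pair never fixes anything; it only masks the failure. The automation loop is the part that notices the failed instance and acts on it, which is the behavior the question describes.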