During the master node upgrade that is part of a cluster upgrade, customers currently lose access to the K8s API endpoint for up to 15-30 seconds because of the VIP change. The VIP fails over immediately if the node or the network goes down. A failure of the apiserver alone, however, is only noticed by the keepalived health check, which is configured to run every 10 seconds. Thus, if the K8s apiserver goes down right after a health check, the failover takes 9-10 s for the next check, plus the election time, plus the time for the upstream switch to update its cache.
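The worst-case delay described above is a simple sum, which can be sketched as follows. Only the 10-second health-check interval comes from the description; the election and switch cache figures are placeholder assumptions chosen for illustration, not measurements.

```python
def worst_case_failover_seconds(check_interval_s, election_s, switch_update_s):
    """Upper bound on API downtime when the apiserver stops right after a health check."""
    # Detection can take up to one full health-check interval,
    # then the VIP election and the upstream switch cache update follow.
    return check_interval_s + election_s + switch_update_s


# 10 s is the configured keepalived check interval.
# 3 s (election) and 2 s (switch cache update) are assumed values.
print(worst_case_failover_seconds(10, 3, 2))  # prints 15
```

With these assumed values, the detection wait accounts for 10 of the 15 seconds, which is why removing it is the main lever.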
What can be done: Bring down keepalived first, forcing a VIP failover, before bringing down the K8s apiserver as part of the pf9-kube stop process. This would significantly reduce the switchover time during upgrade (as a percentage of the current total time).
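A minimal sketch of the proposed stop ordering is shown below. It is not the actual pf9-kube stop code: the systemd unit names, the `stop_master` helper, and the settle delay are all hypothetical and would need to be replaced with whatever pf9-kube really uses to stop each component.

```python
import subprocess
import time
from typing import Callable, Sequence

# Hypothetical commands: the unit names are assumptions, not taken from pf9-kube.
STOP_KEEPALIVED = ["systemctl", "stop", "keepalived"]
STOP_APISERVER = ["systemctl", "stop", "kube-apiserver"]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


def stop_master(
    run: Callable[[Sequence[str]], None] = run_checked,
    settle_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Stop keepalived before the apiserver so the VIP moves while the API is still up."""
    # 1. Stop keepalived first, which forces the VIP to fail over to another master.
    run(STOP_KEEPALIVED)
    # 2. Give the election and the upstream switch a moment to settle (assumed delay).
    sleep(settle_s)
    # 3. Only now stop the apiserver; clients should already be using the new VIP holder.
    run(STOP_APISERVER)
```

Calling `stop_master()` with the defaults would run both commands on the node. Passing a custom `run` callable makes it possible to check the ordering without touching any services.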