United completes manual reboot as aviation industry reels from CrowdStrike outages
The airline’s IT teams fixed more than 26,000 computers and devices at 365 airports globally, according to CEO Scott Kirby.
A United Airlines plane takes off from San Francisco International Airport in front of the San Francisco skyline on March 13, 2023, in San Francisco, California. The commercial carrier recovered operations Monday after Friday’s CrowdStrike-triggered IT outage.
United Airlines fully recovered from three days of operational disruptions caused by a faulty CrowdStrike software update, CEO Scott Kirby said in an open letter to employees and customers Monday.
“Today, our operation is back to normal and for the last 24 hours our systems, tools, and schedules have been stable,” Kirby said. “Our recovery was quick given the circumstances but not immediate.”
Teams of technicians manually fixed and rebooted more than 26,000 computers and devices one at a time at United contact centers located in 365 airports around the world, according to Kirby. The Friday update impacted an estimated 8.5 million Windows devices globally, Microsoft said in a Saturday statement. The CrowdStrike bug hit United’s systems hard, leading the airline to cancel 694 flights Friday. IT outages grounded an additional 713 United planes during the weekend, which Kirby characterized as one of the busiest travel times of the year.
As systems came back online, the company reduced flight cancellations to 69 on Monday and 47 by Tuesday afternoon, which respectively represented 2% and 1% of United’s scheduled flights on those days, according to tracking service FlightAware.
Not all carriers were so fortunate — or as successful in their remediation efforts.
Delta Air Lines, by far the hardest-hit domestic carrier, had more than 3,500 cancellations over the weekend. The company was still struggling to recover operations Tuesday afternoon when FlightAware had tallied nearly 500 cancellations.
“Because of the nature of the outage, the ability to respond depends heavily on available resources to do a direct intervention with the endpoint affected,” Forrester Senior Analyst Brent Ellis told CIO Dive in an email. Staffing, security, and device capabilities in critical areas created a perfect storm for the carriers experiencing the most pain, Ellis said.
On Monday, Delta CEO Ed Bastian blamed the airline’s crew reassignment software for the ongoing service disruptions. Southwest Airlines traced its December 2022 operational shutdown, which led to the cancellation of nearly 17,000 flights during the holiday travel period, to a similar software failure.
The day before the CrowdStrike outage began disrupting global IT estates, Kirby, alongside CFO and EVP Michael Leskinen, had lauded United’s operations and technology teams for their work to reduce the recovery time and cost of previous operational disruptions.
“Over the year, our operations team has invested in technology and improved their processes to better recover from irregular operations,” Leskinen said Thursday morning, during the company’s Q2 2024 earnings call.
“That team, the three of them combined with a lot of support from Jason Birnbaum, our chief information officer, is identifying the places where there’s opportunity to pull permanent cost out,” Kirby added.
United saw total operating expenditures grow just 3% in Q2, despite 11% year-over-year increases in salaries and fuel costs – the company’s two largest expense categories.
As the airline neared completion of a massive migration earlier this year, Leskinen touted the cloud’s long-term benefits, including efficiency gains that would drive cost savings over time. The immediate benefits were less certain.
“You don’t save the cost of moving to the cloud until you shut the mainframe down,” Leskinen said in April during the company’s Q1 earnings call.
Operational improvements triggered more immediate returns, as United posted a net income of $1.3 billion in Q2, a 23% year-over-year gain. The company had a net loss of $124 million during the first three months of the year, largely due to the grounding of part of its Boeing fleet.
While the executives couldn’t have anticipated the CrowdStrike event, the IT outage made United’s investments in process and technology enhancements seem prescient.
“For industries that heavily rely on technology to support complex processes like crew tracking, bookings, and re-bookings and scheduling, it’s important to understand how and when vendors update their software products and what that could mean for your operations,” Christina Powers, a partner in West Monroe’s cybersecurity practice, said in an email.
“On the flip side, software providers must have meticulous release processes, which include robust testing around functionality, compatibility, and security,” Powers said.
United Airlines Completes Manual Reboot Amid Aviation Industry Disruptions from CrowdStrike Outages
Recovery Efforts and Operational Impact
United Airlines has successfully recovered from a significant operational disruption caused by a faulty CrowdStrike software update. This incident, which took place over three days, affected a substantial portion of the airline’s IT infrastructure, leading to widespread flight cancellations and delays. United’s CEO, Scott Kirby, addressed the situation in an open letter to employees and customers, confirming that operations had returned to normal by Monday.
“Today, our operation is back to normal, and for the last 24 hours our systems, tools, and schedules have been stable,” Kirby stated. “Our recovery was quick given the circumstances but not immediate.”
The recovery process was arduous and labor-intensive. Teams of technicians were deployed to manually fix and reboot more than 26,000 computers and devices across 365 airports globally. The faulty update impacted approximately 8.5 million Windows devices worldwide, as noted by Microsoft in a statement released on Saturday. The immediate consequence for United Airlines was the cancellation of 694 flights on Friday. IT outages grounded an additional 713 United planes over the weekend, which Kirby characterized as one of the busiest travel periods of the year.
By Monday, as systems gradually came back online, the number of flight cancellations was reduced to 69, and by Tuesday afternoon, further minimized to 47. These figures represented 2% and 1% of United’s scheduled flights on those respective days, according to the flight tracking service FlightAware.
Broader Industry Impact
While United Airlines managed to stabilize its operations relatively quickly, other airlines were not as fortunate. Delta Air Lines, identified as the hardest-hit domestic carrier, faced over 3,500 cancellations during the weekend. By Tuesday afternoon, Delta was still grappling with the aftermath, with nearly 500 additional cancellations reported.
Forrester Senior Analyst Brent Ellis commented on the challenges posed by the outage. “Because of the nature of the outage, the ability to respond depends heavily on available resources to do a direct intervention with the endpoint affected,” Ellis noted. He highlighted that factors such as staffing, security, and device capabilities in critical areas created a perfect storm for the carriers experiencing the most pain.
Delta CEO Ed Bastian attributed the ongoing service disruptions to issues with the airline’s crew reassignment software. This was reminiscent of a similar software failure experienced by Southwest Airlines in December 2022, which led to the cancellation of nearly 17,000 flights during the holiday travel period.
United Airlines’ Operational Resilience
Just a day before the CrowdStrike outage began causing disruptions, Kirby, alongside CFO and EVP Michael Leskinen, had praised United’s operations and technology teams for their efforts in reducing recovery time and costs during previous disruptions.
“Over the year, our operations team has invested in technology and improved their processes to better recover from irregular operations,” Leskinen stated during the company’s Q2 2024 earnings call.
Kirby also acknowledged the contributions of Jason Birnbaum, United’s Chief Information Officer, and his team. “That team, the three of them combined with a lot of support from Jason Birnbaum, our chief information officer, is identifying the places where there’s opportunity to pull permanent cost out,” Kirby added.
United Airlines reported total operating expenditures growing by just 3% in Q2, despite 11% year-over-year increases in salaries and fuel costs, the airline’s two largest expense categories. The company also posted a net income of $1.3 billion in Q2, marking a 23% year-over-year gain. This performance was a significant turnaround from a net loss of $124 million during the first quarter, which was largely due to the grounding of part of its Boeing fleet.
Lessons Learned and Future Preparedness
The IT outage underscored the importance of robust and resilient IT systems for airlines, which heavily rely on technology to support complex operations such as crew tracking, bookings, re-bookings, and scheduling. The incident has prompted a broader industry discussion about the need for meticulous software update processes and robust testing to ensure compatibility and security.
Christina Powers, a partner in West Monroe’s cybersecurity practice, emphasized the need for vigilance from both airlines and software providers. “For industries that heavily rely on technology to support complex processes like crew tracking, bookings, and re-bookings and scheduling, it’s important to understand how and when vendors update their software products and what that could mean for your operations,” Powers stated.
“On the flip side, software providers must have meticulous release processes, which include robust testing around functionality, compatibility, and security,” she added.
United Airlines’ Technological Investments
United Airlines’ recent investments in technology and process improvements appear to have contributed to its relatively swift recovery from the outage. Earlier this year, as the airline neared completion of a massive migration to the cloud, Leskinen touted the long-term efficiency gains and cost savings the move was expected to yield.
“You don’t save the cost of moving to the cloud until you shut the mainframe down,” Leskinen explained during the company’s Q1 earnings call in April. While the immediate benefits were less certain, the operational improvements triggered more immediate returns, as evidenced by the company’s strong financial performance in Q2.
Kirby and Leskinen’s investments in technological resilience appear prescient in light of the CrowdStrike incident. The airline’s ability to quickly mobilize resources and address the IT issues head-on mitigated what could have been an even more severe disruption.
The Role of IT and Cybersecurity in Modern Aviation
The aviation industry’s reliance on IT systems makes it particularly vulnerable to software failures and cybersecurity threats. The CrowdStrike incident is a stark reminder of the critical importance of maintaining and securing these systems. As airlines increasingly adopt advanced technologies to improve efficiency and customer experience, the need for robust IT infrastructure and stringent cybersecurity measures becomes paramount.
IT outages not only disrupt operations but also have significant financial implications. The costs associated with flight cancellations, passenger compensation, and the deployment of emergency IT support can quickly escalate. Moreover, such incidents can damage an airline’s reputation, eroding customer trust and loyalty.
To mitigate these risks, airlines must invest in comprehensive IT risk management strategies. This includes regular system audits, thorough testing of software updates, and the establishment of contingency plans to quickly address and recover from IT disruptions. Collaborating with trusted technology partners who adhere to rigorous security and quality standards is also essential.
Industry-Wide Implications and Response
The CrowdStrike-triggered outage has reverberated across the aviation industry, prompting other airlines to reassess their IT systems and preparedness. The incident serves as a wake-up call, highlighting the need for industry-wide collaboration and information sharing to enhance cybersecurity resilience.
Industry associations and regulatory bodies may need to develop and enforce stricter guidelines for software updates and cybersecurity practices. Airlines, on their part, should advocate for transparency from software vendors regarding potential risks and the measures taken to mitigate them.
The incident also underscores the importance of training and preparedness. Airlines must ensure that their IT teams are well-equipped to handle emergencies and that there are clear protocols for communication and coordination during disruptions. Regular training exercises and simulations can help prepare staff to respond effectively to real-world incidents.
The Future of Airline IT Systems
As airlines continue to embrace digital transformation, the integration of advanced technologies such as artificial intelligence (AI), machine learning (ML), and the Internet of Things (IoT) will become more prevalent. These technologies offer significant benefits, including improved operational efficiency, enhanced customer experiences, and better predictive maintenance.
However, the increasing complexity of IT systems also raises new challenges. Ensuring the security and reliability of interconnected systems will require ongoing vigilance and investment. Airlines must adopt a proactive approach to cybersecurity, continuously monitoring for vulnerabilities and emerging threats.
The adoption of AI and ML can also aid in enhancing cybersecurity measures. These technologies can analyze vast amounts of data in real-time, identifying patterns and anomalies that may indicate potential security breaches. By leveraging AI and ML, airlines can improve their ability to detect and respond to cyber threats swiftly.
Conclusion
The recent IT outage at United Airlines, triggered by a faulty CrowdStrike software update, has highlighted the critical importance of robust IT systems and cybersecurity measures in the aviation industry. United’s quick recovery, thanks to its investments in technology and process improvements, underscores the value of proactive planning and resilience.
As the aviation industry continues to evolve, airlines must prioritize IT risk management and cybersecurity to safeguard their operations and maintain customer trust. The CrowdStrike incident serves as a reminder that in an increasingly digital world, the stakes are high, and the need for vigilance and preparedness is paramount.
United Airlines’ experience provides valuable lessons for the industry, emphasizing the need for meticulous software release processes, robust testing, and the importance of quick and effective response strategies. By learning from this incident and investing in resilient IT systems, airlines can better navigate the challenges of the digital age and ensure a smoother, more secure travel experience for their passengers.
The aviation industry must continue to collaborate, innovate, and invest in technology to meet the growing demands of the modern world. By doing so, airlines can enhance their operational efficiency, improve customer satisfaction, and build a more resilient and secure future.