<?xml version="1.0" encoding="UTF-8"?> <collection> <dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:invenio="http://invenio-software.org/elements/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd"><dc:language>eng</dc:language><dc:creator>Henschel, Jack</dc:creator><dc:creator>Borges Aurindo Barros, Francisco</dc:creator><dc:title>When One Line Took Thousands of Websites Offline</dc:title><dc:title>SREcon EMEA 2023</dc:title><dc:subject>Talk</dc:subject><dc:identifier>IT-TALK-2012-008</dc:identifier><dc:description>This talk describes an incident where an innocuous change in a configuration management system caused a highly visible unavailability of thousands of websites, which was followed by an intense recovery procedure. The talk covers the part of the infrastructure that prevented more widespread damage, the lessons learned (in terms of infrastructure design and operational procedures), as well as significant improvements that have been implemented since then. All of this happened on Kubernetes infrastructure, so the talk dives into the topics of Kubernetes operators, automation, manual intervention, configuration management and backups.</dc:description><dc:publisher/><dc:date>2023</dc:date><dc:source>http://cds.cern.ch/record/2875365</dc:source><dc:identifier>http://cds.cern.ch/record/2875365</dc:identifier><dc:identifier>oai:cds.cern.ch:2875365</dc:identifier><invenio:conference.place>Dublin, Ireland</invenio:conference.place><invenio:conference.dates>10-12 Oct 2023</invenio:conference.dates><invenio:conference.contact-email>jack.henschel@cern.ch</invenio:conference.contact-email></dc:dc> </collection>