Incident Commander

How to train and grow new Incident Commanders in your team

When a serious incident happens, it’s not just about CPU at 100%, 500 errors, or exploding queues.
There is always another factor at play: the team’s ability to stay clear-headed.
At the center of that ability is a key figure: the Incident Commander.

This article is for those who want to grow new Incident Commanders in their team by defining a training path that is clear, repeatable, and above all, human.


What “Incident Commander” really means

The Incident Commander is not “the strongest technical person in the room”.
They are the person responsible for the process, not necessarily for the technical solution.

In practice, they:

  • are the single point of reference during the incident;
  • bring order to the conversation, assign roles and steps;
  • communicate with people outside the incident (stakeholders, management, customer-facing roles);
  • keep the focus on “restoring the service”, not on “finding who’s to blame”.

A good way to define the role:

“The Incident Commander doesn’t need to know everything.
They just need to make sure the right people do the right things, in the right order, without being overwhelmed.”

This is the first concept to pass on to people you’re training:
you’re not the technical superhero, you’re the director of the operation.


How to handle pressure and uncertainty

A major incident brings pressure, noise, and compressed time.
The first skill to train is the ability to stay calm in uncertainty, without filling gaps with panic or rushed assumptions.

Practical training exercises

You can build training sessions that simulate exactly this:

  • Timed simulations

    • You create a scenario (e.g. checkout down, slow APIs, queue backlog).
    • You give the future Incident Commander 30–60 minutes to manage the flow, not to solve the technical problem.
    • You evaluate:
      • clarity of requests,
      • ability to stop the noise,
      • structure in how they gather information.
  • Guided uncertainty

    • During the simulation, you introduce partial and even conflicting information.
    • The goal is not to “guess right”, but to make explicit what is known and what is not:
      • “Right now we know this…”
      • “These parts are still hypotheses…”

Tools to support calm

Teach a few simple micro-techniques:

  • Verbal clarity
    Short sentences, clear subject, action verb:

    • “Mario, please check the logs of service X for the last 15 minutes.”
    • “Stop hypotheses for 2 minutes: let’s recap the facts.”
  • Regular checkpoints

    • Every 5–10 minutes:
      • “Let me summarize where we are…”
      • “This is our current hypothesis, this is the next step.”

Calm does not mean “not feeling the pressure”; it means not dumping that pressure on the team.


The role as a mission (not just a line on a CV)

If you want to grow credible Incident Commanders, you have to present the role as a mission of service, not as a power badge.

Key messages to future ICs

  • Responsibility towards the team

    • “Your job is to protect the focus of the people who are working.”
    • “You must create an environment where it’s easier to think than to panic.”
  • Responsibility towards the company

    • It’s not just about uptime:
      • there’s the reputation with customers,
      • the business’s trust in the tech team,
      • the ability to learn from the incident.
  • Responsibility towards the truth

    • The Incident Commander is responsible for surfacing facts, not for building a convenient narrative.
    • Even “we don’t know yet” is a valid output, if communicated well.

Turning the role into a growth path

  • Planned rotation:
    • instead of relying on a single “designated” Incident Commander, keep a list of colleagues in development and rotate the role among them.
  • Shadowing:
    • first as silent observers, then as co-ICs, finally in full autonomy.
  • Structured post-incident feedback:
    • 10 minutes specifically on “how the IC role was handled”, separate from the technical debrief.
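
The rotation and shadowing steps above can be tracked with very little tooling. The following is a minimal sketch, not a prescribed process: the stage names, the `pick_roles` helper, the rule that only autonomous people lead, and the roster names are all assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    """Shadowing stages, in the order described above."""
    OBSERVER = 1      # silent observer
    CO_IC = 2         # runs the incident together with an experienced IC
    AUTONOMOUS = 3    # runs incidents in full autonomy


def next_stage(stage: Stage) -> Stage:
    """Move a trainee one step forward; AUTONOMOUS is the final stage."""
    return Stage(min(stage.value + 1, Stage.AUTONOMOUS.value))


def pick_roles(roster: dict[str, Stage], incident_index: int) -> tuple[str, Optional[str]]:
    """Rotate the lead IC and the shadowing trainee across incidents.

    Assumption: only AUTONOMOUS people lead; everyone else takes turns shadowing.
    """
    leads = [name for name, stage in roster.items() if stage is Stage.AUTONOMOUS]
    learners = [name for name, stage in roster.items() if stage is not Stage.AUTONOMOUS]
    if not leads:
        raise ValueError("roster needs at least one autonomous Incident Commander")
    lead = leads[incident_index % len(leads)]
    learner = learners[incident_index % len(learners)] if learners else None
    return lead, learner


# Hypothetical roster, for illustration only.
roster = {
    "Mario": Stage.AUTONOMOUS,
    "Anna": Stage.CO_IC,
    "Luca": Stage.OBSERVER,
    "Sara": Stage.AUTONOMOUS,
}
for i in range(3):
    print(i, pick_roles(roster, i))
```

Promotion from one stage to the next is deliberately left as an explicit call to `next_stage`, so that it happens after the structured feedback session rather than automatically.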

Mindset, rituals, communication

A good Incident Commander does not appear out of nowhere:
they are built through a mindset, rituals, and a communication style that are trained and repeated.

Mindset: few ideas, but very clear

Things to (metaphorically) tattoo into the minds of your trainees:

  • “First restore, then understand.”

    • It’s not the time to refactor.
    • It’s not the time to change technology.
    • It is the time to get the service back up in the safest and fastest way possible.
  • “A clear decision is better than three half-decisions.”

    • Indecision is a decision too, often the worst one.
  • “No live blame-hunting.”

    • Blame is superficial, responsibility is systemic.

Operational rituals during the incident

You can codify actual rituals to teach and repeat in training:

  1. Incident kick-off

    • Explicitly identify the Incident Commander verbally (or via chat):
      • “From now on, the Incident Commander is X.”
    • Define the single communication channel (e.g. #incident-1234).
  2. Fact gathering

    • “For the next 3 minutes, everyone write down what you see: no hypotheses, just data.”
    • The IC summarizes out loud.
  3. Plan definition

    • “Here’s what we’ll do in the next 10 minutes: …”
    • Clear assignment:
      • person → task → estimated time.
  4. Cadenced updates

    • Every X minutes:
      • “Current status: …”
      • “New information: …”
      • “Updated plan: …”
  5. Provisional closure

    • Explicitly state when the incident is closed from an operational point of view:
      • “Service restored, but in degraded mode.”
      • “Incident closed, follow-up and root cause analysis tomorrow.”
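
For training sessions, it can help to give the five rituals above a concrete shape that trainees fill in as they go. The sketch below is one possible way to model that, under stated assumptions: the `Incident` and `Assignment` classes, their field names, the 10-minute cadence, and the example timestamp are hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Assignment:
    person: str
    task: str
    estimate_minutes: int


@dataclass
class Incident:
    commander: str                 # named explicitly at kick-off
    channel: str                   # the single communication channel
    opened_at: datetime
    update_every: timedelta        # the "every X minutes" cadence
    facts: list[str] = field(default_factory=list)
    plan: list[Assignment] = field(default_factory=list)
    updates: list[tuple[datetime, str]] = field(default_factory=list)
    closing_note: Optional[str] = None

    def kickoff_message(self) -> str:
        return (f"From now on, the Incident Commander is {self.commander}. "
                f"All communication goes through {self.channel}.")

    def record_fact(self, fact: str) -> None:
        """Fact gathering: data only, hypotheses are kept out of this list."""
        self.facts.append(fact)

    def assign(self, person: str, task: str, estimate_minutes: int) -> None:
        """Plan definition: person -> task -> estimated time."""
        self.plan.append(Assignment(person, task, estimate_minutes))

    def post_update(self, now: datetime, status: str, new_info: str, plan: str) -> str:
        """Cadenced update in the three-part shape used above."""
        text = f"Current status: {status}\nNew information: {new_info}\nUpdated plan: {plan}"
        self.updates.append((now, text))
        return text

    def next_update_due(self) -> datetime:
        """Next update is due one cadence interval after the last one (or after opening)."""
        last = self.updates[-1][0] if self.updates else self.opened_at
        return last + self.update_every

    def close(self, note: str) -> None:
        """Provisional closure: state explicitly in what condition the incident is closed."""
        self.closing_note = note


start = datetime(2024, 1, 1, 12, 0)  # arbitrary example timestamp
incident = Incident("X", "#incident-1234", start, timedelta(minutes=10))
print(incident.kickoff_message())
incident.assign("Mario", "check the logs of service X for the last 15 minutes", 15)
incident.post_update(start + timedelta(minutes=8), "checkout down", "none yet", "keep checking logs")
print("Next update due:", incident.next_update_due())
incident.close("Service restored, but in degraded mode.")
```

Keeping facts, plan, and updates in separate lists mirrors the separation the rituals ask for: what was observed, who is doing what, and what was communicated.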

External communication

Training must also cover how the Incident Commander speaks to the business:

  • Clear, non-technical language:
    • “A part of the system that handles X is not working correctly.”
  • Focus on:
    • impact,
    • estimated mitigation time,
    • actions in progress.
  • No reckless promises:
    • “We are working to restore the service; the next update will be in X minutes.”
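
The points above can be turned into a small message template that trainees practice with. This is only a sketch: the `stakeholder_update` function, its parameters, and the example action are assumptions for illustration. When no mitigation estimate exists, the template says so instead of inventing one.

```python
from typing import Optional


def stakeholder_update(
    impact: str,
    actions: list[str],
    next_update_minutes: int,
    estimated_mitigation: Optional[str] = None,
) -> str:
    """Build a non-technical update: impact, mitigation estimate, actions, next update."""
    lines = [f"Impact: {impact}"]
    if estimated_mitigation:
        lines.append(f"Estimated mitigation time: {estimated_mitigation}")
    else:
        # No reckless promises: an honest "not known yet" beats a guessed deadline.
        lines.append("Estimated mitigation time: not known yet")
    lines.append("Actions in progress: " + "; ".join(actions))
    lines.append(
        "We are working to restore the service; "
        f"the next update will be in {next_update_minutes} minutes."
    )
    return "\n".join(lines)


print(stakeholder_update(
    impact="A part of the system that handles X is not working correctly.",
    actions=["restarting the affected service"],  # hypothetical action
    next_update_minutes=30,
))
```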

Conclusion: a culture of block-based thinking

At the foundation of a good Incident Commander there is a way of thinking that you can teach and nurture:
the culture of block-based thinking.

In practice, it means:

  • breaking complex situations into manageable blocks;
  • moving forward through explicit steps instead of chaotic intuition;
  • reasoning through:
    • data gathering,
    • decision,
    • action,
    • verification,
    • communication.

Training new Incident Commanders is exactly this:

  • providing a shared language (roles, rituals, steps);
  • creating psychological safety (it’s okay not to know, it’s okay to ask for help);
  • turning every incident into a gym for clarity, not only into a fire drill.

If you start treating the Incident Commander role as a continuous learning path – and not as an occasional hat to wear “when it’s your turn” – you’ll see not only the quality of incident management rise, but also the overall maturity of your engineering team.