
Hi,


We’re struggling to make some alterations to our existing PD configuration as we onboard some new engineers. I’m not sure whether we’re approaching this the wrong way, whether it’s a limitation of PagerDuty, or something else, so I would appreciate any thoughts and advice!


First a bit of background:


In our org, we have office hours and out-of-hours engineers.


We have two schedules, Daytime UK (9am-5pm Monday-Friday) and On Call (5pm-9am Monday-Friday, plus Friday 5pm – Monday 9am), both configured using time restrictions.


There’s an Escalation Policy named 24x7 which notifies the above two schedules.


The problem:


We’re introducing some new engineers in another region and they’ll cover alerts during their working day. This could be achieved either by renaming the Daytime UK schedule, adding layers to it and adjusting the time restrictions, or by adding a new schedule, Daytime US. However, for the foreseeable future, the new engineers would only be covering a subset of services.


So Services A, B and C would be supported by the Daytime UK, Daytime US and On Call schedules.


But Service D would be supported by only the Daytime UK and On Call schedules.


As it stands that’d leave Service D with a gap where alerts are sent to nobody between 5pm-midnight (UK time). Is there a way to avoid that gap without duplicating the On Call schedule?
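To make the gap concrete, here is a rough sketch that lists the uncovered hours for each group of services. The hour windows are assumptions for illustration: Daytime US is taken to cover 5pm-midnight UK time, and On Call is assumed to shrink to midnight-9am once Daytime US exists. It only models a single weekday, not weekends.

```python
def uncovered_hours(windows):
    """Return the hours (0-23, UK time) not covered by any (start, end) window.

    Each window covers start up to, but not including, end, and may wrap
    past midnight, e.g. (17, 0) means 5pm to midnight.
    """
    covered = set()
    for start, end in windows:
        hour = start
        while hour != end:
            covered.add(hour)
            hour = (hour + 1) % 24
    return [h for h in range(24) if h not in covered]


# Assumed weekday windows in UK time (hypothetical values for illustration).
DAYTIME_UK = (9, 17)
DAYTIME_US = (17, 0)
ON_CALL = (0, 9)

services_abc = uncovered_hours([DAYTIME_UK, DAYTIME_US, ON_CALL])
service_d = uncovered_hours([DAYTIME_UK, ON_CALL])

print("Services A, B, C uncovered hours:", services_abc)
print("Service D uncovered hours:", service_d)
```

With these assumed windows, Services A, B and C have no uncovered hours, while Service D is uncovered from 17:00 through 23:59.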


We could move Service D into its own Escalation Policy but that still doesn’t help without a schedule change.


It’d be nice to avoid duplicating the schedule. I can only see that causing future issues with keeping the copies in sync when it comes to overrides or other amendments.


I hope that all makes sense, would welcome any input!


Thanks!

Hello @alan rickman, I consulted the PagerDuty Technical Support team, and they suggested using custom API calls as the best approach to address scheduling needs for different services during specific time windows:




  • Run a cron process that changes the schedule used by a given Escalation Policy right at the handoff times (e.g., at 5pm EP X is updated to start assigning via Schedule Daytime US and EP Y starts assigning to Schedule On Call; later at midnight EP X is updated to start assigning via Schedule On Call also).



This could allow for managing as few as two schedules: a base schedule that has a layer for Daytime UK, a layer for Daytime US, a layer for On Call during the week and another layer for On Call at the weekend; and a second schedule that would be swapped in only for Service D (EP Y) at 5pm, then swapped out again at midnight and replaced with the base schedule.
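As a rough illustration of the cron approach, here is a minimal Python sketch that builds the update requests for each handoff. It assumes the PagerDuty REST API v2 endpoint `PUT /escalation_policies/{id}` with token authentication. All IDs are hypothetical placeholders, the 30 minute escalation delay is an assumption, and the policy is assumed to have a single rule (as far as I understand, sending `escalation_rules` replaces the existing rules). Please check the API reference for the exact required fields before relying on this. Only the two handoffs mentioned above are shown; handing back to the daytime schedule in the morning is not covered.

```python
import json
import urllib.request

API_BASE = "https://api.pagerduty.com"

# Hypothetical placeholder IDs: replace with real escalation policy and schedule IDs.
EP_X = "PXXXXXX"  # policy for Services A, B and C
EP_Y = "PYYYYYY"  # policy for Service D
SCHEDULE_DAYTIME_US = "PUSUSUS"
SCHEDULE_ON_CALL = "PONCALL"

# Which schedule each policy should point at after each handoff (UK time).
HANDOFFS = {
    "17:00": {EP_X: SCHEDULE_DAYTIME_US, EP_Y: SCHEDULE_ON_CALL},
    "00:00": {EP_X: SCHEDULE_ON_CALL},
}


def build_payload(schedule_id, delay_minutes=30):
    """Build a request body with one escalation rule targeting one schedule."""
    return {
        "escalation_policy": {
            "type": "escalation_policy",
            "escalation_rules": [
                {
                    "escalation_delay_in_minutes": delay_minutes,
                    "targets": [{"type": "schedule_reference", "id": schedule_id}],
                }
            ],
        }
    }


def build_request(api_token, policy_id, schedule_id):
    """Build (but do not send) the PUT request that repoints a policy."""
    return urllib.request.Request(
        f"{API_BASE}/escalation_policies/{policy_id}",
        data=json.dumps(build_payload(schedule_id)).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Token token={api_token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
            "Content-Type": "application/json",
        },
    )


def apply_handoff(api_token, handoff):
    """Send one update per policy for the given handoff key, e.g. "17:00"."""
    for policy_id, schedule_id in HANDOFFS[handoff].items():
        request = build_request(api_token, policy_id, schedule_id)
        with urllib.request.urlopen(request) as response:
            response.read()  # urlopen raises HTTPError on a non-2xx status
```

A cron job would then call `apply_handoff(token, "17:00")` at 5pm and `apply_handoff(token, "00:00")` at midnight, with cron running in the UK time zone so the handoffs follow daylight saving changes.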


=========================================================================================


A PagerDuty-native solution would be to introduce a 4th schedule, used for the chunk of time between 5pm and midnight for Service D, to determine who (or which team) should be notified during that period. Although it would be another schedule to manage, that is better than doubling the number of schedules.




In a sub-option of the above, if managing fewer schedules is the goal, it would also be possible to consolidate Daytime UK and On Call into one schedule (by means of layers); then you’d need two more schedules, one for EP X (Daytime US) and another for EP Y (New Schedule in the table above), leaving a total of 3 schedules to manage.


Hope this helps!


We find ourselves in a similar situation, with follow-the-sun 24x7 escalation policies, and each regional team able to support services in other regions only to varying degrees. We’d like to eventually have any region fully support services in any other region, but for now it’s on a best-efforts basis.


We accept that there will inevitably be cases where a responder, for whatever reason, is unable to address an incident, and ask responders to follow these guidelines in those cases:



  1. If you are notified of an incident, and unable to address it, do nothing (without acknowledging it) or escalate it (to the next rule in the EP).

  2. If you acknowledge an incident, but then realize you are unable to address it, unacknowledge it.


Our escalation timeouts are fairly short too, generally 5 minutes.


So for example, if the escalation policy notifies someone on the daytime-us schedule, and they don’t feel capable of addressing the incident, we would have them do nothing or escalate (next to the oncall-uk schedule).


24x7 escalation policy

|–> daytime-uk schedule

|–> daytime-us schedule

|–> oncall-uk schedule

(repeats)
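To show how that plays out in time, here is a small sketch of when each schedule would be notified if nobody acknowledges the incident. It is a simplified model rather than PagerDuty's actual logic: it assumes every rule uses the same 5 minute timeout and that each schedule has someone on call. The `loops` parameter is a hypothetical name for how many times the policy repeats.

```python
from itertools import cycle, islice

# Escalation rules in order, as in the policy above.
RULES = ["daytime-uk", "daytime-us", "oncall-uk"]
TIMEOUT_MINUTES = 5


def notification_timeline(rules, timeout, loops=1):
    """Return (minutes_after_trigger, schedule) pairs for an unacknowledged incident.

    The policy runs through all rules once, then repeats `loops` more times.
    """
    steps = islice(cycle(rules), len(rules) * (loops + 1))
    return [(index * timeout, name) for index, name in enumerate(steps)]


for minutes, schedule in notification_timeline(RULES, TIMEOUT_MINUTES):
    print(f"+{minutes:>2} min -> {schedule}")
```

Under these assumptions, a responder on daytime-us who does nothing delays the incident by 5 minutes at most before oncall-uk is notified, and escalating manually removes even that delay.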


That’s great thanks @xenda amici! We’ve been able to get something going with your suggestions.


Apologies for my delayed reply!


@alan rickman, thanks for following up! Happy to learn that those schedules are up and running 🏃‍♂️🏃‍♀️ - 24/7 and covering both US and UK time zones!

