Service Level Agreement

There are many misperceptions around Service Level Agreements (SLAs). Common points of confusion include:

  • When do you need one?
  • Applicability or practicality
  • Impact on effect or outcome
  • Avoidance of problems or downtime
  • Time to resolution when an issue does occur

Two major types of SLAs

  • Uptime or availability of systems (Organizations must fund the infrastructure and proactive management of infrastructure to a level that matches their requirements)
  • Response time of personnel (not time to problem resolution)

Where Did SLAs Come From?

The desire for a Service Level Agreement typically comes from the parties to a contract wanting accountability and contractually enforceable penalties for violations of the SLA. This may be appropriate when millions of dollars are at stake, but the cost/benefit of this kind of overhead and legalese is rarely viable in the SMB market. What is an SMB? Any organization with fewer than 500 staff is typically considered an SMB.

Organizations with 5,000 or more employees tend to have enough budget to staff IT across three shifts (response time) and to build the infrastructure resiliency that provides high uptime. Those organizations also have enough budget to self-insure for outages by having spares or inherent automatic failover resiliency/redundancy in systems.

SMB organizations can achieve high levels of resiliency and can have 4-hour response time warranty contracts on infrastructure components. It is important to distinguish between the achievable and necessary resiliency for infrastructure versus the perceived necessity of a one-hour response time for support.

Examples

A concrete example of matching your budget to your uptime requirements: if you have no tolerance for laptop or PC outages, you need to have spare equipment on hand that is preconfigured and fully maintained all the time. No warranty option or SLA is going to satisfy the requirement to continue business operations when something goes wrong with a laptop or desktop computer. The idea that you can run over to Best Buy, procure a computer, and get it set up in short order fails on two levels. First, the computer cannot be fully set up and ready to use in short order. Second, systems from Best Buy do not meet the procurement policy standards.

If your business will suffer an adverse financial impact of $70,000 per day when a workload is down, then everything that supports that workload needs to be designed for the highest uptime possible within your budget. Or you need to adjust your budget to match the prevention of that kind of financial loss. The workload and everything it depends on must be competently and proactively managed and monitored. That does not mean help desk staff. That means engineering staff. Through the proper proactive application of engineering staff, the desired uptime is achieved, and no one-hour response time is required.

Evaluate the uptime requirements by time of day. What if you need that workload to be up from 6 AM to 6 PM Monday through Friday and until noon on Saturday? Great, so maintenance windows exist. Expect that your maintenance costs are going to include the server personnel working evenings and weekends. There is a stark difference between maintenance and proactive management being done during planned times and you having the ability to call your IT support 24x7 or receive a one-hour response time.
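The schedule above can be turned into a quick calculation of how many hours per week the workload must be up and how many remain for planned maintenance. This is only a sketch: the function name and parameters are made up for illustration, and it assumes Saturday coverage starts at 6 AM like the weekdays, which the example does not actually state.

```python
# Sketch: split a week into required-uptime hours and maintenance hours.
# Assumption (not stated in the article): Saturday coverage begins at 6 AM.
HOURS_IN_WEEK = 7 * 24


def weekly_windows(weekday_start=6, weekday_end=18,
                   saturday_start=6, saturday_end=12):
    """Return (required_hours, maintenance_hours) for one week."""
    weekday_hours = (weekday_end - weekday_start) * 5  # Monday through Friday
    saturday_hours = saturday_end - saturday_start     # Saturday until noon
    required = weekday_hours + saturday_hours
    return required, HOURS_IN_WEEK - required


required, maintenance = weekly_windows()
print(f"Required uptime: {required} hours per week")
print(f"Available for planned maintenance: {maintenance} hours per week")
```

Under these assumptions the workload needs 66 hours of uptime per week, which leaves 102 hours for planned evening and weekend maintenance.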

A very critical point I am trying to convey is that there are rarely cost savings that come from having help desk staff who will respond to your contacts within an hour. If you need to have a problem worked on in an hour, what you really need is to prevent the problems. We achieve high uptimes by preventing problems.

A one-hour response SLA comes with other requirements as well. It means that your personnel are not tampering with the systems and not deploying technology on their own. Server and network personnel responding to support requests within an hour do not compensate for poor planning or for failing to involve the right personnel beforehand. Issues caused by a lack of communication or a lack of proper planning are common in co-managed IT scenarios. An IT service provider (ITSP) would be foolish to provide a one-hour response time SLA for these types of support requests. Doing so pulls technical resources away from clients who behave well and redirects them to crisis management for client accounts where poor planning and poor communication occur. This is not fair to clients who behave well. An ITSP that accommodates poor behavior cannot ensure it has the availability to properly serve the clients who behave well.

Practical resiliency and redundancy

Reliable servers with good uptime are achievable without having everything redundant. Redundant internet connections and network equipment are often quite affordable. Deciding what is cost-effective and adequately redundant or resilient is an art form, best left to your consolidated security, infrastructure, server, and network architects. There are a lot of opportunities for SMBs to self-insure by buying correctly and using the correct designs in infrastructure. QPC Security provides this service for clients. It is critical that services be classified and that the cost of downtime be calculated per service. Then each service is analyzed for the cost of providing higher levels of resiliency in the design. By doing procurement, engineering, implementation, and proactive management properly, high uptime is typically available and achievable for SMBs. Without proper procurement, the kind of uptime and resiliency you need is not achievable. This is why having a QPC-informed and CFO-enforced procurement policy is critical to the success of client organizations.
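As a rough illustration of the per-service analysis described above, the sketch below compares the downtime loss avoided by a more resilient design against what that design costs each year. It is not QPC Security's actual method. The service names, outage estimates, and resiliency costs are hypothetical; only the $70,000 per day figure comes from this article.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    downtime_cost_per_day: float      # business loss per day of outage
    outage_days_per_year: float       # hypothetical estimate, current design
    outage_days_after_upgrade: float  # hypothetical estimate, resilient design
    resiliency_cost_per_year: float   # hypothetical annual cost of the upgrade

    def expected_annual_loss(self) -> float:
        """Expected yearly loss if nothing changes."""
        return self.downtime_cost_per_day * self.outage_days_per_year

    def annual_net_benefit(self) -> float:
        """Avoided downtime loss minus the cost of the extra resiliency."""
        avoided_days = self.outage_days_per_year - self.outage_days_after_upgrade
        return self.downtime_cost_per_day * avoided_days - self.resiliency_cost_per_year


# Hypothetical service catalog; only the $70,000/day figure is from the article.
services = [
    Service("order processing", 70_000, 2.0, 0.5, 40_000),
    Service("internal wiki", 500, 2.0, 0.5, 10_000),
]

# Rank services by how much the resiliency investment pays back.
for s in sorted(services, key=Service.annual_net_benefit, reverse=True):
    verdict = "worth funding" if s.annual_net_benefit() > 0 else "not worth funding"
    print(f"{s.name}: net benefit ${s.annual_net_benefit():,.0f} per year ({verdict})")
```

With these made-up inputs, the high-value workload easily justifies the added resiliency while the low-value one does not, which is the point of classifying services before spending.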

The Truth Behind Service Level Agreements

Maybe all you want is a one-hour response SLA. Is that during business hours or 24x7? The two are wildly different, and neither is likely even if you have internal IT staff. Supply chain counterparty factors must also be considered in SLAs. If the upstream provider has a 4-hour response time contract (not 4 hours to resolution), then there is no way that your IT service provider or your internal IT employee can provide a one-hour metric. A lot of time could be spent on the SLA topic when better outcomes could be achieved by designing more resiliency and redundancy into the systems your business relies upon.

So how do some IT Service Providers support their five 9s guarantee or their one-hour response time SLA? One way is short-cut issue resolution. A good analogy here is a nail in a tire causing air pressure loss. Sure, you can stop every morning and evening at the local gas station and put air in the tire. In other words, you can put a Band-Aid on it. Or you can get the nail pulled and the tire repaired. IT Service Providers can do the same with your issues. If they are constantly pressured to maintain that SLA, and you have an issue that will take 6 hours to fix for the long term or 5 minutes to manage with a quick workaround, which way do you think they will go? It is also relevant that if the people you have working on your systems are level one or level two techs, they are typically not capable of seeing or implementing the long-term fix. In your effort to have a contractually guaranteed response time, you may be requiring that your issues be handled by low-level staff as a rule.
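To put a "five 9s" guarantee in perspective, 99.999% availability leaves very little room for downtime. The sketch below converts an availability percentage into allowed downtime per year. It assumes a 365-day year and counts every minute of the year as in scope, which a real contract may not do.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Minutes of downtime per year permitted at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:,.1f} minutes of downtime per year")
```

Five 9s works out to a little over five minutes of downtime per year, so any fix that needs real downtime to implement exhausts the budget immediately.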

Response Time Versus Time to Resolution

It is also important to note that response time is not time to resolution. An ITSP may tell you they have 24x7 support and promise you an SLA on response time, but that is not a promised time to resolution. When a client requires a service level agreement and wants a guarantee of response and resolution time, this cannot be achieved without high-quality engineering staff. A well-staffed answering service can ‘respond’, meaning the ITSP has met their SLA. However, that does not reduce the time to problem resolution. It does increase the cost of the service, because the ITSP has now added another ‘service’ to meet their SLA. Your best bet for getting the problem solved is for first contact to be with the highest-level technical person feasible. That person is best suited to solve whatever the problem is, regardless of complexity. This is how QPC Security handles it.

The Outsourced Help Desk Strategy

Another way for an ITSP to meet an SLA is to outsource their Help Desk functions to large, outsourced Help Desk providers, many of which are not even US-based, to deliver on that 24/7/365 response guarantee. The biggest issue with this approach is security. Many of these firms have high turnover. Couple that with the fact that administrative credentials to your environment must be shared with these third-party firms for them to be able to support your users. Now not only does your ITSP have administrative access to your systems; so do many, many other people with whom you do not even have a contract or legal terms holding them responsible if credentials are stolen or misused. ITPro published a great article on this subject, How Help Desk Outsourcing Undermines Your Security. I also took a deeper dive into this topic on my Breakfast Bytes podcast.

Do You Really Need an SLA?

If a company does not have that kind of high availability from their own employees, how much availability is actually required? Has response time for highly urgent matters been a problem with your service provider? If not, then there is no problem. If there is, then do you have the right monthly volume relationship in place with them? An ITSP that does not have a significant monthly relationship with you is not able to provide the kind of response time to you that they can provide to another customer who commits to them on a monthly basis as discussed in the next section.

Another way to look at the perceived need for an SLA is based upon potential monetary loss if there is downtime. If you have workloads where you will lose $70,000/day if you are down, then those workloads must have adequate uptime, resiliency, redundancy, and competent support. That does not mean you need an SLA. It means you need to invest in the proper level of proactive self-insurance to achieve those uptimes, which comes from a quality engineered design and excellent proactive management. What you need is a business partner who properly responds to issues based upon a priority level, and who has the technical excellence in their staff to engineer, implement, and maintain the systems to deliver the desired outcomes.

Change Agreement to Average

For the reasons provided above, QPC Security does not provide Service Level Agreements. Instead, we provide Service Level Averages. Managed Service clients will receive a response to their critical support request (assuming it is submitted through proper channels), on average, in an hour or less during business hours. Requests for support, service, and projects are handled in accordance with our prioritization policy.
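The difference between an average and a per-request guarantee can be shown with a few numbers. The response times below are hypothetical, not QPC Security data. They only illustrate that an average of an hour or less can hold even when a single request takes longer than an hour.

```python
from statistics import mean

# Hypothetical response times, in minutes, for five critical requests
# submitted through proper channels during business hours.
response_minutes = [12, 35, 48, 95, 20]

average = mean(response_minutes)
slowest = max(response_minutes)

print(f"Average response: {average} minutes")
print(f"Slowest response: {slowest} minutes")
print(f"Average is within one hour: {average <= 60}")
print(f"Every request was within one hour: {slowest <= 60}")
```

Here the average is 42 minutes even though one request took 95 minutes, which is why an average is a statement about overall service rather than a promise about any individual request.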

Breaking this down further, Service Level Averages are provided to clients with whom we have a monthly Managed Service contract in place. Why is this an important distinction? It is quite simple. QPC Security cannot effectively respond to something we do not fully manage. Imagine your business has a security system that you purchased off the internet and installed on your own to save money. It’s Christmas Eve and you get an alert that a bad sensor is triggering on your main business entry door. No security company is under contract to help you with these types of issues. You start calling all the local companies that you can find in a Google search. None of them know your specific system, nor do they have remote access to try to troubleshoot the problem. How successful do you think you will be in getting any company, on Christmas Eve, to send someone to your business to fix the issue? It is the same with your IT infrastructure. If we do not already monitor and manage it, we cannot commit to responding to an issue with it.

What about after-hours emergencies? We absolutely have a way for our Managed Service clients to reach us if needed. I emphasize the ‘if needed’ component because in the last 16 years, we have had a total of THREE clients contact us outside of business hours to address a critical issue. This is possible because we proactively manage and maintain our clients’ environments so that they experience consistent uptime and network stability. We identify and manage issues before they become a problem and cause service disruption. This raises the question: are you asking for a Service Level Agreement because you think you need it? Based upon our experience, you do not.

In summary, if you are making business decisions based solely upon a contractual agreement which guarantees a ‘response’ within a certain time period, then QPC Security is not the right fit for you. As this article mentions, anyone, including a 24/7 answering service, can ‘respond’ to you. What you are really looking for is a guaranteed resolution time, and no one can guarantee you a resolution time. What you need are resilient systems with high uptime, and to get this, you need to invest in the infrastructure to support it. To learn more and discuss how QPC Security can effectively manage and maintain your IT infrastructure, contact us today at 262-553-6510 or by visiting qpcsecurity.com.