Introducing OpRes – Our Logical Architecture Overview

Drafted by Ben Saunders: OpRes Founder

Roughly a 10-minute read

If you have read our blog, 8 Reasons Why You Need OpRes in Your Resilience Program, then you should already have a sound understanding of how OpRes is stitched together! Earlier this week, we also pushed an updated logical architecture diagram to our Product Overview page. As we are in the midst of development, things continue to evolve as we work with our founding design partners. However, our core value proposition and the unique functions within OpRes remain constant.

Over the course of this blog, we will provide an overview of how OpRes is architected at a high level, with some initial insights into what each core element of the solution does and why we believe these are important features for firms as they respond to the regulatory policies set by various bodies around the world. We will also share some early work-in-progress designs and screens that we are constantly evolving as part of our development process.

As a starter for ten, we have re-posted our logical architecture diagram for you below.

The OpRes Logical Architecture

OpRes is broken down into what we call “Tiers”. Tiers are essentially a collection of features and functions that allow users to maximise their return on investment with OpRes. We have broken down each of the tiers below and briefly explained what they do.

Tier 1 - Real-Time Data Ingestion & Analysis of Trusted Data Sources:

Based on our experience, many firms already have a lot of information about their business services and the suppliers/systems that underpin them. However, this information is often stored in multiple systems of record, databases, or static documents. In order to get information into OpRes and aggregate these sources, we use a combination of techniques. Let me break down each of the sub-components within this tier.

1a) AI & ML Driven Ingestion:

By leveraging higher-level services from our chosen cloud service provider, we will be able to read, interpret and identify keywords from static documents like supplier contracts and internal risk assessments. In doing so, we will be able to extract key data points that allow firms to quickly capture information like Service Level Agreements, Recovery Time Objectives, Recovery Point Objectives, and Incident Management obligations for their important business services, technology systems, and suppliers.

1b) Rich UI Administration & API Integration:

We have invested significant time and effort in designing a rich user experience for OpRes. In doing so, we have created the ability for users to add new business service and supplier records. This allows users to correlate multiple data sources by mapping their important business services and, in turn, to generate insights when impact tolerances are matched, met or exceeded. (We’ll expand on what we mean by these three conditions later in this blog.)

Adding a New Business Service in OpRes

Adding a New Supplier Record in OpRes

1c) Bulk Data Upload & Amendment:

We recognise that users might want to administer multiple business services or supplier/system records in parallel. To do this swiftly, users can perform a bulk upload of a .CSV file to create multiple new business services or suppliers in OpRes. Alternatively, they can use the bulk upload option to amend multiple business service or supplier/system records.
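
As a rough illustration of the bulk upload flow described above, the sketch below parses a CSV file and splits its rows into records to create and records to amend. The column names (`record_id`, `name`, `record_type`) and the rule that a row carrying a `record_id` is an amendment are assumptions made for this example. They are not the actual OpRes file format.

```python
import csv
import io

REQUIRED_COLUMNS = {"name", "record_type"}  # hypothetical column names


def parse_bulk_upload(csv_text):
    """Split CSV rows into new records and amendments.

    Assumption: a row with a non-empty `record_id` amends an existing
    record; a row without one creates a new record.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")

    creates, amends = [], []
    for row in reader:
        if (row.get("record_id") or "").strip():
            amends.append(row)
        else:
            creates.append(row)
    return creates, amends


# Illustrative data only.
sample = """record_id,name,record_type
,Online Payments,business_service
SUP-001,Acme Hosting,supplier
"""
creates, amends = parse_bulk_upload(sample)
print(len(creates), "to create;", len(amends), "to amend")
```

Validating the header row before touching any records means a malformed file can be rejected as a whole, rather than being partially applied.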

Administering Multiple Record Changes in OpRes

Tier 2 - Workflow, Rules Engine & Scenario Testing:

So, we’ve started to get data into OpRes... now what?

Tier 2 of the solution is where the vast majority of our business logic and heavy lifting takes place. In order to help firms act upon the insights and gaps surfaced to them, we need to define a set of rules and policies that create an “if this, then that” decision tree. This consists of:

2a) Impact Tolerance - Rules Engine: This is where customers define their impact tolerances across the following parameters for their important business services:

  • Service Level Agreement

  • Service Level Objective

  • Recovery Time Objective

  • Recovery Point Objective

  • Incident Notification Windows (Severity 1-4)

  • Incident Restoration Windows (Severity 1-4)

With OpRes, we use a Likert scale model to indicate a firm’s risk appetite, ranging from Very Low to Very High Risk. In parallel, users also set a percentage or time-based measurement to indicate the level of tolerance they are prepared to accept in the event of a system/supplier disruption causing intolerable harm to normal operations. Users can also document their justifications for setting these impact tolerances and use them as evidence when working with regulators and compliance teams.

Setting Impact Tolerances in OpRes

2b) Audit Tracking, Automated Workflows & Notifications:

When a firm sets its impact tolerances for a business service, we are able to correlate these with the end-to-end business service map it will have created as part of the Business Service Record Creation process. From here, OpRes is able to identify whether a system or supplier falls into one of the following categories using our workflow automation procedures.

Matches Impact Tolerances: In the event that a supplier or technology system directly matches the impact tolerances set for the business service line, no follow-on action is processed, and the impact tolerance is marked as compliant.

Meets Impact Tolerances: When an impact tolerance is not directly matched, an amber condition is instigated. A Resilience Gap ticket is automatically created and assigned an impact level that reflects the size of the deviation, based on the tolerance levels set by the user. Think of a Resilience Gap much as you would an Incident Record in ServiceNow or a software defect ticket in Jira. It is an audit record to track and remediate problems across important business services.

We created this “middle ground” condition as some firms may be willing to accept small deviations from their impact tolerances but will need to demonstrate to regulators that they have identified the gap and have chosen to accept it as a known risk. Alternatively, they can engage with suppliers/system owners to further explore potential changes that need to be executed to bring the tolerance levels into acceptable thresholds.

Exceeds Impact Tolerances: When an impact tolerance is exceeded, OpRes tags it with a red condition. Once again, a Resilience Gap ticket is automatically created and users can follow the same administration steps as in the example above.
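
To make the three conditions above more concrete, here is a minimal sketch of how a rules check could classify a supplier against an impact tolerance. It assumes a time-based, lower-is-better metric such as Recovery Time Objective in minutes, and a user-set percentage that defines the amber “middle ground”. The class, field and function names are hypothetical and do not describe the actual OpRes rules engine.

```python
from dataclasses import dataclass


@dataclass
class ImpactTolerance:
    metric: str                    # e.g. "RTO"
    target_minutes: float          # tolerance set for the business service
    accepted_deviation_pct: float  # hypothetical "middle ground" allowance


def classify(tolerance, supplier_minutes):
    """Return (condition, needs_resilience_gap) for a time-based metric.

    Assumption: lower is better, so a supplier value at or below the
    target is treated as matching the tolerance.
    """
    if supplier_minutes <= tolerance.target_minutes:
        return "matches", False   # compliant, no follow-on action
    amber_limit = tolerance.target_minutes * (1 + tolerance.accepted_deviation_pct / 100)
    if supplier_minutes <= amber_limit:
        return "meets", True      # amber, Resilience Gap raised
    return "exceeds", True        # red, Resilience Gap raised


# Illustrative values: a 4-hour RTO with a 25% accepted deviation.
rto = ImpactTolerance(metric="RTO", target_minutes=240, accepted_deviation_pct=25)
for value in (240, 270, 480):
    print(value, classify(rto, value))
```

In this sketch both the amber and red outcomes raise a Resilience Gap, which mirrors the text: the difference between them is the condition recorded and how the firm chooses to respond.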

Impact Tolerance Visualisation in OpRes

Open Resilience Gaps Identified in OpRes

Based on our experience of working with large, regulated enterprises, we recognise that things change... a lot. As such, we are also creating an automated workflow process to notify users when resilience gaps are identified, updated, or closed. Furthermore, users can subscribe to updates about specific suppliers or business services that they have a particular interest in. We intend to distribute these updates through multiple channels, namely email and real-time collaboration platforms (e.g., Slack, Microsoft Teams).

Example Notification & Update in OpRes

In addition to the automated notifications highlighted above, OpRes will also contain workflow automation steps that allow users to capture and record important information about their suppliers at predefined time-points. As an example, users will be able to store documents that contain evidence of ISO, SOX or PCI compliance in addition to penetration testing reports. Users will then be able to set reminders and notifications as to when a document needs updating and can instigate workflows to engage with their suppliers or internal compliance teams.
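
The reminder logic described above could look something like the following sketch, which flags evidence documents that expire within a notice period. The tuple shape, the 30-day default and the sample documents are assumptions for illustration only.

```python
from datetime import date, timedelta


def documents_due_for_review(documents, today, notice_days=30):
    """Return names of evidence documents expiring within `notice_days`.

    `documents` is a list of (name, expiry_date) tuples; both the shape
    and the 30-day default are assumptions for this sketch.
    """
    cutoff = today + timedelta(days=notice_days)
    return [name for name, expiry in documents if expiry <= cutoff]


# Illustrative data only.
evidence = [
    ("Supplier A - ISO certificate", date(2025, 1, 20)),
    ("Supplier A - penetration test report", date(2025, 6, 30)),
]
due = documents_due_for_review(evidence, today=date(2025, 1, 1))
print(due)
```

A list like `due` could then feed whichever workflow engages the supplier or the internal compliance team.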

2c) Impact Tolerance – Scenario Testing:

This component of OpRes helps firms to perform lightweight impact simulations of their important business services. To do so, we take the end-to-end business service maps and correlate these with:

  • Supplier & system data points

  • Business Management Information (e.g., active customers, revenue generated etc)

  • Historical Incident Management Data (e.g., severity & mean time to restore etc)

  • Key Non-Functional Data Points (e.g., total transactions per calendar month, per day, per second etc)

By cross-referencing these data points, users can simulate disruptions to a single supplier or to multiple suppliers, whilst identifying the potential impact on customers over a defined period of time.
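
As a simplified illustration of this kind of cross-referencing, the sketch below takes a business service map, some management information and a list of disrupted suppliers, and estimates the customers and transactions affected over an outage window. The data shapes, the figures and the “fully unavailable for the whole outage” rule are all assumptions made for the example.

```python
def estimate_impact(services, disrupted_suppliers, outage_hours):
    """Estimate customers and transactions affected by a supplier outage.

    Assumption: a service is fully unavailable for the whole outage if it
    depends on any disrupted supplier. Real scenarios would be more nuanced.
    """
    results = {}
    for name, svc in services.items():
        if set(svc["suppliers"]) & set(disrupted_suppliers):
            results[name] = {
                "customers_affected": svc["active_customers"],
                "transactions_missed": svc["transactions_per_hour"] * outage_hours,
            }
    return results


# Hypothetical business service map and management information.
services = {
    "Online Payments": {
        "suppliers": ["Acme Hosting", "PayGate"],
        "active_customers": 50_000,
        "transactions_per_hour": 1_200,
    },
    "Mortgage Applications": {
        "suppliers": ["DocuStore"],
        "active_customers": 8_000,
        "transactions_per_hour": 40,
    },
}
impact = estimate_impact(services, ["Acme Hosting"], outage_hours=4)
print(impact)
```

Passing more than one supplier name in the second argument models a disruption to multiple suppliers at once.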

We’ve opted to build this lightweight method of conducting a scenario test so that firms can perform a “dress rehearsal” of their testing plans. We recognise that building end-to-end “like production” environments can be a costly undertaking. As such, we believe that firms can use our Scenario Testing tool to better understand where to invest in remediating their technology stacks. By testing technology disruption scenarios much sooner, firms can also use these insights as early evidence for regulators.

Scenario Testing Business Management Information in OpRes

Impact Analysis - Scenario Testing Outputs in OpRes


Tier 3 - Centralised Dashboards & Reporting

So, you’ve got data into OpRes, you’ve been able to set impact tolerances, and you are now leveraging the workflow automation elements the tool has to offer. From here, users can start to use our rich user interface to act on real-time insights.

Tier 3 of OpRes provides a number of views and screens so that firms can analyse their business services across a number of vectors and conditions. In doing so, we enable firms to analyse data sets on a global basis, whilst users can also segment their data to surface insights on core geographical regions, sub-regions, countries, brands and products. Much like popular CRMs on the market, we are enabling features so that users can save these customised views and auto-generate reports to be shared with both internal and external stakeholders.

OpRes Resilience Hub Dashboard - Regional Filter

Our intention is not to build another monitoring platform. We believe firms have spent significant time and money implementing sophisticated Network Operations Centres with highly tailored monitoring and alerting systems in place. As such, we intend to give users the ability to plug into and surface real-time monitoring insights using our extendable API design, so that popular systems like ELK, Splunk and cloud-native alerting services can be aggregated to demonstrate conformance with, or deviations from, impact tolerances in real time.

Business Service Overview Screen in OpRes

Users can also search for an individual business service or supplier/system record and quickly identify the total number of active resilience gaps that exist across the entry. Furthermore, users will be able to expand the dependency maps for their important business services and visualise relationships between internal technology teams and their 3rd & 4th party suppliers. We are also creating the ability for users to understand the aggregated resilience posture of a business service or supplier, based on the impact tolerances that are defined within our rules engine.
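
One way to think about expanding a dependency map is as a breadth-first walk outward from the business service. The sketch below is a hedged illustration of that idea. The map structure and the names in it are hypothetical, and OpRes may model these relationships quite differently.

```python
from collections import deque


def supplier_tiers(dependencies, service):
    """Breadth-first walk of a dependency map.

    Returns {dependency: depth}, where depth 1 is a direct dependency of
    the service. Mapping depth to "3rd party" or "4th party" is left to the
    caller, as it depends on how internal teams are modelled.
    """
    depths = {}
    queue = deque([(service, 0)])
    while queue:
        node, depth = queue.popleft()
        for dep in dependencies.get(node, []):
            if dep not in depths and dep != service:
                depths[dep] = depth + 1
                queue.append((dep, depth + 1))
    return depths


# Hypothetical map: service -> internal team -> supplier -> its supplier.
dependency_map = {
    "Online Payments": ["Payments Platform Team"],
    "Payments Platform Team": ["Acme Hosting"],
    "Acme Hosting": ["DataCentre Co"],
}
tiers = supplier_tiers(dependency_map, "Online Payments")
print(tiers)
```

Recording the depth at which each dependency is first reached keeps the walk safe against circular references in the map.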

Business Service Supplier Breakdown Screen in OpRes

In addition, firms are able to ascertain where workloads are distributed across public and private cloud hosting providers. OpRes also allows users to create a register of their assets and the consumption models through which they procure technology services, whether these be Infrastructure as a Service (IaaS), Platform as a Service (PaaS) or Software as a Service (SaaS).

Finally, as part of our onboarding process, we ask users to outline the types of products that are sold across their firm, as well as the channels through which the products are delivered. This allows us to tag and categorise clusters of business services that are aligned to a specific line of business or product (e.g., home insurance, retail current account). In turn, we can then cross-reference product sets across specific brands, countries and regions, allowing firms to identify which parts of their organisation are surfacing a larger proportion of resilience gaps.
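
The tagging and cross-referencing described above can be pictured as a simple group-and-count over resilience gaps. The sketch below assumes each gap record carries `product`, `region` and `open` fields. These field names and the sample records are illustrative only.

```python
from collections import Counter


def gaps_by(gaps, *keys):
    """Count open resilience gaps grouped by the given tag names."""
    return Counter(tuple(gap[k] for k in keys) for gap in gaps if gap["open"])


# Illustrative records only.
gaps = [
    {"service": "Online Payments", "product": "retail current account", "region": "EMEA", "open": True},
    {"service": "Card Issuing", "product": "retail current account", "region": "EMEA", "open": True},
    {"service": "Claims Handling", "product": "home insurance", "region": "APAC", "open": False},
]
counts = gaps_by(gaps, "region", "product")
print(counts.most_common(1))
```

Changing the keys passed to `gaps_by` (for example, brand or country) would give the other cuts of the data mentioned in the text.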

Indeed, our intent here is to give firms more information about where to invest to bolster their operational resilience capabilities. For example, a specific cluster of important business services may be carrying a significant number of resilience gaps. However, based on the number of active customers, market penetration, and future business directives, the firm may opt to invest “just enough” capital to optimise its ability to meet impact tolerance targets, enabling it to further bolster the core areas of its business focus and its customers’ needs.

In Closing:

Over the course of this blog, we have broken down OpRes’s logical architecture. We have also provided some rationale as to why we are embedding certain features so that firms can show evidence of their efforts to comply with the various regulatory requirements regarding operational resilience. We have covered each tier of OpRes: how we get data into the solution, how users can maximise our automated workflow capabilities and, finally, the types of views and dashboards that firms can expect to have access to with our initial launch proposition.

Over the last few months, we have had the opportunity to work with a number of early-stage design partners, who have helped to shape and inform our product roadmap. We are continuously looking to conduct interviews and show & tells with industry peers who are as passionate about this subject area as we are at OpRes. If you, or your firm, are interested in learning more about the platform and want to share your candid feedback on our product’s development, then please do get in touch with us via email: hq@opres.uk

Stay tuned for more product updates and insights over the coming weeks. Thank you for reading! 

Ben

