IBM: Operations Analytics

Getting ahead of data center failures before they become outages.

My role: Design team lead responsible for coordinating sprint design delivery, partnering with the Offering Manager and Engineering to refine the backlog, and supporting UX design and validation through iterative feedback cycles

Scope: Product lead + engineering + researcher support | Sprint-based delivery | Log analysis workflow, scenario-based investigation paths, intervention actions and next-step clarity

Challenges: Complex technical domain, high-stakes operational workflows, limited time to go from concept to MVP, and the need to balance speed with credibility for expert IT users

Impact: Helped launch an MVP on IBM Bluemix by keeping design and delivery aligned to sprint cadence, validating direction with users, and shaping an experience that enabled earlier detection of anomalies and proactive intervention

Skills: Design leadership, sprint-based delivery coordination, backlog refinement with stakeholders, UX in high-stakes technical domains, user validation, MVP shaping for credibility and speed

Operations Analytics helped system administrators detect abnormal behavior in data center logs early enough to prevent outages. Users uploaded log files and used the tool to spot patterns that pointed to failing hardware or performance bottlenecks.

Our primary user was Rich, a system administrator tasked with maintaining seamless operations across the infrastructure.

As Design Lead, I focused on team execution and decision-making, keeping design aligned with sprint delivery and user validation:

Backlog Management: Partnering with the Offering Manager and Development team to refine and prioritize user stories
Sprint Planning and Delivery: Ensuring designs were assigned to the team, completed within the two-week sprint cycle, and delivered on schedule
User Validation: Working closely with our user researcher to validate designs with real users and incorporating their feedback into iterative improvements

An MVP of the product launched in the IBM Bluemix cloud catalog, giving IT professionals like Rich the tools to safeguard their systems and prevent potential failures.

After conducting user interviews, we built empathy maps from which we extracted user insights.

"It's the little issues that go un-detected that really come back to bite you."
— Interview subject

Once we had clear insights and a good understanding of our target user, we created a proto-persona. We kept it simple and used it for what it's good for: a shared reference that kept the team talking about the same person instead of drifting toward edge cases and hypotheticals.

"I've got so much going on, if you can help even a little in staying ahead, that would be huge."
— Interview subject

This is an initial as-is flow diagram for Operations Analytics. We began by looking at a typical data center tech's course of action during their workday. This scenario pointed to three design needs:

Surface severity and confidence together. The tech could see that something was wrong but not how wrong. The product needed to communicate not just the anomaly, but a signal about its likely trajectory and outcome.
Consolidate cross-device context into a single view. Having to juggle multiple applications and dashboards was a direct consequence of siloed monitoring and reporting. Bringing related signals together was a systems-level fix, not UI polish.
Reduce the cost of the judgment call. If the tech had clearer insights, the "act or wait" decision would become lower-stakes and faster. The design goal wasn't just to surface information earlier, but to make the right action feel obvious.

"Keeping the data center running is Job One."
— Interview subject

We did a to-be flow for our offering MVP. We knew that the MVP would be less than ideal but would get the tech working in a slightly different way. The MVP to-be scenario pointed to the following:

Deliver clarity at the moment of highest uncertainty. The MVP didn't change how the tech noticed a problem, but it changed what happened next. Uploading a log file and getting ranked, component-level insight replaced the most costly part of the as-is: guessing where to look.
Ground the action decision in evidence, not experience. In the as-is, the call to act or wait was made on instinct. The MVP gave the tech the details needed to make that same call with confidence.
Respect the existing workflow while improving it. The tech still starts from familiar territory: lights, alerts, a log file. The MVP meets them there rather than asking them to change everything at once.
Set up the complete offering without overpromising it. The MVP ends the same way the complete offering does, with the tech returning to a monitoring stance. The habit was being established even before the full product existed.

A to-be scenario flow diagram for the complete offering.

We also mapped the offering in its mature state, confident it would fundamentally change how techs like Rich approached their work. The complete offering scenario pointed to four design priorities:

Shift from reactive to continuous monitoring. The complete offering was designed around a persistent agent installed across the data center, so the tech was always watching rather than responding after the fact.
Replace experience-based triage with ranked, actionable signals. Rather than relying on the tech's instinct to assess criticality, the product surfaces a prioritized action list so judgment is applied to decisions, not sorting.
Reduce the cost of knowing what to do next. By step three of the mature flow, the tech has a working plan. The goal was to collapse the gap between "something is wrong" and "here is what I do about it."
Close the loop after intervention. The flow ends with the tech back at the dashboard. The design treated monitoring as an ongoing posture held at a single "pane of glass," rather than a task to complete across multiple systems.

Key decisions

Summary: scenario workflows, actionable signals, sprint validation rhythm

Design the workflow around scenarios, not raw log volume

Decision: Structured the experience around investigation scenarios that move from signal to action

Tradeoff: Did not expose every log capability in the MVP; focused on the most actionable paths

Why: Ops users needed speed and credibility under pressure, not a toolbox that required interpretation

Result: Earlier detection and clearer intervention paths validated through scenario flows

Make "what do I do next" unambiguous

Decision: Turned monitoring signals into explicit next steps for investigation and intervention

Tradeoff: Invested in guidance and workflow clarity over purely visual log exploration

Why: Reactive workflows escalate incidents; actionability creates proactive intervention

Result: Stakeholders reported clearer paths to act on early signals

Establish a sprint-aligned validation rhythm to protect quality

Decision: Coordinated design delivery and validation each sprint before handoff to engineering

Tradeoff: Less speculative design, more incremental progress tied to real feedback

Why: Complex domain plus delivery pressure can push unvalidated choices into production

Result: Reliable design-to-delivery cadence and grounded decisions across sprints

Outcomes

Summary: earlier detection, more proactive action, validated each sprint

Earlier detection | Proactive action | Validated each sprint

Time to detect: Ops teams were reactive, and anomalies surfaced only after they had escalated into incidents. Log analysis made pattern recognition faster and directly actionable. Confirmed through as-is vs to-be scenario flows validated with target users.
Proactive interventions: Monitoring was reactive, and intervention came after impact. The MVP gave administrators a reason to investigate early signals before failures. Confirmed through quote synthesis, scenario validation sessions, and launch stakeholder feedback.
Design-to-delivery reliability: Delivery pressure in a complex domain risked unvalidated UX decisions reaching engineering. A sprint-aligned design and validation rhythm was established across the workstream. Evidenced by backlog refinement cadence with the offering manager plus researcher-supported validation.

What changed in the product

Monitoring signals were made actionable, reducing ambiguity at the "what do I do next" moment
The investigation and intervention path was clarified so administrators could act on early signals

How we measured

User interviews and empathy mapping established the as-is workflow. Scenario validation sessions confirmed the to-be model. The MVP launched on IBM Bluemix and later evolved into a key IBM Z-series monitoring tool.

Operations Analytics MVP product screens. This humble application MVP went on to become a key IBM Z-series server monitoring tool.