The Category Maturity (CM) Scorecard is a Summative Evaluation that takes into account the entire experience as defined by a Job to be Done (JTBD), instead of individual improvement(s), which are often measured through Usability Testing (i.e. Solution Validation). This specialized process provides data to help us grade the maturity of our product.
The goal of this process is to produce data as objectively as possible given time and resource constraints. For this reason, the process is more rigorous than other UX research methods, and it focuses more on measures and less on thoughts and verbal feedback.
To produce data in which we have confidence, the data should be as free of subjective judgement as possible, relying on observed metrics and self-reported user sentiment. Our goal is to compare data over time to see how additions and improvements have impacted product maturity in a quantifiable way. To facilitate this, we've made this process prescriptive, so that it can be consistently applied by all Product Designers in all categories of our product.
Sometimes the JTBD you need to evaluate occurs over a period of time, such as a multi-step process when responding to an alert. For these cases, it's appropriate to include the phrase '(a period of time) has passed' when moving to the next phase of the JTBD in the CM Scorecard scenario.
Occasionally, teams learn something surprising from a CM Scorecard. For example, even though the score for this single research initiative is high enough to move maturity up, they believe based on the findings that the category is not ready yet. In this case, the Product Manager and Product Designer should use their judgement about addressing fundamental problems before changing maturity, and communicate that decision to stakeholders.
If you have questions, suggestions for improvement, or find this process doesn’t meet the needs of your users or product category, reach out to the UX Researcher for your group.
Any Category Maturity Scorecard effort should have a corresponding issue created in the GitLab UX Research project. Ensure the label CM Scorecard is applied to the issue to aid in tracking UX research efforts.
Refer to the Category Maturity page to understand scoring. It is important to note that:
See how UX Scorecards relate to Category Maturity Scorecards in the UX Scorecards handbook page.
All of the UX Scorecards can be found in this epic.
Category Maturity Scorecards are about judging the quality of experiences against a defined and confident JTBD. JTBD are the umbrella component of our product design process which are used as guides to direct our product strategies and the features they are comprised of. Therefore, JTBD(s) for the category should be defined and have a high level of confidence ahead of completing a Category Maturity Scorecard.
Before you move to Step 1, you'll first need to select a couple of high priority job statements that are relevant for your category features and translate them to script scenarios. Ideally, no more than 2 job statements should be tested per Category Maturity Scorecard study. The number of scenarios used per job statement often depends on the complexity of the features tested.
Tip: Since job statements are persona and solution agnostic, you might find them to be too broad to serve as guidance for writing script scenarios. If that is the case, consider breaking the job statements down into user stories as an intermediary step, in order to bridge the gap between high-level job statements and actionable scenarios. Learn more about the difference between job statements and user stories in How to Write JTBD.
To summarise, this is the workflow that should be followed in this step:
During the JTBD creation and validation phases, the Product Designer and Product Manager will have devised a set of user criteria to describe the user(s) you're referencing in your job(s). The same criteria should be used when recruiting for the Category Maturity Scorecard, ensuring you are gathering feedback from the right type of user(s).
To balance expediency with getting a variety of perspectives, we conduct the Category Maturity Scorecard research with five participants from one set of user criteria. If you have multiple user types for your JTBD, it is ideal to recruit 5 from each user type. To keep the study manageable, focus on no more than 2 user types per study. If more than 2 user types are required to accurately measure your JTBD, conduct a separate follow-up study for the remaining user types.
Example: A JTBD can be completed by a DevOps Engineer and a Release Manager. In this case, you’d recruit a total of 10 participants: 5 DevOps Engineers and 5 Release Managers.
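The recruiting arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `plan_studies` and the choice to group any additional user types into follow-up studies of up to 2 types each are assumptions, not part of the documented process.

```python
from typing import Dict, List

PARTICIPANTS_PER_USER_TYPE = 5  # five participants per set of user criteria
MAX_USER_TYPES_PER_STUDY = 2    # keep each study to at most two user types


def plan_studies(user_types: List[str]) -> List[Dict[str, int]]:
    """Split user types into studies and list how many people to recruit.

    Assumption: remaining user types are grouped into follow-up studies
    in the order given, up to two types per study.
    """
    studies = []
    for start in range(0, len(user_types), MAX_USER_TYPES_PER_STUDY):
        chunk = user_types[start:start + MAX_USER_TYPES_PER_STUDY]
        studies.append({t: PARTICIPANTS_PER_USER_TYPE for t in chunk})
    return studies


# The example from the text: two user types, so one study with 10 participants.
plan = plan_studies(["DevOps Engineer", "Release Manager"])
total_participants = sum(plan[0].values())
```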
The recruiting criteria can be based on an existing persona, but need to be specific enough that they can be turned into a screener survey. A screener survey should then be created in Qualtrics that your prospective participants will fill out to help you determine if they're eligible to participate.
The template survey includes a question asking people if they consent to having their session recorded. Due to the analysis required for Category Maturity Scorecards, participants must answer yes to this question in order to participate. Once your screener survey is complete, open a Recruiting request issue in the UX Research project, and assign it to the relevant Research Coordinator. The Coordinator will review your screener, reach out to you if they have any issues, and begin recruiting users based on the timeline you give them.
Note: Recruiting users takes time, so be sure to open the recruiting issue at least 2-3 weeks before you want to conduct your research.
Testing in a production environment is the best choice because your goal is to evaluate the actual product, not a prototype that may have a slightly different experience.
Once you know what scenario(s) you’ll put your participants through, it’s important to determine the interface you’ll use. Some questions to ask yourself:
It’s important to thoroughly plan how a participant will complete your scenario(s), especially if you answered "yes" to any of the questions above. Involve technical counterparts early in the process if you have any uncertainty about how to enable users to go through your desired flow(s).
If you want help creating a pristine test environment, reach out to the Demo Systems group on the #demo-systems Slack channel. They can create a demo environment for users and help build any particular parameters needed for your testing environment. Be aware that setting up a test environment for a research study can be time-consuming and difficult.
If your JTBD interacts with other stage groups’ areas, reach out to them to ensure their part of our product will support your scenario(s).
Because this is a summative evaluation of the current experience, all of the available options the participant should need access to must be available in the GitLab instance. When you recruit participants, keep in mind the tools and features they must access to complete the JTBD scenarios.
Run through the scenarios yourself after they have been completed. Document what qualifies as successful completion of each scenario for future reference.
Make sure to test these scenarios with coworkers before evaluating with research participants. Ideally, the coworker(s) won’t be familiar with the scenario or have an expert-level understanding. It’s acceptable to coach them a little, using the pilot as a discussion to uncover any problems with your scenarios.
👍 Defining ‘Success’
👎 Defining ‘Failure’
What to do if ‘Failure’ happens during a study
Making note of Errors
The CMS issue template and CMS Dovetail template both contain areas to make note of errors encountered during the CMS. Errors can be considered anything significant that was off the 'happy path'. Examples may include navigating to a different area and spending time in that area trying to find what they're looking for, misinterpreting something, etc. If it's not significant and they recover quickly, then it may not be worth counting, and could simply be a mistake they made, or as a result of a testing scenario. Errors are not required for the calculation, but can be useful when justifying a failure or rating.
Before you can begin running your participants through your scenarios you'll need to write your test script. Because Category Maturity Scorecards are a standardized process, moderators should complete and follow this testing script as closely as possible. The moderator will typically be a Product Designer, but this is not strictly required. You are encouraged to have any relevant stakeholders attend the sessions to help take notes, but it is very important they remain silent.
When a participant is successful at completing a scenario, they are then asked 3 questions to help us measure their experience, which we then tie back to category maturity. Note that if a participant failed at completing a scenario, there’s no need to ask them these 3 questions.
At the root of how we rate/grade experiences, it arguably comes down to three main elements:
Question 1: Single Ease Question (SEQ)
The Single Ease Question (SEQ) is a newly introduced industry-wide question based on other UX-related questions and measures. This question essentially helps us understand if the scenario was easy or difficult to complete and provides a simple and reliable way of measuring scenario-performance satisfaction. Bonus: this question is also used for UX Scorecard testing.
Q1: “Overall, this scenario was…”
Question 2: User Experience rating
Admittedly, the term ‘user experience’ is broad, as it encompasses many components we care about (for example, efficiency, speed, and usability) that all apply to how one rates an overall user experience. Because of that, we’re intentionally not defining ‘user experience’; given our audience, we expect the definition to be collectively understood with a high level of accuracy. What sets this question apart: it closely aligns with the grading and scoring criteria used in UX Scorecard and CM Scorecard testing. Bonus: this question is also used for UX Scorecard testing.
Q2: “How would you rate the quality of the user experience?”
Question 3: UMUX Lite, adjusted
The UMUX Lite score is based on the UMUX (Usability Metric for User Experience), created by Finstad, and it is highly correlated with the SUS and the Net Promoter Score. It's intended to be similar to the SUS, but it's shorter and targeted toward the ISO 9241 definition of usability (effectiveness, efficiency, and satisfaction).
Q3: "You just experienced our implementation of <Scenario>. How would you agree or disagree with the following statement: <Scenario> has the features I need for what I need to do in my own work."
You will need to decide on how to compose your scenario name. Take into consideration the name we use for the category on the Category Maturity page. There may be instances where using the scenario name as we use it is not optimal for presenting to a user for getting feedback because it may not be clear enough to them.
When setting up a project in Respondent, make sure to use your personal Zoom room link, as you can't change the link per participant (this means each participant will have the same Zoom room link). Additionally, be sure to turn off the password requirement for these sessions.
As participants attempt to complete a scenario, the end result will be either Success or Failure. To move to the next category maturity level, a minimum % pass rate is required, along with the minimum score. The chart below illustrates the relationships between the minimum % pass rate, the UX Scorecard grades, SUS, CM Scorecard level, and the CM Scorecard score.
| Minimum % pass rate | UX Scorecard grade | Scale option | CM Scorecard score range | CM Scorecard level | SUS (for reference) |
| --- | --- | --- | --- | --- | --- |
| 100% | A | Extremely good/easy, Strongly agree | 3.95 - 5.00 | Loveable | 78.9 - 100 |
| > 80% | B | Good/Easy, agree | 3.63 - 3.94 | Complete | 72.6 - 78.8 |
| > 80% | C | Neither | 3.14 - 3.62 | Viable | 62.7 - 72.5 |
| n/a | D | Difficult/Bad, disagree | 2.59 - 3.13 | – | 51.7 - 62.6 |
| n/a | F | Extremely bad/difficult, Strongly disagree | 1.00 - 2.58 | – | 0 - 51.6 |
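To make the chart easier to apply, here is a minimal Python sketch of the score-to-grade lookup. The names `Band`, `BANDS`, and `band_for_score` are hypothetical, and rounding the score to two decimals before the lookup is an assumption made because the chart's ranges are stated to two decimals.

```python
from typing import NamedTuple, Optional


class Band(NamedTuple):
    lower: float          # inclusive lower bound of the CM Scorecard score range
    grade: str            # UX Scorecard grade
    level: Optional[str]  # CM Scorecard level; None where the chart shows a dash


# Bands copied from the chart above, highest first.
BANDS = [
    Band(3.95, "A", "Loveable"),
    Band(3.63, "B", "Complete"),
    Band(3.14, "C", "Viable"),
    Band(2.59, "D", None),
    Band(1.00, "F", None),
]


def band_for_score(score: float) -> Band:
    """Return the chart row that a CM Scorecard score falls into."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("CM Scorecard scores range from 1.00 to 5.00")
    rounded = round(score, 2)  # assumption: compare at two-decimal precision
    for band in BANDS:
        if rounded >= band.lower:
            return band
    raise AssertionError("unreachable: every valid score is at least 1.00")


example = band_for_score(3.93)  # falls in the 3.63 - 3.94 row
```

Note that this lookup covers the score only. The minimum % pass rate still has to be met separately.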
CM Scorecard score: The CM Scorecard score can easily be calculated for each scenario:
Tip: Use this Google Sheet, which contains the calculations already built into it.
Step one: For each scenario, enter the test participants' responses across each relevant question.
Step two: The overall score of each question will be averaged to provide a scenario score.
Step three: Once all of the scenario scores are calculated you will be provided an overall score.
Step four: Finally, find the score in the chart above to determine the resulting grade and CM Scorecard level. For example, a 3.93 average corresponds to a ‘B’ grade, which maps to the Complete level.
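The steps above can be sketched in Python for anyone who wants to check the arithmetic by hand. This is not the Google Sheet's actual formula: the response data is made up, the question keys (`seq`, `ux_rating`, `umux_lite`) are hypothetical labels, and taking the overall score as the plain average of the scenario scores is an assumption.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical 1-5 responses: one list per question, one value per participant
# who completed the scenario (failed attempts are not asked the questions).
responses: Dict[str, Dict[str, List[int]]] = {
    "Scenario 1": {
        "seq": [4, 4, 4, 4, 4],
        "ux_rating": [4, 4, 4, 4, 3],
        "umux_lite": [4, 4, 4, 4, 5],
    },
    "Scenario 2": {
        "seq": [3, 4, 4, 4, 5],
        "ux_rating": [4, 4, 3, 4, 4],
        "umux_lite": [4, 3, 4, 4, 4],
    },
}


def scenario_score(questions: Dict[str, List[int]]) -> float:
    """Step two: average each question, then average those question scores."""
    return mean(mean(answers) for answers in questions.values())


def overall_score(study: Dict[str, Dict[str, List[int]]]) -> float:
    """Step three (assumed): average the scenario scores."""
    return mean(scenario_score(questions) for questions in study.values())


overall = round(overall_score(responses), 2)
# For this made-up data the overall score rounds to 3.93, which the chart
# places in the 'B' row (Complete).
```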
Minimum % pass rate: Minimum % pass rates help indicate what percentage of participants must succeed in a scenario to meet a minimum requirement. This also helps indicate what level of scenario failure is acceptable. Scenario failures are important to note and we can’t discount them, so they must be incorporated as part of the criteria to move category maturity levels. If the Minimum % pass rate for any scenario is less than 80% during a study, the study should stop at that most recent participant to conserve resources. In the event this should occur, the category maturity cannot be moved up a level. The team should take those learnings, iterate, and retest when they’re ready again. It’s also recommended that a retrospective take place to learn:
Score interpretation examples:
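As one illustration, a score and the per-scenario pass rates can be read together along the following lines. This is a minimal Python sketch under stated assumptions, not an official rule set: the function names are hypothetical, a pass rate of exactly 80% is treated as meeting the minimum (following the "less than 80%" wording above, although the chart's "> 80%" could be read more strictly), and a score in the 'A' range with a pass rate below 100% is assumed to support Complete rather than Loveable.

```python
from typing import Optional, Sequence


def pass_rate(outcomes: Sequence[bool]) -> float:
    """Share of participants who succeeded in one scenario (True = Success)."""
    return sum(outcomes) / len(outcomes)


def supported_level(score: float, pass_rates: Sequence[float]) -> Optional[str]:
    """Return the CM Scorecard level the results support, or None.

    Assumptions: the lowest scenario pass rate is the one that counts, and
    exactly 80% is treated as meeting the minimum.
    """
    lowest = min(pass_rates)
    if lowest < 0.80:
        return None  # too many failures: maturity cannot move up
    if score >= 3.95 and lowest == 1.0:
        return "Loveable"
    if score >= 3.63:
        return "Complete"
    if score >= 3.14:
        return "Viable"
    return None  # 'D' and 'F' scores do not map to a level


# Four of five participants succeeded in the second scenario.
rates = [
    pass_rate([True, True, True, True, True]),
    pass_rate([True, True, True, True, False]),
]
level = supported_level(3.93, rates)
```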
It’s important that the moderator and any stakeholders don’t leave the call when the session concludes. Instead, remove the participant and remain on the call. Use this time for the group to debrief on what they just experienced. The notetaker(s) should take notes on this discussion.
By following the Category Maturity Scorecard testing script, you will have the following measures to report, per feature, not per scenario. However, scenarios may include more than one feature.
To analyze: Use the Google Sheet to aid in calculating the CM Scorecard score, per scenario. Additionally, look for themes behind the reason why participants scored the way they did.
To document: Document and highlight areas for improvement via issues, utilizing the ‘Actionable Insight’ label, to make further improvements to the experience -or- the level moves up.
Read the UX Research team’s guide for documenting insights in Dovetail.
Several groups currently use jobs_to_be_done.yml to showcase the current maturity of each of the jobs that represent the overarching problems a given category is working on solving. Each entry in the YML file consists of the following keys:
| Key | Example value | Description | Required |
| --- | --- | --- | --- |
| | group_jtbd_1a | Unique ID of the JTBD | Yes |
| | group_jtbd_1 | Unique ID of the parent's JTBD | No |
| | Measuring Outcomes | A short reference of the JTBD | No (if |
| | When….I want to…So that | The complete JTBD | No (if |
| | "A" | The corresponding letter of the score of A, B, C, D, F | No |
| | Researched | Confidence level of the grade | No |
| | https://gitlab.com/gitlab-org/ux-research/-/issues/900 | URL pointing to the finished research issue | No |
| | Plan | The group or stage that the corresponding JTBD belongs to | No |
In order to map the CMS score for a given JTBD to a grade letter, use the following criteria: