The Category Maturity (CM) Scorecard is a Summative Evaluation that assesses the entire experience as defined by a Job to be Done (JTBD), rather than individual improvements, which are often measured through Usability Testing (i.e., Solution Validation). This specialized process provides data to help us grade the maturity of our product.
The goal of this process is to produce data as objectively as possible given time and resource constraints. For this reason, the process is more rigorous than other UX research methods, and it focuses more on measures and less on thoughts and verbal feedback.
To produce data in which we have confidence, the data should be as free of subjective judgement as possible, relying on observed metrics and self-reported user sentiment. Our goal is to compare data over time to see how additions and improvements have impacted product maturity in a quantifiable way. To facilitate this, we've made this process prescriptive, so that it can be consistently applied by all Product Designers in all categories of our product.
Note: As with any evaluation, it's always a good idea to run a pilot first, so you can identify any improvements needed in the research approach.
Note: If you have questions, suggestions for improvement, or find this process doesn’t meet the needs of your users or product category, reach out to the UX Researcher for your group.
Refer to the Category Maturity page to understand scoring. It is important to note that:
Category Maturity Scorecards are about judging the quality of experiences against a defined and validated JTBD. JTBDs are the umbrella component of our product design process, used to guide our product strategies and the features those strategies comprise. Therefore, the JTBDs for the category should be defined and validated before completing a Category Maturity Scorecard.
Before you move to Step 1, you'll first need to select a couple of high-priority Job Statements that are relevant to your category's features and translate them into script scenarios. Ideally, test no more than 2 Job Statements per Category Maturity Scorecard study. The number of scenarios per Job Statement often depends on the complexity of the features tested.
Tip: Since Job Statements are persona and solution agnostic, you might find them to be too broad to serve as guidance for writing script scenarios. If that is the case, consider breaking the Job Statements down into User Stories as an intermediary step, in order to bridge the gap between high-level Job Statements and actionable tasks. Learn more about the difference between Job Statements and User Stories in How to Write JTBD.
To summarize, this is the workflow to follow in this step:
During the JTBD creation and validation phases, the Product Designer and Product Manager will have devised a set of user criteria to describe the user(s) you're referencing in your job(s). The same criteria should be used when recruiting for the Category Maturity Scorecard, ensuring you are gathering feedback from the right type of user(s).
To balance expediency with gathering a variety of perspectives, we conduct Category Maturity Scorecard research with five participants per set of user criteria. If your JTBD has multiple user types, it is ideal to recruit five from each, but to keep the study manageable, focus on no more than two user types per study. If more than two user types are required to accurately measure your JTBD, conduct a separate follow-up study for the remaining user types.
Example: A JTBD can be completed by a DevOps Engineer and a Release Manager. In this case, you’d recruit a total of 10 participants: 5 DevOps Engineers and 5 Release Managers.
The recruiting criteria can be based on an existing persona, but need to be specific enough to be turned into a screener survey. Create the screener survey in Qualtrics; prospective participants will fill it out to help you determine whether they're eligible to participate.
The template survey includes a question asking people if they consent to having their session recorded. Due to the analysis required for Category Maturity Scorecards, participants must answer yes to this question in order to participate. Once your screener survey is complete, open a Recruiting request issue in the UX Research project, and assign it to the relevant Research Coordinator. The Coordinator will review your screener, reach out to you if they have any issues, and begin recruiting users based on the timeline you give them.
Note: Recruiting users takes time, so be sure to open the recruiting issue at least 2-3 weeks before you want to conduct your research.
Testing in a production environment is the best choice because your goal is to evaluate the actual product, not a prototype that may have a slightly different experience.
Once you know what scenario(s) you’ll put your participants through, it’s important to determine the interface you’ll use. Some questions to ask yourself:
It’s important to thoroughly plan how a participant will complete your scenario(s), especially if you answered "yes" to any of the questions above. Involve technical counterparts early in the process if you have any uncertainty about how to enable users to go through your desired flow(s).
If you want help creating a pristine test environment, reach out to the Demo Systems group in the #demo-systems Slack channel. They can create a demo environment for users and help configure any particular parameters your testing environment needs. Be aware that setting up a test environment for a research study can be time consuming and difficult.
If your JTBD interacts with other stage groups’ areas, reach out to them to ensure their part of our product will support your scenario(s).
Because this is a summative evaluation of the current experience, everything the participant needs to complete the scenarios must be available in the GitLab instance. When you recruit participants, keep in mind the tools and features they must access to complete the JTBD scenarios.
Before you conduct your research with participants, it's important to run through the tasks yourself, both as a sanity check and as a forcing function to document the successful path.
What matters most is whether the participant reached the end goal; the exact path they took may not be important for the team to document. If a participant took the long path and felt it was or wasn't easy, that will be reflected in their score.
Once you've gone through your scenario(s), have a co-worker complete them as a pilot. Ideally, this person won't be familiar with the scenario, so they don't have an expert-level understanding of how it works. Use this pilot to uncover any issues with how you've formulated your scenario(s). Because this is a check of your scenario/flow plan, it's ok to coach your co-worker a little, using the discussion to get to the heart of any problems your scenario or flow may have. If any problems surface during the pilot run, update your scenario flow(s) accordingly.
Before you can begin running your participants through your scenarios you'll need to write your test script. Because Category Maturity Scorecards are a standardized process, moderators should complete and follow this testing script as closely as possible. The moderator will typically be a Product Designer, but this is not strictly required. You are encouraged to have any relevant stakeholders attend the sessions to help take notes, but it is very important they remain silent.
When a participant successfully completes a task, they are then asked 3 questions to help us measure their experience, which we then tie back to category maturity. If a participant fails to complete a task, there's no need to ask them the 3 questions.
How we rate and grade experiences arguably comes down to three main elements:
Question 1: Single Ease Question (SEQ)
The Single Ease Question (SEQ) is a recently introduced, industry-wide question based on other UX-related questions and measures. It helps us understand whether the task was easy or difficult to complete and provides a simple and reliable way of measuring task-performance satisfaction. Bonus: this question is also used for UX Scorecard testing.
Q1: “Overall, this task was…”
Question 2: User Experience rating
Admittedly, the term 'user experience' is broad: it encompasses many components we care about (e.g., efficiency, speed, usability) that all factor into how one rates an overall user experience. Because of that, we're intentionally not defining 'user experience'; given our audience, we feel the definition will be collectively understood with a high level of accuracy. What sets this question apart is that it closely aligns with the grading and scoring criteria of the UX Scorecard and CM Scorecard. Bonus: this question is also used for UX Scorecard testing.
Q2: “How would you rate the quality of the user experience?”
Question 3: UMUX Lite, adjusted
The UMUX Lite score is based on the UMUX (Usability Metric for User Experience), created by Finstad, and it is highly correlated with the SUS and the Net Promoter Score. It's intended to be similar to the SUS, but it's shorter and targeted toward the ISO 9241 definition of usability (effectiveness, efficiency, and satisfaction).
Q3: "You just experienced our implementation of
<Task>. How would you agree or disagree with the following statement:
<Task> has the features I need for what I need to do in my own work."
You will need to decide how to compose your task name. Take into consideration the name we use for the category on the Category Maturity page. In some instances, presenting the internal task name verbatim may not be optimal, because it may not be clear enough to participants.
When setting up a project in Respondent, make sure to use your personal Zoom room link, as you can't change the link per participant (this means each participant will have the same Zoom room link). Additionally, be sure to turn off the password requirement for these sessions.
For our purposes, a participant's attempt at a task ends in one of two results: success or failure. Task failures are important to note, and we can't discount them; they must be incorporated as part of the criteria for moving category maturity levels. To move to the next category maturity level, a minimum % pass rate is required, along with the minimum score. The table below illustrates the relationships between the minimum % pass rate, UX Scorecard grade, scale option value, CM Scorecard score range and level, and SUS (for reference).
| Minimum % pass rate | UX Scorecard grade | Scale option | Scale option value | CM Scorecard score range | CM Scorecard level | SUS (for reference) |
|---------------------|--------------------|--------------|--------------------|--------------------------|--------------------|---------------------|
| 100% | A | Extremely good/easy, Strongly agree | 5.0 | 5.00 - 3.95 | Lovable | 100 - 78.9 |
| 80% | B | Good/easy, Agree | 4.0 | 3.94 - 3.63 | Complete | 78.8 - 72.6 |
| 80% | C | Neither | 3.0 | 3.62 - 3.14 | Viable | 72.5 - 62.7 |
| n/a | D | Bad/difficult, Disagree | 2.0 | 3.13 - 2.59 | – | 62.6 - 51.7 |
| n/a | F | Extremely bad/difficult, Strongly disagree | 1.0 | 2.58 - 1.00 | – | 51.6 - 0 |
The CM Scorecard score can easily be calculated for each task:
Tip: Use this Google Sheet, which contains the calculations already built into it.
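If you want to script the calculation instead of (or alongside) the sheet, here is a minimal sketch in Python. It assumes, based on the scale values in the table above but not confirmed by this page, that a task's CM Scorecard score is the mean of the three 1-5 question responses across participants who passed the task, and that the pass rate is the share of participants who completed it. Verify the exact formulas against the Google Sheet.

```python
# Minimal sketch of the per-task CM Scorecard calculation.
# Assumption (verify against the Google Sheet): the task score is the mean
# of the three 1-5 question responses across participants who passed, and
# the pass rate is passes / total participants.

def task_results(attempts):
    """attempts: list of (passed: bool, responses: 3-tuple of floats or None)."""
    passes = [responses for passed, responses in attempts if passed]
    pass_rate = len(passes) / len(attempts)
    all_responses = [value for responses in passes for value in responses]
    cm_score = sum(all_responses) / len(all_responses) if all_responses else None
    return pass_rate, cm_score

# Example: 5 participants, 4 passed (failed participants skip the 3 questions).
attempts = [
    (True, (5.0, 4.0, 4.0)),
    (True, (4.0, 4.0, 3.0)),
    (True, (5.0, 5.0, 4.0)),
    (True, (3.0, 4.0, 4.0)),
    (False, None),
]
pass_rate, cm_score = task_results(attempts)
print(f"Pass rate: {pass_rate:.0%}, CM Scorecard score: {cm_score:.2f}")
# Pass rate: 80%, CM Scorecard score: 4.08 -- an 80% pass rate meets the
# minimum for grades up to B/Complete in the table above.
```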
If the minimum % pass rate for any task falls below 80% during a study, stop the study at that participant to conserve resources. When this occurs, the category maturity cannot move up a level. The team should take those learnings, iterate, and retest when they're ready. It's also recommended that a retrospective take place to learn:
It’s important that the moderator and any stakeholders don’t leave the call when the session concludes. Instead, remove the participant and remain on the call. Use this time for the group to debrief on what they just experienced. The notetaker(s) should take notes on this discussion.
By following the Category Maturity Scorecard testing script, you will have the following measures to report per feature, not per scenario (keep in mind that a scenario may include more than one feature).
To analyze: Use the Google Sheet to aid in calculating the CM Scorecard score per task. Additionally, look for themes in why participants scored the way they did.
To document: Document and highlight areas for improvement in issues, using the 'Actionable Insight' label, so that further improvements can be made to the experience or the level can move up.
Read the UX Research team’s guide for documenting insights in Dovetail.
Several groups currently use jobs_to_be_done.yml to showcase the current maturity of each job representing the overarching problems a given category is working to solve. Each entry in the YAML file consists of the following keys:
| Key | Example | Description | Required |
|-----|---------|-------------|----------|
| | `group_jtbd_1a` | Unique ID of the JTBD | Yes |
| | `group_jtbd_1` | Unique ID of the parent JTBD | No |
| | Measuring Outcomes | A short reference for the JTBD | No (if …) |
| | When… I want to… So that… | The complete JTBD | No (if …) |
| | "A" | The corresponding letter grade (A, B, C, D, or F) | No |
| | Researched | Confidence level of the grade | No |
| | https://gitlab.com/gitlab-org/ux-research/-/issues/900 | URL pointing to the finished research issue | No |
| | Plan | The group or stage the corresponding JTBD belongs to | No |
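For illustration only, a hypothetical entry might look like the sketch below. The key names (id, parent_id, reference, jtbd, grade, confidence, research_issue, group) and the sample JTBD text are placeholders assumed for this example, since the table above elides the actual key names; confirm both against an existing jobs_to_be_done.yml file.

```yaml
# Hypothetical entry: every key name and the sample JTBD below are
# placeholders, since the actual key names are not listed above.
# Confirm against an existing jobs_to_be_done.yml file.
- id: plan_jtbd_1a                # Unique ID of the JTBD
  parent_id: plan_jtbd_1          # Unique ID of the parent JTBD
  reference: Measuring Outcomes   # A short reference for the JTBD
  jtbd: When I plan work, I want to measure outcomes, so that I can report progress.
  grade: "A"                      # Letter grade (A, B, C, D, or F)
  confidence: Researched          # Confidence level of the grade
  research_issue: https://gitlab.com/gitlab-org/ux-research/-/issues/900
  group: Plan                     # Group or stage the JTBD belongs to
```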
To map the CM Scorecard score for a given JTBD to a grade letter, use the following criteria:
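Assuming the grade boundaries match the CM Scorecard score ranges in the table earlier in this section (an assumption; confirm against the actual criteria), a minimal sketch of that mapping:

```python
# Map a CM Scorecard score (1.00-5.00) to a grade letter, assuming the
# boundaries match the CM Scorecard score ranges in the table above.
def grade_for_score(score: float) -> str:
    if not 1.0 <= score <= 5.0:
        raise ValueError("CM Scorecard scores range from 1.00 to 5.00")
    if score >= 3.95:
        return "A"   # 5.00 - 3.95, Lovable
    if score >= 3.63:
        return "B"   # 3.94 - 3.63, Complete
    if score >= 3.14:
        return "C"   # 3.62 - 3.14, Viable
    if score >= 2.59:
        return "D"   # 3.13 - 2.59
    return "F"       # 2.58 - 1.00

print(grade_for_score(4.08))  # A
```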