

Evaluating a sexual health counselling chatbot


Research Context

In the spirit of user-centred design, the idea was to evaluate a preliminary prototype of the chatbot before proceeding with further design improvements. I chose to conduct a qualitative evaluation as the project was in its early stages of development – not only did we know little about how people would react to such an application, but we were also unclear about the direction to take with the chatbot.

Research Goals

We wanted to gain an in-depth view on how users perceived the interaction with our chatbot, and asked the following questions:


  • How would users characterize their experience with the chatbot?

  • How did users feel about the counselling techniques used?

  • How would users see themselves using this chatbot (a) individually and (b) with a new sexual partner?

Timeline

Define Research Goals
Create Interview Guide
Set Up Research
Recruitment
Data Collection
Analyse Interviews

Method


As the project took place during the pandemic, I decided to conduct the research remotely. After discussing the options with my supervisors, we settled on remote usability testing sessions, in which users interacted with the chatbot for around 25 minutes, followed by one-on-one semi-structured interviews conducted via Zoom, which took around 45 minutes.

Recruitment

Participants were recruited by posting the study on the university research pool and advertising it in several university classes; the sample therefore largely consisted of first-year bachelor students studying communication science or psychology. Fortunately, this sample fell into our intended target group.

The following criteria were used to include participants:

  • Between 18 and 25 years old

  • Sexually active in the last 30 days

  • Identifies as male, or has a preference for sexual partners that identify as male

Procedure

Chatbot Interaction

Participants were invited to an online meeting via Zoom, which supported audio recording without video (to preserve participant privacy). After a brief introduction, the researcher helped participants access the chatbot (running on our local private server) and simply told them to interact with it. The chatbot took charge and guided the user through a series of short conversations about sexual health and condom use. Participants knew when to return to the researcher because the chatbot informed them when the interaction was complete.


Conducting Interviews

After participants completed their interaction, the researcher conducted an interview to understand their experience. I devised a semi-structured interview guide (link) for this. Based on prior research, I compiled a list of concepts I wanted to ask participants about, and adapted existing validated survey instruments for these concepts into open-ended interview questions. During the interviews, however, I started with more general, open-ended questions to give users the freedom to say whatever came to mind. I followed these up with questions using the TEDW approach ("Tell me more...", "Explain...", "Describe...", "Walk me through...") to elicit specific opinions about aspects of the chatbot. Despite having an interview guide, I often departed from it when users raised interesting points.

[Image: MISH v1 chatbot screens]
Analysis
  • Transcribed each session from audio to text using automatic transcription software (Amberscript)

  • Conducted a thematic analysis in atlas.ti, coding the data line-by-line with both pre-determined and data-derived tags

  • Aimed to be as specific as possible when assigning codes to dialogues so that users' intents were not lost

  • After coding 10 interviews, regrouped with my team to categorize the codes into larger, meaningful themes, assessing how the data helped us answer our questions, whether the codes we were using sufficed, and whether we had reached saturation

  • atlas.ti was extremely useful for this, as it gives an overview of how often a specific code appeared across participants

  • Continued coding the remaining interviews to check whether any new codes appeared, i.e. whether saturation had been reached

  • It appeared saturation had indeed been reached: the remaining interviews only strengthened the conclusions drawn from the first 10

Outputs
  • User perceptions were summarized per concept through what we term 'concept indicator models' (essentially structured mind maps), as well as Venn diagrams and empathy maps

  • Key takeaways were summarized in a report, along with a list of pain points categorized by scope (small, large), severity (low, medium, high), and ease of implementing the fix (low, medium, high)

  • Archetypes were created to define specific target groups we identified during the interviews, which will be further refined into personas to guide further chatbot design

  • The results were presented at conferences, as well as to our industrial partner, the SOA AIDS Foundation


Key Takeaways

01
What: The chatbot often misunderstood the user, which led to communication breakdowns.
Why: Issues in the underlying language models.

02
What: Users primarily wanted "new" information on the topic; this was not the primary focus of the current design.
Why: Their frame of reference was commonly customer service and information chatbots.

03
What: Users were unsure about the chatbot exhibiting behaviours such as counselling, empathy and emotional intelligence.
Why: They believed chatbots were good (or bad) at specific tasks, and the chatbot's behaviour did not align with those beliefs.

04
What: It was unclear what the purpose of the chatbot was, and why it was asking certain questions.
Why: The chatbot was designed for an uncommon purpose, and did not spend enough time explaining its relevance.

05
What: Users were not motivated to use this chatbot in the future, for several reasons.
Why: Users did not find it useful (enough).

Users were not inclined to continue using the chatbot because they did not find it useful; in fact, they were not even sure how it was supposed to benefit them, and their expectations were shaped by prior experiences with chatbots that happened to focus on providing information rather than counselling. More importantly, what the chatbot was trying to do appeared to contradict what users considered "acceptable" chatbot behaviour.


Next Steps + Recommendations

Look into...
  • Increasing the user-chatbot fit by exploring how users currently navigate their sexual health, and how chatbots may contribute to this process

  • Adapting counselling approaches for delivery through digital platforms

  • Communicating the chatbot's abilities and goals where they may be inconsistent with users' expectations

  • Validating the findings with other target groups that (a) are younger, (b) belong to a lower socio-economic status, and (c) belong to ethnic and sexual minorities


Reflections
What went well

  • The sessions ran smoothly with minimal technical difficulties, considering the entire procedure was conducted remotely

  • Participant recruitment happened quickly

  • The interviews gave us enough data to compile a list of concrete recommendations for improvement

Challenges

  • I needed to develop my interview skills over time -> explicitly include interviewing tips in the interview guide

  • Participants in long-term relationships, who were not part of the intended user group, did not find the chatbot relevant at all -> be more stringent about inclusion criteria, even if recruitment takes a bit longer

  • The way participants were recruited likely introduced some bias -> think of more creative ways to get a variety of people to participate
