
13 Cheenu Interview-Conversational AI for enterprise

Episode 13

In this episode I talk with Cheenu, VP of UXPA Austin, about his experience doing UX research for conversational UIs and the different UX challenges between consumer and enterprise AI.

Music: The Pirate And The Dancer by Rolemusic


Show note links:

12-How to remove the creepy from AI

Episode 12

In this episode we look at what makes people feel creeped out and how that translates over into AI software. We cover good design, interaction, lessons from horror houses, and the uncanny valley.

Music: The Pirate And The Dancer by Rolemusic


Has this ever happened to you? You have created an enterprise app, and the main selling point is the AI integration. You’ve worked hard to personalize the software based on the data you have, but then the deal starts to fall apart. The client tried it out with their employees, and the employees are getting nervous using it. Apparently they are calling the avatar creepy, and there is something they can’t put their finger on about how the software makes them feel, but they don’t like it. Let’s make sure this doesn’t happen.
This podcast is called Design for AI.
It is here to help define the space where machine learning intersects with UX, where we talk to experts and discuss topics around designing a better AI.
Music is by Rolemusic.
I'm your host, Mark Bailey.
Let's get started.
Today we will be diving into what makes AI creepy for people and how to design it out of your product. Creepy is definitely different from fear. We have covered fear of AI in previous episodes, and if your users are getting creeped out, that is something else. While it sounds similar, the way to tell the difference is that with fear you know to run away or stop using something.
But if something is creepy… it might be dangerous, but you're not sure it is… there's an ambivalence. Basically, something is telling the user's brain that the software is outside accepted social norms. If it were a person, it would be standing too close, or staring, say, and we would become suspicious of its intentions. Someone can be completely familiar with machine learning, use it every day, and still get creeped out by badly designed AI.
For this episode we will be looking into current research into the psychology, feedback from previous software, and even what makes horror houses creepy.
The seminal psychology paper on creepiness is called “On the Nature of Creepiness.”
It says there are four driving factors of creepiness:
  • They make us fearful or anxious
  • We can't easily discern whether there is in fact something to fear from the person in question.
  • Creepiness is seen as part of the personality of the individual rather than just their behavior
  • We think they may have a sexual interest in us.
Now, before writing off the last item as something that doesn’t cross over from psychology to software design: know that in surveys, men are seen as creepy more often than women. So using a female voice, as Siri, Cortana, Alexa, and just about every other AI voice system does, can lower the creepiness level for women using your product.
For all the other driving factors, it makes sense that AI can put people on edge. It is something new that people are not familiar with. The machine learning model is trying to do something independently of the user, and they are not sure whether they want to trust it.
To cover all the things that are reported to add to a creepy feeling, I’m going to break it up into two different groups: Good design, and good interaction.

Good design

Visual design

When people are creeped out by a person, the contributing features are things like greasy hair, a peculiar smile, bulging eyes, long fingers, unkempt hair, very pale skin, bags under the eyes, odd dress, or dirty clothes.
In computer terms, if ever there was evidence that you need a well-designed homepage, landing page, or first impression for your product, this is it. Research shows that your brain will make judgements about trustworthiness within 39 milliseconds of seeing a face. As soon as something captures our attention as being abnormal, we start to deconstruct the face, and from there we deconstruct the person. The same is true for using your AI software.
If your software is well designed, then you have not blown your initial impression. This is important because of the “halo effect”: attractive people were deemed trustworthy whether they were Nobel laureates or criminals.
So what do I mean by good design? Well, I'll give the counterexample. If an app just throws all the data up on the screen without a thought to how it looks, what are the chances that using it raises your suspicion that the makers, in their haste, forgot something else too, and your data could be vulnerable?
Basically, details are important. Pay attention to them. The more details you can get right for the user's initial impression, the better.

Language and sound

The next area for good design is language use. In the research, people looked for signs of kindness or aggressiveness in the faces of those they were evaluating. The same is true of software. Aggressive or abrupt language will put people on edge. So, even more than with normal software, since AI already has a reputation from the movies, it needs to explain what is happening without being abrupt and without euphemisms as much as possible.
Another thing that creeps people out is a person who stands too close or uses overly friendly language. The perfect example is the creepy salesperson who instantly acts like your friend. If your product includes any sales, make sure the language in the product is toned down and more professional. This goes against the current trend of having the language be overtly friendly, so that trend could be one of the sources of AI creepiness.
In the same category as language, pay attention to nonverbal cues. Don't mimic nonverbal cues such as filler sounds, hand gestures, or body language. While they are fundamental to smooth human interaction, this starts to get into the uncanny valley, which we will cover soon, and it is very easy to get wrong. Even when you get it right, it makes people suspicious and literally gives them the chills.
A good example of this was Duplex, a speech system from Google that was demoed at the 2018 I/O conference. It convinced unknowing people it was human because of the amazing accuracy of the speech engine, including pauses, sighs, and “um”s. But in the reviews of the system, people said it was creepy precisely because it sighed and said “hmm.” I've mentioned this previously: technology can get ahead of technological understanding, and when it does, things look like non-understandable magic. And people are afraid of things they don't understand.
Another nonverbal cue that creeps people out is laughing at unpredictable times or displaying inappropriate emotion. I have covered working humor into your AI personality, and this is a perfect example: done wrong, it comes across as creepy. The inclusion of emotion in machine learning is fraught with obstacles, so tread slowly and verify changes. Basically, user test after each addition and regression test each task to make sure the emotional tone stays consistent through the entire user journey.
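To make the regression idea concrete, here is a minimal sketch in Python. The tone scores, step names, and the 0.3 drift threshold are all assumptions for illustration; in practice the scores would come from your own rater panel or a sentiment model.

```python
# Hypothetical sketch: regression-check that the emotional tone of a
# conversational script stays consistent across the user journey.
# Tone scores and the drift threshold are illustrative assumptions.

def tone_drift(journey):
    """Largest tone gap between any two steps.

    `journey` is a list of (step_name, tone_score) pairs, with
    tone_score on an arbitrary scale (e.g. -1 = cold, +1 = warm).
    """
    scores = [score for _, score in journey]
    return max(scores) - min(scores)

def check_tone_consistency(journey, max_drift=0.3):
    """Pass the regression check only if tone varies no more than max_drift."""
    return tone_drift(journey) <= max_drift

journey = [
    ("greeting", 0.6),
    ("collect_details", 0.5),
    ("confirm_order", 0.55),
    ("goodbye", 0.6),
]
assert check_tone_consistency(journey)       # consistent tone passes
journey.append(("error_message", -0.2))      # abrupt, cold error text
assert not check_tone_consistency(journey)   # drift of 0.8 fails
```

Run a check like this after every addition so a single cold or off-key block can't slip into an otherwise warm journey.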
The last thing that crosses over from the research into design is that the most frequently mentioned creepy hobbies involved collecting things, and most likely your product is collecting a lot of data about your users. So my recommendation is: don't talk about all the info you are collecting. Now, I am not saying to lie or hide the fact that you are collecting data, because that would be worse. There just doesn't need to be a feature where your app beeps every time it learns another factoid about the user. I've seen this from very technical products. They want to sell the technology, but this is not the way to do it.

Good interaction

Next we will cover good interaction. Just as with appearance, there are behaviors that make a person seem creepy during an interaction.
The biggest reported creepy warning sign in an interaction is when a person makes it nearly impossible to leave the conversation without appearing rude, or relentlessly steers the conversation toward one topic.
So make sure your product has a way to talk to a human. Too many people see AI as a way to totally remove call centers. You will still need user support even with a perfect machine learning model. In the same vein, don't make a conversational interface the only way to interact. It doesn't work in an office or in public, so forcing users to interact that way will put them on edge.
This also means you don't want your app to keep circling back to the same script until the user agrees. This point affects sales apps the most. If your product is only for sales, then make sure to say goodbye when the user drops out of the sales funnel. If your product has other functionality, then return to where the user was previously in the interface so they don't stay stuck in a sales funnel.
The next most creepy thing is asking for details about someone's personal life. Too many questions are a problem. Too-personal questions are a problem. Just think of a flashlight app on your phone requesting access to your address book.
You will need to review the questions you are asking the user. Is it something you already know from somewhere else? Is it something you could infer by combining the answers to two other questions you already asked?
And the last interaction faux pas is talking about your own life too much. This affects startups a lot. You are proud of your product and company, but don't force your story on your users. If they ask, or go looking for it, that is fine, but keep those details in their own area.

Interaction lessons from horror houses

Next let’s take some tips from horror houses. They increase creepiness by introducing stillness before large movements. To avoid this in your product, pace the level of interaction. Try to keep questions and actions balanced through the entire user journey. If you are using a scripted conversational UI, each block of text should be as similar in size as possible to keep the pacing the same.
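One hedged way to automate that pacing check for a scripted conversational UI: flag any block whose length deviates sharply from the median, since a sudden wall of text after a string of short prompts reads like stillness before a large movement. The 2x ratio here is an assumption, not a researched value.

```python
# Flag script blocks whose length is far from the median; an assumed 2x
# ratio stands in for whatever pacing tolerance your testing supports.
from statistics import median

def uneven_blocks(blocks, ratio=2.0):
    """Return blocks more than `ratio` times longer or shorter than the median."""
    m = median(len(b) for b in blocks)
    return [b for b in blocks if len(b) > m * ratio or len(b) * ratio < m]

script = [
    "Hi! How can I help?",
    "Sure, what dates work for you?",
    "Got it. Booking now.",
    "Before we continue, please review our complete terms of service, "
    "privacy policy, data retention schedule, and arbitration clause.",
]
print(uneven_blocks(script))  # flags the long legal block for rewriting
```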
The next tip from horror houses is that they use sound as a distraction to cover motion. Basically, if you hear a scream from one direction, you won't notice movement from another direction, so after the sound is over, suddenly there is a zombie next to you.
The lesson for avoiding this problem is to make sure sound and motion line up, so the user knows which movements and sounds are associated with which actions. Create a hierarchy of user actions so that if multiple actions happen at the same time, you can prioritize sounds and movements and the user is not overloaded.
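Here is a sketch of what that hierarchy could look like in code. The event names and priority values are hypothetical; the point is that only the top-priority event gets full sound and motion cues when several fire at once.

```python
# Illustrative action hierarchy: when several events fire at once, only
# the highest-priority one gets both sound and motion; the rest queue
# silently so cues always map to a single visible action. The event
# names and priority values are assumptions.

PRIORITY = {"error": 3, "user_reply": 2, "background_sync": 1}

def cue_plan(events):
    """Decide which simultaneous event gets full sound + animation cues."""
    ranked = sorted(events, key=lambda e: PRIORITY.get(e, 0), reverse=True)
    return {"full_cues": ranked[0], "silent_queue": ranked[1:]}

plan = cue_plan(["background_sync", "error", "user_reply"])
assert plan["full_cues"] == "error"
assert plan["silent_queue"] == ["user_reply", "background_sync"]
```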
Another lesson from horror houses is the use of sudden, unexpected changes and movements to increase creepiness. Basically, something you are not expecting to move will move. For example, think of a lot of mannequins scattered around a room: as you enter, the ones close to you are obviously mannequins, but one farther back is a person disguised as a mannequin who can jump up as soon as your back is turned.
The lesson here is to use a consistent design language. For conversational UIs this means consistent trigger words that follow whatever platform you are using, or your own consistent set if you are building the platform. For visual design, a design system will help. Using consistent colors, language, shapes, and flows will all lower the user's suspicion level.

Uncanny valley

The uncanny valley refers to the idea that humans react favorably to humanoid figures and interactions until a breaking point where they become too human. At that point, the small differences between the human and the inhuman – maybe an awkward gait, or an inability to use appropriate eye contact or speech patterns – become more noticeable precisely because everything else is so close to right, causing discomfort and creepiness. The idea originated with Japanese roboticist Masahiro Mori's 1970 essay anticipating the challenges robot-makers would face.
This is basically the reason for cartoon avatars and cartoon-looking robots, and why video game errors seem so creepy. The most recent example is what I talked about earlier, Google's Duplex. It was close to, but not quite, a 100% human voice, so it sat squarely in the uncanny valley.
Of course, the way to avoid this is to not go into the valley. Like most other products, you can use cartoony levels of detail, animals instead of people, or no avatars at all. It is also important to let the user know the limitations. If the user is expecting perfection, the uncanny valley is wider than if they know where the limits are.


Putting it together

To put this all together, think of it like this: creepiness is your brain's way of detecting danger. Your brain will warn you about anything out of the ordinary or unexpected. We evolved to err on the side of detecting threats in ambiguous situations. So use human-centered design to avoid the problems.
Start with the need. A lot of creepiness comes from trying to solve your own need instead of the customer's, so make sure you are solving the customer's problem. Just using the machine learning model to sell them stuff or to collect all their data is going to trigger warning bells in the user's head.
Next, map out the journey for how the user expects to solve their problem, and test that user journey to make sure. This keeps down the unexpected turns in the journey.
It is just good design to keep users updated on where they are in the journey, where else they can go, and what they can do at that point. If you don't do this in normal software, it creates confusion. If the user knows machine learning is involved, they will think the AI is in the driver's seat, and that lack of control leads to the creepy feeling.
To keep the user in control, let them decide on the info you collect. Ask them if they want better interaction through data collection. Ask if they want personalized ads. If your product requires data collection to function correctly, it is better to tell the customer that the product can't continue than to collect data without warning them.
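As a rough sketch of consent-first data collection: every purpose defaults to off, and a feature that lacks consent stops and explains itself instead of collecting silently. The purpose names here are hypothetical.

```python
# Consent-first sketch: every data-collection purpose defaults to off,
# and a feature that lacks consent stops and explains itself. The
# purpose names are hypothetical.
from dataclasses import dataclass

@dataclass
class Consent:
    personalization: bool = False   # better interaction via collected data
    personalized_ads: bool = False  # never assumed, always opt-in

def require(consent, purpose):
    """True if the user opted in; otherwise explain why the feature stops."""
    if getattr(consent, purpose, False):
        return True
    print(f"This feature needs permission to use your data for {purpose}. "
          "It can't continue without it.")
    return False

c = Consent(personalization=True)
assert require(c, "personalization")        # opted in: proceed
assert not require(c, "personalized_ads")   # not opted in: stop and explain
```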

User testing

We have covered a lot of things to check to remove creepiness from your product. How do you know it worked? Of course, do user testing. Besides the types of user testing I've mentioned already, there is a surprisingly quantitative way to measure it: people feel cold when creeped out. So, if nothing else works, get a baseline of what your users think the room temperature is, then measure the new version for improvements.
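A minimal sketch of that measurement, assuming you collect perceived-temperature estimates before and after the redesign. The readings and the one-degree threshold are invented for illustration; on real samples you would use a proper significance test.

```python
# Compare mean perceived room temperature before and after the redesign.
# The readings and the one-degree threshold are invented for illustration.
from statistics import mean

def feels_warmer(baseline, new_version, min_gain=1.0):
    """True if mean perceived temperature rose by at least min_gain degrees."""
    return mean(new_version) - mean(baseline) >= min_gain

baseline = [19.0, 18.5, 19.5, 18.0]   # creepy version: users report feeling cold
redesign = [21.0, 20.5, 21.5, 20.0]
assert feels_warmer(baseline, redesign)
```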
and we are going to need to end on that note,
Unfortunately, that’s all the time we have for this episode, but I would love to hear back from you on how you were able to avoid creepiness in your products.
Use your phone to record a voice memo,
then email it to
The question I want answered is: in what ways have you heard people say AI or machine learning was creepy?
That is also an awesome way to let me know what you like and would like to hear more of. If you have questions or comments, record a message for those too.
If you would like to see what I am up to, you can find me on Twitter at @DesignForAI
Thank you again
and remember, with how powerful AI is,
let's design it to be usable for everyone.
Thank you

11-Creating AI principles for your company and putting them into practice

Episode 11

In this episode we look at AI principles different companies have implemented, and which ones are the most popular. Then we cover how to implement them for your company so they are followed.

Music: The Pirate And The Dancer by Rolemusic


Have you ever had this happen to you? You want to create the best AI product ever, and you start talking about it with co-workers. You try to get everyone on board, but everyone has their own definition of what “best” means, and some of their ideas worry you about how your customers will react. If you can't even agree on principles, how are you supposed to implement them? Let's find out.
This podcast is called Design for AI.
It is here to help define the space where machine learning intersects with UX, where we talk to experts and discuss topics around designing a better AI.
Music is by Rolemusic.
I'm your host, Mark Bailey.
Let's get started.
In my last episode we talked about the different fears people have with AI. Obviously this is a problem even if you have the best intentions. So what do you point to when your customers come asking? How do you make sure your products don't destroy your company?
They can be called principles, guidelines, a company charter, or values. There just needs to be a way to come up with them and a way to make sure the company follows them. From a UX standpoint, coming up with these AI principles helps drive the goals you create and the metrics you measure them by. Most of the company AI principles we are going to talk about in this episode are pretty lofty and vague. Vague, in this case, is actually OK. Everything that has to do with machine learning is still moving so fast that it would be almost impossible to come up with hard rules that would not be obsolete 6 months from now.
On the other hand, how do you keep from being too vague, from writing marketing terms that look good but don't mean anything? The short answer is to implement them. If you can implement the AI principles, then they are defined well enough to follow.
First we will cover creating them. Creating AI principles is not as difficult as it sounds. It is not as bad as creating the brand, which we talked about in a previous episode. It also helps if you have already created the brand, because your company brand will influence which AI principles you adopt.
There is a paper that compared the principles of different associations and companies. There is a good chance you will use similar key issues as other companies, so first we'll cover the list, from most used by companies to least used, and then we will cover what the big companies state specifically:
  • privacy protection
  • accountability
  • fairness, non-discrimination, justice
  • transparency, openness
  • safety, cybersecurity
  • common good, sustainability, well-being
  • human oversight, control, auditing
  • explainability, interpretability
  • solidarity, inclusion, social cohesion
  • science-policy link
  • legislative framework, legal status of AI systems
  • responsible/intensified research funding
  • public awareness, education about AI and its risks
  • future of employment
  • dual-use problem, military, AI arms race
  • field-specific deliberations (health, military, mobility etc.)
  • human autonomy
  • diversity in the field of AI
  • certification for AI products
  • cultural differences in the ethically aligned design of AI systems
  • protection of whistleblowers
  • hidden costs (labeling, clickwork, content moderation, energy, resources)
The big companies that have AI principles of some sort we are going to cover are:
  • Open AI
  • Google
  • Microsoft
  • Deepmind
  • Facebook
The companies people might expect but that are missing are Apple and Amazon. They are part of the Partnership on AI association. I think abstracting principles out to membership in an association makes them that much harder to integrate. Since this episode is about putting principles into practice, I will not be covering the association's individual principles.
Open AI Charter –
I decided to start with them since they seem the most altruistic.
  • Broadly Distributed Benefits – Basically, the output of the AI they create has to benefit the general public and not just a few company owners. The worry is that as AI puts more and more people out of jobs, it will benefit fewer and fewer people.
  • Long-Term Safety – The thinking behind this is how to keep AI from hurting anyone or from seeing people as the problem. When your company's goal is to create an AI smarter than humans, this is an important goal to have.
  • Technical Leadership – Not a surprise, they want to lead AI development.
  • Cooperative Orientation – This one just means they are willing to work with other companies.
Google AI principles –
  • Be socially beneficial- For Google this means business areas including healthcare, security, energy, transportation, manufacturing, and entertainment
  • Avoid creating or reinforcing unfair bias – Problems like bad data make for biased AI. They watch out for race, ethnicity, gender, nationality, income, sexual orientation, ability, and political or religious belief.
  • Be built and tested for safety – This means to try to design out unintended consequences.
  • Be accountable to people – Allow for products to get feedback, relevant explanations, and appeal from users.
  • Incorporate privacy design principles – To provide appropriate transparency and control over the use of data.
  • Uphold high standards of scientific excellence – One of the problems with machine learning is it is near impossible to reproduce results.
  • Be made available for uses that accord with these principles – This means to keep the applications true to what it was intended to do.
  • AI applications we will not pursue – This was interesting in that Google was the only company that set out areas that are off limits.
    • Technologies that cause or are likely to cause overall harm – The benefits need to outweigh the risks.
    • No weapons or other technology to injure people.
    • Technologies that gather or use information for surveillance.
    • Technologies to get around international law and human rights.
Microsoft AI principles –
  • Fairness – AI systems should treat all people fairly
  • Inclusiveness – AI systems should empower everyone and engage people
  • Reliability & Safety – AI systems should perform reliably and safely
  • Transparency – AI systems should be understandable
  • Privacy & Security – AI systems should be secure and respect privacy
  • Accountability – AI systems should have algorithmic accountability
DeepMind principles –
  • Social purpose – AI should serve socially beneficial purposes and always remain under meaningful human control.
  • Privacy, transparency, and fairness – protecting people’s privacy and ensuring that they understand how their data is used
  • AI morality and values – People hold different values, which makes it difficult to agree on universal principles. Likewise, endorsing the values held by a majority could lead to discrimination against minorities.
  • Governance and accountability – new standards or institutions may be needed to oversee its use by individuals, states, and the private sector.
  • AI and the world’s complex challenges – They want to uncover patterns in complex datasets that haven't been found before.
  • Misuse and unintended consequences – Again to make sure products are not repurposed in unethical or harmful ways.
  • Economic impact: inclusion and equality – They are worried that widespread displacement of jobs will alter economies in ways that disproportionately affect some sections of the population.
Facebook AI values –
  • Openness – AI should be published and open-sourced for the community to learn about and build upon.
  • Collaboration – share knowledge with both internal and external partners and cultivate diverse perspectives and needs.
  • Excellence – focus on the projects that we believe will have the most positive impact on people and society.
  • Scale – Products must account for both large scale data and computation needs.
Next we are going to cover implementing AI principles once you have decided which ones to use. The simple truth is that a lot of people think all you need to do is come up with the principles. Surely you can sell your startup before anyone realizes it is all marketing. No one will find out, right?
The problem is, making AI principles public means people dig into them. If your principles don't match your actions, someone will know. All it takes is one person to leak, or you lock down so hard that the Streisand effect kicks in. Your company will sink, and sink fast, if people don't trust your AI. Distrust of AI is already high, so it only takes one little doubt to spook your customers. So how do you implement principles?
Well, it comes down to how you make sure they are being followed. A recent paper asked whether such guidelines have an actual impact on human decision-making in the field of AI and machine learning. The short answer is: no, most often not, and companies' trust levels show it.
Then how do you get a company to follow principles? Every company is different. We are talking directly about how company politics affects what gets done. How do you implement other directives that affect your bottom line?
Since every company is different, I can only speak to what has worked for me. The first thing is to make sure the principles are hard-baked into the model metrics. I guarantee the developers are focusing on accuracy for their models. Whatever principles you or your company come up with, when you are sitting in the weekly meetings on which models are getting developed, make sure that, besides accuracy, how well the model follows the principles is one of the metrics you use to decide which model to use.
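To sketch what that could look like, here is a hypothetical model-selection step where principle compliance is weighted alongside accuracy. The metric names and weights are assumptions; each principle needs its own measurable proxy, such as a fairness gap across user groups.

```python
# Hypothetical model-selection step: combine accuracy with principle
# metrics instead of ranking on accuracy alone. Metric names and weights
# are assumptions; each principle needs its own measurable proxy.

def score(model, weights):
    """Weighted sum over only the metrics named in `weights`."""
    return sum(model[k] * w for k, w in weights.items())

def pick_model(candidates, weights):
    """Choose the candidate with the best combined score, not raw accuracy."""
    return max(candidates, key=lambda m: score(m, weights))

weights = {"accuracy": 0.5, "fairness": 0.3, "explainability": 0.2}
candidates = [
    {"name": "A", "accuracy": 0.95, "fairness": 0.60, "explainability": 0.40},
    {"name": "B", "accuracy": 0.91, "fairness": 0.85, "explainability": 0.80},
]
assert pick_model(candidates, weights)["name"] == "B"  # wins despite lower accuracy
```

The design choice here is the point: once the weekly review ranks on the combined score, a model that wins on accuracy but fails a principle metric no longer ships by default.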
The final and easiest way is to tie the principles to money. This requires buy-in from higher ups, but if you can get the principles tied to how people get raises then people will go out of their way to find ways to tie what they are doing to the principles.
and on that note, what principles work for your company?
Unfortunately, that’s all the time we have for this episode, but I would love to hear back from you on how you were able to create AI principles for your products.
Use your phone to record a voice memo,
then email it to
That is also an awesome way to let me know what you like and would like to hear more of. If you have questions or comments, record a message for those too.
If you would like to see what I am up to, you can find me on Twitter at @DesignForAI
Thank you again
and remember, with how powerful AI is,
let's design it to be usable for everyone.

10-How to use people’s fear of AI to make your product better

Episode 10

In this episode we break down the different types of fear, both irrational and legitimate, that people exhibit about AI during user testing. We cover what causes the fear, and what to change in the designs to take care of the problems.

Music: The Pirate And The Dancer by Rolemusic


Have you ever had this happen to you? You created a new AI product, you made sure everything flows well, it solves the users' needs, and the model accuracy is spot on. The only thing standing between you and being swarmed by venture capitalists is that they want to see one last test of the product with users.
When you run the test, everyone keeps comparing your app to killer robots from the movies. They talk about Elon Musk, Stephen Hawking, and Bill Gates warning them about evil AIs taking over, and your product is the first step in that direction… somehow?
Let's make sure that doesn't happen. In today's episode we will be covering how to separate out the noise of the fear of AI in user testing.
This podcast is called Design for AI.
It is here to help define the space where machine learning intersects with UX, where we talk to experts and discuss topics around designing a better AI.
Music is by Rolemusic.
I'm your host, Mark Bailey.
Let's get started.
When doing any user research for AI, you should ask the people you are talking to to describe what they think AI is versus what they think the app is. Ask them to define AI, machine learning, and machine intelligence. Then ask them to define the AI in their phone (for example, Siri or OK Google). Most people will define AI as “whatever hasn't been created yet,” and AI models that already exist as just technology. These descriptions help to level-set which fears are general versus which fears are awoken by your product.
Now, when I talk about these fears in this episode, and when you ask the questions, don't discount them. One of the problems developers and researchers run into is being so deep in the product that it can seem silly that people worry about things like killer robots. But the way to a better product is to recognize the concerns your customers have instead of discounting their fears and writing them off.
First let's cover the best-case scenario. The reality is there are just too many movies out there where AI is out to get you. No one thinks of AI as AI when it is working well, like in Star Trek, because, well, it just works. Even in cases where the AI is buggy, like C-3PO and R2-D2 in Star Wars, people don't think of that as AI.
Any time anyone reads about AI in the news, it is always paired with a picture of the Terminator. So if you are doing user testing and someone brings up a scary AI movie, that is normal; it is part of American culture. From my experience, if they don't bring up the Terminator, then they are not familiar with AI at all. I use it as a litmus test to gauge a person's knowledge level of machine learning.
That is not to say that large companies don't try to avoid that association. Look at Google's ads: they refer to everything as machine learning to avoid it. Apple calls the chips in its iPhones neural engines, and Amazon uses the term “smart” instead of machine learning for everything: smart speaker, smart display, even smart home. So all of the big companies avoid AI word associations as much as possible, and depending on your product and your users' skill level, that might help your design strategy too.

Irrational Fears

So what if you do that and people still seem apprehensive about using your app? Well, I have broken down the different types of fear, both irrational and legitimate, that people exhibit about AI during user testing. We will cover what causes each fear, and what to change in the designs to take care of the problems.
We will cover the irrational fears first. The broad categories are:
  • Fear of the unknown
  • Mass unemployment
  • Bad actors
  • Uncaring super intelligence

Fear of the unknown

Let’s start with Fear of the unknown or fear of change. This fear has always existed when there is a large shift in society. So right now it is a fear of AI because that is what is on the news. Before that there was a general anxiety about new tech in general. Back in the 60’s it would have been a fear about nuclear power. Before that you can find old articles about people’s fear of mass media when newspapers became popular. This fear goes can be traced all the way back to the industrial revolution. In other words, it is normal.
When the person you are interviewing has a fear of the unknown, it is kind of annoying because it is so vague, they will have a hard time conveying why they don’t trust your app, but they are sure there is some reason to not trust it. If are running into this a lot for your targeted users then most likely the app is doing a bad job of telegraphing intentions.
Machine learning is used by your product to take shortcuts. These shortcuts take away the tedious steps from the user. But, you still need to tell the user what you are doing, and the steps you took to give them the answer you are giving them. Without this there is just a black box doing stuff that they don’t know.
A good example of this is flight search. When someone searches for airline tickets, there is a whole lot going on in the background. The site could just throw up a progress bar, but instead many show some of the steps being taken to filter down flights based on the search parameters. So think about what your AI model is doing for the user and how you can surface those steps to make them obvious.

Mass unemployment

The next fear is Mass unemployment
Pretty much everyone you talk to will bring this up as a fear. The older the person, the more likely they think it will affect someone else, not them. The younger someone is, the more likely they are planning for it to affect them. If this is a strong concern, or the worry seems out of line for their age group, ask them how they view the way e-commerce or mobile disruption changed the jobs available. Their fear of AI should match their view of that same level of job disruption.
If they don’t match, there is a good chance you will need to look at the user journeys. Your product is probably doing something for the user that they want to do themselves. Find out which parts of your tasks people think are important and which are tedious, and only do the tedious parts for the user. If you are doing some of the important tasks, design in ways for the user to see everything that is happening and to take over at any time.
A good example of this is Mailchimp. It does a lot of things automatically for you. But at any time you can jump in and take over the configuration, and the most important step, sending the bulk email, is left up to the user and must be confirmed.

Bad actors

Next lets talk about bad actors
This covers all the people who would use AI for nefarious purposes. If ever there was a problem with a huge need for design, this is it. Currently, there is a real problem with state-sponsored AI models trying to spread misinformation through deep fakes or impersonations. But I am guessing that if you are listening to this podcast you are not trying to create that type of app, since it definitely breaks good design rules.
But, with Russia admitting to influencing elections world-wide, this is definitely something that is in the news more and more, so there is a good chance it will come up during a user interview since it will be on the user’s mind.
Before doing user testing, you should have, as part of the experience, a way to recognize people or other systems trying to game your AI model. You don’t need to go into specifics, but it is good to let the user know how you are protecting them from bad actors.
Fears of cyber warfare and viruses can make users reluctant to give their information to you. So if your model requires collecting a lot of personal information, make sure to show how your app protects user data. If you can use something like federated learning, you can reassure the user that no matter what happens, since you don’t hold their data, no one else can take it either.

AI super intelligence

The last irrational fear is the uncaring AI super intelligence
This fear is based on the idea that AI is going to take over the world, and that when it does, machine learning models will become “more human than human.” The idea here is that AI will adapt faster than we can, which at some point will cause the AI to see people as a threat.
A way this fear can be expressed during user testing is complaints about lack of control, or an expectation of betrayal by computers: the AI will be your friend until it isn’t. Again, these are people who want control over the system. Rather than giving them control over everything because they don’t trust AI to make decisions for them, in this case it is better to strive for transparency.
Obviously the user needs to feel in control of the system, so review what the user wants to do versus what is seen as tedious in the user journey. Also make sure not to try to be too human: the closer your model gets to the uncanny valley, the more suspicious users will become.
For transparency, once you have the user journey mapped out, write the steps up and integrate them into the app, so that as the user moves through it they know what you are doing for them and what to expect in the future steps of the journey.

Valid Fears

Now that we have covered all of the irrational fears, there are some valid fears people can have too. These fears might come up during user testing. From a design standpoint these are extra things you need to worry about if you are using AI in your product. They are:
  • Need for data safeguards
  • Need for data protection
  • Avoiding dark patterns
  • and loss of skills

Data safeguards

Let’s start with the need to design in data safeguards. I’ll link an example in the show notes.
In this example, a couple’s Amazon Alexa malfunctioned and started sending audio recordings of their conversations to their contact list.
Now, obviously you can’t know where all the bugs will creep in. This is especially true with machine learning models since they are non-deterministic (meaning you can’t fully predict what they will do). But you can know normal behavior, and machine learning models are great at detecting anomalies. If you had a model watching how Alexa was working, sending audio clips to all the contacts would definitely stick out as an anomaly and could be shut down before the user noticed, or trigger a request for confirmation before continuing.
So start with your user journey and map out the expected behavior for your model. Even if you don’t know the exact actions your model will take, you should still be able to classify the types of answers. While running your beta tests you can find the norms for the types of actions your model is expected to produce. A monitoring model can then make sure the model interacting with your users is acting correctly.
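To make that monitoring idea concrete, here is a toy sketch. It learns which action types are routine during beta testing and flags anything rare or unseen for confirmation. The action names, the log, and the frequency threshold are all invented for illustration; a real monitor would be its own anomaly-detection model.

```python
from collections import Counter

def build_monitor(beta_actions, min_rate=0.05):
    """Return a checker that flags actions rarely or never seen in beta."""
    counts = Counter(beta_actions)
    total = len(beta_actions)
    # Any action seen in at least min_rate of the beta log counts as normal.
    normal = {action for action, c in counts.items() if c / total >= min_rate}

    def is_anomalous(action):
        return action not in normal

    return is_anomalous

# Hypothetical beta log: 100 observed actions, one of them very rare.
beta_log = ["play_music"] * 60 + ["set_timer"] * 39 + ["send_audio_to_contacts"]
monitor = build_monitor(beta_log)

print(monitor("set_timer"))               # False: routine behavior, let it through
print(monitor("send_audio_to_contacts"))  # True: rare action, ask for confirmation
```

Anything the monitor flags could be paused and confirmed with the user before it runs, which is exactly the safeguard the Alexa story was missing.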

Data hacking

The next real fear is losing data to hacking. Either the customer or the company can get hacked. Either way, this is a real problem that happens more frequently than anyone wants to admit, and the consequences only get more severe as computers collect more data.
This can be broken down into three areas to verify: the company’s servers, the customer’s computer, and the communication between them. The first area is keeping things protected on the company servers. If you can’t do this, the company won’t be in business long. The good news is that I’ve covered this previously: federated learning is an easy way for a company to protect itself. If information isn’t stored on the servers, it can’t be hacked from them. Also look at how people interact with the model. If someone sends bad info, can they gain an advantage at the expense of others? You need to make sure the design doesn’t allow gaming the system for personal benefit.
Verifying that the user’s system is protected is a little harder. Requiring strong authentication and encrypting all data locally should be the default. There is a good chance you will also need to make sure the model isn’t compromised locally; verifying file integrity helps you know the local model is running with the right information.
Communication between the model and servers also needs authentication, encryption, and verifying data integrity for updates. It is impossible to cover this in one podcast since data security is its own subfield of computers, so I am only trying to raise awareness. Simple causes of data breaches like insecure servers or no authentication happen every year for big companies who should know better; so obviously awareness is not high enough.

Addictive AI

The next fact-based fear is technology that is purposely addictive. People feel like they are losing their human connection with other people and becoming more and more dependent on AI. Even companies with good intentions can cause this problem by picking the wrong metrics to pivot on.
Maybe the biggest example of misdirected metrics happened with social media companies. They start with the stated goal of helping people connect with each other. But you can’t help people connect if your company goes out of business, so to maximize profit they create an AI model that shows people the information they want so they will also see ads. The model’s metric is set to maximize people’s time on site. The model gets very good at this because it finds that conflict means longer view times. People stop connecting with each other and just consume more, splitting into splinter groups who yell at everyone because they’ve learned the conflict carries their message farther. The end result is the exact opposite of the stated goal, all because of one metric.
So be extra careful with the metrics you implement. The law of unintended consequences can be harsh with AI models. Most developers will optimize their models for some form of accuracy or precision. To lower the chances of unintended consequences, add metrics for customer happiness, fairness, model regression testing,  and faster iteration times.
I dug into this more in the last episode so look there for more details, but for now, know that to have long term customers at a low cost of acquisition, the easiest way is to design the metrics to think of the customer first.

Loss of skill

The last real fear we will talk about is the loss of skills to technology. As people become more dependent on AI models that complete simple tasks, they will forget how to do those tasks themselves. I agree this will happen; it is a real fear. But I don’t see it as the problem people fear it to be.
As cell phones became ubiquitous studies have shown people no longer memorize as many phone numbers. Calculators make it so people don’t learn as many formulas. Neither of these have anything to do with machine learning models but the outcome is the same. People adjust to the tools they have at their disposal. I think the same will happen with wider adoption of AI.
Because of this, how to allay this fear then becomes closer to the fear of the unknown discussed earlier. I can only give the advice to let the user see exactly what the AI model is doing. Knowing the steps that the model is completing for you is a good transitional interface until everyone sees machine learning models as just another tool.
and on that note, what fears have you encountered with your users?
That’s all we have for this episode, but I would love to hear back from you on how you were able to work around people’s fear of AI for your products.
, use your phone to record a voice memo,
then email it to
That is also an awesome way to let me know what you like and would like to hear more of, or If you have questions or comments record a message for those too.
If you would like to see what I am up to, you can find me on Twitter at @DesignForAI
Thank you again
and remember, with how powerful AI is,
lets design it to be usable for everyone

9-Metrics to care about for a better UX with machine learning

Episode 9

In this episode we cover different metrics that are important for Developers, PMs, and UX when building a model and where they can go wrong.

Music: The Pirate And The Dancer by Rolemusic


Have you ever had this happen to you? You created a new social media company. All you wanted was to use advertising to pay the bills. So you optimized a few AI models to give users what they want so they stay on your product longer. They get things they like, and you get more ad revenue from the extra time on site. Win-win, right?
Then the AI model finds out people stay online longer if they are emotionally engaged and nothing engages more than controversy. Everyone hates each other now because all the AI model is serving is the most controversial material, this isn’t how things were supposed to work… right? 
Let’s make sure that doesn’t happen. In todays episode we will be covering how to use the right metrics for your product to create a good experience for the user.
This podcast is called design for AI
It is here to help define the space where Machine learning intersects with UX. Where we talk to experts and discuss topics around designing a better AI.
music is by Rolemusic
Im your host Mark Bailey
Lets get started
I know this is probably going to be the most technical episode I’ve done so far, so I’ll apologize in advance, but I do think it is important. One of the most overlooked areas for good design is UX input into the creation of the model. But UX can’t give useful feedback without knowing what is going on, and that means some technical learning. I’ll make sure to cover the meaning of all the terms. The way I’ve learned a lot of machine learning is to go over the same info twice; the second time around I know the terms and it sinks in better. So if this episode doesn’t make sense, maybe give it a second listen. If it still doesn’t make sense, then let me know what your questions are.
I’ve split it up into 3 groups of metrics. Metrics developers, PMs, and UX should care about. Let’s start with developer metrics.

Developer Metrics

For developer metrics I split it up by the main model types that use differing metrics: classification, regression, and ranking models.

Performance Metrics for Classification Problems

Classification Accuracy

Accuracy = number of correct predictions / total number of predictions. It is the basic metric that is used for almost all models. As the UX person, hopefully the developer you are working with isn’t using this as the only metric since it can give you a false sense of security. For example if you are looking for an anomaly that only happens 2% of the time, a model that never predicts the anomaly will have an accuracy rate of 98%.
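That accuracy trap is easy to demonstrate. A minimal sketch with made-up data: a model that never predicts the 2% anomaly still scores 98%.

```python
# 2% of the labels are the anomaly (1); the rest are normal (0).
labels = [1] * 2 + [0] * 98
# A useless "model" that always predicts "normal".
predictions = [0] * 100

correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)
print(accuracy)  # 0.98 -- looks great, yet it never catches a single anomaly
```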

Confusion Matrix

A better way to understand accuracy is with a Confusion Matrix. Think of this as four boxes, with the actual outcome as rows and the prediction as columns:
  • Top left is True Positives: the cases where we predicted YES and the actual output was also YES. This is correctly detecting something that was there.
  • Top right is False Negatives: the cases where we predicted NO but the actual output was YES. This is also called a type II error. It is like telling a woman in the middle of having a baby that she isn’t pregnant.
  • Bottom left is False Positives: the cases where we predicted YES but the actual output was NO. This is also called a type I error. It is like a doctor telling a man he is pregnant.
  • Bottom right is True Negatives: the cases where we predicted NO and the actual output was NO. This is correctly detecting that something was not there.
Sensitivity matters more when classifying the positive cases correctly is more important than classifying the negative ones. An example of this is detecting cancer: you don’t want any malignant case to be classified as ‘benign’. It is better to tell a few people they have cancer who don’t than to let people who do have cancer slip through the cracks. The consequence of that mistake is a few bad days and a retest, which is a better result than people dying because they were told they were fine.
Sensitivity measures the proportion of actual positives that are correctly identified as such. If you want to measure how sensitive a model is, use the True Positive Rate, defined as TP / (TP + FN). The True Positive Rate is the proportion of positive data points correctly classified as positive, out of all positive data points.
Specificity matters more when classifying the negative cases correctly is more important than classifying the positives. Maximizing specificity is more relevant in cases like spam detection, where you strictly don’t want genuine messages (the negative cases) to end up in the spam box (the positive cases). It is better for someone to read a few spam messages than to miss important ones.
Specificity is the proportion of actual negatives that are correctly identified as such, for example the percentage of healthy people who are correctly identified as not having the condition. It is defined as TN / (TN + FP). Its mirror image is the False Positive Rate, FP / (FP + TN): the proportion of negative data points mistakenly classified as positive, out of all negative data points, which equals 1 minus the specificity.
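A small sketch puts the two rates side by side, using hypothetical confusion-matrix counts:

```python
# Made-up counts from a confusion matrix.
tp, fn = 80, 20   # actual positives: caught vs. missed
tn, fp = 90, 10   # actual negatives: cleared vs. falsely flagged

sensitivity = tp / (tp + fn)          # true positive rate
specificity = tn / (tn + fp)          # true negative rate
false_positive_rate = fp / (fp + tn)  # = 1 - specificity

print(sensitivity, specificity, false_positive_rate)  # 0.8 0.9 0.1
```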
An improvement on the simplified measurement of accuracy is minimizing the logarithmic loss (log loss). It works by penalizing confident false classifications: the model outputs a probability for each class, and the more confidently wrong a prediction is, the larger the penalty. For classification models, minimizing log loss generally yields better-calibrated, more accurate predictions.
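A minimal log loss implementation (the labels and probabilities here are made up) shows how it punishes confident wrong answers far more than hesitant ones:

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Average binary log loss; p_pred is the predicted probability of class 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) never happens
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(log_loss([1, 0], [0.9, 0.1]))  # small loss: confident and right
print(log_loss([1, 0], [0.1, 0.9]))  # large loss: confident and wrong
```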

Receiver operating characteristic (ROC)

A Receiver Operating Characteristic (ROC) curve plots the true positive rate against the false positive rate at different classification thresholds. Basically it maps out the line between the model getting it right and wrong. ROC curves are probably the most commonly used measure for evaluating the predictive performance of scoring classifiers, that is, classification models that output a score or probability.
The thing to keep in mind when using this as a metric is that you tweak the ROC curve by changing the classification threshold. Lowering the threshold classifies more items as positive, increasing both false positives and true positives. So it is a way to move back and forth as needed, depending on which matters more.

Area under Curve (AUC)

The AUC is a hard thing to wrap your mind around. The technical definition for AUC is the probability that a classifier will be more confident that a randomly chosen positive example is actually positive than that a randomly chosen negative example is positive.
An easier way to look at it: if you map an ROC curve on a graph, the area under that curve measures how well the model separates the classes. A perfect classifier has an AUC of 1; as performance gets worse, the number falls toward 0.5, which is random guessing. AUC provides an aggregate measure of performance across all possible classification thresholds, so it is good for big-picture information about the model. Because it is big-picture info, things like the scale of the scores and the classification threshold don’t matter.
One problem using the AUC as a metric is that the scale invariance is not always desirable. For example, sometimes we really do need well calibrated probability outputs, and AUC won’t tell us about that.
Classification-threshold invariance is not always desirable. In cases where there are wide disparities in the cost of false negatives vs. false positives, it may be critical to minimize one type of classification error. For example, when doing email spam detection, you likely want to prioritize minimizing false positives (even if that results in a significant increase of false negatives). AUC isn’t a useful metric for this type of optimization.
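The probabilistic definition above can be computed directly from a handful of made-up scores, without drawing the curve at all: AUC is the chance that a randomly chosen positive example outscores a randomly chosen negative one.

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # A tie between a positive and a negative counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Illustrative scores: one negative (0.7) outranks one positive (0.6).
print(auc([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2]))  # 0.75
```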

Improve F1 Score

OK, to describe an F1 score we need to cover some new vocabulary. We have already covered sensitivity and specificity; now we will add precision and recall. The easiest one: recall is just another name for sensitivity. I don’t know why they needed two words for the same thing, but either way it means: of all the actual positives, what fraction did the model catch? Precision flips that around: of everything the model predicted as positive, what fraction was actually positive?
F1 Score is used to measure a test’s accuracy. It is the average (harmonic mean, to be specific) of precision and recall, and it ranges between 0 and 1. It tells you how precise your classifier is (how many of its positive predictions are correct) as well as how robust it is (it does not miss a significant number of positives).
High precision but lower recall gives you an extremely accurate classifier, but one that misses a large number of instances that are difficult to classify. The greater the F1 score, the better the performance of the model.
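The whole vocabulary fits in a few lines, with hypothetical counts:

```python
# Made-up confusion-matrix counts.
tp, fp, fn = 70, 30, 10

precision = tp / (tp + fp)  # of everything predicted positive, how much was right
recall = tp / (tp + fn)     # of everything actually positive, how much was caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.7 0.875 0.778
```

Because the harmonic mean punishes imbalance, a model that is great at one of the two and terrible at the other gets a low F1.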

Performance Metrics for Regression Problems

Mean Absolute Error

Mean Absolute Error is the average of the absolute difference between the original values and the predicted values. It gives us a measure of how far the predictions were from the actual output. However, it doesn’t give us any idea of the direction of the error, i.e. whether we are under-predicting or over-predicting the data.

Mean Squared Error

Mean Squared Error (MSE) is quite similar to Mean Absolute Error, the only difference being that MSE takes the average of the square of the difference between the original values and the predicted values. The advantage of MSE is that it is easier to compute the gradient, whereas Mean Absolute Error requires complicated linear programming tools to do so. Because we square the error, larger errors become more pronounced than smaller ones, so the model can focus more on the larger errors.
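A quick made-up comparison shows how a single large error dominates MSE but not MAE:

```python
# Illustrative values only; note the one large miss (38 vs. 30).
actual    = [10, 20, 30]
predicted = [12, 19, 38]

errors = [p - a for a, p in zip(actual, predicted)]  # 2, -1, 8
mae = sum(abs(e) for e in errors) / len(errors)
mse = sum(e * e for e in errors) / len(errors)

print(round(mae, 2), mse)  # 3.67 23.0 -- the squared 8 swamps the MSE
```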

Metrics for Ranking models

Best Predicted vs Human, BPH:

The most relevant item is taken from an algorithm-generated ranking and then compared to a human-generated ranking. The result shows the difference between the algorithm’s and the human’s estimations.
The problem with using this method is that when people need human rankings, a lot of the time they use Mechanical Turk workers or internal employees. I can pretty much guarantee that Mechanical Turk workers are not your target audience, and neither are your co-workers. So if you are the UX person, make sure it is people from the target audience ranking their choices, not a developer who already knows the possible answers.

Kendall’s tau

If you are comparing the whole ranked list instead of just the top item, then Kendall’s tau coefficient shows the correlation between the two lists of ranked items, based on the number of similar and dissimilar pairs in a pairwise comparison: in each case we have two ranks (the machine’s and the human’s prediction). First, the ranked items are turned into a pairwise comparison matrix relating the current rank to the others. A concordant pair means the algorithm’s rank agrees with the human’s rank; otherwise it is a discordant, or dissimilar, pair.
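A minimal sketch of the pairwise idea, assuming no tied ranks (the two rankings are invented):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau for two tie-free rankings of the same items."""
    pairs = list(combinations(range(len(rank_a)), 2))
    # Concordant: both rankings order the pair the same way.
    concordant = sum(1 for i, j in pairs
                     if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) > 0)
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)

model_rank = [1, 2, 3, 4]   # algorithm's ranking of four items
human_rank = [1, 3, 2, 4]   # human ranking, with one pair swapped

print(kendall_tau(model_rank, human_rank))  # 1 of 6 pairs discordant -> 2/3
```

Tau runs from 1 (identical orderings) to -1 (exactly reversed); real libraries also handle ties, which this sketch ignores.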

Business metrics

Let’s shift gears to business metrics. These are the things the PM needs to worry about. Since most models are built in a research setting, the developers have the most say in the metrics used to judge whether a model is “good”. As machine learning matures and more processes get formalized, so will the business metrics. Until then, here is what I have seen used by the PMs I have worked with at different companies. Feel free to let me know if there are others you have used.
The metrics I will be covering are:
  • Model adaptability
  • faster iterations
  • Smaller, more efficient, models
  • Ability to productize the model

Model adaptability

Once a model is created, there is a good chance it can be used for more than its original purpose. I’ve previously talked about the data ladder. Basically, a better model lets you collect new data, which leads to a better model. That is the ladder, and it should be part of your plan for which models are needed to reach new data sources.
This metric ranks the model structure on how adaptable it is, not just to the current step in the ladder but to future rungs. Basically, will it improve with added data streams, or is this model a one-off that only works for the current rung? Favoring adaptable models will help you climb the ladder faster and get ahead of the competition.

Faster iteration

How fast can you put out new minor versions of a model? How long does a major version take? Models that can be turned around faster can be improved faster. Faster improvement means more accuracy, more frequent user testing, and better alignment to what your customers want.

Smaller, more efficient, models

It is amazing how accurate models can get given enough data and hardware power. The problem is that cruft can build up, just like in any other software. Smaller models require less hardware for training and serving, and since machine learning hardware is pretty much all cutting edge, it is expensive compared to other cloud services. This is a cost-saving metric, but reducing complexity also enables faster iteration and creates a better user experience.

productizing model

Since machine learning is still mainly research, the majority of models created never see the light of day. This is another consequence of cruft building up in the model. Optimizing models for production helps keep the development cycle lean.

UX metrics

The warning for creating a good user experience is to think long term. Be extra careful with the metrics you implement. The law of unintended consequences can be harsh with AI models. Most developers will optimize their models for some form of accuracy or precision. To lower the chances of unintended consequences, add UX metrics. The ones I use are:
  • Customer happiness
  • Fairness
  • Model regression tests
  • Faster iteration times

Customer happiness

Customer happiness helps you keep long-term customers. Whether to favor long-term or short-term customer satisfaction depends on two numbers for your company: the customer acquisition cost (CAC) and the customer lifetime value (CLV). The higher both of these are, the more important it is to focus on long-term happiness. This also means smaller perceived changes for the user.
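As a quick illustration of that comparison (the dollar figures are invented): when lifetime value dwarfs acquisition cost, every customer churned by a short-term trick costs far more than it would to avoid.

```python
cac = 120.0  # hypothetical cost to acquire one customer
clv = 900.0  # hypothetical lifetime value of that customer

ratio = clv / cac
print(round(ratio, 1))  # 7.5 -- each churned customer forfeits several
                        # times what it cost to acquire them
```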

Fairness

Fairness is important for a metric. I’ve linked the metrics from open data science in the show notes.
  • “Disparate impact” is the ratio of the fraction of positive predictions for the two groups. Members of each group should be selected at the same rate.
  • The “performance difference/ratio” is the calculation of all the standard performance metrics like false-positive rates, accuracy, precision or recall (equal opportunity) for privileged and unprivileged group separately and see the difference or a ratio of those values.
  • “Entropy based metric” is the generalized entropy for each group calculated separately and then compared. This method can be used to measure fairness not only at a group level but also at the individual level. The most commonly used flavor of generalized entropy is the Theil index, originally used to measure income inequality.
So which one should you choose? Be empathetic with your users: think about how they would measure fairness, and find a metric that reflects that.
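As a sketch, disparate impact is just a ratio of selection rates. The group counts here are invented, and the 0.8 cutoff is the common “four-fifths” rule of thumb, not a universal threshold:

```python
def disparate_impact(pos_unpriv, n_unpriv, pos_priv, n_priv):
    """Ratio of the unprivileged group's positive rate to the privileged group's."""
    return (pos_unpriv / n_unpriv) / (pos_priv / n_priv)

# Hypothetical: 30/100 unprivileged vs. 50/100 privileged positive predictions.
ratio = disparate_impact(pos_unpriv=30, n_unpriv=100, pos_priv=50, n_priv=100)
print(ratio)                 # 0.6
print(ratio < 0.8)           # True: below the 80% rule of thumb, worth a closer look
```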

Model regression tests

With so much of model building being research-based, a code base usually doesn’t last long. The problem is that once it is time to move on to the next version, the team is just as likely to start over with a new code base to try out new ideas. Problems that were previously solved can easily creep back, all in the search for better accuracy.
This metric keeps the team constantly tracking previous problems and how they were solved, and merging that code into new ideas so old problems don’t need to be solved again. Otherwise you are trying to provide a good user experience on constantly shifting ground, because you are never sure what this version of the model was optimized for.

Faster iteration times

I covered this one under business metrics, but it is also a big deal for a better UX. The reality is that the faster a model can iterate, the more it can be tested with users. More user tests mean more feedback on what users want.
Models that can be iterated quickly can also test new ideas faster. Rapid prototyping helps in testing more ideas. Anyone doing user tests knows how easy it is to get completely unexpected answers. The ability to quickly pivot helps you align to what users want.
and on that note, what metrics have caused unintended consequences for the models you have helped to build? Have you found any metrics that helped create better models?
That’s all we have for this episode, but I would love to hear back from you on which metrics you use for your models.
, use your phone to record a voice memo,
then email it to
That is also an awesome way to let me know what you like and would like to hear more of, or If you have questions or comments record a message for those too.
If you would like to see what I am up to, you can find me on Twitter at @DesignForAI
Thank you again
and remember, with how powerful AI is,
lets design it to be usable for everyone

8-Will AI make UX design obsolete?

Episode 8

We cover changes that are coming to the UX field because of AI. We look at how design, research, and UX management will all need to adapt to new processes.

Music: The Pirate And The Dancer by Rolemusic


I’ve gotten this question. So the scenario that everyone seems to come up with is:
Sure at first it was just the repetitive jobs that got replaced by AI.
Then GANs started generating everything.
Who needs a designer when a computer can put out 1000 designs a second?
Obviously, I wouldn’t be talking about this if I thought this was a problem.
Today we are covering how AI will change UX and design
This podcast is called design for AI
It is here to help define the space where Machine learning intersects with UX. Where we talk to experts and discuss topics around designing a better AI.
music is by Rolemusic
Im your host Mark Bailey
Lets get started
I want to start with an example
Everyone knows of deep blue, the chess application that first got everyone’s attention by beating the best chess player.
Since then AI has beat the best Go player and can beat anyone at competitive video games.
But do you know who has beat the AI systems?
Human and AI hybrids.
The top ranked chess systems right now that can beat any AI out there are all human-AI hybrids.
The human brain and an AI system both make shortcuts.
They do them in different ways.
They do better filling in for each others weak spots.
The best system is always augmented, not replacement.
I’m not the only one who thinks this
IBM CEO Ginni Rometty recently expressed that “If I considered the initials AI, I would have preferred augmented intelligence.”
Now, while AI isn’t going to take over UX or make it obsolete, I do think a lot will change. In the scenario I mentioned, GANs generate 1000 designs a second. That is actually the case; they can. But automated generation doesn’t mean good. A few years back there was a company called The Grid that promised to get rid of the need for website design. It delivered underwhelming results. But, you might ask, couldn’t another company do it better? Google tried something similar. You may have heard of them testing 42 shades of blue against each other to get just the right blue with the best response rate. That was successful. But when they tried expanding analytics-based design past those very basic items, they kept hitting a wall.
So why is this the case?
To get better AI we need better UX
There is mutual benefit on both sides that run in a cycle.
  • As AI starts getting used more,
  • The ML model produces more data that is useful
  • The new model is trained off of that data
  • AI becomes more useful
  • ML models start sprouting up, delivering unneeded advice and tasks that just add to the confusion instead of solving problems
  • The need for a better UX becomes more important
  • A better UX is created and refined
  • AI gets used more, the cycle starts again.
So if the cycle shows there is still a need for UX, how will the job itself change? Well, just like most other jobs, it is the boring, repetitive, monotonous parts that go away. There will be an automation of design. The part I talked about in the scenario, GAN models able to generate 1,000 designs a second, already exists. The new UX designer will become more of a curator instead of a generator of designs.
This has already started to happen anyway as design tools get better and UX design matures. There is no reason to redesign the same widgets over and over for an entire career. The need for design-systems designers is evidence of this: they design a component once and it is used and customized by everyone working in that system. The same should hold true for AI-generated designs.
I’m going to talk about this from the three main areas that have to do with design: Design, Research, and Management


For the designs you do create, nothing new here, but concentrate on empathy. The problem with ML models is understanding humanness. How can you figure out how the app should react to the user’s current context and mood?
Some of the cutting-edge research right now is on ensemble models. The basic idea is that ML models are good at doing one thing well, so if you take a bunch of models, put them together, and add another model to decide which model to use, you get a more robust experience. This is what needs to be designed: every change in context is going to need a different ML model.
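As a rough sketch of that decider-plus-specialists pattern (everything here is hypothetical: the two narrow models and the rule-based gate are stand-ins, and in practice the gate could itself be a trained classifier over context signals):

```python
def commute_model(request):
    # Hypothetical narrow model tuned for users on the move
    return "short answer for: " + request["query"]

def desk_model(request):
    # Hypothetical narrow model tuned for focused desktop sessions
    return "detailed answer for: " + request["query"]

def gate(request):
    # The deciding model: just a rule here, but it could be learned
    if request["context"] == "mobile":
        return commute_model
    return desk_model

def ensemble(request):
    # Route each request to the model that fits the current context
    return gate(request)(request)

print(ensemble({"context": "mobile", "query": "next meeting"}))
```

The point for designers is that the routing itself is an experience decision: every branch in the gate is a context change someone has to design for.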
As part of knowing the context, I covered previously knowing when to tell jokes. That depends completely on context. Another area where context is important is when the model gets something wrong: you will need to design how it admits it is wrong.
Context matters for the device UI. Know the device your users are on and the differences between devices. I’ve heard of a tool that helps to test on all the different platforms. How does the context change based on which device they are using?
You will need to keep abreast of new device releases because of UI changes. New devices mean new features. Know the features available and customize the experience for them. Amazon Alexa is an example: when it first came out it was just a speaker; now it can have a screen, or interact with different screens around the house. The interaction needs to be designed, because AI is only good at the past. It depends on data about things that have already happened. It can predict how things might go, but it takes a while to adapt to a new normal, like a new product coming out. These experiences need to be designed for.
Also know that AI does not do transitions well. Because models need to focus on a very narrow area, there is a need to transition from model to model to cover the whole user journey. How well that transition happens will be up to you. If the modality changes, that will need to be designed as well. For example, if the user moves from a laptop to mobile, it is hard for one model to hand off all the needed info to the other, so deciding what is important to carry over in that experience is part of the design.
AI does not do edge cases well. As I have talked about previously, accessibility is just a group of edge cases that get smoothed out by an ML model. Ignoring accessibility can open you up to lawsuits and shut out about 12-15% of your customers. Ignoring them might also be adding noise to your model, depending on what it does: muscular disabilities can throw off interaction-recognition models, and cognitive disabilities can throw off data answers. With all these reasons, it is a no-brainer to differentiate the accessibility personas even more than before.


The first thing a researcher should do is look over the data being collected to train the model. Does the data match the user’s intent? When the data was collected, what was the reason for collecting it? Will that affect its accuracy for how it is being used to train the model? You will need to watch for holes in the data when you compare it to data from the field. Otherwise, the model could be completely accurate, just not for the customers you are trying to target with the app.
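One way to spot those holes is to compare how often a segment shows up in the field versus in the training set. A minimal sketch, assuming you can label each record with a segment; the function name and the 5% cutoff are made up for illustration:

```python
from collections import Counter

def coverage_gaps(training_labels, field_labels, min_share=0.05):
    """Flag segments that are common in the field but thin in training data.
    min_share is an illustrative threshold, not a standard."""
    train = Counter(training_labels)
    field = Counter(field_labels)
    n_train = sum(train.values())
    n_field = sum(field.values())
    gaps = []
    for segment, count in field.items():
        field_share = count / n_field
        train_share = train.get(segment, 0) / n_train
        # A segment is a gap if it matters in the field but the training
        # data covers it at less than half its real-world share
        if field_share >= min_share and train_share < field_share / 2:
            gaps.append(segment)
    return gaps

training = ["desktop"] * 90 + ["mobile"] * 10
field = ["desktop"] * 50 + ["mobile"] * 50
print(coverage_gaps(training, field))  # mobile users are underrepresented
```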
For UX researchers, again, the more things change the more they stay the same. The user journey is hugely important, more so now than previously. Machine learning is there to automate the boring stuff, so find out how to let people do the actions they are passionate about. Do not automate the areas people enjoy. Knowing what those areas are takes research. Like I said previously, machine learning has problems with humanness, so the motivation for behavior and the pain points are info that comes from the UX researcher.
Probably the biggest problem for AI is trust. Doing the right thing at the right time requires knowing the user journey and covering the different context changes that affect it. To build up trust, as a researcher you need to find out which steps in the user journey are the important ones. Where does the accuracy need to be high? Where is it OK if the model is only right 70 or 80% of the time? Being able to differentiate the importance of the different steps will help the developers know where to focus their work.
Researchers need to know all the different personas. Up until now the guiding wisdom has been to boil personas down into 3 or 4 archetypes. This needs to change because of the narrowness of AI. The data used to train the model needs to be gathered from populations similar to the ones you plan to target. Otherwise, the holes in the data may be exactly where your customers fit in.
For internationalization, every target market will need its own model. So you will need to know: how are those target markets different? How will the developers need to build the models differently for the specifics of each group? The answers to these questions need to be added to your reports.


First, benchmark the current product, and benchmark competing products. A lot of the methods I have talked about adapt to process changes. With the inner workings of ML models being unknowable, the new process is to compare metrics to a baseline. So first decide what the important metrics are, then benchmark them. Models have a way of becoming un-runnable fast: libraries get updated quickly, and trying to fire up an old model for comparison might be impossible, so do it now.
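Since an old model may never run again, the benchmark record is what survives. A minimal sketch of logging metrics alongside a model version; the record format, function name, and filename are made up for illustration:

```python
import datetime
import json
import os
import tempfile

def snapshot_benchmark(model_version, metrics, path):
    """Append a dated benchmark record so future models can be compared
    even after this version can no longer be run. Illustrative sketch."""
    record = {
        "model_version": model_version,
        "date": datetime.date.today().isoformat(),
        "metrics": metrics,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

log = os.path.join(tempfile.gettempdir(), "benchmarks.jsonl")
snapshot_benchmark("recommender-v1", {"precision_at_5": 0.62, "latency_ms": 140}, log)
```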
Managing the process also has some extra things that need to be covered, the biggest probably being ethics. Should you be building it? I’m not talking about whether the problem can be solved without using AI (though you will need to know that answer too). If the product is built, are there any unintended consequences? Is this product going to be the best thing for the user? How are you influencing the actions of the users?
There are also new risks. Machine learning models can be deceived, and if some users can trick the model it can cause a worse experience for other users. Bad users can find exploitable shortcuts within the model. You will need to make sure the model isn’t being exploited by knowing the data going in and comparing it to the data coming out. Other points of security include the training data, data sources, and the algorithms interacting with the models, along with the models themselves. It is also a good idea to know how to return to a known good state if something does go wrong, and to make sure the model cannot alter itself.
Transparency is something you want to think about even if the users are not asking for it. Most likely they just have not thought of it yet, and when they do, opinion can change quickly if the app lacks transparency. How to expose the process will differ depending on the app, as will the amount and types of info you want to give. Just keep it as part of the design process: any time data is used to answer a question or complete a task, you have to answer the question, can we reveal to the user where the data came from and how we process it? Since AI tries to do boring tasks for the user, a good time to help transparency is telling the user when you helped them.
AI safety seems to be an area being handled by UX too. There are different types of safety. For businesses, know that AI can be unexplainable, so if the app involves government or regulated industries like banking, it can cause problems with regulators. Mission-critical systems also can’t test 1,000 iterations and hope one works, so designers will need to create safety scaffolding for the ML model to operate within, to keep it inside boundaries.
AI safety for users means you need to recognize contexts where getting the answer wrong could cause harm. In those situations, design into the experience a way to hand off to human intervention or shut down the interaction.
There is an increased need for recognizing human biases. Data comes from people, and people are biased, so the data is too. Training data, labeling, the way the data is collected, the way it is cleaned, and the format it is output in can all be tainted by bias. A good way to verify is to take the needs found in research and turn them into stories. Look at the data ingestion process at every transformation and see if it matches the intent of the story. It makes it easier to find bias.


I tried covering all the areas I can think of that will change, so I’m going to end on a caveat. There are a lot of predictions in this episode. Like most strategy plans, I can only say what is happening in the next three to five years. Past that, this industry is moving so fast it gets hard to know what is coming, but that is what makes it interesting.
Another thing I didn’t talk about is AGI (Artificial General Intelligence). If you are unfamiliar with the term, it is basically the AI in movies that can think for itself, as opposed to the narrow AI we have now, which does one task well. Since there is so much controversy over whether AGI will ever be possible, all I can add is this: if ever there was a real need for making sure design is human-centered, this is it. All the topics of AI transparency, ethics, and safety are important to build into the tools that build the ML models. Even without the movie scare of AGI, machine learning is hugely powerful. I’ve got a blurb on my website about AI being like nuclear power: it can be super powerful, but only if designed correctly. Starting with understanding the intent behind why data was collected and translating that into something helpful will require much better tools than the beginnings we have now. And better design is how we will get there.
and on that note, how do you think UX design will improve AI? and how do you think AI will improve UX?
That’s all we have for this episode, but I would love to hear back from you on how you think things will change.
Use your phone to record a voice memo,
then email it to
That is also an awesome way to let me know what you like and would like to hear more of. If you have questions or comments, record a message for those too.
If you would like to see what I am up to, you can find me on Twitter at @DesignForAI
Thank you again
and remember, with how powerful AI is,
let’s design it to be usable for everyone

7-How AI changes UX interaction at every stage of software development

Episode 7

We show how the normal software development cycle does not work with AI and how the modified dev model needs attention from UX at every step 

Music: The Pirate And The Dancer by Rolemusic


Here is the scenario for this episode:
The boss gives you access to the company’s data and asks you to come up with a model that uses it. With all this data, it’s got to be good for making something the users will use, right?
You buckle down, work with data scientists, make a lot of tweaks to the data, and come up with something, but no matter how much you advertise it, no one wants to use it. Back to the drawing board.
This time you find out what the users do want, make more tweaks to the data, and get a model that is accurate. People love it; tons of users flood in and flood the server. The servers crash from too large a model. The IT folks say they can fix it and bring in a bunch of new hardware. It all seems to be going fine until you notice every review of your app laughs at how inaccurate it is. This can’t be; it’s the same model, just running on different hardware, right?
Let’s make sure this never happens.
Today we are covering the development cycle for AI
This podcast is called design for AI
It is here to help define the space where Machine learning intersects with UX. Where we talk to experts and discuss topics around designing a better AI.
music is by Rolemusic
I’m your host Mark Bailey
Let’s get started
Machine learning up to this point has been more on the research side.
So much so that it really doesn’t fit in to the normal software development cycle.
There are all these gotchas that won’t let it fit into the normal cyclic agile sprints most people are used to.
This affects getting good design in. A big part of UX design not slowing down the software development cycle is having a regular process so UX can run in parallel with development. That is possible with machine learning development; the cycle just looks a little different.
The normal software development process is building a machine. It’s a really complicated machine, but in development terms it is still deterministic, so development is done by writing to the test case.
For the updated process, instead of a machine, think of it like you are hiring an employee. There are six stages to hiring an employee.
  • Plan
    • This is laying the groundwork
    • lay out the job listing – what are the requirements?
    • Find Objectives, why are you hiring them?
  • Job posting
    • What is the purpose & design
    • Set your goals
    • Define benchmarks
  • Hire
    • Build On Expertise
    • Collect representative data
    • Build the model
    • Data scientists train the model
  • Train – The model is watching how you do things
    • Reinforce Education
    • Subject matter experts train the model
  • Shadow – You are standing over their shoulder.
    • Build Trust
  • Lead
    • Mentorship
    • AI leads task
    • Subject matter expert manages AI
Step 1: “Plan”
Let’s start with the plan. Before even thinking of machine learning, collect data. Not just analytics data, user data. This is normal UX research. Is machine learning even necessary? Remember, AI is not a fortune teller. Aim for problems that are possible now but would take many hours for many people to solve. If a person can’t perform the task, then neither can an AI.
For the people side of UX research, visit users in location, in the car, or at lunch to watch real tasks. Bring artifacts if they can’t be visited. Do not talk down to the user; ask them to explain things. Write quotes instead of opinions, take pictures, ask open-ended questions. Do not ask them to design. Do not ask them to predict the future; people are bad at that. Do not write solutions or bug fixes, and do not teach, no matter how much you think you can help. Instead: Can you tell me more? Can you explain x to me? Do they have questions for you?
All of these are important to learn the user journeys and to find the user’s true goal. You’ll use these as part of the data design.
As part of the UX, this is also the data to use to build the personas, and map out the user journeys
Step 2: “Job Posting”
Purpose & Design of the model
  • Set your goal
  • Define benchmarks
Take your users’ journeys and goals and work with the data scientists to line them up with the data points you have available. What data do you have? Don’t look at the data you have and then design a product around it; that leads to a product management wants instead of what the users want. Design for what is needed, then find the data sources. There is a pretty good chance you will need to merge different data points to get to the data point you really want to know.
Information quality matters. Determine what the algorithm needs to know. Use representative and complete data. Design in enough measurement points across the entire user journey. Make sure the data has enough touch points through the process to give the model visibility that it is doing what the user wants.
So what are the things you need to pay attention to while designing?
This is going to sound weird, but it’s OK to remind the user of the good job you did. A lot of the time you are doing things automatically for the user, and it is normal human perception to take that for granted. If your server is busy doing something for the user, let them know what you are doing; it helps with transparency too. Perception is key, and the last memory is important, so make it a good one.
A good example of this is flying on an airline. Everything can go right, you can even arrive early, but if it takes too long to get your luggage, the trip feels ruined even if you leave the airport before you expected to.
AI-specific problems include the user getting lost when you do things automatically for them. Too often AI tries to change state for the user. If you are creating a world for the user, you need to state its boundaries, since this isn’t AGI. If you want to dive deeper into this, listen to Episode 6 on AI personality.
When designing, ask: Does the user know where they are? Does the user know everything they can do? If you are updating a process with AI, remember the process was originally automated using the technology of its time. Don’t streamline a process that needs to be replaced. Look at the info the users are getting, what they use it for, and what info they really want.
Don’t forget accessibility. Machine learning averages toward the general case, not the exceptions. AI generalizes to the bulk of the data, so don’t forget the edge cases. It’s easier not to get sued, and you don’t want to throw away 15% of your market.
Like I said previously, transparency is important. Machine learning is already viewed suspiciously. Transparency usually isn’t possible in the algorithms. Instead have transparency around:
      • The data you are using
      • Assumptions you made
      • Learning goals for the data
When you are designing, be wary of groupthink. Being in machine learning makes you feel like you can solve any problem; you are solving new problems no one has solved before. Just remember, every solution is always a hypothesis that needs to be tested. Everyone can come up with competing ideas. Use user testing, even in the design stage, to test the ideas on users.
Something more controversial I might say is that the designer should have a seat at the table when deciding which algorithm to hire. I’ll cover in a future episode what the different algorithms will get you and what they are good for. For now, just know that the choice will affect the UX for the customer. An example: I have known a billion-dollar retailer to go with an older, less accurate rule-based language processor instead of the deep-learning language processor that has become the standard over the last few years. The developers recommended what they were familiar with, and there was no push-back from the product manager or UX because they were not familiar with the space. Not surprisingly, the product is stumbling and having a hard time competing.
Step 3: “Hire”
Build On Expertise
  • Collect representative data
  • Build the model
  • Data scientists train the model
As the ‘boss’ of an AI app, our worst nightmare isn’t that it is too smart; it is that it will be like us: dumb. It is even worse if it inherits our biases.
I’ll be doing an episode just on bias, because there are a lot of ways to run afoul of its different kinds. For this episode I’ll just leave it at this: a lot of people think that if it is a computer program it isn’t biased, because computers aren’t biased. That of course isn’t true, since every application is built by people and uses data from people’s actions. Any bias people have can find its way into an AI model.
As the model is getting built, it is important for dev and UX to work together. A big part of building a model is trying to get the accuracy up, and the accuracy decisions should align with what you found in research. Part of this is UX testing. Normal systems are deterministic: to create normal software, dev teams write test-driven processes where the outcome is expected, and it has to pass for the software to ship. AI models, on the other hand, can give different answers every time.
Since you don’t know how the answers come about, you instead need to know what the acceptance criteria are. What metric do you need to get your number above? How will you measure it? These can be small numbers. For safety-critical systems you want to move in small measurements. For noisy systems like recommendation systems, a 2% lift in purchases above the noise might be what you are looking for. It will depend on your industry and what you found in research.
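That acceptance criterion can be reduced to a check anyone on the team can run. A sketch, using the 2% lift figure from the recommendation example; the function name and threshold are illustrative:

```python
def passes_acceptance(metric_value, baseline, min_lift=0.02):
    """Ship only if the model beats the baseline by the agreed margin.
    The 2% default mirrors the recommendation-system example."""
    return metric_value >= baseline * (1 + min_lift)

# e.g. conversion rate: baseline 10.0%
print(passes_acceptance(0.103, 0.100))  # 3% lift: ship
print(passes_acceptance(0.101, 0.100))  # 1% lift: not yet
```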
Another problem to watch for is the ability to do an A/B test. Once the software has been written, it is hard to disable the AI and have the software still work. A good way around this is to have the developers set up an alternative algorithm that just takes all the data and predicts the mean. If nothing else, this will tell you whether the AI is better than no AI.
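A sketch of that ‘no-AI’ arm, a stand-in predictor that always returns the mean of the training targets (names here are illustrative):

```python
def mean_baseline(training_targets):
    """A 'no-AI' fallback that always predicts the training mean.
    Useful as the B arm of an A/B test against the real model."""
    mean = sum(training_targets) / len(training_targets)
    return lambda _features: mean

predict = mean_baseline([3.0, 4.0, 5.0])
print(predict({"any": "input"}))  # always 4.0, regardless of input
```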
Step 4: “Train”
Reinforce Education
  • Subject matter experts train the model
After the model is created with all the training data, it is time to open it up a little with a beta test. How you conduct the beta test will be specific to your industry. In the previous episode I spoke about chatbot models being tested using Facebook or Kik chatbots, or even as a bot on Reddit. The point is to start getting real data directly to the model for it to respond to. The model won’t be live yet, but its answers can be compared to what the SME says they should be.
A lot of the time, if the company is small enough, as the UX researcher you will be the SME, since you are the one who talked to the users. Testing the accuracy of models as a non-data scientist might sound difficult, but you can’t shy away from the math. It will take measuring analytics, since there could be a lot of noise. Don’t worry, though; no one starts out good at this. It is just a matter of practice.
The SMEs need to be directly active during user tests. For every question that comes up, the answer the SME gives can be checked against what the model would have given in the same situation. Right or wrong, the new data can be used to better train model edge cases, because the SMEs know the problem domain. They know what a right or wrong answer is for the model to give. Exploratory testing and boundary testing can be done because they need to know where the limits are.
A heads-up that at a lot of companies, the QA group is also adjusting to the new reality of working with AI. As long as they can verify that the app gives a reasonable response, it passes. But AI can give many responses that make sense yet do not help the user achieve their goals. Make sure that the metric being tested, whether quantitative or qualitative, has been reduced down enough that anyone can tell if it is a pass or a fail.
Answer quality isn’t the only thing you want to make sure is part of a good user experience. Before release verify:
  •  Availability of serving hardware. A lot of delays can creep in when one server is depending on another’s answer.
  • Response time for the model to give an answer or interaction. Make sure it can be scaled up.
  • How fast the critical mass of users can be built up. If you don’t get enough users in the beta test, it won’t train the model to give better answers. If the pickup of customers isn’t there, why? Did you not advertise the beta enough or is part of the interaction not what is expected or wanted?
  • The answers the users are giving. I’m referring to the Schenectady problem: the company Meetup was showing a lot of users in New York state, way more than was representative. When they looked up the zip code, it was for a single GE factory, yet tens of thousands of users seemed to live there. That zip code was 12345. Just be on the lookout: you may need to clean your data coming in. This is why you are doing the beta.
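A small sketch of the kind of sanity check that catches a Schenectady-style spike, flagging any zip code whose user count is far above the median (the function name and the 10x threshold are illustrative, not a standard):

```python
from collections import Counter

def suspicious_zip_codes(zip_codes, factor=10):
    """Flag zip codes with a user count far above the median count,
    like the famous 12345 default. factor is an illustrative cutoff."""
    counts = Counter(zip_codes)
    sorted_counts = sorted(counts.values())
    median = sorted_counts[len(sorted_counts) // 2]
    return [z for z, c in counts.items() if c > median * factor]

zips = ["12345"] * 500 + ["78701"] * 12 + ["94103"] * 9 + ["10001"] * 11
print(suspicious_zip_codes(zips))  # flags the 12345 spike
```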
Step 5: “Shadow”
It is now the AI’s job to build trust
  • The roles change and the AI system is shadowed by the subject matter experts.
  • You will need to balance competing factors of speed vs accuracy
It is important to build up users to increase accuracy. As the first users start to use your app, the accuracy is low because of the lack of data. This difference between expectation and reality has been labeled the “gulf of disappointment.” Time spent in the gulf of disappointment is caused by bad design or bad accuracy.
If users spend too much time disappointed they will stop using the app.
Bad design should be covered by the UX work that was hopefully already done, so it won’t be the cause. Bad accuracy is a reality when starting out because of the lack of data. As the number of users increases and more data is collected, it becomes time to walk the tightrope between using more hardware to increase accuracy and simplifying the model to increase the number of users being served.
It used to be that UX would work with the developers to create a good experience, dev would build it, and then QA would verify it. After that, the product would be released and the team moved on to new features. This still works for non-AI products. The current process of developing machine learning throws a wrench into that when things go into production (known as inference for ML models). ML is still enough on the cutting edge that most developers are on the research side of creating models, meaning they care more about getting the model quality up than about speed, so the models that get created work well but are slow.
There are a lot of ways to get the speed up for production but there are trade-offs.
Unfortunately some tradeoffs can negatively affect the model accuracy, so make sure to test before and after for each model to make sure it still makes for an acceptable user experience.
The most obvious solution is to optimize the code of the current model. A lot of times companies have a separate engineer who specializes in productionizing models; it is a different skill than building them. They can strip out inefficient parts of the model to simplify it while affecting quality as little as possible. This can be verified before production.
The next solution is distillation. This is where the slow, large, accurate model trains a faster, simpler model. The idea is that a smaller model is created without all the specialized code. A much larger training set can be used, since it does not need to be hand-labeled; instead, the large slow model is used to tag unlabeled training data. The larger training set hopefully allows for a similar quality measure. Regressions are easy to catch if QA tracks model versions the same as code revisions.
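Here is a toy sketch of the distillation flow: a stand-in “teacher” tags unlabeled inputs, and a simpler “student” is fit to those outputs. Real distillation works on neural networks; the linear models here are just to show the data flow:

```python
def teacher(x):
    # Stand-in for the large, slow, accurate model
    return 2.0 * x + 1.0

def fit_student(inputs, labels):
    # Fit y = a*x + b by least squares: the smaller, faster model
    n = len(inputs)
    mx = sum(inputs) / n
    my = sum(labels) / n
    var = sum((x - mx) ** 2 for x in inputs)
    cov = sum((x - mx) * (y - my) for x, y in zip(inputs, labels))
    a = cov / var
    b = my - a * mx
    return lambda x: a * x + b

unlabeled = [0.0, 1.0, 2.0, 3.0, 4.0]          # no hand labels needed
soft_labels = [teacher(x) for x in unlabeled]  # the teacher tags the data
student = fit_student(unlabeled, soft_labels)
print(student(10.0))  # tracks the teacher closely
```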
The third solution is changing the serving hardware. This one is tricky. Up to this point, one of the general assumptions of software is that once it is compiled it will run the same no matter what hardware you put it on. This is not the case for ML models. The hardware can affect the accuracy of the model. More expensive hardware means more tensor cores, in the case of GPUs, which means more math can be done, which can mean more accurate answers. In a large enough deployment, the number of GPUs (or TPUs, if you are serving on Google) can affect the quality of answers, so the servers all need to be live with the software on the serving hardware before you can be sure the experience you tested is the experience users will actually get.
The last solution I will cover is quantization. It is a shortcut that cuts down on the math needed. The idea is to take big numbers that take a lot of space (for example, floating points like 3.12345) and shorten them (to 3.1, or even just 3, depending on whether you are using floating-point or integer quantization). It speeds things up, but again the experience needs to be verified after the changes are made. Also, as a heads-up, quantizing models makes quality levels even more finicky: the hardware the model runs on determines what kind and how much quantization can be done.
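A toy sketch of what integer quantization does to the numbers, mapping floats onto 256 int8 steps and back (real toolkits do this per-tensor or per-channel with calibration; this is just the arithmetic):

```python
def quantize_int8(weights):
    """Affine quantization sketch: map float weights onto int8 steps,
    then map back to see what precision was lost."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0          # width of one int8 step
    q = [round((w - lo) / scale) - 128 for w in weights]   # int8 values
    dequantized = [(v + 128) * scale + lo for v in q]      # lossy recovery
    return q, dequantized

q, approx = quantize_int8([3.12345, -1.5, 0.25, 2.71828])
print(approx)  # close to the originals, but slightly off
```

The gap between the originals and the dequantized values is exactly the kind of small error that has to be checked against the user experience before shipping.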
Step 6: “Lead”
  • AI leads task
  • Subject matter expert manages AI
This is a successfully working model, but it’s too early to pat ourselves on the back. Successful models are built in small steps. Building a successful model brings in data that was not available before. Does this new data give you a way to improve the model? Does it give you the data needed to fulfill another feature requested by the users?
You probably were not able to achieve the ultimate goal of what the model should do with the data available at the beginning. It is good to have a model expansion plan: building model A allows you to gather data x; data x allows you to build model B, which allows gathering data y, and so on until you reach what you envisioned for the users.
This will however allow for you to start the cycle again. What was learned? what can be improved for next time?
and on that note, what can I improve? For this episode and for the podcast in general?
That’s all we have for this episode, but I would love to hear back from you what you like and would like to hear more of.
If you have questions or comments, use your phone to record a voice memo,
then email it to
If you would like to see what I am up to, you can find me on Twitter at @DesignForAI
Thank you again
and remember, with how powerful AI is,
let’s design it to be usable for everyone

6-AI personality

Episode 6

Why creating a personality for your AI is important, be it a recommendation system or AGI. We cover the steps needed to evaluate your system and come up with the best personality for your users.

Music: The Pirate And The Dancer by Rolemusic

Background research links


Today’s episode is about personality,
So I thought it best to start with a scenario. You are in the market for a lawyer, and like most people looking for a lawyer you need to watch your money. You’ve heard good things about some companies providing virtual lawyer services. You download the top-rated one, rated so highly because it is so friendly. You get started telling it your background, and the back and forth is full of jokes from the lawyer. But the jokes just seem off. Then you need to find some more info and take the device down into the basement; the virtual lawyer says it lost its network connection and just starts laughing maniacally. Maybe somebody finds this funny, but if they messed up this badly on the humor, you have no confidence they got the legal part right.
Delete that one; obviously friendly was not the way to go. You download the next one, rated totally professional. You start the process, but it is taking forever. You have to go through one question at a time. This thing feels like it is reading War and Peace off of a DMV form. You find yourself getting lost in the monotony and realize you skipped over the most important nuance. This isn’t professional; this is fingernails slowly scraping a blackboard. Ugh, there is no way you’ll make it through the process and remember everything.
Another failure, money wasted, and you still need to talk to a lawyer. Let’s make sure this doesn’t happen.
Today we are covering personalities for AI
This is design for AI
a podcast to help define the space where Machine learning intersects with UX. Where we talk to experts and discuss topics around designing a better AI.
music is The Pirate And The Dancer by Rolemusic
I’m your host Mark Bailey
Let’s get started
Today we are discussing how to design your AI personality.
We will cover the process step by step for what is important and what to avoid.
Some people associate finding the right personality with something hippy or new age.
This is not that. If you want the book answer, the personality is the distinctive tone, manner and style in which your app will communicate. It is defined by a set of attributes that shape how it will look, sound and feel. The right language, and tone that embodies your app and differentiates it from the competitors.
Look, there is a good chance your app and company already have a personality. Your current web or app design already defines the personality of the company. Color choices, type choices, UI layout, documentation, and errors all make up the brand.
Basically, it’s the company personality that dictates the brand. So the next step is to take that personality,
which up to now has been used for the brand, and translate it over to training the AI. There are some companies that don’t have a defined personality right now. A lot of companies never defined one because they used a template for their site or app. There are a lot of templates for websites, and default frameworks for building the widgets for apps.
There just isn’t a template for this yet in AI. So going to the trouble of creating a personality still has to be done on a case by case basis.
Because the world does not need another Clippy. Clippy was an avatar that tried to keep it light by telling jokes along with the help it gave. The problem was that the brand for Microsoft Word is much more corporate, which created anger at the unexpected behavior. Jokes or wacky interface quirks can only increase a user’s interest or desire to explore the application if they are what the user is expecting.
Personality sells, though, so it will pay for itself if you get it right. People can tell when a company has enthusiasm and passion for what they’re doing. The tide will turn soon enough, and a bland AI will stick out like a sore thumb. Following best practices will earn you a spot in the middle of the pack, and the problem is that most users are not happy with an app that is merely “not terrible”.
If personality is important when hiring employees, why wouldn’t it be important when creating an AI personality? AI-centered companies are already working on this. Google is hiring creatives to bring humor and storytelling to human-to-machine interactions, and Microsoft Cortana’s writing team includes a poet, a novelist, a playwright, and a former TV writer. Skills to build a personality can come from writers, designers, actors, comedians, playwrights, psychologists, and novelists. Not the normal job descriptions you would expect at tech companies.
The integration of these skills into tech roles has spawned terms such as conversation designer, persona developer, and AI interaction designer.
So now that we have established the need, let’s talk about the creation process for a personality. If you want some long-term planning, here are some predictions. At some point in the future, companies will probably have many personalities, letting people choose their preferred voice or body depending on the AI UI. Different personalities will become popular, similar to Material Design from Google or Metro from Microsoft. That will lead to templatizing personalities the way WordPress templates exist now, and it’s only a matter of time before one company sues another for copying their AI personality, similar to brand infringement today. Personally I am waiting for the day when enough UX research has been done that we know which custom AI personality works best for each interaction modality. So while it might sound silly that the best way to get legal information from someone is if they are talking to a salty sailor, there is no way to know which personality will become associated with an interaction modality without creating one for your use case first.
What should you not do?
The biggest question to avoid is usually: shouldn’t I just use my own personality? Or the founder’s personality? There are a couple of problems with this. In true UX fashion remember, you are not your customer. The ability to create a company does not always translate into good customer interaction, for a variety of reasons. Another reason it usually does not work is that you can’t measure your own personality. Most people only associate themselves with their positive traits; unfortunately there are usually blind spots that go along with them.
So how do we find what personality would work best for the users?
Well, we ask them. Poll people to select descriptive words. Standardized lists, like the one from Microsoft’s word association test, work best,
because it takes time to balance positive and negative words and to make sure all the areas are covered. I think word associations are the easiest and fastest way to move forward, but if that doesn’t sound good to you, I have also heard of people who have successfully used Myers-Briggs to describe character traits. I know it has been debunked because it over-simplifies personality types, but that actually helps here, since it simplifies the choices that need to be made. Another way to gather the information is something called Spectrum.
It is a method from Ari Zilnik based on the Five Factor Model, which defines personality as a combination of openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism.
Be aware that answers usually skew positive, so pay more attention to the negative feedback. Basically you are asking users to choose the words they associate with your brand, company, and app. The 5 different areas to measure are:
  • Awareness – How aware is the user of the company, the product, and their need for the product?
  • Consideration – Perception of the quality and value of the product. Do they misunderstand or fail to find features?
  • Preference – How do features differentiate the product from competitors?
  • Action – Getting stuff done.
  • Loyalty – Will the user want to use your app again?
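As a sketch of how the survey tallying could work in practice, here is a minimal example. The area names, word lists, and the double weight for negative words (per the note above that answers skew positive) are all assumptions for illustration, not a standard instrument:

```python
from collections import Counter

# Hypothetical negative association words; a real study would use a
# balanced, standardized list.
NEGATIVE_WORDS = {"confusing", "slow", "cheap", "pushy"}

def tally_associations(responses):
    """Count how often each word was chosen, per touch-point area.

    `responses` is a list of dicts mapping area -> list of chosen words.
    Negative words are weighted double, since survey answers tend to
    skew positive.
    """
    totals = {}
    for response in responses:
        for area, words in response.items():
            counter = totals.setdefault(area, Counter())
            for word in words:
                counter[word] += 2 if word in NEGATIVE_WORDS else 1
    return totals

responses = [
    {"Awareness": ["trusted", "confusing"], "Loyalty": ["friendly"]},
    {"Awareness": ["trusted"], "Loyalty": ["slow", "friendly"]},
]
totals = tally_associations(responses)
```

The per-area Counters then give you a ranked view of which words, positive and negative, dominate each touch-point.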
Depending on the actions the AI is created to help with, sound, haptics, visuals, or AR/VR can all be aspects of your interface for the user. If anything like this exists for the current interaction, get feedback on that as well. Sound associations can be done by comparison to companies that have trademarked their sounds, like Porsche or Harley-Davidson.
When talking to users to get the word associations, get a really good sense of your customer’s personality: what are their goals, what stage are they at in their life, and most importantly, who do they aspire to be? This will come into play later. The next step is to run the same word association tests with internal people, but from the angle of aspiration: where are the decision makers trying to take the product? This is perfect, since this is what PMs and stakeholders are thinking about anyway.
Now comes the comparison.
How are you currently perceived versus how you want to be seen in the future? Take stock of the responses; how do they stack up to expectations? You shouldn’t expect the word associations to be exactly alike, but they shouldn’t be too far apart either. If there is too much drift from the customers’ perception, then there either needs to be a frank discussion about where your company really is on product excellence,
or there needs to be a whole lot of work on the fundamentals. The reason you don’t want to reach too far is that it comes off as untrustworthy. I mean, you are who you are. Also, with too far of a reach, the chance of getting it wrong starts to skyrocket.
If you get it wrong this is a lot of work that is going to waste.
Once you have the aspirational view of how you are perceived, check those goals against the customer’s goals. These should also be close.
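The “shouldn’t be too far apart” comparison could be sketched as a simple drift score between the two word sets. This is just one possible metric (Jaccard distance); what counts as “too much drift” would be a judgment call for your team:

```python
def association_drift(current, aspirational):
    """Return 1 minus the Jaccard similarity of two word sets.

    0.0 means the word associations match exactly; values near 1.0
    mean the aspiration has drifted far from how customers see you.
    """
    current, aspirational = set(current), set(aspirational)
    union = current | aspirational
    if not union:
        return 0.0
    return 1 - len(current & aspirational) / len(union)

# Customer perception vs stakeholder aspiration (illustrative words):
drift = association_drift(
    ["reliable", "plain", "slow"],
    ["reliable", "innovative", "playful"],
)
```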
A good example is to go into a teen clothing store. The employees tend to be clones of the people from the ads. It’s not a coincidence; stores choose to mirror their target aspirational demographic in their customer interactions.
  • So from that point of view what would your employees look like?
  • How do your customers align with their peers?
  • What motivates the people to do what they do?
Areas of personality that need to be defined include
  • Professional vs casual
    • You don’t want to take all personality out if you are going full professional.
    • Also be aware that if you are going casual, informality changes across different groups and cultures, so make sure to gather information from all your markets.
  • Humor level and type
    • Do you use dry humor or silly humor?
    • The best example I can come up with of why this is a difficult question: try to name two comedians with the same style of humor.
  • Generalist or a specialist
    • Are you trying to reach a conversion quickly and effectively? Or is the whole bot experience crafted to engage long term as part of a larger creative campaign?
  • Brief or long discussions
    • Unless the destination is the personality, you don’t want to slow down the interaction. Aim for minimal clutter and fuss.
  • Understated vs extroverted
  • Cautious vs go where others fear to tread
  • Individual creativity vs group consensus
  • Strong opinions vs easy going
Now, none of the word associations need to be shown to the developers. Try paraphrasing a brand guideline and it always gets reduced down to BS words like innovative and progressive. You will not have the brand guidelines next to you when writing dialogs.
Developers won’t have them next to them when writing code.
It needs to be easier.
If your organization was a famous person, who would it be?
Since you already have all the personality traits and aspirations, who do they describe? It can’t be Stephen Fry or Barack Obama. Those are not good choices; they are the equivalent of boiling everything down to the words innovative and progressive. You want to choose a different personality for at least the 5 main touch-points: Awareness, Consideration, Preference, Action, and Loyalty. Your app might be more specialized, so your areas might differ depending on your needs.
Now that you know who the AI should act like in different situations, let’s talk about some of the things to avoid. First let’s talk about humanness, or better titled: when to convey that a bot is a bot. Most companies have a code of conduct for how employees interact with customers. I am surprised the same companies will often not put the same thought into the personality of the AI that is the first interaction point with their company. Right now, conversational AIs are good enough to pass for human. If you need an example, I’ll link to how Google wowed people with their 2018 I/O demo of Google Duplex, but the next week the articles’ tone changed quickly, saying the voice was trying to trick people. Nothing changed in the demo, just people talking. Current culture was caught off guard by the quality of the humanness, and it is human nature to try to label someone as tricky when you are caught off guard. Sooner or later, with current tech, your AI is going to fall on the wrong side of the uncanny valley. So it is easier not to claim humanness when asked, but also not to deny it. It’s in the principles of Google Assistant’s personality: don’t shut down the conversation by denying humanness, and don’t lie and claim preferences. Use the artful dodge.
The next hurdle is to take internationalization into account. Currently for websites, type choices and layout carry over across cultures, so this can catch some people off guard. Humor does not cross borders well, or even regions; I know China has different formats for standup comedy depending on which city you are in. Cues for informality change across groups and cultures. An example of this is the sound your mouth makes when your brain is processing: in the US, depending on the region, it can be “uhm” or “ahh”; in China it is “nega”.
Putting the wrong pause word for a region in an attempt to be more casual can put you on the wrong side of the uncanny valley.
The third topic is situational awareness
For example, how should your AI act offline versus online? How does your interaction change when it isn’t connected to the network? The level of interaction also depends on how much cognitive attention the user has to devote to the interaction. If you can detect they are driving, your responses will probably be shorter. There is a lot of complexity and nuance to this, and it helps that the AI can detect more info about the emotional context of how the person feels at that point:
    • How do they feel right now?
    • What can you detect in voice and body language?
    • What can you know from context of user journey?
    • What do you know from user profile?
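As a rough sketch, the situational signals above could feed a simple rule for how the AI responds. The signal names and thresholds here are hypothetical, just to show the shape of the decision:

```python
def choose_response_style(context):
    """Pick a response style from detected situational signals.

    `context` keys are assumptions for this sketch: 'online',
    'driving', and a rough 'frustration' score from voice or
    body-language detection.
    """
    if not context.get("online", True):
        return "offline-brief"      # no network: short, cached answers only
    if context.get("driving"):
        return "hands-free-brief"   # low spare attention: keep it short
    if context.get("frustration", 0.0) > 0.7:
        return "empathetic"         # acknowledge the emotion before helping
    return "standard"
```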
The last topic I want to cover is errors. When the server is down, humor makes the problem worse. People do not feel like they are being taken seriously, and they will lose trust in your AI since it is not acting appropriately. Instead of humor, try to empathize. Acknowledging and validating an emotion is often enough to make customers feel understood and release the negativity of the bad situation caused by the error.
So once you have created and implemented your personality, how do you know it is working? Let’s talk about testing personality success. You are trying to find out: do decision-makers select your products and services more or less? Currently there is a lot of counting tweets or Instagram pictures. I would recommend against that; they are hard to quantify because of the high noise. Ways that I would measure:
  • Sentiment analysis with AI
  • Measure brand strength through qualitative and quantitative surveys
  • A/B testing choices are good to compare against the baseline. This is a good time to pull out the brand values. You can get word associations for the changed personality to see how it affects word choice.
  • And of course, keep track of your analytics for unsure answers. Does the personality help the confidence level go up for what the AI is deciding based on the info gathered from the user?
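For the A/B comparison, a minimal sketch might compare how often negative association words get chosen for each personality variant. The word lists here are illustrative; a real test would reuse your balanced, standardized list:

```python
def negative_rate(chosen_words, negative_words):
    """Fraction of chosen association words that are negative."""
    if not chosen_words:
        return 0.0
    return sum(w in negative_words for w in chosen_words) / len(chosen_words)

# Hypothetical negative words and survey picks for two variants:
NEGATIVE = {"creepy", "pushy", "bland", "confusing"}
baseline = ["friendly", "bland", "confusing", "helpful"]
variant = ["friendly", "helpful", "warm", "bland"]

# Positive number: the new personality draws fewer negative words.
improvement = negative_rate(baseline, NEGATIVE) - negative_rate(variant, NEGATIVE)
```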
So that’s all we have for this episode, If you have questions or comments, use your phone to record a voice memo, then email it to If you would like to see what I am up to, you can find me on Twitter at @DesignForAI
Thank you again
and remember, with how powerful AI is, let’s design it to be usable for everyone

5 Michelle Carney, founder of MLUX

Episode 5

I talk with Michelle Carney, founder of MLUX. Lecturer for AI design at d.School at Stanford, and Sr UXR at Google AIUX group. We discuss resources available and needs that have not been filled yet


Upcoming Events:


Machine Learning and UX (MLUX) Meetup Resources:

Music: The Pirate And The Dancer by Rolemusic


Coming soon

4 – Improving the UX of conversational UIs

Episode 4

We cover all the steps needed for creating a conversational UI like a chatbot or Siri, Alexa, Google voice, and Cortana. We make sure to cover making a plan so a good user experience is the top priority.

Music: The Pirate And The Dancer by Rolemusic


Hello and welcome to Design for AI, I’m Mark Bailey, Welcome to episode 4
Let’s talk conversational UI.
A lot of people think chatbot; other people think Siri, Alexa, Google Voice, or Cortana. In the current gold rush climate that is AI right now, it seems to be the first step where a lot of companies dip their toe in. Sounds like a good topic to cover to me. So I’ll cover the steps needed to avoid mistakes.

1st step: start with a plan.
If you want to have a conversational interface you need a plan. Think of a good plan as a stop-off point on the way to the voice interaction everyone says is just around the corner. More likely, though, is to think of the plan as a list of immediate needs, then turn that around and look at it from the user’s point of view. Who uses a conversational UI? People using voice interfaces right now don’t want to be bothered. They don’t want to be bothered to wait to talk to a live person, bothered by downloading your app, bothered by opening their computer, not even bothered to get off the couch. Your UI needs to make their life more convenient. The way to think of your plan is: how will you get what you need AND make it more convenient for the user?

The first part of the plan is how this benefits your company. What is your motivation for building the interface? Your reason will be specific to you, so I can only cover the general cases. It could be improving media buys by understanding customers, or reducing call center time. There are a lot of industry-specific choices, and conversational UIs are easier to apply to some industries than others. If you are running a CRM, reduce call center times. For established media IP, the personality is already there; there are set expectations about what to expect, which makes the personality a lot easier.

The next part of coming up with a plan is deciding what to measure. Again, this is very specific to your industry. Do you want to know the length of engagement? Should it be higher or lower? Do you want to increase return users? A lot of the time you will be getting some analytics about the user. Do you want to compare info gathered through your UI to the analytics info in the user profile? What can you add to the user profile? Do you want to increase the number of recommendations made to other people? No, I’m not talking about Net Promoter Score; I’m talking about using referral codes to get real numbers. You can even measure the emotions of users leaving.

Once you have your plan
The next step you need is a DMP, a data management platform, to store the info you are collecting from your app. If you do not have one, now is the time to create it. You probably want to hire a data scientist, because DMPs can have a high noise level, and really, to get any usefulness out of them you will need to be running experiments with the data. DMPs work better when cross-referencing information against other information instead of doing straight search. Also, now is the time to try rolling your own natural language processing project, known as NLP. Siri, Alexa, Google Voice, and Cortana all have their own sandboxes that are not compatible with each other. You can try developing for a couple of them to see the differences between the systems, or a good open source one to get started with is called Mycroft.

So now that you have a plan and a platform to move forward with, what’s next? You need to create a personality. This is going to depend heavily on your company brand and what you are trying to accomplish. Think of what is going to be the motivation for the AI you are building. Its motivation will affect how it answers and guides the conversation. It also depends on the situation your users will be in while having the conversation. You don’t want a mechanic in the middle of a job getting asked 100 questions to get the response they want, just so they don’t need to clean off their hands. It might sound like we are designing a person, and there is an argument that goes back and forth on how human you should make your AI. It is too much to talk about here, so I will cover it in a future episode.
The short story is: don’t fake being a real person. Also know that personality and humanness are different. In this case we want a strong personality, so what motivates your AI to give the answers it gives is important. A strong personality is important because it helps to hide the holes in the AI, but not in the way you think. Technology is not at the point yet where conversational AI can answer any question, and people really like to test the limits of conversational AIs. Using a strong archetype personality takes the fun out of pushing the limits.
You wouldn’t ask an auto mechanic plumbing questions, but you would take joy in asking a know-it-all a question they didn’t know the answer to. So a strong personality keeps people from poking at areas you didn’t think of.

Once you have written down all the important aspects of your personality, the next step is to create the golden path. You don’t want to get into the AI yet, and we are not thinking of edge cases either. The golden path is the perfect non-interactive conversation: the user asks all the perfect questions, and your AI knows all the answers and all the questions needed to get the information required to reach the goal. Once you have the golden path, you can start breaking down the conversation into dialogics.

For a description of what dialogics are, think of them as the interchangeable small parts of the script. There is a stream of conversation that gets broken down into a trigger, then steps 1, 2, 3, and so on until you reach the goal. This is the part that is UX. It is the use cases,
and since the personality dictates the dialogics’ use cases, that’s why you need to work on the personality first.

This is when you create the script. What do you want to know? Take your golden path conversation and atomize it into spreadsheets.
Figure out the use cases, break down the conversation into the smallest bits possible, test it by talking to another person, and then, once you have the use cases broken down as small as possible, create conversational points around the user journeys. These conversation points are where the analytics will plug in, so you know how well the conversation is going the way you expect. One problem to be aware of when you are testing the conversations: users will alter their behavior to fit the AI’s requirements. The best example I can think of is the over-pronunciation people used when voice-to-text first came out. So when testing the conversations, make sure the person you are testing with doesn’t know what the goals or conversational points are. This is something a lot of people have problems with, because they want to test with co-workers first. Co-workers know the goals of your company. They will subconsciously move toward the goal, or purposely, artificially move away from it. Neither of these is a real world situation.
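To make the trigger/steps/goal structure concrete, here is one possible way to represent a dialogic in code, with the conversational points marked as analytics checkpoints. The field names are my own illustration, not a standard schema:

```python
from dataclasses import dataclass, field

@dataclass
class Dialogic:
    """One interchangeable piece of the script: a trigger plus the
    ordered steps that move the user toward the goal."""
    trigger: str
    steps: list
    goal: str
    # Indices of steps where analytics log whether the conversation
    # is still following the golden path.
    checkpoints: list = field(default_factory=list)

# One dialogic pulled out of a hypothetical golden-path spreadsheet:
booking = Dialogic(
    trigger="I need an appointment",
    steps=["ask date", "ask time", "confirm details"],
    goal="appointment booked",
    checkpoints=[0, 2],
)
```

Breaking the golden path into objects like this makes each piece testable and swappable on its own, which is the point of dialogics.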

Next step is the machine learning.
You want to create the algorithms to get it better. This depends on the use cases you came up with in the previous step. I’ll leave it up to your developers group to handle this step.

Once you have done all the machine learning you are ready to release. This is the final step? Not even close. This is where you start to specialize the training after release. You need to look at the analytics: where are the conversations getting killed, and where are they lasting longer? You can create multivariate tests for different script choices. This first release is not expected to be the final form. It is good to start beta testing as a game on Kik or Facebook, or you can create a conversation bot on Reddit. If you want to do a branded beta of your app, that will work too, but you need to advertise the beta or you won’t get any training data. The reason for this beta release is to train the AI.
Expect it to take about 3 months to get ahead of the open source text libraries.

The real final step is entering the cycle. Since an AI is more like an employee than a machine, you have to keep checking on it; otherwise the data moves, the model moves, everything moves, and your AI gets worse. There will always be tweaks you can make so the conversation runs smoother. The reality is that the technology isn’t quite there for the AI to understand unstructured conversation. Think of it like you are perfecting the telescope that your AI is looking through to see. Basically there will be some kludges to cover over the holes in your model. I’ll talk more about the development cycle in future episodes.
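The “keep checking on it” cycle can be as simple as comparing a live metric against the baseline you measured at release. A minimal sketch, assuming conversation completion rate is the metric you care about (the names and threshold are illustrative):

```python
def needs_retraining(baseline_rate, recent_outcomes, tolerance=0.05):
    """Flag the model for a check-up when the live completion rate
    drifts below the baseline by more than `tolerance`.

    `recent_outcomes` is a list of booleans: did each conversation
    reach its goal?
    """
    if not recent_outcomes:
        return False
    recent_rate = sum(recent_outcomes) / len(recent_outcomes)
    return baseline_rate - recent_rate > tolerance
```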

Thank you again
and remember, with how powerful AI is, let’s design it to be usable for everyone

3- How to use privacy to improve the UX of your AI apps

Episode 3

I talk about how to get privacy to improve the UX through federated learning.

Music: The Pirate And The Dancer by Rolemusic


Hello and welcome to Design for AI
I’m Mark Bailey. Welcome to episode 3

Today we will be talking about federated learning.
There is a good chance some of you are wondering what it means,
don’t worry it’s still considered a pretty new topic in AI.
Even the term isn’t pinned down; Apple calls it ‘Differential Privacy’.
so I’ll jump right in to explaining what it is and why it’s important to UX.

The old way, or I guess I should say the normal current way,
that most models store data used for machine learning,
is to round up all the data you think you’re going to need, plus the data attached to it,
and then it all gets uploaded and stored on your servers.
This is the centralized model
There is the saying going around that data is the new oil,
because the more data you can get your hands on
then the better the accuracy is for your model.
Which means you’re at the front of the line for the gold rush,

Well, not so fast
There are problems
Some people refer to data as the new plutonium, instead of the new oil
There is a high liability for personal data
Releasing an app over the internet is global.
But, laws and regulations change by country.
The new EU privacy laws like the GDPR conflict with the laws in authoritarian countries where they want you to share all your data.
In steps the idea of federated learning
As a quick side note, I am using Google’s term federated learning,
instead of Apple’s term Differential Privacy.
Differential Privacy is a little more inclusive, covering making things outside of machine learning models private,
so in the interest of keeping things as specific as possible
I’ll use the term federated learning.
I’ve included links for both Apple and Google’s announcements in the show notes.

Anyway, it is easiest to think of it in terms of using a cell phone,
because that is where all of this got its start for both companies
On device storage is small and there is too much data to upload over a slow network
The phone downloads the current AI model.
Then it improves the model by learning from all the local data on your phone.
Your phone then summarizes the changes as a small update.
Only this small update is sent back instead of all the data.
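The loop just described is essentially federated averaging. Here is a toy sketch with a trivial “model” (a list of weights nudged toward the mean of each device’s local data) so the shape of the exchange is visible; a real system would train a proper model and compress the updates:

```python
def local_update(global_weights, local_data, lr=0.1):
    """Device side: train on local data and return only the small
    per-weight deltas, never the data itself."""
    deltas = []
    for w, xs in zip(global_weights, local_data):
        target = sum(xs) / len(xs)   # stand-in for a real training step
        deltas.append(lr * (target - w))
    return deltas

def federated_average(global_weights, all_deltas):
    """Server side: average the per-device deltas and apply them to
    the global model."""
    n = len(all_deltas)
    return [
        w + sum(d[i] for d in all_deltas) / n
        for i, w in enumerate(global_weights)
    ]

# Two devices improve a one-weight model without sharing raw data:
d1 = local_update([0.0], [[1.0, 1.0]])
d2 = local_update([0.0], [[3.0, 3.0]])
new_weights = federated_average([0.0], [d1, d2])
```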
For a non-phone example think of Tesla building their self driving cars.
Every car that Tesla currently makes records from 8 different cameras every time the car is driving.
Those video feeds help to train the model Tesla is trying to create for the car to drive itself.
To date Tesla has sold over 575,000 cars since 2014, when they added the cameras needed for self driving.
Multiply 575,000 by 8, then multiply that by the number of miles all those cars drive.
It becomes obvious that is just too many video feeds to send over their wireless network
much less to record and store on central servers somewhere.
More importantly, no one wants everywhere they have driven,
and every mistake they made to come back to haunt them.
federated learning allows Tesla to push the model out to their cars.
Let the model be trained by data collected in the car,
then the training corrections are sent back to Tesla without needing to send hours upon hours of video.
Privacy and data bandwidth are preserved.
As a side note, Tesla does upload some video of a car’s driving for things like accidents.
We’ll talk about outliers and deciding which parts you keep private later.

So, federated learning allows for global results from local data.
Basically train on the local device and send aggregated results back
It lets you keep the sensitive data on device,
and if you can promise, and deliver, privacy to the user of an AI model
then you have taken care of one of the biggest fears users have for machine learning.
Think about it: keeping data private is one of the biggest complaints from people hesitant to use AI.
It is right up there with robots taking over the world,
If we can solve real fears now, we can start working on the science fiction fears next.
This is why it is important to UX
All the benefits of privacy for your customers,
plus all the benefits for the company of well trained models.
Of course offering privacy to your users is a selling point but what are the trade-offs?

For the drawbacks I am not going to sugar coat it.
There might be some pushback from developers because it does add an extra layer of abstraction.
There is a good chance the developers have not created a model using federated learning,
so there will be learning involved.
Also, the models created from federated learning are different from the models created from a central database, because the amount and types of data collected are usually different.

As far as the benefits
You don’t have to worry about getting sued for accidentally leaking information you never gathered.
Really though, the biggest benefit is usually better, more accurate models, which may seem counterintuitive.
Since all the data stays local you can collect more data.
Also since the model is trained locally the model is better suited for the person using it which is a huge UX benefit.
There are benefits even if your business plan keeps all of your machine learning models centralized,
instead of the models being on your customers computers or phones.
Because data is siloed instead of in one central location
It is a whole lot easier to comply with local regulations like medical
You don’t need to worry about the cost of transferring large amounts of data
It is easier to build compatibility with legacy systems since they can be compartmentalized
and you can have joint benefits by working between companies,
with each company able to bring their strengths to the table without revealing their data.
Still since privacy is one of the main benefits, from the UX side of it,
it is important to let people using your app know about the privacy you are offering for peace of mind.
This is not easy since machine learning is already a difficult enough topic to convey to your customers.
For example, this is one of the main selling points Apple uses for the iPhone;
protecting your privacy is a big marketing point for them.
They are probably one of the biggest users of this concept, be it Differential Privacy or federated learning.
But I’m guessing that the majority of iPhone users have no clue
that most data for all the machine learning stays on their phone.
And, if Apple, the design focused company,
is having this much trouble conveying the message of one of their main selling points,
it’s obvious it is not an easy thing to accomplish.
The easiest way to convey to the user that you are keeping their privacy
is through transparency inside the app.
Show all the things using federated learning.
Break it down by which features use federated learning
Show the user where the data goes, or really, where it doesn’t go.
For example, one of the limiting factors of federated learning can be turned into one of the selling points.
Since federated learning needs to keep labels local,
it gives you a chance to explain why when you have people correct predictions,
for example choosing who a picture on your phone is of,
or choosing which word auto-correct should have chosen.
You can let the user know
they are doing this to keep their own data private.
Now if privacy is important to your business model,
if it is the thing you are showing as a benefit to using your app.
Then it does need to be designed into the app from the beginning.
First, I won’t go into the math involved,
but merging multi-device information can still leak private information.
You need to make sure when the app is designed that the company can’t see individual results,
only the aggregate
Next, the model, over time can also, possibly, learn identifiable info
When you design the app make sure that the model limits influence of individual devices
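Those two safeguards can be sketched roughly like this. This is just an illustration, not a specific library's API; the function names and the clipping threshold are my own, and real systems add secure aggregation and noise on top of this idea:

```python
import math

def clip_update(update, max_norm=1.0):
    """Limit how far any single device can move the global model,
    a common defense against the model memorizing one user's data."""
    norm = math.sqrt(sum(x * x for x in update))
    if norm > max_norm:
        return [x * (max_norm / norm) for x in update]
    return list(update)

def aggregate(updates, max_norm=1.0):
    """The server only ever works with the average of the clipped
    updates, never an individual device's raw result."""
    clipped = [clip_update(u, max_norm) for u in updates]
    n = len(clipped)
    return [sum(col) / n for col in zip(*clipped)]
```

The key design decision is that nothing downstream of `aggregate` ever sees a single device's numbers.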
Another important thing to pay attention to is outliers.
Normally you only want to pay attention to the difference from the average.
There is also a difference between the global model and the personalized model.
How much do you want to allow local data to alter the global model's behavior?
That is a decision you need to make based on your use case.
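One common way to frame that decision is as a blend between the shared model and the on-device one. A minimal sketch, with a hypothetical `personalization` knob of my own naming:

```python
def blended_score(global_score, local_score, personalization=0.3):
    """Mix the shared global model with the on-device model.
    personalization=0.0 -> pure global behavior;
    personalization=1.0 -> fully personalized."""
    return (1 - personalization) * global_score + personalization * local_score
```

Where you set that knob is exactly the use-case decision: higher values adapt faster to one user, lower values keep behavior consistent across everyone.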
The next big part of improving the UX is deciding how much to split your use cases into different personas.
Usually each persona gets its own model.
The best example I can think of is a language model:
train different models for different languages.
That helps to reduce the outlier information.
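As a rough sketch of that routing idea (the model filenames and the registry are hypothetical, just to show the shape of it):

```python
# Hypothetical registry of per-persona (here: per-language) models.
MODELS = {
    "en": "autocorrect-en.tflite",
    "es": "autocorrect-es.tflite",
}

def pick_model(user_language, fallback="en"):
    """Route each user to the model trained for their persona,
    so one group's data doesn't show up as another group's outliers."""
    return MODELS.get(user_language, MODELS[fallback])
```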
This is where accessibility fits in too.
Make sure not to forget it.
Since AI models try to average everything,
accessibility needs can be averaged out as outlier data.
Make sure to work any accessibility needs into specialized personas and models,
to reduce the noise for the model and get a better user experience for those with and without accessibility needs.
Outliers also influence how often the app should send back information.
Like I was talking about earlier, usually a model stores up enough information
before sending it back, either to save on bandwidth costs or to ensure privacy.
If the app is getting a lot of outlier data, though,
you probably want to know about it as soon as possible,
to be able to adapt the model as needed and give a better user experience.
You will need the device to flag when it has unusual data,
so the transfer can happen sooner.
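A minimal sketch of that trigger, assuming the device keeps a running baseline of what "normal" data looks like (the names, thresholds, and z-score check here are my own illustration):

```python
def should_sync(pending, batch_size=50, z_threshold=3.0, mean=0.0, std=1.0):
    """Normally wait for a full batch before sending updates back,
    but sync early if any pending value looks like an outlier."""
    if len(pending) >= batch_size:
        return True
    # Early sync if any value sits far outside the expected distribution.
    return any(abs(x - mean) / std > z_threshold for x in pending)
```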
Well thank you for listening
and I hope you found this episode interesting
I would love to hear feedback on this topic and
which other topics you would like to hear about.
To leave feedback, since this is a podcast,
use the voice recorder app on your phone,
and make sure to give your name
then email it to

If you would like to know how to help,
well, your first lesson in ML is learning how to help train your podcast agent,
by just clicking subscribe or writing a positive review on whatever platform you use to listen to this podcast.

Thank you again
and remember, with how powerful AI is,
let's design it to be usable for everyone.