Mobile app design survey - example of the analysis

2024-06-01

Introduction

Hello! Long time no see. The fact is that my blog is slowly dying. It will be difficult for me to stop this trend due to the increasing number of professional and personal obligations. Recently, I conducted several surveys at work. I thought it would be a good idea to prepare a quick post showing how to analyze the results of simple surveys using iPython notebooks.

Let’s get to work!

PS. I know this is another post about a pregnancy-related app. I just work with similar apps in my professional life.

Context

Since I work professionally with mobile applications, the context of this survey will also involve a mobile application. Let’s imagine we have an app that helps pregnant women. It has three main functionalities:

Articles with various categories of content,
Baby development preview for the current week of pregnancy,
Pregnancy planning (such as helpful lists, a calendar, or adding health metrics).

The app has been stagnant for some time, maintaining a stable user base but not growing. Users are showing less interest, and new users quickly abandon the app. Your design and product team has asked you to conduct a survey among app users to determine the app’s strengths and weaknesses to address the issues and guide further development. This is the first survey conducted among app users, so it should be very preliminary, not asking too many complicated questions.

Considering all the requirements and suggestions from your colleagues, you prepared the following survey flow:

Demographics

Current Pregnancy Trimester

First
Second
Third

Age

< 18
18-24
25-34
35-44
45+

Functionality assessment

Other Pregnancy Apps Usage

App Functionality Usage

Daily
Several times a week
Once a week
Less frequently

Most Important App Feature

Articles
Baby development by week
Pregnancy planning (calendar, checklists, health tracking)

Article Feature Satisfaction

Very satisfied
Satisfied
Neutral
Dissatisfied
Very dissatisfied

Most Important Article Categories (Select up to 3)

Baby
Mom
Helpful tips
Nutrition
Childbirth classes
Expert advice
Learning
Partner
Facts
Inspiration
Relaxation
Shopping
Useful links
Clothing
Lifestyle
Research
Planning
Self-care
After childbirth

Baby Development Feature Satisfaction

Very satisfied
Satisfied
Neutral
Dissatisfied
Very dissatisfied

Pregnancy Planning Feature Satisfaction

Very satisfied
Satisfied
Neutral
Dissatisfied
Very dissatisfied

Design Evaluation

Overall app design rating (1 to 5)
User interface intuitiveness rating (1 to 5)

Suggestions and Overall Opinion

Any suggestions for app improvement? (Open response)
Would you recommend this app to other pregnant women? (1 to 10 rating)
What features would you like to see added in the future? (Open response)

The flow doesn’t seem complicated - all users will receive the same questions, with few open-ended questions, which are last and optional. After conducting the survey, you would like to return to the designers and product manager with answers to the following general questions:

Which functionality is most important to users,
How often users use the app,
Whether there are dependencies between the most frequently chosen functionality and the user’s age or pregnancy trimester,
The level of satisfaction with individual functionalities,
Which article categories are the most popular,
Whether the app has too many article categories to choose from,
The level of satisfaction with the app’s design,
Whether users for whom the app interface is less intuitive use it less frequently,
The app’s NPS (Net Promoter Score),
What users think needs improvement,
What new features users would like to see added.

Of course, as with any research, additional questions and conclusions may arise during data analysis.

Generating Sample Data

The post is illustrative, so obviously, the survey isn’t real. Random answers to the questions will be generated using the numpy library. The random.choice method allows passing the probability of selecting a given answer. For demographic, open-ended, and app usage frequency questions, I will use this option based on what I think makes logical sense. I will also introduce several common errors or spelling variations in open-ended questions, such as double spaces, varying capitalization of first letters, or typos. Usually, we have to deal with them, so I am replicating these conditions. I will also artificially create dependencies, such as the most important functionality and pregnancy trimester, and the design rating and app usage frequency. I will also program distributions of positive and negative ratings for different functionalities. All this means that the answers will be quite predictable, which wouldn’t be the case in the real world, but the entry serves as an example. It won’t be perfect, but I think it’s enough for presentation purposes.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Set number of samples
n_samples = 2000

# Generate sample data
frequency_of_use = np.random.choice(['Daily', 'Several times a week', 'Once a week', 'Rarely'], n_samples, p=[0.05, 0.2, 0.55, 0.2])

overall_design_rating = []
interface_intuitiveness = []

for freq in frequency_of_use:
    if freq == 'Daily':
        overall_rating = np.random.randint(3, 6)
        interface_rating = np.random.randint(3, 6)
    elif freq == 'Several times a week':
        overall_rating = np.random.randint(2, 5)
        interface_rating = np.random.randint(2, 5)
    elif freq == 'Once a week':
        overall_rating = np.random.randint(1, 5)
        interface_rating = np.random.randint(1, 4)
    else:
        overall_rating = np.random.randint(1, 3)
        interface_rating = np.random.randint(1, 3)

    overall_design_rating.append(overall_rating)
    interface_intuitiveness.append(interface_rating)

trimester = np.random.choice([1, 2, 3], n_samples, p=[0.3, 0.5, 0.2])
most_important_functionality = []
for t in trimester:
    if t == 1:
        func = np.random.choice(['Articles', 'Baby preview', 'Pregnancy planning'], p=[0.2, 0.1, 0.7])
    elif t == 2:
        func = np.random.choice(['Articles', 'Baby preview', 'Pregnancy planning'], p=[0.7, 0.2, 0.1])
    else:
        func = np.random.choice(['Articles', 'Baby preview', 'Pregnancy planning'], p=[0.2, 0.4, 0.4])
    most_important_functionality.append(func)

def generate_functionality_ratings(n, positive_bias=0.7):
    ratings = []
    for _ in range(n):
        if np.random.rand() < positive_bias:
            ratings.append(np.random.randint(4, 6))
        else:
            ratings.append(np.random.randint(1, 4))
    return ratings

data = {
    'Trimester':trimester,
    'Age': np.random.choice(['<18', '18-24', '25-34', '35-44', '>44'], n_samples, p=[0.05, 0.25, 0.4, 0.2, 0.1]),
    'Frequency_of_use': frequency_of_use,
    'Used_other_apps_before': np.random.choice(['Yes', 'No'], n_samples, p=[0.60, 0.4]),
    'Most_important_functionality':most_important_functionality,
    'Article_functionality_rating': generate_functionality_ratings(n_samples, 0.85),
    'Important_article_categories': [np.random.choice(['Baby', 'Mom', 'Helpful tips', 'Nutrition', 'Birthing class', 'Expert', 'Science', 'Partner', 'Facts', 'Inspiration', 'Relaxation', 'Shopping', 'Useful links', 'Clothing', 'Lifestyle', 'Research', 'Planning', 'Self-care', 'Post-birth'], size=np.random.randint(1, 4), replace=False).tolist() for _ in range(n_samples)],
    'Baby_preview_functionality_rating': generate_functionality_ratings(n_samples, 0.4),
    'Pregnancy_planning_functionality_rating': generate_functionality_ratings(n_samples),
    'Recommend_to_others': np.random.randint(1, 11, n_samples),
    'Overall_design_rating': overall_design_rating,
    'Interface_intuitiveness': interface_intuitiveness,
    'Improvement_suggestions': np.random.choice([
        '', 'more articles', 'Better graphics', 'faster load times', 'More interactive features', 'additional content categories',
        'improve navigation', 'more videos', 'better notifications', 'more language options', 'customizable themes', 'more frequent updates', 'less ads'
    ], n_samples, p=[0.7, 0.05, 0.05, 0.05, 0.05, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.03]),
    'Future_features': np.random.choice([
        '', 'daily tips', 'Exercise routines', 'Diet plans', 'Community forums', 'Expert Q&A sessions',
        'recipe suggestions', 'partner support tips', 'mental health resources', 'tracking multiple pregnancies', 'better calendar integration',
        'offline access'
    ], n_samples, p=[0.7, 0.05, 0.05, 0.05, 0.05, 0.03, 0.02, 0.01, 0.01, 0.01, 0.01, 0.01])
}


# Create DataFrame
df = pd.DataFrame(data)

# Convert lists in 'Important_article_categories' column to strings
df['Important_article_categories'] = df['Important_article_categories'].apply(lambda x: ', '.join(x))

# Function to introduce typos
def introduce_typos(text):
    if text == '':
        return text
    if np.random.rand() > 0.7:  # Introduce typos in 30% of cases
        # Add double spaces
        text = text.replace(' ', '  ', 1)
        # Change first letter to lowercase
        if np.random.rand() > 0.5:
            text = text[0].lower() + text[1:] if text[0].isupper() else text
        # Change first letter to uppercase randomly
        if np.random.rand() > 0.5:
            text = text[0].upper() + text[1:] if text[0].islower() else text
        # Add a random typo by repeating a character
        if np.random.rand() > 0.5:
            pos = np.random.randint(0, len(text))
            text = text[:pos] + text[pos] + text[pos:]
        # Replace letters with similar looking ones
        if np.random.rand() > 0.5:
            text = text.replace('e', '3', 1)
            text = text.replace('a', '@', 1)
    return text

# Apply the typo function to open-ended responses
df['Improvement_suggestions'] = df['Improvement_suggestions'].apply(introduce_typos)
df['Future_features'] = df['Future_features'].apply(introduce_typos)
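
The generation code above draws from NumPy's global random state, so every run produces a different dataset. A minimal sketch, assuming you want reproducible numbers, is to create a seeded generator with np.random.default_rng and draw from it. The seed value 42 and the helper name sample_functionality are arbitrary choices for illustration and are not part of the original notebook.

```python
import numpy as np

FUNCTIONALITIES = ['Articles', 'Baby preview', 'Pregnancy planning']
# Same per-trimester probabilities as in the generation code above
PROBS_BY_TRIMESTER = {
    1: [0.2, 0.1, 0.7],
    2: [0.7, 0.2, 0.1],
    3: [0.2, 0.4, 0.4],
}

def sample_functionality(trimesters, rng):
    """Draw the most important functionality for each trimester value."""
    return [
        str(rng.choice(FUNCTIONALITIES, p=PROBS_BY_TRIMESTER[int(t)]))
        for t in trimesters
    ]

rng = np.random.default_rng(42)  # hypothetical seed
trimester = rng.choice([1, 2, 3], size=2000, p=[0.3, 0.5, 0.2])
functionality = sample_functionality(trimester, rng)

# A second generator with the same seed reproduces the same draws
rng_again = np.random.default_rng(42)
trimester_again = rng_again.choice([1, 2, 3], size=2000, p=[0.3, 0.5, 0.2])
print((trimester == trimester_again).all())
```

With a fixed seed, the charts and test results later in the analysis stay the same between runs, which makes the write-up easier to check.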

Calling the head method on the dataframe should print a beautiful table:

df.head()

Trimester	Age	Used_other_apps_before	Frequency_of_use	Most_important_functionality	Article_functionality_rating	Important_article_categories	Baby_preview_functionality_rating	Pregnancy_planning_functionality_rating	Overall_design_rating	Interface_intuitiveness	Recommend_to_others	Improvement_suggestions	Future_features
2	25-34	Yes	Once a week	Baby preview	2	Lifestyle	2	5	5	2	2	less ads	Expert Q&A sessions
2	35-44	No	Rarely	Articles	3	Shopping, Useful links	3	3	2	5	10	faster load times
2	>44	No	Once a week	Baby preview	3	Birthing class, Clothing	4	2	2	4	1	More interactive features
3	18-24	Yes	Rarely	Baby preview	2	Facts, Baby	1	4	2	4	1	faster load times
1	35-44	No	Several times a week	Baby preview	3	Inspiration, Facts	2	4	4	3	7

In real survey conditions, the company might provide professional software for creating surveys, and the results would be collected in a much better form or automatically analyzed. Here, I assume a simple option with data transformation done manually.

Analysis

In survey research, it is helpful to begin by simply analyzing the data describing the surveyed population, as this provides a broader view of the respondents. Therefore, let’s create some simple pie charts with percentages for age, pregnancy trimester, and the question about using other pregnancy apps.

df.groupby('Age').size().plot.pie(y="Age", legend=True, autopct='%1.1f%%')

[Figures: pie charts of respondent age, pregnancy trimester, and prior use of other pregnancy apps]

We can thus say that:

The largest group of respondents (~40%) is aged between 25 and 34 years.
Almost the entire surveyed sample is adult (~95%).
The surveyed users are most commonly in their second trimester of pregnancy (~50%).
60% of the surveyed users have used other pregnancy apps.
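
The percentages listed above can also be computed directly instead of being read off the charts. A minimal sketch, assuming a tiny hypothetical sample in place of the real df['Age'] column:

```python
import pandas as pd

# Hypothetical miniature sample; in the post this would be df['Age']
age = pd.Series(['25-34', '25-34', '18-24', '35-44', '25-34',
                 '<18', '18-24', '25-34', '>44', '35-44'])

# Share of each answer in percent, largest first
age_share = (age.value_counts(normalize=True) * 100).round(1)
print(age_share)

# Share of adult respondents (everyone except the '<18' group)
adult_share = (age != '<18').mean() * 100
print(f"Adults: {adult_share:.1f}%")
```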

Now, let’s move on to the main analysis to answer the previously posed analytical questions.

Let’s check the distribution of app usage frequency among users. We can use the same pie chart for presentation purposes.

# Frequency of use pie chart
df.groupby('Frequency_of_use').size().plot.pie(legend=True, autopct='%1.1f%%', title="Frequency of App Use")
plt.show()

[Figure: pie chart of app usage frequency]

More than half of the app’s users open it only once a week. Users who open it several times a week and those who open it less often make up roughly equal shares, and very few users visit the app daily. This may be an important finding for those working on the app’s content: they should try to give users a reason to come back more often than once a week. Perhaps the app’s layout, clearly divided into pregnancy weeks, is part of the problem and needs to be redesigned. Maybe push notifications should remind users that there is still undiscovered content for the current week. There are certainly many possibilities, and finding these kinds of ideas is the responsibility of the design or product teams.

Now let’s look at the most frequently chosen most important functionality of the app.

# Most important functionality pie chart
df.groupby('Most_important_functionality').size().plot.pie(legend=True, autopct='%1.1f%%', title="Most Important App Functionality")
plt.show()

[Figure: pie chart of the most important app functionality]

As you can see, articles clearly win. However, knowing that we have a wide variety of ages and pregnancy trimesters among users, it is worth checking whether these variables affect the chosen most important app functionality. To check whether categorical variables are dependent, we can use the chi-square test of independence. The null hypothesis of this test states that the analyzed variables are independent of each other. Let’s create a function that will allow us to perform a statistical test using the SciPy package. I’ll also draw a bar chart to better understand the data.

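from scipy import stats

def chi_test(data_1, data_2):
    # Contingency table of observed counts for the two categorical variables
    crosstab = pd.crosstab(data_1, data_2)
    stat, p, dof, expected = stats.chi2_contingency(crosstab)
    # Reject the null hypothesis of independence at the 5% significance level
    if p <= 0.05:
        print('Dependent (reject H0)')
    else:
        # Strictly speaking, we only fail to reject H0 here
        print('Independent (H0 holds true)')
    crosstab.plot.bar()
    plt.show()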

Let’s see how the situation looks for age:

chi_test(df.Age, df.Most_important_functionality)

Independent (H0 holds true)

[Figure: bar chart of most important functionality by age group]

The test does not detect a dependence between the variables. The chart seems to confirm this; the ranking of the most important function looks similar in each age group. Now, let’s look at the pregnancy trimester:

chi_test(df.Trimester, df.Most_important_functionality)

Dependent (reject H0)

[Figure: bar chart of most important functionality by pregnancy trimester]

According to the test, the variables are dependent. Based on the chart, this can be observed. In the first trimester of pregnancy, the most important function is pregnancy planning, in the second trimester it’s articles, and in the third trimester, it’s baby preview. This could be an important conclusion; perhaps the app’s interface should be more tailored to the pregnancy trimester the user is in.
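
A reject/do-not-reject verdict says nothing about how strong the dependence is. A minimal sketch of reporting the p-value together with Cramér's V, an effect-size measure that is not used in the original analysis. The counts below are hypothetical: they are simply the counts implied by the generation probabilities for 2000 respondents, not the actual survey table.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical counts: rows are trimesters, columns are functionalities
crosstab = pd.DataFrame(
    [[120, 60, 420],
     [700, 200, 100],
     [80, 160, 160]],
    index=[1, 2, 3],
    columns=['Articles', 'Baby preview', 'Pregnancy planning'],
)

chi2, p, dof, expected = stats.chi2_contingency(crosstab)

# Cramér's V rescales the chi-square statistic to the range 0..1
n = crosstab.to_numpy().sum()
k = min(crosstab.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))

print(f"chi2={chi2:.1f}, dof={dof}, p={p:.3g}, Cramér's V={cramers_v:.2f}")
```

Values of V close to 0 indicate a weak association even when the test is significant, which can easily happen with a sample of this size.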

Let’s go through each functionality question of the application one by one, starting with the satisfaction level regarding the article view.

df.groupby('Article_functionality_rating').size().plot.bar()

[Figure: bar chart of article feature ratings]

Users are satisfied with the article functionality, with most of them rating it as 4 or 5. In real-life scenarios, it would be worth checking if the rating significantly differs based on whether the functionality is critical to the respondent. However, in this artificially generated dataset, we assume independence from importance for each functionality.

Next, let’s create a ranking of the most liked article categories. This was a multiple-choice question, and responses are provided as comma-separated text. Hence, we’ll use the explode method from Pandas, which allows us to create a separate row for each item in a list. First, we need to create a list from the text using the standard split method.

df['Important_article_categories'].apply(lambda x: x.split(", ")).explode().to_frame().groupby("Important_article_categories").size().sort_values(ascending=False)

The result is a sorted ranking:

Baby              511
Post-birth        311
Relaxation        306
Helpful tips      289
Facts             234
Partner           182
Planning          165
Clothing          157
Lifestyle         153
Shopping          143
Research          120
Useful links       85
Birthing class     71
Mom                53
Nutrition          46
Inspiration        45
Expert             39
Self-care          30
Science            20

The most liked category is Baby. Generally, most categories are popular, but there are a few with very low scores, such as Science. My recommendation would be to conduct a deeper analysis here. Perhaps users are unaware of the existence of some categories. It would be valuable to check if the layout of different category articles in the application interface affects their popularity. If not, it may be advisable to reduce the number of categories, as maintaining so many may not be beneficial from a business perspective.
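
Raw counts are hard to compare between surveys, so it can help to express the ranking as the share of respondents who picked each category and to check how many categories people actually select. A minimal sketch on hypothetical answers in the same comma-separated format:

```python
import pandas as pd

# Hypothetical answers in the same comma-separated format as the survey column
answers = pd.Series(['Baby, Facts', 'Baby', 'Relaxation, Baby, Planning', 'Facts'])

# One row per selected category
exploded = answers.str.split(', ').explode()

# Percentage of respondents who picked each category
share = (exploded.value_counts() / len(answers) * 100).round(1)
print(share)

# How many categories each respondent selected
n_selected = answers.str.split(', ').str.len()
print(n_selected.value_counts().sort_index())
```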

Let’s examine the ratings of the other functionalities.

[Figure: bar chart of baby preview feature ratings]

[Figure: bar chart of pregnancy planning feature ratings]

While pregnancy planning is rated relatively positively, there is a wide range of opinions regarding the baby preview feature. This feature warrants deeper analysis. Opinions about it are ambiguous; aggregating ratings 1 and 2 as clearly negative and 4 and 5 as clearly positive shows that negative opinions prevail. It might be beneficial to conduct a more detailed survey specifically on this functionality. Another approach could be proposing different designs and conducting an A/B test to evaluate differences in page visits or time spent on the feature.
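
The aggregation described above (ratings 1 and 2 as negative, 4 and 5 as positive) can be sketched with pd.cut. The ratings below are hypothetical and only illustrate the binning:

```python
import pandas as pd

# Hypothetical ratings on the 1-5 scale
ratings = pd.Series([1, 2, 2, 3, 4, 5, 1, 2, 3, 4])

# Bins are right-inclusive: (0, 2] negative, (2, 3] neutral, (3, 5] positive
sentiment = pd.cut(ratings, bins=[0, 2, 3, 5],
                   labels=['negative', 'neutral', 'positive'])

# Percentage of answers in each group
summary = sentiment.value_counts(normalize=True).mul(100).round(1)
print(summary)
```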

Overall, the most highly rated functionality of the application is articles, followed by pregnancy planning, while baby preview is rated the lowest.

Next, let’s analyze the questions directly related to the application’s design. We’ll create bar charts for both questions and see how users’ ratings are distributed.

[Figure: bar chart of overall design ratings]

[Figure: bar chart of interface intuitiveness ratings]

These are not good results. The application interface seems to be complicated for users. Let’s check if these ratings depend on how often users use the application. Logic suggests that an application that is easy to use will be used more often than one that is difficult to use.

Dependent (reject H0)

[Figure: bar chart of usage frequency against overall design rating]

Dependent (reject H0)

[Figure: bar chart of usage frequency against interface intuitiveness rating]

In both cases, dependency has been shown, which aligns with logical expectations. The conclusion is clear - simplifying the application interface would likely result in more frequent use by users.
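
The calls that produced these two results are not shown in the post; presumably they reuse the chi_test function with Frequency_of_use and each rating column. Because both variables are ordered, a complementary view is the mean rating per usage group. A minimal sketch on a hypothetical miniature sample that borrows the column names from df:

```python
import pandas as pd

# Hypothetical miniature sample with the same column names as df
sample = pd.DataFrame({
    'Frequency_of_use': ['Daily', 'Rarely', 'Once a week',
                         'Daily', 'Rarely', 'Several times a week'],
    'Interface_intuitiveness': [5, 1, 3, 4, 2, 4],
})

# Order the categories from most to least frequent use
order = ['Daily', 'Several times a week', 'Once a week', 'Rarely']
sample['Frequency_of_use'] = pd.Categorical(
    sample['Frequency_of_use'], categories=order, ordered=True)

# Mean rating per group, listed in the category order defined above
mean_rating = sample.groupby('Frequency_of_use', observed=True)['Interface_intuitiveness'].mean()
print(mean_rating)
```

A mean that falls steadily from daily to rare users shows the direction of the dependence, which the chi-square test alone does not.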

Now, let’s calculate the Net Promoter Score (NPS) for our application. This score illustrates the loyalty of our application’s users by summarizing responses to a simple question: “Would you recommend our application to others?” Users selecting 6 or lower are detractors (unlikely to recommend), 7-8 are passive users, and 9-10 are promoters (likely to recommend to others). The score is the share of promoters minus the share of detractors. Note that the standard NPS scale runs from 0 to 10, while our survey used 1 to 10.

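nps_df = df.groupby("Recommend_to_others").size().to_frame(name="score")
total = nps_df.score.sum()
# Detractors rated 6 or lower, passives 7-8, promoters 9-10
detractors = nps_df[nps_df.index <= 6].score.sum()
passive = nps_df[nps_df.index.isin([7, 8])].score.sum()
promoters = nps_df[nps_df.index >= 9].score.sum()

# NPS as a fraction between -1 and 1; multiply by 100 for the usual scale
nps = (promoters - detractors) / total
nps

The result is roughly -0.4 (the ratings were generated uniformly from 1 to 10, so about 20% of respondents are promoters and about 60% are detractors), which is extremely poor and indicates that the application has significantly more detractors than promoters.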

This NPS score suggests a critical need for improvement in user satisfaction and advocacy. It would be crucial to address the usability issues identified in the design and interface ratings to enhance user experience and potentially increase user recommendations.
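
NPS is commonly reported in percentage points, from -100 to 100, rather than as a fraction. A minimal worked example in plain Python; the helper name net_promoter_score and the ratings are hypothetical:

```python
def net_promoter_score(ratings):
    """Return NPS in percentage points (-100 to 100)."""
    promoters = sum(1 for r in ratings if r >= 9)   # ratings 9-10
    detractors = sum(1 for r in ratings if r <= 6)  # ratings 6 or lower
    return 100 * (promoters - detractors) / len(ratings)

# Hypothetical answers: 2 promoters, 3 passives, 5 detractors
example = [10, 9, 8, 8, 7, 6, 5, 3, 2, 1]
print(net_promoter_score(example))  # -30.0
```

Passive users do not appear in the numerator, but they still count towards the total, so they pull the score towards zero.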

The last items to check are the open-ended questions. Their answers contain the noise we deliberately introduced when generating the data. Since I know exactly what problems exist, I can write a reverse function to fix them using regular expressions and standard string methods.

import re

def restore_text(text):
  # Replace look-alike characters first, so that a repeated letter
  # hidden behind them (e.g. "3e") is caught by the step further down
  text = text.replace('3', 'e')
  text = text.replace('@', 'a')

  # Collapse runs of spaces and trim the ends
  text = re.sub(r' {2,}', ' ', text).strip()

  # Convert entire text to lowercase
  text = text.lower()

  # Collapse every run of a repeated letter to a single letter.
  # Note: this also shortens legitimate double letters ("better" -> "beter"),
  # which is acceptable for grouping because every answer is treated alike
  text = re.sub(r'([a-z])\1+', r'\1', text)

  return text

Apply the function to non-empty values of improvement suggestions and then display a specific ranking.

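# Empty answers are stored as '' rather than NaN, so dropna() would keep them
non_empty = df.Improvement_suggestions[df.Improvement_suggestions != '']
df_improvement_suggestions = non_empty.apply(restore_text).to_frame(name="suggestion")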

df_improvement_suggestions.groupby("suggestion").size().sort_values(ascending=False)

suggestion
more interactive features       116
better graphics                 106
more articles                   103
faster load times                98
less ads                         71
more frequent updates            23
more language options            22
improve navigation               20
more videos                      17
customizable themes              17
additional content categories    14
better notifications             11
better graphics                   1
more articles                     1
faster load times                 1
customizable themes               1
more interactive features         1

As you can see, a few text processing issues remain. Normally, after the initial processing, it’s worthwhile to handle the most common error cases. Here, with relatively little data, it’s clear that users primarily want more interactive features, better graphics, more articles (which also answers the question of whether the app has too many article categories), faster performance, and fewer advertisements.
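
One way to handle such leftovers is to snap each answer to the closest entry in a list of known answers. A minimal sketch using difflib from the standard library. It assumes a canonical list is available, which is true for this generated dataset but usually not for real open-ended answers; the helper name snap_to_canonical and the cutoff of 0.8 are illustrative choices.

```python
import difflib

# Canonical answers; here a subset of the options used to generate the data
CANONICAL = ['more articles', 'better graphics', 'faster load times', 'less ads']

def snap_to_canonical(text, cutoff=0.8):
    """Return the closest canonical answer, or the cleaned input if none is close."""
    # Lowercase and collapse any whitespace runs before matching
    cleaned = ' '.join(text.lower().split())
    matches = difflib.get_close_matches(cleaned, CANONICAL, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned

print(snap_to_canonical('Bettter  graphics'))
print(snap_to_canonical('fast3r load times'))
print(snap_to_canonical('dark mode'))
```

Answers that are not close to any known option are returned unchanged, so genuinely new suggestions are not silently merged into an existing group.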

The same needs to be done for the question about new features.

feature
exercise routines                118
diet plans                       116
community forums                 105
daily tips                        90
expert Q&A sessions               61
recipe suggestions                47
mental health resources           22
better calendar integration       21
tracking multiple pregnancies     18
offline access                    12
partner support tips              12
daily tips                         1
exercise routines                  1
mental health resources            1
mental health resources            1

Users are most interested in having exercise plans, diet plans, community forums, and daily tips introduced.

Using the information from both open-ended questions, it is necessary to consider and plan improvements and new functionalities to be introduced into the application.

Summary

We have efficiently gone through the survey results. Here are the conclusions I would convey to those responsible for developing the pregnancy assistance application:

More than half of the app’s users open it only once a week. The shares of those who use it several times a week and those who use it less often are roughly equal, and very few users visit the app daily. It is worth considering how to reverse this trend.
Articles are the most important functionality overall, driven partly by the fact that most respondents are currently in the second trimester of pregnancy. The chi-square test showed that the most important functionality depends on the pregnancy trimester: planning is crucial for first-trimester users, while baby preview is key for those in the final trimester.
Users are relatively satisfied with the functionality of articles and pregnancy planning. However, there is no clear satisfaction with the baby preview feature. Further investigation into this area is warranted.
The most popular category is “Baby”. Generally, most categories are popular, but a few are rarely chosen, such as “Science”. It is advisable to examine whether the layout of article categories in the app interface affects their popularity. At this point, I would not recommend reducing the number of categories; user feedback from open-ended questions indicates a desire to expand the article base.
Users are dissatisfied with the overall design of the app. The interface and navigation are perceived as complex. The chi-square test showed that design ratings and app usage frequency are dependent, so simplifying the app interface could potentially increase user engagement.
The Net Promoter Score is at a dismal level, indicating that the app is not recommended by users. A deeper analysis is needed to address this situation effectively.
Regarding improvements, users primarily desire more interactive features, better graphics, more articles (which also answers the question of whether the app has too many article categories), faster performance, and fewer advertisements.
Users express a strong interest in features such as exercise plans, diet plans, community forums, and daily tips.

The survey served as an initial exploration. Based on these findings, a series of next steps can be planned.

I hope this brief summary proves valuable. Until next time!