On a (Per)Mission: Leveraging User Ratings of App Permissions to Help Users Manage Privacy

by Hannah Quay-de la Vallee
B.S., Bard College, 2010
Sc.M., Brown University, 2013

A dissertation submitted in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in the Department of Computer Science at Brown University

Providence, Rhode Island
May 2017

© Copyright 2017 by Hannah Quay-de la Vallee

This dissertation by Hannah Quay-de la Vallee is accepted in its present form by the Department of Computer Science as satisfying the dissertation requirement for the degree of Doctor of Philosophy.

Shriram Krishnamurthi, Director

Recommended to the Graduate Council:
Jeff Huang, Reader
Michael Littman, Reader

Approved by the Graduate Council:
Andrew G. Campbell, Dean of the Graduate School

Contents

List of Figures
1 Introduction
2 Overview of Permission Models
3 Privacy Decisions Facing Users
4 Apps To Help Users Manage Privacy
  4.1 The PerMission Store
  4.2 The PerMission Assistant
5 Populating Privacy Information
  5.1 Automated Ratings
  5.2 Human Ratings
  5.3 Merging Human and Automated Ratings
6 The Effect of Brand Name
7 Ranking Apps
8 The Permission User Interface
  8.1 Exploratory Interface Design
  8.2 Large Scale Interface Evaluation
9 Related Work
10 Appendices

List of Figures

2.1 Install-time permission requests in Android 4.4, a permission management screen in Android 6, and a use-time permission request in iOS 10 (which also has a management screen similar to Android 6).
3.1 Classification survey questions for each app, with happi replaced by the app's name. Workers were given the description of each app from the Play store.
3.2 Apps fall along a spectrum of replaceability, from likely generic apps, like weather apps, to likely single-source, like Instagram or Facebook. Between these extremes are mixed-mode apps, whose classification depends on what features of the app the user needs.
4.1 Screenshots of the PerMission Store.
4.2 Screenshots of the PerMission Assistant.
6.1 The rating section of the branding survey, showing the Gmail condition. Note that the first bullet under Storage asks workers to leave a "Somewhat Acceptable" rating for that permission.
6.2 The ratings for each permission for the Gmail and MailMan apps (also shown in Figure 10.2 in the appendix). The ratings for other pairs of apps follow a similar pattern.
8.1 A prototype interface for permission ratings.
8.2 The lock interfaces.
8.3 The eye interfaces.
8.4 The mask interface, the checkbox interface, and the grade interface.
8.5 The bar interfaces.
8.6 The traffic interfaces.
8.7 Percentage of subjects in each interpretation category.
8.8 Subjects' responses to the Likert-type question asking users whether they thought it was clear that the icons represented privacy ratings.
8.9 Subjects' beliefs about the source of the ratings.
10.1 Apps considered in the classification study (Chapter 3). Categories marked by an asterisk are not built-in Google Play categories but rather sets of apps with specific qualities of interest to the study: The "white noise" apps have very similar feature sets, and therefore might be likely to be considered generic by users, while apps in the "brick-and-mortar" category are closely coupled with real-world products and so might be likely to be single-source. ("Brick-and-mortar" is not mutually exclusive with respect to the other categories, so there are some apps in other categories that are "brick-and-mortar," such as CVS/pharmacy in health_and_fitness and the airline apps in travel_and_local.)
10.2 The ratings for each permission for the Gmail and MailMan apps.
10.3 The ratings for each permission for the Waze and ShortCuts apps. After eliminating participants who had not heard of Waze, a brand-name app, the Waze condition had only 9 participants.
10.4 The ratings for each permission for the Pandora and TuneUp apps.
10.5 The ratings for each permission for the Instagram and PictureIt apps.
10.6 (Note: This figure may be better viewed in color.) An overview of all of the interfaces explored during our iterative design process (Chapter 8). Arrows map the evolution and cross-influences of interfaces; solid (black) arrows show redesigns, and dashed (blue) arrows indicate that feedback on one iconography influenced the design of another. X's (in red) indicate the elimination of an iconography, while the checkmark (in green) signifies the interface was included in our in-depth testing.

Abstract of "On a (Per)Mission: Leveraging User Ratings of App Permissions to Help Users Manage Privacy" by Hannah Quay-de la Vallee, Ph.D., Brown University, May 2017.

Apps provide valuable utility and customizability to a range of user devices, but installation of third-party apps also presents significant security risks. Many app systems use permissions to mitigate this risk. It then falls to users to decide which apps to install and how to manage their permissions, but unfortunately, many users lack the expertise to do this in a meaningful way. In this thesis, I determine that users face two distinct privacy decisions when using apps: which apps to install, and how to manage apps' permissions once they are installed. In both cases, users are not given meaningful guidance to help them make these choices.
For decisions about which apps to install, users would benefit from privacy information in the app marketplace, since that is how most users choose apps. Once users install an app, they are confronted with the second type of decision: how to manage the app's permissions. In this case, users would benefit from an assistant that helps them see which permissions might present privacy concerns. I therefore present two tools: a privacy-conscious app marketplace and a permission management assistant. Both of these tools rely on privacy information, in the form of ratings of apps' permissions. I discuss gathering this rating information from both human and automated sources and how it is used in the two tools. I also explore how the brand of an app could affect how users rate its permissions. Additionally, because my goal is to convey privacy information to users, I design and evaluate several interfaces for displaying permission ratings. I discuss surprising misconceptions generated by some of these interfaces, and present an interface that effectively communicates permission ratings.

Chapter 1
Introduction

Thesis Statement: To encourage users to use more privacy-respecting apps, app stores should include user ratings of privacy as a criterion for sorting apps. In the absence of user-provided ratings, crowdsourcing can be used to gather ratings. Additionally, these ratings can be leveraged to help users determine which permissions to enable and disable for a given app.

App-based devices have become pervasive in consumers' lives [2], due in part to apps' easy installation model. Most app ecosystems are supported by a central marketplace that enables users to easily search for, investigate, and install apps, allowing users of all levels of technical ability to customize their devices. However, the amount of user information associated with these devices makes third-party apps a threat to user security and privacy. Indeed, apps have become a popular target of hackers and malware developers, leading to exposure of private information and financial harm [20, 21, 30, 47]. In addition to malware, there are apps that, while not necessarily malicious, collect significant amounts of user data, which can be sold to other companies, used to target ads, or otherwise used in ways users did not expect and may not approve of [15, 29, 40, 53, 54]. The expansion of the app model beyond smartphones to platforms such as desktops, cars, and the Internet of Things exacerbates these concerns [4, 28, 32, 34].

Many platforms try to mitigate these risks by requiring users to grant permission before apps can access certain hardware resources and user data. Unfortunately, such systems force users, even technical novices, to manage their own privacy without assistance. Furthermore, most systems ask for consent either at or after installation time, when users have already chosen an app, making it onerous for them to switch apps if they dislike an app's permission requests.

App stores, as a primary source of app information, are ideally positioned to act as a fulcrum to aid users in managing their privacy. (In fact, app stores play such a key role in how users discover apps that they have been the target of censorship in countries like China and Russia [38].) Marketplaces already influence user decisions by ranking apps and thus filtering which apps users see. Unfortunately, marketplaces are not incentivized to put user privacy first. It is even possible, given the profit model of many app stores, that apps that use more ad libraries are ranked more highly.
Apart from baseline protection like malware detection, most app marketplaces do not use their position to better inform or protect users. Google Play, Android's proprietary marketplace, allows users to search by price and star rating but does not provide privacy-based search options, nor any privacy guidance beyond simply listing apps' permissions with brief descriptions (which are vague enough that they have been the subject of lawsuits by users arguing that they were not adequately informed of apps' capabilities [30]). As a result, many users can only give uninformed consent to permission requests, as they are ill-equipped to judge whether an app's permissions are appropriate for its purpose. Worse, users who do have judgements about apps' permissions have no good way to express them to other users or to developers. Some try to communicate their opinions via app reviews, but these are difficult to find amongst the myriad other reviews.

Worse still, developers who want to explain their app's permission requests also lack a dedicated forum to do so. Some developers use their app's description page but, since this is not standard practice, it is easy for users to miss, especially in Google Play, where long descriptions are hidden by default. Other apps, like Pinterest, explain their permission requirements on their websites [7], where only a very motivated user is likely to find them. Because it is not standard for developers to explain their permission requests, it is not generally seen as suspicious when they do not. This is concerning since some permissions grant access to large chunks of information, and the user does not know what information the app is collecting or how it is using that information. For instance, the Uber ride-sharing app uses the "read phone status and identity" permission to view the battery status to know when to switch to low power mode. However, it also uses that information for research, so Uber now knows that users with low battery are more willing to pay higher prices for rides. Although Uber says it does not adjust price based on battery status, it may come as a surprise to some users that Uber knows this at all [12]!

Despite this quagmire, user reviews have been a useful privacy tool, harnessed by researchers to inform users about the consequences of updating their existing apps [49], and by developers to gauge user opinion and guide app development. For instance, an update of the Avis car rental app added the "retrieve running apps" permission. This led users to leave a spate of negative reviews, spurring the app's developers to remove the permission. These examples show that reviews can be a valuable source of information about app permissions, but current marketplaces limit their effectiveness by making them difficult for users and developers to find.

We have built two Android apps that leverage privacy rating information to help users make informed privacy decisions. We also allow developers to respond to these ratings, thereby providing a channel for communication between users and developers. The first app is a privacy-conscious marketplace, which helps users to find privacy-respecting apps. The second is a permission management assistant to help users regulate their apps' permissions after they are installed.
Applicability Beyond Android

We focused on Android because it offered a concrete platform with a broad user base on which we could build real-world functional tools, and because there is a significant body of academic research on the Android permission system. However, the concepts of this work could apply to any app platform, such as Chrome browser extensions, car app stores [26], and Internet of Things apps [1], and, of course, other mobile platforms like iOS and Windows Phone. Different platforms may need to adjust specifics like the exact interface for presenting ratings, or how privacy influences ranking, but the broader concept of privacy information in the marketplace is applicable across the spectrum of app platforms.

Contributions

This thesis makes several contributions. First, in Chapter 3, we show that users face privacy decisions both when selecting which apps to install and when managing their apps after installation. Second, in Chapter 5 we use crowdsourcing and automated tools to collect ratings of apps' permissions to assist users with their privacy decisions. In Chapter 6 we examine how brand may affect user ratings, and in Chapter 7 we show how these ratings can be used to promote privacy-respecting apps in a marketplace. In Chapter 8 we discuss the crowd feedback-based method we used to develop an interface for presenting rating information to users, including some unexpected subtleties in the design of such an interface. All of these features are incorporated into our two apps, which are discussed in Chapter 4. Chapter 2 provides an overview of different permission models, and Chapter 9 discusses related work.

Chapter 2
Overview of Permission Models

While permissions are a common method of restricting apps' access to resources and data, there are significant variations between platforms in how those permissions work. For instance, some platforms, such as iOS, allow users to decide when an app may use certain permissions, not just if, while other platforms offer only a binary decision of grant or deny. Perhaps the most significant variation is an "all-or-nothing" model, where an app requires some set of permissions and a user must grant the app all those permissions or they cannot install the app, versus an individual permission model, where a user can toggle individual permissions on or off for each app. There are several tradeoffs between these models:

• The all-or-nothing model is simpler for users in some sense, since they do not have to manage individual permissions, but it significantly restricts their ability to control their privacy.
• The all-or-nothing model is simpler for developers, since they do not have to worry about how their app will perform with limited permissions.
• On many platforms that use an individual model, apps request access to a permission when they first use it, while all-or-nothing systems typically require users to agree to permissions when the app is first installed. Requesting permissions at use-time provides users with some context for how the permission is being used, but it can be annoying for users when the request interrupts a task.

Both all-or-nothing and individual models have been used in popular platforms. The iOS platform has used an individual permission model since it introduced permissions in iOS 6 [19]. Figure 2.1(c) shows an iOS permission request.
Figure 2.1: Install-time permission requests in Android 4.4 (a), a permission management screen in Android 6 (b), and a use-time permission request in iOS 10 (c). (iOS 10 also has a management screen similar to Android 6's.)

Google Chrome browser extensions [6] and early versions of Windows Phone employed an all-or-nothing approach, as did early versions of Android (shown in Figure 2.1(a)). Both scholars and popular writers expressed frustration with how this approach limited user control [31, 41]. Android 4.3 inadvertently exposed permission toggling functionality [22], but Google removed it in Android 4.4.2, to some discontent [23]. Google pulled the feature because it caused apps to crash, as developers had not designed them to run with limited permissions (unlike most iOS apps, which have always had to contend with the possibility of not being granted a given permission). In Android 6, Google officially transitioned to an individual permission model (shown in Figure 2.1(b)), and current versions of Windows Phone also take the individual approach [5].

These variations between permission models affect how users manage their privacy. In Chapter 3 we will discuss some of these effects on users, and how they inform our choices about how to assist users.

Chapter 3
Privacy Decisions Facing Users

To better help users with privacy decisions, we need to understand what types of choices users actually make. At first blush, users face two types of privacy decisions: which apps to install and how to manage their apps' permissions after installation. If users need a specific app, managing permissions after installation is the only way for them to protect their privacy, and so they could benefit from a tool to assist them with that management, which would require privacy ratings for each permission. However, there may also be times when users can choose between similar apps, or when they are using a platform that does not allow them to manage individual permissions (as discussed in Chapter 2), in which case a privacy-conscious store, with overall ratings for each app, would be helpful.

To determine which tools to build, we studied whether users ever have a meaningful choice between different apps. We posted Mechanical Turk surveys for 66 Android apps. For each app, we showed workers the app's description from Google Play, and asked whether workers thought that app was replaceable. If they thought it was, we asked if they could name an example substitute. If they thought the app could not be replaced, we asked why they felt it was unique. These questions are shown in Figure 3.1. We then asked several demographic questions.

Do you use happi?
  Yes: No follow-up question.
  No: Do you use a similar app?
Do you think there are other apps that could be used in place of happi?
  Yes: Can you think of any examples of apps that could be used in place of happi?
  No: Why do you think that happi is unique?

Figure 3.1: Classification survey questions for each app, with happi replaced by the app's name. Workers were given the description of each app from the Play store.

Figure 3.2: Apps fall along a spectrum of replaceability, from likely generic apps, like weather apps, to likely single-source, like Instagram or Facebook. Between these extremes are mixed-mode apps, whose classification depends on what features of the app the user needs.
To select the 66 apps, we used the MarketBot scraper [3] to collect the descriptions of the top five apps in 11 of Google Play's categories, along with five white noise apps. We also chose six apps that were closely tied to a service external to the app, such as the Stop and Shop app, which is only useful at a physical Stop and Shop store. All of the apps had at least 100,000 installs, and only eight apps had fewer than 1 million installs, suggesting that all the apps were of interest to a broad range of users. Figure 10.1 in the appendix shows the complete list of apps. Each survey asked about three to five apps, and no survey contained two apps from the same category. We gathered 10 to 12 responses for each survey. Our workers were 61% male and 39% female, and had an average age of 29; 84% were from the United States and 16% were from India.

Apps varied significantly in their substitutability (ANOVA, p < 0.001), indicating that some apps are interchangeable, while other apps provide unique functionality, tying users to that app. Rather than dividing clearly into replaceable or unique, however, we found that apps fall along a spectrum of substitutability, visualized in Figure 3.2. On one end are single-source apps, which offer unique functionality that cannot be replicated by a different app. Instagram is an example of a single-source app, as less than 20% of workers felt it could be replaced. On the other end of the spectrum are generic apps, such as Waze, which 100% of workers felt was replaceable. In the middle are mixed-mode apps, which can be either single-source or generic depending on the user. For example, consider Strava, an app that allows users to track their physical activity and compete with friends. For users who only use the tracking features, it could be replaced by a similar app, such as MapMyRide. Other users might care deeply about the social features of Strava, and so other apps would not be an acceptable substitute.

Although there were not clear groupings of apps, some categories were more substitutable than others. For example, apps in the "social" category were considered, perhaps unsurprisingly, significantly less substitutable than apps in the "travel_and_local" category (Tukey's HSD, p < 0.01). Ultimately, whether a given app is replaceable depends on the user, and therefore apps cannot be classified a priori. Overall, however, 30% of apps were considered substitutable by at least 75% of our workers, and 77% of apps were considered substitutable by at least 50% of workers. This indicates that users, whether they are aware of it or not, are making two distinct types of privacy choices: which apps to install (for generic apps), and how to manage apps' permissions after installation (for all apps, but most importantly single-source apps).
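For readers who want to reproduce this style of analysis, the sketch below runs a one-way ANOVA over per-app substitutability responses using SciPy. The data and the grouping are invented for illustration; they are not the study's actual responses.

```python
from scipy.stats import f_oneway

# Hypothetical data: for each app, the fraction of workers in each
# survey batch who called the app substitutable. Values are invented.
instagram = [0.10, 0.20, 0.15]  # single-source-leaning
strava    = [0.50, 0.60, 0.55]  # mixed-mode
waze      = [1.00, 0.95, 1.00]  # generic-leaning

f_stat, p_value = f_oneway(instagram, strava, waze)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
# A small p-value indicates substitutability differs across apps; a
# post-hoc test such as Tukey's HSD can then locate which pairs differ.
```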
Chapter 4
Apps To Help Users Manage Privacy

The two types of privacy decisions discussed in Chapter 3 require two approaches to assisting users. A privacy-aware marketplace would aid users with installation decisions by helping them find more privacy-respecting apps. A privacy assistant could help users manage their apps' permissions after they are installed on users' devices. We split these two approaches into two separate apps, the PerMission Store and the PerMission Assistant. (Both apps are available on the Google Play store, and can be found by going to OnAPermission.org.) Dividing the functionality into separate apps means that users who are only interested in one app are not required to accept the risks of both. In particular, the Assistant needs to access the list of apps the user has downloaded, information the Store does not need. This separation was practically useful in March of 2017, when Google Play updated its privacy requirements to classify "device information" as sensitive user data (an appropriate classification). Because the list of apps a user has installed is part of "device information," our assistant app was pulled for a privacy review and required updates to comply with the new privacy requirements. Having two separate apps meant that the marketplace app was unaffected during this process.

Both apps already contain information for approximately 1500 Android apps from Google Play leaderboards, and we are continuing to collect information for more apps.

Figure 4.1: Screenshots of the PerMission Store: (a) the search results page; (b) the Kayak app page.

4.1 The PerMission Store

The PerMission Store (shown in Figure 4.1) is designed to be a comprehensive app store. In addition to privacy ratings, it includes apps' descriptions, screenshots, icon images, star ratings, developers, categories, and prices from Google Play, and it allows users to search and browse through apps and rate permissions. (Scraping the Play store, while not explicitly prohibited in the letter of the Terms of Service, is somewhat counter to their spirit; integrating our store into Google Play would render this step unnecessary.) There is one notable feature our store does not provide: it relies on the Play store to actually install apps. When users click to install an app in the PerMission Store, they are taken to that app's page in the Play store, where they can then install the app. Ideally, users would complete the entire process within our marketplace, but this would expose users to insecurity by requiring third-party downloads and by bypassing the malware protections in place in the Play store.

The PerMission Store displays privacy ratings at two levels: the permission level and the app level. Both levels of rating are represented with percentage bars developed via a series of user interface design studies (Chapter 8). The permission-level ratings are comprised of both automated and human ratings as described in Section 5.3 and provide users with detailed information they can use to make privacy decisions. These ratings are unique to a given app-permission combination, and so the same permission may have a different rating on different apps.

App-level ratings are calculated from permission-level ratings (Section 5.3), and serve several purposes. First, they are incorporated into the PerMission Store's ranking mechanism (discussed in Chapter 7), which is used to sort responses to user search queries, thus allowing the PerMission Store to promote more privacy-respecting apps. They also provide a broad privacy overview, making it easier for users to compare apps. Throughout the marketplace, an app's app-level privacy rating is displayed next to its star rating from the Play store so that users can weigh both when choosing apps. When users search or browse apps, they are shown tiles that display the apps' general information, like name, developer, app-level privacy rating, star rating, and price (see Figure 4.1(a)), as well as links to rate or install the app.
If a user clicks one of these tiles, they are taken to the app's page (an example of which is shown in Figure 4.1(b)), which has more detailed information, like permission-level ratings and comments, and the app's description. The permissions are ordered worst-rated to best to ensure that users see the most worrisome permissions.

4.2 The PerMission Assistant

The PerMission Assistant (shown in Figure 4.2) helps users manage permissions for apps they have already installed. Because user time and attention are limited, the Assistant sorts a user's installed apps by their worst-rated permissions, which allows users to address the most concerning permissions first. It is thus useful for apps the user installed before the PerMission Store was available, and for single-source apps where the user cannot switch to a more privacy-respecting alternative. The Assistant allows users to run these apps within their own privacy limits. Because it relies on the ability to turn individual permissions off, the PerMission Assistant requires Android Marshmallow, while the PerMission Store can be used with any Android version.

The PerMission Assistant uses the same interface elements as the PerMission Store to display an app's permission ratings and provides a link to manage a given app's permissions (as seen in Figure 4.2(b)). Because we cannot actually edit other apps' settings, this link takes users to the app's page in their device's settings. This is, of course, a security necessity, because Android should not allow apps to adjust each other's permissions. However, it does mean that we cannot display privacy ratings on the actual adjustment screen in settings. Similarly, we cannot display ratings along with "just-in-time" permission requests that pop up when an app requests a permission during use. The user can always look at the permission's ratings in the Assistant later, but ideally the ratings would be available at the time of the request. However, these dialog boxes are a protected communication from the operating system, so we cannot (and would not want to) inject rating information into them. Both of these issues could be solved if these ratings were incorporated into the Android infrastructure.

Figure 4.2: Screenshots of the PerMission Assistant: (a) the home page; (b) the Kayak app page.
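The Assistant's triage order can be summarized in a few lines. The sketch below is illustrative, not the shipped code: it assumes each installed app already has the combined per-permission ratings of Section 5.3, normalized so that 0 is worst and 1 is best, and the app names are invented.

```python
# Hypothetical installed apps mapped to per-permission ratings.
installed = {
    "Find a Pharmacy": {"Location": 0.90, "Contacts": 0.20},
    "Weather Now":     {"Location": 0.95},
    "FlashLight+":     {"Contacts": 0.10, "Storage": 0.40},
}

def worst_rating(perm_ratings):
    """An app's most concerning permission determines its urgency."""
    return min(perm_ratings.values())

# List apps with the worst-rated permission first, so users spend their
# limited attention on the most pressing privacy issues.
for app in sorted(installed, key=lambda a: worst_rating(installed[a])):
    print(f"{app}: worst permission rating = {worst_rating(installed[app])}")
```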
Chapter 5
Populating Privacy Information

The essential feature of our apps is privacy information, which we gather from two sources: an automated tool and human raters. As discussed in Chapter 4, our apps use both permission-level and app-level ratings. Since we cannot know, for a given app, whether a user will need permission-level ratings (to manage permissions) or app-level ratings (to choose between apps), we collect ratings for all apps at the permission level and compute an app-level rating from the permission-level ratings. Section 5.1 and Section 5.2 discuss collecting ratings from automated and human sources, and the advantages and disadvantages of each. Section 5.3 discusses combining the human and automated ratings and calculating the app-level rating.

5.1 Automated Ratings

The research community has developed a number of systems that use automated techniques to provide privacy and security information about Android apps. Some attempt to identify malware apps [57, 58], while others detect worrisome permissions or suspicious handling of user data [17, 25, 52]. Chapter 9 offers further discussion of such systems.

These automated tools can provide objective, quantitative privacy information for a large number of apps at low cost. However, automated tools suffer from a number of shortcomings. They are often difficult to use, even for sophisticated users (the author of this thesis was unable to get many of these tools to run). They provide little-to-no qualitative feedback, such as discomfort or confusion about permissions. Finally, many of these tools cannot consider the context of a permission (accessing contact data may be worrisome for a flashlight app, but not a messaging app).

One of the few automated tools that offered a working installation is the DroidRisk system [51], which analyzes permission request patterns in both malware and benign apps to assign a risk score to each permission. (Because Android has added new permissions since the development of DroidRisk, the tool does not provide scores for all the current permissions.) Because it is a functional system that offers permission-level ratings, our apps incorporate DroidRisk ratings, but it should be noted that we are repurposing the tool, which was designed to detect malware rather than to rate legitimate apps. Because we are using the DroidRisk ratings outside their intended purpose, and because they still lack important contextual and qualitative information, like how users feel about a certain permission, our apps use the DroidRisk ratings primarily as a complement to the human ratings.
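To give a flavor of how such permission-level risk scores can arise, the sketch below scores a permission by how disproportionately it appears in malware relative to benign apps. This is our own simplification of the idea behind tools like DroidRisk, not DroidRisk's actual model, and the frequencies are invented.

```python
def risk_score(malware_freq, benign_freq):
    """Fraction of a permission's appearances that are in malware.
    Returns a value in [0, 1]; higher suggests stronger association
    with malicious apps. A deliberate simplification for illustration."""
    total = malware_freq + benign_freq
    return 0.0 if total == 0 else malware_freq / total

# Invented corpus frequencies: fraction of apps requesting the permission.
print(risk_score(malware_freq=0.60, benign_freq=0.05))  # SMS-like: ~0.92
print(risk_score(malware_freq=0.85, benign_freq=0.80))  # internet-like: ~0.52
```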
5.2 Human Ratings

To capture the full range of users' concerns, our apps employ user ratings and reviews, similar to the star ratings and text reviews in Google Play, along with the DroidRisk ratings. Of course, the average user is not a security expert, and thus may "mis-rate" a permission because they misunderstand its purpose. However, our apps aim to serve as a communication channel between users, developers, and the Android team, and "incorrect" ratings signal to developers that they are not adequately explaining their apps' permissions, and to the Android team that a permission is confusing or misleading. This is vital information, because if users do not understand a permission, they cannot meaningfully consent to its use, and therefore the permission system is failing in its primary objective. Because user ratings provide valuable information for other users, for app developers, and for the Android team, our apps incorporate those ratings as a key source of permission information.

Bootstrapping Human Ratings

Human ratings present a bootstrapping problem: Users will likely only use our apps if they contain ratings, but without ratings, the apps would struggle to gain the users necessary to rate apps. Our apps could initially rely only on automated ratings, but they would then suffer from the shortcomings of automated tools.

One option for seeding text reviews would be to mine the existing app reviews in Google Play, searching for permission-relevant text. However, Google Play makes it difficult to gather more than a sample of reviews for each app (40 per app, as of June 2016). The Play Store itself, should it ever integrate our apps' features, could leverage the complete database of existing reviews.

To offer human ratings right away, our apps use crowdsourced ratings from Mechanical Turk, which offers a cost-effective platform with a supporting body of academic research [39]. Although the Play store offers millions of apps, many of these apps are not widely used, so we have focused our seeding on popular apps by pulling from the Play Store's leaderboards (this is similar to the star ratings in the Play Store, where popular apps generally have numerous ratings while less popular apps may have few, if any). We have seeded our apps with crowdsourced ratings for over 1500 apps, and we are continuing to collect more. (The cost-effectiveness of Mechanical Turk enabled us to do this with a limited research budget.)

Crowdsourcing solves the bootstrapping problem, but raises concerns about whether workers take rating tasks seriously. (They might, for example, assign random ratings to finish the task as quickly as possible to maximize their income.) We thus performed a study to evaluate the quality of Mechanical Turk ratings. We surveyed workers about 14 apps: Facebook, Gmail, Pandora, Angry Birds, and ten weather apps, with 20 to 30 workers per app. For each app, we provided workers with its description and required permissions. We instructed workers to imagine that they were considering installing the given app and asked them, "Which, if any, of the permissions did you find unacceptable, and why?" They had to label each permission as either "acceptable" or "unacceptable," and could explain each rating in an optional text box.

We reviewed the text responses explaining the ratings. First, we found that more than 60% of workers did provide explanations for their ratings, despite this being optional. Furthermore, their responses were relevant to the permissions being discussed, indicating that the workers performed the task seriously.

We also evaluated the quality of the binary ratings. This presented a challenge because, as ratings are essentially opinions, there is no ground truth against which to evaluate. We could measure agreement between workers with Fleiss's κ measure of inter-rater reliability, but low agreement would not necessarily mean that workers were negligent, since there could be valid disagreement. However, we would expect workers to agree on some of the permissions, particularly non-controversial ones, leading to a range of agreement across permissions. We computed κ scores for each permission and found that the scores ranged from -0.1 (significant disagreement) to 1.0 (total agreement). The scores aligned with our intuition about which permissions would be non-controversial. For example, coarse-grained location had κ = 1.0 for all weather apps, which is unsurprising, as a weather app needs to fetch local conditions.
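For concreteness, here is a self-contained implementation of the standard Fleiss' κ computation used above. How the binary ratings are grouped into items (here, one row per app for a given permission) is our illustrative choice, not necessarily the study's exact grouping.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa. counts[i][j] = number of raters assigning item i
    to category j; every item must have the same number of raters."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Overall proportion of assignments falling in each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(n_cats)]
    # Per-item agreement: fraction of rater pairs that agree.
    P_i = [(sum(c * c for c in row) - n_raters) /
           (n_raters * (n_raters - 1)) for row in counts]
    P_bar = sum(P_i) / n_items
    P_e = sum(p * p for p in p_j)
    if P_e == 1.0:  # every rating in one category: complete agreement
        return 1.0
    return (P_bar - P_e) / (1 - P_e)

# Ten raters labeling the location permission for three weather apps as
# [acceptable, unacceptable]. Unanimity yields kappa = 1.0, as observed.
print(fleiss_kappa([[10, 0], [10, 0], [10, 0]]))  # 1.0
print(fleiss_kappa([[7, 3], [5, 5], [6, 4]]))     # ~ -0.08: no real agreement
```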
These findings suggest that Mechanical Turk is a viable method for seeding ratings for an initial corpus of apps. That said, we consider the crowdsourced ratings to be temporary. As we amass ratings from in-the-wild users, we will phase out crowdsourced ratings.

5.3 Merging Human and Automated Ratings

While having both human and automated ratings helps mitigate the shortcomings of each, it could be confusing and overwhelming for users to consider two ratings for every permission and to understand the distinctions between them. Thus, we merge each permission's human and automated ratings together, so that users can see questionable permissions at a glance. Calculating the combined rating depends on whether the permission is in the DroidRisk corpus. If it is not, and thus does not have an automated rating, we take the average of its human ratings. If the permission does have an automated rating, we take a weighted average of the automated rating, denoted by ar, and the average of the human ratings, denoted by hr. The overall rating pr_p for a permission p is given by:

pr_p = (0.25 × ar_p) + (0.75 × hr_p)    (5.1)

where both ar_p and hr_p are normalized to be between 0 and 1. Automated ratings are given a lower weight because they are a less nuanced metric than human ratings.

After computing a single rating for each permission, we have to calculate an overall privacy rating for each app. This app-level rating makes it easier for users to compare between multiple apps, and is necessary for ranking apps. An app that requires no permissions is given a privacy score of 1 (the best possible rating), because, from a permission standpoint, it is innocuous. For an app that does request permissions, we need to calculate an overall rating from its permissions' ratings. A naive approach would be to average the permissions' ratings (perhaps with some sort of weighting). However, an average would suffer a significant drawback: the aggregate rating would always be either equal to or better than the app's worst-rated permission. As a result, an unscrupulous developer could hide a suspicious permission by requesting a large number of innocuous-seeming permissions. To avoid this, our marketplace uses an app's worst permission rating as the overall rating.
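The following is a minimal sketch of Section 5.3's computation, assuming all ratings are already normalized to [0, 1] with higher meaning more acceptable; the function and variable names are ours, not the deployed code's.

```python
def permission_rating(human_ratings, automated_rating=None):
    """Equation 5.1: weight the automated rating at 0.25 and the mean
    human rating at 0.75; fall back to the human mean when the
    permission is outside the DroidRisk corpus."""
    hr = sum(human_ratings) / len(human_ratings)
    if automated_rating is None:
        return hr
    return 0.25 * automated_rating + 0.75 * hr

def app_rating(permission_ratings):
    """Rate an app by its worst permission, so a suspicious request
    cannot be diluted by many innocuous ones. A permissionless app
    gets 1, the best possible score."""
    return min(permission_ratings, default=1.0)

# One alarming permission dominates, however well the others are rated.
perms = [permission_rating([0.9, 1.0, 0.8], automated_rating=0.7),
         permission_rating([0.2, 0.1])]
print(app_rating(perms))  # 0.15, not the ~0.5 an average would report
```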
Chapter 6
The Effect of Brand Name

One element that could affect user trust of apps, and therefore influence permission ratings, is the brand behind the app. However, it is not clear how brand would affect ratings. On one hand, users could be more comfortable with permissions for a known-brand app, since they may feel the app will be better engineered and thus more secure. On the other hand, they may feel that a large company may be more inclined to collect user data to better target ads.

To explore these questions, we did a survey study examining how users would rate the permissions of otherwise identical apps with different branding. We pulled the description and permission information of four apps from the Play store: Gmail, Waze, Pandora, and Instagram. To limit the length of the survey, we selected a subset of each app's permissions to include (between five and nine permissions per app). For each app, we created an "off-brand" version: MailMan (Gmail), ShortCuts (Waze), TuneUp (Pandora), and PictureIt (Instagram). The off-brand versions had the same description with the name of the app changed, and the same subset of permissions as their branded counterparts. We also removed any links from the descriptions of both the on- and off-brand versions.

We posted survey tasks to Mechanical Turk for each of the eight conditions. Figure 6.1 shows a screenshot of the Gmail condition. We asked workers to read the description of the app and rate each one of its permissions on a 4-point Likert-type scale from "completely acceptable" to "completely unacceptable." For all conditions, the Storage permission was an attention check, where its description asked workers to leave a specific rating.

Figure 6.1: The rating section of the branding survey, showing the Gmail condition. Note that the first bullet under Storage asks workers to leave a "Somewhat Acceptable" rating for that permission.

After workers rated the permissions, we asked them to answer several Likert-type questions about how trustworthy the app seemed, and how functional it seemed, to see if a known brand would influence perceptions of trustworthiness or quality, and if so, whether those perceptions would affect permission ratings. The trustworthiness and quality questions were adapted from Measuring Customer-Based Brand Equity by Lassar et al. [35]. We also asked workers whether they had heard of the app before completing our survey. We eliminated any workers who either had not heard of a brand-name app, or who thought that they had heard of an off-brand app. The survey was a between-subjects design, so each worker saw only one app (and only one version of the app).

We chose to create off-brand versions of name-brand apps and gather survey responses for each to more directly recreate the scenario of rating two similar apps, one familiar and the other unfamiliar. We chose this approach, rather than directly asking workers about their opinions on brand, to avoid the "privacy gulf": the gap between users' stated opinions about privacy and the actions they take (in this case, saying they feel one way about the importance of brand, but actually rating permissions in a way that contradicts their stated beliefs).

Initially, we had a total of 274 respondents across all 8 conditions, with between 27 and 40 per condition. After eliminating responses based on the attention check, we were left with 233 total respondents (85% of the original number), with between 25 and 34 respondents per condition. When we removed respondents based on familiarity with the brand, we found that for all of the apps (both brand-name and off-brand) except for Waze, this check eliminated a small percentage of participants: between 3% and 16%, leaving between 21 and 32 respondents per app. However, 67% of participants had not heard of the Waze app, leaving only 9 respondents in that condition.

After eliminating responses based on attention and brand familiarity, we used ANOVA to compare the trustworthiness and quality ratings for each pair of apps, as well as the permission ratings between each pair. The trustworthiness analysis indicated that workers found Gmail (M = 1.711, SD = 0.757) to be significantly more trustworthy than MailMan (M = 2.563, SD = 0.601), with F(1, 60) = 24.21, p < 0.001. Workers also found Instagram (M = 2.063, SD = 0.552) to be more trustworthy than PictureIt (M = 2.73, SD = 0.389), with F(1, 51) = 23.14, p < 0.001. The differences in trustworthiness scores between Waze and ShortCuts, and between Pandora and TuneUp, were not significant. For the quality analysis, workers found Pandora (M = 2.032, SD = 0.759) to be of higher quality than TuneUp (M = 2.32, SD = 0.605), with F(1, 44) = 7.915, p = 0.007. None of the other pairs showed a significant difference in quality.

Despite its effect on trustworthiness and quality, brand did not have any effect on the apps' permission ratings. For each pair, we found the brand of the app was not a significant factor in determining its permission ratings. Figure 6.2 shows the ratings for the Gmail and MailMan apps. The ratings for all four pairs of apps are shown in Figures 10.2, 10.3, 10.4, and 10.5 in the appendix.

Figure 6.2: The ratings for each permission for the Gmail and MailMan apps (also shown in Figure 10.2 in the appendix): (a) the Calendar permission group; (b) the Photos/Media/Files permission group; (c) the Other permission group; (d) the Identity permission group. The ratings for other pairs of apps follow a similar pattern.
These findings suggest that respondents do not consider the brand of an app when rating its permissions.

Chapter 7
Ranking Apps

While the privacy ratings can help users choose between apps, a privacy-conscious marketplace should also promote privacy-respecting apps so that users can find them in the first place. In particular, the marketplace should incorporate apps' privacy ratings into its search function so that apps with better privacy scores are ranked higher in results. However, the marketplace cannot simply sort results by privacy rating; users need apps that are functional and relevant to their needs, as well as privacy preserving.

One option would be to replicate the Play store's ranking for a given query and combine those rankings with our privacy ratings to sort apps. However, as discussed in Chapter 1, the Play store may rank apps in a way that is contrary to users' privacy interests, so integrating their ranking could undercut our goals. Also, the Play store's ranking method is opaque and could rely on privileged information, and so may be irreproducible. Thus, we need another way to incorporate functionality and relevancy.

Our marketplace uses apps' star ratings from the Play store as a proxy for functionality. These ratings are supplied by users, not by Google, and therefore do not present the same concerns as the Play store's ranking function. To incorporate relevancy, we leverage our database of apps. The scraped app data are stored in a Postgres database. Postgres provides built-in text search that, given a search query, calculates a relevancy score for each record based on how often and where the query appears. Our marketplace searches against apps' titles and descriptions to get the relevancy score.

Given privacy, functionality, and relevancy information, we need to compute a single ranking score, because the marketplace ultimately needs a sort order for apps. Although we are building a privacy-conscious marketplace, relevancy is the most important factor, followed by functionality, since users will not be satisfied with irrelevant or dysfunctional apps, no matter how privacy preserving. We use a weighted sum of all three components, so an app a's rank for a query q is defined by:

Rank_{a,q} = r_{a,q} + (0.25 × fr_a) + (0.2 × pr_a)    (7.1)

where r_{a,q} is the relevancy score for app a on query q, fr_a is its functionality rating, and pr_a is the permission rating of its worst-rated permission (as defined in Equation 5.1); r_{a,q}, fr_a, and pr_a are all normalized to be between 0 and 1.

We arrived at these weights empirically, by experimenting with different weightings. Because relevance is the only component that depends on the search query, it carries more weight than functionality or privacy. If a user does not find an app they want after their initial query, and they try a second query, we want to return different results. Weighting fr_a and pr_a more heavily meant that similar but distinct queries (like "game" and "puzzle game") did not produce distinct results. Giving pr_a alone any more weight resulted in apps with poor functionality ratings appearing frequently in search results. Both of these effects could be frustrating to users.
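To make the ranking concrete, here is a small sketch of Equation 7.1 applied to a set of search results. The record layout and the normalization scheme are our illustrative assumptions; in the real system the relevancy score comes from Postgres's full-text search over titles and descriptions rather than being supplied by hand.

```python
def normalize(values):
    """Rescale scores to [0, 1]; a constant list collapses to 0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def rank_apps(apps):
    """Order search results by Equation 7.1. Each app dict carries a
    relevancy score (e.g., from Postgres text search), a functionality
    rating (Play-store stars), and its worst permission rating."""
    rel = normalize([a["relevancy"] for a in apps])
    fun = normalize([a["stars"] for a in apps])
    pri = normalize([a["privacy"] for a in apps])
    scores = [r + 0.25 * f + 0.2 * p for r, f, p in zip(rel, fun, pri)]
    order = sorted(zip(scores, (a["name"] for a in apps)), reverse=True)
    return [name for _, name in order]

# Relevance dominates; among comparably relevant apps, privacy matters.
print(rank_apps([
    {"name": "MapQuik", "relevancy": 0.90, "stars": 4.1, "privacy": 0.8},
    {"name": "AdMaps",  "relevancy": 0.90, "stars": 4.5, "privacy": 0.2},
    {"name": "Puzzles", "relevancy": 0.05, "stars": 5.0, "privacy": 1.0},
]))
```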
Chapter 8
The Permission User Interface

Because we want to communicate rating information to users, the interface for displaying the ratings is another critical component of our marketplace. The interface should help users understand the riskiness of individual permissions so they can make informed decisions without requiring significant effort. Ideally, it would be intuitive enough that users could understand it without too much direction. Figure 8.1 is an example of what such an interface might look like.

Figure 8.1: A prototype interface for permission ratings.

Designing such an interface proved surprisingly subtle. Our original designs, based on existing security metaphors, failed to convey the desired information. Indeed, we found that some of them actively misled users (Section 8.1). We also unearthed some common patterns of interface confusion. In the end we found three designs that most subjects understood, and conducted a large user study to confirm this (Section 8.2). One of these designs will be used in the new marketplace.

8.1 Exploratory Interface Design

To find a functional interface, we designed several prototypes and leveraged Amazon's Mechanical Turk platform to give us rapid feedback on those prototypes.

Color in the Interfaces

Several of our candidate interfaces use color to convey information. Although they all use colors distinguishable by viewers with red-green color vision deficiency (the deployed apps are compatible with both red-green and blue-yellow color vision deficiency), they do lose meaning when viewed in greyscale. Thus, we recommend reading this section in color.

Methodology

We explored each prototype with a survey on Mechanical Turk. These surveys were intended to expose broad conceptual problems in the interfaces, so we recruited only 10 to 12 subjects per interface. The surveys focused on two issues: how well subjects understood the purpose and meaning of the interface absent any explanation, and whether subjects understood where the ratings came from.

During each study, subjects were shown a mock-up of a candidate interface. Figure 8.1 is an example of such a mock-up. The mock-ups displayed the full permissions interface for a fictional app called Find a Pharmacy, which appeared to be developed by the (also fictional) company ApexApps. We chose a pharmacy locator app because it could pose a privacy risk to a user (if, for example, it stored a list of the user's medications for refill reminders), but would be unlikely to offend any subjects. Each mock-up used different iconography to present the user permission ratings (which were also fake), but the permissions and their rating values were the same or comparable across interfaces. The mock-ups were presented as static images that were tall enough not to require scrolling. (Because the rating icons varied in size, the mock-ups varied in height.) This was both to ensure subjects did not miss any of the iconography by failing to scroll, and to avoid distraction induced by interaction.

Upon being presented with one of the mock-ups, subjects were asked to explain, in a free-response text box, what they thought the icons next to the permissions meant. Subjects were given no information about the purpose of the interface. The next page of the survey told them that the icons were privacy ratings and asked them to rate how clear this was from the interface, on a 4-point Likert-type scale.

We manually examined the text responses to identify conceptual problems with each interface, whereupon we either attempted to redesign the interface to address issues raised by subjects, or we decided the
Using this process we eliminated all but three interfaces, which we evaluated in a larger study (Section 8.2). To understand subjects’ beliefs about the ratings’ source, we asked whether they thought the ratings came from “other Android users”, “independent security experts”, “a review team at Google”, or “don’t know”. I will discuss the outcome of this question before delving into the individual interfaces. The Source of the Ratings If users are going to trust the ratings enough to use them, they are necessarily placing trust in the raters, so it is important that users understand the source of the ratings. We found that most of the interfaces failed to convey to subjects that the ratings were from other Android users. This is therefore something that should be considered in the design of the complete marketplace. Stars A five-star system is possibly the most common iconography for user ratings, and is already in use in the Google Play store to display apps’ overall functionality ratings. It is therefore a natural basis for experimentation. Possibly due to the ubiquity of five-star ratings, subjects seemed to have preconceptions about the meaning and source of the ratings. This proved to be both an advantage and a disadvantage. On the positive side, subjects correctly understood the source of the ratings (other Android users), and that more stars corresponded to a better rating. Unfortunately, subjects’ association with stars as a functionality rating was too strong. Many subjects thought the ratings indicated how well the permissions’ services worked. For example, some subjects thought the rating next to “Network Communication” showed the strength of the network signal. In order for the star ratings to effectively communicate the meaning of the permission ratings, users would have to understand that the same icon on the same page had two different meanings (the app’s functionality rating and the permission ratings). This potential for user confusion led us to eliminate this interface. However, it did inspire interfaces using privacy-relevant symbols rather than stars, with the intention of leveraging users’ existing understanding of an out-of-five system while expressing that the ratings are about privacy. Locks One symbol we used place of stars was locks, a common visual metaphor for protection. Our original lock design used yellow locks over a grey background: This design caused a number of misconceptions. 26 Figure 8.2: The lock interfaces First, although most subjects understood that the locks were privacy ratings, some thought they meant that the permission’s service was restricted. (This may stem from the practice by developers of using locks to mark features of an app that must be purchased or earned before they can be used.) Second, those subjects who did understand that the locks represented privacy ratings could not tell whether more yellow locks denoted a better or worse rating. This is troubling, because it would cause users to think the most dangerous permissions were the safest. We label this confusion, present in many interfaces, the better-or-worse phenomenon, and discuss it more at the end of this chapter. The second lock interface, drawing from the traffic light interface (presented later in this chapter), tried to eliminate the better-or-worse phenomenon by using red and green locks. To further reinforce the message of privacy, the green locks were closed and the red locks open. 
We also hoped that using color would reduce the perception that the locks indicated restricted services (in which case fewer locks would be preferable). Though these changes helped curtail the better-or-worse phenomenon, they did not eliminate it entirely. Because the better-or-worse phenomenon was at least partially caused by confusion about whether more or fewer icons was better, we replaced the out-of-five system with a single lock next each permission, and relied on color and open-ness to convey the rating: Using only red and green locks would have been too similar to the checkbox interface (discussed below), which had resulted in dangerous misunderstandings by subjects. To avoid this, the interface also used half-open yellow locks. This had the additional benefit of conveying more information than just red and green locks without adding much cognitive effort. 27 Figure 8.3: The eye interfaces This redesign improved understanding, but some subjects still thought that the locks indicated inaccessible features. To further clarify the icons’ meaning, we grouped permissions by rating and added explanatory text alongside the icons, drawing from the design of the first traffic light interface. (Additionally, we hoped introducing the word “voted” would also clarify the source of the ratings by emphasizing that they were an aggregate of community opinions.) The final lock interface was an improvement over its predecessors, but some subjects still thought the locks indicated availability. One subject said of the yellow lock, “I think it signifies that some features are unlocked but not all of them.” Since locks performed worse than percentage bars (presented below) and traffic signs, we eliminated this interface family. Eyes Continuing our exploration of other symbols in an out-of-five rating, this interface used eyes in the place of stars. Our first icon, which used a no-smoking-style circle-and-slash over an eye, proved too difficult to see at small scale. One subject stated that it “looks like a picture of a watch, so I would say it has something to do with time.” We thus tried different-color eyes: The more dangerous a permission was, the more red eyes it had; the more benign it was, the more grey eyes it had. Additionally, the red centers had the appearance of a red recording light as seen on a camera. Though subjects could now see the icon, this interface exhibited the better-or-worse phenomenon. One possible cause is that the grey eyes looked more like actual eyes, and so subjects thought that more grey eyes meant more surveillance. 28 (a) The mask interface. (b) The checkbox interface. (c) The grades interface. Figure 8.4: The mask interface, the checkbox interface, and the grade interface. We tried various redesigns such as grouping permissions by rating with a text header and introducing a yellow eye category. Though these changes helped, percentage bars and traffic signs were still better understood by subjects, so we disqualified this interface. Guy Fawkes Masks We also explored out-of-five ratings using Guy Fawkes masks, which were popular- ized by the graphic novel V for Vendetta and its film adaptation, and have become a symbol for personal privacy and activism. Unfortunately, subjects felt the rating showed how well protected their information was from the government (possibly due to the “hacktivist” group Anonymous’ adoption of the mask as a symbol). 
As this is not a protection a permissions system can provide, and it is dangerous for an interface to suggest protections that do not exist, we eliminated all variations of this design.

Binary Checkboxes

As we wanted to convey information without demanding much cognitive effort, we designed a simple interface in which each permission was given either a green checkmark, indicating users approved of the permission, or a red X, indicating they did not. Unfortunately, we discovered a very significant confusion: the red X was meant to indicate a potentially invasive permission, but subjects thought it meant that the given permission had been disabled. This is an extreme case of the better-or-worse phenomenon and an alarming misconception. We therefore eliminated this interface without attempting to redesign it.

Grades

Drawing on another iconography, this interface used letter grades to present the ratings. Typically used to rate students' academic performance, grades are also used in some non-educational settings (e.g., the New York City Department of Health restaurant inspection results). Unfortunately, most subjects thought the ratings were for the functionality of a permission's service. As this interface failed in its primary purpose, we eliminated it.

Percentage Bars

Eschewing existing privacy and safety metaphors, this interface used rectangular bars to indicate the percentage of raters who considered a given permission to be acceptable. This style of rating conveys more information than the other interfaces, and therefore carries a greater risk of overwhelming a user. To mitigate this issue, the bars were colored red, yellow, or green depending on the permission's approval rating, giving a more obvious visual distinction between ratings. Subjects understood that the bars indicated privacy ratings, and this interface did not suffer from the better-or-worse phenomenon, due in part to the colors of the bars. One subject stated that the bars rated the permissions from "most risky to the least, red being the highest and the green being generally safe".

Although the bars were effective, subjects' feedback on the traffic signs interface revealed a potential pitfall: their comments suggested that subjects perceived a green light as a signal to proceed without caution, which could encourage users to download an app without considering the permissions at all. We were concerned the green bars could have the same over-soothing effect. To encourage caution in all cases, we modified the interface to use red, orange, and yellow bars.

This interface had two variations. In both, more dangerous permissions had red bars and less dangerous permissions had yellow bars. In the first variant, the more dangerous a permission, the fuller its bar (showing the percentage of raters who deemed the permission unacceptable). In the second variant, the more dangerous a permission, the emptier its bar (showing the percentage of raters who deemed the permission acceptable). (Examples of both appear in Figure 8.5.)

Figure 8.5: The bar interfaces

Both versions of this interface introduced the better-or-worse phenomenon. It is possible that, because all of the colors were "warning colors", the effectiveness of the color differentiation was diminished. Additionally, these colors could cause warning fatigue after continuous use. To avoid these problems, we introduced two-color bars.
As before, each bar had some percentage of a warning color (the percentage of raters who deemed the permission unacceptable for the app); the rest of the bar was green, to clarify the meaning and limit warning fatigue. There were four variants. The first two used only red and green (one with red on the left, the other with red on the right), so the goodness of a rating was indicated only by the ratio of red to green. Unfortunately, subjects thought these bars were progress bars or ratings of the permission's service quality. The other two used red, orange, and yellow along with the green, so the goodness of the permission was indicated both by the ratio of the warning color to green and by which warning color was used. As with the red-green bars, one variant had the green on the left, which we will call G-ROY bars, and the other had the warning color on the left, which we will call ROY-G bars (examples of both appear in Figure 8.5). Unlike with the red-green-only bars, subjects understood that the ratings were privacy-related, and, unlike with the warning-color-only bars, they understood which ratings were better and which were worse. One subject said of an orange bar that "It means to me that feelings about this permission are mixed—about half of people think it is acceptable and half think it is not acceptable for this app to have that permission." Thus we subjected these interfaces to large-scale testing (Section 8.2).
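As a concrete sketch of the two-color bar design (not code from the dissertation): the warning segment's length is the disapproval rate, the remainder of the bar is green, and the warning hue escalates with danger. The thresholds below are assumptions for illustration, since the dissertation does not give exact cutoffs; the `variant` parameter selects the G-ROY or ROY-G ordering.

```python
# A sketch of how the two-color bar variants might assign and order colors.
# The 1/3 and 2/3 thresholds are illustrative assumptions.
def bar_segments(disapproval: float, variant: str = "ROY-G"):
    """Return (left_segment, right_segment) as (color, fraction) pairs.

    `disapproval` is the fraction of raters who deemed the permission
    unacceptable for this app; the rest of the bar is green.
    """
    if disapproval > 2 / 3:
        warning = "red"       # most dangerous permissions
    elif disapproval > 1 / 3:
        warning = "orange"
    else:
        warning = "yellow"    # least dangerous permissions
    warn_seg = (warning, disapproval)
    green_seg = ("green", 1.0 - disapproval)
    # ROY-G bars put the warning color on the left; G-ROY bars lead with green.
    return (warn_seg, green_seg) if variant == "ROY-G" else (green_seg, warn_seg)

print(bar_segments(0.5))           # (('orange', 0.5), ('green', 0.5)) -- "mixed feelings"
print(bar_segments(0.1, "G-ROY"))  # (('green', 0.9), ('yellow', 0.1))
```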
Traffic Signs

The final set of interfaces we designed used traffic markers, an iconography suggested by a subject in another interface's study. The traffic marker interface split the permissions into three categories, with headers above each category. This interface successfully communicated that the ratings were related to privacy, but it exhibited a significant danger: the single green light gave subjects the sense that all the permissions in the "most acceptable" category were completely safe and did not need to be examined at all, which is not necessarily the case. Additionally, this interface could be unsuitable for users with color vision deficiency.

To address the color vision deficiency issue, we tried a variation using position (as real traffic lights do). However, it still did not address the problem of an overly soothing green light. Rather than simply changing the colors of the lights (which could look jarringly different from actual traffic lights and thus confuse users), the next interface used traffic signs: a red octagon (mimicking a stop sign), an orange diamond, and a yellow circle. The colors appeared more sparingly in this interface (only next to the section headers), so warning fatigue was less of a concern than with the percentage bars. This interface was well understood by subjects, so we included it in the large-scale testing (Section 8.2).

Figure 8.6: The traffic interfaces

Common Findings and Observations

These studies exposed two issues that arose in multiple interfaces. First, because Android is used across a range of cultures, some metaphors may not be familiar or applicable to all users. For example, some countries do not use letter grades in their schools. Of our interfaces, only the percentage bars do not rely on an existing metaphor, and so they avoid this particular confusion.

The second common issue was the better-or-worse phenomenon, wherein the more negative a rating was, the more positive subjects interpreted it to be. The net effect of this is alarming: the most dangerous permissions appear to be the most harmless! This problem was most troubling in the checkbox interface. There, dangerous permissions were indicated by a red X, but subjects thought the X meant that the permissions had been disabled and were therefore completely innocuous. Because of this phenomenon's dangerous nature, it greatly influenced our design decisions and our selection of interfaces to study further.

8.2 Large Scale Interface Evaluation

Small-scale testing allowed us to eliminate all but three interfaces. To further validate these three, we carried out a large-scale evaluation.

Methodology

As with the smaller studies, we posted surveys on Mechanical Turk. We had 311 subjects for the traffic signs interface, 365 subjects for the G-ROY bar interface, and 83 subjects for the ROY-G bar interface. The surveys explored four issues. Two are the same as before: how well subjects understood the meaning of the interface absent other cues, and whether subjects understood the ratings' source. For these, we used the same prompts and mock-ups as in the small-scale studies (Section 8.1). In addition, we asked subjects how much they would trust ratings from each of the three possible sources ("other Android users", "independent security experts", and "a review team at Google"). For each source, subjects had to select either "I would not trust them at all", "I would trust them somewhat", or "I would trust them completely". Finally, we examined whether subjects would consider these ratings useful for different types of users. Specifically, we asked them to provide "yes"/"no" answers for whether they would use such a system themselves, recommend the system for use by a parent (someone who might need assistance with technical decisions), and recommend it for use by a teenager (someone for whom they might be responsible).

I will first discuss how well subjects understood the interface, then the perceived utility of the ratings for different populations, and finally whether subjects understood the source of the ratings and how much they would trust each source.

The Meaning of the Ratings

We had two types of data to evaluate how well subjects understood the meaning of the interface: text answers to the free-response question, and the Likert-scale data. To evaluate the text data, we manually coded the correctness of each response's interpretation of the interface. To classify, we used a rubric that was revised until we obtained an inter-coder reliability score (κ) of 0.835. The rubric is as follows:

Predominantly Correct Interpretation: The user understands, for all three ratings, that the ratings signify the acceptability of each permission, or believes that the rating signifies the potential harm that could be caused by each permission, and correctly interprets the order of the ratings from positive to negative. Users who correctly identify which of two ratings is more positive, but do not explain their choice further, fall into this category.

Semi-Correct Interpretation: The user understands that the ratings are privacy-related, but does not understand or misunderstands exactly what they signify (e.g., they may think a rating signifies how often the app uses a service, or the "level" of access the app has to the rated service). Users who understand that the ratings are privacy-related but cannot correctly interpret the order of the ratings from positive to negative fall into this category, as do users who think the ratings signify how much access an app has to the permission, and users who believe the ratings signify the percentage of behaviors pertaining to a given permission that are acceptable (e.g., half of the network communications by the app are acceptable).

Incorrect Interpretation: The user does not understand that the ratings are privacy-related (e.g., they may think the ratings have to do with power usage), or that they are ratings at all.
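The inter-coder reliability score above is presumably Cohen's κ for two coders; assuming that, a toy computation (with hypothetical coder labels, not our actual data) looks like this:

```python
# Illustrative Cohen's kappa computation for two coders applying the rubric.
# The label sequences are hypothetical; the study's coders reached 0.835.
from sklearn.metrics import cohen_kappa_score

coder_a = ["correct", "correct", "semi", "incorrect", "semi", "correct"]
coder_b = ["correct", "semi",    "semi", "incorrect", "semi", "correct"]

# Kappa corrects the raw agreement rate for agreement expected by chance.
print(cohen_kappa_score(coder_a, coder_b))  # ~0.739 for this toy data
```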
Fig. 8.7 summarizes the percentage of subjects in each class for each interface.

Figure 8.7: Percentage of subjects in each interpretation category.

Broadly classifying both correct and semi-correct interpretations as understanding the interface, all three interfaces were understood by over 50% of subjects. In all three cases, a chi-squared goodness-of-fit test showed the interface performed better than chance. For the G-ROY bar interface, 58% of subjects understood the interface, χ²(1, N = 365) = 8.29, p = 0.004. The traffic sign interface was understood by 64% of subjects, χ²(1, N = 311) = 24.34, p < 0.001. For ROY-G bars, 66% of subjects understood the interface, χ²(1, N = 83) = 8.78, p = 0.003.
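These tests can be reproduced with a standard goodness-of-fit computation against a 50/50 chance split. In the sketch below, the understood/not-understood counts are reconstructed from the reported percentages and sample sizes (so they may be off by a respondent or two), but they recover the reported statistics:

```python
# Goodness-of-fit tests for "understood the interface" vs. chance (50/50).
# Counts are reconstructed from the reported percentages and Ns.
from scipy.stats import chisquare

counts = {
    "G-ROY bars":    [210, 155],  # N = 365, ~58% understood
    "traffic signs": [199, 112],  # N = 311, ~64% understood
    "ROY-G bars":    [55, 28],    # N = 83,  ~66% understood
}
for name, observed in counts.items():
    # Default expected frequencies are uniform, i.e. an even split.
    stat, p = chisquare(observed)
    print(f"{name}: chi2(1) = {stat:.2f}, p = {p:.3f}")
# G-ROY bars:    chi2(1) = 8.29,  p = 0.004
# traffic signs: chi2(1) = 24.34, p = 0.000
# ROY-G bars:    chi2(1) = 8.78,  p = 0.003
```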
Communicating Rating Information Without Context

Note that these subjects had been given no explanation of the interface, so these results represent a worst-case baseline. This would likely not occur in the privacy-focused app store, where users would have context for the ratings' meaning. However, ideally our privacy features would be incorporated into a general-purpose marketplace, where users may not have the context to cue them to the meaning of the ratings. In that case, the ratings would need to communicate their meaning absent the context.

After subjects answered the free-response questions, we informed them that the icons were privacy ratings. We asked them to rate whether this was clear from the interface, on a 4-point Likert-type scale from "completely unclear" (a value of 1) to "completely clear" (a value of 4). The responses are shown in Fig. 8.8.

Figure 8.8: Subjects' responses to the Likert-type question asking whether they thought it was clear that the icons represented privacy ratings.

The ROY-G bar interface performed the best, with a mean of 3.2, and a chi-squared goodness-of-fit test showed the results to be significantly different from chance, χ²(3, N = 83) = 36.3, p < 0.001. Traffic signs had a mean of 3.0, and the chi-squared test showed the results to be significantly different from chance, χ²(3, N = 311) = 83.32, p < 0.001. The G-ROY bars had a mean of 2.9, and again, a chi-squared test showed a significant difference from chance, χ²(3, N = 365) = 105.38, p < 0.001.

Likelihood of Recommending the System

To determine whether subjects felt the ratings would be useful for different audiences, we asked if they would personally use the ratings, and if they would recommend them to a parent or teenager. Subjects' responses did not differ significantly between interfaces, so we consider them in aggregate. Overall, 75% of subjects said they would use the ratings for themselves, 74% would use them for a teenager, and 72% would use them for a parent. These results suggest there is a user base for these ratings.

The Source of Permission Ratings

As in the small-scale studies, we investigated whether subjects understood that the ratings were from other Android users. Subjects' beliefs are shown in Fig. 8.9.

Figure 8.9: Subjects' beliefs about the source of the ratings.

In this large-scale study, a plurality of subjects understood that the ratings were from other Android users, but there was still confusion. Although it would be ideal to do so, it may not be possible to convey the source of the ratings solely through the interface.

Overall, this study suggests that users are concerned about their privacy but currently lack the tools or expertise to control their own data and resources. Our marketplace's privacy ratings of permissions will provide users with a mechanism to make informed decisions about apps.

Chapter 9

Related Work

There are a number of "permission manager" apps on Google Play, many of which simply reorganize the information provided in the Android settings and do not offer any additional privacy information. Some highlight "risky" apps, but it is not clear how they calculate risk [10, 11]; many appear to use the number of permissions a given app requests, which is an unreliable metric. There are also managers that remove other apps' permissions by altering the apps' APKs [8], or that require root access to disable permissions [9]. These are significant threat vectors in their own right, do not actually help users make privacy choices, and are of limited use since the release of Android Marshmallow, where permission toggling is a built-in feature. None of these tools provides the structured permission ratings and reviews available in our PerMission Assistant.

Almuhimedi et al. [13] show that a permission manager can help users manage their privacy. Liu et al. [37] present a personalized privacy assistant (PPA) that engages users in a dialogue to determine a privacy profile, which the manager then employs to suggest permission settings to the user. Although similar in concept, our system, by focusing on publicly viewable ratings, can both let users explore how other users understand permissions and serve as a channel of communication amongst users, developers, and the Android team. Our Assistant could be incorporated with the PPA to provide a more complete tool.

Highlighting the value of privacy information in the marketplace, researchers such as Felt et al. [27] have found that smartphone users take privacy risks seriously. In a study conducted while Android was still using an all-or-nothing permission model, Wijesekera et al. [54] found that at least 80% of respondents would have liked to block at least one permission request, indicating concern about their privacy. However, Chin et al. [18] show that although smartphone users are careful about performing certain tasks, they engage in risky behavior when it comes to installing apps, suggesting that users could benefit from a more privacy-conscious marketplace.

Tsai et al. [50] built a search engine annotated with privacy scores for merchants. They found that users are more likely to purchase products from sellers with higher privacy scores, demonstrating that offering privacy information during the search process can affect user decisions. Tan et al. [48] show that when iOS developers provide explanations of their apps' permission requests, users are more likely to approve the requests. This indicates that users want to understand how permissions will be used, and that it is in developers' interest to provide this information. Tian et al. [49] use app reviews to give users more privacy information, showing that user reviews can help users make privacy decisions.
However, they focus on the consequences of app updates, rather than on installing new apps or managing current apps. Additionally, they draw from existing reviews, rather than gathering privacy-specific reviews.

There are systems that use automated approaches to detect misbehavior or privacy risks in apps (such as Chin et al. [17], Enck et al. [25], Sarma et al. [46], and Wei and Lie [52]), to flag dangerous permissions (such as Wang et al. [51] and Pandita et al. [42]), or to detect malware (like Zhou et al. [57], Zhou et al. [58], and others). All of these systems generate information that could be employed in a privacy-centric marketplace to rank apps and inform users about privacy. Yu et al. [56] and Rosen et al. [45] use API and method calls to generate privacy policies for Android apps and to highlight privacy-relevant app behavior, respectively, but neither system connects particular behaviors with the permissions that enable them. If developers or Android were to provide this information, our PerMission Assistant could incorporate these tools to help users decide which permissions to enable or disable.

Ayyavu and Jensen [14] find that user feedback (such as ratings) and heuristic-based automated tools are complementary. Papamartzivanos et al. [43] analyze smartphone usage patterns across users to find privacy leaks in apps. Lin et al. [36] and Yang et al. [55] use information gathered via crowdsourcing to find unexpected permissions and improve user understanding of Android permissions. These systems aggregate crowd feedback into observations about apps, rather than providing a direct channel of communication for users and developers. Burguera et al. [16] also take a crowd-based approach to app security. Unlike our work, they use the crowd to collect traces of app behavior to detect malware, rather than gathering direct feedback from users on permission use in legitimate apps.

Kelley et al. [33] find that using a "nutrition label" format for privacy policies helped users better understand the policies. Egelman et al. [24] use crowdsourcing to evaluate user comprehension of privacy icons for ubiquitous computing environments. These works demonstrate how an interface can help users better understand privacy, but their icons are intended for different uses.

Bibliography

[1] Family hub refrigerator. Accessed May 2017. http://www.samsung.com/us/explore/family-hub-refrigerator/.

[2] Google Play Store: number of apps 2009-2016 | Statistic. Accessed Feb. 2016. https://www.statista.com/statistics/266210/number-of-available-applications-in-the-google-play-store/.

[3] Google Play Android app store scraper. Accessed Apr. 2016. https://www.github.com/chadrem/market_bot.

[4] HomeOS: Enabling smarter homes for everyone - Microsoft Research. Accessed May 2017. https://www.microsoft.com/en-us/research/project/homeos-enabling-smarter-homes-for-everyone/.

[5] App permissions explained - what are they, how do they work, and should you really care? Written Jan. 2016. https://www.dbbest.com/blog/app-permissions-explained/. Accessed: Mar. 2017.

[6] Permissions requested by apps and extensions - Chrome web store help. Accessed Mar. 2017. https://support.google.com/chrome_webstore/answer/186213?hl=en.

[7] Android permissions | Help center. Accessed Apr. 2016. https://help.pinterest.com/en/articles/android-permissions.

[8] Apk permission remover - Android apps on Google Play. Accessed Apr. 2016. https://play.google.com/store/apps/details?id=com.gmail.heagoo.apkpermremover.
[9] Fix permissions - Android apps on Google Play. Accessed Apr. 2016. https://play.google.com/store/apps/details?id=com.stericson.permissionfix.

[10] MyPermissions privacy cleaner - Android apps on Google Play. Accessed Apr. 2016. https://play.google.com/store/apps/details?id=com.mypermissions.mypermissions.

[11] PermissionDog - Android apps on Google Play. Accessed Apr. 2016. https://play.google.com/store/apps/details?id=com.PermissioDog.

[12] This is your brain on Uber. Written May 2016. http://www.npr.org/2016/05/17/478266839/this-is-your-brain-on-uber. Accessed: Jan. 2017.

[13] H. Almuhimedi, F. Schaub, N. Sadeh, I. Adjerid, A. Acquisti, J. Gluck, L. F. Cranor, and Y. Agarwal. Your location has been shared 5,398 times!: A field study on mobile app privacy nudging. In Conference on Human Factors in Computing Systems, 2015.

[14] P. Ayyavu and C. Jensen. Integrating user feedback with heuristic security and privacy management systems. In Conference on Human Factors in Computing Systems, 2011.

[15] J. Bernstein. You should probably check your Pokemon Go privacy settings - BuzzFeed News. Written July 2016. https://www.buzzfeed.com/josephbernstein/heres-all-the-data-pokemon-go-is-collecting-from-your-phone?utm_term=.ceMJPj2k7#.hnVZpx89n. Accessed: Apr. 2017.

[16] I. Burguera, U. Zurutuza, and S. Nadjm-Tehrani. Crowdroid: Behavior-based malware detection system for Android. In Security and Privacy in Smartphones and Mobile Devices, 2011.

[17] E. Chin, A. P. Felt, K. Greenwood, and D. Wagner. Analyzing inter-application communication in Android. In Mobile Systems, Applications, and Services, 2011.

[18] E. Chin, A. P. Felt, V. Sekar, and D. Wagner. Measuring user confidence in smartphone security and privacy. In Symposium on Usable Privacy and Security, 2012.

[19] J. Cipriani. How to control your privacy settings on iOS 6 - CNET. Written Sept. 2012. https://www.cnet.com/how-to/how-to-control-your-privacy-settings-on-ios-6/. Accessed: May 2017.

[20] G. Cluley. IT manager has bikes stolen after cycling app reveals his address. Written Dec. 2015. https://www.welivesecurity.com/2015/12/22/manager-bikes-stolen-cycling-app-reveals-home-address/. Accessed: Apr. 2017.

[21] J. Cox. Hack brief: Malware sneaks into the Chinese iOS App Store | WIRED. Written Sept. 2015. https://www.wired.com/2015/09/hack-brief-malware-sneaks-chinese-ios-app-store/. Accessed: Apr. 2017.

[22] P. Eckersley. Awesome privacy tools in Android 4.3+. Accessed Mar. 2015. eff.org/deeplinks/2013/11/awesome-privacy-features-android-43.

[23] P. Eckersley. Google removes vital privacy feature from Android, claiming its release was accidental. Accessed Mar. 2015. eff.org/deeplinks/2013/12/google-removes-vital-privacy-features-android-shortly-after-adding-them.

[24] S. Egelman, R. Kannavara, and R. Chow. Is this thing on?: Crowdsourcing privacy indicators for ubiquitous sensing platforms. In ACM Conference on Human Factors in Computing Systems, 2015.

[25] W. Enck, P. Gilbert, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth. TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones. In Operating Systems Design and Implementation, 2010.

[26] D. Etherington. GM's new SDK for in-car infotainment apps offers access to nearly 400 data points | TechCrunch. Written Jan. 2017. https://techcrunch.com/2017/01/26/gms-new-sdk-for-in-car-infotainment-apps-offers-access-to-nearly-400-data-points/. Accessed: May 2017.

[27] A. P. Felt, S. Egelman, and D. Wagner. I've got 99 problems, but vibration ain't one: A survey of smartphone users' concerns. In Security and Privacy in Smartphones and Mobile Devices, 2012.
[28] D. Frommer. The smart TV app revolution - Business Insider. Written Oct. 2013. http://www.businessinsider.com/the-smart-tv-app-revolution-2013-10?utm_source=House&utm_term=RR&utm_campaign=RR. Accessed: Apr. 2017.

[29] A. Gell. The not-so-surprising survival of Foursquare - The New Yorker. Written Mar. 2017. http://www.newyorker.com/business/currency/the-not-so-surprising-survival-of-foursquare. Accessed: Apr. 2017.

[30] D. Goodin. Golden State Warriors Android app constantly listens to nearby audio, fan says. Written Sept. 2016. https://arstechnica.com/tech-policy/2016/09/golden-state-warriors-android-app-constantly-listens-to-nearby-audio-fan-says/. Accessed: Jan. 2017.

[31] C. Hoffman. iOS has app permissions, too: And they're arguably better than Android's. Written Dec. 2013. https://www.howtogeek.com/177711/ios-has-app-permissions-too-and-theyre-arguably-better-than-androids/. Accessed: May 2017.

[32] J. Kahn. Apple redesigns Siri with new features in iOS 7, introduces iOS in the car. Written June 2013. https://9to5mac.com/2013/06/10/apple-redesigns-siri-with-new-features-in-ios-7-introduces-ios-in-the-car/. Accessed: Jan. 2014.

[33] P. G. Kelley, J. Bresee, L. F. Cranor, and R. W. Reeder. A "nutrition label" for privacy. In Symposium on Usable Privacy and Security, 2009.

[34] J. Laird. Google's Android OS is mating with cars at CES, promising big things for your ride. Accessed Jan. 2014. techradar.com/news/car-tech/google-s-android-os-is-mating-with-cars-at-ces-promising-big-things-for-your-ride-1212393.

[35] W. Lassar, B. Mittal, and A. Sharma. Measuring customer-based brand equity. Journal of Consumer Marketing, 12, 1995.

[36] J. Lin, S. Amini, J. I. Hong, N. Sadeh, J. Lindqvist, and J. Zhang. Expectation and purpose: Understanding users' mental models of mobile app privacy through crowdsourcing. In Mobile Ubiquitous Computing, Systems, Services and Technologies, 2012.

[37] B. Liu, M. S. Andersen, F. Schaub, H. Almuhimedi, S. A. Zhang, N. Sadeh, Y. Agarwal, and A. Acquisti. Follow my recommendations: A personalized privacy assistant for mobile app permissions. In Symposium on Usable Privacy and Security, 2016.

[38] F. Manjoo. Clearing out the app stores: Government censorship made easier. Written Jan. 2017. https://mobile.nytimes.com/2017/01/18/technology/clearing-out-the-app-stores-government-censorship-made-easier.html. Accessed: Apr. 2017.

[39] W. Mason and S. Suri. Conducting behavioral research on Amazon's Mechanical Turk. Behavior Research Methods, 44, 2012.

[40] L. Mirani. The amount most people are willing to pay for an app is $0 - until they've actually downloaded it - Quartz. Written Sept. 2013. https://qz.com/129699/the-amount-most-people-are-willing-to-pay-for-an-app-is-0-until-theyve-actually-downloaded-it/. Accessed: Apr. 2017.

[41] M. Nauman, S. Khan, and X. Zhang. Apex: Extending Android permission model and enforcement with user-defined runtime constraints. In ACM Symposium on Information, Computer and Communications Security, 2010.

[42] R. Pandita, X. Xiao, W. Yang, W. Enck, and T. Xie. WHYPER: Towards automating risk assessment of mobile applications. In USENIX Conference on Security, 2013.

[43] D. Papamartzivanos, D. Damopoulos, and G. Kambourakis. A cloud-based architecture to crowdsource mobile app privacy leaks. In Panhellenic Conference on Informatics, 2014.

[44] K. Pratap. Android M: Top new features in the next major Android release. Accessed June 2015. gadgets.ndtv.com/mobiles/features/android-m-top-new-features-in-the-next-major-android-release-697502.
[45] S. Rosen, Z. Qian, and Z. M. Mao. AppProfiler: A flexible method of exposing privacy-related behavior in Android applications to end users. In Conference on Data and Application Security and Privacy, 2013.

[46] B. P. Sarma, N. Li, C. Gates, R. Potharaju, C. Nita-Rotaru, and I. Molloy. Android permissions: A perspective combining risks and benefits. In Symposium on Access Control Models and Technologies, 2012.

[47] J. Skillings. Overexposed: Snapchat user info from 4.6M accounts - CNET. Written Jan. 2014. https://www.cnet.com/news/overexposed-snapchat-user-info-from-4-6m-accounts/. Accessed: Apr. 2017.

[48] J. Tan, K. Nguyen, M. Theodorides, H. Negrón-Arroyo, C. Thompson, S. Egelman, and D. Wagner. The effect of developer-specified explanations for permission requests on smartphone user behavior. In Conference on Human Factors in Computing Systems, 2014.

[49] Y. Tian, B. Liu, W. Dai, B. Ur, P. Tague, and L. F. Cranor. Supporting privacy-conscious app update decisions with user reviews. In Security and Privacy in Smartphones and Mobile Devices, 2015.

[50] J. Y. Tsai, S. Egelman, L. Cranor, and A. Acquisti. The effect of online privacy information on purchasing behavior: An experimental study. Information Systems Research, 22(2), 2011.

[51] Y. Wang, J. Zheng, C. Sun, and S. Mukkamala. Quantitative security risk assessment of Android permissions and applications. In Data and Applications Security and Privacy XXVII, 2013.

[52] Z. Wei and D. Lie. LazyTainter: Memory-efficient taint tracking in managed runtimes. In Security and Privacy in Smartphones and Mobile Devices, 2014.

[53] C. Welch. Tinder's new 'Social' feature reveals which Facebook friends are swiping - The Verge. Written Apr. 2016. https://www.theverge.com/2016/4/27/11518034/tinder-social-reveals-swiping-facebook-friends. Accessed: Apr. 2017.

[54] P. Wijesekera, A. Baokar, A. Hosseini, S. Egelman, D. Wagner, and K. Beznosov. Android permissions remystified: A field study on contextual integrity. In USENIX Security Symposium, 2015.

[55] L. Yang, N. Boushehrinejadmoradi, P. Roy, V. Ganapathy, and L. Iftode. Enhancing users' comprehension of Android permissions. In Security and Privacy in Smartphones and Mobile Devices, 2012.

[56] L. Yu, T. Zhang, X. Luo, and L. Xue. AutoPPG: Towards automatic generation of privacy policy for Android applications. In Security and Privacy in Smartphones and Mobile Devices, 2015.

[57] W. Zhou, Y. Zhou, X. Jiang, and P. Ning. Detecting repackaged smartphone applications in third-party Android marketplaces. In Conference on Data and Application Security and Privacy, 2012.

[58] Y. Zhou, Z. Wang, W. Zhou, and X. Jiang. Hey, you, get off of my market: Detecting malicious apps in official and alternative Android markets. In NDSS, 2012.
Chapter 10

Appendices

Figure 10.1: Apps considered in the classification study (Chapter 3).
- game: Blossom Blast Saga; Star Wars: Galaxy of Heroes; Clash of Kings; Prize Claw 2; Subway Surfers
- entertainment: Netflix; Hulu; Google Play Games; Vine - video entertainment; YouTube Kids
- news_and_magazines: Yahoo - News, Sports & More; CNN Breaking US & World News; Viewers to Volunteers; AOL: Mail, News & Video; Fox News
- travel_and_local: Waze - GPS, Maps & Traffic; Yelp; Maps; United Airlines; Southwest Airlines
- brick-and-mortar*: Stop and Shop; HSBC; Wegmans; Starbucks; Subway; Regal Cinemas
- business: Job Search; ADP Mobile Solutions; UPS Mobile; LinkedIn Job Search; Job Search - Snagajob
- health_and_fitness: Strava Running and Cycling GPS; Calorie Counter - MyFitnessPal; CVS/pharmacy; Google Fit - Fitness Tracking; Headspace - meditation
- social: Facebook; Instagram; Snapchat; Pinterest; Twitter
- weather: The Weather Channel; 1Weather:Widget Forecast Radar; AccuWeather; Transparent clock & weather; WeatherBug
- medical: CareZone; MyChart; FollowMyHealth Mobile; Ovia Pregnancy Tracker; ScriptSave WellRx
- finance: Credit Karma; Chase Mobile; Bank of America; Android Pay; PayPal
- music_and_audio: Pandora Radio; Spotify Music; SoundCloud - Music & Audio; YouTube Music; Shazam
- white noise*: White Noise Free; White Noise Pro 2.0; White Noise Baby; Relax Melodies: Sleep & Yoga; Relax Rain - Nature sounds

Categories marked by an asterisk are not built-in Google Play categories but rather sets of apps with specific qualities of interest to the study: the "white noise" apps have very similar feature sets, and therefore might be likely to be considered generic by users, while apps in the "brick-and-mortar" category are closely coupled with real-world products and so might be likely to be single-source. ("Brick-and-mortar" is not mutually exclusive with respect to the other categories, so there are some apps in other categories that are "brick-and-mortar," such as CVS/pharmacy in health_and_fitness and the airline apps in travel_and_local.)

Figure 10.2: The ratings for each permission for the Gmail and MailMan apps (panels: Calendar, Photos/Media/Files, Other, Identity).

Figure 10.3: The ratings for each permission for the Waze and ShortCuts apps (panels: Calendar, Photos/Media/Files, Other, Identity, Contacts, SMS, Phone, Camera/Microphone). After eliminating participants who had not heard of Waze, a brand-name app, the Waze condition had only 9 participants.

Figure 10.4: The ratings for each permission for the Pandora and TuneUp apps (panels: Calendar, Photos/Media/Files, Other, In-app purchases, Phone, Contacts).

Figure 10.5: The ratings for each permission for the Instagram and PictureIt apps (panels: Photos/Media/Files, Other, Device & app history, Location, Contacts, SMS, Phone, Camera/Microphone).

Figure 10.6: (Note: This figure may be better viewed in color.) An overview of all of the interfaces explored during our iterative design process (Chapter 8).
Arrows map the evolution and cross-influences of interfaces; solid (black) arrows show redesigns, and dashed (blue) arrows indicate that feedback on one iconography influenced the design of another. X’s (in red) indicate the elimination of an iconography, while the checkmark (in green) signifies the interface was included in our in-depth testing.