%% The first command in your LaTeX source must be the \documentclass command.
\documentclass[sigchi]{acmart}
%% For page numbers! Yay!
\settopmatter{printfolios=true}
%%
%% Remove the ACM copyright information. Found on:
%% https://tex.stackexchange.com/questions/346292/how-to-remove-conference-information-from-the-acm-2017-sigconf-template
% Removes citation information below abstract
\settopmatter{printacmref=false}
% removes footnote with conference information in first column
\renewcommand\footnotetextcopyrightpermission[1]{}
%% Please use this for your title. We use the long version for the title on the
%% first page and the shorter version for the running page header on the right.
\newcommand{\myLongTitle}{Heuristic evaluation of interactive machine learning systems} %% @TODO
\newcommand{\myShortTitle}{HE of IML systems} %% @TODO
%% \BibTeX command to typeset BibTeX logo in the docs
\AtBeginDocument{%
\providecommand\BibTeX{{%
\normalfont B\kern-0.5em{\scshape i\kern-0.25em b}\kern-0.8em\TeX}}}
%%
%% Remove copyright note by ACM: Found on:
%% https://tex.stackexchange.com/questions/21536/how-to-remove-the-copyright-box-on-a-paper-that-uses-the-acm-sig-alternate-cls-c
\setcopyright{none}
%% own package usages
%%
%% end of the preamble, start of the body of the document source.
\begin{document}
%%
%% The "title" command will get the content of "myLongTitle".
\title{\myLongTitle{}}
%% The "subtitle" command is used for the course title and will change
%% every semester.
\subtitle{Proseminar "Interactive Intelligent Systems", Winter Term, 2020/21}
%%
%% The "author" command and its associated commands are used to define
%% the author and affiliation.
\author{Jan Ole Zabel}
\affiliation{%
\institution{Freie Universität Berlin}
\city{Berlin}
\country{Germany}
}
\email{[email protected]}
%%
%% By default, the author name and the "short title" of the document are
%% printed in the page headers. As the full name and the title are sometimes
%% very long and might overlap, please use a short version of your name here.
\renewcommand{\shortauthors}{J. Zabel - \myShortTitle{}}
%%
%% This snippet is used as the page header on the outer edge. This should be
%% changed every semester.
\acmConference[Proseminar "Interactive Intelligent Systems"]{PS IIS}{Winter Term}{2020/21}
%%
%% The abstract is a short summary of the work to be presented in the
%% article.
\begin{abstract}
Interactive machine learning (IML) enables end-users to design machine learning models. This new mode of human-computer interaction requires a high level of usability, which naturally has to be evaluated. Heuristic evaluation is a method for analysing software usability that only needs an adjusted set of heuristics to be applied to IML. At the same time, there is a significant, though not total, lack of research on this combination.
\end{abstract}
%%
%% Keywords. The author(s) should pick words that accurately describe
%% the work being presented. Separate the keywords with commas.
\keywords{human computer interaction, usability analysis, heuristic evaluation, interactive machine learning}
%%
%% This command processes the author and affiliation and title
%% information and builds the first part of the formatted document.
\maketitle
\pagenumbering{arabic}
%%
%% Add a table of contents to this ACM template
%% Removed "Abstract", "ACK", and "References" from TOC (2401 in acmart.cls)
%% Removing the Contents headline from the TOC!
\let\oldaddcontentsline\addcontentsline
\newcommand{\stoptocentries}{\renewcommand{\addcontentsline}[3]{}}
\newcommand{\starttocentries}{\let\addcontentsline\oldaddcontentsline}
\stoptocentries% Stop adding content to the ToC
\tableofcontents
\starttocentries% Resume adding content to the ToC
\section{Introduction}
Ever since software has been developed, there has been a natural interest in measuring its usability, as use by humans is an integral part of every piece of software. Unusable software is worthless in the best case and dangerous in the worst. However, most methods for evaluating usability were designed for traditional software without AI capabilities. For intelligent systems, these methods therefore have to be re-evaluated.
This is especially true for Interactive Intelligent Systems (IIS). A special case among these is interactive machine learning (IML), where the user's feedback is the integral part: a user who is not necessarily an expert in the field trains a model. This is accomplished through the smart use of visualizations and the user's ability to evaluate the model's decisions. Usability is paramount here, although it may be difficult to measure with traditional methods.
Heuristic evaluation \cite{heureval1990} is a usability analysis method that is comparatively easy to perform. By aggregating multiple evaluators' input after a semi-formal review, many usability problems can be found.
\section{Interactive Machine Learning}
As AI-driven software becomes more and more important, so does the creation and adaptation of machine-learning models. Interactive Machine Learning (IML) enables "everyday users to interactively explore the
model space through trial-and-error and drive the system towards an
intended behavior, reducing the need for supervision by practitioners"\cite{reviewiml2018}.
The method works by providing users with well-visualized feedback on the current performance of a model and, after receiving user input, automatically adjusting the model. This can be seen as a feedback loop, visualized in figure \ref{fig:iml}.
Many machine-learning approaches benefit from a human in the loop\cite{humankernel2015}, which is also shown by empirical data \cite{imlexp2018}, although IML constrains the algorithms that can be used\cite{iml2003}. For instance, many machine-learning algorithms optimise for classification time but take a long time to train. To be an interactive process, however, the training should happen in nearly real time, which makes fast training algorithms important. "This allows the interactive cycle to be iterated quickly so it can be done more frequently."\cite{iml2003}
\begin{figure*}[h]
\centering
\includegraphics[width=0.8\linewidth]{iml-scheme}
\caption{Scheme of the IML approach\cite{whatiml2018}}
\label{fig:iml}
\end{figure*}
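The loop above can be sketched in a few lines. The following is a minimal illustration, not code from any of the cited systems: a toy nearest-centroid learner on a single numeric feature, chosen because it updates in effectively constant time per example and thus keeps each interaction cycle near real time.

```python
# Minimal sketch of the IML feedback loop (illustrative only): the model
# predicts, the "user" corrects wrong predictions, and the model retrains
# immediately, closing the interactive cycle.

class CentroidClassifier:
    """Toy incremental learner: one running mean per class label."""
    def __init__(self):
        self.sums = {}    # class label -> sum of feature values
        self.counts = {}  # class label -> number of labelled examples

    def add_example(self, x, label):
        self.sums[label] = self.sums.get(label, 0.0) + x
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, x):
        # Assign x to the class with the nearest centroid.
        return min(self.sums, key=lambda c: abs(x - self.sums[c] / self.counts[c]))

def iml_loop(model, stream):
    """One pass of the user-model loop: predict, let the user correct,
    retrain at once. Returns how often the user had to intervene."""
    corrections = 0
    for x, true_label in stream:
        if model.counts and model.predict(x) != true_label:
            corrections += 1              # user flags a wrong prediction ...
        model.add_example(x, true_label)  # ... and the model updates instantly
    return corrections

model = CentroidClassifier()
stream = [(0.1, "low"), (0.2, "low"), (0.9, "high"), (0.8, "high"), (0.15, "low")]
iml_loop(model, stream)
print(model.predict(0.95))  # -> "high"
```

Because each update is so cheap, the user never waits on training, which is exactly the constraint on algorithm choice described above.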
\section{Heuristic Evaluation}
Heuristic evaluation is one of the most relied-upon methods for measuring usability aspects of software. In their early empirical review of the method, \citeauthor{heureval1990} state:
"There are basically four ways to evaluate a user interface: Formally by some analysis technique, automatically by a computerized procedure, empirically by experiments with test users, and heuristically by simply looking at the interface and passing judgement according to ones own opinion."
\subsection{Conducting heuristic evaluation}
\citeauthor{heureval1990} also provide a definition of heuristic evaluation: "Heuristic evaluation is done by looking at an interface and trying to come up with an opinion about what is good and bad about the interface."~\cite{heureval1990} In practice, this can also be done with a specification of a user interface or with screenshots of the interface; a finished prototype is therefore not necessary.
Ideally, the evaluation is conducted against a preferably simple set of usability rules, called heuristics. Nielsen and Molich set up nine basic rules; a usability problem is anything that violates one of them (table \ref{tab:9heur}). Later, Nielsen updated this set to a considerably different list of ten heuristics (table \ref{tab:10heur}).
As a result of this process, a list of problems emerges.
\begin{table}
\caption{Nine basic rules for evaluation}
\label{tab:9heur}
\begin{tabular}{l}
\toprule
Simple and natural dialogue\\
Speak the user’s language\\
Minimize user memory load\\
Be consistent\\
Provide feedback\\
Provide clearly marked exits\\
Provide shortcuts\\
Good error messages\\
Prevent errors\\
\bottomrule
\end{tabular}
\description{\citeauthor{heureval1990} 1990~\cite{impdialog1990}~\cite{heureval1990}}
\end{table}
\begin{table}
\caption{Updated ten other heuristics for evaluation}
\label{tab:10heur}
\begin{tabular}{l}
\toprule
Visibility of system status \\
Match between system and the real world \\
User control and freedom \\
Consistency and standards \\
Error prevention \\
Recognition rather than recall \\
Flexibility and efficiency of use \\
Aesthetic and minimalist design \\
Help users recognize, diagnose, and recover from errors \\
Help and documentation \\
\bottomrule
\end{tabular}
\description{\citeauthor{he1994} 1994~\cite{he1994}}
\end{table}
\subsection{Reasons to use heuristic evaluation}
The method offers a number of benefits.
Firstly, it is organisationally easy to conduct. It does not have a long list of dependencies and can be performed at any point in the development process, without much communication with other teams or processes.~\cite{evalmethods2018}
Secondly, it is inexpensive. It does not require a large number of evaluators or much time, and the evaluators do not even have to be usability experts.~\cite{evalmethods2018}
There are further benefits, such as that the kind of issues discovered is fundamentally different from that of other usability analyses. Heuristic evaluation finds potential errors that could be missed by a more practical analysis.
\subsection{Testing heuristic evaluation}
However, the method is not without problems, and these problems can be investigated by experiments. Although it is not an empirical method itself, it can be empirically evaluated. This was first done by \citeauthor{heureval1990} in 1990\cite{heureval1990}, and subsequently by \citeauthor{conducthe1994} in 1994\cite{conducthe1994}.
In the experiment in \cite{heureval1990}, the conductors, who were usability experts, first identified problems with the software themselves. In the next step, the regular evaluation was performed by people who were not experts in this area. Subsequently, the authors aggregated the issues found (or mentioned) in the reports. The evaluators, along with the method itself, could then be analysed.
\subsection{Problems with heuristic evaluation}
\begin{figure*}[h]
\centering
\includegraphics[width=0.8\linewidth]{performance-evaluators}
\caption{\citeauthor{heureval1990}'s 1990 empirical test of heuristic evaluation~\cite{heureval1990}}
\label{fig:perf}
\end{figure*}
First, as this is a very informal and subjective form of evaluation, a single evaluator looking at an interface or a specification does not find many usability problems. In the early empirical test by \citeauthor{heureval1990}, evaluators performed rather poorly: a single person found only one-fifth to half of the problems previously identified by the aggregators. Moreover, different evaluators perform better or worse on certain issue categories.
On the other hand, this can be addressed well by increasing the number of evaluators. Molich and Nielsen also analyse the performance of multiple evaluators, as can be seen in figure~\ref{fig:perf}. They conclude that three to five evaluators are optimal, which is a comparably small number for a usability evaluation method.
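The effect of aggregating evaluators can be illustrated with a simple model (an independence assumption for illustration, not a result from the cited studies): if each evaluator independently finds any given problem with probability p, then k evaluators combined are expected to find a share of 1-(1-p)^k of all problems.

```python
# Illustrative back-of-the-envelope model: expected share of usability
# problems found by k independent evaluators, each finding any given
# problem with probability p. With p around 0.3 -- roughly the
# single-evaluator rate mentioned above -- a handful of evaluators
# already cover most problems.

def aggregate_share(p, k):
    """Expected fraction of problems found by k independent evaluators."""
    return 1 - (1 - p) ** k

for k in (1, 3, 5, 10):
    print(k, round(aggregate_share(0.3, k), 2))
# prints roughly: 1 0.3, 3 0.66, 5 0.83, 10 0.97
```

The sharp diminishing returns beyond roughly five evaluators are consistent with the three-to-five recommendation above.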
Second, heuristic evaluation tends to discover potential rather than actual problems. According to \citeauthor{howtoeval2020}, "problems identified by evaluators can often be false alarms". Then again, this may be inevitable to a degree for a less empirical usability analysis. The problem may be amplified by the "unpredictable nature" of intelligent systems.
Third, the choice of heuristics is important but difficult. To measure how effective a certain set of heuristics is, one would have to conduct a study like \cite{enhanceheuristics1994}, where the original set of heuristics was refined based on empirical data.
\section{Evaluation of interactive machine learning}
\subsection{New heuristics by \citeauthor{imlheur2018}}
There have been adaptations and redevelopments of the original set of heuristics. As with other special software domains such as groupware\cite{groupwareheur2001}, ambient displays, or interactive television, \citeauthor{imlheur2018} note "how IML challenges many of the assumptions of usability that the original heuristics were meant to address"\cite{imlheur2018}. They developed the following ten new heuristics for IML, structured into three categories:
\subsubsection{Model Input}
The most important aspect of IML is giving the user the ability to influence the model, which starts with capturing user input in the first place.
\begin{enumerate}
\item "Enable the user to steer the model"
\item "Enable the user to provide concept improving feedback"
\item "Capture intent rather than input"\cite{imlheur2018}
\end{enumerate}
\subsubsection{User Feedback}
Equally important is the ability to give meaningful feedback to the user. Only then will the user know whether the model has improved at all, and which actions are needed in the first place.
\begin{enumerate}
\setcounter{enumi}{3}
\item "Support user assessment of model quality"
\item "Provide instance-based explanations to the user"
\item "Support rich, natural feedback"\cite{imlheur2018}
\end{enumerate}
\subsubsection{Interaction Design}
This category of guidelines requires designing the interaction so that an effective loop of user-to-program and program-to-user feedback emerges.
\begin{enumerate}
\setcounter{enumi}{6}
\item "Make interactions and constraints explicit"
\item "Promote trial-and-error based model exploration"
\item "Error prevention"
\item "Help and documentation"\cite{imlheur2018}
\end{enumerate}
The authors state, "the specific interpretation and relative importance of these heuristics are likely to vary depending on the level of involvement expected of the user and on the complexity of the required functionality."\cite{imlheur2018}
\subsection{Execution of the recent HE application}
As an IML approach, \citeauthor{imlheur2018} implement two decision-tree algorithms and then examine how well they perform within a heuristic evaluation. They state "it is clear from our evaluations that a classification algorithm must get under the 10-20 second barrier in producing a new classification" and "when creating a new perceptual interface it is not classification time that is the real issue. The important issue is designer time."
Thus, \citeauthor{imlheur2018} can narrow down the list of feasible algorithms, eliminating "slower algorithms such as neural nets or nearest-neighbor."
"We have implemented two algorithms, which employ different division techniques. These two algorithms also represent the two approaches of longer training time with better generalization vs. shorter training time with poorer generalization. The first strategy slightly reduces interactivity and relies more on learning performance. The second relies on speed and interactivity. The two strategies are Center Weighted (CW) and Mean Split (MS)."\cite{imlheur2018}
\begin{figure}
\centering
\includegraphics{cwss-vs-msss}
\caption{Two decision-tree approaches and the number of times a classifier was built during \citeauthor{imlheur2018}'s experiment~\cite{imlheur2018}}
\label{fig:cwss-vs-msss}
\end{figure}
\begin{figure}
\centering
\includegraphics{act}
\caption{Tree structures and average classification (ACT) time in milliseconds~\cite{imlheur2018}}
\label{fig:act}
\end{figure}
Figure \ref{fig:cwss-vs-msss} shows how many times a classifier was built with each approach, and figure \ref{fig:act} shows that Center Weighted was faster in classifying. Although Mean Split is quicker at building a single classifier, users had to rebuild more often, because the approach was less accurate and therefore needed more training data to achieve acceptable results.
\subsection{Evaluation of the evaluation}
It is important to gain insight into the performance of a method. In the early days of heuristic evaluation, experiments measured the effectiveness of the method and of certain sets of heuristics (as stated before).
Unfortunately, neither the method nor the newly developed heuristics seem to have been empirically tested in the field of interactive machine learning. The literature also records no other attempts to adapt heuristic evaluation to IML, so the set of heuristics by \citeauthor{imlheur2018} cannot be compared to similar work. Thus, the significance and meaningfulness of the newly developed heuristics is limited. This field of research appears underdeveloped and needs to be extended.
\section{Conclusion}
Heuristic evaluation appears to be a valid method for many different types of software, due to its basic nature. It does not have to deal with the inner modes of operation and can therefore also be applied to interactive machine learning systems. The adjustments needed are confined to creating a new set of heuristics, which is not unusual for heuristic evaluation.
%%
%% The next two lines define the bibliography style to be used, and
%% the bibliography file.
\bibliographystyle{ACM-Reference-Format}
\bibliography{sample-base}
\end{document}
\endinput