Pushing Towards Embodied Interactions
Ludwig Weschke
Gustav Börman
Bachelor of Software Engineering and Management Thesis
Report No. 2010:031
ISSN: 1651-4769
University of Gothenburg
Department of Applied Information Technology
Gothenburg, Sweden, May 2009
Ludwig Weschke
l.weschke@gmail.com
Gustav Börman
gustav.borman@gmail.com
IT University of Gothenburg, Software Engineering and Management. Gothenburg, Sweden
Abstract– New interaction techniques are emerging faster than ever
before, but are they meaningful? Is there a purpose behind these
novel technologies, or is it the technology itself that is
emphasized? In this paper we take the approach of embodied
interaction and evaluate current interaction modalities in order to
design a next-generation input device that focuses on the user
instead of the technology behind it. We establish the relevance of
embodied interaction, and our results bring insights into the design
considerations that apply when designing novel interaction mediums.
We also discuss the technical considerations around our proposed
interaction technique.
Keywords: Embodied Interaction, HCI, Interaction Devices,
Ubiquitous Computing
1 Introduction
Today the current paradigm of human-computer interaction (HCI)
consists mostly of standard computer interaction techniques such as
keyboards and mouse devices that have been around for the last 20
years (Dourish, 2004). However, recent research proposes novel
approaches that bring the interaction between humans and computers
closer together through more natural and embodied interaction
solutions, which implies a paradigm shift in the way we interact
with the digital environment around us (Jaimes and Sebe, 2007;
Kurkovsky, 2007). Interaction with a system is today no longer seen
as interaction with standard computer systems alone, e.g. desktop
environments, but also as interaction with everyday objects such as
interactive whiteboards (Klemmer, 2005), furniture, spoons, cups
(Loke, 2006) or even digital jewellery (McCarthy et al., 2006).
Due to advances in hardware technologies during the last decades,
fields such as ubiquitous computing, pervasive computing and
context-awareness have recently started to flourish. Computers,
sensors and actuators are constantly shrinking and dropping in
price, making the realization of these areas feasible in practice
(Dourish, 2004; Satyanarayanan et al., 2001). Thanks to the progress
in the hardware market, interaction methods such as haptics, voice,
gaze, wearable devices and gestures can today be implemented using
more or less standard computer peripherals. This plays an important
role in reducing the borders between humans and computers in order
to make interactive systems embodied in the real-world environment.
The question is: what is needed to design new interaction mediums
that are more than technology hype, and that become part of people's
lives as something meaningful they can interact with?
The Wiimote used in the Nintendo Wii™¹ gaming console has proved to
reduce the borders between participants and the games they are
playing, making the system more enjoyable and understandable for a
broader audience (Bowman et al., 2008); however, it is generally not
preferred by hardcore gamers (LaViola Jr, 2008). Alternative
interaction techniques for personal computers have been proposed,
but these solutions are more or less intrusive on the user's
experience, as users e.g. have to be equipped with colored markers
placed on their hands (Fredriksson et al., 2008). Wearable computers
used for augmented reality are often expensive and intrusive, their
focus lies mostly on technological aspects and their scope of use is
often fuzzy (Kolsch et al., 2004; Plouznikoff et al., 2007).
We have chosen to take the approach of embodied interaction
(Dourish, 2001) to design and start the development of a 3D input
device aimed at computer desktop interaction, namely the AiRL3D
Project. We argue for taking this approach into consideration when
designing the next generation of input devices, as it puts the user
rather than the technology in the focus of the design. It combines
tangible and social computing and thus focuses on people's physical
skills as they are, as well as on their social understandings.
“By embodiment, I do not mean simply physical reality, although
that is often one way in which it appears. Embodiment, instead,
denotes a form of participative status. Embodiment is about the
fact that things are embedded in the world, and the ways in which
their reality depends on being embedded.”
- Paul Dourish 2001 p.18
This paper and prototype will make use of a set of design principles
(Dourish, 2001) together with related HCI techniques in order to
incorporate embodied interaction into computer systems. Previous
research will be evaluated to identify what is good, what is bad and
what is missing in this area. We will then compare the proposed
interaction solution with these findings and discuss what needs to
be considered when designing novel interaction mediums in general.
The contributions of this study are thus insights into design
considerations, the technical aspects of the prototype and the state
of research within embodied interaction.
¹ http://www.nintendo.com/wii
The aim is not to build a state-of-the-art 3D-interaction
application but rather to show that it is possible to realize
embodied interaction by using general design principles to develop
a non-intrusive prototype out of low-cost hardware. It is not meant
to be seen as a replacement for current technology, since both the
keyboard and the mouse have their specific purposes, but merely as
an extension of the available interaction space.
The rest of this paper is structured as follows. Section
2 will go through research related to embodied interaction
and its related areas, interaction modalities and technical
aspects. Section 3 will present the research design and
methods used in this paper followed by section 4 which
presents the data collected in the literature review and
from the interviews. Section 5 will analyze these findings
followed by technical details of the prototype in section 6.
We conclude this thesis with our results and propose further
research in section 7.
2 Related Research
Embodied Interaction
The area of particular interest in this paper is embodied
interaction, which is a richer form of interaction with computer
systems. Embodied interaction focuses on how humans create and
communicate meaning with a system in the realms of both physical
and social reality, in contrast to traditional interaction where
mouse devices and keyboards control graphical user interfaces
consisting of windows, icons, menus and pointers (WIMP) (Dourish,
2001). Embodied interaction is a common ground for social and
tangible computing (Dourish, 2001), which in itself is part of the
broader area of context-awareness, characterized by Dey et al.
(2001) as the interaction between humans, applications and the
surrounding environment. Dey et al. also acknowledge the challenge
that the notion of context is ill-defined within the area of HCI.
The research area of context-awareness has evolved and now consists
of several different fields such as pervasive, ubiquitous and mobile
computing. However, embodied interaction spans several of these
areas, combining their foundations to create the common notion of
embodiment. This notion promotes the creation of more natural
interfaces by incorporating social and/or physical skills in order
to create a more meaningful interaction space, allowing participants
to engage with systems in a more natural way (Dourish, 2001). This
is why we chose the perspective of embodied interaction rather than
covering the aspect of context-awareness. Social computing
incorporates sociology into interactive computer systems design,
whereas tangible computing brings real-world objects into
interactive systems, enabling humans to interact using several
senses (Ishii and Ullmer, 1997).
Dourish (2001) presents six design principles to be con-
sidered when designing interactive systems:
1. Computation is a medium
2. Meaning arises on multiple levels
3. Users, not designers, create and communicate meaning
4. Users, not designers, manage coupling
5. Embodied technologies participate in the world they
represent
6. Embodied interaction turns action into meaning
- Paul Dourish 2001 p.163
Dourish argues that these principles should not be used as rules;
rather, they describe the features of embodied interaction and the
design considerations that a designer implementing interactive
systems should keep in mind at both a theoretical and a design
level. We intend to use these principles as a tool to guide our
attention during the design.
Embodied interaction, or embodiment, is a commonly occurring concept
in previously published papers. It is important to note that
embodiment is about how we interact with technology and is not a
technology itself (Dourish, 2001). Thus, as we go on to present
related design considerations and approaches, technologies such as
tangible user interfaces and context-aware applications are related
to embodiment in that they exhibit embodiment to a greater or lesser
degree.
SignalPlay is a prototype system designed to gain an understanding
of how users explore and understand spaces that are occupied by
ubiquitous computing technologies. It builds on the concept of
embodied interaction and focuses on how people act together in space
rather than on interaction itself (Williams et al., 2005). The
authors conclude, first, that people understood their interaction
with ubiquitous computing applications in terms of individual
actions and components rather than in terms of the whole system.
Designers of systems should therefore emphasize individual features,
not as elements of the system but rather as objects that have their
own meaning and interpretation. Second, they conclude that current
design approaches mostly consider the sequential organization of
interaction but fail to address temporal aspects such as pace and
rhythm. Design models should address space as something explicitly
constructed and managed as people come to understand infrastructures
through ubiquitous technologies. The concept of space can mean
multiple things, not only the digital interaction space but also the
physical space. It can mean the space that a two- or
three-dimensional computer desktop occupies, the space through which
sounds travel and how we interpret them, as well as something people
share and interact within. Space comes hand in hand with place:
space is concerned with the physical environment, and place is where
social understandings unfold in that space (Dourish, 2001; Williams
et al., 2005).
Empirical work has compared traditional graphical user interfaces
with tangible user interfaces by studying the effects on users'
spatial cognition when modeling with 3D blocks. It was shown that
tangible user interfaces gave a richer sensory experience, thus
off-loading the users' cognition. In addition, the correlation
between perceptual and immersive gestures facilitated people's
creativity. It was concluded that off-loading cognition and
immersion should be considered when designing novel user interfaces,
and that this should be achieved either through multimodal user
interfaces or through spatial systems with augmented reality (Kim
and Maher, 2008).
Plouznikoff et al. (2007) present a novel interaction approach for
meaningful information manipulation through software processes that
are virtually embodied in a user's environment by means of wearable
computing. Here the design and implementation that bring embodiment
and meaningful interaction to a system are described in a more
direct and technical manner than in the design considerations
mentioned previously. The results showed that these interaction
techniques were feasible, at least in a controlled environment.
Interaction Modalities
There are several areas into which gestures, embodied interaction
and spatially convenient applications have found their way. LaViola
Jr (2008) presents the evolution of gaming and concludes that gaming
consoles such as the Wii™ might not provide the precision and
expressive power that others do, but that they are more appealing to
the casual gamer. It is hard to remember complicated button
sequences if you are not a hardcore gamer, so it is more appealing
for a casual gamer to play more naturally using gestures. He also
explains that 3D interaction is no longer a gimmick, since hardware
has become cheaper and faster, making it more feasible to implement.
Wingrave et al. (2009) discuss the Wiimote's capabilities and show
that spatially convenient hardware is indeed becoming mainstream.
However, as industry, academia and others design applications for
different input hardware, it is going to be hard to truly introduce
such interaction on a large scale, since no common input hardware
platform exists.
Spatially convenient interaction techniques have also found their
way into the automotive industry, where current techniques for
interacting with infotainment systems can distract the driver's
attention. Previous research and surveys have looked at interaction
techniques such as gestures and speech in in-vehicle systems (Alpern
and Minardo, 2003; Bach et al., 2008; Pickering et al., 2007) and
found that using gestures and similar interaction techniques for
controlling secondary in-vehicle tasks is viable, reduces
interaction complexity and has considerable potential for the
future. An overview of potential problems regarding 3D user
interfaces and interaction in cars is given by Tönnis et al. (2006),
who find that keeping interaction simple and providing multimodal
interaction techniques has high potential for in-vehicle systems.
Althoff et al. (2005) present experimental results from a
video-based head and hand gesture system implemented in a BMW
limousine and show that in-vehicle gestures are both effective and
intuitive, since the driver can focus on the primary driving tasks.
Technical
At this point we have established the relevance of an embodied
interaction perspective to related areas and presented a number of
existing interaction modalities related to the prototype we are
developing and assessing in this paper. To develop the prototype we
also need to understand the technical aspects involved, to which we
now turn.
The main technique needed by our prototype is stereo vision, for
which a variety of approaches exist, such as correlation-based
stereo vision (Hirschmüller, 2001; Hirschmüller et al., 2002),
global, semi-global (Hirschmüller, 2006), local and local-global
(Zhao et al., 2006) methods. These techniques have, to varying
degrees, been successfully implemented in real-time or near
real-time systems. Real-time performance is one of the prerequisites
for the design of our implementation, since the interaction medium
needs to respond instantly for users to feel that they are
interacting directly with the system and to keep their attention
focused. Techniques for detecting fingertips and gestures in real
time have been proposed by several authors (Malik, 2003; Manresa
et al., 2000).
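To make the stereo vision requirement concrete, the sketch below
shows the standard triangulation step that such approaches share:
once a point has been matched in the left and right images of a
rectified camera pair, its depth follows from the disparity d as
Z = f * B / d, with focal length f and camera baseline B. The
function name and the camera values are illustrative assumptions
and are not taken from the AiRL3D prototype.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Depth in metres of a point matched at column x_left in the left
    image and x_right in the right image of a rectified stereo pair."""
    disparity = x_left - x_right  # in pixels; a larger value means closer
    if disparity <= 0:
        raise ValueError("a visible point must have positive disparity")
    return focal_px * baseline_m / disparity


# Hypothetical webcam pair: 700 px focal length, cameras 0.10 m apart.
z = depth_from_disparity(x_left=380, x_right=345, focal_px=700, baseline_m=0.10)
```

Because depth is inversely proportional to disparity, a hand close
to the cameras produces large disparities and is resolved more
finely than the distant background.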
3 Research Design
The research in this paper was conducted through an extensive
literature review and lead user methodology (Hippel, 1988) together
with the repertory grid interview technique (Tan and Hunter, 2002),
using an interpretive research approach (Walsham, 2006) to analyze
the data.
Literature Review
When the topic had been identified, an extensive literature review
was conducted to evaluate the current state of research within
embodied interaction, related areas and interaction techniques. Most
of the previous research presented here was gathered from the
journals Human-Computer Interaction (HCI)², Communications of the
ACM (CACM)³, IEEE Transactions on Software Engineering⁴, IEEE
Computer Graphics and Applications⁵ and IEEE Pervasive Computing⁶,
which are seen as some of the leading journals within HCI and the
areas of context-awareness and pervasive and ubiquitous computing,
among others. These sources have been complemented with articles
found through keyword searches in other scientific databases such
as IEEE Xplore⁷ and Science Direct⁸.
Lead User Methodology
As we wanted to investigate whether this novel interaction
technology was feasible, we decided to make use of lead user
methodology. The reason for this, as described by Hippel (1986,
1988), is that lead users can foresee the future of products in the
market. When a new product reaches the market, e.g. a new phone, it
is generally not very different from its predecessor, and "normal"
users can provide input on that sort of product as they are familiar
with it. Novel products, however, might alienate "normal" users;
hence a lead user, or expert, was a natural choice to provide input
to our prototype.
² http://hci-journal.com/
³ http://cacm.acm.org/
⁴ http://www.computer.org/portal/web/tse/
⁵ http://www.computer.org/portal/web/cga/home
⁶ http://www.computer.org/portal/web/pervasive/home
⁷ http://ieeexplore.ieee.org
⁸ http://www.sciencedirect.com/
Hippel defines lead users in two aspects: “1. Lead users
face needs that will be general in a marketplace, but they
face them months or years before the bulk of that market-
place encounters them, and 2. Lead users are positioned to
benefit significantly by obtaining a solution to those needs”
(Hippel, 1988).
We adapted a four-step process described by Hippel (1986, 1988) to
best utilize the lead users. The process consists of the following
steps: 1) identifying an important trend, 2) identifying lead users,
3) analyzing lead user insights and 4) testing product concept
perceptions and preferences. The fourth step was omitted as it was
out of scope for this article. In the first step, we identified the
current state of research within alternative interaction techniques
and interaction trends through the literature review. In the second
step, we identified lead users with advanced technical knowledge and
sufficient insight into the domain of our prototype. The lead users
were then interviewed using the repertory grid technique.
The Repertory Grid Technique
The repertory grid technique is a cognitive mapping technique (Tan
and Hunter, 2002). It evaluates people's views of objects and events
in terms of bipolar personal constructs (Kelly, 1955): e.g. a cup of
coffee may be seen as hot while a glass of soda is seen as cold, and
the construct would then be 'hot-cold'. The repertory grid is
composed of elements, constructs and links. The elements are the
objects within a specific domain that the interviewees evaluate
(coffee, soda), the constructs are the bipolar personal constructs
with which they label the objects (hot-cold), and the links relate
the two by showing how an interviewee interprets an element in terms
of a construct (Tan and Hunter, 2002).
There are several ways to design and execute the repertory grid. The
general steps are 1) element selection, 2) construct elicitation and
3) linking elements to constructs (Tan and Hunter, 2002). Below is a
description of the setup used in this paper, derived from Tan and
Hunter (2002).
1. Element selection. In this step we selected a number of
elements related to the prototype proposed in this paper.
2. Construct elicitation. Here we chose to use the triadic
sort method, in which three elements are selected from
the whole set of elements. The interviewee was then
asked to point out the one of the three elements that
differed in some aspect from the other two, and then to
put bipolar personal construct labels on them. This
step was repeated with different triads until the required
number of constructs had been obtained from the interviewee.
3. Linking the elements to constructs. In the final step
we let the interviewee rate each element on a scale from
one to five on each of the constructs obtained.
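The three steps above can be sketched as a small data structure. The
element names are those listed in Table 1 and the construct label is
one reported in the results; the ratings are made-up placeholders
rather than interview data, and the helper names all_triads and
check_grid are hypothetical.

```python
from itertools import combinations

# Step 1: the elements, as listed in Table 1 of this paper.
ELEMENTS = ["BMW Gestures", "Project Natal", "Nintendo Wii", "AiRL3D",
            "iPad", "Mouse Devices", "HandVu"]


def all_triads(elements):
    """Step 2: every possible triad; an interviewer shows only a few."""
    return list(combinations(elements, 3))


def check_grid(grid, elements, low=1, high=5):
    """Step 3: each construct needs one rating per element, within scale."""
    for construct, ratings in grid.items():
        if len(ratings) != len(elements):
            raise ValueError(f"{construct}: expected one rating per element")
        if any(not low <= r <= high for r in ratings):
            raise ValueError(f"{construct}: ratings must be {low}-{high}")
    return True


triads = all_triads(ELEMENTS)

# One row per elicited construct, one column per element.
# These ratings are placeholders for illustration only.
grid = {"Mechanical - Natural": [4, 5, 3, 5, 3, 1, 4]}
grid_ok = check_grid(grid, ELEMENTS)
```

Seven elements yield 35 possible triads, which is why only a
selection of them is presented to each interviewee.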
When the steps above had been performed with all interviewees, the
data was analyzed using the FOCUS cluster analysis algorithm (Shaw
and Thomas, 1978). For this we used the tool Web Grid 5⁹, into which
the data was entered and which performed the cluster analysis
automatically.
Interpretative Research
The data analyzed consists of the findings from the literature
review, the repertory grid results, the proposed prototype and
related interaction devices. From this, themes have been identified
and interpreted on a continuous basis around the design principles
proposed by Dourish (2001). The interaction techniques and proposed
solutions presented in the current research are also evaluated based
on attributes such as degrees of freedom (DOF), intrusiveness,
naturalness, accuracy, mobility and feedback.
4 Reflections of Data Collected
On Interaction Devices
Several approaches have been invented to extend the interaction
space beyond pure mouse and keyboard or touchscreen interfaces, such
as voice recognition, gaze detection, facial recognition, motion
detection and tangible interfaces. Of these modalities, the most
commonly known technique is interaction via hand gestures or facial
recognition, which is done by processing the images captured by
input devices such as webcams or infrared cameras. Haptic input
devices such as force feedback game controllers, gloves and pens
have also been proposed to lower the border between the user and the
system by giving feedback from the virtual environment back to the
user. Some less known devices have also been proposed, such as
Blowable User Interfaces (Patel and Abowd, 2007) or a device for
improving human-chicken interaction through the internet (Lee
et al., 2006).
Frohlich et al. (2006) present an overview of their solutions for
interaction in three-dimensional space, which extend the standard
two-dimensional input devices such as the mouse. The GlobeFish is a
6-DOF input device consisting of an elastically suspended trackball,
allowing rotation around all three axes as well as spatial movement
by applying force to the trackball. The same authors proposed a
hand-held device for spatial 3D interaction in virtual reality
environments called Two-4-Six, consisting of a touchpad for
rotation, an elastic ring for viewpoint motion and an analog rocker
sensor, controlled by the user's index finger, for depth movements
(Bowman et al., 2008; Frohlich et al., 2006).
Holman et al. (2005) introduce a system called PaperWindows for
bridging the gap between the digital environment and the real world
when working with documents. It is based on paper enhanced with IR
markers: the document is projected onto the paper and is easily
changed by rubbing the paper against an LCD display. Interaction
with the interactive paper is done through gestures captured by a
motion detection system based on a marker placed on the user's index
finger. The technique allows documents to be handled in a more
natural way, combining the properties of both worlds.
⁹ http://gigi.cpsc.ucalgary.ca:2000/
Plouznikoff et al. (2007) introduce a prototype in which meaningful
real-world actions can take place in a mobile setting with the help
of a wearable computer consisting of a head-mounted display, a
webcam and a vest containing the computational hardware. Their idea
is that when, for example, you are walking on the street and an
e-mail arrives or the weather report is given, an avatar is
displayed in the head-mounted display in front of you. These events
can then be handled by picking up the avatar (accepting the event)
or brushing it away (rejecting the event). Other examples were also
given, such as stomping on the avatar (rejecting the event) and
kicking it away (delaying the event). Several other researchers have
created prototypes for augmented reality (AR) using wearable
computation devices, such as vests as in the previous example or a
backpack containing the computational platform (Avery et al., 2005;
Piekarski et al., 2004; Smith et al., 2005), among others.
Games like Guitar Hero¹⁰ and Rock Band¹¹, as well as flight
simulators and racing games, allow the user to engage in the game in
a more realistic way by offering custom-made controllers shaped like
guitars, drums, basses, flight controls and steering wheels. Even
though these controllers are a success, they are quite expensive,
especially considering their narrow area of application: such
controllers tend to be tailor-made for one specific game. They take
gaming to a more realistic level; however, they are still only
adaptations of the original controllers, interacting in two
dimensions mapped to a three-dimensional world, which causes them to
score low on the DOF scale.
In 2006, Nintendo released its gaming console Wii™, introducing a
brand new way of interacting with the gaming environment. Instead of
using a static input modality such as a standard hand controller,
mouse or keyboard, Nintendo extended the controller (the Wiimote) to
take advantage of the user's hand movements, taking a first step
towards incorporating the users themselves into the actions of the
game. Research on the new modality has shown that it creates better
understanding and involvement than previous ways of interacting with
games, making the console a success with a wider audience (Bowman
et al., 2008; LaViola Jr, 2008). LaViola Jr (2008) concludes that
even though the Wiimote allows for coarse spatial 3D interaction,
few of the available games take advantage of this, because
developers lack knowledge of how to exploit the opportunity and the
extra freedom it offers is usually treated as an afterthought.
LaViola Jr (2008) also points out that the Wiimote on its own does
not meet the requirements of a 6-DOF input device; however, its
coarse precision may be improved by the additional MotionPlus¹²
accessory for the controller.
Sony's upcoming motion controller Move¹³ resembles the Wii™
interaction method, combining advanced motion sensors and visual
input via the PlayStation Eye™¹⁴ camera and promising to “Take core
gaming to a new level or bring your whole family in to the
adventure..”¹⁵. Microsoft's upcoming Project Natal™¹⁶ for the Xbox
360™¹⁷ console offers a similar approach to embodiment, replacing
the physical controller with tracking of the user's own body
movements. At the time of writing, information on Microsoft's
Project Natal™ and Sony's Move™ is sparse and mostly speculative;
however, one can see a trend of growing emphasis on embodiment
within this area.
¹⁰ http://guitarhero.com/
¹¹ http://www.rockband.com/
¹² http://www.nintendo.com/wii/console/accessories/wiimotionplus
¹³ http://us.playstation.com/ps3/playstation-move/
Image Processing Techniques
Since our idea for the prototype was to create an input device based
on natural hand interaction, extensive research was done in the area
of computer vision and gesture-based interaction via webcams and/or
infrared sensors. Manresa et al. (2005) determine the location of
the user's hands by skin segmentation, while others, such as Shan
et al. (2007), use particle-based tracking to determine areas of
interest. Other widely used techniques for hand recognition include
contour tracking, as used by Bowden et al. (2004), background
subtraction, used by e.g. Coogan et al. (2006) and von Hardenberg
and Bérard (2001), and colored markers that identify fingertips or
other features based on histogram analysis (Fredriksson et al.,
2008; Joslin et al., 2005; Keskin et al., 2003). Ogawara et al.
(2003) present a device for tracking hands using infrared cameras;
however, we dismissed this approach because it requires either
specialized equipment or hardware modification of a standard webcam
to capture the infrared spectrum. Agarwal et al. (2007) present a
solution that makes any surface interactive, like a tabletop, by
mounting two cameras above the surface and using stereo vision to
determine the location of the hand as well as its distance from the
surface onto which the image is projected. Schlattmann et al. (2007)
use similar solutions for detecting posture and/or hand gestures
with stereo vision. However, most of the techniques mentioned above
track gestures and/or the 2D movement of only one point, and all of
them have their own benefits and drawbacks.
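As an illustration of the simplest of these techniques, the sketch
below applies background subtraction to a tiny synthetic grayscale
frame and tracks a single 2D point, the centroid of the foreground.
It is a minimal example under assumed values (a fixed threshold of
30 and images stored as nested lists) and does not reproduce the
method of any of the cited authors.

```python
def subtract_background(frame, background, threshold=30):
    """Binary mask: 1 where a pixel differs clearly from the background."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def centroid(mask):
    """Mean (row, column) of the foreground pixels, or None if empty."""
    points = [(r, c) for r, row in enumerate(mask)
              for c, value in enumerate(row) if value]
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)


# Synthetic 4x5 scene: uniform background with a bright 2x2 "hand".
background = [[10] * 5 for _ in range(4)]
frame = [row[:] for row in background]
frame[1][2] = frame[1][3] = frame[2][2] = frame[2][3] = 200

mask = subtract_background(frame, background)
hand_position = centroid(mask)
```

Tracking only the centroid is exactly the one-point, 2D limitation
noted above: the mask says where the hand is in the image but
nothing about its depth or the position of individual fingers.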
There are several approaches to detecting hand properties such as
fingertips, palm and wrist once the hand has been segmented from the
rest of the scene. Principal component analysis (PCA) is one of the
common techniques; it compares the input segment with a predefined
set of reference images in the space spanned by their principal
components (eigenvectors). Hidden Markov models (HMMs), or
adaptations of the method, are another common approach to detecting
hand properties and gestures.
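The PCA-based comparison can be sketched as follows. The example
finds only the first principal component, using power iteration, and
matches an input to the nearest reference along that component. The
four-pixel "images", the posture names and the function names are
hypothetical and chosen purely for illustration; a practical system
would use many reference images and several components.

```python
def first_component(rows, iterations=100):
    """Mean and leading principal component, found by power iteration."""
    n = len(rows)
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    v = [1.0] * len(mean)
    for _ in range(iterations):
        # Multiply v by X^T X without forming the covariance matrix.
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(len(mean))]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    return mean, v


def project(vec, mean, component):
    """Coordinate of vec along the principal component."""
    return sum((x - m) * w for x, m, w in zip(vec, mean, component))


def nearest_reference(vec, references, mean, component):
    """Name of the reference whose projection is closest to that of vec."""
    target = project(vec, mean, component)
    return min(references,
               key=lambda name: abs(project(references[name], mean, component)
                                    - target))


# Hypothetical flattened reference images (4 pixels each).
references = {"open_hand": [9, 8, 9, 8],
              "fist": [1, 2, 1, 2],
              "point": [9, 2, 1, 2]}
mean, component = first_component(list(references.values()))
label = nearest_reference([8, 7, 9, 9], references, mean, component)
```

The comparison takes place in the reduced space, so a noisy input is
matched on the variation that separates the reference postures
rather than pixel by pixel.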
Repertory Grid Results
As described in the Research Design section, lead user interviews
were conducted using the repertory grid technique. Four lead users
were identified who had the required technical and domain knowledge.
The area from which the elements were selected was interaction
technologies, ranging from traditional devices to unreleased devices
and including our own prototype. The chosen elements are listed in
Table 1. The participants had good knowledge of most of these
elements beforehand, and where they lacked sufficient knowledge,
information was given before the interviews started. From these
elements, five triad combinations were selected and shown to the
participants as a basis for the construct elicitation. The
participants were then given the task of rating the elements on
their own constructs on a scale from one to five.
¹⁴ http://us.playstation.com/ps3/accessories/scph-98047.html
¹⁵ http://us.playstation.com/ps3/playstation-move/
¹⁶ http://www.xbox.com/en-us/live/projectnatal/
¹⁷ http://www.xbox.com/en-US/
Elements           Description
BMW Gestures       In-vehicle system for controlling infotainment
                   systems with predefined hand- and head gestures
                   (Althoff et al., 2005).
Project Natal™¹⁸   Controller-Free gaming for the Xbox 360™ using
                   your own body to control the games.
Nintendo Wii™¹⁹    Gaming console where players control the games
                   with physical gestures in combination with button
                   presses.
AiRL3D²⁰           The proposed prototype in this paper.
iPad²¹             A tablet computer with multi-touch touchscreen.
Mouse Devices      Standard mouse devices.
HandVu²²           Wearable computer hand-tracking for mouse-based
                   interaction or augmented reality interaction.
Table 1: Elements used in the repertory grid.
After the data had been collected from all lead users, it was
entered into Web Grid 5²³, where the FOCUS cluster algorithm
generated an initial, unmodified view of the repertory grid data,
illustrated in Figure 1. The figure shows that some of the
constructs resemble each other and that in some cases they are
almost identical. The majority of the constructs address the extent
to which you depend on something physical to interact with, e.g.
'Tied - Independent', 'Natural - Mechanical' and 'Restricted -
Free'. These are followed by constructs that indicate how flexibly
you can interact with these devices or interaction technologies,
e.g. 'Allround - Predefined', 'Freestyle - Predefined' and
'Invariable - Versatile'.
The next step was to look at the FOCUS view more closely
in order to narrow it down. First, constructs that were related
by less than 88% were sorted out, followed by grouping of the
constructs that resembled each other. This was done in order
to bring out the most outstanding differences and to obtain a
comprehensible set of unique constructs to interpret. This
generated another FOCUS view, shown in Figure 2.
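To make the 88% cut-off more concrete, the following is a minimal sketch of how a match score between two constructs could be computed. It assumes the score is the percentage complement of the summed rating differences, normalised by the largest possible difference on the one-to-five scale, and that a construct may also be compared with its poles reversed. This is a common formulation in repertory grid analysis; the exact formula used by Web Grid 5 is not stated in this paper and may differ. The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define RATING_MIN 1
#define RATING_MAX 5

/* Percentage match between two constructs rated over n elements.
 * Assumed formula: 100 * (1 - sum|a_i - b_i| / ((max - min) * n)). */
double construct_match(const int *a, const int *b, int n)
{
    assert(n > 0);
    int diff = 0;
    for (int i = 0; i < n; i++)
        diff += abs(a[i] - b[i]);
    return 100.0 * (1.0 - (double)diff / ((RATING_MAX - RATING_MIN) * n));
}

/* Same score, but also compares against b with its poles reversed
 * (rating r becomes max + min - r) and keeps the better match. */
double construct_match_best(const int *a, const int *b, int n)
{
    assert(n > 0);
    int diff = 0;
    for (int i = 0; i < n; i++)
        diff += abs(a[i] - (RATING_MAX + RATING_MIN - b[i]));
    double reversed = 100.0 * (1.0 - (double)diff / ((RATING_MAX - RATING_MIN) * n));
    double direct = construct_match(a, b, n);
    return direct > reversed ? direct : reversed;
}
```

Under these assumptions, a pair of constructs scoring below 88.0 would be treated as too weakly related to be grouped, while near-identical constructs (scores close to 100) are candidates for merging.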
18 http://www.xbox.com/en-us/live/projectnatal/
19 http://www.nintendo.com/wii/console
20 http://sourceforge.net/projects/airl3d/
21 http://www.apple.com/ipad/
22 http://www.movesinstitute.org/ kolsch/HandVu/HandVu.html
23 http://gigi.cpsc.ucalgary.ca:2000/
Figure 1: FOCUS, Domain: Interaction Technologies, 20 Constructs, 7 Elements
Figure 2: FOCUS, Domain: Interaction Technologies, 6 Constructs, 7 Elements
Notes were also taken during the interviews, as the participants
discussed why they wanted specific constructs, in order to clarify
what these constructs mean, since they can be perceived differently
depending on the context. Their reasoning is briefly described
below and will be looked at more closely in the analysis section.
User Friendly - Unfamiliar: Novel interaction techniques
and less explored areas of use were predicted to be seen as
alienating and unfamiliar, as they can be perceived as
complicated or overambitious. Interaction devices like the mouse
were seen as more user friendly, as they are, and have long been,
broadly established interaction techniques.
Single Purpose - Utilizable: This construct describes to
what extent these interaction techniques could be utilized in
different areas and also within their currently intended area of
use.
Entertaining - Mundane: Entertaining was here seen as
fun and engaging, whereas mundane was seen as standard
or traditional interaction.
Mechanical, Restricted - Natural, Free: This mainly
concerns how much you need to be, or are, aware of the physical
presence of the hardware you are interacting with. However,
it is also related to the construct ’Versatile, Allround -
Predefined, Invariable’, as devices using predefined gestures
that have to be learnt were seen as mechanical behaviour.
Direct - Concealed: This construct was selected based on
how “hands-on” your experience would be. While e.g. a
mouse device was seen as concealed, as it acts as a tool between
you and your computer, other interaction techniques
were seen as giving more direct control of your interactions.
Versatile, Allround - Predefined, Invariable: This construct
tells us how flexibly you can interact, e.g. allround
and versatile indicate that there are several plausible
ways in which artifacts can be acted upon as they unfold in
real-time. The interaction technique was seen as predefined
or invariable if there were limited ways to interact or if you
have to configure the ways in which you interact.
5 Analysis
In this section we will discuss design considerations for the
proposed prototype in relation to other interaction techniques,
together with the repertory grid constructs, using
Dourish's (2001) six design principles.
Computation Is A Medium
When designing interactive systems, Dourish argues that
one should not focus on the technology's capabilities but
rather on how they are embedded in a set of practices. “Meaning
is conveyed not simply through digital encodings, but
through the way that computation enlivens those encodings
with semantic and effective power” (Dourish, 2001). He
holds that computer systems should be seen as augmentations
and amplifications of our own activities.
The design of the prototype strove for a more realistic
interaction experience, but it is also realistic in the sense that
it should not be considered a replacement for current
interaction devices, as they still serve their purpose. So how
do you put the focus on practice instead of technology? And
what does practice really mean? As an example, in their paper
on a wearable computer, Plouznikoff et al. (2007) realize that
the natural actions they propose could be socially unacceptable
and that a more applicable area of use could be in-vehicle
interaction, to minimize distracting events. But should it then
be necessary to be equipped with a wearable computer inside a
car? Could an augmented reality windshield display (Kim and
Dey, 2009) not serve that purpose better, since a head mounted
display could be considered intrusive and superfluous by a
user? Even if it would be highly applicable in an area, is it
really understood by a broader audience that kicking away an
avatar delays e.g. a notification message? What we want to
illustrate here is that it is not enough to have a technologically
advanced device which people can interact with naturally; when
designing an interactive system one also needs to consider the
area and context in which it is supposed to be used.
This can be related to the construct ’User Friendly - Unfamiliar’,
which seems to be rooted in the expectations a user
has of an interaction device. The clearer the purpose
and meaning are, the more user friendly and welcoming the
interaction device or technique is. To enable people to see this,
it is important to steer the focus of the design early
on by thinking about what practice really means; it is not
enough to look at practice or natural interaction as only e.g. a
hand movement by itself. The AiRL3D project was rated right
in the middle of this construct, which was not at all surprising
considering that it is a novel interaction technique. Nevertheless,
it was seen as more user friendly than the wearable computer
(HandVu) and in-vehicle gestures (Althoff et al., 2005), which
we see as promising.
Meaning Arises On Multiple Levels
With this principle Dourish means that representations of
artifacts in interactive systems can themselves be acted upon,
and that the user understands that this is possible. As an
example he describes the traditional WIMP desktop approach,
where icons represent files but where the embodiment of the
interaction comes into play when the icon can be treated as
an artifact as well. The new meanings that arise with the
AiRL3D project can be illustrated by a comparison with a normal
mouse device, the two devices being the artifacts in this case.
With a normal mouse, users can move the cursor, navigate through
a system, move things around, resize windows, etc. It
has been around for a long time and therefore feels natural
to most people, as recognized by Dourish, but it is still
just a cursor that we move around. Guided by the principles
of Dourish and the analysis of different input devices for 3D
space, we came to the conclusion that the design of our prototype
should be non-intrusive, spatial and free, and should allow for a
high DOF. The design should be able to simultaneously track
multiple fingertips and the posture of the hand without the use
of intrusive markers. The idea of the prototype is not to
restrict the user to predefined gestures but instead to let the
user interact freely with the environment, much like a multi-touch
system. At its simplest it acts as a single mouse cursor, while at
the same time it allows the interaction to be extended to multiple
fingertips, making it a more iconic interaction than the
graphical cursor.
This meaning on multiple levels leads to the construct ’Versatile,
Allround - Predefined, Invariable’, where the proposed
prototype together with HandVu and Project Natal were seen
as the most allround and versatile devices. Judging by the
ratings of all devices, this construct might seem to indicate what
degree of spatial freedom you have. However, this was not the case:
BMW gestures was seen as predefined and invariable although
it is more or less spatially free. What can also be seen
in Figure 2 is that the ratings in this construct are similar to
the ratings in the construct ’Mechanical, Restricted - Natural,
Free’, although they address slightly different aspects. This
observation is important, as it shows that even if you do not
have to be aware of the physical hardware, a device can still be
perceived as more or less predefined or invariable.
It is also interesting to see in the ratings that HandVu
was seen as versatile and allround, while it was placed
somewhere in between on the construct ’Mechanical, Restricted
- Natural, Free’, unlike the other elements, which got fairly
similar ratings on these two constructs. Since the HandVu wearable
computer did not have a clear area of use, the participants'
reasoning behind seeing it as allround and versatile was that it
had the potential of being used in many areas because of its
mobility.
Users, not designers, create and
communicate meaning; Users, not
designers manage coupling
Traditionally, a designer's responsibilities concern how an
artifact looks and functions together with other artifacts,
as well as how users will use it. However, artifacts are not
always used as intended by the designer: “Meaning, and its
coupling to the features and representations the system offers,
emerge from actions of users, not designers” (Dourish,
2001). This is the reason we do not want to implement e.g.
pure gestures but instead allow for several plausible actions
to be made upon artifacts in a desktop environment. As identified
by Suchman (1987), plans can neither fully determine
nor reconstruct an action in a specific situation, which means
that we need to consider how people make use of what they are
given in specific situations. It is not only the gestures per se
we argue against; rather, we argue that, in order to support
embodied interaction, you should not try to anticipate and
design for every possible use case scenario that might arise
while using a system. We do not see this as a flaw, but rather as
a feature of the system that enables users to create the meaning
they are striving to achieve.
Looking at the current state of HCI, one can see a clear
trend of using statically defined hand gestures as an alternative
interaction technique where standard input devices
might not be applicable, but these might limit the user's ability
to explore the medium and create their own meaning, as proposed
by Dourish. Even though statically defined gestures might be
applicable in certain areas, such as within the automotive
industry where the visual feedback needs to be kept to a minimum
(Althoff et al., 2005), in other areas where visual feedback
is high we argue for a more flexible approach that allows for the
exploration and creation of meaning within the system.
Embodied Technologies participate in the
world they represent
The embodied perspective means that mind and body,
representation and object are not separate entities but that they
exist in the same world, “..embodiment does not denote
physical reality, but participative status.” (Dourish, 2001). As
an example, a number of years ago, before the internet and
advanced search engines were available to the broader
community, if you wanted to find information about a certain
topic it felt natural to look for the answer in e.g. an
encyclopaedia. Today, however, it is just as natural, or even
more so, to go to a computer to find the information you want;
the computer has become participative in the world we live in.
The construct ’Direct - Concealed’ points out that both our
proposed prototype and Project Natal allow the user to become more
participative and engaged in the virtual environment, due to
the direct interaction via your own actions instead of being
conveyed through a combination of e.g. mechanical button
presses. The intention with our prototype is that it should become
just as participative as a mouse or a keyboard, in the
sense of being seen as something established, but with a
higher sense of freedom, allowing users to adapt the
interaction to their own meaning instead of adapting themselves
to the interaction medium.
As mentioned earlier, we strove to design for a realistic,
life-like interaction, but one should also be careful not to overdo
it, as embodiment implies participative status, not physical
reality. The technique envisioned in PaperWindows (Holman
et al., 2005) is a good example that illustrates that there are
benefits in both the physical and the digital world and that there
are advantages in combining the two. They take the physical
properties of paper and let it be archived and queried
as efficiently as the documents you have on your computer.
Even though their envisioned technique is not quite there yet,
due to the cost of for instance OLED displays, their vision
of implementing their proposed interaction technique on
high-resolution paper-thin displays seems promising, and the
results from their experiments have received positive feedback from
the users. The reason why we see their prototype as promising
is that once the required hardware is available on a consumer-wide
market, it can provide the freedom and flexibility to support
the guidelines for embodied interaction, where users
can create their own meaning of the device.
Even if the user should be the center of attention when
designing interaction mediums, the technological properties
should not be totally neglected, as they offer advantages
that the physical world does not have.
Embodied Interaction Turns Action Into
Meaning
It all boils down to this final principle, which is arguably
the main principle for supporting embodied interaction:
“Embodied interaction turns action into meaning as part of
a larger system. Meaning, after all, does not reside in the
system itself, but in the ways in which it is used.” (Dourish,
2001). The question is, does the AiRL3D project support
embodied interaction?
Up to this point we have looked at several different interaction
devices, and lessons have been learnt from all of them.
The Nintendo Wii™ gaming console offers a higher degree
of spatial interaction by tracking the user's controller via IR
sensors, aided by motion sensors within the controller itself. As
mentioned earlier, the system has been a huge success with
a wide audience without emphasizing the graphical experience
in comparison to other systems, since the focus has been
put on an engaging interaction technique instead. However,
the precision is very coarse and does not really allow for real
interaction in three dimensional space due to the low DOF.
Input devices such as the GlobeFish and Two-4-Six rank
high on the DOF scale and allow for interaction in three
dimensional space suitable for virtual reality environments and
CAD applications, but they still suffer from the same drawbacks
as the previous controllers. They all require the user
to use physical controllers and they only allow control of a
single interaction point at a time, which prevents the user from
freely exploring the interaction space.
HandVu is also available for computer desktop interaction
without the intrusive wearable computer equipment and can
roughly track the hand motion of the user for controlling a
mouse cursor. However, the precision is rather low and the
interaction takes place in a plain two dimensional space, which
prevents real three dimensional interaction. We disregarded most
of the techniques used in the area of augmented reality, even
though some of them aid the merging of virtual and
real world entities and share the same vision of embodiment,
because at their current state most of them are not mature
enough to avoid being intrusive on the user. Even though
embodiment clearly exists within the research field of
augmented reality, the focus tends to lean more towards the
technology behind the sensors and actuators than towards the
aspect of embodiment.
Guided by the principles proposed by Dourish together
with our own findings, we designed an interaction
medium that should be capable of tracking fingers separately,
functioning much like a multi-touch display with an additional
dimension, without being intrusive on the user. The reason
why we chose not to include static gesture recognition in our
proposed design is to allow users to interact freely with
the system, as if it extended right to the tips of their fingers,
without having to learn multiple predefined patterns.
We realize that interaction using only your hands can
cause fatigue, since the user does not have a place to offload
the weight of the arms as when using a keyboard or
mouse, but as previously mentioned our proposed prototype
is not meant to be a replacement, merely an extension of the
interaction space.
The results from the repertory grid technique were useful,
as they highlighted important properties of interaction devices.
The constructs ’Single Purpose - Utilizable’, ’Versatile,
Allround - Predefined, Invariable’, ’Mechanical, Restricted -
Natural, Free’ and ’Direct - Concealed’ are all closely related
to each other, as they more or less address flexibility but on
slightly different levels. In broad terms, ’Single Purpose
- Utilizable’ looks at the purpose and scope of use for which the
interaction technology can be utilized, followed by ’Versatile,
Allround - Predefined, Invariable’, which addresses the
flexibility within the scope of use. The construct ’Mechanical,
Restricted - Natural, Free’ tells us to what extent you are
restricted by or liberated from the presence of the hardware, while
’Direct - Concealed’ addresses the sense of how much your
physical body is in control of the interaction. This illustrates
how much a designer needs to keep in mind when designing
interactive devices; there is a fine line between these constructs.
The proposed prototype was on average seen as
the most flexible within these constructs, followed by Project
Natal and HandVu, which we see as promising results since
these were qualities we aspired to. It should be understood that
these results do not indicate that the prototype is better; this
depends on what you want to achieve with the interaction device.
The prototype was also seen as the most entertaining in
the construct ’Entertaining - Mundane’, together with Project
Natal and iPad. This construct seemed to address how entertaining
the interaction itself is, not the area to which
it is applied. This can be observed in the Nintendo Wii™
case, where it was ranked lower than the iPad and AiRL3D,
although these are meant to be used in areas other than pure
gaming. Therefore a construct such as ’Entertaining - Serious’
or ’Playful - Dull’ or similar could also have been useful
to take into consideration.
Even though all of the devices mentioned earlier have their
specific purpose and are great input devices, few of them
clearly show support for embodied interaction, and they
are designed around the current interaction paradigm, which
restricts the user's freedom to explore the medium.
That is why we propose not only a device derived from
Dourish's principles of embodied interaction together with a
high DOF, but also a shift in the environment the
user interacts within; the pure WIMP paradigm simply does
not allow the kind of embodiment we are striving for.
Based on our findings and the feedback received from the
lead users, we can see indications that the proposed prototype
points towards supporting embodied interaction.
6 Technical Discussion &
The Prototype
Technical Discussion
As mentioned earlier, several different approaches can be
taken in order to detect the points of interest needed in vision
based interaction, and they all have their benefits and
drawbacks.
Skin segmentation is computationally quite cheap; however,
detecting the right segments is not an easy task, since
the approach is based on filtering pixels within the range of
a skin classifier, which can vary widely with ethnicity and
illumination conditions. Background subtraction
depends on a reference image, obtained either statically
before the user enters the frame, which is sensitive to
fluctuations in the background scene, or by adaptive classification,
which tends to be computationally quite expensive (Stauffer
and Grimson, 1999; Zivkovic and van der Heijden, 2006).
The use of color histograms based on markers placed on the
user's hands is relatively easy to implement and computationally
cheap; however, it has the obvious drawback
that intrusive specialized markers need to be placed on the
hand.
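The range-based skin filtering described above can be sketched as follows. This is an illustrative example, not the classifier tested for AiRL3D: it assumes 8-bit RGB input, converts each pixel to chrominance using the full-range ITU-R BT.601 (JPEG-style) coefficients, and applies fixed Cb/Cr bounds of the kind commonly quoted in the skin detection literature. The bounds and the function names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative chrominance bounds (assumed values, not from the paper). */
#define CB_MIN  77.0
#define CB_MAX 127.0
#define CR_MIN 133.0
#define CR_MAX 173.0

/* Returns 1 if the RGB pixel falls inside the assumed skin range, else 0. */
int is_skin_pixel(unsigned char r, unsigned char g, unsigned char b)
{
    /* Full-range BT.601 RGB -> Cb/Cr conversion. */
    double cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b;
    double cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b;
    return cb >= CB_MIN && cb <= CB_MAX && cr >= CR_MIN && cr <= CR_MAX;
}

/* Fills mask (one byte per pixel, 255 = skin, 0 = background) from an
 * interleaved RGB buffer and returns the number of skin pixels found. */
size_t skin_mask(const unsigned char *rgb, size_t n_pixels, unsigned char *mask)
{
    assert(rgb != NULL && mask != NULL);
    size_t count = 0;
    for (size_t i = 0; i < n_pixels; i++) {
        int skin = is_skin_pixel(rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]);
        mask[i] = skin ? 255 : 0;
        count += (size_t)skin;
    }
    return count;
}
```

Because the bounds are fixed, any change in lighting or skin tone that moves the chrominance outside the range breaks the classification, which is the sensitivity noted above.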
Figure 3: Samples of tested segmentation methods.
Figure 4: Samples of obtained disparity maps.
Using haptic devices such as gloves for interaction usually
provides good precision at a high update rate, and such devices
are often able to detect the posture of the hand. But as previously
mentioned, this technique suffers from the same intrusiveness
as the previous approach, since it requires the user to
be equipped with a specialized glove. The stereo vision approach
is computationally expensive, and the images retrieved
from the two cameras must be perfectly aligned before the
depth map based on epipolar geometry, also known as the
disparity map, can be computed. Another
drawback of the stereo vision technique is that it relies on
careful camera calibration, often done by capturing images
of a specialized checkerboard. However, the main benefit
of stereo vision is the disparity map, which is crucial for
obtaining the Z-coordinates in a 3D environment, together with
the possibility of extending the technique from pure hand
detection to a system that is capable of detecting other objects
the user could interact with as well.
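As a rough illustration of how a disparity value yields a Z-coordinate, the sketch below uses the standard relation for a rectified pinhole stereo pair, Z = f * B / d, where f is the focal length in pixels, B the baseline between the cameras and d the disparity in pixels. The parameter names, the principal point (cx, cy) and the Point3 type are assumptions introduced for this example and are not taken from the AiRL3D source.

```c
#include <assert.h>

typedef struct {
    double x, y, z;
} Point3;

/* Depth for a rectified stereo pair: Z = f * B / d.
 * Returns -1.0 when the disparity is not positive (no valid match). */
double depth_from_disparity(double focal_px, double baseline, double disparity_px)
{
    assert(focal_px > 0.0 && baseline > 0.0);
    if (disparity_px <= 0.0)
        return -1.0;
    return focal_px * baseline / disparity_px;
}

/* Back-projects pixel (u, v) with the given disparity to camera coordinates.
 * (cx, cy) is the assumed principal point; units follow the baseline. */
Point3 pixel_to_point(double u, double v, double disparity_px,
                      double focal_px, double baseline, double cx, double cy)
{
    Point3 p = {0.0, 0.0, -1.0};
    double z = depth_from_disparity(focal_px, baseline, disparity_px);
    if (z < 0.0)
        return p;               /* invalid disparity: z stays -1 */
    p.x = (u - cx) * z / focal_px;
    p.y = (v - cy) * z / focal_px;
    p.z = z;
    return p;
}
```

The inverse relation between depth and disparity also shows why calibration matters: a small error in f or B scales every recovered Z-coordinate.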
Since the prototype needs to be robust, in the sense that
it should function outside a controlled lab environment, and
non-intrusive on the user while preserving the freedom
and naturalness of the interaction medium, the stereo vision
approach became our main focus.
The Prototype
Figure 3 shows a few of the segmentation techniques tested
during the design phase of AiRL3D. Figure 3A shows the
segmentation done by filtering based on the sum of absolute
differences, comparing the difference between a pixel in the
current and the reference frame to a threshold value. Figure 3B
is the output of motion detection, obtained by comparing the
difference between the current and the previous frame. Figure 3C
shows one of the skin segmentation algorithms tested; however,
as mentioned before, this was sensitive to illumination and
ethnicity conditions. These segmentation methods were dismissed
due to the unreliable results caused by fluctuations in the
surrounding environment. Figure 4 shows a few samples of the
segmentation obtained by the stereo vision approach, which
proved more robust and also makes it possible to extract the
three dimensional coordinates.
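The thresholded difference used for Figures 3A and 3B can be sketched as a single routine: passing a stored background frame as the second image gives reference-frame segmentation, while passing the previous frame gives simple motion detection. This is a minimal sketch assuming interleaved 8-bit frames and a per-pixel sum of absolute channel differences; the function name, buffer layout and threshold are assumptions, not details taken from the AiRL3D implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Marks pixels whose summed absolute channel difference between the
 * current frame and a comparison frame exceeds the threshold.
 * mask receives 255 for changed pixels and 0 otherwise.
 * Returns the number of changed pixels. */
size_t diff_mask(const unsigned char *cur, const unsigned char *other,
                 size_t n_pixels, int channels, int threshold,
                 unsigned char *mask)
{
    assert(cur != NULL && other != NULL && mask != NULL);
    assert(channels > 0 && threshold >= 0);
    size_t changed = 0;
    for (size_t i = 0; i < n_pixels; i++) {
        int sad = 0;
        for (int c = 0; c < channels; c++) {
            size_t k = i * (size_t)channels + (size_t)c;
            sad += abs((int)cur[k] - (int)other[k]);
        }
        if (sad > threshold) {
            mask[i] = 255;
            changed++;
        } else {
            mask[i] = 0;
        }
    }
    return changed;
}
```

Since every pixel is compared against a fixed frame, a lighting change or a moving background object raises the difference everywhere, which is consistent with the unreliable results reported for these methods.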
At the time of writing, AiRL3D is in an early phase of
implementation. However, acquisition of synchronized frames
with a resolution of 640x480 and generation of the disparity
map have been successfully implemented, running at an
update frequency of ∼26 frames per second without any
optimized code or graphics hardware acceleration on an
Intel Core 2 Duo T7300 2.0 GHz with 2 GB RAM.
Our prototype has been implemented in ANSI C and currently
runs only on a Linux distribution. However, to port the
prototype to other platforms, the only modifications needed are
to the webcam interfaces and to the output host, such as a
multi-touch interface.
Figure 5 shows an overview of the AiRL3D project and a
brief explanation of each step is given in Table 2.
Figure 5: AiRL3D
Step                            Description
Init                            The main initiation procedures for the CCD sensors & RGB to YUV Look-Up-Tables (LUT).
Calibration                     Captures one frame from each input stream and applies image stitching to calculate the X, Y and angular offsets, sets up reference LUT to avoid recalculating the rectification for each acquired frame.
Main Loop                       Spawns two threads for image acquisition if the host system is a multi-core system.
Left / Right Frame Acquisition  Synchronizes threads to capture frames simultaneously.
Disparity Calculation           Waits for frame threads to complete and handles the disparity algorithm proposed by Shafik and Mertsching (2009). If the host system is a multicore system the disparity calculations are divided between the multiple cores to improve performance of the algorithm.
Feature Detection               Handles identification of fingertips, palm and wrist.
Event Handler                   Takes the feature points X-,Y-, and Z-coordinates and sends it to the host system as mouse events.
Table 2: AiRL3D
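To give a feel for what the disparity calculation step computes, the sketch below finds the disparity of a single pixel by plain sum-of-absolute-differences block matching along a scanline. This is a generic textbook technique used here purely for illustration; it is not the algorithm of Shafik and Mertsching (2009) used in the prototype. It assumes rectified 8-bit grayscale frames stored row by row, and the function name and parameters are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Returns the disparity d in [0, max_disp] that minimises the sum of
 * absolute differences between a square window around (x, y) in the left
 * image and the window around (x - d, y) in the right image.
 * Returns -1 if the window does not fit inside the image. */
int sad_disparity(const unsigned char *left, const unsigned char *right,
                  int width, int height, int x, int y,
                  int half_win, int max_disp)
{
    assert(left != NULL && right != NULL);
    assert(width > 0 && height > 0 && half_win >= 0 && max_disp >= 0);
    if (x - half_win < 0 || x + half_win >= width ||
        y - half_win < 0 || y + half_win >= height)
        return -1;

    int best_d = -1;
    long best_cost = LONG_MAX;
    for (int d = 0; d <= max_disp; d++) {
        if (x - d - half_win < 0)
            break;                      /* shifted window leaves the image */
        long cost = 0;
        for (int dy = -half_win; dy <= half_win; dy++) {
            for (int dx = -half_win; dx <= half_win; dx++) {
                int l = left[(y + dy) * width + (x + dx)];
                int r = right[(y + dy) * width + (x + dx - d)];
                cost += abs(l - r);
            }
        }
        if (cost < best_cost) {         /* keep the first minimum on ties */
            best_cost = cost;
            best_d = d;
        }
    }
    return best_d;
}
```

Each pixel is processed independently, so the work can be split by image rows between cores, which matches the per-core division of the disparity calculations described in Table 2.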
7 Conclusion
In this paper we have, with the help of the design principles
proposed by Dourish and the findings we have made when
investigating current interaction techniques, defined the
design for our prototype.
To allow the interaction medium to become participative,
one should not restrict the user to predefined interaction rules
by designing specific actions; one should leave the interaction
space open to allow users to adapt the medium
to their needs instead of adapting their needs to
the medium. The interaction should also be natural, in the
sense of not being intrusive on the user, and should allow the
user to freely explore the medium and form their own meaning.
However, the interaction technique is not enough; the
design should also reflect the context in which it is to
be used.
Solutions such as the Nintendo Wii have been shown to create a
more natural and graspable interaction that is understandable
by a wider audience. Microsoft’s upcoming Project Natal
and PlayStation Move are examples of this as well,
showing that at least the gaming industry is striving
towards an approach that supports a higher degree of embodied
interaction. The same techniques can, however, be applied in
other fields as well.
We used the guidelines proposed by Dourish, started
the implementation of the AiRL3D project and validated
our proposed medium with lead users. Together these
show that the creation of a 3D embodied input device
is technically feasible and that such a device is likely to support
the perception of a meaningful medium for the user. However, to
enable this, our prototype is not enough; one needs to push
further towards changing the current static two dimensional
WIMP paradigm into a more flexible environment.
We cannot yet know for sure whether this will be meaningful for
the user or not; after all, it is the user who decides whether
something is meaningful when they take action upon an artifact.
Nevertheless, what we have tried to show here is that we as
designers have to start thinking outside the box of the old
WIMP paradigm.
Further Research
The interfaces existing today are unfortunately not adapted to
this type of interaction. Not only do the standard interfaces
of today mainly support interaction in two dimensions, but most
of them are also only intended for use with a single interaction
point at a time. To bridge the gap further we propose
further research into how to extend the current WIMP paradigm
and move from a mainly iconic representation towards a more
entity based platform that allows the user to reach a more
embodied state.
References
Agarwal, A., Izadi, S., Chandraker, M., and Blake, A. (2007).
High Precision Multi-touch Sensing on Surfaces using
Overhead Cameras. 5
Alpern, M. and Minardo, K. (2003). Developing a car gesture
interface for use as a secondary task. In CHI ’03: CHI ’03
extended abstracts on Human factors in computing sys-
tems, pages 932–933, New York, NY, USA. ACM. 3
Althoff, F., Lindl, R., Walchshausl, L., and Hoch, S. (2005).
Robust multimodal hand-and head gesture recognition
for controlling automotive infotainment systems. VDI
BERICHTE, 1919:187. 3, 6, 7, 8
Avery, B., Thomas, B., Velikovsky, J., and Piekarski, W.
(2005). Outdoor augmented reality gaming on five dol-
lars a day. In Proceedings of the Sixth Australasian con-
ference on User interface-Volume 40, page 88. Australian
Computer Society, Inc. 5
Bach, K., Jæger, M., Skov, M., and Thomassen, N. (2008).
You Can Touch, but You Can’t Look: Interacting with In-
Vehicle Systems. Proceedings of the Human Factors in
Computing Systems (CHI’08), Florence, Italy, ACM Press,
pages 1139–1148. 3
Bowden, R., Windridge, D., Kadir, T., Zisserman, A., and
Brady, M. (2004). A linguistic feature vector for the visual
interpretation of sign language. Computer Vision-ECCV
2004, pages 390–401. 5
Bowman, D., Coquillart, S., Froehlich, B., Hirose, M., Kita-
mura, Y., Kiyokawa, K., and Stuerzlinger, W. (2008). 3D
User Interfaces: New Directions and Perspectives. IEEE
Computer Graphics and Applications, 28(6):20–36. 1, 4, 5
Coogan, T., Awad, G., Han, J., and Sutherland, A. (2006).
Real time hand gesture recognition including hand seg-
mentation and tracking. Lecture Notes in Computer Sci-
ence, 4291:495. 5
Dey, A. K., Abowd, G. D., and Salber, D. (2001). A conceptual
framework and a toolkit for supporting the rapid prototyp-
ing of context-aware applications. Hum.-Comput. Interact.,
16(2):97–166. 2
Dourish, P. (2001). Where the action is: the foundations of
embodied interaction. The MIT Press, Cambridge, MA. 1,
2, 4, 7, 8
Dourish, P. (2004). What we talk about when we talk about
context. Personal and ubiquitous computing, 8(1):19–30.
1
Fredriksson, J., Ryen, S., and Fjeld, M. (2008). Real-time
3D hand-computer interaction: optimization and complex-
ity reduction. In Proceedings of the 5th Nordic conference
on Human-computer interaction: building bridges, pages
133–141. ACM New York, NY, USA. 1, 5
Frohlich, B., Hochstrate, J., Kulik, A., and Huckauf, A. (2006).
On 3D input devices. IEEE computer graphics and appli-
cations, 26(2):15–19. 4
Hippel, E. V. (1986). Lead users: a source of novel product
concepts. Manage. Sci., 32(7):791–805. 3, 4
Hippel, E. V. (1988). The sources of innovation. Oxford Uni-
versity Press, Inc., 200 Madison Avenue, New York, New
York 10016. 3, 4
Hirschmüller, H. (2001). Improvements in real-time
correlation-based stereo vision. In Proceedings of IEEE
Workshop on Stereo and Multi-Baseline Vision, pages
141–148. Citeseer. 3
Hirschmüller, H. (2006). Stereo Vision in Structured Environ-
ments by Consistent Semi-Global Matching. In Proceed-
ings of the 2006 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition-Volume 2, page
2393. IEEE Computer Society. 3
Hirschmüller, H., Innocent, P., and Garibaldi, J. (2002). Real-
time correlation-based stereo vision with reduced bor-
der errors. International Journal of Computer Vision,
47(1):229–246. 3
Holman, D., Vertegaal, R., Altosaar, M., Troje, N., and Johns,
D. (2005). Paper windows: interaction techniques for digi-
tal paper. In Proceedings of the SIGCHI conference on Hu-
man factors in computing systems, pages 591–599. ACM.
4, 8
Ishii, H. and Ullmer, B. (1997). Tangible bits: towards seam-
less interfaces between people, bits and atoms. In CHI
’97: Proceedings of the SIGCHI conference on Human
factors in computing systems, pages 234–241, New York,
NY, USA. ACM. 2
Jaimes, A. and Sebe, N. (2007). Multimodal human–
computer interaction: A survey. Computer Vision and Im-
age Understanding, 108(1-2):116–134. 1
Joslin, C., El-Sawah, A., Chen, C., and Georganas, N.
(2005). Dynamic gesture recognition. In IEEE Instru-
mentation and Measurement Technology Conference Pro-
ceedings, volume 22, pages 1706–1711. IEEE; 1999. 5
Kelly, G. (1955). The psychology of personal constructs
(Vols. 1 & 2). 4
Keskin, C., Erkan, A., and Akarun, L. (2003). Real time hand
tracking and 3D gesture recognition for interactive inter-
faces using HMM. ICANN/ICONIPP, pages 26–29. 5
Kim, M. and Maher, M. (2008). The Impact of Tangible
User Interfaces on Designers’ Spatial Cognition. Human-
Computer Interaction, 23(2):101–137. 3
Kim, S. and Dey, A. (2009). Simulated augmented real-
ity windshield display as a cognitive mapping aid for el-
der driver navigation. In Proceedings of the 27th interna-
tional conference on Human factors in computing systems,
pages 133–142. ACM. 7
Klemmer, S. (2005). Integrating physical and digital interac-
tions. Computer, pages 111–113. 1
Kolsch, M., Turk, M., Hollerer, T., and Chainey, J. (2004).
Vision-based interfaces for mobility. In Proc. of Intl. Con-
ference on Mobile and Ubiquitous Systems. Citeseer. 1
Kurkovsky, S. (2007). Pervasive computing: Past, present
and future. In Proceedings of ITI 5th International Con-
ference on Information and Communications Technology
(ICICT 2007), pages 16–18. 1
LaViola Jr, J. (2008). Bringing VR and spatial 3D interac-
tion to the masses through video games. IEEE Computer
Graphics and Applications, 28(5):10–15. 1, 3, 5
Lee, S., Cheok, A., James, T., Debra, G., Jie, C., Chuang,
W., and Farbiz, F. (2006). A mobile pet wearable com-
puter and mixed reality system for human–poultry interac-
tion through the internet. Personal and Ubiquitous Com-
puting, 10(5):301–317. 4
Loke, S. (2006). Context-aware artifacts: two development
approaches. IEEE Pervasive Computing, 5(2):48–53. 1
Malik, S. (2003). Real-time Hand Tracking and Finger Track-
ing for Interaction. CSC2503F Project Report. 3
Manresa, C., Varona, J., Mas, R., and Perales, F. (2000).
Real–Time Hand Tracking and Gesture Recognition for
Human-Computer Interaction. Electronic Letters on Com-
puter Vision and Image Analysis. 3
Manresa, C., Varona, J., Mas, R., and Perales, F.
(2005). Hand tracking and gesture recognition for human-
computer interaction. Electronic Letters on Computer Vision
and Image Analysis, 5(3):96–104. 5
McCarthy, J., Wright, P., Wallace, J., and Dearden, A. (2006).
The experience of enchantment in human–computer inter-
action. Personal and Ubiquitous Computing, 10(6):369–
378. 1
Ogawara, K., Hashimoto, K., Takamatsu, J., and Ikeuchi, K.
(2003). Grasp recognition using a 3D articulated model
and infrared images. In IEEE/RSJ Proceedings of Confer-
ence on Intelligent Robots and Systems, volume 2, pages
27–31. Citeseer. 5
Patel, S. N. and Abowd, G. D. (2007). Blui: low-cost lo-
calized blowable user interfaces. In UIST ’07: Proceed-
ings of the 20th annual ACM symposium on User interface
software and technology, pages 217–220, New York, NY,
USA. ACM. 4
Pickering, C., Burnham, K., and Richardson, M. (2007). A
Research Study of Hand Gesture Recognition Technolo-
gies and Applications for Human Vehicle Interaction. In
Automotive Electronics, 2007 3rd Institution of Engineer-
ing and Technology Conference on, pages 1–15. 3
Piekarski, W., Smith, R., and Thomas, B. (2004). Design-
ing backpacks for high fidelity mobile outdoor augmented
reality. In Proceedings of the 3rd IEEE/ACM International
Symposium on Mixed and Augmented Reality, page 281.
IEEE Computer Society. 5
Plouznikoff, N., Plouznikoff, A., Desmarais, M., and Robert,
J. (2007). Gesture-based interactions with virtually em-
bodied wearable computer software processes competing
for user attention. In IEEE International Conference on
Systems, Man and Cybernetics, 2007. ISIC, pages 2533–
2538. 1, 3, 5, 7
Satyanarayanan, M. et al. (2001). Pervasive computing:
Vision and challenges. IEEE Personal Communications,
8(4):10–17. 1
Schlattmann, M., Kahlesz, F., Sarlette, R., and Klein, R.
(2007). Markerless 4 gestures 6 dof real-time visual track-
ing of the human hand with automatic initialization. In
Computer Graphics Forum, volume 26, pages 467–476.
Blackwell Science Ltd, Osney Mead, Oxford, OX2 0EL,
UK. 5
Shafik, M. and Mertsching, B. (2009). Real-Time Scan-Line
Segment Based Stereo Vision for the Estimation of Bio-
logically Motivated Classifier Cells. KI 2009: Advances in
Artificial Intelligence, pages 89–96. 11
Shan, C., Tan, T., and Wei, Y. (2007). Real-time hand track-
ing using a mean shift embedded particle filter. Pattern
Recognition, 40(7):1958–1970. 5
Shaw, M. and Thomas, L. (1978). FOCUS on education–
an interactive computer system for the development and
analysis of repertory grids. International Journal of Man-
Machine Studies, 10(2):139–173. 4
Smith, R., Piekarski, W., and Wigley, G. (2005). Hand track-
ing for low powered mobile AR user interfaces. In Pro-
ceedings of the Sixth Australasian conference on User
interface-Volume 40, page 16. Australian Computer Soci-
ety, Inc. 5
Stauffer, C. and Grimson, W. (1999). Adaptive background
mixture models for real-time tracking. In Computer Vision
and Pattern Recognition, 1999. IEEE Computer Society
Conference on, volume 2. 9
Suchman, L. (1987). Plans and situated actions: The prob-
lem of human-machine communication. Cambridge University
Press. 8
Tan, F. and Hunter, M. (2002). The repertory grid technique:
A method for the study of cognition in information systems.
MIS Quarterly, 26(1):39–57. 3, 4
Tönnis, M., Broy, V., and Klinker, G. (2006). A Survey of
Challenges Related to the Design of 3D User Interfaces for
Car Drivers. In Proceedings of the 1st IEEE Symposium
on 3D User Interfaces (3D UI), pages 127–134. 3
von Hardenberg, C. and Bérard, F. (2001). Bare-hand
human-computer interaction. In Proceedings of the 2001
workshop on Perceptive user interfaces, pages 1–8. ACM.
5
Walsham, G. (2006). Doing interpretive research. European
Journal of Information Systems, 15(3):320–330. 3
Williams, A., Kabisch, E., and Dourish, P. (2005). From in-
teraction to participation: Configuring space through em-
bodied interaction. Lecture Notes in Computer Science,
3660:287–304. 2
Wingrave, C., Williamson, B., Varcholik, P., Rose, J., Miller,
A., Charbonneau, E., Bott, J., and LaViola, J. (2009). Wii
Remote and Beyond: Using Spatially Convenient Devices
for 3DUIs. Computer Graphics and Applications. 3
Zhao, J., Yu, S., and Cai, H. (2006). Local-global stereo
matching algorithm. Aircraft Engineering and Aerospace
Technology: An International Journal, 78(4):289–292. 3
Zivkovic, Z. and van der Heijden, F. (2006). Efficient adaptive
density estimation per image pixel for the task of back-
ground subtraction. Pattern Recognition Letters, 27(7):773–
780. 9