Challenges in mobile multi-device ecosystems

Coordinated multi-display environments, ranging from desktop setups and second-screen applications to gigapixel display walls, are increasingly common. Personal and intimate mobile and wearable devices such as head-mounted displays, smartwatches, smartphones and tablets, however, are rarely part of such multi-device ecosystems. We conducted a literature review and an expert survey to identify challenges in mobile multi-device ecosystems. We present grounded challenges relevant for the design, development and use of mobile multi-device environments, as well as opportunities for future research. While our surveys identified a large number of challenges, there appears to be little agreement among experts on the importance of individual challenges. By presenting the identified challenges, we contribute to a better understanding of the factors that impede the creation and use of mobile multi-device ecosystems, and we hope to help shape the research agenda on interacting with these systems.


Introduction
Figure 1: (…) Middle: interaction between a smartphone and a smartwatch (image courtesy of Chen et al.) [2]. Right: interaction between a head-mounted display and a smartphone [3].
Multi-display environments, from the desktop to gigapixel displays, have emerged as ubiquitous interfaces [4] for knowledge work (e.g., Microsoft Surface Hub for collaboration or Bloomberg systems for financial trading) and complex tasks (e.g., city or factory management). Similarly, social applications such as second-screen TV experiences further extend the proliferation of increasingly complex display ecosystems with devices of different sizes, mobility and reachability. In parallel, we see the emergence of more personal, intimate and body-centric classes of computing in the form of head-mounted displays (HMDs), such as the Oculus Rift or Microsoft HoloLens, and smartwatches, such as Android Wear devices or the Apple Watch, which promise always-on information access around the user's body. Small touch devices (such as smartwatches and smartphones) aim to improve mobility, portability and privacy by simply shrinking the device, but as a result sacrifice display and interaction area. HMDs have the potential to enable rich spatial interaction with information located around the user's body through dexterous and expressive hand motion, but at the cost of interaction accuracy, and, like other wearables, they make it difficult to share information with co-located people. Supporting activities between and across a set of devices around the user's body presents a myriad of challenges, which we aim to address in this paper.
arXiv:1605.07760v1 [cs.HC] 25 May 2016
Mobile multi-device environments promise to overcome the limitations of interacting with individual devices in mobile contexts. These environments consist of multiple interactive and coordinated devices, typically displays, with at least one mobile or wearable component (see Figure 1). The research community has shown growing interest in such environments (e.g., [1,2,3,5]). However, while multi-display interaction is already common in stationary scenarios, cross-device and multi-display interaction that includes mobile or wearable components is still rarely employed outside of the laboratory [6,7]. We argue that in order to integrate such diverse and disparate types of displays into a unifying interaction environment, and to enable interaction with information freely across device boundaries, we need to better understand the specific challenges for the creation and use of such systems. With this article, we aim to contribute an overview of challenges for designing, developing, evaluating and using mobile multi-device environments. We base our findings on a literature review and on the authors' reflections on developing mobile multi-device environments. Furthermore, we complement these findings with the results and analysis of an expert survey.

Challenges in Mobile Multi-Device Environments
The fundamental challenges in mobile multi-device ecosystems reach beyond those of multi-device ecosystems in general [8]. This is due to the larger variety of input and output modalities found on mobile devices compared to desktop systems, to the mobility of each individual component, and to the proximity of those devices to the human body. We aim to uncover relevant challenges that impede the creation and use of such systems.

Methodology
To identify the key challenges, we developed a review protocol for our literature survey to maximise coverage of relevant publications. This section describes the protocol, starting with a keyword search of the ACM Digital Library. The search returned 37 entries for the keyword "multi-display interaction", 94 for "multi-device interaction", 94 for "cross-device interaction", 0 for "cross-surface interaction", 10 for "cross-display interaction", 5 for "multi-fidelity interaction", 30 for "distributed displays", 142 for "distributed user interfaces", 16 for "second-screen interaction" and 6 for "multi-screen interaction", 434 in total prior to de-duplication. To minimise the possibility of excluding relevant research, we then searched relevant proceedings (e.g., ACM CHI, MobileHCI, UIST, AVI, DIS, EICS, ITS/ISS, INTERACT, HCI International, NordiCHI) and journals (e.g., TOCHI) for papers that explicitly identified challenges relevant for mobile scenarios. In addition, starting from this initial set of papers, we extended our search to secondary papers listed in their reference sections or citing them (via Google Scholar). Furthermore, we examined the papers from recent workshops in this domain [9,10,11] more closely, without applying the above search criteria. In total, we considered 140 papers (excluding papers that were identified redundantly through the keyword search), screened by title and abstract. Starting from a broad set of keywords (standard codes), we created classifications through open and axial coding steps [12]. Open coding allowed us to identify new concepts and group them into categories and sub-categories. With axial coding, we used our keywords to identify central ideas, events, usage conditions and strategies, which we grouped into categories and sub-categories.
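The keyword counts above sum to the reported 434 entries before de-duplication. As a minimal sketch, assuming the search results are exported as records with an optional `doi` and a `title` field (a hypothetical format; the paper does not state how de-duplication was performed), the tally and a DOI-or-normalised-title de-duplication could look like this:

```python
# Keyword hit counts as reported for the ACM Digital Library search.
KEYWORD_HITS = {
    "multi-display interaction": 37,
    "multi-device interaction": 94,
    "cross-device interaction": 94,
    "cross-surface interaction": 0,
    "cross-display interaction": 10,
    "multi-fidelity interaction": 5,
    "distributed displays": 30,
    "distributed user interfaces": 142,
    "second-screen interaction": 16,
    "multi-screen interaction": 6,
}


def total_before_dedup(hits: dict) -> int:
    """Sum per-keyword hits; a paper matching several keywords is counted repeatedly."""
    return sum(hits.values())


def deduplicate(records: list) -> list:
    """Keep the first record per DOI. Records without a DOI fall back to a
    case- and whitespace-normalised title (hypothetical record format)."""
    seen = set()
    unique = []
    for rec in records:
        key = rec.get("doi") or " ".join(rec["title"].lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


print(total_before_dedup(KEYWORD_HITS))  # 434
```

Keying on the DOI first and the normalised title second is only one plausible choice; records from different digital libraries often lack one of the two fields.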
Merging the classifications from open and axial coding produced the categories and sub-categories we detail in the following sections. In addition to the literature survey, we reflected on our own experiences in researching multi-device environments [3,8,13]. In the subsequent sections, we group challenges into four top-level categories: design, technological/development, social, and perceptual/physiological challenges. However, we are aware that some challenges can be associated with multiple categories (e.g., device binding can be seen from a technical development or a user-centred design point of view). Aspects of challenges associated with multiple top-level categories are discussed in the relevant sections. The results of the individual sections are summarized in Figure 2. To be associated with a category, a paper had to explicitly mention relevant aspects of that category.

Design Challenges
There are a number of design challenges for realizing mobile multi-device ecosystems for single-user and co-located interaction. For single-user interaction, these challenges include varying device characteristics, fidelity, spatial reference frame, foreground-background interaction, visibility and tangibility. For co-located interaction, we additionally identified micro-mobility, F-formations and space syntax. Several design factors that are potentially relevant for mobile multi-device interaction have been identified in previous work. In total, 26 papers fall into this category, which we divided into nine sub-categories.
Parameterization, i.e., the characteristics of individual devices such as ID, pose, data context and (prior) selection on the phone or smartwatch, has been explored by Schmidt et al. [14] and Houben et al. [5] to describe how interaction on a large interactive surface could be supported. Similarly, Grubert et al. used the term fidelity to describe the quality of the output and input characteristics of devices in a mobile multi-device system, such as resolution, colour contrast, and fixed vs. variable focus distance [3].
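To make the notion of fidelity more concrete, the sketch below models a few of the characteristics named above (resolution, contrast, fixed vs. variable focus distance, input modalities) and reports per-dimension differences between two devices, i.e., a crude fidelity gap. The field names, the ratio and overlap measures, and the example values are hypothetical choices for illustration, not specifications of real devices or a metric from the cited work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeviceFidelity:
    name: str
    ppi: float          # output resolution as pixel density (pixels per inch)
    contrast: float     # contrast ratio, e.g. 1000.0 for 1000:1
    fixed_focus: bool   # True for a fixed virtual focal plane (e.g. optical see-through HMD)
    inputs: frozenset   # input modalities, e.g. frozenset({"touch"})


def _ratio(x: float, y: float) -> float:
    """Symmetric ratio >= 1.0; 1.0 means the two values are identical."""
    return max(x, y) / min(x, y)


def fidelity_gap(a: DeviceFidelity, b: DeviceFidelity) -> dict:
    """Per-dimension gap between two devices (hypothetical measures)."""
    union = a.inputs | b.inputs
    return {
        "resolution_ratio": _ratio(a.ppi, b.ppi),
        "contrast_ratio": _ratio(a.contrast, b.contrast),
        "focus_mismatch": a.fixed_focus != b.fixed_focus,
        # Jaccard overlap of input modalities: 1.0 = same set, 0.0 = disjoint sets.
        "input_overlap": len(a.inputs & b.inputs) / len(union) if union else 1.0,
    }


# Illustrative values only, not measured device specifications.
watch = DeviceFidelity("watch", 300.0, 1000.0, False, frozenset({"touch"}))
glass = DeviceFidelity("glass", 50.0, 100.0, True, frozenset({"touchpad", "mid-air"}))
print(fidelity_gap(watch, glass))
```

Larger ratios and lower overlap indicate a wider gap; how to weight or combine these dimensions into a single continuity-of-fidelity score is left open, in line with the open questions raised later in this section.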
Spatial reference frame, i.e., the real-world entity relative to which interaction takes place, has been explored in several papers [3,15,16,17,18]. Examples include body parts of the user (head, chest, hands), physical objects in the scene (table, monitor, mug, poster, other mobile devices) and world-referenced locations (longitude and latitude).
Pairwise device interaction: prior work has also explored how two devices could be used together by enabling or disabling their input and output channels, including combinations of smartphones with (large) interactive surfaces [19], smartwatches with interactive surfaces [5] and smartglasses with smartwatches [19], resulting in four different device combinations.
Foreground-background interaction [20,21] was applied to mobile multi-device environments by Chen et al. [2]. Foreground activities require attention (e.g., dialing a number); they are intentional activities. Background activities take place in the periphery, requiring less attention (e.g., being aware of a nearby person). Ideally, background activities can be sensed and actions can be triggered automatically (e.g., automatically switching on the light when a person enters a room). Chen et al. explored interaction techniques when both a smartphone and a smartwatch were jointly used as foreground devices [2].
Proxemic dimensions [22,23,24] have also been applied to mobile multi-device scenarios (e.g., [17,25,26,27,28,29,30,31]). Proxemics can be understood as the culturally dependent ways in which people use interpersonal distance to understand and mediate their interactions with other people. Greenberg et al. identified distance, orientation, movement, identity and location as relevant proxemic dimensions for ubiquitous computing [23]. More recently, proxemics has been framed as a form of context awareness for supporting users' explicit and implicit interactions in a range of uses, including remote office collaboration, home entertainment and games [32]. Beyond such simple proxemics, we suggest that our understanding of the design challenges should also consider kinesics, paralinguistics, haptics, chronemics and the artifacts around us.
There are a number of further design factors that have not yet been explored in depth. For example, Grubert et al. presented continuity of fidelity, or fidelity gaps, as a relevant design factor. Continuity of fidelity can be understood as the degree to which individual device characteristics differ across devices, specifically input modalities (e.g., touch vs. in-air gestures, or input resolution) and output modalities (e.g., display size, resolution, contrast). One need only compare the input fidelity of a Microsoft Kinect, a Leap Motion, a touch screen or Google's Project Soli, or the display size and resolution of a Microsoft Surface Hub, a Microsoft Band or a smartwatch, to appreciate the challenge that continuity of fidelity presents. Cauchard identified similar challenges [17,33]. Ens et al. identified a number of design factors focused on interaction with 2D information spaces [18]. While not directly targeted at multi-device use, some of these factors appear to be relevant. For example, tangibility describes whether the presented information is perceptible by touch [4].
For example, touch screens provide a tangible representation of information spaces with haptic feedback. Virtual screens in optical see-through head-mounted displays, such as Google Glass or Microsoft HoloLens, and projected displays are typically not tangible. Very recent work on mid-air haptic feedback using ultrasound promises to add tangibility even to such projection-based displays [34,35]. Another relevant design dimension is the visibility of the individual devices and information spaces, i.e., the amount of visual information available in a multi-device interface [18]. Visibility also determines the degree to which proprioception is needed to operate an interface.
Co-located interaction in mobile multi-user, multi-device scenarios introduces additional factors. For example, micro-mobility is the fine-grained positioning and orientation of objects so that they can be fully viewed, partially viewed or hidden from other persons [36,37]. F-formations are spatial patterns formed during face-to-face interactions between two or more people [37,38,39]. Another potentially relevant design framework is space syntax [40,41]. Originally aimed at urban planning, space syntax is "a family of techniques for representing and analysing spatial layout of all kinds" [40].
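Proxemic distance and F-formations are spatial relations that a system could estimate from tracked positions and orientations. The sketch below assumes 2D top-view positions in metres and headings in radians from some tracker (hypothetical inputs). It classifies interpersonal distance using commonly cited approximations of Hall's zones, which, as noted above, are culturally dependent, and applies a simplified heuristic for F-formations: two people are treated as sharing an F-formation when points a fixed stride in front of each of them nearly coincide. The stride and radius are hypothetical parameters, and this is not a detection method taken from the cited papers.

```python
import math

# Approximate upper bounds (metres) of Hall's proxemic zones; tunable,
# since proxemic norms vary across cultures.
ZONES = [(0.45, "intimate"), (1.2, "personal"), (3.6, "social")]


def proxemic_zone(distance_m: float) -> str:
    """Map an interpersonal distance to a proxemic zone label."""
    for upper, name in ZONES:
        if distance_m < upper:
            return name
    return "public"


def o_space_point(pos, heading_rad, stride=0.6):
    """Point `stride` metres in front of a person: a crude proxy for their transactional segment."""
    return (pos[0] + stride * math.cos(heading_rad),
            pos[1] + stride * math.sin(heading_rad))


def same_f_formation(p1, h1, p2, h2, stride=0.6, radius=0.4) -> bool:
    """True if the two people's o-space points lie within `radius` metres of each other."""
    return math.dist(o_space_point(p1, h1, stride), o_space_point(p2, h2, stride)) <= radius


# Two people 1.2 m apart facing each other (vis-a-vis) vs. standing back to back.
print(same_f_formation((0.0, 0.0), 0.0, (1.2, 0.0), math.pi))
print(same_f_formation((0.0, 0.0), math.pi, (1.2, 0.0), 0.0))
```

A real system would also need to handle groups of more than two people and noisy orientation estimates; the pairwise check only illustrates how the spatial pattern could be operationalised.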
However, it remains unclear whether the described design factors are sufficient to guide future design space explorations, whether and how they are interdependent, to what extent they are relevant for non-touch-screen devices, and how they scale to more than two jointly used displays. For example, fidelity gaps might be more relevant for touch-screen/smartglass interaction, as the difference in output resolution and contrast is considerably larger than for interaction between two touch screens [3]. Further challenges for the interaction design of multiple wearable displays concern how to transition, explicitly or implicitly, between individual interaction modes, e.g., from side-by-side to device-aligned [3], from touch to mid-air interaction [42,43] and viewing [44], and when to switch the input and output channels of devices. These two top-level categories and nine sub-categories form the basis of the design questions posed in our expert survey described in Section 3.

Technological Challenges
There are a number of technological challenges for realizing mobile multi-device ecosystems, including device binding, security, spatial registration, heterogeneous platforms and sensors, non-touch interaction, and development and runtime environments. Twenty-seven papers were classified into this category.
Compared to stationary multi-display systems, heterogeneity increases in software platforms (e.g., Android, iOS, Windows Mobile), hardware (e.g., sensors), form factors (e.g., smartwatch, smartphone, smartglass, pico projector) and development environments. Specifically, heterogeneous platforms can lead to data fragmentation, which impedes sharing information between devices [6]. As device heterogeneity increases, development toolkits targeting cross-device applications involving mobile devices (e.g., [5,45,46,47,48]) are proliferating. To address heterogeneity, they can, for example, support the distribution of web-based user interfaces across displays with varying characteristics (such as size, distance, resolution) [45], allow for on-device authoring [46] or integrate hardware sensor modules [5].
Still, these toolkits have a number of challenges to address. For example, we need better support for creating user interface widgets that can adapt themselves to the manifold input and output configurations, or support awareness [49], in mobile multi-device environments. Specifically, it remains unclear whether existing adaptation strategies (e.g., from responsive web design [50]) remain valid when users frequently relocate widgets between displays, and how widgets should be operated and appear when spanning multiple displays (including non-touch displays such as smartglasses) [3].
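One way to frame widget adaptation is as responsive-design-style breakpoints extended with input modality, re-evaluated whenever a widget moves to another display. The following is a minimal sketch under that assumption. The `Display` fields, breakpoint widths and variant names are hypothetical, and the sketch does not address widgets spanning several displays, which the text identifies as an open question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Display:
    width_mm: float  # physical width of the display area (hypothetical field)
    touch: bool      # whether the display accepts direct touch input


def widget_variant(d: Display) -> str:
    """Pick a presentation using breakpoint-style rules (hypothetical thresholds)."""
    if not d.touch:
        return "glanceable"  # e.g. smartglass: read-only view, operated from a paired device
    if d.width_mm < 45:
        return "compact"     # smartwatch-sized touch screen
    if d.width_mm < 200:
        return "standard"    # phone or small tablet
    return "expanded"        # large interactive surface


def relocate(state: dict, target: Display) -> dict:
    """Move a widget to another display: keep its data, recompute its presentation."""
    return {**state, "variant": widget_variant(target)}


print(relocate({"value": 3, "variant": "standard"}, Display(40.0, True)))
```

Separating the widget's data from its presentation is what lets the same state survive frequent relocation; whether fixed breakpoints remain adequate under such frequent moves is exactly the open question raised above.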
Also, most existing toolkits have not anticipated the integration of non-touch-screen devices. More specifically, projection-based systems, such as optical see-through head-mounted displays or wearable pico projectors, still need better integration.
Device binding, i.e., the association and management of multiple devices within a common communication infrastructure, needs to be better addressed for mobile multi-device scenarios. There is a large body of work on technical and user-centred aspects of this topic [51], ranging from individual [52] to group binding [53,54,55,56]. However, existing techniques are generally not found outside of laboratory contexts. Furthermore, most research has concentrated on binding stationary systems or mobile touch-screen devices such as smartphones and tablets [56], neglecting the diversified input and output space of newer devices such as smartglasses, wearable activity trackers and smartwatches.
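Group binding is often illustrated with synchronous gestures, where devices that register the same physical event (e.g., a bump or a shake) at nearly the same time are associated. The sketch below groups devices whose event timestamps fall within a window of the first event in a group. It assumes synchronised clocks and a hypothetical 0.3 s window, and it omits the discovery, confirmation and leave/rejoin steps a real binding protocol would need.

```python
def bind_by_synchronous_events(events, window_s=0.3):
    """Group devices whose events fall within `window_s` seconds of the first
    event in the group. `events` is a list of (timestamp_s, device_id) tuples;
    devices without a partner are not bound."""
    groups = []
    current = []
    start = None
    for t, device in sorted(events):
        if start is not None and t - start <= window_s:
            current.append(device)
        else:
            if len(current) > 1:
                groups.append(current)
            current, start = [device], t
    if len(current) > 1:
        groups.append(current)
    return groups


# Phone, watch and tablet are shaken together; the glasses report a later, unrelated event.
events = [(10.00, "phone"), (10.12, "watch"), (10.25, "tablet"), (15.00, "glass")]
print(bind_by_synchronous_events(events))
```

The choice of window trades off false pairings between unrelated nearby users against missed pairings when sensors on different devices detect the same gesture with different delays.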
Security aspects of mobile multi-device environments have not been a core focus of existing research, with only some exceptions, e.g., regarding second screen apps for ATMs [57] or security in group binding [58].
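As one example of secure communication after binding, messages between devices could carry a message authentication code computed with a key shared by the bound group. The sketch uses Python's standard `hmac`, `hashlib` and `secrets` modules. How the key is distributed during binding is assumed rather than shown, and the scheme provides integrity and authenticity only, without confidentiality or replay protection.

```python
import hashlib
import hmac
import json
import secrets


def make_group_key() -> bytes:
    """Random 256-bit key, assumed to be shared among the bound devices."""
    return secrets.token_bytes(32)


def _canonical(message: dict) -> bytes:
    # Sorted keys yield the same bytes on every device for the same message.
    return json.dumps(message, sort_keys=True).encode("utf-8")


def sign(key: bytes, message: dict) -> dict:
    """Attach an HMAC-SHA256 tag to a message."""
    tag = hmac.new(key, _canonical(message), hashlib.sha256).hexdigest()
    return {"payload": message, "tag": tag}


def verify(key: bytes, envelope: dict) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, _canonical(envelope["payload"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, envelope["tag"])


key = make_group_key()
envelope = sign(key, {"cmd": "show", "item": 42, "from": "phone"})
print(verify(key, envelope))  # True
```

Adding a counter or timestamp to each payload and rejecting stale values would be a natural next step against replayed messages.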
Mobile and unified sensing is another important challenge for creating mobile multi-device systems. So far, the input space for operating individual devices is fragmented. For example, smartphones and smartwatches typically allow for touch input on their interactive surface or distance sensing with computer vision [59] or other sensors. Commercially available smartglasses often use indirect input via a touch pad. Sensing around individual devices has also been explored, allowing above-surface input on phones (e.g., Project Soli) and smartwatches [5] or mid-air input in front of smartglasses (e.g., Microsoft HoloLens). Gestures performed with the devices themselves can also be realized, e.g., through inertial sensors or linear accelerometers. Some mobile phones (e.g., the Nokia N900) possess multiple antennas, which can be used to sense the relative position of other devices [60] and which have been employed for multi-device, co-located interaction [30,31]. However, it remains unclear how to use these diverse sensing approaches to create a unified and seamless interaction space across devices. Also, tracking the full six-degree-of-freedom poses of multiple wearable devices, which would enable a precise mutual spatial understanding of the display positions, has not been extensively explored in mobile scenarios and is so far often restricted to lab-based prototypes. For example, approaches such as MultiFi [3] or HuddleLamp [28] typically rely on stationary tracking systems. Only recently have mobile sensing solutions emerged, and these are so far restricted either in the achievable degrees of freedom or in the accuracy and precision of sensing [61,62]. Similarly, when using head-mounted displays in a spatially registered multi-device environment, we need better and more robust means of calibrating them relative to the user's eye [63,64].
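When a stationary tracker reports each device's pose in a common world frame (as in the MultiFi or HuddleLamp setups mentioned above), the pose of one device relative to another follows from composing rigid transforms. The sketch below uses planar (x, y, θ) poses for brevity; full six-degree-of-freedom tracking would use 3D rotations, e.g., 4×4 homogeneous matrices. The same composition can place body-referenced content, for instance by composing a tracked head or chest pose with a fixed offset. The function names are illustrative.

```python
import math

Pose = tuple  # (x, y, theta): position in metres, heading in radians


def compose(a: Pose, b: Pose) -> Pose:
    """Express pose b (given in a's local frame) in the frame that a is given in."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)


def inverse(a: Pose) -> Pose:
    """Inverse rigid transform: translation -R^T t, rotation -theta."""
    ax, ay, at = a
    c, s = math.cos(at), math.sin(at)
    return (-c * ax - s * ay, s * ax - c * ay, -at)


def relative_pose(world_a: Pose, world_b: Pose) -> Pose:
    """Pose of device B in device A's local frame."""
    return compose(inverse(world_a), world_b)


# Device A at (1, 0) facing +y; device B 2 m further along +y with the same heading.
rel = relative_pose((1.0, 0.0, math.pi / 2), (1.0, 2.0, math.pi / 2))
print([round(v, 6) for v in rel])
```

In this example B lies 2 m straight ahead of A along A's local x-axis (its heading direction), which is what a cross-device technique would use to align content between the two displays.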
Further challenges include authoring mobile multi-device interactions, e.g., by non-experts, in situ on mobile devices, or for body-referenced information spaces that "float" virtually around the user's body instead of coinciding with a physical screen [65,66,3]. Similarly, the specification of spatial gestures for triggering actions (e.g., through programming by example) has not been studied in this context. Finally, the performance of web-based frameworks remains a hurdle for fluid interaction across computationally restricted wearable devices [67].
These eight sub-categories form the basis of technical questions posed in our expert survey described in Section 3.

Social Challenges
New technologies and designs can lead to new social challenges, while existing social challenges can help inform the design and development of technologies. Considering these as socio-technical systems helps position social challenges as considerations to be addressed throughout the process, rather than simply before or after any technical or design decisions are made. As such, we present five key and durable social challenges posed by mobile multi-device ecosystems: privacy, social acceptability, social participation, social exclusion and social engagement. Four papers in the domain of multi-device environments addressed social challenges.
Privacy presents a major challenge in the use of public or semi-public displays as part of a mobile multi-device ecosystem [68]. We can consider such forms of social interaction with technology at different scales, from the inch (a few centimetres) to the chain (about 20 m) and beyond [8]. Personal devices address the privacy challenge through use in private environments, use at an intimate distance, privacy screens or non-visual modalities. Questions arise when we consider how we might share content on intimate displays [69,70], at varying scales, in different types of social interaction, or even share content spanning multiple private displays. For example, users might be reluctant to surrender possession of their smartphone in group binding situations [71]. We can differentiate between personal and public privacy. Personal privacy describes the challenges faced when using personal display elements in a mobile multi-device environment. Public privacy describes the challenges faced when using semi-public and public display elements in a mobile multi-device environment.
Social acceptability. The use of wearable on-body displays presents a range of social acceptability issues. Some of the inherent form factors can present acceptability challenges. In addition, existing research has explored the suitability of different parts of the body for gestural input [72], along with issues of social norms and behaviour [73]. Here, mobile multi-device environments introduce new challenges, as the coordination and movement of multiple displays can require unusual inter-display coordination and body orientation. Also, in contrast to touch-only displays such as smartphones, manipulating multiple body-proximate displays through spatial gestures is more visible, whereas the effects of those actions remain hidden from bystanders [74]. Depending on the social situation, this could lead to inhibited use or non-use of an interactive system, similar to observations made for handheld Augmented Reality systems [75,76]. Further issues arise from the use of shared or public display elements within an ecosystem [68]. All of these issues are modulated by differences in culture, work practices, age, familiarity with technology and evolving social norms for new technology behaviours.
Social participation. Today, civic discourse is affected by the isolation that technologies can create between people. For example, the "filter bubble" [77] stems from the personalisation of search results presented to individual people. Such bubbles can socially isolate people from one another in their own economic, political, cultural and hence ideological groups. With mobile multi-device ecosystems, we might further push people into "interaction bubbles" that isolate them from others and discourage interpersonal interaction. The "in-your-face" nature of what is proposed in mobile multi-display ecosystems is unlike other forms of technology. One approach to overcoming barriers to participation is to design technologies that entice users to participate [78].
Social exclusion. Mirroring the problems of social participation are the further challenges of social exclusion [79]. By augmenting our interactions with mobile multi-device ecosystems, we are changing the nature of our interaction with the world. Many personal technologies reside out of sight, whereas wearable and on-body displays present a visible digital alienation to those without access to such technology. By allowing some people to see and experience more than others, are we further disenfranchising people? Do these technologies exacerbate the digital social stratification we are already witnessing?
Social engagement. When semi-public or public displays are used as part of an egocentric mobile multi-device ecosystem, issues of performance and social engagement arise [80]. These challenges are also opportunities for improved social engagement between people, but they also call into question the appropriateness of any device appropriation. Fair use, sharing space or time, and the use of non-visual modalities present challenges for the design and deployment of such systems.
Further challenges include personal space, which describes the physical space immediately surrounding someone [22], into which encroachment can feel threatening or uncomfortable, and fair sharing, which describes the equitable and joint use of display resources and space. These five categories form the basis of the social questions posed in our expert survey described in Section 3.

Perceptual and Physiological Challenges
There are a number of perceptual and physiological challenges for realizing mobile multi-device ecosystems when we consider human perception from the physiological to the cognitive level. Such issues stem from varying display resolutions, luminance, effective visual fidelities, visual interference, and colour or contrast in display overlap, which can be experienced with body-proximate ecosystems. Thirteen papers were associated with this category.
Display switching. Existing research has identified the cost of display switching [13] and the factors that influence visual attention in multi-display user interfaces [81,17], specifically for second-screen TV experiences [82,83,84,85,86]. These factors include:
• selective attention [87]: the ability to react to certain stimuli selectively when several occur simultaneously.
• sustained attention [87]: the ability to direct and focus cognitive activity on specific stimuli.
• divided attention [85]: the ability to time-share attention across stimuli; this occurs when we are required to perform two (or more) tasks at the same time and attention is required for the performance of both (all) tasks.
• angular coverage [81,17]: the angular extent of the displays in the environment. It can be used to determine whether turning one's body, head or eyes is sufficient for looking at a display.
• display contiguity [81,17]: the extent to which the proximity or overlap of displays causes them to be perceived as continuous or discontinuous.
• time to switch between displays [13,83]: the time taken to switch one's gaze from one display to another. This may involve a combination of eye, head and body movements, but does not include the time needed to focus the eyes due to any depth disparity.
• content coordination [81]: how the content of different displays is semantically connected, even when showing different views of the same data. Existing methods have explored cloned, extended and coordinated displays.
• input directness [81]: following traditional HCI categorisations in terms of direct manipulation, input can be considered direct, indirect or hybrid. Measures of directness could aid in understanding physical challenges in such systems.
• input-display correspondence: can be considered local, global or redirected in mobile multi-device ecosystems.
• visual overload [83,84]: the overstimulation of the visual sensory system due to outputs from the multi-device environment coupled with the physical environment, which can be mitigated with techniques that are aware of where a person is looking [88].
Focus in human vision. The shape of our lens alters how our eyes focus, and our iris alters how much light enters them. However, our eyes cannot focus sharply on a near and a far display simultaneously. If the virtual display plane of an optical see-through head-mounted display is in sharp focus, then displays that are effectively closer or more distant will not be. Depth disparity describes a display environment in which one's eyes regularly change focus. This occurs when eye-to-display distances vary such that the eye constantly accommodates between display switches. This can easily be seen with a smartwatch that is in focus but surrounded by unfocused input from displays that are effectively further from the eye. The effective distance, not the actual distance, needs to be considered, as devices such as optical see-through displays (e.g., Google Glass) often employ optical techniques to generate images at distances that are easier for the eye to accommodate. A further issue is that as the ciliary muscles in our eyes age, our range of accommodation declines. Another byproduct of our eyes' inability to focus sharply on two distances is that it takes time for the eye to refocus on objects at different distances. In addition, the speed of this process also declines as the muscles age.
However, in mobile multi-device ecosystems, the eye will need noticeable amounts of time (e.g., 300 ms latency and a 1000 ms stabilisation period [89]) for its focal power to adapt in markedly discontiguous display spaces. Further, these accommodation times do not include the eye, head or body movements needed if the displays are "visually field discontiguous" [81].
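Accommodation demand is commonly expressed in diopters, the reciprocal of the (effective) viewing distance in metres, so refocusing between displays can be quantified as a difference in diopters. The sketch below also uses Hofstetter's commonly cited approximation for the average amplitude of accommodation (18.5 - 0.3 * age in years) to illustrate the age-related decline described above. It assumes an uncorrected emmetropic eye, the example distances are hypothetical, and it does not model the refocusing latencies reported in [89].

```python
def diopters(effective_distance_m: float) -> float:
    """Accommodation demand (D) for a target at the given effective distance."""
    return 1.0 / effective_distance_m


def refocus_demand(d1_m: float, d2_m: float) -> float:
    """Change in accommodation (D) when switching gaze between two displays."""
    return abs(diopters(d1_m) - diopters(d2_m))


def average_amplitude(age_years: float) -> float:
    """Hofstetter's average amplitude of accommodation (D), floored at zero."""
    return max(0.0, 18.5 - 0.3 * age_years)


def can_focus(effective_distance_m: float, age_years: float) -> bool:
    """Assumes an uncorrected emmetropic eye: a target can be brought into focus
    if its demand does not exceed the amplitude of accommodation."""
    return diopters(effective_distance_m) <= average_amplitude(age_years)


# Hypothetical setup: smartwatch at 0.3 m, HMD virtual image plane at 2 m.
print(round(refocus_demand(0.3, 2.0), 2))      # 2.83
print(can_focus(0.3, 25), can_focus(0.3, 60))  # True False
```

Because the HMD's virtual image distance, not the physical distance of the glasses, enters the calculation, the example reflects the point above that effective distance is what matters.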
Field of view. Humans have a limited field of view and an age-diminished "useful field of view" (UFOV) [90], which needs to be considered. Excluding head rotation, the typical human field of view differs between the horizontal and vertical directions and includes an area of binocular overlap and areas of monocular far peripheral vision. "For many of our interaction tasks the UFOV varies between younger and older people. A 36 degree field of view will be practical in many situations" [90]. Within each person's field of view, we can also distinguish regions of central (i.e., foveal, central, paracentral and macular) and peripheral (near, mid and far) vision. The useful field of view typically includes both central vision, measured through visual acuity (the ability to distinguish details and shapes of objects), and largely the near-peripheral parts of vision (vision that occurs outside the very centre of gaze).
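Whether a display lies within the useful field of view, and whether eye, head or body movement is needed to look at it (cf. angular coverage above), can be approximated from the angle between the current gaze direction and the direction to the display. The sketch assumes 3D positions and a gaze vector from some tracking source (hypothetical inputs), interprets the quoted 36 degree field of view as a cone of 18 degrees on either side of the gaze direction, and uses a hypothetical 90 degree limit for head-only rotation.

```python
import math


def angle_deg(u, v) -> float:
    """Angle between two 3D vectors in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def movement_needed(eye, gaze_dir, display_pos, ufov_deg=36.0, head_limit_deg=90.0) -> str:
    """Classify the movement needed to look at a display (hypothetical thresholds)."""
    to_display = tuple(p - e for p, e in zip(display_pos, eye))
    eccentricity = angle_deg(gaze_dir, to_display)
    if eccentricity <= ufov_deg / 2:
        return "eyes"   # within the assumed useful field of view
    if eccentricity <= head_limit_deg:
        return "head"   # reachable by turning the head
    return "body"       # requires turning the body


eye, gaze = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
for display in [(0.0, 0.1, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, -1.0)]:
    print(display, movement_needed(eye, gaze, display))
```

Since the UFOV narrows with age, `ufov_deg` could be set per user rather than fixed, which would let the same layout be assessed for younger and older viewers.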
Further factors include change blindness [91,83], the phenomenon of a change in the visual stimulus (e.g., a new icon [92]) being introduced but the observer not noticing it, even when the change is obvious; it can occur when the stimulus changes slowly or is interrupted, for example by a blank display, a blink or a saccade. By contrast, inattentional blindness [93] is the phenomenon of an unexpected visual stimulus not being noticed because one's attention is engaged on other aspects of the visual scene. Finally, visual discomfort comprises symptoms of visual fatigue or visual distortion [94].

Expert Survey
The goal of the expert survey was two-fold. First, we wanted to complement the literature research to saturate the list of factors we previously identified. Second, we wanted to find out if certain factors were assessed as more important than others by a majority of experts in the field.

Design and Procedure
The survey targeted experts in mobile multi-device interaction or related fields. Experts were invited through personal e-mail. In addition, social media channels were used to reach further experts in the field. The main part of the survey consisted of four sections: development, design, social and perceptual/physiological challenges. Participants were free to skip individual sections. In each section, participants were asked to rank a list of factors by how important they considered each factor. Furthermore, participants were asked to list any additional factors that were not included in our list. The survey took about 5-30 minutes to complete, depending on the number of sections participants were willing to answer. One Amazon voucher worth 30 Euros was raffled among participants.
Figure 3: Usage frequency of multi-display environments on a six-point Likert scale (1: never, 6: very frequently). Legend: stationary: use of multiple stationary displays (including a notebook + additional external display); mobile: multiple mobile displays (e.g., work across a smartphone and tablet or a smartphone and smartwatch); mixed: mixed mobile and stationary displays (e.g., second-screen apps for TVs).

Participants
Twenty-seven volunteers participated in the survey (24 male, 2 female, one preferred not to indicate the gender; mean age 33.4 years, SD=6.3). Nineteen participants had experience in designing, developing or evaluating multi-device environments, 20 reported having conducted general research in this area and one participant reported teaching in this domain. Most participants regularly used stationary multi-display environments, but used mobile and mixed environments to a lesser extent (see Figure 3).

Results
We present results for the individual sections on design, development, social and perceptual/physiological challenges next.

Design Challenges
Twenty-one participants answered the design challenges section. Figure 4 shows how participants ranked the importance of individual design factors. Figure 9 depicts an aggregated version with multiple summed ranks. Characteristics of individual devices, visibility of devices and proxemic dimensions were identified as important factors. Foreground-background interaction and spatial reference frame tended to be ranked as of medium importance, followed by fidelity gaps, tangibility and other factors.
In addition, participants were asked whether they think that there is a sufficient number of design factors to guide the creation of mobile multi-device systems. On a 5-point Likert scale (strongly disagree ... strongly agree), the average score was 2.76, slightly below the neutral midpoint. Participants were also asked to prioritize factors for designing multi-device systems for co-located interaction. The results are depicted in Figure 5, and Figure 10 depicts an aggregated version with multiple summed ranks. While no strong trends could be identified, micro-mobility [36,37] and proxemic dimensions [22,23,24] were ranked as important, followed by F-formations [38,37,39] and space syntax [40,41]. One participant explicitly highlighted accessibility issues (e.g., visibility, reach) when multiple persons interact with distributed multi-device systems.
Figure 6: Rankings of development challenges. Legend: bind: ad-hoc binding / joining / leaving device groups; sec: secure communication between devices; widg: user interface widget adaptation; loc: localization / spatial registration of devices; char: characteristics of individual devices (e.g., contrast, input and output modalities, input and output resolution); devl: heterogeneity of development languages; op: heterogeneity of operating systems; ntd: integration of non-touch screen devices (e.g., Google Glass, Microsoft HoloLens); sens: heterogeneity of sensors.
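The aggregated figures combine individual rankings into "multiple summed ranks", but the exact procedure is not spelled out here. One plausible reading, assumed for this sketch only, is that adjacent rank positions are merged into bins and, for each factor, the number of participants who placed it in each bin is counted. The function name `binned_rank_counts` and the bin size are hypothetical, and any factor identifiers passed in (such as the legend abbreviations above) are placeholders, not survey data.

```python
def binned_rank_counts(rankings, bin_size=3):
    """Count, per factor, how often it was placed in each bin of adjacent ranks.

    Assumed aggregation (hypothetical): rank positions 0..bin_size-1 form the
    first bin, the next bin_size positions the second bin, and so on.

    rankings: one list per participant, ordered from most to least important.
              Participants who skipped a section simply contribute no list.
    bin_size: number of adjacent rank positions merged into one bin.
    Returns {factor: [count_in_bin_0, count_in_bin_1, ...]}.
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1")
    n_positions = max((len(r) for r in rankings), default=0)
    n_bins = -(-n_positions // bin_size)  # ceiling division
    counts = {}
    for ranking in rankings:
        # position 0 is the factor the participant considered most important
        for position, factor in enumerate(ranking):
            bins = counts.setdefault(factor, [0] * n_bins)
            bins[position // bin_size] += 1
    return counts
```

Lower bins correspond to higher importance, so a factor with most of its counts in the first bin was ranked consistently high, while counts spread across all bins reflect the lack of consensus reported for many factors.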

Development Challenges
Eighteen participants answered the development challenges section. Figure 6 shows how participants ranked the importance of individual development factors. Figure 11 depicts an aggregated version with multiple summed ranks. Ad-hoc binding, localization / spatial registration of devices and security were ranked as very important. Integration of non-touch screen devices, characteristics of individual devices and heterogeneity of operating systems tended to be assigned medium priority. Heterogeneity of development languages, heterogeneity of sensors and UI widget adaptation tended to be ranked as of medium to low importance, but with a wide spread.
One participant mentioned the responsiveness and reliability of network-based operations, and two mentioned testing and debugging, with one highlighting the need for better support for non-expert developers and the "lack of development support on mobile devices".

Social Challenges
Twenty participants answered the social challenges section. Figure 7 shows how participants ranked the importance of individual social factors. Figure 12 depicts an aggregated version with multiple summed ranks. The social factors were ranked diversely, without a strong trend for most factors. However, social exclusion and fair sharing tended to be ranked as less important. One participant suggested that, for social participation, one should better understand the joint participation or co-interaction of multiple users instead of focusing on isolation aspects. Another participant mentioned social exclusion due to platform differences.

Perceptual and Physiological Challenges
Fifteen participants answered the section on perceptual and physiological challenges. Figure 8 shows how participants ranked the importance of individual factors. Figure 13 depicts an aggregated version with multiple summed ranks. The factors were ranked diversely, without a strong trend for most factors. However, divided attention, angular coverage, selective attention, visual overload, visual discomfort, inattentional blindness and time to switch between devices were identified as more important. No other factors were mentioned by the participants.

Discussion of the Survey Results
One goal of this survey was to saturate the list of relevant factors for the creation and use of mobile multi-device environments. The experts identified only a few additional factors, including accessibility issues (e.g., visibility, reach), development support for non-experts and development tools for mobile platforms. This suggests that the identified factors can form a basis for future exploration as new research and development in mobile multi-device ecosystems emerges.
Another finding of our survey is that participants consistently identified only some development challenges (e.g., ad-hoc binding, localization / spatial registration of devices) and design factors (e.g., device characteristics and proxemic dimensions) as important. Beyond these selected factors, no strong consensus on the importance of the diverse factors was found. This could indicate that the importance of individual factors depends strongly on the context of use. In fact, one participant explicitly stated: "I think the order of importance of these challenges depends on the users, the context and the system under development". One clear outcome of this survey is the need to establish new theories and research motor themes [95] for mobile multi-device ecosystems. Without these, research and development in this area will remain fragmented, diverse and disconnected from any theoretical grounding.

Discussion
Through our literature survey and expert survey, we have identified a number of challenges for mobile multi-device environments. While some of these challenges are similar to those of stationary multi-display systems, the highly mobile nature of the components leads to a large number of challenges to be addressed, including the need for well-founded theory.
For design challenges, we see a large number of proposals on which factors and frameworks are relevant for creating mobile multi-device systems. Still, there is no strong consensus in the community on whether the existing factors are sufficient to guide the design of current and future systems. Only some design factors (e.g., proxemics, visibility, characteristics of individual devices) were consistently identified as important by experts. However, this does not necessarily imply that other factors are less important; those factors may be more context-dependent or simply not well researched in the community. For example, we believe that with the diversification of input and output channels in mobile multi-device scenarios, we need to better incorporate the relative differences between device capabilities (i.e., fidelity gaps), not just their individual absolute characteristics. One such example is the transition between touch and mid-air interaction. While recent research has shown that for some tasks (e.g., gaming) users would prefer mid-air input for smartglasses [96], there are clear benefits to the haptic qualities of surfaces [97], as evidenced by touch being the dominant interaction mode for smartphones and smartwatches. While researchers have begun to investigate the joint interaction space of touch and free-space input (e.g., [42,43,98]), there is clearly a larger design space to explore in highly mobile multi-device scenarios. Another opportunity might be to further investigate micro-mobility for co-located interaction [36,37]. The increasing number of mobile and wearable displays opens up new possibilities to study how people use interactive mobile devices to share or hide information from others [69,99].
Looking at technological and development challenges, we see that device binding is considered a very important topic. However, so far device binding has mainly been considered for tablets and smartphones [56]. There is still potential to find novel ways to bind other mobile devices such as smartglasses, smartwatches, picoprojectors or activity trackers without a display. We also argue that there is an increased need to consider the adaptation of user interface widgets across devices. While there are guidelines on how to change the layout of widgets depending on screen size (e.g., from responsive web design [50]), those guidelines often assume interaction with one device at a time. It remains to be explored how well users can interact with changing layouts if they have to relocate widgets frequently between displays (e.g., a smartwatch and a tablet). More research is also needed on how to adapt widgets that span multiple displays at once, including non-touch displays such as smartglasses. Is it sufficient to change the appearance of a widget to a different level of visual detail, or do the semantics of operation have to change [3]? Furthermore, we see the opportunity to combine device-integrated [61,62] with body-mounted sensors [100,101] into hybrid pose tracking systems in order to derive a full spatial understanding of all on- and around-the-body devices. However, to date it has not been explored in depth how precise and reliable those mobile sensing solutions can and should be [102]. It also remains to be explored which granularity of spatial sensing (from precise to none) is actually sufficient for various cross-device interaction tasks. Finally, as many cross-device toolkits create web-based user interfaces, it might be worthwhile to investigate the integration of sensing solutions based on web standards [103].
Our literature review and survey suggest that the consideration of social challenges in mobile multi-device ecosystems is immature. The ecological validity of the scenarios described in many papers is open to criticism due to unconvincing use cases, novel forms of interaction or unrealistic scenarios. Laboratory settings can contribute new research findings to many of the other facets described, whereas our social challenges require research in non-technical domains or socio-technical settings.
Finally, perceptual, cognitive and physiological issues will clearly play a more important role in studying mobile multi-device environments in the future. However, this research should not remain in an HCI context alone, as it requires a wider range of research expertise. An example can be seen in the investigation of some of these issues (e.g., attention) in work on interactive TV / second screen experiences [82,83,84,85,86]; they are less studied in more mobile usage scenarios.
In the future, the survey results could be complemented with further studies targeted at end users of multi-device environments, e.g., similar to the work of Jokela et al. [7].

Conclusion
There are many future visions of computing [104] which incorporate aspects of mobile multi-device ecosystems. In this article, we have considered design, technical, social and perceptual challenges and the questions raised by interaction with mobile multi-device environments. The fundamental challenges in mobile multi-device ecosystems reach beyond those of stationary multi-display ecosystems, due to the larger variety of input and output modalities, the mobility of the individual components and the proximity of those devices to the human body. We have based our findings on both a literature survey and an expert survey. While the expert survey indicated that we have identified a large number of current challenges, there is little agreement on the importance of individual challenges. This might be due to the highly contextual nature of mobile multi-device interaction, which influences the importance of individual factors. By presenting current challenges and questions, we hope to contribute to shaping the research agenda for new theory, for new areas of research inquiry outside of HCI and for research on interaction with mobile multi-device environments.
Authors' contributions
JG carried out the digital library search and drafted the design and technological challenges. AQ drafted the social, perceptual and physiological challenges. JG, MK and AQ conceived of the expert survey, participated in its design and coordination, and all helped to draft the manuscript. All authors read and approved the final manuscript.
Author details
1 University of Passau. 2 St Andrews University.