Shared Spaces – High Definition Ultra-Videoconferencing

The goal of this project is to push videoconferencing beyond simply viewing others at a remote location and instead create the feeling of sharing the same space. The expected result is an experience of “being there” that will enable corporate, educational, cultural and social interaction well beyond what is now possible.

Innovation - Shared Spaces

The feeling of sharing the same space will be created by using a large panoramic display, in close proximity to the viewer, on which the remote site is shown life size as it would appear if the viewer were there. The present intention is to use three 65” plasma displays arranged side by side to form a super wide screen.

This arrangement should engage the peripheral vision of a viewer seated in the centre and allow the viewer to look left and right without the need for the camera to pan. However, image artifacts or low resolution will be far more apparent to a viewer seated close to a large display than to one seated further away. It will therefore be necessary to use high definition video.
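As a rough check on the claims about peripheral vision and resolution, the sketch below estimates the field of view and angular pixel density of the proposed display wall. The panel width follows from the 65” diagonal; the 5 ft viewing distance and the figure of roughly 60 pixels per degree for 20/20 vision are outside assumptions, not project specifications.

```python
import math

# Assumed geometry: three 65-inch 16:9 panels side by side, viewer at 5 ft.
DIAGONAL_IN = 65.0
VIEW_DIST_IN = 60.0  # 5 feet

# Panel width from the diagonal of a 16:9 rectangle.
panel_w = DIAGONAL_IN * 16 / math.hypot(16, 9)   # ~56.7 in
wall_w = 3 * panel_w                             # ~170 in

# Horizontal field of view subtended by the wall at the viewer's eye.
fov_deg = 2 * math.degrees(math.atan((wall_w / 2) / VIEW_DIST_IN))

# Angular pixel density if each panel shows 1280 pixels across (720p).
px_per_deg = (3 * 1280) / fov_deg

print(f"wall width: {wall_w:.0f} in, field of view: {fov_deg:.0f} deg")
print(f"~{px_per_deg:.0f} px/deg vs ~60 px/deg for 20/20 vision")
```

At this distance the wall spans roughly 110 degrees, well into peripheral vision, and even 720p delivers only about 35 pixels per degree, which is why standard definition would look visibly soft on such a setup.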

In its simplest form, the above setup will allow people in different locations to sit around a shared table as if they were all in the same room. Each plasma display can show two people life size. When the displays are positioned five or six feet from participants at the local site, the remote participants occupy the same space as they would if physically present. During the project, this setup will be used for business meetings and small group teaching in Music, English as a Second Language and Medicine. It will also be used for music performance teaching by removing the shared table and instead creating a shared rehearsal room for small ensembles and solo instruments.

Small Venue Applications

If small group medical teaching between McGill and UBC proves successful, the system could be expanded to other sites in British Columbia after the end of this project. British Columbia’s only medical school, at the University of British Columbia, is being expanded to the University of Victoria and the University of Northern British Columbia. It is very difficult to deploy all of the required teaching staff in a short space of time, and high quality videoconferencing among the three sites may prove a viable alternative for small group teaching.

In addition to the meetings and small group teaching sessions, time will be set aside for students to use the shared space as a common room and “drop in” to chat with students at the other institution. One important question is whether the setup can support multiple simultaneous conversations, as would normally take place in a common room. Even if this does not work well, the plan is to let groups of deaf students at both institutions use the space in the same way, since audio will not be necessary for them. Deaf students at major universities often feel very isolated, and it will be interesting to see whether this technology can play a major role in ameliorating that situation.

The idea of a panoramic display and support for multiple simultaneous conversations is also important for industrial applications where meetings are very lengthy and participants often want to have “sidebar” discussions with a participant at the other end. For example, NASA conducts such meetings and has been looking for a solution that would provide this type of functionality.

Large Venue Applications

The far end could instead be a concert hall or sports stadium. By engaging the peripheral vision of those seated in front of the central display, this arrangement should greatly enhance the feeling of attending a live performance on a wide stage, such as a symphony concert, opera, dance or play, where the view would match that seen by the audience in the hall. Similarly, the system could be used for sporting events such as a tennis match.

One application of the technology may be to allow world class experts, such as conductors, directors and coaches in remote locations, to attend auditions, rehearsals and practice sessions and offer advice on improving performance. Situations like these, requiring live two way interaction, are among the most compelling uses for ultra-videoconferencing over high bandwidth networks. There has been interest from York University in the United Kingdom and from British Telecom in using very high quality videoconferencing for the coaching of entire orchestras by remotely located experts, as part of the BT Music On-line project.

If the setup does create a compelling sense of shared space, how important is that to those involved in the meetings, small group teaching and performances? It is therefore important to have real world users with broad experience in their professions who can comment on how it affects what they do. Of course, it is possible that the planned setup will not create a compelling sense of shared space. One goal of the research is to manipulate the available visual display variables (e.g. display size, orientation, frame rate and resolution) to see how important a role each plays in creating the sense of a shared space.

High Definition Video and Bandwidth Requirements

The need for large screens located only a few feet in front of the viewer means that imperfections in the image will be far more obvious than they would be if the display were smaller and/or further away. In addition, the super wide display (48:9 aspect ratio versus 16:9 for a conventional wide screen) triples the amount of video data required. Therefore high definition video must be used. To match the progressive scan of the plasma displays, the present intention is to use the 720p60 high definition video standard and transmit HD-SDI, the high definition serial digital transmission standard used by television broadcast networks. This has the added benefit of allowing the commercial television industry to use the software developed by the project for transmitting their standard signals over high bandwidth IP networks.

The HD-SDI bandwidth requirement is 1.5 Gbps uncompressed. These signals are usually highly compressed for transmission and decompressed at the far end. The compression-decompression process introduces artifacts and latency (delay). These are a major problem for applications such as this one, where the viewer is seated close to large displays and the events are live with two way interaction between the two sites, rather than passive viewing of pre-recorded material. Experiments will be done to determine how much light compression can be applied without causing noticeable artifacts or latency, but it is anticipated that the bandwidth requirement will be 1 Gbps per video stream plus network overhead. Solutions requiring one and three video streams will be investigated (see Video Capture below). This includes exploration of multiple video stream synchronization methods. Where three streams are used, the total bandwidth will be over 3 Gbps.
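These figures can be verified with back-of-the-envelope arithmetic. The sketch below assumes 10-bit 4:2:2 sampling (standard for SDI and HD-SDI) and the 1650 × 750 total-sample raster that SMPTE specifies for 720p60; the 48:9 wall is exactly three 16:9 rasters, which is where the tripling of the data comes from.

```python
# Back-of-the-envelope video data rates, assuming 10-bit 4:2:2 sampling
# (20 bits per pixel: 10 bits luma plus 10 bits shared chroma).
BITS_PER_PIXEL = 20

# 720p60: 1280x720 active pixels; 1650x750 total samples per frame.
active_720p60 = 1280 * 720 * 60 * BITS_PER_PIXEL   # active payload
serial_720p60 = 1650 * 750 * 60 * BITS_PER_PIXEL   # full HD-SDI serial rate

print(f"720p60 active payload: {active_720p60/1e9:.2f} Gbps")   # ~1.11 Gbps
print(f"HD-SDI serial rate:    {serial_720p60/1e9:.3f} Gbps")   # 1.485 Gbps

# Three lightly compressed streams at ~1 Gbps each, before network overhead.
print(f"three-stream total:    ~{3 * 1.0:.0f} Gbps plus overhead")
```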

Development Work – Existing Software

The project will build on the transmission software development work already done as part of the Canarie ANAST project, McGill Advanced Learnware Network. That project developed software capable of transmitting standard definition SDI video over IP networks. Compared with the high definition HD-SDI transmission format discussed above, which carries 720 lines of progressive video at 1.5 Gbps, the SDI format carries 480 lines of interlaced video at 270 Mbps. The existing software must be developed further to support high definition video.

The exploration of light video compression techniques mentioned above will also be an important part of the work. This includes sensing when no one is in the drop-in common room, or when only one person occupies a small part of the display, so that the amount of video being transmitted can be reduced.
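One simple way to implement that kind of occupancy sensing is frame differencing: compare successive frames and reduce transmission when nothing is changing. The sketch below is a minimal illustration of the idea; the thresholds and the decision to decimate idle frames are assumptions for illustration, not the project's chosen method.

```python
import numpy as np

def scene_is_active(prev_frame, curr_frame,
                    pixel_thresh=12, changed_frac=0.01):
    """Return True if enough pixels changed between consecutive frames.

    prev_frame, curr_frame: HxW (or HxWx3) uint8 arrays.
    pixel_thresh: per-pixel difference treated as real change (assumed).
    changed_frac: fraction of changed pixels counting as activity (assumed).
    """
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > pixel_thresh).mean() > changed_frac

def next_frame_to_send(prev_frame, curr_frame):
    # Hypothetical transmit policy: full quality only when the room is live.
    if scene_is_active(prev_frame, curr_frame):
        return curr_frame            # occupied: send at full rate
    return curr_frame[::4, ::4]      # idle room: heavily decimated placeholder
```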

The existing software only supports unicast transmission of standard definition video and audio between two sites. It will be expanded to support multicast transmission of standard definition (SDI) video and audio among multiple sites, a capability frequently requested by current users. This implies multiple displays – one for each site – so the three-screen-wide setup described above may be suitable for multiple sites. This could presumably then be expanded to multicast of high definition video using multiple displays, although that is not a task of the current project, since there is only sufficient funding for high definition equipment at two sites, not three. We intend to use the Communications Research Centre BADLAB as a standard definition SDI multicast test site for our research.
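For reference, joining an IP multicast group is straightforward at the socket level; the heavy lifting is in the network's multicast routing and in the media handling above it. A minimal receiver sketch follows, with a hypothetical group address and port:

```python
import socket
import struct

GROUP = "239.192.0.1"   # hypothetical multicast group
PORT = 5004             # hypothetical port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
sock.bind(("", PORT))

# Ask the kernel (and upstream routers, via IGMP) for this group's traffic.
mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)

while True:
    packet, sender = sock.recvfrom(65536)
    # hand the payload to the decoder for the display assigned to `sender`
```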

Work will also be done on layered coding for SDI video transmission. This will allow multicast users to receive as many data channels as they can support, obtaining maximum quality in areas they designate as highly important and progressively reduced quality in other areas as their bandwidth decreases. The intention is then to expand this capability to high definition video, which will allow users to join high definition transmissions even if they do not have the bandwidth and hardware to receive and display the full high definition signal.
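One common way to realize layered coding over multicast is to publish each layer on its own multicast group, so that every receiver subscribes only to the layers its bandwidth allows. The sketch below illustrates that receiver-driven pattern; the layer rates and group addresses are hypothetical, not the project's design.

```python
import socket
import struct

# Hypothetical layers: (multicast group, rate in Mbps). Layer 0 is the base.
LAYERS = [
    ("239.192.1.0", 30),    # base layer: low-resolution picture
    ("239.192.1.1", 90),    # enhancement 1: full SDI resolution
    ("239.192.1.2", 150),   # enhancement 2: region-of-interest detail
]
PORT = 5004  # hypothetical

def join_affordable_layers(budget_mbps):
    """Join layers in order until the bandwidth budget is exhausted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    spent = 0
    for group, rate in LAYERS:
        if spent + rate > budget_mbps:
            break
        mreq = struct.pack("4sl", socket.inet_aton(group), socket.INADDR_ANY)
        sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
        spent += rate
    return sock, spent

sock, rate = join_affordable_layers(budget_mbps=150)  # base + enhancement 1
```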

The existing software was also used to transmit 12 channels of 24-bit, 96 kHz high resolution audio for music applications. To our knowledge, this is the only software in the world with this capability. It led to an approach from the producers of the Lord of the Rings series of films with a view to using it for future projects, since music production involves collaboration between individuals in different countries.
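The raw payload for that audio configuration is modest compared with the video, as a quick calculation shows (packet and framing overhead, which are implementation specific, are ignored):

```python
# Raw payload for 12 channels of 24-bit, 96 kHz PCM audio.
channels, bit_depth, sample_rate = 12, 24, 96_000
audio_bps = channels * bit_depth * sample_rate
print(f"{audio_bps/1e6:.2f} Mbps")   # ~27.65 Mbps before network overhead
```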

Audio Capture

In applications involving communication of music, the sense of shared spaces will be augmented by the addition of superior sound projected from a comprehensive multichannel high resolution system. The goal will be to capture, transmit and render the large dynamic and spatial range of sound necessary to provide a sense of immersive presence at the remote site. This will be done using multiple loudspeakers distributed in vertical and horizontal planes, supplied by independent audio channels, to project sound from all perceptually valid directions.

Video Capture

We intend to explore three different approaches to capturing the super wide screen video necessary for the three displays. One is to use a single high definition camera with a wide angle lens, transmit a single high definition video stream and use line doubling at the far end to spread it over three screens. This will provide lower resolution, but no alignment problems.
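A crude illustration of this single-camera approach: carve the wide frame into three panels and scale each up by pixel repetition, in the spirit of the line doubling mentioned above. A real system would use better interpolation; the sketch only shows why resolution drops, since each display is fed from roughly a third of the 1280-pixel source width.

```python
import numpy as np

def split_and_double(frame):
    """Split one HD frame across three displays by pixel repetition.

    frame: HxWx3 uint8 array from a single wide-angle HD camera.
    Returns three panels, each upscaled 2x in both dimensions, so every
    display is fed from only ~W/3 original columns (hence the softer image).
    """
    panels = np.array_split(frame, 3, axis=1)          # left / centre / right
    return [p.repeat(2, axis=0).repeat(2, axis=1) for p in panels]

# e.g. a 720p frame yields three ~1440x854 panels from ~427 source columns each
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
left, centre, right = split_and_double(frame)
```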

The second is to use three high definition cameras with standard lenses, transmit three streams and feed them to individual displays. This will provide higher resolution, but may cause alignment problems. We will investigate software techniques to reduce the alignment (and overlapping view frustum) problem.

The third is to explore the possibility of using a lens system developed in Japan by MegaVision Corporation that uses a single wide angle lens to feed three high definition cameras. While the loan is not yet confirmed, the researchers met in Tokyo with the President of MegaVision, who agreed to facilitate the loan of the lens system, which is owned by the Japanese government. The Canadian embassy in Tokyo is assisting with the arrangements.

Latency

Keeping latency to a minimum requires careful experimentation with hardware components, particularly cameras, video capture cards, format converters and displays. Although there are existing components that will do what we need, we intend to spend considerable time working with equipment manufacturers to see if particular combinations of components or hardware modifications will reduce latency.

Given the distance between McGill in Montreal and UBC in Vancouver, there will be some network latency, in addition to video hardware latency, that cannot be overcome. While likely small, this delay creates an audio echo, and some form of echo concealment must be used. This is not a problem for applications like meetings and small group teaching, because there is only one audio channel and existing echo cancellation hardware solves the problem. However, such equipment does not work with music, which involves high resolution multichannel audio. Some acoustic and DSP-based echo suppression will be necessary. The plan is to evaluate what is being done elsewhere on multichannel echo cancellation and then arrange to test whichever system seems the most promising, although we cannot guarantee that a practical solution will be found.
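The irreducible part of that latency is signal propagation. Assuming a fiber route of roughly 4,000 km between Montreal and Vancouver (the great-circle distance is about 3,700 km, and real routes are longer) and light travelling at about two thirds of c in fiber:

```python
# Lower bound on Montreal-Vancouver network latency from propagation alone.
route_km = 4_000          # assumed fiber route length
speed_km_s = 200_000      # ~2/3 of c, typical for light in optical fiber

one_way_ms = route_km / speed_km_s * 1000
print(f"one way: ~{one_way_ms:.0f} ms, round trip: ~{2*one_way_ms:.0f} ms")
# ~20 ms one way, ~40 ms round trip, before any switching or codec delay
```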

Technical Evaluation

In addition to the technical comparisons of various solutions to the problems mentioned above, there will be a formal evaluation of the difference between standard definition and high definition video on large plasma displays, which will include photos illustrating the differences from a variety of viewing distances. There will also be a formal evaluation of the difference between regular research networks and lightpath enabled networks for this type of application, comparing variables such as reliability, throughput and latency. If possible, this will be done using both types of network simultaneously for transmissions from identical single cameras.

Users’ Perceptions and Responses

An important part of the work will be the evaluation of user response to the technology. Do they indeed feel that they are sharing a space with the remote participants when the latter appear life size and in close proximity? Will they notice the difference if we reduce the resolution of the image from high definition to standard definition? How does it compare to conventional videoconferencing? Does the difference between standard and high definition videoconferencing matter more as the length of the session increases? Is there more of a difference for group discussions than for one-on-one conversations?

Toward the end of the project, senior executives in both the public and private sectors will be invited to try using the technology to meet with colleagues at the far end. Representatives from the cultural industries will be invited to observe the technology being used for performances. Do they find the shared space technology useful? How do they feel it could impact social outcomes in their field?

In the previous ANAST projects, formal evaluation of users’ perceptions and responses was done by Adam Finkelstein using interviews and specially designed questionnaires. He will do so again for this project, but in addition will explore the idea that the feeling of "being there," or presence as the literature defines it, is highly dependent on the individual: the more suggestible a person is, the more likely they are to "believe" that the external, artificial "shared space" is real. Presence has been linked to learning, but there are specific situations where both presence and absence in a virtual learning environment can be helpful for success. Using methodology to measure presence, suggestibility and cognitive style, he will attempt to relate the three.

Responsibilities

The Lead Contractor is McGill University. The partner organizations are the University of British Columbia and BCnet. Both universities have many years of experience as major international research organizations and all of the expertise required to complete the proposed research successfully. The senior researchers have international reputations and many years of experience in their fields.

The software development work will be done at McGill as will initial testing of high definition camera systems and video capture cards. Thereafter, transmission and latency testing will be done between McGill and UBC. Once a working prototype is available, it will be used in actual remote small group teaching situations in English as a Second Language, Medicine and Music. Medical teaching sessions will be organized by UBC and the English as a Second Language and Music ones by McGill. Remote coaching for a sports event and/or stage performance will be organized during planning of the network infrastructure installation at both universities since it will depend on which venues have easiest access to high bandwidth networks.

The Communications Research Centre BADLAB will be responsible for the lightpath setup and tear down software. It will work closely with the software development team at McGill. Each site will be responsible for its own network infrastructure. BCnet will have prime responsibility for network coordination and for ensuring that the End-to-End Lightpath hardware required at McGill and UBC is properly specified, installed and tested.

McGill Participation

The McGill team expects to use the research results as part of its ongoing research in very high quality videoconferencing as well as for teaching applications where such quality is necessary, such as Medicine and Music. The McGill team is a collaboration of Instructional Multimedia Services, which is responsible for development of the video components; the Centre for Intelligent Machines of the Faculty of Engineering, which is responsible for development of the transmission software; and the Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) of the Faculty of Music, which is responsible for development of high resolution multichannel audio.

Access to Music teaching and performances will be arranged through the direct participation of the Faculty of Music in the project. A number of professors participated in the earlier Canarie ANAST projects, and they have identified colleagues they know at UBC who are likely to participate. English as a Second Language teaching will most likely be done by McGill professors. The drop-in common room for deaf students will be organized by Jamie MacDougall, a McGill Psychology professor and world recognized expert on services for the deaf.

Support for CA*net 4 networking will be provided by McGill’s Network and Communication Services in collaboration with RISQ.

Evaluation of user perceptions and response will be led by Adam Finkelstein, a staff member of Instructional Multimedia Services at McGill and an expert in evaluation methodology, particularly with reference to human-computer interfaces. He designed the questionnaires used in the previous McGill Canarie ANAST projects.

UBC Participation

The UBC team expects to use the research results to develop an ongoing research program in very high quality videoconferencing as well as for teaching applications. The UBC team is based in Telestudios, which provides video production and videoconferencing services. Networking support will be provided by ITServices. Faculty of Medicine support will be provided by Tony Voon, Director of their Media Group.

There is already an indication of willingness to participate in the project from the Faculty of Music at UBC. The Senior Associate Dean of Medicine is in the process of identifying professors at UBC who already have colleagues at McGill with whom they would be interested in collaborating on small group teaching.

BCnet Participation

BCnet will coordinate close collaboration with the Canarie CA*net 4 team and RISQ, which has already sent a message of support for the project. BCnet will ensure that the End-to-End Lightpath hardware required at McGill and UBC is properly specified, installed and tested. Overall supervision will be provided by Michael Hrybyk, President and CEO of BCnet. In addition, an Applications Project Leader has already been assigned to the project.

CRC BADLAB and National Research Council / National Arts Centre Participation

CRC will provide support for our use of their lightpath setup and tear down software. It will also act as a multicast test site. CRC participation will be led by Michel Savoie and John Spence. NRC / NAC participation will be led by Martin Brooks.

Dissemination of Results

It is expected that the results of the research will be published in academic journals and disseminated at international conferences as well as Canarie workshops and the project web site. Previous ANAST project results were widely disseminated in this manner and also received extensive coverage in the popular press.

Technology Architecture and Implementation Plan

It is anticipated that the bandwidth requirement will be 1 Gbps per video stream plus network overhead. Where three streams are used, the total bandwidth requirement will be over 3 Gbps. We will begin with SDI transmission at 270 Mbps and then move to HD-SDI.

A major objective of the project is to compare conventional use of CA*net 4 to the use of lightpaths. Almost no serious testing has been conducted to date with extremely high bandwidth transmissions of latency-sensitive media over networks such as CA*net 4. The use of lightpath technology promises to be a more robust means of ensuring minimal latency. We will be comparing latency, jitter, reliability and ease of use.
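A minimal way to compare the two paths is to run timestamped UDP probes over each and compare round-trip time and jitter. The sketch below assumes a simple UDP echo reflector running at the far end (the host and port are hypothetical); jitter is computed, in the spirit of RFC 3550, as the mean absolute difference between successive round-trip times, and packet-loss handling is omitted for brevity.

```python
import socket
import statistics
import struct
import time

FAR_END = ("reflector.example.org", 7)   # hypothetical UDP echo service

def measure(path_addr, count=100):
    """Return (mean RTT in ms, jitter in ms) over `count` UDP probes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(2.0)
    rtts = []
    for seq in range(count):
        sent = time.monotonic()
        sock.sendto(struct.pack("!Id", seq, sent), path_addr)
        sock.recvfrom(64)                       # echoed probe
        rtts.append((time.monotonic() - sent) * 1000)
    jitter = statistics.mean(abs(a - b) for a, b in zip(rtts, rtts[1:]))
    return statistics.mean(rtts), jitter

rtt, jitter = measure(FAR_END)
print(f"RTT {rtt:.1f} ms, jitter {jitter:.2f} ms")
```

Running the same probes over the routed CA*net 4 path and over a lightpath would give directly comparable numbers for the latency and jitter variables named above.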

Networking support for the project will be provided primarily by BCnet with assistance from RISQ, Canarie and the networking staff at McGill and UBC. BCnet will have prime responsibility for ensuring that the End-to-End Lightpath hardware required at McGill and UBC is properly specified, installed and tested. We intend to use the Communications Research Centre BADLAB’s software for setting up and tearing down lightpaths.

McGill’s transmission software is already being supported by McGill and made available for free non-commercial use throughout the world. This will continue as the software is enhanced during this project to support high definition video and multicast transmission of standard definition video. Support will continue for at least a year after the conclusion of this project. It is anticipated that later enhancements will include multicast transmission of high definition video.