// The “ConTatto” Software //

conTatto: an Interactive System for Active Music and Dance Therapy of the Autistic Spectrum Disorder and Psycho-Motorial Rehabilitation

In the context of the ConTatto project, commissioned by MousikEssere, the Signal and Images Lab of ISTI-CNR developed a system for active music and dance therapy to be used with young children affected by autism spectrum disorders (ASD). The system uses a video camera, a FireWire digitizer board, and a Macintosh computer running custom software. During therapy sessions the patient moves freely inside an empty room. The software extracts features from the human figure, such as spatial position and arm and leg angles. Using the software GUI, the operator can link these features to sounds synthesized in real time, following the therapy plan. System latency is very low thanks to the use of native Mac OS X libraries (Core Image, Core Audio). The resulting augmented interaction with the environment could help improve contact with reality in young autistic subjects.

1. Introduction

In recent years, sensor-based interactive systems that support the treatment of learning difficulties and disabilities in children have appeared in the specialized literature [1][2]. These systems, like the popular SoundBeam, generally consist of sensors connected to a computer running special software that reacts to the sensor data with multimedia stimuli.

The general philosophy of these systems is based on the idea that even profoundly physically or learning impaired individuals can become expressive and communicative using music and sound.

The sense of control which these systems provide can be a powerful motivator for subjects with limited interaction with reality.

Our research department has a long tradition of developing gesture interfaces for controlling multimedia generation, although so far these have been targeted at new media art.

MousikEssere, a society of music therapists based in Rome, asked the ISTI department to design and develop a system for medical research that follows the trends cited above. While systems like SoundBeam rely entirely on ultrasonic sensors [3], our system is based mostly on real-time video processing techniques, although an additional set of sensors (e.g. infrared or ultrasonic) can easily be used.

The project was partially sponsored by a local bank foundation, the Cassa di Risparmio di Lucca, and by MousikEssere.

2. Autism spectrum disorder

Autism is a brain development disorder characterized by impaired social interaction and communication. It appears in the first years of life and arrests affective development. It compromises social interaction and language expression, and often leads to restricted and repetitive behavior. Autism affects about 4 children in 10,000, although the number of people known to have autism has increased dramatically since the 1980s (figure 1). Autism has a genetic basis, but a complete explanation of its causes is still unknown. An exhaustive description of this disorder in medical terms is beyond the scope of this document.

Figure 1. Autism diffusion.

2.1 Existing Therapies

Studies have shown that music therapy has a significant, positive influence when used to treat autistic individuals [4]. Participating in music therapy gives autistic individuals the opportunity to experience non-threatening outside stimulation, as they do not engage in direct human contact. Music is a more universal language than spoken language, and it allows a more instinctive form of communication.

Our system follows the Expressive and Relational Rehabilitation methodology created and tested in practice by Grazia Ragone, an expert in treating children with ASD. It is active and interactive, and tailored to each child: by providing an augmented interaction with the environment, it tries to draw the subject out of his pathological isolation. The child is guided to respond sympathetically to the pulse of the sound and to the quality of the therapist’s movements. In this way the child is stimulated to imitate the therapist through the empathy developed in the ConTatto setting. The therapist also moves the child’s body in gentle, synchronized motions, attracting the child’s attention so as to send a message to the child’s brain to reorganize its logic for more optimal movement. This methodology is entirely non-invasive: it works without the use of equipment, force or constraint. Children are very receptive to the hands-on approach of Expressive and Relational Rehabilitation, and quite often they begin to initiate movements on their own.

3. System structure

The system is installed in a dedicated empty room, with most of the surfaces (walls, floor) covered in wood. The goal is to build a warm space that in some way recalls the prenatal environment. All system parts such as cables and plugs are carefully hidden, as they are potential sources of distraction for autistic subjects. The ambient light is gentle and indirect, partly to avoid shadows that can affect the precision of the motion detection.

Figure 2. System structure.

3.1 Hardware

The whole system is based on an Apple Macintosh computer (figure 2) running the latest version of Mac OS X. The video camera is connected to the computer through a FireWire digitizer, the Imaging Source DFG1394. This is a very fast digitizer, which allows a latency of only 1 frame in the video processing path. For audio output we decided to use the Macintosh internal sound card: its quality is superior to that of an average PC and more than sufficient for our purposes. A pair of TASCAM amplified loudspeakers completes the basic system. To use additional sensors (infrared, ultrasonic) we could add a simple USB board that digitizes analog control signals and translates them into standard MIDI messages, which are easy to manage inside the application.

3.2 SW Platform

We used the Mac OS platform for its reliability in real-time multimedia applications, thanks to its very robust frameworks: the Core Audio and Core Image libraries permit very fast processing without glitches and underruns.

4. The interactive application

The software is a stand-alone application, structured in separate modules following a strict C++ paradigm (figure 3). The most important modules are the Sequence grabber, which manages the stream of video frames coming from the video digitizer; the Gesture tracking module, which analyzes the frames and extracts the gesture parameters; and the Mapper, responsible for the mapping between detected gesture parameters and the generated sounds.

4.1 GUI

Following the music therapists’ specifications, we implemented the application’s graphical user interface as a single window, with sub-panels for specific topics (figure 4). In this way every aspect of the system setup is quickly accessible to the operators during the music therapy sessions. The upper area of the GUI contains the video preview and the monitor of the detected parameters, while the lower one is used to set up the mapping between parameters and generated sounds.

4.2 Image elaboration

The incoming frame grabbed from the digitizer is processed in several steps, in one of two alternative modes (figure 5): area based or edge based. In the first mode the segmentation is performed on full areas, while in the edge-based mode it is performed on the edges present in the grabbed frame. In both modes the image is first smoothed with a Gaussian filter (computed quickly thanks to the Core Image library). In the edge mode the image is then also processed with an edge detection filter.
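The two preprocessing steps can be illustrated with a minimal sketch. It does not reproduce the Core Image filters used by the application: the 3x3 Gaussian kernel, the Sobel operator chosen as edge detector, and all function names are assumptions made only for illustration, with the frame represented as a list of rows of grey levels.

```python
# Illustrative sketch only: the real application relies on Core Image filters.
# The 3x3 kernels and the Sobel edge detector are assumptions of this example.

GAUSS_3X3 = [[1, 2, 1],
             [2, 4, 2],
             [1, 2, 1]]  # weights sum to 16

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def apply_kernel(image, kernel):
    """Apply a 3x3 kernel to a grey-level image, clamping at the borders."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    yy = min(max(y + ky - 1, 0), h - 1)
                    xx = min(max(x + kx - 1, 0), w - 1)
                    acc += kernel[ky][kx] * image[yy][xx]
            out[y][x] = acc
    return out


def gaussian_smooth(image):
    """Smooth the image with the normalized 3x3 Gaussian kernel."""
    return [[v / 16.0 for v in row] for row in apply_kernel(image, GAUSS_3X3)]


def edge_magnitude(image):
    """Approximate gradient magnitude as |gx| + |gy| (edge-based mode)."""
    gx = apply_kernel(image, SOBEL_X)
    gy = apply_kernel(image, SOBEL_Y)
    return [[abs(a) + abs(b) for a, b in zip(rx, ry)]
            for rx, ry in zip(gx, gy)]
```

In the area-based mode only `gaussian_smooth` would be applied; in the edge-based mode its output would be passed on to `edge_magnitude`.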

Then we use a background subtraction technique to isolate the human figure from the environment. By pressing the “Store background” button (with no human subjects in front of the camera) we can store the background, area or edge based. When the figure is present in front of the camera, the incoming frames are compared with the stored background using a dynamic threshold, yielding a binary matrix. The average threshold used in this operation can be tuned by the operator with a simple slider. This sensitivity does not need to be set again as long as the ambient light does not change.
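A minimal sketch of this step follows. It assumes grey-level frames stored as lists of rows and uses a single fixed threshold standing in for the slider value; the dynamic adaptation of the threshold is not specified in the text and is therefore left out. The function names are hypothetical.

```python
def store_background(frame):
    """Keep a private copy of the empty-room frame ("Store background")."""
    return [row[:] for row in frame]


def subtract_background(frame, background, threshold):
    """Return a binary matrix: 1 where the frame differs from the background.

    `threshold` plays the role of the operator's sensitivity slider.
    A fixed value is an assumption of this sketch.
    """
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]
```

For example, with the background `[[10, 10], [10, 10]]`, the frame `[[10, 50], [12, 10]]` and a threshold of 5, only the pixel that changed from 10 to 50 is marked as foreground.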

Finally we apply an algorithm for removing unconnected small areas from the matrix, usually generated by image noise. The final binary image is then ready to be processed by the gesture tracking algorithm.
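One common way to perform this kind of clean-up is connected-component filtering: foreground regions smaller than a minimum size are discarded. The text does not say which algorithm the application uses, so the sketch below, with its 4-connectivity and `min_size` parameter, is an assumption.

```python
from collections import deque


def remove_small_regions(mask, min_size):
    """Drop 4-connected foreground regions with fewer than `min_size` pixels."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill collects one connected region.
            region = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                               (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) >= min_size:
                for ry, rx in region:
                    out[ry][rx] = 1
    return out
```

With `min_size=3`, an isolated noise pixel is removed while a 2x2 block of foreground pixels is kept.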

4.3 Gesture tracking algorithm

Starting from the binary raster matrix, we apply an algorithm to detect a set of gesture parameters. This heuristic algorithm assumes that the segmented image produced by the image elaboration step is a human figure, and tries to extract some features from it. The process is based on a simplified model of the human figure (figure 6). Additional models for single parts of the body (face, hands) are under development and could be used for more “zoomed” versions of the system. At the moment we can rapidly detect the positions of the head, the arms and the legs, and their evolution over time. Starting from these five time-dependent positions we decided to compute the following parameters:

- Right Arm angle

- Left Arm angle

- Right Leg angle

- Left Leg angle

- Torso angle

- Right leg speed

- Left leg speed

- Barycenter X

- Barycenter Y

- Distance

Their names are self-explanatory. We also compute these two additional parameters:

- Global activity

- Crest factor

The first is an indicator of the overall quantity of movement (0.0 if the subject is standing still, with no movements), while the second is an indication of the concavity of the posture (0.0 means that the subject is standing with the legs together and the arms held against the body).
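A few of these parameters can be sketched from the binary matrix and from detected positions. The formulas below are assumptions, since the text does not give the exact definitions: the barycenter is taken as the centroid of the foreground pixels, a limb angle is measured from the downward vertical, and global activity is taken as the fraction of pixels that changed between two consecutive binary frames. The crest factor is left out because its formula is not described.

```python
import math


def barycenter(mask):
    """Centroid (x, y) of foreground pixels, scaled to 0..1, or None if empty."""
    h, w = len(mask), len(mask[0])
    sum_x = sum_y = count = 0
    for y, row in enumerate(mask):
        for x, value in enumerate(row):
            if value:
                sum_x += x
                sum_y += y
                count += 1
    if count == 0:
        return None
    return (sum_x / count / max(w - 1, 1), sum_y / count / max(h - 1, 1))


def limb_angle(joint, tip):
    """Angle in degrees between a limb and the downward vertical.

    Points are (x, y) in image coordinates (y grows downwards), so a limb
    hanging straight down gives 0 and a horizontal limb gives +/-90.
    """
    dx, dy = tip[0] - joint[0], tip[1] - joint[1]
    return math.degrees(math.atan2(dx, dy))


def global_activity(previous_mask, current_mask):
    """Fraction of pixels that changed between two binary frames (0.0 = still)."""
    total = changed = 0
    for p_row, c_row in zip(previous_mask, current_mask):
        for p, c in zip(p_row, c_row):
            total += 1
            changed += p != c
    return changed / total if total else 0.0
```

Speeds such as the leg speed could then be obtained as the change of a tracked position between consecutive frames divided by the frame interval.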

4.4 Sound Generation and Typology

Sound generation is based on the Mac OS Core Audio library. We used the Audio Unit API to build an audio graph: 4 instances of the DLS (Downloadable Sounds) synthesizer are mixed together into the final musical signal (figure 7). These synthesizers produce sounds according to standard MIDI messages received on their virtual input ports. We added two digital effects (echo and reverberation) to the final mix: for each synthesizer we can control the portion of its signal to be sent to these effects.
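Since the synthesizers are driven by standard MIDI messages, the Mapper’s job of turning a gesture parameter into sound control can be sketched as building raw MIDI bytes. The status bytes (0x90 for note on, 0xB0 for control change) come from the MIDI standard; the function names, the note range and the linear mapping are assumptions of this example and not the application’s actual mapping.

```python
def _clamp01(value):
    """Restrict a value to the 0..1 range."""
    return max(0.0, min(1.0, value))


def control_change(channel, controller, value01):
    """Build a MIDI control change from a parameter in the 0..1 range."""
    value = round(_clamp01(value01) * 127)
    return bytes([0xB0 | (channel & 0x0F), controller & 0x7F, value])


def note_on(channel, value01, low_note=48, high_note=84, velocity=100):
    """Map a 0..1 parameter linearly onto a note range (hypothetical choice)."""
    note = low_note + round(_clamp01(value01) * (high_note - low_note))
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])
```

For instance, `control_change(0, 7, arm_angle)` would let a normalized arm angle drive controller 7 (channel volume) on the first channel, while `note_on(0, barycenter_x)` would translate horizontal position into pitch.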

Each synthesizer module can load a bank of sounds (in the standard DLS or SF2 format) from the set installed in the system. Users can add their own sound banks, including sounds they created themselves, to the system. It is also possible to specify a background audio file, to be played together with the controlled sounds.

The operator can choose different kinds of sound depending on the child, usually starting with water sounds, typical of the intrauterine environment, which so far have shown great success with autistic children in the first phase of the intervention. The sounds are then changed and adapted to the particular goal that the therapist considers appropriate for the child.

4.6 Parameters summary

The detected parameters are shown in real time as a set of horizontal bars (figure 9). The displayed values are normalized between 0 and 1; we found that this makes it easy to understand their role in a link.
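The normalization can be sketched as a linear rescaling with clamping. The raw ranges (for example 0 to 180 degrees for an angle) and the text rendering of a bar are hypothetical and serve only to illustrate the idea.

```python
def normalize(value, low, high):
    """Rescale a raw value from [low, high] to [0, 1], clamping outliers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def render_bar(value01, width=20):
    """Draw a normalized value as a simple text bar."""
    filled = round(value01 * width)
    return "#" * filled + "-" * (width - filled)
```

With these assumptions, an arm angle of 90 degrees in a 0 to 180 range normalizes to 0.5 and fills half of the bar.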

At the end of a session, pressing the Statistics button shows simple statistics of the gesture parameters (currently the average and the variance). These data, together with some other useful information, can be saved in a text file for further external analysis.
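A minimal sketch of such a session summary is given below. The text does not specify whether the variance is computed over the population or over a sample, nor the layout of the text file; the population variance and the tab-separated format used here are assumptions, as are the function and parameter names.

```python
import statistics


def summarize(samples):
    """Return {parameter: (mean, population variance)} for recorded values."""
    return {name: (statistics.mean(values), statistics.pvariance(values))
            for name, values in samples.items() if values}


def format_report(summary):
    """Render the summary as tab-separated text, one parameter per line."""
    lines = ["parameter\tmean\tvariance"]
    for name in sorted(summary):
        mean, variance = summary[name]
        lines.append(f"{name}\t{mean:.4f}\t{variance:.4f}")
    return "\n".join(lines) + "\n"
```

The string returned by `format_report` can be written to a file with the usual `open(path, "w")` call and loaded into a spreadsheet or statistics package for external analysis.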

Figure 9. Parameters visualization and statistics

5. Further developments

We are planning to add a database to the control software in order to store the patients’ data, including all parameter statistics for the therapy sessions. With these data the therapists would like to investigate the relationship between the evolution of a patient’s gestures and his autistic disorder. We are also developing new models for the gesture recognition module, specialized for single parts of the body. For example, we would like to make it possible to point the video camera only at the patient’s face, detecting movements and positions of eyes and mouth.

A 3D version of the system is also under study. With it, the subject would no longer need to stand facing the camera but could rotate around the body axis, with the system still capturing the correct arm and leg angles.

Currently, children are benefiting from the software and methodology in the UK as well as in Italy, where the system has been installed in a rehabilitation center in Empoli. The videos produced during the research were also shown to some of the doctors at Great Ormond Street Hospital, who appreciated their quality and innovation.

6. Conclusions

We described an interactive, computer-based system built on real-time image processing, which reacts to the movements of a human body by playing sounds. The mapping between body motion and produced sounds is easily customizable through a software interface. This system will be used to experiment with an innovative music and dance therapy technique for treating autistic children. The system was commissioned from the Signal and Images Lab of ISTI-CNR by MousikEssere, a society of music therapists, and it was partially sponsored by a local bank, Cassa di Risparmio di Lucca. Experimentation with real cases started in Italy before the end of 2009, and the first results were published during 2010. Although the system was initially conceived for treating autism spectrum disorder, we found that it could also be used successfully for other conditions, such as Alzheimer’s disease and other pathologies typical of older people.

The Contatto Software In Practice

The gestures of the child, who expresses himself through movement, produce audible sound. The environment is configured as an educational and therapeutic action space for the recovery and development of latent and emerging skills, under the “maieutic” guidance of the medical educator, who accompanies the child toward a proper relationship with the surrounding reality.

The novel aspect of ConTatto is the application of reactive systems to traditional music therapy approaches. The prototype version was developed in Pisa by ISTI-CNR during 2009-2010. This prototype, installed in a room of the institute, was used for real, successful experimentation with autistic children during spring 2010.

The Con-Tatto (meaning “with [gentle] touch” and also “contact”) project was established with the aim of innovating in the world of music therapy by integrating the humanistic and therapeutic experience contributed by the “Mousikessere” company (where therapists and musicians work together) with the advanced technology of ISTI-CNR, an institute of the Italian National Research Council in Pisa.

Gesture tracking technologies developed at ISTI-CNR, which capture gestures without applying any sensor to the body of the individual (and are thus defined as non-invasive), had so far been used only for games and art. Mousikessere ltd, in contrast, saw in these technologies the possibility of creating a therapeutic method based on the idea of transforming non-invasive gesture measurements into particular sounds. These sounds are generated and managed by the movements of the individual. ConTatto is an innovative technology created particularly for the rehabilitation of children with pervasive developmental disorders (autism, ADHD, Asperger syndrome, Rett’s, etc.) and/or motor coordination problems.

Autism presents itself as a serious disturbance of the primary relationship, with an arrest of emotional development at a very early stage. The characteristics shared by all children affected by autism are impairments in social interaction, in language, and in psychophysical behavior (also called stereotypy). Since communication, and therefore expression, is a salient area of the disorder, we intend to develop a warm, easy-going environment as a revival of the womb, where children are at their best and can develop awareness of themselves as autonomous entities in the surrounding context, as basic preparation for the social womb.

History

The project’s tests were conducted at ISTI-CNR during summer and fall 2010. The objectives of these experiments are multi-faceted and can be summarized as follows.

- To gather information about the ability of “ConTatto” to find specific features and properties of movement and posture (movement speed and frequency, angle and curvature, gravity, etc.).

- To collect quantitative data on the characteristics of stereotyped movements.

- To assess how the movement of a child with ASD can be changed by applying sound that responds to the child’s movement.

- To assess the adaptability of children with ASD to an interactive environment.

An unexpected, moving result was that, even before the end of the experiments and independently of each other, all the parents of the patients reported with surprise and satisfaction that their children’s behavior at home was changing. Parents’ representatives even requested that the experiments be converted into regular therapeutic sessions.

Materials and Methods

The sample 

A sample of 12 children with autism spectrum disorders was selected, with the permission and under the supervision of the “Stella Maris” Institute of Pisa (Italy). The children were all aged between 6 and 10 years and had some degree of impairment in the areas of narrow interests and stereotyped movements. Some of these children had already undergone acoustic therapy treatments at various institutions with satisfactory results. Data specifically related to repetitive movements and stereotypies were also collected using the Repetitive Behavior Scale. A control group of five children was selected, typically matching the experimental group in age and sex. The CBCL will then be administered to all the children of the two groups.

Experimental procedures

Two meetings of about 45 minutes each were held with every child by Grazia Ragone, a music therapist and dance therapist already experienced in the treatment of children with ASD, at the premises of ISTI-CNR in Pisa. The operators had already met the children at the IRCCS Stella Maris Institute.

The purpose of the first meeting was to familiarize the child with the environment and with the operator’s working methodology. For the first 30 minutes no sound was associated with the child’s movement. During these early stages, parameters were recorded regarding the speed and angle of movement, gait and gestures, as well as the frequency of repetitive movements and stereotypies. During the last 15 minutes an operator started to associate a standard sound set, allowing the child to become familiar with the sound experience of the environment.

The second meeting focused on assessing the child’s behavior during a 45-minute session of interactive music. The standard set of sounds used in the first meeting was used from the beginning. The aim was to assess possible changes in the parameters of interest following the introduction of sound in response to the child’s movements, and to evaluate the child’s adaptation to the situation.

OUTCOME

Cognitive skills: Depending on the presence or absence of mental retardation, each child recognized himself as the main producer of sound, with an increase in intentional actions and requests.

Social skills: Significant interactions with the therapist were established in every case.

Overall communication skills: In every case, verbal and nonverbal communication increased dramatically, as did seeking behaviors and the use of social markers (smiling, eye contact, etc.).

Motor skills: Fine and gross motor skills improved significantly in every case in terms of coordination and integration, owing to the child’s first recognition of himself as a bodily agent in space.