WO2014060995A1 - System and apparatus for the interaction between a computer and a disabled user - Google Patents
System and apparatus for the interaction between a computer and a disabled user
- Publication number
- WO2014060995A1 (PCT/IB2013/059443)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- user
- output device
- head
- video
- data
- Prior art date
- 2012-10-19
Classifications
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61F—FILTERS IMPLANTABLE INTO BLOOD VESSELS; PROSTHESES; DEVICES PROVIDING PATENCY TO, OR PREVENTING COLLAPSING OF, TUBULAR STRUCTURES OF THE BODY, e.g. STENTS; ORTHOPAEDIC, NURSING OR CONTRACEPTIVE DEVICES; FOMENTATION; TREATMENT OR PROTECTION OF EYES OR EARS; BANDAGES, DRESSINGS OR ABSORBENT PADS; FIRST-AID KITS
- A61F4/00—Methods or devices enabling patients or disabled persons to operate an apparatus or a device not forming part of the body
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/012—Head tracking input arrangements
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/013—Eye tracking input arrangements
Landscapes
- Engineering & Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Heart & Thoracic Surgery (AREA)
- Biomedical Technology (AREA)
- Vascular Medicine (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Veterinary Medicine (AREA)
- Electrophonic Musical Instruments (AREA)
- Telephonic Communication Services (AREA)
- Input Circuits Of Receivers And Coupling Of Receivers And Audio Equipment (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
A control device of an output device (4) by a disabled user unable to use his/her upper limbs is disclosed, comprising an IT processor (3) provided with processing and analysis means (5), as well as with at least one entry (1) for data input and an outlet towards said output device (4), wherein said at least one entry comprises a video detector (1), apt to frame an area around a user's head (V), said video detector (1) being connected to said entry of the IT processor (3), so as to acquire and interpret configuration data of a user's head (V), and said configuration data of the user's head (V) are transformed downstream of said processing and analysis means (5) into two axes of activity controls, while additional means are provided for the acquisition of video/audio data of at least a third axis of activity control, said activity controls being issued towards said outlet for controlling said output device (4).
Description
SYSTEM AND APPARATUS FOR THE INTERACTION BETWEEN A COMPUTER AND A DISABLED USER
DESCRIPTION
Field of the invention
The present invention relates to a system and apparatus apt to allow the interaction between a processor and a disabled user (permanent or pathological disability), in particular a user who cannot rely on the use of his/her hands and/or of his/her upper limbs.
Background of the art
As known, people with motor disabilities or with particular temporary impairments may find it difficult to use equipment and tools conceived for the average population, which instead has full use of its muscle and joint apparatuses. In particular, the so-called human-machine interfaces used in IT systems resort to various data input devices which require the use of the hands. Among these input devices traditionally fall keyboards and mice, but also other types of pointers and selectors, such as joysticks, controllers for videogames, track-balls, mouse-pads, optical-character-recognition pens, bar code readers, tablets, touch-screens, and so on. Whenever it is necessary to input more than text data or coordinate data, digital input devices are resorted to (with a digital source or through an A/D converter) which exploit technology known from before the onset of IT systems, such as video and audio inputs. Through suitable software it is then possible to extract from this type of data other information or controls for the same IT system; for example, through voice-recognition software, it is possible to extract from an audio input a series of text data (for example the transformation of speech into text in a word processor) or controls which can be performed by the IT processor.
Besides these more common devices, other more developed systems also exist, used in specific contexts, but nevertheless devised for complex functions and for users with full motor faculties. Consider, for example, the optical recognition systems used on some game consoles, such as Kinect® by Microsoft Corporation or the PlayStation Move controller by Sony Computer Entertainment Inc.
All these devices require full control at least of a hand's fingers, if not of the entire motor apparatus.
This condition prevents disabled people from accessing many of the IT systems currently in use, although the very IT systems could instead represent a significant progress for the lifestyle of these people. Consider, for example, what an improvement in communication and personal interaction could be offered not only by standard personal computers connected to the web (Internet), but also by personal digital assistants (PDAs), tablet PCs, smart-phones, gaming consoles, information kiosks and so on.
For such reason, the pressing need exists to provide an input apparatus which is easily usable also by people with motor disabilities, and which is inexpensive and easy to install.
Summary of the invention
The object of the present invention is hence that of proposing a system and an apparatus which enables a user with motor disability to interact with the IT processor (computer, console, PDA, smart-phone and other similar tools), in a simpler and more effective way than provided so far by apparatuses of this kind. Such object is achieved through the features highlighted in claim 1. The dependent claims describe preferred features of the invention.
In particular, according to a first aspect of the invention, a control system of an output device by a disabled user unable to use his/her upper limbs is provided, comprising an IT processor provided with processing and analysis means, as well as with at least one inlet for data input and with an outlet towards said output device, wherein
said at least one inlet comprises a video detector, apt to frame an area around the user's head, said video detector being connected to said inlet of the IT processor, so as to acquire and interpret configuration data of a user's head, and wherein
said configuration data of the user's head are transformed downstream of said processing and analysis means into two activity control axes, while additional means are provided for the acquisition of video/audio data of at least a third activity control axis, said activity controls being issued towards said outlet to control said output device.
According to a special aspect, said additional means are in the shape of translation means arranged between said video detector and said output device. Preferably said translation means are apt to widen the combinations of said head configuration data consisting of at least the opening/closing status of the eyes.
According to another aspect, said translation means also comprise a communication interface for remotely controlling an output device.
According to another aspect, said additional means comprise an audio detector apt to detect sounds in the area surrounding the user and to transform them into audio data to be combined with said video data to obtain said third activity control axis.
According to a further aspect of the invention, said video detector consists of any device in the group of webcam, rgb camera, time-of-flight camera, structured-light camera, multicamera, depth-map camera, IR motion capture camera with marker, motion sensing input device.
Preferably, the system furthermore comprises a permanent memory on which a database containing personalising parameters of the user resides, relative to said configuration data of a user's head and to said video/audio data of the additional means voluntarily issued by said user. Even more preferably, the permanent memory on which the database resides is of the removable type or remotely accessible type.
According to another singular aspect, with said analysis and processing means an expert apparatus is associated having an inference engine with which deductive rules are applied to the data coming from said analysis and processing means before transforming them into said activity control axes.
Advantageously, said IT processor (3) is any one among a personal computer (PC), a personal digital assistant (PDA), a tablet PC, a smart-phone, an information kiosk or a gaming console.
According to another aspect, the output device is a display on which selectable objects are displayed which may be activated through said activity control axes and to which activities/functions of the IT processor correspond. Alternatively or in addition, the output device is at least a driving motor for moving a wheelchair for disabled people.
Brief description of the drawings
Further features and advantages of the invention will be evident from the following description of the apparatus according to the invention, illustrated as a non-limiting example in the attached drawings, wherein:
fig. 1 is a diagram view of the main components which make up the control system according to the invention;
fig. 2 is a diagram and concise view which shows the type of interaction provided between the user and the IT processor of fig. 1;
fig. 3 is a diagram view which shows in a more realistic way the diagram of fig. 2;
fig. 4 is a flow chart of a first way of operation of the apparatus of fig. 1;
fig. 5 is a pictorial view exemplifying the way of operation according to a different embodiment of the present invention;
fig. 6 is a diagram view of a possible evolution of the interactive diagram of the system according to the invention; and
fig. 7 is a block diagram of the way of operation of the system according to the invention.
Detailed description of preferred embodiments
As already mentioned, the invention aims to offer a tool of human-machine interaction specifically conceived for disabled people who are unable to regularly use their motor apparatus, in particular their upper limbs. The invention relies instead on the fact that the disabled person has control over at least their facial muscles and possibly over the neck muscles (hence over the head attitude).
With reference to figure 1, an exemplifying system according to the invention comprises a video detector 1 and an audio sensor 2, both connected to respective inlets of an IT processor 3, an outlet of which is directed towards an output device, shown in the drawings as a display 4, for example a classic LCD monitor.
In the context of the present application, the term output device generically designates a device towards which the user intends to send controls which produce certain desired technical effects; typically, an output device is hence a PC monitor, but it may also be a reader of audio/video media, a Hi-Fi system, a Home Theatre system, or a more complex motorised apparatus such as a wheelchair for disabled people.
Video detector 1 is conceived to acquire moving images and send them, in digital or analog form, to a suitable inlet of processor or IT system 3.
Video detector 1 is typically a digital videocamera, for example a traditional webcam already installed on many personal computers, but also an rgb camera, a time-of-flight camera, a structured-light camera, a multicamera, a depth-map camera, an IR motion capture camera with marker, a camcorder or another device also known as a motion sensing input device. Video detector 1 must be arranged in a position suitable to acquire an area in which the user's head lies with a sufficient contour margin, so that the head always remains entirely framed by video detector 1 even when it is partly shifted laterally or upwards/downwards.
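Purely by way of illustration, such a framing check might be sketched as follows; OpenCV's stock face detector, the camera index and the 15% margin are assumptions of this sketch, not features prescribed by the invention.

```python
# Sketch: verify the user's head stays fully framed with a contour margin.
# Uses OpenCV's bundled Haar cascade; the camera index and the 15% margin
# are illustrative choices.
import cv2

MARGIN = 0.15  # required free border around the detected face, as a fraction

def head_fully_framed(frame, face_cascade):
    h, w = frame.shape[:2]
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return False  # no head detected at all
    x, y, fw, fh = faces[0]
    # The face box must keep MARGIN of the frame free on every side.
    return (x > w * MARGIN and y > h * MARGIN and
            x + fw < w * (1 - MARGIN) and y + fh < h * (1 - MARGIN))

cap = cv2.VideoCapture(0)
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
ok, frame = cap.read()
if ok and not head_fully_framed(frame, face_cascade):
    print("Reposition the camera: the head is too close to the frame edge.")
cap.release()
```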
Audio detector 2 is conceived for acquiring audio signals from the environment in the proximity thereof, for example the user's voice or other noises voluntarily caused by the user (should he/she not have control of his/her vocal apparatus). Audio sensor 2 is typically a digital microphone, possibly a directional one.
IT processor 3 may be any IT device with a processing unit (CPU) and a minimum capacity in terms of processing memory and storage memory, for running the control software programme. The processing unit (CPU) and the memory means define means 5 for the disjoined/joined analysis and processing of the acquired signals. The apparatus is therefore capable of acquiring, measuring, recognising and managing the received input data, transforming them into information useful for controlling the output device as desired by the user.
Typically it is a general-purpose personal computer (PC), possibly integrated with its own video output, but it can also be a personal digital assistant (PDA), a tablet PC, a smart-phone, a gaming console, an information kiosk and so on.
IT processor 3 has at least two inlets, for the audio and video acquisition, and an outlet towards an output device, which may be a display but also another apparatus to be controlled (for example a motorised wheelchair for disabled people). The connections with input devices 1 and 2 and output device 4 may occur by means of a cable or via electromagnetic signals (RF, WiFi, Bluetooth®, IR, ...). In addition to these input/output channels, IT processor 3 may have other traditional input devices - for example an alphanumeric keyboard, a graphic pointer, a stylus, a remote control and so on - through which further controls may be entered.
According to a first operating mode, the system according to the invention provides for acquiring the lateral and forward/backward oscillation of the head, together with sound signals, suitably combining them and deriving therefrom the controls necessary for driving output device 4.
Figs. 2 and 3 schematically show an example of how the displacement of the user's head V in a lateral direction ("roll" movement arrows) and in a forward/backward direction ("pitch" movement arrows) may be detected and acquired by video detector 1 for generating a piece of movement information of a cursor/pointer 4' on video 4, according to the horizontal and vertical discontinued lines reported on the drawing in correspondence of the "roll" and "pitch" movements, respectively. By the head movement, the user is hence capable of acting on a first position control in the plane (defined by two motion axes, which will also be called "two activity control axes") to be able to position pointer 4' on screen 4 in the position suited to trigger a desired event.
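A minimal sketch of how the two head angles might drive the two activity control axes follows; the dead-zone and gain values are illustrative tuning assumptions, not values prescribed by the invention.

```python
# Sketch: map head roll/pitch angles (degrees) onto the two activity control
# axes, i.e. pointer x/y on the screen. Dead zone and gain are illustrative
# tuning parameters.
DEAD_ZONE_DEG = 3.0   # ignore small involuntary head movements
GAIN = 25.0           # pixels of pointer travel per degree of head tilt

def head_to_pointer(roll_deg, pitch_deg, x, y, screen_w=1920, screen_h=1080):
    """Return the new pointer position given the current head attitude."""
    if abs(roll_deg) > DEAD_ZONE_DEG:
        x += GAIN * roll_deg      # lateral tilt drives the horizontal axis
    if abs(pitch_deg) > DEAD_ZONE_DEG:
        y += GAIN * pitch_deg     # forward/backward tilt drives the vertical axis
    # Clamp to the screen so the pointer never leaves the display.
    return (max(0, min(screen_w - 1, x)), max(0, min(screen_h - 1, y)))

x, y = 960, 540                   # start at the screen centre
x, y = head_to_pointer(roll_deg=6.0, pitch_deg=-1.0, x=x, y=y)
print(x, y)                       # pointer moved right, vertical unchanged
```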
According to this first operating mode, moreover, the system also detects data from an audio source, to obtain a further control, according to a third activity control axis. A vocal sound or a sound voluntarily issued by the user in another way (for example beating a foot or smacking their tongue or other) is detected by audio sensor 2 and transformed into IT data useful for defining the third activity control axis, for example a video selection of the virtual button on which cursor 4' has been previously positioned.
Fig. 3 shows how the voice issued by the user and marked as "speech" may be acquired by audio sensor 2, for generating an operation control, for example for selecting one of the icons 4'' on the screen and starting the relative programmes; or selecting a sliding bar to "attach" it to the pointer and be able to drag it with subsequent pointer movement controls; or, again, opening a drop-down menu on which the pointer is positioned, or something else which a person skilled in the field may easily imagine.
In substance, according to the example shown, the movements of pointer 4' on the video screen, classically controlled by the movements of the mouse managed manually by the user, are here controlled (first two control axes) by the roll & pitch movements of the user's head V. Similarly, the function normally assigned to the left-hand button of the mouse (third activity control axis) is here determined by a sound signal.
Said in more general terms, the activity control axes act on a series of objects 4', 4'' which may be selected/activated (visible on display 4), to which activities and functions (start of a programme, movement of data, entering of text characters, sliding of information, control of other apparatuses and so on) implemented by IT processor 3 correspond.
The sound signal may be very simple - for example an audio signal of an intensity sufficiently higher than the environmental noise, such as a bang or a vocal sound uttered in a loud voice - or it can be modulated with different functions: in this last case, one can think of using a suitably trained voice-recognition software (known per se) to manage a wide range of different controls (in such case not only a third activity control axis would be obtained, but several additional control axes). Sound recognition allows achieving a range of different controls similar to those which may be used with a traditional keyboard and mouse; for example, by pronouncing "go" a control equivalent to a simple mouse click may be obtained, "go, go" would correspond to a double click, "home" would equal moving the pointer onto the start button of the Windows® operating system, and so on.
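Purely as an illustration, the keyword-to-control mapping described above might look like the following sketch; the voice-recognition engine producing the utterance strings is assumed to exist and is not shown.

```python
# Sketch: dispatch recognised utterances to controls, following the examples
# in the text ("go" = click, "go, go" = double click, "home" = move the
# pointer to the start button).
from typing import Optional

COMMANDS = {
    "go": "single_click",
    "go, go": "double_click",
    "home": "pointer_to_start_button",
}

def dispatch(utterance: str) -> Optional[str]:
    """Translate a recognised utterance into a control token, if any."""
    return COMMANDS.get(utterance.strip().lower())

assert dispatch("Go") == "single_click"
assert dispatch("go, go") == "double_click"
assert dispatch("hello") is None   # unrecognised sounds issue no control
```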
Audio input may be combined in multiple ways with that relative to the attitude of head V, thus producing a remarkable variability of controls, up to covering the entire availability which a conventional user would have by using keyboard and mouse.
The system so arranged is extremely economical, because it resorts to hardware equipment already available on many PCs, smart-phones and tablet PCs, or in any case purchasable at a low cost. For the acquisition and the recognition of the movements of head V, library files are already available which are suited to extract the essential parameters for the purposes of the invention, that is, the orientation in time of, for example, the main axes of the head, so as to be able to establish the "pitch" and "roll" angulation and the gradient (that is, the velocity of angle variation) thereof.
Typically, all those points which characterise the user's position and orientation in space are extracted from the scene, such as for example the eyes, the mouth, the nose, the chin, the neck and the shoulders. The control axis is hence determined, for example, using the change of position and of orientation of a polygon built on characteristic points of the eyes and of the mouth, with respect to the position of the shoulders.
This information is suitable to define the first two activity control axes on the corresponding output device, for example the displacement of pointer 4' on screen 4.
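As a non-limiting illustration, roll and a pitch proxy might be derived from such characteristic points as in the sketch below; the landmark detector supplying the (x, y) points is assumed, and the pitch heuristic is a deliberate simplification of the polygon-based approach.

```python
# Sketch: derive roll and a pitch proxy from characteristic facial points.
# The landmarks (eyes, mouth) are assumed to come from any face-landmark
# detector; the pitch heuristic is a deliberate simplification.
import math

def head_attitude(left_eye, right_eye, mouth, neutral_eye_mouth_dist):
    """left_eye/right_eye/mouth are (x, y) image points."""
    # Roll: angle of the line joining the two eyes with the horizontal.
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    roll_deg = math.degrees(math.atan2(dy, dx))
    # Pitch proxy: the apparent eye-to-mouth distance shrinks as the head
    # tilts forward/backward; compare it with the calibrated neutral value.
    eye_mid = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    d = math.dist(eye_mid, mouth)
    pitch_ratio = d / neutral_eye_mouth_dist   # < 1 means the head is tilted
    return roll_deg, pitch_ratio

roll, pitch = head_attitude((300, 240), (380, 248), (340, 330), 95.0)
print(f"roll={roll:.1f} deg, pitch ratio={pitch:.2f}")
```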
The analysis and processing means, included in the IT processor, operate in real time or off time on the acquired information. To the analysis and processing means an expert apparatus 6 is preferably connected, comprising an inference engine with which deductive rules are applied to the data coming from the analysis and processing means, in order to extract the information concerning the positioning and orientation of the face and the recognition of the voice controls. In substance, for example, with analysis and processing means 5 the significant points of the data received from the video detector are identified, then with the inference engine the information relating to attitude and position of the user's head is obtained, to then transform this information into activity control axes.
In order to improve the interaction, the apparatus may be calibrated on the parameters of the individual user. For such purpose, it is preferable that, in the system according to the invention, processor 3 be coupled with a mass storage (or in any case a permanent storage) on which a database 7 is installed. In the database 7 of the system the user's own parameters are stored, such as a predetermined mapping of the head or a vocal correspondence of the words/sounds most used by the user, in order to obtain the maximum desirable accuracy of operation of the apparatus. These customised parameters of the user may be acquired in a first learning step, and possibly progressively updated during use, so as to train the system as best as possible, so that it will recognise with greater accuracy both the movements of the head V and the voice controls.
The mass storage on which the data of database 7 are stored is preferably a removable storage, such as a flash-card or a USB key, or the data may be remotely stored (for example on a remote server in the distributed systems of cloud computing). In such case, the user may always carry his/her customised data and avoid repeating the learning and adapting process in case he/she finds himself/herself using a system according to the invention which is not the one of his/her usual workstation.
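A minimal sketch of such a portable user profile follows; the file name, field names and values are illustrative assumptions (in practice the file would reside on the removable or remote storage just described).

```python
# Sketch: persist a user's personalising parameters (head mapping, preferred
# vocal commands) so the profile travels with the user.
import json
from pathlib import Path

# In practice this file would live on a USB key or a remote (cloud) store;
# a local path is used here to keep the sketch runnable.
PROFILE_PATH = Path("head_control_profile.json")

def load_profile(path: Path) -> dict:
    if not path.exists():
        # First use on this workstation: start from empty defaults and learn.
        return {"neutral_eye_mouth_dist": None, "voice_commands": {}}
    return json.loads(path.read_text())

def save_profile(path: Path, profile: dict) -> None:
    path.write_text(json.dumps(profile, indent=2))

profile = load_profile(PROFILE_PATH)
profile["neutral_eye_mouth_dist"] = 95.0   # learned during calibration
profile["voice_commands"] = {"go": "single_click", "home": "pointer_to_start"}
save_profile(PROFILE_PATH, profile)
```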
The apparatus according to the invention, as mentioned, through a data input obtained as a combination of the acquisition of the head movements (first two activity control axes) and of audio signals (further activity control axis), allows the control results to be used with different output modes and on different hardware/software platforms, such as not only mobile or desktop computers, but also mechanical devices in use by disabled people (for example the motion motors of a disabled person's wheelchair) or other.
Based on the configuration set forth above, the present invention provides an operation according to the following general lines, as illustrated in the flow diagram of fig. 4:
- acquisition of at least one video signal from at least one video detector pointed towards the user's face;
- acquisition of an audio signal from an audio sensor capable of acquiring sounds in the environment near the user, in particular the user's voice or other noises which he/she is capable of producing voluntarily;
- detection of the user's head, preferably through inferential mode, from the images of the video sensor and calculation of the position, orientation and movement vectors thereof in space, with reference to the screen or to another possible reference system, with subsequent transformation into controls either of the position of the cursor/pointer through multiple windows (in window desktop systems), simulating the movements of a conventional mouse, or selection of a screen area (in widget systems or smart-phones);
- sound recognition (single control or control modulation with words and/or sentences) and transformation into controls for the selection and/or management of activities of the processor (a minimal orchestration sketch of this loop follows the list).
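Purely as an illustrative sketch of how these steps might be orchestrated, the general loop could take the following shape; every named callable is a placeholder for a component described elsewhere in this text, not part of the claimed subject matter.

```python
# Sketch of the general operating loop: video frames feed the first two
# activity control axes, audio feeds the third. All callables passed in are
# placeholders for the concrete components sketched elsewhere in this text.
def control_loop(acquire_frame, acquire_audio, estimate_attitude,
                 recognise, move_pointer, execute):
    while True:
        frame = acquire_frame()
        if frame is None:
            break                               # video source closed
        roll, pitch = estimate_attitude(frame)  # first two control axes
        move_pointer(roll, pitch)
        utterance = acquire_audio()             # None while the user is silent
        if utterance is not None:
            command = recognise(utterance)      # third control axis
            if command is not None:
                execute(command)
```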
As stated, the apparatus is capable of storing the profile and the calibration status of a specific user, so as to improve interaction quality by calling up the data stored in the apparatus. Therefore, the last two activities of transformation of input signals into controls issued to the output device may benefit from the possible presence of the local or remote database, in which the information tailored to the specific user is stored beforehand.
The calibration may possibly be repeated upon the arising of changes to the operating conditions of the system (change of brightness of the room, changes to the user's hair, new user, ...).
According to an alternative embodiment, the system comprises a recognition section of the opening/closing state of a user's eyes (eye-blink). Through this acquisition section - which acts as an additional input device - it is possible to provide 2-bit controls (two eyes, open/closed; or, differentiating the following controls: left eye open/closed, right eye open/closed, both eyes shut at the same time) by which to manage a plurality of actions on output devices.
Fig. 5 shows a possible implementation, wherein the acquisition section or input device is represented by a videocamera integrated in a smartphone. The signal acquired by the videocamera is transformed into suitable controls depending on the detected eye condition. For example, the pointer means are in the shape of a rotating ring nut (carousel) whereon a plurality of different choice boxes are reported (typically the alphabet letters): by the control bit of one eye it can be chosen to stop or rotate the carousel ring, while by the control bit of the other eye the choice of the control box is made; other types of controls are possible, chosen also according to the user's preferences: an alternative way may be left eye closure = clockwise rotation, right eye closure = anti-clockwise rotation, closure of both eyes = selection. Thereby it is possible, for example, to compose a word (through the sequential choice of the letters it consists of) and hence to enter the word as a more complex control or as a search string. This entry way may be aided by known mechanisms of word self-completion or of access to libraries of pre-configured or standard messages.
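Purely as an illustration of the 2-bit eye control driving the carousel, following the "alternative way" described above (left eye = clockwise rotation, right eye = anti-clockwise rotation, both eyes = selection), the selection logic might be sketched as:

```python
# Sketch: a carousel selector driven by the 2-bit eye control. The letter
# ring and the bit assignment are illustrative assumptions.
import string

class Carousel:
    def __init__(self, items=string.ascii_uppercase):
        self.items = list(items)
        self.index = 0
        self.composed = ""          # word built up by successive selections

    def on_eye_state(self, left_closed: bool, right_closed: bool) -> None:
        if left_closed and right_closed:
            self.composed += self.items[self.index]          # both eyes: select
        elif left_closed:
            self.index = (self.index + 1) % len(self.items)  # clockwise
        elif right_closed:
            self.index = (self.index - 1) % len(self.items)  # anti-clockwise

carousel = Carousel()
for blink in [(True, False), (True, False), (True, True)]:  # rotate, rotate, select
    carousel.on_eye_state(*blink)
print(carousel.composed)  # -> "C"
```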
By this specific arrangement it is possible (i) to access a telephone directory and other functionalities of this type of device, (ii) to build sentences and send them through messenger services or, using a text-to-speech engine, to speak them aloud, enabling people with no body mobility except for the eyelid muscles to speak, (iii) to view reply messages, (iv) to control other output devices (as also mentioned above) which may be reached or which are connected to the smart-phone/tablet.
This operation mode of the system according to the invention requires developing a graphic user interface, which may take up various forms. In substance, not being able to have a continuous control on two axes - such as the one which may be obtained with the user's head movement - it is necessary to provide translation means (in fig. 5 consisting of the rotating carousel ring) which sit between input devices 1 and output devices, apt to interpret the 2-bit control and to transform it into a more complex control.
The translation means typically take up the form of a software application suitable to run on the operating system on which the analysis and processing means 5 are based. In this case, it is not necessary to have an audio input to provide a further activity control axis, because the (open/closed) position of the two eyes, interpreted through the translation means, already provides the necessary control axes.
Fig. 6 schematically shows a system which includes both the above-mentioned operating modes and a series of possible output devices to be controlled.
As illustrated, the user may act on the IT system through a complex control, consisting of a specific configuration of the head (pitch & roll inclinations and opening/closing of the eyelids) and of sounds issued through the mouth. Moreover, output devices may take up the form of a monitor of a PC, a standard TV apparatus, a Hi-Fi audio system, an information kiosk or a conditioning system.
In order to be able to send controls to pre-existing standard apparatuses, an output device advantageously consists of a suitably configured universal transmitter/receiver. In such case, a remote transmitter (a typical IrDA transmitter/remote control) is interfaced with a personal computer on which suitable translation means (in the form of application software, which represents a kind of virtual remote control) are arranged, suited to receive controls through the input device according to the invention and to drive the remote control accordingly, so that it may send the desired signals (for example on/off, volume/speed adjustment, selection of the radio/TV station, ...) to the corresponding drive receiver of the desired apparatus (the same receiver embedded in most remotely controllable apparatuses, or a suitable optional receiver).
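As a sketch only, such translation means might be implemented as below, assuming an IR transmitter reachable over a serial port; the port name and the code table are hypothetical and depend entirely on the target apparatus.

```python
# Sketch: "translation means" as a virtual remote control. High-level
# controls coming from the head/eye/voice input are translated into codes
# for an IR transmitter attached over a serial port (pyserial).
import serial

IR_CODES = {               # hypothetical device codes
    "power":        b"\x01",
    "volume_up":    b"\x02",
    "volume_down":  b"\x03",
    "next_station": b"\x04",
}

def send_control(port: serial.Serial, control: str) -> None:
    code = IR_CODES.get(control)
    if code is None:
        raise ValueError(f"unknown control: {control}")
    port.write(code)       # the transmitter forwards this as an IR burst

with serial.Serial("/dev/ttyUSB0", 9600, timeout=1) as port:  # assumed port
    send_control(port, "power")      # e.g. issued after a "go" voice command
    send_control(port, "volume_up")
```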
Fig. 7 shows a flow diagram of the general operation, including calibration activities, of the system according to the invention.
As can be understood, the system and the apparatuses according to the invention allow achieving the object set forth in the premises. As a matter of fact, an extremely simple and inexpensive system has been provided, built from components easily available on the market and hardly bulky, such that it can be used immediately on any smartphone or modern PC (at least provided with a webcam and a user-interface display), provided a suitable software is installed for causing the components to work in the way taught here.
The control system according to the invention is interfaced between the user and a series of apparatuses, among which mainly a personal computer (PC), replacing the function of a conventional pointer or mouse (which would require the full ability of the fingers of a hand) and hence defining a new input device or control system in an IT system. As a matter of fact, the control system of the invention allows controlling a pointer on a screen and interacting with the classical interface of a personal computer, or with the controls of another household apparatus (air conditioner, Hi-Fi system, TV, ...), through the head movements, voice controls and the opening and closing of the eyes (eye-blink).
However, it is understood that the scope of protection of the above-described invention must not be considered limited to the particular embodiment shown, but extends to any other technically-equivalent construction variant as defined in the attached claims.
Claims
1. Control system of an output device (4) by a disabled person unable to use his/her upper limbs, comprising an IT processor (3) provided with processing and analysis means (5), as well as with at least one data input entry (1) and an outlet towards said output device (4), characterised in that
said at least one entry comprises a video detector (1), apt to frame an area around a user's head (V), said video detector (1) being connected to said entry of the IT processor (3), so as to acquire and interpret head configuration data of a user (V), and in that
said head configuration data (V) of the user are transformed downstream of said processing and analysis means (5) into two axes of activity controls, while additional means are provided for the video/audio data acquisition of at least a third axis of activity control, said activity controls being issued towards said outlet for controlling said output device (4).
2. System as claimed in claim 1, wherein said additional means are in the form of translation means arranged between said video detector (1) and said output device (4).
3. System as claimed in claim 2, wherein said translation means are apt to enhance the combinations of said configuration data of the head (V) consisting of at least the opening/closing of the eyes.
4. System as claimed in claim 2 or 3, wherein said translation means also comprise a communication interface for remotely controlling an output device (4).
5. System as claimed in any one of the preceding claims, wherein said additional means comprise an audio sensor (2) apt to detect sounds in the area surrounding the user and to transform them into audio data to be combined with said video data for obtaining said third activity control axis.
6. System as claimed in any one of the preceding claims, wherein said video detector (1) consists of any device in the group of webcam, rgb camera, time-of-flight camera, structured-light camera, multicamera, depth-map camera, IR motion capture camera with marker, motion sensing input device.
7. System as claimed in any one of the preceding claims, wherein a permanent storage is furthermore provided on which a database (7) is resident containing customised parameters of the user relative to said configuration data of a user's head (V) and to said video/audio data of said additional means voluntarily issued by said user.
8. System as claimed in claim 7, wherein said permanent storage on which a database (7) is resident is removable or remotely accessible.
9. System as claimed in any one of the preceding claims, wherein with said analysis and processing means an expert apparatus (6) is coupled having an inference engine by which deductive rules are applied to the data coming from said analysis and processing means (5) before transforming them into said activity control axes.
10. System as claimed in any one of the preceding claims, wherein said analysis and processing means (5) operate in real time on the acquired signals.
11. System as claimed in any one of claims 1 to 9, wherein said analysis and processing means (5) operate off-time on the acquired signals.
12. System as claimed in any one of the preceding claims, wherein said IT processor (3) is any one of a personal computer (PC), a personal digital assistant (PDA), a tablet (tablet PC), an advanced telephone (smart-phone), an information kiosk or a gaming console.
13. System as claimed in any one of the preceding claims, wherein said output device (4) is a display on which objects (4', 4'') are displayed which may be selected and activated through said activity control axes and to which activities/functions of the IT processor (3) correspond.
14. System as claimed in any one of claims 1 to 12, wherein said output device (4) is at least a driving motor of a wheelchair for disabled people.
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| ITMI2012U000375 | 2012-10-19 | ||
| ITMI20120375 ITMI20120375U1 (en) | 2012-10-19 | 2012-10-19 | INTERACTION SYSTEM AND EQUIPMENT BETWEEN A COMPUTER PROCESSOR AND A DISABLED USER. |
| ITMI2013U000186 | 2013-05-13 | ||
| ITMI20130186 ITMI20130186U1 (en) | 2012-10-19 | 2013-05-13 | INTERACTION SYSTEM AND EQUIPMENT BETWEEN A COMPUTER PROCESSOR AND A DISABLED USER. |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2014060995A1 (en) | 2014-04-24 |
Family
ID=48046905
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/IB2013/059443 WO2014060995A1 (en) | 2012-10-19 | 2013-10-18 | System and apparatus for the interaction between a computer and a disabled user |
Country Status (2)
| Country | Link |
|---|---|
| IT (2) | ITMI20120375U1 (en) |
| WO (1) | WO2014060995A1 (en) |
- 2012-10-19 IT ITMI20120375 patent/ITMI20120375U1/en unknown
- 2013-05-13 IT ITMI20130186 patent/ITMI20130186U1/en unknown
- 2013-10-18 WO PCT/IB2013/059443 patent/WO2014060995A1/en active Application Filing
Patent Citations (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6215471B1 (en) * | 1998-04-28 | 2001-04-10 | Deluca Michael Joseph | Vision pointer method and apparatus |
| EP1667049A2 (en) * | 2004-12-03 | 2006-06-07 | Invacare International Sàrl | Facial feature analysis system |
Non-Patent Citations (2)
| Title |
|---|
| BLEY F ET AL: "Supervised navigation and manipulation for impaired wheelchair users", SYSTEMS, MAN AND CYBERNETICS, 2004 IEEE INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, vol. 3, 10 October 2004 (2004-10-10), pages 2790 - 2796, XP010772654, ISBN: 978-0-7803-8566-5 * |
| MARGRIT BETKE ET AL: "The Camera Mouse: Visual Tracking of Body Features to Provide Computer Access for People With Severe Disabilities", IEEE TRANSACTIONS ON NEURAL SYSTEMS AND REHABILITATION ENGINEERING, IEEE SERVICE CENTER, NEW YORK, NY, US, vol. 10, no. 1, 1 March 2002 (2002-03-01), pages 1 - 10, XP002640281, ISSN: 1534-4320 * |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10444831B2 (en) | 2015-12-07 | 2019-10-15 | Eyeware Tech Sa | User-input apparatus, method and program for user-input |
| EP3296842A1 (en) * | 2016-09-20 | 2018-03-21 | Wipro Limited | System and method for adapting a display on an electronic device |
Also Published As
| Publication number | Publication date |
|---|---|
| ITMI20120375U1 (en) | 2014-04-20 |
| ITMI20130186U1 (en) | 2014-04-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| JP7336005B2 (en) | Multimode execution and text editing for wearable systems | |
| US11977682B2 (en) | Nonverbal multi-input and feedback devices for user intended computer control and communication of text, graphics and audio | |
| US11983823B2 (en) | Transmodal input fusion for a wearable system | |
| US10031578B2 (en) | Gaze detection in a 3D mapping environment | |
| US20170083086A1 (en) | Human-Computer Interface | |
| JP5859456B2 (en) | Camera navigation for presentations | |
| JP7278307B2 (en) | Computer program, server device, terminal device and display method | |
| Mohd et al. | Multi-modal data fusion in enhancing human-machine interaction for robotic applications: a survey | |
| KR20240053070A (en) | Touchless image-based input interface | |
| WO2014060995A1 (en) | System and apparatus for the interaction between a computer and a disabled user | |
| CN119271052A (en) | A method for controlling intelligent VR interactive display of a central control server | |
| CN118897622A (en) | Multimodal interactive digital content display desktop and mid-air gesture recognition system | |
| Yang et al. | Bimanual natural user interaction for 3D modelling application using stereo computer vision | |
| Chan et al. | Integration of assistive technologies into 3D simulations: an exploratory study | |
| US20230143099A1 (en) | Breathing rhythm restoration systems, apparatuses, and interfaces and methods for making and using same | |
| US12444146B2 (en) | Identifying convergence of sensor data from first and second sensors within an augmented reality wearable device | |
| Tolle et al. | Design of keyboard input control for mobile application using Head Movement Control (HEMOCS) | |
| Forson | Gesture Based Interaction: Leap Motion | |
| Deneke | Enabling people with motor impairments to get around in 360° video experiences: Concept and prototype of an adaptable control system | |
| Gugenheimer et al. | RTMI’15-Proceedings of the 7th Seminar on Research Trends in Media Informatics | |
| HK40038087A (en) | Machine interaction | |
| CHANDRAN et al. | A SMART ENVIRONMENT BASED FACE EXPRESSION RECOGNITION | |
| Singh | Literature Review-Whole Body Interaction | |
| HK1173239B (en) | Camera navigation for presentations |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 13821158; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 13821158; Country of ref document: EP; Kind code of ref document: A1 |