The paper describes the methodology and the preliminary results of an evaluation of Ciber232, a Spanish text-to-speech system used by blind persons. The assessment has been carried out by means of personal interviews guided by a questionnaire focussed on two aspects: user acceptance and overall quality. Preliminary results for professional and for non-professional users show a fair degree of acceptance and adequacy to their needs, although a wide range of requirements is identified even with a small sample of users.
Ciber232 is a text-to-speech system implemented in Spanish and Catalan that has been developed by Ciberveu S.A. The Spanish version of the system has been used for some time by visually impaired people either in their professional environment or for private use at home. Within the framework of a joint project between the ESEI "La Salle" and the UAB, an evaluation of the system is being carried out. So far, the main results concern user acceptance and satisfaction with the product, although an assessment on linguistic grounds is also planned.
Ciber232 is an autonomous text-to-speech system that can be connected to any computer through a standard RS232 interface. It is designed to be used in an IBM PC environment, and it is mainly prepared to synthesize the text that the applications display on the screen. The system's main task is to expand the features of the operating system, providing the user with an extra interface through synthetic speech. In the case of blind users, speech is used as an alternative feedback.
The system has two major parts: a resident program for IBM PC compatibles that processes the information to be synthesized, and an external module connected to a printer serial port, which accepts the preprocessed text sent by the resident program.
The main function of the resident program is to redirect the information shown on the screen or the input from the keyboard to the external module. The user, by means of several key combinations, can control what is to be sent to the synthesizer: a single key stroke, the entire screen, the word in which the cursor is placed, etc. The loudness, rate and pitch of the speech output can be controlled by the user.
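The selection step described above can be illustrated with a minimal sketch. It is not the Ciber232 resident program: the screen is assumed to be available as a list of text lines, and the names `word_at_cursor`, `select_text` and the mode strings are hypothetical, introduced only for illustration.

```python
def word_at_cursor(line, col):
    """Return the whitespace-delimited word that contains column `col`.

    Returns an empty string when the cursor is on a space or outside the line.
    """
    if col < 0 or col >= len(line) or line[col].isspace():
        return ""
    start = col
    while start > 0 and not line[start - 1].isspace():
        start -= 1
    end = col
    while end < len(line) and not line[end].isspace():
        end += 1
    return line[start:end]


def select_text(screen, row, col, mode):
    """Choose the text to send to the synthesizer (hypothetical modes).

    screen: list of strings, one per screen line (an assumption of this sketch).
    mode:   "word" for the word under the cursor, "screen" for the entire screen.
    """
    if mode == "word":
        return word_at_cursor(screen[row], col)
    if mode == "screen":
        # Trailing padding is dropped so that blank columns are not spoken.
        return "\n".join(line.rstrip() for line in screen)
    raise ValueError("unknown mode: " + mode)
```

In a real resident program each mode would be bound to a key combination, and the selected string would then be forwarded to the external module.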
The core of the external module is a formant synthesizer that uses a diphone concatenation technique to generate the speech output. The final waveform is generated by a Philips PCF8200 formant synthesizer. The overall system is shown in figure 1:
Figure 1 Block diagram of Ciber232
The system performs some lexical and syntactic analysis in order to obtain the prosodic information required for the generation of pitch and intensity contours.
A questionnaire has been developed in order to assess the adequacy of the system from the point of view of its users. It has been divided into three parts: characterization of the user, global quality assessment and ideas for improvement.
In the first part the questions concern three different areas: the profile of the subject, his knowledge of the system and of the task carried out with it, and the manner in which he uses the system. The first set of questions is aimed at obtaining information about the subject: sex, age, studies, occupation and the present condition of his sight. In the second set we try to gather some information about the time the subject has spent working with the system, about his previous working experience if he uses the system for tasks he had done before losing his sight, and about his experience with other text-to-speech systems. The third set of questions deals with the tasks performed by the subject with Ciber232, the amount of time spent with the system either working or during leisure time, the possibility of performing the same tasks with alternative methods (e.g., Braille or human assistance) and his motivation to use Ciber232.
The second part of the questionnaire assesses the global quality of the system; it is based on the test proposed by Robert, Choinière and Descout (1989), which has been adapted to our purposes. It consists of 30 pairs of opposite adjectives. The subject has to give a rating on a 1 to 5 point scale, 1 being close to the first adjective of the pair and 5 to the second one. The first 14 pairs of adjectives are related to voice quality and to other aspects in which there can be a relationship between the task and the voice produced by Ciber232 (e.g., whether the type of voice used helps to concentrate, or whether the voice is perceived as interesting or dull). The next 7 pairs try to assess the satisfaction with the reading style chosen by the developers (for example easy to understand vs. difficult to understand, artificial vs. natural or fluent vs. irregular). Two pairs concern the accent -- foreign vs. Spanish and South American vs. Peninsular --, and the final 7 pairs evaluate the adequacy of the system with respect to the tasks, the results obtained, the degree of difficulty and the possibilities of adaptation to new tasks. Finally, 4 questions about how frequently the users change the loudness, pitch, rate and voice settings allowed by the system are presented.
In the final part of the questionnaire we try to collect more spontaneous replies with open questions concerning the difficulties found with Ciber232 in its present implementation and the freedom attained using the system. We finally ask for suggestions for further improvements.
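As an illustration of how the 30 ratings of one subject could be summarized, the following sketch groups them into the four blocks described above (14, 7, 2 and 7 pairs). The paper does not state how the ratings are aggregated; taking the mean per block, as well as the names `GROUPS` and `group_means`, are assumptions made for this example.

```python
from statistics import mean

# Positions (0-based) of the adjective pairs in each block of the test.
# Block sizes follow the text: 14 + 7 + 2 + 7 = 30 pairs.
GROUPS = {
    "voice_quality": range(0, 14),
    "reading_style": range(14, 21),
    "accent": range(21, 23),
    "system_adequacy": range(23, 30),
}


def group_means(ratings):
    """Return the mean rating per block for one subject.

    ratings: sequence of 30 integers on the 1-5 scale, in questionnaire order.
    Averaging per block is an assumption of this sketch, not the paper's method.
    """
    if len(ratings) != 30:
        raise ValueError("expected 30 ratings, got %d" % len(ratings))
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return {
        name: float(mean([ratings[i] for i in positions]))
        for name, positions in GROUPS.items()
    }
```

Since 1 is close to the first adjective of each pair and 5 to the second, a block mean is only meaningful if the pairs in that block are oriented consistently; otherwise individual pairs should be inspected separately.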
Up to now four subjects have replied to the questionnaire, but the final goal is to have a sample of at least 30 subjects. The subjects studied here are all male, and their ages range from 50 to 77 years old. Two of them use Ciber232 in their professional life, while the other two use it for other purposes. One of the professional users is a consultant in the field of human resources, training and communications and had worked in this area before losing his sight; he has been using Ciber232 for 4 months and is now a proficient user. The second subject is retired, has held top executive posts and now uses the system for reading and writing all sorts of materials. The third one uses it for the same purpose in activities related to his hobbies (amateur radio and computers). Both have been using Ciber232 for 2 years. The fourth subject is learning Ciber232, being in the process of losing his sight; he needs to use the system for his work as a chemist.
It was felt that an interview was the best way to gather all the necessary information. This is due both to the aim of the evaluation and to the intrinsic characteristics of the population. Personal interviews are carried out by one of us (JJP) and this is one of the most rewarding aspects of the assessment procedure. The interviews are usually long -- in some cases they lasted for more than two hours -- and the subjects tend to give much more information than is strictly required. However, the overall design and the careful instructions also allow replies by mail in the case of users who are not available to be personally interviewed and who are able to get some help to read the questionnaire.
In order to summarize the results, it is useful to divide the subjects into two groups: the professional users and those who apply Ciber232 to tasks outside their professional environment.
As far as the professional subjects are concerned, using Ciber232 has improved their working conditions; the subjects mention three main reasons: (a) increased autonomy in information processing tasks, combined with the confidentiality required in certain executive posts; (b) faster access to information compared to traditional reading and writing systems such as Braille; and (c) the possibility of carrying out competitive work and an increase in public awareness of the working potential of blind professionals that may contribute to their integration in society. The use of a text-to-speech system in this context is also associated with an increase in the status and prestige of the individual, the company and the blind community as a whole.
For the non-professional users, Ciber232 helps to carry out a large range of activities related to the everyday life of the subjects. They use it to read and write with the help of a computer -- coupled to a scanner in one case --, and the text-to-speech system is perceived as a useful help to pursue their personal interests.
The main difference between the two groups is the degree of motivation and the time the subjects are willing to spend getting to know the system. Professional users are fully motivated and are prepared to spend a substantial amount of time in training. The second difference between the two groups concerns their expectations: while the professional users are looking for a system perfectly adapted to their specific needs, the non-professionals would like a very flexible system able to perform a large number of different tasks.
The results from the second part of the questionnaire do not differ substantially for professional and non-professional users. The voice is rated as rather pleasant but artificial, monotonous and cold. Moreover, it does not seem to have an influence on the attitude or the state of the listener. The reading style is rated as fluent and easy to understand, although it is also labelled as artificial. As far as the accent is concerned, the users agree that the system reads in a Peninsular Spanish accent.
The ratings for the pairs of adjectives related to the system as a tool show that the users consider it compatible with their usual tasks, efficient and sure. They evaluate Ciber232 as a rather complete and practical solution to their problems. It is worth mentioning that none of the subjects changes the values of the parameters that control loudness, pitch, rate and voice once they get used to the system.
As far as suggestions for improvements are concerned, the interviews yielded a wide range of ideas for the developers. The subjects mentioned the need for a better manual, for headphones that do not tie the user to the machine, for feedback from the computer when it is switched on and ready, and for a better integration with word processing software.
In this paper we have presented the methodology that has been designed to test the acceptance of Ciber232 by blind users. A questionnaire concerning the profile of the subject, his knowledge of the system and of the task, his habits in using the system and the difficulties encountered is combined with a test consisting of pairs of opposite adjectives to assess the global quality of the system. Although the present sample is small, the information obtained seems to validate the adequacy of the questionnaire. In our opinion, a crucial part of the assessment procedure is the personal interview: even if it is a time-consuming activity, the wealth and the reliability of the data obtained compensate for the amount of time spent on it.
The work in the next few months will concentrate on two aspects: enlarging the sample population on the one hand, and testing the understanding of the spoken output as a function of length and complexity on the other. It is hoped that, as the sample studied here has shown, the results will be useful to improve the present implementation of Ciber232.
This work has been supported by a grant from the Dirección General de Investigación Científica y Tecnológica (PB90-0704).
ROBERT, J.M. - CHOINIÈRE, A. - DESCOUT, R. (1989) "Subjective evaluation of the naturalness and acceptability of three text-to-speech systems in French" in TUBACH, J.P. - MARIANI, J.J. (Eds.) Eurospeech 89. European Conference on Speech Communication and Technology. Paris, September 1989. Edinburgh: CEP Consultants Ltd. vol. 2. pp. 640-643.
Joaquim Llisterri, Departament de Filologia Espanyola, Universitat Autònoma de Barcelona
Last updated: 21/8/03 22:41