Statistical Policy Working Paper 19 - Computer Assisted Survey Information Collection
MEMBERS OF THE FEDERAL COMMITTEE ON STATISTICAL METHODOLOGY (April 1990)

Maria E. Gonzalez (Chair), Office of Management and Budget
Yvonne M. Bishop, Energy Information Administration
Warren L. Buckler, Social Security Administration
Charles E. Caudill, National Agricultural Statistical Service
John E. Cremeans, Office of Business Analysis
Zahava D. Doering, Smithsonian Institution
Joseph K. Garrett, Bureau of the Census
Robert M. Groves, Bureau of the Census
C. Terry Ireland, National Computer Security Center
Charles D. Jones, Bureau of the Census
Daniel Kasprzyk, Bureau of the Census
Daniel Melnick, National Science Foundation
Robert P. Parker, Bureau of Economic Analysis
David A. Pierce, Federal Reserve Board
Thomas J. Plewes, Bureau of Labor Statistics
Wesley L. Schaible, Bureau of Labor Statistics
Fritz J. Scheuren, Internal Revenue Service
Monroe G. Sirken, National Center for Health Statistics
Robert D. Tortora, Bureau of the Census

PREFACE

The Federal Committee on Statistical Methodology was organized by the Office of Management and Budget (OMB) in 1975 to investigate methodological issues in Federal statistics. Members of the committee, selected by OMB on the basis of their individual expertise and interest in statistical methods, serve in their personal capacity rather than as agency representatives. The committee conducts its work through subcommittees that are organized to study particular issues and that are open to any Federal employee who wishes to participate in the studies. Statistical Policy Working Papers are prepared by the subcommittee members and reflect only their individual and collective ideas.

The Subcommittee on Computer Assisted Survey Information Collection investigated the use of computers in collecting survey information. This report covers the different ways in which small computers can be used to improve data collection. For example, the report describes computer assisted telephone interviewing (CATI), computer assisted personal interviewing (CAPI), data collection using touchtone telephones, and voice recognition. More than with most working papers, the relevance of the information in this report will age quickly. Various methodological issues are also addressed in this report. For example, issues discussed include human-machine interfaces, software development, hardware planning, and computer security.

The Subcommittee on Computer Assisted Survey Information Collection was chaired by Terry Ireland of the National Computer Security Center, Department of Defense.

CASIC Subcommittee Members

C. Terrence Ireland, Chair, National Computer Security Center (Defense)
Thomas Anastasio, National Computer Security Center (Defense)
Martin Baum, National Center for Health Statistics (Health and Human Services)
William Blackmore, Energy Information Administration (Energy)
Richard Clayton, Bureau of Labor Statistics (Labor)
Ann Ducca, Energy Information Administration (Energy)
Ralph Gillman, Energy Information Administration (Energy)
Maria E. Gonzalez, Ex officio, Office of Management and Budget (Executive Office of the President)
Stuart Katzke, National Institute of Standards and Technology (Commerce)
George Kraft, National Institute of Standards and Technology (Commerce)
Cathy Mazur, National Agricultural Statistical Service (Agriculture)
John Sietsema, National Center for Education Statistics (Education)

Acknowledgments

The idea to develop a Statistical Working Paper on the use of computers to support the collection of survey information was first put forward by Yvonne Bishop of the Energy Information Administration. Ms. Bishop has a special interest in data collection techniques that do not involve an interviewer. With the advice of members of the Federal Committee on Statistical Methodology (FCSM), Maria Gonzalez organized a subcommittee with an expanded scope to examine a range of computer methodologies that supported the collection of information: the Subcommittee on Computer Assisted Survey Information Collection (CASIC). The members of the CASIC Subcommittee further expanded the report to include the three important methods of data collection: Computer Assisted Telephone Interviewing (CATI), Computer Assisted Personal Interviewing (CAPI), and Computer Assisted Self Interviewing (CASI). For each related technological area, from software interfaces to computer security, the CASIC Subcommittee investigated and wrote sections of the working paper that showed the application of these areas to CATI, CAPI, and CASI.

The CASIC Subcommittee thanks the members of the FCSM for their advice and comments on several drafts of the working paper. Special thanks go to Charles Caudill (NASS) and Joe Garrett (Census) for their in-depth comments on the various drafts.

COMPUTER ASSISTED SURVEY INFORMATION COLLECTION (CASIC)

TABLE OF CONTENTS

Part I. Executive Summary
  A. Introduction
  B. Computer Assisted Survey Information Collection
Part II. Introduction
  A. Objectives, Scope, and Users
  B. Federal Information Processing Standards
  C. Organization of Report
Part III. Options for Automated Statistical Surveys
  A. Computer Assisted Telephone Interviewing (CATI)
  B. Computer Assisted Personal Interviewing (CAPI)
  C. Computer Assisted Self Interviewing (CASI)
Part IV. Methodological Issues
  A. Human-Machine Interfaces
  B. Software Development
  C. Data Collection Programs
  D. System Interfaces for Data Conversion
  E. Computer Security
  F. Hardware Planning
  G. Network Planning
Part V. References
Part VI. Appendices
  A. Costs
  B. Quality Improvements Offered by CASIC
  C. Survey Examples
  D. Taxonomy
  E. Glossary

I. Executive Summary

I.A. Introduction

Surveys have used computers since the Bureau of the Census obtained the UNIVAC I. Since that breakthrough, the power of rapid calculating has been applied to almost every phase of the survey process, including sample design, sample selection, and estimation. The most important implication of these applications is that survey practitioners can now consider a growing range of techniques that were not affordable, or even thought of, before the availability of inexpensive and fast calculating capability. The last major survey operation to benefit from automation is data collection. Computers were first applied to collection using mainframes to control certain aspects of telephone collection, and Computer Assisted Telephone Interviewing (CATI) was born.
The first applications of CATI provided a flood of research worldwide evaluating the impact of this technique on the survey error profile and costs. CATI is now used to help interviewers in all collection activities, including scheduling calls, controlling detailed interview branching, editing, and reconciliation, thus providing much greater control over the collection process and reducing many sources of error. Simultaneously, a tremendous storehouse of information is captured by the computer to provide additional insight into the data collection process. In just two decades, CATI has become a standard collection vehicle grounded in a firm body of research.

The ongoing advances in computer technology, and particularly the arrival of microcomputers, continue to offer survey practitioners more fertile ground for improving the quality of published data. The first portable computers were quickly pressed into service to duplicate the advantages of CATI in a personal visit environment. Thus, Computer Assisted Personal Interviewing (CAPI) grew from the seeds of CATI.

While CATI and CAPI represent advances for surveys requiring interviewers, microcomputers are now finding important roles in self-administered questionnaires, where interviewers are not needed. These roles take advantage of more advanced technology and the widespread availability of technology to allow respondents to complete the questionnaire without the assistance of an interviewer. Prepared Data Entry (PDE) allows respondents that have a compatible microcomputer or terminal to access and complete the questionnaire directly on their screen. Touchtone Data Entry (TDE) allows respondents to call and answer questions posed by a computer using the keypad of their touchtone telephone for well-controlled and inexpensive collection. As an extension of this approach, recently developed techniques in Voice Recognition Entry (VRE) allow respondents to answer questions by speaking directly into the telephone. The computer translates the respondent's answers into text for verification with the respondent and then stores the text in a data base. These and other collection methods will continue to evolve out of the work now underway. New technology will assuredly bring more options for survey practitioners to consider.

The use of these collection methods, while bringing needed improvements in the quality of collected data, has created other challenges. These automated collection methods are made possible through the close interaction of statisticians, subject matter experts, and colleagues in the computer sciences. To use these methods effectively, each profession must learn and use the models and techniques of the other professions. This close relationship will continue to grow, with advances in each field supporting advances in the others.

The goal of this report is to profile several automated survey collection methodologies and provide a glimpse of what future technological advances may offer to survey operations. The selection of one or more of these collection methods depends on a clear understanding of computer applications. Software and hardware selection can be essential to success, as may be the use of networks for the computers. As with any survey method, the need to assure the confidentiality of the data gathered and stored by the computers is critical. This report discusses several data collection methodologies now being used in Federal agencies in terms of procedures, impact on quality, and costs.
It also discusses the significant issues surrounding the use of advanced technologies to augment survey data collection.

I.B. Computer Assisted Survey Information Collection (CASIC)

For this report, the Subcommittee defines Computer Assisted Survey Information Collection as those information gathering activities that use computers as a major feature in the collection of data from respondents and in the transmission of data to other sites for post-collection processing. It is in this area of survey operations that technology is now having the greatest impact.

II. Introduction

II.A. Objectives, Scope, and Users

The Subcommittee on Computer Assisted Survey Information Collection was established in October 1988 to document and discuss the status and potential use of advanced technology for collecting statistical data, for its transmittal to central processing sites, and the conceptual and practical issues surrounding implementation. High quality published data begins with collecting high quality data from respondents. Much of survey processing addresses, and compensates for, weaknesses in the quality of the collected data and the absence of uncollected data. The survey questionnaire, received on time, completely filled out, and accurate, can reduce post-collection errors and their related costs. The Computer Assisted Survey Information Collection Subcommittee of the Federal Committee on Statistical Methodology has studied the various implications of the vast computing power now available to support statistical surveys and is providing this information for use throughout the Federal Government.

Objectives

The primary objective is to describe emerging methods of interactive electronic data collection and transmission, potential benefits, and current examples of their use in Federal surveys. This report also covers techniques and appropriate references to the literature. A secondary objective is to consider specific methodologies and related issues stemming from the use of computer assisted statistical surveys. Also addressed are other practical considerations involving human-machine interfaces, software design, hardware features, data transmission, and computer security. The issues involve such factors as quality, costs, and respondent reaction to computerized surveys.

Some advantages of automated surveys are:

a. improved data quality from (1) the introduction of automated questionnaire branching, editing features, and computer utility support; and (2) a shorter processing path from data collection to data processing (e.g., reduced keying errors because keying of the paper questionnaire is no longer necessary).

b. improved timeliness of data capture by the elimination of some data entry steps and of extensive editing.

c. increased flexibility in data gathering (e.g., for conducting multiple version questionnaire surveys involving question reordering and different natural languages).

In deciding which collection method to use, quality is a relative idea that is affected by a tradeoff between cost and benefit. The choice of a data collection method is usually based on a combination of performance and cost factors. Together they determine affordable quality. For traditional collection methods, these factors and the decision-making process are usually well known. Now, as technology progresses, new methods are being tested that expand the array of potential collection tools and challenge the survey designer to reevaluate old cost/performance assumptions.
These semi-automated collection applications fall naturally into three areas: (1) Computer Assisted Telephone Interviewing (CATI), where the interviewer and respondent talk over a telephone, limiting their personal interaction while maintaining the substantial flexibility provided by a telephone; (2) Computer Assisted Personal Interviewing (CAPI), where the interviewer and respondent talk directly across the table, although this direct access comes with the cost of additional logistical problems; and (3) Computer Assisted Self Interviewing (CASI), a newly coined phrase to describe situations where the interviewer is replaced by interaction with the computer. Subcategories include Prepared Data Entry (PDE), where the respondent uses a computer terminal; and Touchtone Data Entry (TDE) and, more recently, Voice Recognition Entry (VRE), where the respondent interacts with a computer over a phone line.

However, computer applications are not limited to obtaining data from respondents. In addition, the prompt transmittal of reported data to the processing facility and the conversion of data to proper formats are important to the publication of timely and relevant information.

New options will encourage reconsideration of old assumptions about quality, cost, and technology. Decisions made years ago in an era of fewer alternatives should be reviewed periodically. Many factors can change in a short period. Only a few years ago, automation costs were driven by the scarcity of mainframe hardware capacity. Now the labor involved in developing specialized systems dominates automation costs. Portable and desktop microcomputers were not widely available at the beginning of this decade. Now, widely available, inexpensive, and powerful, they are an assumed part of the work environment. The tough questions involve the selection of the appropriate system configuration.

The general goal of this report is to challenge Federal survey managers to reconsider their operations in light of recent changes in survey methods available, or attainable through new technology, and to reassess their methods of providing information to the public that is accurate, timely, and relevant.

Scope

Automated data collection includes three major groups of people: the respondents, the interviewers, and the designers and developers of the system and procedures for collection. This report covers the essential factors involved in successfully including the requirements of each group. The survey operations considered in this report include the computer-related activities of design and development of the questionnaire, interviewing, data entry, editing and follow-up for nonresponse or edit reconciliation, data transmission, and data conversion. The critical activities of sample design, sample selection, and estimation are not included in the scope of this report. Still, the choice of an automated collection method is important to these activities. This choice must be an integral part of the survey design. For example, the decision to use CATI to improve collection of time critical data may provide the sample designer with additional flexibility to consider techniques that require rigorous sample control or complex questionnaire branching logic.

Respondents

The respondent must be considered the primary user of any survey vehicle, whether automated or not, and all aspects of the response environment must be developed with the respondent in mind.
The cooperation of respondents is the single most critical factor in survey operations, and they must be treated with the greatest care. Even one-time surveys must strive to leave the respondent with the feeling of contribution and importance, and the willingness to participate in future surveys. Thus, our primary job is to consider computer-related techniques that allow the respondent to answer the survey completely and accurately in a natural environment.

Automated collection methods provide survey managers with opportunities to improve control and reduce sources of error. These methods also can be designed to capture workload and performance data in the background while interviews are conducted. However, these features must not interfere with the natural interactions during the interview.

The transition to automated surveys presents additional challenges. For example, in a switch from mail questionnaires to CATI, the surveyor must work with the respondents to remove their uncertainties about the transition in order to retain their continuing cooperation. The arrival of a variety of automated self-response methods involving computerized questionnaires presents new challenges for ensuring that the respondent is sufficiently knowledgeable and comfortable dealing directly with the computer. As always, the respondent must be trained in the use of the collection process. Whether by simple instructions or more formal procedures manuals, the surveyor must work diligently to develop simple, clear directions for use, or risk losing the full cooperation of the respondent. For example, in the use of PDE, respondents must interact directly with computer displays. This requires understandable questions, adequate help facilities, and a clear set of allowable answers. Finally, just as managers must worry about interviewers' illness, absence, vacations, and vacancies, designers of automated self-response systems must include emergency back-up procedures to assure that respondents can complete the survey.

The design of the human-machine interface requires a clear understanding of what the respondent expects. Do people react to questions differently when they are presented on paper compared to being asked by telephone interviewers, and still differently if posed from computerized displays or computerized voices? Also, what information is lost by changing from personal visits, where the interviewer can assess a variety of non-verbal clues, to telephone collection or automated self-response, where voices are not directly heard? What are the differences in application of these techniques in household versus establishment surveys?

While new automated methods provide many features attractive to survey designers, new responsibilities come with their use. The respondent must be assured of the confidentiality of the data provided. Confidentiality is the cornerstone of respondent cooperation, from the interview through final processing, estimation, and storage of microdata. Whereas face-to-face interviews provide an environment where the respondent can assess and control access by others, the use of telephone collection and transmission of self-reported data creates new problems in confidentiality. The integrity and authenticity of the respondent's answers during the transmission process is a related issue. The ability to transmit large volumes of data from remote sites may only partially solve collection problems in some surveys that require actual signatures and protection of the transmitted data.
Interviewer

The second most important user is the interviewer. The systems provided to help in the interview process must be easy to use, must work consistently, and must provide improvements in the interview environment. Early use of CAPI required interviewers to carry the first generation of portable computers to the respondent's home. These heavy machines were often left in automobiles until the interviewer could determine that the respondent was home. The result was reduced productivity and higher costs.

Interviewers must believe that computer assistance will improve their effectiveness. They need to be convinced that the computer is simply a tool to speed and simplify their work. CATI, CAPI, and CASI support specific wording for each question, and simplify moving to the next question, which is often dependent on previous answers. However, these systems can be over-developed so that interviewers are left little or no discretion for judgment or contribution. The result may be low morale, indifference, deviation from established procedures, and high turnover rates.

System Designers

The third important user is the system designer, who may use the computer environment to design the survey and to lay out the procedures for its use. Besides affecting the ease of use for both respondent and interviewer, the decisions made early in the development process carry over to the ongoing use and maintenance of the system for years. The design environment is similar to that used in any software development process. Software tools that support this "software engineering" process should give flexibility to the designer and provide for long-term maintenance of the survey. System designers face difficult choices, such as building customized systems from scratch versus linking standardized "off the shelf" software packages. The inevitable limitations must be compared against reduced maintenance and lower start-up costs.

II.B. Federal Information Processing Standards

Today, more than ever, information is the force that drives the activities of the Federal Government, and information processing systems are the mechanisms that process, store, and transfer this information. Information processing standards play an increasingly important role in the strategies of Federal agencies to make more effective use of their information processing systems by providing needed interoperability of systems and equipment, portability of data and software, and methods for protecting data and computers from accidental and intentional harmful events. CASIC systems, like other Federal information processing systems, will be more effective if they implement standards that provide for interoperability, portability, and security.

Within the Federal Government, the National Institute of Standards and Technology (NIST) has the responsibility of promulgating Federal Information Processing Standards and Guidelines for hardware, software engineering, electronic document interchange, data management, ADP operations, computer security, and ADP related telecommunications. In addition, NIST develops conformance tests for its standards where appropriate. Developers of computer assisted statistical survey systems should use NIST's standards and guidelines whenever possible during the design, implementation, and operation of their systems. A reference to NIST's standards program and available standards and guidelines can be found in Section V under the heading of "Standards."
Additional information about NIST's program may be obtained from:

Program Coordination and Support Group
National Computer Systems Laboratory
Building 225, Room B151
National Institute of Standards and Technology
Gaithersburg, MD 20899
Telephone: (301) 975-2833

II.C. Organization of the Report

This report is intended to provide reference and guidance for survey practitioners across the Federal Government in planning and refining data collection methods. By sharing information and experiences, others may gain from and add to the effectiveness of governmental survey activities. The potential audience is much broader than those involved in statistical surveys. Many of the methods described and the technological issues discussed are applicable to any information collection activity, including the collection of management information, program cost, productivity, and workload data.

Part III covers the three major areas of CATI, CAPI, and CASI where the computer supports survey information collection. Each major application is defined and current survey application experiences are described. Each discussion describes the impact on specific survey error components and the potential for future applications. Part IV provides a discussion of broad technological and developmental issues in the use of computer assisted surveys. The areas selected for consideration are: the human-machine interface; software development; data collection systems; system interfaces for data conversion; computer security; hardware planning; and network planning, which includes electronic mail. Part V contains references organized by categories consistent with the organization of the report. Part VI contains the appendices. Appendix VI.A provides a discussion of cost measurement relating to the use of computers to collect survey information. Appendix VI.B provides a general discussion of the improvements in quality that can be expected with the use of computers. Appendix VI.C provides a series of survey efforts currently underway, with a point of contact for additional information. Appendix VI.D lays out a suggested classification model for surveys that depend on computer support. It is consistent with the various models in the body of this report. Appendix VI.E contains a glossary of words in active use where computers and surveys come together.

III. Options for Automated Statistical Surveys

III.A. Computer Assisted Telephone Interviewing (CATI)

Definition

Computer Assisted Telephone Interviewing, or CATI, is a computer assisted survey process that uses the telephone for voice communications between the interviewer and the respondent. CATI replaces traditional paper-and-pencil questionnaire interviewing. The questionnaire is displayed to the interviewer by the computer, and the interviewer then relays the question over the telephone to the respondent. The answers are given to the interviewer for entry into the computer. The collections of questions are structured so that computer examination of previous answers can be used to select the next question in sequence. Computer-generated help facilities can be initiated by the interviewer on command. The interview environment can be computer generated or handled manually by the interviewer. As CATI systems grow in sophistication, many manual functions will be taken over by the computer: sampling unit selection, scheduling of telephone calls, automatic dialing, and callbacks to respondents who are not reached on the initial call.
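The following minimal sketch, written in Python for illustration, shows two of the features just described: selection of the next question from the previous answer, and a simple range edit applied as each answer is keyed. The question wording, field names, and edit limits are invented for the example and are not drawn from any particular Federal survey.

    # Hypothetical sketch of a CATI-style questionnaire in which the answer
    # to one question determines the next question asked, and a range edit
    # is applied as each response is keyed by the interviewer.

    QUESTIONS = {
        "Q1": {
            "text": "Did this operation have any cattle on hand on December 1?",
            "kind": "yes_no",
            "next": {"yes": "Q2", "no": "END"},
        },
        "Q2": {
            "text": "How many head of cattle were on hand?",
            "kind": "number",
            "low": 1,
            "high": 50000,
            "next": "END",
        },
    }

    def ask(question_id):
        """Display one question, accept the keyed answer, and apply simple edits."""
        q = QUESTIONS[question_id]
        while True:
            answer = input(q["text"] + " ").strip().lower()
            if q["kind"] == "yes_no":
                if answer in ("yes", "no"):
                    return answer
                print("Please answer yes or no.")
                continue
            try:
                value = int(answer)
            except ValueError:
                print("Please enter a whole number.")
                continue
            if q["low"] <= value <= q["high"]:
                return value
            # Edit failure: the interviewer verifies the value with the respondent.
            print("That value fails the range edit; please verify and re-enter.")

    def interview():
        """Walk the question path, choosing each next question from the answer given."""
        responses = {}
        current = "Q1"
        while current != "END":
            answer = ask(current)
            responses[current] = answer
            branch = QUESTIONS[current]["next"]
            current = branch[answer] if isinstance(branch, dict) else branch
        return responses

    if __name__ == "__main__":
        print(interview())

A production CATI system would, of course, add call scheduling, annotation of unusual circumstances, and the ability to back up and change an earlier answer.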
Data collected by CATI should have significantly fewer errors than data collected by manual methods because the interviewer can directly validate a respondent's data that fail internal and historical edit checks. Time and cost requirements for data collection, validation, and data conversion should be reduced. Computer controlled questionnaires make it possible to use more sophisticated designs than can be administered with paper-and-pencil forms. They can include complex logic structures and questions finely tailored to the circumstances associated with a specific sampling unit.

Examples of Current Use

The exact number of CATI installations throughout the world is unknown. It probably is more than 1,000, considering the number of countries, universities, and private sector vendors and survey research installations involved in surveys. In 1988, the U.S. Government had 51 cooperating CATI centers. Both opinion and factual data are collected using CATI. Most questionnaires contain a mix of these data types. Questionnaires range from several questions with very little data validation to several hundred questions customized for specific respondents, providing the ability to collect conveniently the same data in different respondent environments.

The National Agricultural Statistics Service (NASS) within the United States Department of Agriculture (USDA) executed its first CATI questionnaire (Multiple Frame Cattle Survey) during 1982 in California, using four workstations and completing 100 interviews. The questionnaire consisted of 41 questions. Today the largest known CATI questionnaire is the December Agricultural Survey. It is used in 14 states with questionnaires customized for each state. This survey has over 200 questions, with production items recorded in units convenient to the respondent and converted to a common unit for data validation and recording purposes.

Today, NASS conducts a total of nine recurring CATI surveys. The surveys are monthly, quarterly, and annual. In 1988, NASS completed 125,000 CATI interviews using 183 data collection workstations in 14 remote sites located in state statistical offices. Besides the recurring CATI activity, NASS conducted three special data collections in 1988, and two already were scheduled for 1989. The questionnaires were developed over a very short period. Training time was short. The data collection period was somewhat short (3 days to 2 weeks). NASS found that CATI lends itself very well to applications with short implementation schedules. Field testing of the questionnaires is efficient because once a problem area is identified, the questionnaire can be modified and tested on another respondent generally in less than an hour.

Also, the Bureau of Labor Statistics (BLS) currently uses CATI in 17 states to collect monthly data on employment, hours, and earnings from 6,000 respondents. BLS further uses CATI (1) to collect Consumer Price Index (CPI) housing data; (2) to collect hours at work and hours paid as an input to productivity measures; and (3) for special purpose studies to support Department of Labor initiatives. In addition, BLS uses CATI methods to conduct telephone record check surveys to improve data quality.

Computing Environment

The uses of CATI are limited only by the capability of telephone technology and the use of personal interviewers. CATI is one of several phases of the total data collection process. It can be used for nonresponse follow-up where initial contact is made by CATI, mail, or CAPI.
The ability to use varied data collection techniques is contingent upon the ability to develop computer questionnaires with common software that can support the various data collection options. Common software is important to assure that the same data are collected and the same validations are applied.

The computer has to be responsive in delivering sample units and questions to the interviewer. The computer response times for both interviewer and respondent must be less than what they would perceive as an unnecessary delay. For example, experience has shown that longer than a second between questions is too long for an impatient respondent. Longer than a half second wait for the display of the next question is too long for the interviewer. During this period the computer may be required to access several databases and do complex mathematical computations, including logical decisions affecting subsequent questions. The computer must deliver a different sampling unit in less than 10 seconds, and ideally in less than five. During this period the machine may have to query several potential respondent queues that relate to scheduled callbacks in different time zones; to previous busy signals to be retried every 15 minutes; to special handling of specific respondents by specific interviewers; to the generation of new sampling units; and to the disposition of the completed interview as correct.

The software that drives the questionnaire must be easy for the interviewer to use. Question paths through a questionnaire must be simple and easy for the interviewer to handle. Menus with abbreviated questions or questionnaire areas are desirable. Skipping back to an earlier question, changing that answer, and establishing another route through the questionnaire must be easy and quick to do. Commands must be standardized for use in related surveys to enable "second nature" reactions by the interviewer in any given situation.

The design of a CATI questionnaire poses problems beyond the design of standard questionnaires. If the designer has problems developing the questionnaire, the interviewers will almost surely find it difficult to use. The objectives of the survey questions in a computerized questionnaire may be no more complex than those of questions used in pencil-and-paper surveys. However, the flexibility provided by automated question paths makes their design more difficult, as the possible sequences of questions must be worked out during design. Paths and branching must be worked out in advance, and there may be significant differences in question wording and in the number of questions. Automatic sampling unit management can pose some difficult logic problems for the automated survey designer. Data validation using historical or internal data correlations is a complex logic problem, but is essential for recurring surveys. Well designed computer environments provide the interviewer with the ability to review the respondent's answers for correctness and to annotate unusual circumstances.

Before the computer questionnaire designer can begin, the questions must be developed by the survey staff using knowledge of statistical theory and specific subject matter. This survey staff also must be well versed in face-to-face, self-administered, and telephone questionnaire design. In the face-to-face interview the interviewer can offer explanations of the question, then probe for additional information, and, if necessary, provide the respondent with the paper version of the questionnaire.
The respondent can study, read ahead, reflect, and finally answer with a clear understanding of the meaning of the question. For a self-administered questionnaire, the respondent no longer has the benefit of the interviewer, but still can examine the questionnaire in detail. In telephone interviewing the respondent may not have the form in hand and thus may be missing the visual clues needed to understand the question. Therefore, questions used in telephone interviewing should be structured as single concept questions.

Some simple applications rely less on posing very structured questions and more on a "forms-screen" approach. This approach replicates the survey form on the computer screen. Edit failures may be highlighted, perhaps with a different color, and the interviewer is trained to ask probing questions to reconcile suspected inconsistencies in the responses.

III.B. Computer Assisted Personal Interviewing (CAPI)

Definition

Computer Assisted Personal Interviewing (CAPI) is a personal interview conducted, usually at the home or business of the respondent, using a portable personal computer. In many respects it differs from CATI only in the presence of the interviewer and the respondent in the same room. As with CATI, the questionnaire is programmed into the computer with all the necessary logic to control the question path -- the logical flow of the questions based on such factors as previous answers -- and provides both for computer generated editing, by pointing out inconsistencies to the interviewer, and for direct editing by the interviewer. The system must be self-contained, as the interviewer does not have immediate access to supervisory assistance or to other data sources. The interviewer reads aloud each question as it appears on the screen and records the respondent's answer in the computer while providing interactive assistance to the respondent.

Examples of Current Use

CAPI is currently being used by the National Center for Health Statistics (NCHS) for the implementation of the National Health Interview Survey (NHIS). The Census Bureau is performing the field data collection for NCHS. The NHIS is a household survey conducted in approximately 50,000 households per year. CAPI has been used to collect a portion of the survey data: the AIDS supplement questionnaire, which requires approximately 15 minutes to complete. The 1990 Health Promotion and Disease Prevention Questionnaire of the NHIS will be fielded in January 1990. Major tests of CAPI have been conducted by the Bureau of the Census and the Research Triangle Institute. National Analysts conducted a nationwide CAPI effort for the USDA sponsored 1987 Nationwide Food Consumption Survey. The Bureau of Labor Statistics used CAPI for establishment record check surveys. The National Opinion Research Center also is experimenting with CAPI. In Europe, CAPI has been used by the Netherlands Central Bureau of Statistics to collect data for the Netherlands Labor Force Survey. The U.K. Office of Population Censuses and Surveys has also carried out a major test of CAPI. Most of these efforts are at an early stage of CAPI development.

Potential Uses

CAPI can be used for all household surveys and establishment surveys, and the software can be used for any of the other automated data collection mechanisms. As the technology improves to provide lighter computers with longer battery life and user friendly software, CAPI will be used more often, particularly for quick turnaround surveys.
Procedures for developing CAPI questionnaires are similar to those for CATI. However, greater emphasis must be placed on help features because the CAPI interviewer cannot rely on nearby experts.

The type of resources and expertise needed to apply CAPI technology to a survey depends on the availability of a good authoring system. If an authoring system is readily available, the CAPI survey instrument can be prepared by the typical survey instrument designer with little or no computer experience. Computer programming assistance will be needed to write the case management and output portions of the software. Usually these portions of the software vary with each survey or survey instrument; therefore they must be custom programmed. On the other hand, if an authoring system is not available, the entire CAPI instrument must be custom programmed with either a general purpose language or a special purpose CAPI language. In either case, computer programming expertise is required. The level of expertise depends on the language selected. In addition, the survey instrument preparation will require the services of a survey instrument designer who will need to work very closely with the computer programmers.

III.C. Computer Assisted Self Interviewing (CASI)

Definition

Computer Assisted Self Interviewing (CASI) has been introduced into this report as a category to cover a new but growing area of computer assisted surveys that involves data collection without the direct presence of an interviewer. CASI can take several different forms that are differentiated by the collection method. These include Prepared Data Entry (PDE), where the respondent answers questions displayed on a computer terminal; Touchtone Data Entry (TDE), where the respondent answers computer generated questions by pressing buttons on a telephone; and Voice Recognition Entry (VRE), where the respondent answers questions by speaking directly into a telephone. We consider each in turn.

Background

Self-response data collection has always been used for many surveys that are mailed out. This form of self-response collection features simplicity in administration, leading to low initial overhead when compared to CATI and CAPI. However, mail self-response necessarily involves a reduction in control over the collection process. It is difficult for the survey practitioner to assess the status of the collection effort, e.g., whether the responses are in transit or still in the respondents' hands. Extensive mail or telephone follow-up involves great costs, perhaps offsetting the original simplicity of mail, and risks ongoing cooperation, especially if the response is "in the mail." In annual or quarterly surveys, mail may be the appropriate vehicle. In time critical surveys, the characteristics of mail collection leave wide gaps in control. The computer assisted self-response methods now being introduced into surveys hold great promise to maintain the advantages of mail self-response, while improving control and the ability to intervene in the collection process.

Definition -- Prepared Data Entry (PDE)

Prepared Data Entry (PDE) places the respondent in direct contact with a computerized questionnaire through a computer terminal. In a sense the computer is acting as the interviewer, in a manner similar to CATI or CAPI interviewers. The respondent uses a personal computer or terminal to fill out the survey questionnaire interactively.
As each item appears on the screen, instructions and definitions for that item appear on a split screen or are accessible by pressing a help key. As data are entered, range and consistency checks are automatically applied and anomalies are pointed out to the respondent. The responses to previous items may control the question path of the questionnaire. Because no interviewer is present to help the respondent, the guidance provided by the program must be substantial, and the computer literacy of the respondent is essential, at least at this stage of development.

This category of automated data collection programs includes a rapidly expanding set of respondent initiated data entry and transmission methods. These methods are directly dependent upon the computer and telecommunications hardware available to the data providers. Individuals, small businesses, or reporting agents can enter data into a personal computer in response to pre-programmed floppy disks and mail the disks to the collecting agency. Firms with modems can transmit the data through telephone lines directly to the collecting agency's mainframe, or via an electronic mail service. Larger firms with mainframes can download the data to a PC, then either transmit directly from the PC over a modem to the agency's mainframe or place the data on a diskette and mail it to the agency. These methods eliminate the need to rekey the data and run the risk of data entry errors. The transmission methods using telephone lines save several days in each collection cycle by eliminating dependence on the physical transportation of machine-readable data, whether by mail or special courier. The data must be checked to detect and correct errors introduced during transmission.

Examples of Current Use

In the early 1980's, the Internal Revenue Service (IRS) decided that the electronic transmission of returns by tax preparers to IRS would be both a practical and cost-beneficial alternative to the mailing of paper tax returns when a refund is claimed. According to the Agency, the benefits of electronic filing would include: (1) reduced manual labor costs required to process, store, and retrieve returns; (2) faster processing and retrieval of tax data; and (3) reduced interest IRS must pay to taxpayers who file timely refund returns that are not processed on time by the IRS. Further, IRS reports show that electronically transmitted returns are processed with significantly fewer errors than paper returns. According to IRS figures for the 1988 filing season, as of April 29, 1988, 20 percent of paper returns processed by IRS had errors, while only 5.5 percent of those filed electronically had errors. For taxpayers, electronic filing can mean refunds up to 3 weeks sooner, and because IRS can deposit these refunds directly into taxpayer bank accounts, refunds may arrive 3 to 4 days earlier than that. For tax preparers, the ability to provide electronic filing services to taxpayers promises a competitive business edge.

The Petroleum Supply Division (PSD) of the Energy Information Administration (EIA) decided in 1987 to investigate electronic forms submission to collect the Petroleum Supply Reporting System (PSRS) survey forms. Ten of the major petroleum companies that file the mandatory "Monthly Refinery Report" were contacted to assess their PC and communications capabilities. The respondents contacted showed interest in investigating the use of PC's to collect these data. Often they were already using PC's for business, personal, or academic purposes.
The respondents either had a PC in their office area or had access to one in another office. Software such as Lotus 1-2-3 and dBASE III could usually be found on these PC's. Some PC's were equipped with communications capabilities, and those respondents were already using telephone lines for company reporting. It appeared to be the appropriate time for the PC to enter the PSRS data collection process. Early in 1988, PSD developed the Petroleum Electronic Data Reporting Option (PEDRO) and began providing its respondents with a software diskette by which they could create an electronic image of the form on a PC screen and enter their data in the appropriate cells. Firms having the necessary software capabilities can use their data base to feed data directly to the electronic survey form, eliminating keying and transcription errors. User-friendly software with help functions has been added to the data entry functions to provide quick reference to definitions, conversion factors, or other information to speed the completion of the survey form. This eliminates the need to search hard-copy files for survey form instructions, product definitions, conversion tables, etc.

Definition -- Touchtone Data Entry

Touchtone Data Entry (TDE) has been used for many years in the private sector for a growing range of applications. TDE, also known as voice response, is used for banking by telephone, call routing, college class registration, and "talking yellow pages," to name just a few. The process is simple. The caller initiates a call to a computer, which asks a series of questions. The caller answers using the touchtone keypad, and the tones are recognized by the computer. The process offers inexpensive collection because there are few ongoing labor costs after development.

In a survey environment, TDE may be applied where the desired responses are numerical, or when responses can be linked to a numerical code, such as "yes" is "1" and "no" is "0." As in other applications, the respondent initiates the call to the collection computer, which controls the flow of the interview. The computer asks questions in either a synthesized voice or from a file of digitized phrases prerecorded by a human speaker. After each question, the respondent keys the answer. The computer also repeats each entry for verification directly with the respondent, and an acknowledgement is required, such as "1" equals "correct."

TDE offers many advantages over other collection methods. In repetitive surveys, the respondent retains a single form for monthly or quarterly calls, reducing the costs of both postage and the labor involved in mail handling, both outgoing and incoming. Costs for data entry and data verification are eliminated. Most importantly, the uncertainty about sample status is minimized. The status of the sample can be assessed through analysis of the received calls versus the list of active TDE respondents. Informed judgments can be made about the timing and extent of the nonresponse workload. No time is lost while survey forms are in the mail or waiting for data entry. This is especially important for time-critical surveys. TDE also offers convenience for the respondent. The computer is always available to accept the calls. For busy respondents who are frequently out of the office or away from home, in meetings or traveling, this feature may be preferable to scheduling calls in advance and risking interruptions and repeated callbacks. TDE reporting may require less time than CATI.
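A minimal sketch of this exchange appears below, with typed digits standing in for keypad tones. The prompts, the data items, and the use of "1" to confirm an entry are assumptions made for the illustration rather than a description of any operating TDE system.

    # Hypothetical simulation of a Touchtone Data Entry call.  Typed digits
    # stand in for keypad tones; a production system would speak each prompt
    # and decode the tones arriving over the telephone line.

    PROMPTS = [
        ("employment", "Enter total employment for the pay period."),
        ("hours", "Enter total hours paid, to the nearest whole hour."),
    ]

    def collect_item(prompt):
        """Ask one numeric item and read it back for verification."""
        while True:
            keyed = input(prompt + " ").strip()
            if not keyed.isdigit():
                print("Only the digits 0 through 9 are accepted.")
                continue
            # The computer repeats the entry; the respondent keys 1 to confirm.
            confirm = input("You entered " + keyed + ". Press 1 if correct, 2 to re-enter. ").strip()
            if confirm == "1":
                return int(keyed)

    def tde_session():
        """Run one respondent-initiated reporting call and return the keyed record."""
        record = {"report_id": input("Enter your report identification number. ").strip()}
        for item, prompt in PROMPTS:
            record[item] = collect_item(prompt)
        return record

    if __name__ == "__main__":
        print(tde_session())

Because each completed call produces a machine readable record tagged with the report identification number, the status of the sample can be read directly from the file of received calls, as described above.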
TDE has some limitations that should be carefully addressed in each survey environment. First, not all respondents have touchtone phones. Thus, implementation of TDE would likely be in combination with other collection modes, adding to the complexity of survey management. As with mail collection, the respondent also may need to be reminded to call in, although a simple advance notice postcard has proven very successful when properly timed.

Examples of Current Use

The only known survey application of TDE is the Current Employment Statistics (CES) survey at the Bureau of Labor Statistics (BLS). The CES program covers over 300,000 non-farm business establishments monthly. The data items are few, essentially employment, hours paid, and earnings, and the CES is conducted by mail in conjunction with each state, the District of Columbia, Puerto Rico, and the Virgin Islands. Collection of CES data is time critical. Preliminary estimates are published after 2 weeks of collection. Thus, the time lost due to the variability of the mails has a severe impact on response rates. Initial experiments were done using CATI. Large scale tests of CATI collection, involving 13 states and over 5,000 respondents monthly, successfully showed the ability to collect data from the vast majority of respondents in time for the first publication. More than half the CATI sample was drawn from chronically late respondents. Response rates are routinely 85 percent, versus 50 percent for mail.

The higher costs of CATI stimulated interest in TDE self-response. The results of small scale tests in 4 states suggest that TDE can retain high response rates over a sustained period. Calls average less than 2 minutes, and about 25 percent of respondents are given short reminder calls just before the collection deadline. BLS is expanding TDE use to over 15 states during 1990. Procedurally, the combination of advance notice postcards, timed to arrive during the reference period, and short nonresponse calls provides a strong, inexpensive collection process. TDE respondents receive a package of materials that explains the new collection method and how it differs from mail and telephone collection. First-time TDE users are requested to call the computer on a test basis using special codes before they are asked to submit real data. The machine readable data are uploaded to mainframes for further editing and reconciliation. The respondents chosen for the first TDE tests were drawn from those under CATI collection. In this way the higher costs of CATI can be offset by savings from TDE. Other TDE tests targeted mail respondents who generally reported on time. The widespread use of touchtone systems has spawned an industry-wide working group to standardize features (e.g., the key on the telephone) to simplify user access.

Definition -- Voice Recognition Entry

Voice Recognition Entry (VRE) is just developing as a technology. The characteristics of VRE are essentially the same as those of TDE. The respondent initiates the call to the computer, but instead of using the touchtone keypad, the respondent speaks to answer; in this application the spoken digits 0 through 9 and "yes" and "no" are used. Both "oh" and "zero" are recognized. There are two essential features for VRE systems. First, they should provide speaker independent recognition, meaning that almost any voice can be recognized without any "training" of the system. Some systems require extensive training of the software for each voice.
While such training is used in some office dictation systems, it is probably impractical for survey operations. Also, systems should provide for rapid entry of responses using continuous or connected digits. These features are commercially available for both microcomputer and minicomputer applications.

VRE also has limitations in application. First, VRE is only applicable to respondents with access to a phone, a small but unavoidable problem. Recognition accuracy is the primary determinant of respondent acceptance. The system in use at the Bureau of Labor Statistics was designed using speech profiles drawn from the midwestern states. Dialects from other regions may reduce the accuracy of the recognition, leading to respondent frustration and low acceptance. Early test results suggest that recognition remains high in Maine, the home of a very difficult dialect for the speech interpreting algorithms. More testing is planned to determine the limits of current technology. Improving recognition accuracy is the primary objective of the companies involved in speech research and development.

Development of VRE is presently limited because there are few current applications to provide advance training and public acceptance. Early results suggest that respondents familiar with TDE and VRE prefer the latter as more "natural." This finding points out the differences in questionnaire design. TDE questions ask respondents to "enter" data, whereas VRE respondents are asked questions in a manner similar to CATI because the responses are spoken. Recently, experiments using voice recognition have begun to appear, conveniently providing training for future survey respondents. Also, the similarities between TDE and VRE may minimize acceptance problems.

Both the TDE and VRE applications at BLS use short questionnaires. These techniques may limit the length of the survey, but this requires testing. They provide convenience and low costs, but respondents may balk at long lists of questions and the current limitation of the range of allowable answers to numbers and a few words. VRE offers a variety of interesting research problems in speech recognition and natural language understanding. These systems have not yet come into widespread use.

Examples of Current Use

The BLS is now conducting tests of voice recognition in the CES survey. The procedures will parallel those used for TDE and will assess the effectiveness of VRE for the entire U.S. population. The tests will examine any limitations involving multiple telephone systems, geographic distances, and respondents' acceptance. Acceptance by respondents has been high.

Potential Uses

These computer assisted self-response methods have wide potential applications. Ideal surveys are repetitive, short, and numerical, especially if the data are entered into a computer before the call is made.

TDE has been considered for screening eligible respondents from the population. Since eligibility is usually determined by very few criteria, a mailed form could direct the respondent to call in the answers to one or two questions to a central computer. After entering the unique identification number, the respondent would answer these questions. Then the survey manager would use the machine readable file for nonresponse follow-up and subsequent sampling.
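The sketch below illustrates this screening idea under stated assumptions: the identification numbers, the two screening questions, and the layout of the machine readable file are invented, but the comparison of received calls against the sample list to produce the nonresponse follow-up workload follows the approach described above.

    # Hypothetical sketch of TDE-based eligibility screening and the
    # resulting nonresponse follow-up list.

    # Units that were mailed the screening form (identification numbers are invented).
    sample_ids = {"1001", "1002", "1003", "1004", "1005"}

    # Records keyed over the telephone: (identification number, answer 1, answer 2).
    received_calls = [
        ("1001", 1, 0),
        ("1003", 1, 1),
        ("1004", 0, 0),
    ]

    def eligible(answer1, answer2):
        """Eligibility usually rests on very few criteria; here both answers must be 1."""
        return answer1 == 1 and answer2 == 1

    responded = {ident for ident, _, _ in received_calls}
    eligible_units = [ident for ident, a1, a2 in received_calls if eligible(a1, a2)]
    followup_workload = sorted(sample_ids - responded)  # units still to be reminded

    print("Eligible for the main survey:", eligible_units)
    print("Nonresponse follow-up workload:", followup_workload)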
BLS is considering TDE for pilot tests of survey supplements and other special one-time surveys to reduce costs and add valuable control, to augment or replace the traditional mail process, and to gain experience in the design and use of TDE systems.

The logical extension of existing TDE and VRE technology is the linking of the two into a single system. For example, respondents call the system, which then asks the respondent to respond by touchtone. If the tone is not recognized, the respondent is automatically switched to a VRE component. A third feature would be available to record changes in the respondent's attributes (e.g., name or address), or to record open-ended responses for later transcription -- voice mail.

Self-response methods are not limited to survey applications. Any ongoing project that collects cost, workload, or other management data could use self-response methods for inexpensive collection. For example, a large copier company uses TDE for collecting billing information. Equipment renters are required to call in the monthly usage levels by entering copier usage as touchtone data. The computer then generates a bill in response to the touchtone entry. Also, the U.S. Postal Service uses TDE to link callers to prerecorded tapes covering the most frequently asked questions. The BLS will begin using similar technology to answer routine inquiries for economic information.

Future

Voice technology is still being developed. "The NIST report argues that the most natural mode of data collection is not paper or keyboards, but speech" (William Nicholls, 1989). Recorded voices are currently being used in some surveys. Speech technology includes voice simulation, which is useful today in TDE applications. While numerical and very limited vocabularies are being used in data collection, it will be some time before automated speech systems will be used to recognize free-form human speech in a telephone interview or in a personal interview setting.

Summary

Some items to consider when deciding between data collection methods are as follows:

1. CATI offers cost savings over the personal interview setting and would be useful for a large, complex survey environment. However, it misses people without telephones.

2. CAPI retains the benefits of a personal interview setting where response rate is important, and does not require a telephone.

3. TDE is cheaper than CATI, but cannot handle a complex survey, and respondent acceptance is a concern.

4. PDE is typically used in an establishment survey. It does not require a separate key entry stage, but requires respondents to have access to a terminal, typically a PC.

5. VRE will see only specialized application in the medium term.

Whichever technique is selected, the integration of the electronic data collection method into a computer based survey system should be considered. For example, address labels and other administrative items must be created from the sample database, then the interview proceeds, editing is done, and the resulting data are fed into the analysis or summary system. Also, the decision maker should consider whether to use a single or mixed mode of data collection. Two examples of mixed modes are the Census Bureau's integrated CATI/CAPI design and the BLS' integrated TDE/CATI design. William Nicholls comments that "In the long run, the best data collection strategy for establishment surveys may prove to be a readiness to accept whatever combination of methods the respondent finds most convenient."
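As a rough, hypothetical sketch of the integrated, mixed-mode design mentioned above -- the assignment rule, field names, and the simulated collection and edit steps are assumptions, not a description of the Census Bureau or BLS systems -- the following shows how a sample database might feed mode assignment, collection, editing, and a summary file.

    # Hypothetical sample database records (field names and values are invented).
    sample = [
        {"id": "A-01", "has_touchtone": True,  "chronically_late": False},
        {"id": "A-02", "has_touchtone": False, "chronically_late": True},
        {"id": "A-03", "has_touchtone": True,  "chronically_late": True},
    ]

    def assign_mode(unit):
        """Assumed assignment rule: touchtone reporters use TDE; late reporters
        without touchtone service go to CATI; the remainder stay on mail."""
        if unit["has_touchtone"]:
            return "TDE"
        return "CATI" if unit["chronically_late"] else "MAIL"

    def edit(record):
        """Placeholder edit step; a real system applies range and consistency checks."""
        record["passed_edit"] = record["employment"] >= 0
        return record

    # Collection itself is simulated here with a fixed value for each unit.
    collected = [edit({"id": u["id"], "mode": assign_mode(u), "employment": 25}) for u in sample]

    # Records that pass the edits feed the analysis or summary system.
    summary_file = [r for r in collected if r["passed_edit"]]
    print(summary_file)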
The creation of new technologies and improvements to existing technologies will continue to have an effect on data collection methodology. 24 IV. Methodological Issues IV.A. Human-Machine Interfaces Introduction The design of the interface between a person and a computer can decide the success or failure of the interaction. Although the situation is improving, there is generally too little attention paid to the effect of interface design on user performance. Interface design is often not considered until the last stages of software development when the total design has already been "locked-in." Automated surveys will involve people with widely differing abilities using machines ranging from manual data-entry devices to powerful,computers. Interface issues will reflect this diversity in people and machines. There is no one interface that will satisfy all needs. The relative importance of a given interface issue will depend entirely on the context of person-machine environment. Nonetheless, there are some guiding principles of user interface design. CASIC benefits from consideration of user-related factors in interactive systems, interaction styles, interaction devices, response time considerations, system messages, printed manuals, online help, tutorials, and development styles. Many of these topics involve detailed consideration of how to present the computer power to the user. For example, interaction styles can be broken down into command languages that the user must learn before using the computer, menus that guide the user through the necessary procedures, and the direct manipulation of objects whose icon representation appears on the screen. similarly, interaction devices can take on many forms -- keyboards, function keys, pointing devices, speech recognition, displays, printers, etc. Techniques for automated information collection include CATI CAPI, computer assisted self-response surveys, and prepared data submission on tape. Except for tape submission, these techniques involve user interface design considerations. All must be successfully used with little or no training. The user interface must be "self-evident." Error recovery is important. The user must be protected from making errors wherever possible. When it is possible for the user to err, the recovery procedures must be positive, helpful, and easy to follow. User of the Interface It is essential to determine who the user of the interface will be before designing the interface. In automated statistical surveys, a user may be a well trained and highly motivated survey 25 professional. At the other end of the range, the user may be a first- time or only grudgingly cooperative survey respondent. Even within somewhat narrow user populations, there will be differences among users that can affect the usefulness of the interface. It may not even be possible to design an interface that perfectly suits a single user because the user is subject to changes over time due to personal factors, new experiences, and changing needs. A user-interface design team should include an applied psychologist to help determine the psychological profile and needs of the user. The personality, training, and experience of the potential users are large factors in determining the most appropriate interaction style or styles for the user interface. Interaction Styles The choice of interaction style is also affected by the hardware to be used in the survey. 
Survey techniques that make use of computers with standard input/output devices can use command languages, menus or direct manipulation. Command languages are used to interact directly with the operating system of the computer. They allow a wide range of system functions -- storage, deletion, copying and printing of files -- to be done. The cost is a steep learning curve to master the commands. Command languages, while hard to learn, are also easy to forget. They can be intimidating to novice users who realize that information can be lost or damaged by poorly chosen commands. On the other hand, a person familiar with command languages can work rapidly and effectively. For some people, mastery of a command language is a source of pride which provides a sense of satisfaction and motivation for good job performance. Menu selection represents another approach to interaction style. Menus present the user with a set of only those choices that are appropriate at a given time. The choices are often numbered or lettered so the user can choose by entering the appropriate number or letter from a keypad or keyboard. Sometimes the choices are keyed to the first letter of the line containing the choice. Then, the designer must be sure to avoid duplicate use of the starting letters. Some menus use pointing devices such as cursor keys, a trackball, a joystick, or a mouse to highlight choices. The user moves the pointing device to make a choice, then pushes a button to make the selection. Also, menus may offer only single-line choices. For example, a menu may ask for confirmation of a request by entry of y (for yes) or n (for no). Menus are often organized hierarchically in graphs - data structures used to represent relationships among objects. Family trees are a form of graph that show the relationships of a person to other family members. Airline route maps are graphs that show paths the airline follows in flying between locations. With menus, the user is essentially "flying" by making selections from the 26 graph of menus (the technical term is "walking"). Selection of one item from a menu takes the user on a different path through the graph than does selection of another item. Graph structures can ease the design problem for complex user interfaces, but also can lead to user confusion. The user must be able to maintain a sense of location in relation to previous choices made. The user also must be given easy access to "escape hatches" if an unwanted path (undesired choices) has been walked on the graph. CATI and CAPI designs rely heavily on complex branching structures to control the interview. The menus and list of allowable responses must be clear, exhaustive and enable the interviewer to retain effective control. Direct manipulation (DM) interfaces offer a third approach to interaction style. in DM, the user is given the impression of directly interacting with the objects of interest. As an example of a DM interface, consider a modern word-processing system. The screen representation of the document is made to be as close to the appearance of the finished document as possible. This is sometimes called WYSIWYG, (pronounced "whizzi-wig"), for "What You See Is What You Get." The user operates directly on the screen representation of the document and immediately sees the results of the operation. Many commercially available graphical interfaces show how far DM can go toward helping the user. A mouse is typically used as the pointing device to objects on the screen. 
A typical screen object is an icon that symbolically represents the object. To delete a file, for instance, the user simply points to the file name and "drags" it over to a trashcan icon. Menu selection and direct manipulation are important user interface techniques in situations that involve novice users with little opportunity for training. Although the interfaces must accommodate novice users, they also must be flexible enough to avoid frustrating more experienced users. Direct manipulation can accommodate novice and experienced users equally. Menu systems should allow experienced users to "select ahead" or to revert to a command language style of interaction. Survey techniques that do not use more-or-less standard computers will raise unique interface issues. Alphabetic input, such as name entry, in telephone keypad-entry systems raises the question of letter assignment to keys that have multiple letters on them. Disambiguation may be possible when the entries can be compared to a fixed list of permissible entries. Speech recognition and synthesis devices have the potential for radically changing the preferred interaction style in user interfaces. Although speaker-independent recognition of free-form spoken natural language is still in the future, rapid technological advances are being made in the ability to recognize automatically a subset of articulated words. Advances are also being made in the ability to synthesize natural-sounding speech under computer 27 control. The best form of human-machine interfaces in any give situation or for any specialized group of users is still a research question. This can lead to degradation of the quality of the survey due to user errors and frustration. Some survey techniques are already speech based. In CATI and CAPI, the user interacts with a speaking and listening person who is visually and manually interacting with a computer. The person conducting the survey uses common sense to interact with the respondent. Although there are substantial efforts to imbue a computer with common sense, practical use of this research remains in the future. Thus, the effective replacement of the human interviewer by a computer also remains in the future. Error Avoidance and Recovery Whenever possible, interfaces should be designed so that errors are not possible. The nature of potential errors in a given interface must be thoroughly understood to lessen the probability of their occurrence and the cost of recovering from them. When a particular sequence of operations is necessary to do a complex operation, the interface should be designed to combine the entire sequence into a single operation. This will reduce the number of operations required of the user (who probably thinks of the sequence as one operation anyway). All displays must have consistent layouts so the user does not have to spend time and mental energy scanning the screen for information. The interaction style can have a profound effect on errors. Properly designed menu systems can reduce errors by simply not offering poor choices. Choices offered must be clearly labelled. The consequences of a choice must be shown before the choice is made. There must be consistency between menus. For example, a choice common to all menus (such as Cancel Menu), must appear in the same place in each menu and must have the same consequence (such as reversion to the previous menu). Error messages should be designed to help the user. The messages should be specific, positive in tone, and constructive. 
They should tell the user what can be done to correct the error. Whenever an error is made, the user must have a clear and easily followed path to recovery. This not only reduces the seriousness of the consequences of the error, but increases the user's confidence even in the face of a few errors. Adequate training can help to reduce errors and increase respondent acceptance. Certainly, respondents should be trained before using the system. Good training can be reinforced by providing on-line or telephone-accessible help and on-line tutorials. on-line or telephone-accessible help gives the user an 28 immediate reminder about proper operation of the system. On-line tutorials allow the user to review the correct procedures. Design of Automated Form In general, automated forms should not be automated versions of the manual forms they replace. They should be designed from scratch to consider to make use of opportunities and limitations introduced by automation. Sometimes, it might be appropriate to maintain the same "look and feel" between a manual form and its automated counterpart. For instance, user training might be reduced by minimizing changes. In these cases, the form designers should compare the benefits of staying with the old form with the costs of designing a new form. Automation provides opportunities for higher productivity, lower errors, and greater user satisfaction over manual methods. Repetitive information can be automatically filled in from one form to another. Automatic editing for internal consistency and logical consistency should help to lower error rates. Automated forms also can provide on-line help and tutorials for the user. Automated forms need not even look like paper forms. The user can be led through an interactive dialogue while the computer does the data formatting. Form fill-in is just one interactive style. Menu selection has already been mentioned as another style. Form designers should consider using hypertext, a recent development in interactive systems which provides a browsing environment. For example, the reader can display a definition simply by pointing at a word or phrase with a mouse. Hypertext would allow non-linear traversal of forms, as appropriate for the data being filled in. For example, in surveying for medical information, gender data can be used to steer the user around inappropriate survey questions. Form designers should have a repertoire of techniques for designing and testing forms. Expert systems might be developed to help in form design and interaction design. Effort placed in designing expert systems would pay off handsomely in easing individual design tasks. Such systems also should produce forms that are more consistent and complete than forms produced in a paper environment. Quality Measures It is critically important to test user interfaces before presenting them to the users. Professor Ben Shneiderman of the University of Maryland has identified five goals that lend themselves to precise measurement: 29 1. Time to learn - how long does a typical user take to learn to use the system? 2. Speed of performance - how long does it take to carry out a benchmark set of tasks? 3. Error rate - how many and what kinds of errors are made by typical users? 4. Subjective satisfaction - how much do users like using the system? 5. Retention over time - how well do users maintain their knowledge? It is not enough to guess how well a system meets these quality measures. It is essential to test the system. 
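A minimal sketch, assuming a simple test-session log whose field names are invented for illustration, shows how the measurable goals above might be tabulated once such tests are run; retention over time would require a repeat session and is omitted here.

    # Hypothetical sketch: tabulating the measurable interface-quality goals from
    # usability-test session logs. The log format (one record per benchmark task)
    # is an assumption made for illustration only.
    from statistics import mean

    # Each record: minutes of instruction received, seconds to finish the benchmark
    # task, number of errors committed, and a 1-7 subjective satisfaction rating.
    sessions = [
        {"user": "A", "training_min": 30, "task_sec": 210, "errors": 3, "satisfaction": 5},
        {"user": "B", "training_min": 45, "task_sec": 180, "errors": 1, "satisfaction": 6},
        {"user": "C", "training_min": 30, "task_sec": 250, "errors": 4, "satisfaction": 4},
    ]

    def summarize(records):
        """Return averages for the measurable quality goals."""
        return {
            "time to learn (min)": mean(r["training_min"] for r in records),
            "speed of performance (sec)": mean(r["task_sec"] for r in records),
            "error rate (errors per task)": mean(r["errors"] for r in records),
            "subjective satisfaction (1-7)": mean(r["satisfaction"] for r in records),
        }

    if __name__ == "__main__":
        for measure, value in summarize(sessions).items():
            print(measure + ":", round(value, 1))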
A testing laboratory is essential for any significant design work. Design groups may build in-house laboratories, or may seek help from existing laboratories. It often happens that persons who are skilled in computer programming, data collection techniques, or statistical methods are not fully aware of the skills and deficiencies of the user population. It is not a good idea to concentrate the entire design effort in the hands of task specialists. The human factors role must be an integral part of every design team. Large teams might include psychologists, sociologists, and other human factors specialists. Smaller teams should at least assign one team member the role of human factors specialist. If nothing else, this person can play "devil's advocate" to be sure the appropriate questions are raised. Data about user performance under current conditions must be collected before beginning new systems. It will not be possible to determine the relative quality of a new system unless quantitative measures of the quality of the old system are available. The first task of the design team must be to develop guidelines for the design. Such items as menu selection formats, terminology, screen layout, data entry formats, error messages and recovery procedures, on-line help, and training should be considered and decided upon before any other significant design work is begun. Rapid prototyping is a powerful technique which allows .iterative convergence to a design. Partial system implementations are made quickly, presented to potential users, and tested. Further development is based on these interim tests. Because each step in the development cycle is small, and tested incrementally, only small corrections in direction are needed at each step. Conceptual errors are quickly uncovered and are easy to correct. Rapid prototyping methods contrast sharply with the more conventional "waterfall" design methodology. The waterfall method requires detailed up-front specification of the design, with a 30 full-blown design f lowing down to a full-blown implementation. While this method may be appropriate in situations where the goal is clearly understood at the start, it has the disadvantage that changes made in any phase of the design tend to be large and expensive. This usually discourages change and leads to Acceptance of a lower,quality product or total abandonment of the design. A disadvantage of rapid prototyping is that formal specifications and documentation may never get produced in the flush of excitement over the rapidly evolving (and working) system. The waterfall methodology is appropriate as the final phase of a rapid prototype design. Because rapid prototyping quickly produces a working model and deep understanding of goals and tradeoffs, waterfalling can be effectively used to provide the missing rigor and discipline. Evaluation must continue. even after a design has been completed and fielded. on-line suggestion boxes and trouble reports, designed right into the survey forms, provide easy channels of communication between the user and the designers. A user who suggests improvements or reports trouble should receive prompt responses and fixes. Large surveys might consider the use of a commercial bulletin board system as the communications medium for problems, suggestions, and fixes. 31 IV.B. 
Software Development Introduction There are two types of software that will be discussed in this section: software that helps in the creation of a survey questionnaire and software that makes up the actual programming code to execute the survey questionnaire in the field. This distinction is directly analogous to the usual notion of a highlevel programming language (e.g., FORTRAN, COBOL) in which you describe the problem in terms that humans can understand. This high-level description is then passed to a compiler that translates the description into an application program the computer can understand. For convenience, refer to the survey creation software as the survey definition process and to the use of the resulting application program as the survey application process. Most of the discussion will relate to the creation software. Historically, software development for automated field data collection began with a mainframe application for CATI. As hardware technology progressed, CATI was moved first to a minicomputer and then to a microcomputer. The CAPI application became possible with the development of the "light weight" portable microcomputer. Software to produce an automated questionnaire is perhaps the most important and potentially the most costly ingredient in the automated field data collection equation. Ideally, such software should be available off- the-shelf . Although there have been several attempts to develop such software, success has been limited. To date, the development of automated questionnaire software has been done in one of two ways. The questionnaires are custom programmed using one of a variety of general programming languages (e.g., Pascal, C, FORTRAN), or they are custom programmed using a specialized CAPI/CATI programming language. The specialized languages generally provide a means to describe a variety of attributes: the question text; the answer text; the type of answer expected, (e.g., single, multiple, fill-in, free text) ; question paths (e.g., simple -- go to next question in order or complex -- based on the answers to previous questions, or some related calculation); response editing (e.g., restrictions to specific values or - range of values) ; and in some instances, screen layout design. In either case, the development of an automated questionnaire usually has required the skill of a computer programmer. 32 Flexibility There are several issues that need to be considered in the development or purchase of existing software for automating field data collection of survey questionnaires. Among these considerations is the level of flexibility needed. Flexibility is defined in terms of the amount of control the automated questionnaire exercises over the conduct of the survey and in terms of the features available to design an automated questionnaire. With respect to the control, consideration must be given to the extent the automated questionnaire will allow the interviewer or respondent to exercise control over the conduct of the interview. That is, should the person controlling the interview have the same control as in a paper-and-pencil conducted survey; total freedom to roam anywhere in the Questionnaire and change questionnaire answers at anytime or should the automated questionnaire be designed to limit the person collecting the data to a specific process and skip patterns or some level in-between? If so, what is that level? 
The answer to these questions is critical because the software selected, particularly if it is a specialized package, might not have the specific capabilities needed to implement the desired design. The design of the questionnaire software also will be affected dramatically by the level of flexibility chosen. With respect to software flexibility, there are several capabilities that should be considered. These capabilities are:
1. Question types: open ended, closed ended, single value, multiple values.
2. Case management: administration of each questionnaire, e.g., status of completion, restart of an incomplete questionnaire.
3. Back-up: ability to back up to any question in the survey and change an answer, with the system thereafter automatically following the skip patterns implied by the changed answer.
4. Editing: ability to perform edits such as consistency, range, and specific value or values.
5. Screen manipulation: ability to create any screen design desired.
6. Comments: ability for the person recording answers to record comments associated with any question.
7. Skip patterns: simple and complex, e.g., skip based on answers to previous questions or some arithmetic calculation.
8. Context-sensitive help: ability to get help based on place in the survey.
9. Rostering: ability to handle household member enumeration, identification, and skip patterns based on the individuals.
10. Output format: the form in which collected data are stored, e.g., a flat file.
11. Accessibility of collected data: how easy it is to access the data, e.g., for quality control.
12. Coding: ability to code collected data automatically or manually.
13. Authoring system: ability to create the questionnaire and the software to execute the survey questionnaire (program code) simultaneously, with no computer programming skills.
14. Output reporting: reports about the functioning of the data collection process and about the actual data collected.
This list of features is not all-inclusive, but it does contain the most important features determining the level of flexibility. Range There are several additional factors that are important to the decision on level of flexibility and software design. These factors are the size and complexity of the survey questionnaire and the period between major changes in the questionnaire or the preparation of an entirely new questionnaire. Complexity is defined by the number of different question types, the complexity of skip patterns, and the need for rostering. Size and complexity are directly proportional to software development time. The shorter the period between major software developments, the greater the requirement for a user-friendly authoring system. An authoring system significantly decreases development time and decreases dependency on computer programmers. The size of the questionnaire also may affect the hardware and software requirements. Several software packages have certain restrictions that may be affected by the size of the application. Automated Forms Design Unlike CAPI and CATI software, there are many off-the-shelf software packages that can produce automated forms for computer assisted data entry. Many specialized CAPI and CATI software packages also can be used for this function. Training The amount and type of training required to use selected survey questionnaire development software is dependent upon the level of user-friendliness of the software.
For example, programming the questionnaire in Pascal would require considerably more skill and therefore more training than programming the questionnaire using an authoring system. Usually, it is necessary to have a skilled computer programmer working with the survey questionnaire designer in order to use the current software. Under these circumstances the questionnaire is most likely to be a pencil-and-paper questionnaire programmed for the computer rather than one designed for the computer. Computerized questionnaires will improve in quality as their designers come to understand and use the environment provided by the computer. Software documentation for the specific survey questionnaire should be complete enough to insure easy revision of the questionnaire by someone other than the original author. For the general programming languages there are many software packages available to help in such documentation The liberal use of comments in the computer programming code also is a good way of providing additional documentation. 35 IV.C. Data Collection Programs Introduction When producing a survey, several factors will affect the selection of a data collection method. The three primary factors are cost of resources, the time available to collect, edit, and summarize the data, and the desired quality. Because it is unusual to have all three in abundance, trade-offs must be considered. Several other important factors relate to the design and operation of the survey, and will affect the cost timing and quality factors. First, the survey may be one-time or ongoing. A one-time survey may want to maximize quality for a fixed cost, where an ongoing survey - may want to maximize quality for a minimized cost. With ongoing surveys automated capabilities can evolve over extended periods thereby spreading out the costs. The second factor is the target population, and whether it is a household or an establishment. The chance of finding PC's in establishments is greater than in households, although not all households have telephones. The third factor is the operational nature of the survey, that is whether the setup should be centralized or decentralized, and whether the PC's would be networked. Lastly, the sample size and complexity of the questionnaire is relevant. The remaining nine factors relate to the characteristics of the technology used to collect data. 1. The Speed at which data may be entered is determined by the technology's hardware (such as XT, AT, or 386 PC's, disk speeds, and phone lines) and software (the complexity of the questionnaire and therefore the length of the program). 2. The Size of the machine can refer to its weight or ungainliness (which is important in situations where it must be moved around) or its available memory (which limits the amount of data and the complexity of the program that can be stored on the machine). 3. The portability of a computer's software is important in situations where data collection is carried out on different computer systems. 4. The Type of Display selected may be based on environmental factors (where conditions are indoors and usually fixed, or outdoors and variable therefore screen color is important), and on the complexity of the questionnaire (and therefore screen size). 36 5 . The Mode of Data Entry varies from keyboard, to push button phone, to voice data entry. 6 . Data verification is based on the importance of quality, the complexity of the data, and other factors as hardware speed and available memory. 7. 
The Database Generation refers to the way in which the data are brought together and integrated with the rest of the survey system. This may mean using telecommunications, or simple computer tasks. 8. The Hardware selected is based on cost, amount of time available, data quality desired, power of the machine, amount of memory, and other available features. 9. Training is important in any survey, and the amount of time available and the background of the staff dictate the technology chosen. The priorities of these factors and the relationships between them help to decide which data collection strategy to use. A discussion of these factors with regard to CATI, CAPI, and other methods follows. CATI Introduction In a CATI interview, the interviewer is helped by an interactive computer system. It provides data quickly and offers good reliability, but a substantial cost investment is required to purchase and set up the system. The cost investment may be greater than for other electronic data collection techniques, but it saves money over face-to-face interviews, since data entry is combined with data collection. It also can be used for follow-up of nonrespondents or edit failures, or for keying in mail questionnaires. It can be used in a household or establishment survey with complex questionnaires (typically a new or infrequent survey where time series interruptions will not cause problems, and where the sample size is large, or small and used over a longer period). It can be operated in a centralized or decentralized manner, but it requires the respondent to have a telephone. Hardware: The first generation consisted mostly of mainframe-based systems, but the current generation consists of either multi-user minicomputer systems or distributed systems over a PC local area network (LAN). The minicomputers are often UNIX-based and used mainly in large centralized facilities that require greater resources to pay for specialized support staff. The PC's are mostly DOS-based and are used in multi-location facilities. An added benefit of PC's (even in large facilities) is that many clusters of networks can be used, and PC's can be added one at a time (lower initial cost). Speed: With minicomputers, the speed between questions could slow as the number of interview stations increases, or if another computer-intensive program is run. With PC's on a LAN, the speed between interviews could slow as more stations are added to the network. Eventually, faster computers will solve this problem. Size: The organization of the system (centralized or decentralized) and the hardware (minicomputers or PC's) will affect size requirements. The system can range from a single stand-alone PC to 100 or more workstations on a mainframe system. The PC and minicomputer systems usually have from 5 to 60 networked workstations. Portability: The software should run on multiple hardware platforms with different operating systems. It should be written in a portable language and use common user interface standards. Today, software costs are increasing while hardware costs are decreasing. Portable software should provide a cost savings across different hardware platforms. Displays: The use of color can aid the interviewer, but the Color Graphics Adapter (CGA) standard does not provide an image sharp enough for use over long periods. Either the non-composite monochrome, the higher resolution Enhanced Graphics Adapter (EGA), or the very high resolution Video Graphics Array (VGA) standard should be used. However, EGA and VGA are more expensive.
Data Entry: screens can be item based, screen based, form based, or a combination of these. Movement between items can be forward only, or forward and backward. Most systems have question skipping and branching capabilities, interviewer notes can be added, and the interviewer can resume at the point where the previous session ended. Data Verification: The data quality is improved by incorporating longitudinal (historical) editing, arithmetic calculations, range, and consistency checks. Database Generation: Outputs consist of an audit trail and response data. Often numeric and open ended data is stored separately, then linked by respondent number. Some systems include cross- tabulation capabilities, and the ability to generate accurate and timely reports is a benefit. 38 Training: one benefit is that centralized supervision and monitoring is available (on-line and audio-visual). It helps the supervisor identify interviewers who need more training. CAPI Introduction In CAPI, the equipment is less expensive than CATI, but travel costs are higher. It requires the same amount of time as personal interviews, but data quality is improved and the separate data entry step is deleted. One advantage of the personal interview setting is that it causes higher response rates. Hardware: The following criteria can be used to evaluate potential portable computers: interview duration and complexity, memory capacity, weight, power source,and duration, screen size and legibility, disk type and capacity, speed, serviceability (important because service centers might not be locally available), portability, durability, price, ease of use and software compatibility. Speed: The speed depends on the computer hardware and complexity of the questionnaire. Size: A larger portable computer would be needed to put a complex questionnaire in- 2 languages. Even a small portable computer is not necessarily portable as many have complained that they are too heavy to carry around for very long. Electrical outlets are not always available. The battery power required for additional memory and for disk drives can add substantially to the weight requirements. Although small portable computers can be used on a table top or in one's lap, interviews conducted on the doorstep require handheld computers. That technology is coming but has yet to arrive for general use. A smaller portable computer, or one with a different keyboard would be needed for this environment. Portability: As in CATI, the questionnaire writing software is often portable from one type of hardware to another. Displays: Different portable computers have different size screens with various readability factors. The various lighting conditions that would be met in the field is also a factor. For example, a "back light" screen is required for dim lighting conditions. If the interviews are conducted outdoors, glare reflection is a problem. Data Entry: often the software that was designed for CATI is also used for CAPI. It provides forward and backward movement, and incorporates skipping and branching between questions. 39 Data Verification: Similar to CATI, improved data quality results from reduced clerical and machine activities, and being able to incorporate various editing techniques. Database Generation: Data output can be consolidated more rapidly due to reduced clerical and machine activities. Data transmission options are mail, courier, or phone lines. Data security and the quality of phone lines may be a factor against using phone lines. 
Training: Basic interview skills are considered very important (even more so than computer knowledge). With this assumption, training should focus on the computer and questionnaire details. Training materials can include a tutorial (which helps coordinate the different learning rates), self-study materials, and hands-on practice with interviews. Good software and manuals are also important. CASI Data collection using TDE requires the respondent to have a touchtone telephone, and a dedicated computer with multiple phone line capability at the other end. One benefit to the respondent is the convenience of calling in at any time. Existing TDE systems limit editing, primarily because of limits on hardware capacity, the lack of visual cues, and the restriction to push buttons on the telephone. However, the computer can synthesize the answer and play it back to the respondent, thereby providing the opportunity to verify or correct the answer. TDE offers lower cost than CATI (less labor and mail costs, with key-entry costs borne by the respondent), and the data quality is good. TDE has been able to retain very high response rates over long periods when coupled with appropriate nonresponse prompting. VRE again requires only a telephone and carries a cost profile similar to TDE. Surveys which use PDE require the respondent to have access to a microcomputer. Data can be entered using the keyboard, or a file containing the data can be imported. Displays are typically an electronic image of the form on the screen. Error checking and other edits can be included, after which the data are transmitted back to the collecting agency where they are combined with other data. Computer security issues are important here. Integrity checks to make sure the data received are the same as the data sent must be part of the system. Appropriate manuals and other training materials, including on-line help, should be provided. This type of data collection would be worthwhile in an establishment survey where respondents report data monthly, quarterly, or over a given period. IV.D. System Interfaces for Data Conversion Introduction Automated submission of data has the benefit of reducing reporting errors because a keying step can be eliminated. Traditionally, respondents entered data onto paper forms which were mailed to a central site where they were keyed into a computer system. With automated data submissions, intermediate keying steps can be eliminated. Automated data transmission requires hardware and software compatibility between the respondent site and the Federal site. In recent years the number and types of software and hardware options have greatly multiplied into the current myriad of products and technologies on the market. Due to these developments, Federal agencies are often looking at heterogeneous sources for data transmission. Federal agencies conduct many surveys with many types of respondents. These data sources, such as state and local governments and businesses, will increasingly have capabilities for reporting data in an automated way. Many now have personal computers (PC's) while others have only mainframes available. Complexity arises as Federal agencies, looking at a mix of hardware and software technologies available at respondent sites, must select the best way to collect data from these heterogeneous sources. Planning for System Interfaces Managers of data collection projects can expect interface problems, but these problems can be minimized by good planning.
Knowledge about the availability of communications capability, hardware, and software at respondent sites will aid managers in their planning for system interfaces for data collection. Communications Capability Perhaps the most important issue for system interfaces is communications. Communications may be thought of as networking or as linking technologies together. With networking capability, data can be transmitted across telephone lines or special private line arrangements such as local area networks (LAN's). See the section on Networks Planning in this report for a discussion of networking issues. A related issue is maintaining the confidentiality of data transmitted in such a manner. See the section on Computer Security in this report. 41 Hardware Hardware is needed at both the respondent site and the Federal site for data transfer. The type of hardware available at the respondent site will often decide what options the Federal survey managers will offer for submitting data. It may be necessary for the Federal site to have hardware for data conversion available, for example, hardware to read both 5 1/4 inch and 3 1/2 inch diskettes. Also, communications may need to be set up between hardware devices. The section on Hardware Planning in this report discusses these issues further. Three common types of hardware links are discussed below. Mainframe to Mainframe: Data can be transmitted from one mainframe to another via a communications network. Either the respondent or the Federal site can specify record layout and formatting instructions for data submission. Front-end processors can do data conversion before the data are sent to the host computer. Another option is submission of a computer tape in a specified format. PC to PC: A link between two PC's can be established using a network system. Another way to transmit data from one PC to another is to mail the data on diskette. The record layout and diskette format would be agreed upon by the respondent and the Federal site. Because diskette sizes vary, the Federal site may need conversion hardware and software to read diskettes of different sites. Another option is to provide software on a diskette to the respondents. Mainframe to PC: This type of hardware link combines the options described above. Again, a link can be established using a communications network. If the PC is at the respondent site, a diskette with software may be provided to set up the PC to send data over to the mainframe in the-appropriate format. Software Compatibility Although Federal survey managers usually cannot provide hardware to respondent sites to use for data transmission, they often can provide software for this purpose. If the respondent's software is used, the Federal site must have the same software or be able to convert the data to the correct format. Not only can different software products be incompatible, but two versions of the same software product can be incompatible. One version may have a higher level of functionality than the other. Again, there must be planning for document transfer. See the section on 42 Software Development in this report for more guidance on planning for software compatibility. 43 IV.E. Computer Security Introduction Computer security refers to the continued operation of computer applications at acceptable levels of risk to the organizations) being supported by the applications. Risk is usually measured in terms of potential loss, specifically losses that occur from: 1. 
Disclosure of information to unauthorized parties (i.e., loss of confidentiality), 2. Modification or other adverse actions that affect the expected quality of information (i.e., loss of integrity), and 3. Destruction or other adverse events that affect either the availability of the information when it is needed or the availability of the computer system to process that information (i.e., denial of service/loss of availability). The types of losses described above can result from accidental and intentional events, as well as from natural hazards. When estimating risk, it is important to consider direct losses (e.g., the cost to replace modified or destroyed information), as well as indirect losses (e.g., the inability of the organization to meet its mission which can lead to public embarrassment, congressional wrath, loss of lives, legal actions, competitive disadvantage, etc.). After estimates of risk are derived, it is necessary to select and implement cost-effective safeguards (e.g., physical, administrative, technical, management) to reduce these risks to acceptable levels. With respect to automated statistical surveys, the types of losses discussed above can occur during data entry from the respondent, during transmission of the survey information to the host computer system, and within the host system. While the ideas discussed below are generally applicable to all of the survey types addressed in Section III of this report, this section will focus on surveys collected through or with the use of a computer where the following occurs: 1. Data entry using a terminal or computer system to collect the response information (i.e., not directly applicable to response information collected over the telephone). The data entry Process may "batch" the respondent's information for later transmission to the host computer for processing or may have the respondent connected 44 directly to the host system where the survey data is being captured in real-time (and may be processed in real-time). 2. Transmission of the response information over telecommunications lines/circuits, including future ISDN networks discussed above, and transmission on magnetic media (e.g., floppy disk) through public and private mail delivery services, and 3. Receipt and processing of the survey information by a host computer system. Problem Areas Data Entry During the data entry process, the following issues need to be addressed with respect to computer security. Identification and Authentication: Respondents and other users of computer systems that are used to collect survey information must be positively identified and authenticated to assure the validity of the survey and to hold users accountable for their accidental or intentional actions. While passwords are still the most widely used method of authenticating the users claim of identity, other methods such as biometrics and smartcards can be used when increased protection is desired--usually at increased cost. Passwords can be effective for authentication when used in accordance with FIPS 112, Password Usage Standard. Access Control: Access to information on computer systems should be strictly controlled so that users only have access to information they are authorized to see or change. Most commercial computer systems provide mechanisms that support this function. Systems that appear on the National Computer Security Center's Evaluated Products, List contain operating system level access controls that provide protection from unauthorized disclosure of information. 
Access controls are important on multi-user systems that are used to collect survey data in order to prevent the survey data from being intentionally or accidentally read, modified or destroyed. Accountability: Unless computer systems contain mechanisms for recording and analyzing users, computer security relevant actions, it will not be possible to hold users accountable for actions that cause computer-related losses. When users know that a computer system has an effective audit trail collection and processing mechanism, they are less likely to make mistakes or to attempt unauthorized access to information for fear of being caught. When survey data is collected on systems that provide 45 accountability mechanisms, it will be easier to determine if the survey data have been tampered with or have been disclosed to unauthorized users. Confidentiality: Besides access controls discussed above for preventing survey data from being disclosed to unauthorized individuals, cryptography can be used to protect data while it is being stored in a computer system or on other magnetic media such as floppy disk or magnetic tape. FIPS 46, Data Encryption Standard (DES), defines the only government-wide standard for encrypting and decrypting unclassified computer data. Since the DES has also been widely accepted by the commercial sector, there are many off-the- shelf-products that can be purchased for implementing DES cryptographic protection. Integrity: During data entry, the integrity of survey data can be affected by entering false/inaccurate data or by modifying data already entered. Approaches for addressing these issues include; 1. Editing through the use of error detecting- or correcting software that determines reasonableness of input data with respect to any number of criteria such as character composition of data input, numerical bounds checks, data dependent checks on previously entered data, etc. 2. Access control (see above) that prevents unauthorized users from gaining access to the survey data 3. Cryptographic check sum as defined in FIPS 113, Data Authentication Standard that places a cryptographic "seal" on the survey data for the purpose of detecting modification of the survey data from some initial state. This technique is useful when the survey data is stored-in computer memory or on magnetic media such as floppy disk or magnetic tape. 4. Accountability is the primary method for detecting modification to survey data by individuals who ARE AUTHORIZED (i.e., access controls do not apply) to access the data. While effective against both accidental and intentional modification, authorized users that intentionally modify data can subvert accountability controls if they have a high degree of technical knowledge about the computer system. 5. Software-engineering assurance techniques should be used in developing the data entry and other system 46 software to preclude errors from being introduced into the survey data through faulty software. Restart/Backup/Recovery: It is necessary to plan for restart/backup/recovery activities whenever the data entry process is interrupted or the survey data is destroyed. Techniques such as maintaining backup files, permitting restart points in the data entry process, and planning for an alternative data entry processing capability are all directed at maintaining continuity in the data entry process. Transmission During transmission, the respondent's survey data are sent from the data survey system to the host system that will process the survey data. 
While authentication applies primarily to transmission of survey data through telecommunications networks, confidentiality and integrity techniques are applicable to telecommunications networks and mail delivery of magnetic media. Authentication of host computers (e.g., the host computer of the data entry system) to the transmission network is required by and provided for most telecommunications networks to prevent unauthorized use of the network and to facilitate billing for network services. Sometimes, depending,on the sensitivity of the survey data, it might be necessary to have the transmission network authenticate itself to the data entry host system before sending such data over the network. In this way, the data entry system can be sure that the survey information is being sent over the actual network rather than being given to an intruder that is spoofing the data entry system into giving the intruder the survey data. If the network lacks capability for authenticating itself, then techniques used for confidentiality and integrity described below may be considered as alternative methods of protection. Confidentiality: The most common technique for preventing disclosure of information within transmission networks is to use cryptography. As discussed above, the DES is the only government-wide standard for encrypting and decrypting unclassified computer data. Integrity: integrity with regard to transmission of survey data is the assurance that the survey data has not been altered, either accidentally or intentionally, during the transmission process. Cryptographic checksum techniques, as described above in the section on Data Entry Integrity, are effective in providing this protection. Availability/Reliability of Network Services: Sometimes, particularly in real-time data collection and transmission, continuity of the transmission service can be very important to the 47 success of the survey activity. Discontinuities due to the unavailability of the network or some of its intermediate nodes or due to noise in the transmission lines can result in survey data being lost, erroneous, or delayed. This could be particularly annoying to a-respondent that has to keep repeating the survey data entry process or is unnecessarily prompted for nonresponse. it is possible to minimize such problems by using networks that provide error detecting/correcting procedures, dynamic routing around unavailable nodes, and other services that assure network availability and reliability. Host Computer System Computer security concerns at the host computer are similar to those at the data entry computer. The reader should refer back to these discussions to supplement the material contained in the corresponding areas below. Identification and Authentication: All users of the host system, including the respondent data entry system, should be required to identify and authenticate themselves to the host system to assure the validity of the survey and to hold users accountable for their accidental or intentional actions. The same authentication techniques that were discussed for the data entry system apply to the host system. Access Control: Access to information on the host systems should be strictly controlled so that users only have access to information they are authorized to see or change; in particular only authorized users should be permitted to access survey data on the host system. 
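The cryptographic "seal" referred to under Integrity, both for the data entry system above and for the host system below, can be sketched briefly. FIPS 113 specifies a DES-based message authentication code; the sketch below instead uses a keyed hash from the Python standard library as a stand-in, so the key, record layout, and function names are illustrative assumptions rather than the standard itself.

    # Hypothetical sketch of a cryptographic "seal" on a survey record.
    # FIPS 113 specifies a DES-based message authentication code; HMAC-SHA-256
    # from the standard library stands in for it here so the flow of sealing
    # and verifying records can be shown.
    import hmac
    import hashlib

    SECRET_KEY = b"shared-key-held-by-data-entry-and-host"   # illustrative only

    def seal(record, key=SECRET_KEY):
        """Compute the authentication code stored or transmitted with the record."""
        return hmac.new(key, record, hashlib.sha256).hexdigest()

    def verify(record, code, key=SECRET_KEY):
        """Recompute the code at the receiving end; a mismatch means the record
        was altered, accidentally or intentionally, after it was sealed."""
        return hmac.compare_digest(seal(record, key), code)

    if __name__ == "__main__":
        original = b"id=10432;employees=12;open=1"
        code = seal(original)
        print(verify(original, code))                          # True: record unchanged
        print(verify(b"id=10432;employees=120;open=1", code))  # False: modification detected

As with the DES itself, the protection depends entirely on keeping the key secret and distributing it only to the data entry and host systems.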
Accountability: The host computer system should contain mechanisms for recording and analyzing users' computer-security-relevant actions in order to hold users accountable for actions that cause computer-related losses, particularly losses to the survey data. Confidentiality: Besides the access controls discussed above for preventing survey data from being read by unauthorized individuals, cryptography can be used to protect data while it is being stored in the host system or on other magnetic media such as a floppy disk or magnetic tape. As with the data entry system, the DES should be used for this purpose. Integrity: On the host computer, the integrity of survey data can be affected by entering false or inaccurate data during the data entry process or by modifying data already entered. Approaches for addressing these issues include editing, access control, cryptographic check sums, accountability, and software engineering/assurance techniques, as described above. Restart/Backup/Recovery: This is necessary when the host computer system's processing is interrupted or the survey data are destroyed. Techniques such as maintaining backup files, permitting restart points in the host's processing sequence, and planning for an alternative host processing capability are all directed at maintaining continuity in the host's processing of the survey data. IV.F. Hardware Planning Introduction Hardware issues are related to the type of Computer Assisted Statistical Survey system and the particular software to be used. The adage that says to "choose the software first and then the hardware" may be accurate if the software is already available. If software needs to be developed, however, it may be better to settle the main hardware issues first. Hardware issues may be divided into the types of hardware needed and the criteria used for selecting products. We will explore these issues for current and forthcoming products. Current Hardware - General Issues There are certain hardware issues that arise no matter what the application. They may be categorized into ergonomic, performance, capacity, and cost issues. Ergonomic issues include keyboard layout and touch (a tactile response reduces input errors), screen visibility and readability, and adjustability of the computer. Performance and capacity can usually be improved only at higher cost. However, if the hardware is optimally designed for the application in mind, no higher cost may be incurred. For example, performance can be further divided into CPU and I/O speed. It may suffice to maximize only CPU or only I/O speed. Software techniques also may be employed to improve performance: use a RAM disk for files that are frequently accessed, delay I/O operations until they can be more conveniently done, and use machine-language routines for CPU-intensive operations. Core memory requirements are driven by software needs. The main question is whether the DOS RAM address space of 1 megabyte is sufficient or not. If it is not, various options are available. By swapping pages of memory in and out as needed, the address space can be expanded. Note that extra memory is not usable without a software driver. Respondent Data Entry If respondents will be using their own computers, try to find out as much as possible about the machines they have. Respondents may not have access to a personal computer (PC) even in a large company. For example, an accounting department may have a mainframe, but not a PC.
IBM-compatible computers are the most common in the business world, but they may be earlier models. Software that respondents will be using should be tested on minimal hardware configurations. Do not assume that respondents have extended or expanded memory. A hard disk probably can be assumed. The 5 1/4" diskettes are now the most common, but the new 3 1/2" diskettes are coming into use. Capability for reading either type would be helpful. There is a compatibility problem between 5 1/4" high density (1.2 megabyte) disk drives and lower density drives. The latter cannot always read disks formatted by the former, even at lower densities. Also, writing high density data on a lower density disk can corrupt the contents. CATI Computers for CATI must support interactive processing, e.g., a multi-user minicomputer or a PC network. Speed is the most important factor. The time from entry of one item to display of the next should be less than two seconds. To minimize data transfer problems, the system used for data entry should be the same as, or compatible with, the one used for subsequent processing. CAPI The main criteria for CAPI computers are screen readability, speed, and weight. Many portable computers are too heavy and awkward to carry around. A truly portable computer is necessary. While the lightest portable computers now weigh 4 to 7 pounds, the screens on these machines may not be good enough. Full-sized screens with good visibility require extra battery power, which implies a total weight of about 10 lbs. Screen visibility and readability have come a long way. Many types of screens are available: cathode-ray tube (CRT), liquid crystal display (LCD), backlit supertwist LCD, gas plasma, DC plasma, and electroluminescent display. Quality varies so much from vendor to vendor and within each type that it is difficult to make generalizations. Factors to judge include screen contrast, resolution, blur when scrolling, size, adjustability, and power consumption. The screen should be tested in environments that approximate actual interview conditions, such as dim lighting. Good performance is now available, but the cost can be high. The 3 1/2" diskettes are used on portable computers; their smaller size and harder cover make them preferable. The carrying case should protect the computer if it is dropped or banged. It also should have a government emblem or insignia to identify the interviewer. The battery charge on a portable computer may last up to four hours, but some models have portable battery packs that can be inserted as needed. Respondents might allow the use of their AC outlets. A low-battery indicator is helpful; nickel-cadmium batteries should not be recharged before the power runs out. A car battery adapter is useful on the road. CASI Touchtone data entry (TDE) and voice recognition entry (VRE) require special hardware cards and sufficiently powerful computers. The current BLS TDE configuration uses a 286 PC with 640K RAM. One PC can support many phone lines. BLS estimates that for a survey with 1.5-minute calls received during a 2-week collection period, one phone line is needed for every 500 respondents, so that during peak collection periods respondents will get a busy signal less than 5% of the time. Facsimile (FAX) transmission requires a hardware card or a separate FAX machine. There are machines that combine FAX, image scanning, laser printing, and photocopying. Telecommunication usually means analog transmission over phone lines.
Digital computers must have a way of sending and receiving analog signals; the device that handles this is called a modem. The main distinction between different modems is the speed of transmission. Bits per second (also erroneously called baud) rates of 1200 and 2400 are the most common, while 300 and 9600 are also used. As a rule of thumb, about one byte is transmitted per 10 bits because of parity and stop bits. Therefore, sending and receiving a large data set can take a long time; at 2400 bits per second, for example, a 100-kilobyte file takes roughly seven minutes to transmit. Software should have error checking capabilities.

Future Hardware

Besides general technology trends (smaller, faster, less expensive, more capable machines), a few specific observations can be made. International standards are taking on a new importance. Standards committees are no longer just reacting to de facto market standards but are taking the lead before products are developed. Compatibility and interconnectivity with other products are often as important as the capabilities of a product itself. The future for portable computers is bright. Color screens and more memory and disk space are going into smaller and lighter machines. Handheld PC's are starting to appear; computers the size of today's miniature calculators are not far off. Cellular telephones will be combined with portable computers. Peripherals such as printers are becoming more portable. Electronic Data Interchange (EDI) is changing business practices by automating orders, invoices, etc. As this becomes more widespread, surveys could be designed to "piggyback" onto EDI to take advantage of the systems already in place. Wide area computer networks with electronic mail are becoming more like public utilities. Developments in digital telecommunications (e.g., the Integrated Services Digital Network, or ISDN) will have many hardware implications -- see Network Planning. Modems will no longer be necessary because the entire path from computer to computer will be digital. Data transfer rates will be much faster. Optical and optical-electronic technologies are dramatically increasing data storage capacities. High definition television (HDTV) and digital video interactive (DVI) will intensify graphic applications. Improved optical character recognition (OCR) will help the transition from paper to completely electronic representation.

IV.G. Network Planning

Introduction

The computer revolution has come upon us in a series of waves: the first computers transformed the speed of computation by several orders of magnitude; improved technology provided computer access to large organizations; personal computers provided computers to everyone; and the relatively recent introduction of computer networks created the information community, which has brought information to everyone. Networks have made possible the development of information utilities that serve the entire spectrum of the human community, providing services from computer games to newspapers for anyone owning a personal computer. The pervasiveness of these information services enables survey information to be collected locally and transmitted directly to a central processing utility. The Arpanet, developed by the Department of Defense, was the first widespread network to join researchers, system developers, and administrators into an information community. Although electronic mail or E-mail was the immediate gain from this network, the ability to transfer files of data, to access remote databases, and to use the computing services of a geographically remote computer showed the real value of a network.
Access to computer networks by the public has increased dramatically as the network cost for an individual has dropped to the cost of a local phone call. Some commercial services cost less than a monthly phone bill for unlimited access. A new network technology is about to transform our ability to use the distributed processing systems available on a network by dramatically increasing the amount of data that can pass over these networks. Data Collection Networks will have a profound effect on data collection. They will provide the opportunity for close contact between the interviewer and the respondent. For example, CATI provides limited voice interaction over a telephone. Networks will provide visual and audio interaction with television or computer screens. They will enable the interviewer to display previously collected data to the respondent, and to use graphical diagrams and pictures to convey the conceptual background to questions. Moreover, it will provide the opportunity for more frequent updates to survey information that will match the data requirements rather than the economical constraints. High-speed networks will put interviewers in closer contact with experts who can resolve troublesome issues while a survey is 54 being conducted. For example, CAPI interviewers do not have immediate access to their supervisors. With high bandwidth networks, the CAPI interviewer can contact a supervisor in much the same way as a CATI interviewer. The net result should be greater interaction and reduced costs as the network bandwidth increases by an order of magnitude over the next decade. Background There has been a separate, and independent, evolution of networks in this century for the transport of voice and data. The classical voice network was based on the telephone handset that converts speech into electrical signals which are transported over the local loop via a twisted pair of copper wires to a telephone system end-office. Traditionally, the signalling involved has been analog (the transported signal varies continuously in time) and the communication link established between two telephone handsets has been termed an analog voice transmission circuit. The human ear is an extremely good filter, and has permitted analog voice circuits to be established in which the analog voice signal was noisy. A good ear and contextual information made it possible to understand the communication. As the separation between two handsets engaged in an analog voice communication link increased, the electrical signals required amplification for continued distribution. Such amplifiers are often called repeaters and they had the unfortunate characteristic of amplifying both the noise and the electrical voice signal being transmitted. Consequently, it was very difficult to remove. specific noise components from the analog voice signal. The analog telephone handset is connected to a local exchange or end-office. This is nothing more than a local switch that is in turn connected to a trunk exchange. This trunk exchange, in North America, is a five level hierarchical arrangement of switches for routing telephone calls. It forms a circuit switched network that is connected to an international access exchange and provides the capability of global voice telephone communications. The network described here was still made up of twisted-wire copper pairs in the local loop and electromechanical switches that performed routing of the voice calls until 1966. 
With the appearance of very large scale integrated (VLSI) technology -- the computer on a, chip, the network switches evolved into electronic switching systems. The intelligence in the switches allowed the established transmission fabric to be rendered more cost- effective by simplifying maintenance through higher reliability features and better strategically planned network maintenance. As the employment of more sophisticated electronics 55 was accelerated in the switching matrix, the conversion of analog voice signals to purely digital signals led to the appearance of dedicated digital networks. While the handset in most installations remains analog, the local switch to which the handset is attached performs an analog to digital conversion of the initial voice signal. From there the signal is entirely digital. The digital transmission networks that are dedicated to voice and data are called Integrated Digital Networks (IDNs). Standards have now emerged internationally using the guidelines of Consulting Committee for International Telephony and Telegraphy (CCITT) - The digital transmission systems are rapidly evolving toward IDNs that are interoperable and make use of the intelligence associated with each network switch that is digital because each such switch may be regarded as a generalized computer. In making distinctions between data applications and voice applications using a modern IDN, networks used for data applications can be characterized according to the activities of the terminals on the network: 1. Start-stop terminals are used to generate interactive data traffic to and from the computer. This traffic-tends to be low speed with occasional bursts as the computer responds to an interactive request for a specific file to be transferred. 2. Batch data transfers and data display image transfers that occur as bursts of data that can be placed on the network. 3. Continuous data traffic that is typically carried by circuit switched IDNs at data rates from 2.4 to 64 Kbits/sec (thousands of bits per second) . The data traffic within the network is often combined from separate low data rate bit streams, and interleaved into a single 64 Kbits/sec data channel for transmission across the network. A packet switched network decomposes a digital message into smaller chunks of bits (typically 1008 bits or 2000 bits) and routes these chunks, called packets, through the network from a source to a destination on an end-to-end basis. Current Network Systems The modern communications environment may be regarded as made up of three basic functional blocks: 1. User terminals that support a human interface with the network. They allow a human to interact with 56 another user terminal or a computer connected via the network. 2. A communications network that is transparent to the user and provides conventional information transfer capabilities. 3. Information service centers that provide computing functions at the center. Network systems breakdown conceptually into Local Area Networks (LAN) and Wide Area Networks (WAN). The IEEE definition for a LAN is a "data communication system that allows a number of independent devices to communicate with each other." A WAN is one that covers a much larger area (e.g., nationwide or worldwide) , and has one or more computer nodes that are central to the operation of the network. These specialized computer nodes support the routing -- storing and forwarding -- of packets of information. 
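The packet idea can be made concrete with a short sketch. The following Python fragment is illustrative only: the 128-byte payload size and the header fields are assumptions chosen for demonstration, not parameters of any network described in this report. It splits a message into numbered packets carrying routing headers and reassembles them at the destination, which is essentially what a store-and-forward node handles for the traffic it routes.

    # Illustrative sketch of packet decomposition and reassembly.
    # The payload size and header fields are assumptions for demonstration;
    # real packet networks negotiate these parameters.

    PAYLOAD_BYTES = 128

    def to_packets(message: bytes, source: str, destination: str):
        """Split a message into numbered packets carrying routing headers."""
        packets = []
        for seq, start in enumerate(range(0, len(message), PAYLOAD_BYTES)):
            packets.append({
                "source": source,
                "destination": destination,
                "sequence": seq,
                "payload": message[start:start + PAYLOAD_BYTES],
            })
        return packets

    def reassemble(packets):
        """Reorder packets by sequence number and rebuild the message."""
        ordered = sorted(packets, key=lambda p: p["sequence"])
        return b"".join(p["payload"] for p in ordered)

    if __name__ == "__main__":
        report = b"survey data record " * 40          # small example message
        pkts = to_packets(report, "field-office", "processing-center")
        assert reassemble(pkts) == report
        print(f"{len(report)} bytes sent as {len(pkts)} packets")

Each intermediate node stores a packet, examines its destination header, and forwards it along the next link, so no end-to-end circuit needs to be held open for the duration of the transfer.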
The simplicity of Local Area Networks makes them useful for specialized applications within a small organization. They can continue to operate with some of their devices broken or down, because any one unit does not affect the operational status of the others. Moreover, LANs promote and extend a cooperative work environment for both people and machines. When discussing LANs, an understanding of the following terms is important:

Centralized -- a main or host computer does all data processing;
Distributed -- some remote computers do their own processing;
Gateway -- hardware and software that allow two technologically different networks to communicate with each other;
Bridge -- a link joining two technically similar networks to one another;
Servers -- network peripherals that support specialized use by the entire network community, e.g., file storage servers and printers.

These elements make up a multilayered communications facility that represents a multitude of telecommunications networks that must interoperate on both a national and a global scale. Because telecommunications have developed in different ways in various foreign countries, there has been continuing pressure for standards and for the cooperation of all countries in the efforts of the CCITT. This overall international telecommunications environment supports a communications arrangement that may be logically segmented into:

1. A public communications network layer. The public network (at least, in the United States) is required to provide uniform service of good quality and on an equal access basis. It must permit uniform management of the network across the nation, and it must exhibit acceptable reliability characteristics to the public user. The regional Bell operating companies provide local public telephone service.

2. A business communications network layer. In this category, the communications structure is privately owned and operated. There are a multiplicity of these proprietary networks developed by private companies to reduce the communications costs to corporations. Tymnet, AT&T, and the regional holding companies assist corporations in building such private structures. Private networking will most likely increase in the future, but it may be implemented as virtual private circuits using the intelligent digital networks (IDNs) of the 1990's.

3. A business distribution network layer. This type of network transmits from one site and is received by many sites. Cable TV and the broadcasting of commercial television shows are examples.

The evolution of the IDNs must support the following operational characteristics of the local and national telecommunications system:

1. The current arrangement of public telephone networks and packet switched networks does not support the simultaneous operation of voice and data services. The simultaneous transmission of speech, data, telemetry, and signalling will be natural in future IDN networks.

2. The message content must be transparent to the various services employed by the network.

3. The embedded base of existing network equipment must be accessible by the evolving IDN. Such things as classical two-wire telephony must be supported.

4. The security and privacy of information must be available for all users of the network.

5. The appropriate levels of network management for handling accounting, performance, configuration control, reliability, and security of information must be available on the network.
Planned Systems

The ultimate evolution of the current intelligent digital network is the Integrated Services Digital Network (ISDN), which has been emerging in the industrialized nations for the last ten years. It is a technology that ultimately will place end-to-end digital signalling capability throughout the network. It has been slowed because of two major factors:

1. The lack of standardization between vendors of transmission equipment within the United States and Canada, as well as the widely divergent option selections specified by CCITT in its so-called ISDN standard reference model. This latter situation has resulted in the inability of the Postal, Telephone and Telegraph Agencies of various nations, including the United States, to establish ISDN environments that could exchange information. The ability to exchange information is called interoperability. Two ISDN networks that can exchange information transparently to two end users, one on each network, are said to be able to interwork.

2. The enormous established base of analog switching equipment. This base is measured in the tens of billions of American dollars and represents an investment by service providers such as AT&T and end user organizations that cannot simply be replaced in a short period.

The United States government through the Brooks Act of 1987 has mandated that all agencies of the government must move to a common communication backbone that is to be an ISDN environment as soon as acceptable standards can be put in place. The National Institute of Standards and Technology (NIST) has been actively pursuing the realization of standards since February, 1988. The General Services Administration (GSA), with the awarding of the FTS2000 contract to AT&T and Sprint, is now working to develop an ISDN migration plan that will be acceptable to all government agencies. This plan may have to proceed on an agency-by-agency basis because different agencies will have unique problems in their telecommunications environment. The result is to be an intelligent network that will offer many services using digital signalling, and that will provide individual users with an extremely friendly interface with their ISDN workstations (i.e., handsets, PC's, integrated voice, data, and video consoles). To the user, the ISDN environment appears as a highly intelligent network in which, aside from the network access points, no clear distinction can be made as to where their personal computer or mainframe ends and the network begins; in a sense, the computer becomes a part of the network and the network appears as a geographically dispersed computing environment. In essence, the intelligence that resides in the individual switching machines is made available to the users of the network as a menu of services which can enhance the capability of the user to do a variety of functions. In an attempt to capture the needs of the user, NIST and the industrial telecommunications community created the North American ISDN User Forum in the Spring of 1988. This forum has been generating user applications for ISDN; as of June 1989, 81 applications had been cataloged. Because of the high level of intelligence invested in the ISDN environment, such concerns as user authentication at both the sending and receiving ends, end-to-end integrity of a message, and security of the information sent can be dealt with by the network in a manner transparent to the user.
It must be recognized that the ISDN environment is a multimedia services facility that allows end-to-end transport of voice, data, or slow-scan video. Facsimile (FAX) transmission is also part of this media mix. The current ISDN implementations in North America can support a maximum bit rate per channel of 64 Kbits/sec; this is called narrowband-ISDN. A separate standardization process is also taking place in North America and around the world. It is called broadband-ISDN, with anticipated bit rates of more than 600 Mbits/sec, an increase by a factor of roughly 10,000 over narrowband-ISDN. This network will provide services with an ultimate impact on the business and commercial customers of North America that will be larger than all the capabilities now associated with narrowband-ISDN. The use of broadband-ISDN, in conjunction with rewiring the North American continent with fiberoptic circuits, will revolutionize information processing. With the emergence of a single, seamless ISDN communications fabric, the proliferation of private networks should be greatly reduced in both private industry and the Federal Government. This should substantially reduce the costs of network operations, administration, and maintenance. In particular, one governmental agency has estimated an annual cost savings of $7 million in just moving to an ISDN environment, in terms of the reduction of network management charges. These savings do not address the potential increases in productivity through the acquisition of the new user services provided by an ISDN facility. The NIU-Forum is considering the cost-benefit concerns of organizations as they move to a fully ISDN-equipped telecommunications environment. This work helps the unsophisticated user to use the intelligent network to carry out well-defined functions such as efficient data collection. A further aspect of an ISDN environment is that the network could act as a highly intelligent protocol converter. In a sense, it could function as a concurrent multiple gateway between many different types of data networks. Uploading and downloading of data would be taken care of automatically and in a manner transparent to the users. Verification of the data sent on an end-to-end basis also would be done automatically by the network. In a multi-media environment, media conversions (voice-to-data, data-to-image, image-to-data, data-to-voice, and image-to-voice) also could be done by the ISDN facility. The key here is the high intelligence of the network, and the transparency of the ISDN operations to its attached user community.

V. REFERENCES

A. CATI

Curry, Joseph; "Computer Assisted Telephone Interviewing: Technology and Organization Management"; Sawtooth Software; June 17, 1987.

Groves, Robert M., et al. (eds); Telephone Survey Methodology; John Wiley & Sons; 1988.

Nicholls, William L.; "The Impact of High Technology on Data Collection"; CATI Research Report No. GEN-1; Bureau of the Census; February 24, 1989.

Werking, George; Tupek, Alan; and Clayton, Richard; "CATI and Touchtone Self-Response Applications for Establishment Surveys"; Journal of Official Statistics; Vol. 4; No. 4; 1988; pp 349-362.

B. CAPI

Danielsson, L.; and Maarstad, P.A.; "Statistical Data Collection with Handheld Computers - A Test in the Consumer Price Index"; unpublished report of Statistics Sweden; Orebro, Sweden; 1982.

National Center for Health Statistics; "Report of the 1987 Automated National Health Interview Survey Feasibility Study - An Investigation of CAPI"; November, 1988.
National Center for Health Statistics and Bureau of the Census; "Report of the 1987 Automated National Health Interview Survey Feasibility Study - An Investigation of Computer Assisted Personal Interviewing"; U.S. Department of Health and Human Services; National Center for Health Statistics; November, 1988.

Netherlands Central Bureau of Statistics; "Automation in Survey Processing"; Select Report 4; Central Bureau of Statistics; Voorburg, Netherlands; 1987.

Nicholls, William L.; "The Impact of High Technology on Data Collection"; CATI Research Report No. GEN-1; U.S. Department of Commerce; Bureau of the Census; February 24, 1989.

Rice, Stewart C., Jr.; Wright, Robert A.; and Rowe, Ben; "Development of Computer Assisted Personal Interview for the National Health Interview Survey 1987"; Proceedings of the Survey Research Methods Section, American Statistical Association; 1988.

Rothchild, Beth B.; and Wilson, Lucy B.; "Nationwide Food Consumption Survey 1987: A Landmark Personal Interview Survey Using Laptop Computers"; Proceedings of the Bureau of the Census Fourth Annual Research Conference; pp 347-356; U.S. Department of Commerce; Bureau of the Census; 1988.

Sebestik, Jutta; Zelon, Harvey; DeWitt, Dale; O'Reilly, James M.; and McGowan, Kevin; "Initial Experiences with CAPI"; Proceedings of the Bureau of the Census Fourth Annual Research Conference; pp 357-365; U.S. Department of Commerce; Bureau of the Census; 1988.

van Bastelaer, Alois; Kessemakers, Frans; and Sikkel, Dirk; "Data Collection with Hand-Held Computers: Contributions to Questionnaire Design"; Journal of Official Statistics; Vol. 4; No. 2; pp 141-154; 1988.

C. CASI

Clayton, Richard L.; and Winter, Debbie L.S.; "Voice Recognition and Voice Response Applications for Data Collection in a Federal/State Establishment Survey"; Official Proceedings of Military and Government Speech Tech '89; Media Dimensions; November, 1989.

Ponikowski, Chester; and Meily, Sue; "Use of Touchtone Recognition Technology in Establishment Survey Data Collection"; presented at the First Annual Field Technologies Conference, St. Petersburg, Florida; 1988.

Werking, George; Tupek, Alan; and Clayton, Richard; "CATI and Touchtone Self-Response Applications for Establishment Surveys"; Journal of Official Statistics; Vol. 4; No. 4; 1988; pp 349-362.

D. Human-machine Interfaces

Card, S.K.; Moran, T.P.; and Newell, A.; The Psychology of Human-Computer Interaction; Lawrence Erlbaum Associates; Hillsdale, NJ; 1983.

Conklin, Jeff; "Hypertext: An Introduction and Survey"; IEEE Computer; pp 17-41; September, 1987.

Hartson, H.R. (ed); Advances in Human-Computer Interaction; Ablex Publishing Co.; Norwood, NJ; 1985.

Myers, Brad A.; Creating User Interfaces by Demonstration; Academic Press; San Diego, CA; 1988.

Norman, Donald A.; and Draper, Stephen W. (eds); User Centered System Design; Lawrence Erlbaum Associates; Hillsdale, NJ; 1986.

Shneiderman, Ben; Designing the User Interface; Addison-Wesley; Reading, MA; 1987.

Shu, Nan C.; Visual Programming; Van Nostrand; New York, NY; 1988.

E. Computer Security

Department of Defense; Trusted Computer System Evaluation Criteria; DoD 5200.28-STD; 1985.

Federal Information Processing Standards Publication (FIPS PUB) 39; Glossary for Computer Systems Security; February, 1976.

Federal Information Processing Standards Publication (FIPS PUB) 46-1; Data Encryption Standard; January, 1988.

Federal Information Processing Standards Publication (FIPS PUB) 73; Guidelines for Security of Computer Applications; June, 1980.
Federal Information Processing Standards Publication (FIPS PUB) 112; Standard on Password Usage; May, 1985.

Federal Information Processing Standards Publication (FIPS PUB) 113; Standard on Computer Data Authentication; May, 1985.

Gasser, Morrie; Building a Secure Computer System; Van Nostrand Reinhold; New York; 1988.

National Institute of Standards and Technology Publication List 91; Computer Security Publications; January, 1988.

Pfleeger, Charles P.; Security in Computing; Prentice Hall; New Jersey; 1989.

F. Networks

Arni, D.; "Standards in Process: Foundations and Profiles of ISDN and OSI Studies"; National Telecommunications and Information Administration; Report 84-170; U.S. Department of Commerce; Washington, DC; December, 1984.

Browne, T.; "Network of the Future"; Proceedings of the IEEE; September, 1986.

Lutchford, J.; "CCITT Recommendations on the ISDN: A Review"; IEEE Journal on Selected Areas in Communications; May, 1986.

Madron, Thomas W.; Local Area Networks: The Second Generation; John Wiley and Sons; 1988.

Stallings, W.; Handbook of Computer-Communications Standards, Volume 1: The Open System Interconnection (OSI) Model and OSI-Related Standards; MacMillan; New York; 1987.

Stallings, W.; ISDN: An Introduction; MacMillan; New York; 1989.

U.S. Department of Commerce; "NTIA TELECOM 2000: Charting the Course for a New Century"; National Telecommunications and Information Administration; NTIA Special Publication 89-21; Washington, DC; October, 1988.

G. Applications

Clayton, Richard L.; and Harrell, Louis J., Jr.; "Developing a Cost Model for Alternative Data Collection Methods: Mail, CATI, and TDE"; Proceedings of the Section on Survey Research Methods, American Statistical Association; 1989.

Energy Information Administration; "PEDRO - Respondent User Guide to the Petroleum Electronic Data Reporting Option"; Version 3.0; February 3, 1989.

Groves, Robert M.; Survey Errors and Survey Costs; John Wiley and Sons; New York; 1989.

Statistical Policy Working Paper 15; "Quality in Establishment Surveys"; Office of Management and Budget; July, 1988.

H. Standards

National Institute of Standards and Technology Publication List 58; Federal Information Processing Standards Publications; June, 1989.

VI. Appendices

Appendix VI.A. Costs

Introduction

The choice of a collection method is usually based on a combination of performance and cost factors. For traditional methods, these factors are easily identified and the selection of a collection mode is not difficult. With recent technological advances, the new methods described in this report expand the array of potential collection tools and challenge the survey designer to reevaluate old cost and performance assumptions. The decision of which method or methods to use is now more difficult. This section reviews the structure of costs in the data collection function, covering several collection methods including mail, CATI, CAPI, TDE, and VRE. It also briefly describes the impact of automated collection on costs, particularly versus mail operations. This profile of costs is limited to data collection; considerations of impact on sample design, questionnaire changes, edits, and other issues are excluded.

Collection Methods Defined

CATI: The application of CATI is usually considered to address timeliness and other quality problems.
The computer assists by automatically controlling questionnaire branching, conducting on-line editing for reconciliation directly with the respondent, scheduling future calls, and capturing a variety of management information about the interview. Thus, most data collection activities are conducted through the CATI system. The use of CATI generally vastly reduces or eliminates routine mail handling activities and postage costs. CATI adds new costs in equipment purchase and replacement and telephone charges.

CAPI: This method extends the benefits of controlled branching and on-line edit reconciliation to improve the quality of data collected by personal interviewing. In surveys already using personal visit collection, CAPI adds direct costs of computer hardware for each data collector and software design and maintenance.

Self-response -- Prepared Data Entry: By offering Prepared Data Entry to respondents, the collecting agency adds the costs of software design and maintenance, and possibly the costs of telephone charges for electronic transmission of the completed questionnaire.

Self-response -- Touchtone Data Entry and Voice Recognition Entry: These methods include many of the same sample monitoring features of CATI and eliminate many of the labor-intensive activities associated with the traditional mail methods. TDE and VRE methods are currently used as a replacement for mail collection. By comparison, the regular mail handling to and from respondents is reduced to a single postcard to remind the respondent that it is time to call in their data. TDE and VRE further reduce manual operations by transferring key entry to the respondent. Short nonresponse calls may be employed to remind respondents to call in their data as publication deadlines approach. While reducing labor costs, TDE and VRE involve added costs for computer hardware and software development and maintenance.

Cost Model

The data collection function is the series of activities that follow sample selection and precede estimation. Data collection is comprised of a series of activities for capturing the data, converting the data to machine-readable form, performing editing and edit reconciliation, and follow-up for nonresponse. The conduct of these activities varies greatly under mail, CATI, CAPI, and self-response modes (PDE, TDE, and VRE). Major recurring cost categories for these collection modes are outlined in Table 1.

Table 1. Major Recurring Cost Categories by Collection Mode
(collection modes: Mail, CATI, CAPI, PDE, TDE, VRE; an "x" marks each mode that incurs the cost)

LABOR
  mail out -- x x x x
  mail return -- x x
  data entry -- x x x
  edit reconciliation -- x x x x x
  nonresponse follow-up -- x x x x x
  software development/maint. -- x x x x x
  interviewer training -- x x x
NON-LABOR
  postage -- x x x x
  telephones -- x x x
  computer hardware -- x x x x
  travel -- x

The cost categories presented in Table 1 can be used to evaluate the costs of other collection methods. By comparing the activities of the alternative method to the current method, a rough determination of affordability can be made. Detailed cost studies would be necessary for each specific survey application.

Assumptions

Realistic assumptions are a vital part of an analysis of costs. Several assumptions should generally be made about the level of workload and equipment requirements. These may include the number of units per CATI interviewer during the normal collection period and the number of minutes per interview.
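As a rough illustration of how such workload assumptions can be combined, the sketch below converts sample size, interview length, and the length of the collection period into a required number of CATI interviewer stations. All figures and the function name are hypothetical, chosen only for demonstration; they are not taken from any survey described in this report.

    # Hypothetical illustration of sizing a CATI operation from workload
    # assumptions; every number below is an assumption for demonstration only.
    import math

    def cati_stations_needed(sample_units, minutes_per_interview,
                             collection_days, productive_hours_per_day):
        """Estimate interviewer stations needed to work a sample in one period."""
        total_interview_hours = sample_units * minutes_per_interview / 60
        hours_per_station = collection_days * productive_hours_per_day
        return math.ceil(total_interview_hours / hours_per_station)

    if __name__ == "__main__":
        # Assumed workload: 5,000 units, 12-minute interviews,
        # 10 collection days, 5 productive calling hours per day.
        print(cati_stations_needed(5000, 12, 10, 5))   # -> 20 stations

A comparable calculation can be made for each mode; the point is simply that the cost model is driven by explicit, reviewable workload assumptions rather than by rules of thumb.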
The TDE cost assumptions include the length of the average call, effects of peak calling periods, the number of incoming lines per TDE board, and the average proportion of units receiving nonresponse prompting actions. Also, the number of boards that can be placed in the microcomputer should be included. The following factors, independent of collection mode, should be included in the model: salaries and benefits, administrative overhead allocations, standard non-personnel services, postage, amortization of computer hardware to cover replacement, and telephone charges, including fixed monthly line charges and variable call costs. The following factors are generally difficult to quantify and often cannot be treated equally for all methods: start up costs for research and development, ongoing systems design and maintenance, training, and emergency back-up features for CATI and TDE. Other Important Considerations Critical decisions concerning changes in the data collection methods are not made solely on costs; there are many other considerations to include in these decisions. Organizational Impact: The design of an effective production environment is essential to timely, ongoing output of data. For example, the success of CATI and TDE in compressing the collection period may pose peak period staffing problems. Also, the cost model assumes the managers can perfectly capture and reallocate resources as collection methods change. For example, TDE eliminates key entry. The costs are only truly saved if these resources can be captured and reinvested in new equipment and telephone charges, and with remaining savings redirected toward improving the quality of other survey functions. Also, it is assumed that postage savings also are identifiable and may be similarly captured and redirected. Staffing for Research and Development: The development of new techniques usually requires a small staff dedicated to achieving the change desired. Also, this staff must have a variety of skills, including economics, statistics, methods test design, computer systems design, questionnaire development, and analytical, writing, and presentation skills. This combination of individuals may be difficult to identify and remove from ongoing production 69 tasks. Given the frequency of new issues and problems, this group may require special attention from management and latitude in trying creative approaches to solving the wide range of problems that will inevitably arise in development efforts. Systems Design, Programming, and Maintenance: There are significant start up costs, although these can be easily amortized over large, recurring surveys. These costs will vary with the complexity of the application and the experience of the development staff. Ongoing maintenance depends on the frequency and magnitude of the changes. Training: Training requirements for staff to maintain manual operations, such as would be needed under mail, are small. Under CATI, a broader range of skills is required, including telephone communications skills and some working knowledge of the computer. The TDE system requires little special knowledge, keeping costs low. Emergency Procedures: As we increasingly rely on technology to do work for us, we are increasingly at risk when it fails. All implementation approaches should include back-up procedures and equipment at appropriate locations to ensure uninterrupted service to respondents. 
Telephone based methods may require back-up computers and associated equipment standing ready for instant replacement. In addition, TDE and VRE applications should consider establishing "call forwarding" services ready to route incoming TDE and VRE calls to an alternative collection site if the primary collection computers malfunction.

Quality Costs

The costs of quality are notoriously difficult to identify. Often, it is easier to invert this idea and address the costs of poor quality. For example, address refinement workload for solicitation is a cost of poor quality in the sample frame. Some edit reconciliation activities compensate for poor quality of collected data that may stem from deficiencies in concept or questionnaire design. Efforts expended to prevent future costs of poor quality, while often difficult to justify, generally pay off in lower ongoing costs.

Future Costs

The choice of collection mode, or combination of modes, will depend on the particular survey application and the existing cost structure. However, it is important to view investments in data collection over the long term, as the relative costs of each of the above inputs do not remain constant over time. Table 2 shows recent annual data on cost trends for the major cost inputs. Labor and labor-intensive inputs, such as postage, are becoming more expensive, while capital-intensive factors, such as telephones and computers, are becoming less expensive. Based on these data, and other historical cost trends, there may be a growing advantage to switching to collection methods that use less labor and more capital.

Table 2. Recent Annual Changes in Costs of Inputs into Data Collection

Cost Category    Recent Annual Cost Change (source)
Labor:           +5.8% for state and local government employee compensation (ECI for the 12-month period ending June 1989)
Postage:         +4.5% for first class postage (U.S.P.S. rate increase in April 1988 to 25 cents)
Telephones:      -1.3% for interstate toll calls (CPI-U unadjusted change, December 1988 to December 1989)
                 -2.5% for intrastate toll calls (CPI-U unadjusted change, December 1988 to December 1989)
Travel:          +3.9% for private transportation (CPI-U unadjusted change, December 1988 to December 1989)
Computers:       -10.0% for microcomputers (PPI experimental price indexes for the 12 months ending January 1990)

Survey managers should project unit costs for their surveys for alternative collection methods over a ten year period using recent price trends. This approach illustrates that decisions to implement alternative methods should be viewed in terms of estimates of future price levels. Decisions on conducting research and development testing need not await a current favorable cost benefit situation.

Conclusion

The decision on exactly how to use each collection mode will vary by survey application. For example, CATI and TDE could be combined to address chronically late mail respondents. These units would first be converted to CATI collection to improve their reporting behavior in terms of timeliness and accuracy. The units would remain under CATI collection for about 6 months, a period adequate for reducing nonresponse problems, determining exact data availability dates (for subsequent nonresponse prompting), educating respondents on the importance of their data, and reinforcing timely reporting behavior. Then, the units would be converted to TDE collection to reduce costs while retaining sample control.
voice recognition collection could be used for those units without touchtone phones or for those respondents who prefer voice collection. The approach outlined here is a basic tool for survey managers in assessing the potential application of new collection methods. Survey researchers should not be dissuaded by current costs from considering the use of automated collection methods. Recent cost trends suggest that the cost-effectiveness of collection methods changes over time. This should be considered in decisions concerning choice of collection methods for the future. 72 Appendix VI.B. Quality Improvements offered by CASIC Quality problems generally result from inadequate planning or control of one or more steps in the survey process. CASIC cannot replace or compensate for poor planning, but it may offer vast improvements in control by reducing manual intervention, promoting consistent procedures, by using supplementary data sources, and on- line editing to improve the accuracy of the data collection process. The automation of the questionnaire is the primary way casic improves control, by offering consistent procedures, on-line editing, and use of other information, to monitor and control the interview which otherwise would have proven too difficult or burdensome on the interviewer. While CASIC offers the potential for improvements, actual reductions in error components can only be made through efforts to delineate error potential and incorporating specific error-reducing techniques in the questionnaire. Some error reductions may be great, and others may be small. However, none will result without thorough evaluations of error sources and planning to address each. Often, knowledge of the magnitude of various errors may be necessary to decide on the cost- effectiveness of addressing some error sources. The automation of the data collection process directly reduces some sources of error. For example, telephone collection of data may reduce the potential f or processing error resulting from mailing the wrong form to a respondent. Other indirect benefits can be obtained through automation, including reductions in coverage error. For example, on-line evaluation of respondent characteristics provides immediate identification of out-of scope respondents. This section discusses several error components that may be reduced through CASIC methods. The structure, definitions and background of this discussion were derived from Statistical Policy Working Paper 15, entitled "Quality in Establishment Surveys." Readers are encouraged to refer to this document for more information on error definition, sources, control methods and measurement aspects. Specification Error Specification error occurs at the planning stage of a survey when specification is inadequate or inconsistent with the objectives of the survey. It can result from the difficulty of measuring abstract concepts or from poorly worded questionnaires and instructions. 73 CASIC methods may reduce specification errors in several ways. For example, difficult concepts may require very detailed questionnaires with complex branching patterns to obtain correct measures. CATI and CAPI can allow greater flexibility in structuring questionnaires than would be possible using paper forms. Also, CASIC provides a means for correcting specification error once identified. If one or more questions are difficult to use during collection, or.responses seem improper, corrections can be made centrally and software transferred quickly to all collection points. 
Given printing timing and costs, use of paper forms probably would not allow such mid-stream changes and the survey results could be severely compromised. While developing and printing questionnaires, skip pattern indicators may be omitted, or the tedious work of proofreading multiple variations may lead to errors. Use of CASIC instruments are just as susceptible to this error as are written forms. Automated questionnaires, and the associated code, must be checked thoroughly to ensure their accuracy. Forms also may be faint or smudged leading to difficulty for the respondent. Traditional methods for measuring specification error include record check studies, cognitive studies, questionnaire pretests, and comparison of results with independent estimates. CASIC can contribute to these approaches. First, record check surveys that scrutinize detailed definitional areas may be very complex. Such detailed branching is a strength of CATI and CAPI. Coverage error Coverage error includes both undercoverage, the exclusion of in- scope units; and overcoverage, the inclusion of out-of-scope units. CASIC may reduce overcoverage if the questionnaire includes checks for scope-determining characteristics. Data for sample units failing these criteria may be noted for review or exclusion, or the interviews may be ended rather than waste time. Also, duplication errors, stemming from duplicates on the sample frame may be identified through an automated records review at any point during collection. Again, such benefits are only possible with initial planning. Response Error Response error is the difference between the correct value and the value collected. Respondent error is the failure to report the correct value, and interviewer error is the failure to record the data properly. Respondent error may be controlled by comparing current data to previously reported data. Such on-line logic and internal 74 consistency edits can identify and resolve response errors directly with the respondent, rather than waiting for post-collection editing to catch errors for often spotty reconciliation follow-up. The power of an automated questionnaire also reduces interviewer error through instantaneous editing on any data entry mistakes large enough to trigger edit failures. Interviewer consistency also may be controlled by monitoring interviewer practices and assuring conformance with specified procedures. Most large, centralized CATI facilities allow supervisors to listen to interviews in process and to view screens simultaneously. Nonresponse error Nonresponse errors follow from failures to collect complete information from all units in the selected sample. There are three types of nonresponse error: noncontacts, unit nonresponse and item nonresponse. Each can be addressed through CASIC methods. Noncontacts of selected units may be the result of interviewer oversight, failure to locate the designated respondent due to incorrect address or telephone number, or failure to get the form to the respondent. CASIC cannot address weaknesses in mailing procedures except by replacing them with accurate telephone contact. This would, of course place additional burden on the accuracy of telephone numbers. interviewer oversight would be addressed by monitoring sample status data that can be collected during-interviews. For example, a detailed CATI system may capture information each time a call is placed, the number of attempts made to each number and the result of each attempt, such as "no answer" or "busy." 
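As an illustration only, the sketch below shows the kind of call-history record such a system might keep and how noncontacts could be derived from it; the field names and outcome codes are assumptions for demonstration, not those of any particular CATI system.

    # Hypothetical call-history records for sample units; the outcome codes
    # and field names are illustrative assumptions, not an actual CATI system.

    call_history = {
        "unit-001": ["no answer", "busy", "completed"],
        "unit-002": ["busy", "no answer"],
        "unit-003": [],                      # never attempted
    }

    def classify_noncontacts(history):
        """Separate unattempted units from those attempted without success."""
        not_attempted, unsuccessful = [], []
        for unit, attempts in history.items():
            if not attempts:
                not_attempted.append(unit)
            elif "completed" not in attempts:
                unsuccessful.append(unit)
        return not_attempted, unsuccessful

    print(classify_noncontacts(call_history))
    # (['unit-003'], ['unit-002'])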
Noncontacts may then be classified as not attempted versus unsuccessful attempts. Unit nonresponse occurs when no information is received from the respondent. The survey designer must strive to make reporting as easy as possible to reduce intentional nonresponse. Almost any effort that improves the respondent's understanding of the survey is worth the cost. The convenience of reporting is essential, as is the clearest and shortest possible interview. One CATI application reduced sample attrition by over one third compared to mail, attributed mostly to strong scheduling and building strong rapport with the respondent and providing information about the importance of the survey and its timing needs. Item nonresponse occurs when the respondent does not answer certain questions during the interview. This error may occur when the respondent cost of compiling data is too great, or the data are not easily available during the collection period. of course, some data may be sensitive or confidential. 75 Item nonresponse also may occur through the failure of the interviewer to ask questions or follow procedures. It is in this area that CASIC is most beneficial. By using software to control interviews, CATI and CAPI interviewers are not allowed to make errors of omission or purposely to skip questions. Another important part of reducing item nonresponse is to use a priori knowledge about the respondent. For example, in establishment surveys, information about the record keeping practices of the respondent may be retained on the computer for access during the interview that could provide special branching to elicit firm-specific data. This approach would generally be too cumbersome without computer assistance. Processing Error Processing error stems from the faulty use of correctly designed survey methods. It encompasses many collection and post-collection errors and the printing of the questionnaires. Also, processing error may arise from clerical handling of forms whether in mailing or key entry. CASIC methods, by reducing or eliminating these labor intensive and error-prone activities, can substantially reduce processing errors. CASIC respondents in recurring surveys may receive a mailed form once per year rather than once each month or quarter, reducing the opportunity for mail-related errors. All CASIC methods ensure that data entry and other coding is done by a well trained interviewer or by the actual respondent, thus reducing keypunch error. All CASIC procedures should include repetition of the incoming data for verification with the respondent. CATI and CAPI interviewers repeat the data aloud as they are keying it, and CASI methods must provide for repeating the data for verification. by the respondent. on-line edits again play a role in. assuring that data errors are caught before they get to the post-collection stage. Another source of processing error is data processing by computer. All the benefits of CASIC methods described above may be diminished by errors in computer processing. Failures in designing and constructing CASIC methods may substantially reduce data quality. For example, poor branching or non-exhaustive response options may prevent knowledgeable interviewers or self-response systems users from properly completing interviewers. Quality, as discussed above, is often defined in terms of statistical error or lack of accuracy. However, the idea of quality contains several other elements. For example, the element of timeliness is critical to most surveys. 
Accurate data that are too late to be of use have little quality. The use of CASIC methods, like CATI and TDE, has proven useful in improving the timeliness of data in one large establishment survey, thus offering the potential to reduce the number and magnitude of estimate revisions. Quality also includes costs: two identical products with differing costs are of different quality. Also, quality control should be applied to the process of methods development. A high quality CASIC application must be easily understood and easily used by interviewers and respondents. Anything less is of low quality.

Conclusion

CASIC methods have great potential for improving the control over data collection activities and the quality of the resulting data as it moves toward the post-collection survey functions. This discussion of survey error and the application of CASIC methods is not exhaustive of either current or potential approaches. Many other creative approaches will be developed to further use the power of computers to aid in improving the quality of Federal surveys. Equally important to add to the discussion of quality is a caution that the mere use of CASIC methods does not automatically guarantee higher data quality. Failures in designing and testing questionnaires or in using other standard survey practices will inevitably result in data quality problems. The increased reliance on software development has important implications for hiring and training skilled survey designers. Statistical methods knowledge and experience alone are not sufficient qualifications to achieve satisfactory results. Previously distinct boundaries between occupational groups will continuously blur or disappear. In the future, survey design will likely be increasingly accomplished through teams of skilled workers from different occupations. Just as statisticians must be familiar with software design techniques to understand their implications, systems analysts and programmers must be familiar with the statistical aspects of the survey and questionnaire design. Managers of automated surveys cannot avoid having a background in all aspects of the design, implementation, and maintenance of integrated systems.

Appendix VI.C. Survey Examples

The following pages provide examples of current CASIC applications. Each provides a point of contact for additional information.

National Agricultural Statistics Service (NASS) Agricultural Surveys

Collection Type -- CATI

Point of Contact
USDA - NASS
CATI Section, Survey Management Branch
Research and Application Division
1400 Independence Avenue
Washington, DC 20250

Type of Data to be Collected

The Agricultural Surveys are conducted in January, March, June, July, September, and December to collect data on crops, livestock, grain stocks, and other information from farmers. Starting with the March 1987 survey, data were collected using Computer Assisted Telephone Interviewing (CATI) to replace the paper-and-pencil mode. CATI is a computer driven telephone interviewing system developed to replace a paper questionnaire with a more efficient, error-reducing questionnaire. It can edit the data as it is entered by accepting only valid responses; checking sums and edit limits; carrying forward responses required for subsequent questions; and refusing answers inconsistent with current or historical responses. CATI provides question branching, and some systems can handle each state's customized version of the questionnaire.
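A minimal sketch of the kind of entry-time edits just described appears below. The item names, edit limits, and prior-period values are hypothetical assumptions for illustration; they are not drawn from the NASS application or the CASES software.

    # Hypothetical entry-time edit checks of the kind a CATI instrument applies;
    # item names, limits, and historical values are illustrative assumptions.

    EDIT_LIMITS = {"corn_acres": (0, 20000), "cattle_head": (0, 50000)}
    LAST_REPORT = {"corn_acres": 1200}       # prior-period response, if any

    def accept_response(item, value):
        """Apply range and historical-consistency edits before storing a value."""
        low, high = EDIT_LIMITS[item]
        if not (low <= value <= high):
            return False, f"{item} must be between {low} and {high}"
        previous = LAST_REPORT.get(item)
        if previous and value > 3 * previous:
            return False, f"{item} is more than triple last report; please verify"
        return True, "accepted"

    print(accept_response("corn_acres", 1500))    # (True, 'accepted')
    print(accept_response("corn_acres", 9000))    # flagged against history

Because the edit runs while the respondent is still on the line, a flagged value can be reconciled immediately rather than left for post-collection follow-up.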
Currently, 14 of the 45 field offices are collecting data with CATI using 183 calling stations, and in 1989, over 70,000 farmers were contacted to obtain Agriculture survey data. CATI usage will expand rapidly with the installation of the new PC Local Area Networks (LANS) in the-field offices. By 1992, all 45 field officer. will be equipped with a PC LAN and there should be about 750 calling stations available -for making CATI calls. Approach to Respondents The Agriculture Surveys CATI application is written using the Computer Assisted Survey Execution System (CASES) software developed by the University of California at Berkeley. CASES has an automated sample delivery system that is in use and an automated call scheduling and dialing option will be initiated in the future. Other features make CASES one of the most powerful systems on the market today. These include: interactive editing (coding), sample management, records keeping, conversational survey Analysis (CSA), audit trails, jump-back menus, and full screen mode with cursor control. The interview sessions are initiated by the interviewer. 79 The computer program controls branching to or skipping among questions, and validates the data as it is entered. In addition, the interviews are more personalized, probing questions are standardized, use of historic data is standardized, and the questions can be more sophisticated than those on paper questionnaires. Transmission Data collected via CATI is currently up-loaded to an IBM mainframe leased from the Martin Marietta Corporation where a SAS edit is done, and data summarized. Since the survey data is currently collected via different modes (CATI, telephone, on paper. personal interview, and mail), it is necessary to convert the data to one standard system for summarization. Factors Affecting Choice of Method The implementation of the CATI for collecting Agricultural Survey data has resulted in higher quality data and a reduction in time and cost of collection. This is due to combining the collection, entry, validation, analysis, and conversion of data. More complex questionnaire design is possible since the program controls branching and logic. CATI works particularly well in situations where a short implementation schedule exists. Quality Issues Significantly fewer errors occur, as data is validated at the time it is reported and keyed. The data validation currently includes internal data checks but some work has been done on using historic edit checks as well. Since the program controls the logic, you are assured that all questions are asked consistently. A totally menu driven system is being designed and will be in operation soon. 80 National Health Interview Survey (NHIS) Computer Assisted Personal Interview (CAPI) Case Study Collection Type -- CAPI Point of Contact Division of Health Interview Survey National Center for Health Statistics 3700 East-West Highway Hyattsville, MD 20782 (301) 436-7085 Type of Data to be Collected The case study involved the collection of health data f rom approximately 500 households in two Census Regions: Chicago and Charlotte. The questionnaire consisted of the NHIS core questionnaire that contains more than 600 questions on the composition of the household, demographic characteristics, health status of the individuals, health care visits and incidents, and other pertinent health care data. 
The respondents are contacted at their residence, and are not contacted again unless the interview was not completed on the initial visit or additional clarifications are needed. Because this effort was a feasibility study for CAPI, only a small portion of the normal survey respondents were contacted. The normal survey size is 50,000 households per year. Approach to Respondents CAPI was used to obtain the survey information. A portable computer containing the survey questionnaire was carried-into the household by the interviewer. The portable computer was a Toshiba 1100+ weighing approximately 10 lbs. The survey questionnaire was programmed in the Computer Aided Survey System (CASS) language developed by Dawn and Charles Palit at the University of Wisconsin. The interviewer conducted the survey by reading the questions from the computer screen and entering the answers on the keyboard. Transmission The survey questionnaire data is collected on 3 1/2" floppy disks by the interviewer. The disks are collected from each interviewer in the region, merged at the regional office, and then mailed to the computer center in North Carolina for uploading to the mainframe computer. 81 Factors Affecting Choice of Method The choice of CAPI provided several advantages. First, improved timeliness of survey data availability through the ability to quickly put the survey into the field and the subsequent elimination of the keying of the completed questionnaire. Second, improved data quality because (1) significant editing can be done as a part of the data collection process; (2) there is greater flexibility for questionnaire design, e.g., more opportunity to make changes closer to the field implementation date; (3) good measurements for non-sampling error are easily provided as a part of the process; and (4) immediate interviewer quality control is available from an analysis of the data, e.g., time to complete a section or the entire questionnaire. 82 Current Employment Statistics Survey Bureau of Labor Statistics Collection Type -- CATI, TDE, VRE Point of Contact Division of Monthly Industry Employment Statistics U.S. Bureau of Labor Statistics Room 2089 441 G Street, N.W. Washington, D.C. 20212 202--523-1446 Type of Data to be Collected The Current Employment Statistics (CES) survey collects data from over 300,000 nonagricultural business establishments each month covering employment, hours and earnings. The CES is voluntary and is conducted in a Federal-State cooperative system in which BLS provides the statistical standards and procedures for use in each state and the District of Columbia, Puerto Rico and the Virgin Islands. in this way, the resulting data can be aggregated to National totals, and are comparable among the states, which produce estimates at the state and metropolitan area levels. The national data are first published after only two weeks of collection. Then, based on additional sample receipt, revised estimates are published after 3 more weeks of collection, followed by final estimates after a total of 8 weeks of collection. The short collection period poses the toughest problem for the CES survey. Approach to Respondents Under mail collection, respondents return the form sometime after their data become available. Given the very short, two week collection period before the publication of preliminary estimates, any delay in completing the form, or returning it to the state has severe implications for response rates. 
Under CATI collection, respondents are called on a pre-arranged date, if possible the same day as the firm's data become available. The data are entered and edited during this call, and the next month's call is scheduled. The conversion of respondents from mail to CATI includes sending selected units a package of materials with information on the importance and uses of the CES data, and instructions on reporting by telephone.
83
As respondents are converted to TDE or VRE collection, another package is sent containing instructions on how to participate using these methods. Under TDE and VRE, respondents receive an "Advance Notice" postcard during the reference period that serves as a reminder that it is time to call in their data. The collection microcomputer is available 24 hours a day, 7 days a week to receive calls. A few days before the end of each collection period, the TDE and VRE collection files are checked, and those respondents for which data are missing receive a short call to ask that the data be called in. After the first month of collection by TDE or VRE, respondents are called to discuss the new method, to identify and correct any problems that may have been encountered, and to ensure trouble-free collection.
Transmission
Under the mixed mode of collection in the CES program, responses are received by mail, CATI collection, and TDE self-response. In the Federal/State cooperative system, the state collects the microdata, through the appropriate mix of methods, for electronic transmission to the central computing facility in Washington. The state data are then aggregated for the production of national estimates. At each level, the microdata are subjected to rigorous logical, consistency, and longitudinal edit checks.
Factors Affecting Choice of Method
Timeliness
BLS has been conducting research and development in the area of computer assisted methodology since 1984. Currently, over 5,300 units are collected via CATI each month. The use of CATI within the CES program is limited by the resources available. The current implementation strategy is based on targeted use of CATI for specific segments of the sample which warrant special treatment and commitment of funds. These segments include large, "certainty" units and late respondents. These units are converted to CATI collection for a short period, usually 6 months, to educate respondents on the importance of the CES data and the reporting timing requirements and to improve reporting habits. After reporting improves, these units are returned to either TDE self-response collection or mail, if there is no access to a touchtone phone. Thus, CATI is seen as a transitional tool for improving the overall timeliness of the CES sample over a period of just a few years.
84
Costs
While CATI is a very strong method for improving timeliness, it is currently more expensive than the mail collection process that has been used for decades. The high costs of CATI prompted BLS to pursue development and testing of TDE and VRE methods. These automated self-response methods offer lower costs by reducing or eliminating many of the manual activities and postage involved in mail collection. Data from respondents without touchtone phones will be collected using voice recognition.
Quality Issues
By every measure, CATI proved superior to mail collection, and TDE has shown the ability to maintain high response rates over extended periods of more than two years. The tests of VRE collection show similar ability to maintain high response rates.
                                         Collection Method
    Performance Measure               Mail      CATI      TDE/VRE
    Sample received for:
      preliminary estimates            50%       85%       85%
      revised estimates                75%       99%       99%
      final estimates                  87%      100%      100%
    Sample attrition (annual rate)   10-15%      2-4%      2-4%

Besides reducing nonresponse error for the preliminary estimates, the CES program uses a CATI system to evaluate and correct response error. Large scale tests using telephone record check surveys have shown that this approach is useful for ensuring that the reported data conform as closely as possible to CES definitions.
85
Energy Information Administration (EIA) Reserves Information Gathering System (RIGS), Form EIA-23
Collection Type -- PDE
Point of Contact
Reserves and Natural Gas Division
Energy Information Administration
1114 Commerce St., Room 804
Dallas, Texas 75242-2899
(214) 767-2200
Type of Data to be Collected
There are approximately 600 respondents, who are oil or gas well operators producing at least 400,000 barrels of crude oil or 2 billion cubic feet of gas annually. There are 15 detailed questions in this annual survey. A system of reporting on PC diskettes was set up on an operational test basis for the collection of 1988 data. Ten percent of 1988 production was reported with RIGS.
Approach to Respondents
The questionnaire runs on IBM PC compatible computers with at least 360K bytes of RAM and two floppy drives or a floppy drive and a hard disk drive. The user only needs to know basic DOS functions. The program is menu driven, and on-line help is available as well as a toll-free telephone hotline during business hours. It comes with a fifty-page User's Guide.
Transmission
Respondents copy the data files onto a floppy disk and mail the disk (with the cover page sent to them) to EIA. They also have the option of sending in the original paper form.
Factors Affecting Choice of Method
RIGS was developed to provide respondents with an alternative, more user-friendly means for reporting data. The PC compatible computer was chosen because of its wide availability. Use of the mail avoids security concerns about data transmission. EIA processing is done on a secure machine.
86
Quality Issues (Human Interface)
RIGS includes data edit checks to prevent inadvertent entries and an on-line correction capability. Company totals are automatically calculated. Respondents are requested to keep a copy of the data files and a printed copy of the output in case EIA's quality control analysts need to contact them. Reduction of follow-up calls is a significant benefit.
87
Internal Revenue Service Electronic Filing System Office
Collection Type -- PDE
Point of Contact
Operations and Marketing Branch
Electronic Filing System Office
Internal Revenue Service
1111 Constitution Avenue, N.W.
Washington, DC 20224
(202) 535-6394
Type of Data to be Collected
In the early 1980's, the Internal Revenue Service (IRS) decided that the electronic transmission of returns by tax preparers to IRS would be both a practical and cost-beneficial alternative to the mailing of paper tax returns when a refund is claimed. According to the agency, the benefits of electronic filing would include: (1) reduced manual labor costs required to process, store, and retrieve returns, (2) faster processing and retrieval of tax data, and (3) reduced interest IRS is required to pay to taxpayers who file timely refund returns but who are not issued refunds within the interest-free period allowed to the IRS to process these refunds.
Further, IRS reports show that electronically transmitted returns are processed with significantly fewer errors than paper returns. According to IRS figures for the 1988 filing season, as of April 29, 1988, 20 percent of paper returns processed by IRS had errors, while only 5.5 percent of those filed electronically had errors. For taxpayers, electronic filing can mean refunds up to 3 weeks sooner, and because IRS can deposit these refunds directly into taxpayer bank accounts, refunds may arrive 3 to 4 days earlier than that. For tax preparers, the ability to provide electronic filing services to taxpayers promises a competitive business edge.
Approach to Respondents
In 1986, the program was initially tested in three metropolitan areas, and five preparers electronically filed 24,820 returns to the Cincinnati Service Center. In 1987, 69 preparers in 7 metropolitan areas electronically filed 77,612 returns. For the 1988 filing season, IRS expanded its electronic filing program to 16 IRS districts and a second service center in Ogden, Utah. With the expansion in 1988, the number of preparers increased to 2,339. Of that total, 1,114, or about half, filed all of the 583,077 electronic returns for 1988. Furthermore, H & R Block offices accounted for 82 percent of the total returns filed electronically during the 1988 filing season.
88
Transmission
To operate electronic filing at each of the two service centers in 1988, IRS bought the International Business Machines Corporation (IBM) Series I computer, a local area network, and the related computer software. The network has IBM and IBM-compatible personal computers, high-resolution graphics display workstations, laser printers, tape drives, and optical disk drives. IRS uses the Series I to receive preparers' transmissions of electronic returns and to transmit certain information to preparers. The local area network was expected to perform two primary functions: (1) retrieve and visually display the electronic returns on the tax examiners' workstations for error correction, and (2) permanently store these returns.
The basic components needed to prepare and transmit electronic returns include a computer, IRS-approved software to prepare tax returns, and the communications equipment and IRS-approved software to transmit the returns to IRS. In addition, IRS tests and verifies the preparers' competence in transmitting electronic returns. The electronic filing process begins when a preparer transmits electronic returns to the service center. The Series I receives the transmission and writes the data onto a magnetic tape. The tape is then manually transferred from the Series I to the service center mainframe computer for processing. The mainframe generates an acknowledgment file specifying the received returns and whether each is accepted or rejected, and then writes this file onto magnetic tape. This tape file is hand carried from the mainframe to the Series I for electronic transmission to the individual preparers. Mainframe processing also identifies electronic returns containing errors. After IRS corrects the errors, tapes containing data from accepted error-free returns are sent with data from returns filed on paper to the IRS National Computer Center in Martinsburg, West Virginia, where the master files of tax account data are updated.
89
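The acknowledgment step described above can be pictured with a short sketch, written in modern Python purely for illustration; the record layout, field names, and the edits shown are assumptions for the example, not the actual IRS formats or rules.

    # Hypothetical sketch of building an acknowledgment file for one preparer's
    # transmission: one record per return, marked accepted or rejected.
    def edit_checks(ret):
        """Stand-in for the mainframe edits; returns a list of error codes."""
        errors = []
        if not ret.get("records"):
            errors.append("NO_RECORDS")
        if ret.get("refund_claimed", 0) < 0:
            errors.append("NEG_REFUND")
        return errors

    def write_acknowledgment(returns, path):
        with open(path, "w") as ack:
            for ret in returns:
                errors = edit_checks(ret)
                status = "A" if not errors else "R"   # A = accepted, R = rejected
                first = errors[0] if errors else ""
                ack.write(f"{ret['return_id']:<12}{status} {first}\n")

The acknowledgment file is what closes the loop with the preparer: every return in the transmission is accounted for, whether it was accepted or rejected.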
Energy Information Administration Petroleum Electronic Data Reporting Option (PEDRO)
Collection Type -- PDE
Point of Contact
Petroleum Supply Division
Energy Information Administration
1000 Independence Avenue, S.W.
Washington, D.C. 20585
Type of Data to be Collected
The Petroleum Supply Division (PSD) of the Energy Information Administration (EIA) decided in 1987 to investigate electronic forms submission to collect the Petroleum Supply Reporting System (PSRS) survey forms. Ten of the major petroleum companies who file the mandatory "Monthly Refinery Report" were contacted to assess their PC and communications capabilities. The respondents contacted showed interest in investigating the use of PC's to collect these data. Most were already using PC's for business, personal, or academic purposes. The respondents either had a PC in their office area or had access to one in another office. Software such as Lotus 1-2-3 and dBASE III could usually be found on these PC's. Some PC's were equipped with communications capabilities, and those respondents were already using telephone lines for company reporting. It appeared to be the appropriate time for the PC to enter the PSRS data collection process.
Approach to Respondents
Early in 1988, PSD developed the Petroleum Electronic Data Reporting Option (PEDRO) and began providing its respondents with a software diskette by which they could create an electronic image of the form on a PC screen and enter their data in the appropriate cells. Firms having the necessary software capabilities can use their database to feed the data directly to the electronic survey form, eliminating keying and transcription errors. User-friendly software with help functions has been added to the data entry functions to provide quick reference to definitions, conversion factors, or other information to speed the completion of the survey form. This eliminates the need to search hard-copy files for survey form instructions, product definitions, conversion tables, etc.
90
Transmission
The data received on EIA survey forms are subjected to rigorous edit tests before they are accepted for inclusion in the EIA database. These data are later summarized to produce EIA publications and reports used by the industry, the Congress, and the public. Timeliness and accuracy are needed in every step of the data collection process. Collecting data via electronic means allows EIA to pursue another approach to saving time: providing respondents with electronic forms software that also performs the survey edits and isolates anomalies for review before the survey response is submitted to EIA. Issues which would otherwise require an EIA data analyst to contact a respondent by telephone for resolution are highlighted immediately. This allows the respondent to correct any errors or attach a resolution indicator/comment to explain any anomalies. Additional telecommunications software has been added to allow a direct link between the respondent's PC and the EIA system. Now the capability exists on a PC to create, quality check, and transmit an electronic file directly to EIA. This file is immediately accessed by EIA processing software, and security and data transmission integrity tests are done. The PEDRO software contains electronic forms for data entry and software for statistical editing, and it establishes a communications link between the respondent's PC and the EIA Computer Facility. The functions are menu-driven and use macro languages and script files to eliminate rudimentary tasks. The PEDRO system only requires that the respondent's PC run DOS software and be equipped with telecommunications capability.
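A minimal sketch of the kind of pre-submission edit PEDRO performs follows, in modern Python purely for illustration; the cell names, the balance rule, and the comment mechanism are assumptions for the example, not EIA's actual edits.

    # Illustrative pre-submission edits: flag anomalies in the keyed form and
    # let the respondent correct them or attach an explanatory comment before
    # the file is transmitted.  Field names and rules are hypothetical.
    def run_edits(form):
        anomalies = []
        if form["closing_stock"] < 0:
            anomalies.append("closing stock is negative")
        expected = form["opening_stock"] + form["receipts"] - form["inputs"]
        if abs(form["closing_stock"] - expected) > 0.05 * max(abs(expected), 1):
            anomalies.append("closing stock differs from the computed balance")
        return anomalies

    def prepare_submission(form):
        for problem in run_edits(form):
            print("EDIT:", problem)
            form.setdefault("comments", []).append(
                input("Enter a correction note or explanatory comment: "))
        return form

The design point is that the respondent, not an EIA analyst working the telephone after the fact, resolves or explains each anomaly before the file ever leaves the PC.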
91
Energy Information Administration (EIA) Annual Survey of Nuclear Utilities
Collection Type -- PDE
Point of Contact
Nuclear and Alternate Fuels Division
Energy Information Administration
1000 Independence Avenue, S.W.
Washington, D.C. 20585
(202) 254-5558
Type of Data to be Collected
The Nuclear and Alternate Fuels Division of the EIA conducts an annual survey of nuclear electric utilities that own commercial nuclear reactors. The EIA collects data on over 100,000 nuclear fuel assemblies that are owned and managed by these utilities. These data are collected in support of the programs of the Department of Energy's Office of Civilian Radioactive Waste Management. A system of reporting on PC diskettes was set up in 1986 and began with the collection of 1985 data.
Approach to Respondents
The respondents are supplied with a program diskette containing compiled software and a data diskette. The data diskettes have the respondent's prior data submissions, which are needed for comparison purposes, and space for the current submission. The respondents load the program and data diskettes on their compatible PC's and enter the current data, which are verified by the data entry program as they are keyed. They print a copy of the data submission, sign a certification statement for it, and return the printed copy and statement to the EIA with the diskette.
Transmission
The diskettes are mailed from the EIA to the respondents, and the completed data diskettes are returned to the EIA by mail. Telecommunication between the EIA and the respondents is not needed. When the diskettes are received at the EIA, they are loaded onto a PC and checked. The data are uploaded from the PC's to the EIA mainframe over local telephone lines. Note that since these data are for public utilities, they are in the public domain and thus not confidential or proprietary. Certain issues of data security do not apply for this survey.
92
The diskette form of submission is preferred, but not mandatory. Respondents have the option of filing a paper form. Now there are approximately 70 utilities required to report for approximately 125 reactors, and all reports are filed on diskette.
Factors Affecting Choice of Method
The major advantages of the diskette collection are:
Data accuracy has been improved by (1) editing the data as they are keyed (a brief sketch follows this section) and (2) in some instances, data entry by technical rather than clerical personnel. The second reason suggests that a higher level of technology in data collection may result in the availability of a higher level of respondent skill to complete the survey.
More data, including data of a more complex nature, can be collected using the diskettes compared to using paper forms.
Data are available sooner.
In planning such a system, government agencies must be careful to create a system that does not require or endorse a particular brand of hardware or software. Software licensing agreements also must be carefully reviewed to ensure they are not violated when software is provided to respondents.
93
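The keystroke-time verification against the prior year's submission carried on the data diskette can be sketched briefly, again in modern Python for illustration only; the field name and the 20 percent tolerance are hypothetical, not the survey's actual rules.

    # Illustrative check of a newly keyed value against the prior submission
    # carried on the data diskette; flags large year-to-year changes.
    def check_against_prior(field, value, prior):
        previous = prior.get(field)
        if previous and abs(value - previous) > 0.20 * abs(previous):
            return (field + ": entered " + str(value) + ", prior year was "
                    + str(previous) + "; please confirm or correct")
        return None

    # Example: check_against_prior("assemblies_on_hand", 410,
    #                              {"assemblies_on_hand": 300})
    # returns a warning because the change exceeds 20 percent.

A check of this kind catches keying slips and unit errors while the respondent is still at the keyboard, which is precisely where the diskette system gains its accuracy advantage over paper.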
Appendix VI.D. A Taxonomy of Information Gathering Using a Computer
During this study there have been wide-ranging discussions on naming conventions for information gathering using a computer. The discussion has been so wide-ranging that the name of the committee has changed at least three times. This note was originally titled "Acronyms for Survey Technologies." However, it provides a good model of the different procedures for collecting information with computer assistance. The title of this section has been changed to reflect this model.
We can distinguish two aspects of the data collection process which may include automation: (1) assistance during the interview and (2) interaction with the respondent. A computer or other technology may be involved in one or both. Here is a system of acronyms using codes to show how each part is handled (a brief sketch of how the codes combine follows the lists below):
Operation types:
CA = computer assisted
MA = manually assisted
Interaction types:
PI = personal interviewing (person to person)
SI = self interviewing (respondent reads the questions)
TI = telephone interviewing (person to person on the phone)
TO = touchtone interviewing (respondent talks on the phone to a machine that discerns touchtones)
VI = voice recognition interviewing (respondent talks on the phone to a machine that discerns voices)
From these we get various possibilities, old and new:
CAPI = computer assisted personal interviewing
CASI = computer assisted self interviewing
CATI = computer assisted telephone interviewing
CATO = computer assisted touchtone interviewing
MAPI = manually assisted personal interviewing
MASI = manually assisted self interviewing
MATI = manually assisted telephone interviewing
A third aspect in some cases is how the data are sent to the processing center:
MA = mail
NE = network (wide area computer network)
TE = telephone line (direct line to computer)
94
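As a brief sketch (modern Python, purely illustrative), the coding scheme composes mechanically: the hypothetical helper below simply joins an operation code with an interaction code and, optionally, a transmission code.

    # The code tables mirror the taxonomy above; acronym() composes them.
    OPERATION = {"CA": "computer assisted", "MA": "manually assisted"}
    INTERACTION = {"PI": "personal interviewing", "SI": "self interviewing",
                   "TI": "telephone interviewing", "TO": "touchtone interviewing",
                   "VI": "voice recognition interviewing"}
    TRANSMISSION = {"MA": "mail", "NE": "network", "TE": "telephone line"}

    def acronym(operation, interaction, transmission=None):
        """e.g. acronym('CA', 'TI') -> 'CATI'; acronym('CA', 'SI', 'TE') -> 'CASI/TE'."""
        name = operation + interaction
        return name + "/" + transmission if transmission else name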
Appendix VI.E. Glossary of Technical Terms
286, 386 -- Short for 80286, 80386.
80286, 80386 -- Microprocessors from Intel used in PC's.
ASCII -- American Standard Code for Information Interchange; a seven-bit representation of alphanumeric characters and control codes.
ASCII file -- A file with ASCII codes; loosely, a text file.
AT -- The name for the second microprocessor generation of personal computers. These personal computers use the 80286 microprocessor.
Audit trail -- A record of changes made to a data set over its lifetime.
Authoring -- Computer software that allows someone who is not a computer systems programmer to write a CAPI survey questionnaire instrument.
Batch -- Computer processing with no human involvement after start-up; the opposite of interactive.
Baud -- Baud rate; the number of times per second that a signal in a communications channel changes state; often confused with bps.
Benchmark -- The use of some standard computer program (e.g., a sort program) to measure the use of computer resources in a particular environment. This could include computational speed and storage resources.
Bit -- Binary digit; symbolically, a one or zero.
bps -- Bits per second; the number of bits transmitted each second over a communications channel.
Bridge -- A communications channel between two technically similar networks.
Byte -- Eight bits.
CAPI -- Computer Assisted Personal Interviewing is a personal interview usually conducted at the home or business of the respondent using a portable computer.
96
Case management -- The portion of the CAPI software that handles the administrative management of the survey. This portion usually includes keeping track of the status of each interview, interviewer assignments, and other similar administrative tasks.
CASI -- Computer Assisted Self Interviewing (CASI) involves data collection without the direct presence of an interviewer. CASI can take several different forms which are differentiated by the means of collection. These include Prepared Data Entry (PDE), where the respondent answers questions displayed on a computer terminal; Touchtone Data Entry (TDE), where the respondent answers computer generated questions by pressing buttons on a telephone; and Voice Recognition Entry (VRE), where the respondent answers questions by speaking directly into a telephone.
CASIC -- Computer Assisted Survey Information Collection.
CATI -- Computer Assisted Telephone Interviewing (CATI) is a computer assisted survey process which uses the telephone for voice communications between the interviewer and the respondent.
CCITT -- Consultative Committee for International Telephony and Telegraphy; a standards-setting organization whose guidelines have led to international standards in the area of computer networks.
Centralized -- Interviews carried out from one central location (e.g., nationwide).
Centralized computing -- The main or host computer provides all of the processing power.
Chip -- See microchip.
CPU -- Central processing unit; the computer part which interprets and executes instructions.
CRT -- Cathode ray tube; the most common type of computer screen.
Decentralized -- CATI interviews carried out from several geographically dispersed locations (e.g., states).
97
Distributed processing -- Computing power is distributed over a number of computers which may be co-located or geographically distributed.
Disk -- A circular, magnetized medium which holds electronic data.
Disk drive -- A device which reads a disk electronically.
Diskette -- A floppy disk.
DM -- Direct Manipulation: a type of human-computer interface which accentuates the user's feeling of directly operating on responsive display objects. Example: the Macintosh user interface.
DOS -- Disk operating system; an abbreviation for MS-DOS or PC-DOS, the original operating system for IBM PC's.
Download -- The process of transferring a file from a mainframe computer or host to a connected personal computer or terminal.
EDI -- Electronic data interchange; the automated exchange of business information such as invoices.
Establishment -- Business.
Floppy disk -- A bendable disk, usually 5 1/4 inches in diameter, although increasing use is being made of unbending 3 1/2 inch disks.
Gateway -- A communications channel used to pass data between two different networks.
Hard disk -- An unbendable disk and its disk drive; holds more data than a floppy disk.
I/O -- Input and output.
IDN -- Integrated Digital Networks; digital transmission networks which are dedicated to voice and data.
ISDN -- Integrated Services Digital Network; an emerging technology which offers many new telecommunication services such as the mixing of the transmission of voice and data.
98
File server -- A computer, usually on a Local Area Network, that provides a group of users with storage facilities to store and access their files.
GB -- Gigabyte(s).
Gigabyte -- Loosely, one billion bytes; strictly, 1,073,741,824 (2 to the 30th power) bytes.
Interactive -- Computer processing which prompts for and accepts human input.
KB -- Kilobyte(s).
Kilobyte -- Loosely, one thousand bytes; strictly, 1,024 (2 to the 10th power) bytes.
LAN -- Local area network; the interconnection of microcomputers at one site.
Mainframe -- A large computer; often designed to serve many users at one time, although some mainframes, often called supercomputers, are designed to provide high-speed computing; their purchase costs are often in excess of a million dollars.
MB -- Megabyte(s).
Megabyte -- Loosely, one million bytes; strictly, 1,048,576 (2 to the 20th power) bytes.
Microchip -- A printed circuit etched on a silicon chip.
Microcomputer -- A small computer, e.g., costing less than $10,000.
Microprocessor -- A CPU on a microchip.
Minicomputer -- A medium sized computer; larger than a microcomputer but smaller than a mainframe; costing on the order of $100,000.
MS-DOS -- Microsoft's DOS for PC's.
On-line -- (1) A peripheral device is on-line when it is connected and ready for use; (2) involving interactive use of a computer.
One-time -- Non-repeating survey. Data are collected once, or over great intervals (e.g., 5-10 years).
Ongoing -- Repetitive survey (e.g., weekly, monthly, or yearly).
99
PC -- Personal computer; broadly speaking, any microcomputer; narrowly speaking, an IBM-compatible computer; even more narrowly speaking, IBM's first microcomputer.
PC-DOS -- IBM's version of MS-DOS (they are virtually identical).
PDE -- See Prepared Data Entry.
Prepared Data Entry -- Prepared Data Entry (PDE), where the respondent answers questions displayed on a computer terminal.
Print server -- A computer, usually on a Local Area Network, that provides a group of users with a range of printing services.
Question path -- See skip pattern.
RAM -- Random access memory; the core memory for a computer's CPU.
RAM disk -- RAM used as if it were disk space.
Sampling Unit -- A selected element for data collection in a survey, usually selected from a defined population of units by a random mechanism. In a survey of households in a state, the sampling unit is the household.
Skip pattern -- The sequence in which questions are asked in a survey questionnaire instrument; this sequence is often based on the answer to each question.
Target population -- The collection of survey units about which you wish to make some measurement; to quantify it, a sample is obtained and an estimate is calculated.
TDE -- See Touchtone Data Entry.
Touchtone Data Entry -- Touchtone Data Entry (TDE) allows respondents to call and answer questions posed by a computer using the keypad of their touchtone telephone for well-controlled and inexpensive collection.
User-friendly software -- Software that provides an interface to the user that is simple and intuitive, thus making the software easy to use.
UNIVAC I -- The name of the first digital computer in widespread commercial use.
100
UNIX -- An operating system initially designed for small computers, but currently in use over a wide range of computers.
Upload -- The process of transferring a file from a personal computer or terminal to a mainframe computer or host.
Voice Recognition Entry -- Voice Recognition Entry (VRE) allows respondents to call and answer questions posed by a computer by speaking directly into the telephone. The machine translates the incoming sounds for verification with the respondent and storage in a database.
WAN -- Wide Area Network.
Waterfall methodology -- A straightforward approach to software development by stepping through specification, design, implementation, debugging, and testing without ever looking back, as opposed to moving back and forth between these steps as the objectives become more clearly understood.
WYSIWYG -- Pronounced whizzy-wig. What You See Is What You Get. A style of presentation to users in which the displayed material is essentially identical in form to the final product. Example: modern word processing software.
XT -- The name given by IBM to an early version of the Personal Computer which had internal disk storage (i.e., a hard disk) that could hold 10 or more megabytes of data.
101
Reports Available in the Statistical Policy Working Paper Series
1. Report on Statistics for Allocation of Funds (NTIS Document Sales, PB86-211521/AS)
2. Report on Statistical Disclosure and Disclosure-Avoidance Techniques (NTIS Document Sales, PB86-211539/AS)
3. An Error Profile: Employment as Measured by the Current Population Survey (NTIS Document Sales, PB86-214269/AS)
4. Glossary of Nonsampling Error Terms: An Illustration of a Semantic Problem in Statistics (NTIS Document Sales, PB86-211547/AS)
5. Report on Exact and Statistical Matching Techniques (NTIS Document Sales, PB86-215829/AS)
6. Report on Statistical Uses of Administrative Records (NTIS Document Sales, PB86-214285/AS)
7. An Interagency Review of Time-Series Revision Policies (NTIS Document Sales, PB86-232451/AS)
8. Statistical Interagency Agreements (NTIS Document Sales, PB86-230570/AS)
9. Contracting for Surveys (NTIS Document Sales, PB83-233148)
10. Approaches to Developing Questionnaires (NTIS Document Sales, PB84-105055/AS)
11. A Review of Industry Coding Systems (NTIS Document Sales, PB84-135276)
12. The Role of Telephone Data Collection in Federal Statistics (NTIS Document Sales, PB85-105971)
13. Federal Longitudinal Surveys (NTIS Document Sales, PB86-139730)
14. Workshop on Statistical Uses of Microcomputers in Federal Agencies (NTIS Document Sales, PB87-166393)
15. Quality in Establishment Surveys (NTIS Document Sales, PB88-232921)
16. A Comparative Study of Reporting Units in Selected Employer Data Systems (NTIS Document Sales, PB90-205238)
17. Survey Coverage (NTIS Document Sales, PB90-205246)
18. Data Editing in Federal Statistical Agencies (NTIS Document Sales, PB90-205253)
19. Computer Assisted Survey Information Collection (NTIS Document Sales, PB90-205261)
Copies of these working papers may be ordered from NTIS Document Sales, 5285 Port Royal Road, Springfield, VA 22161, (703) 487-4650.