Practical Realities of Computer-based Finding Aids: The NARS A-l Experience
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
The American Archivist / Vol. 42, No. 2 / April 1979 167
Practical Realities of
Computer-based Finding Aids:
The NARS A-l Experience
ALAN CALMES
THE NATIONAL ARCHIVES and Records System Study
Service (NARS) A-l System is a com- At first, the NARS program analysts
puter-assisted procedure for compil- asked whether gaining administrative
ing all the series level inventories of control over record groups and com-
the National Archives record groups piling series descriptions required the
into one master file. Two main pur- creation of a new system. The A-l sys-
poses of the system are: (1) to provide tem study consisted of an analysis of
the Office of the National Archives existing procedures for describing and
with administrative control over the controlling archival records. Program
record groups (allocation to custodial analysts determined the adequacy of
units and quantity control); and (2) to the various procedures, compiled a list
compile all series descriptions into one of problems, identified types of infor-
machine-readable file according to a mation of interest to archivists and re-
standard format and a hierarchical ad- searchers, and assembled a variety of
dressing scheme. The NARS A-l Sys- finding aids. In the process of evaluat-
tem came about after an in-depth ing existing series descriptions, the
background study between 1972 and program analysts identified strengths
1974 found the NARS administrative and weaknesses of each and outlined
and descriptive programs inadequate methods to bring all the series descrip-
for controlling archival records.1 The tions together into one system. It was
design of the A-1 system took place in recognized that the format (layout of
1974-75, and the system went into information and the kinds of informa-
production in 1976.2 tion) varied widely from one series de-
1
Claudine Weiher, "Control and Description of Records in the National Archives," unpublished,
NARS-NAA, 4 vols. (1974).
2
Alan Calmes, "NARS A-l System Documentation," unpublished, NARS-NNB (15 August
1978).168 The American Archivist—April 1979
scription to the next and that a stand- efit analysis evaluated the relative
ard format ought to be adopted. In the worth of manual processing as com-
final analysis, the study envisioned a pared to automated processing and
system which would incorporate desir- concluded that costs would be compa-
able new features, such as format con- rable. However, this was a moot point
trol, into the existing system. since creation of the file would repre-
The main consideration then be- sent about 60 percent of the total cost,
came how to assemble and compile the and the Archives administration could
formatted series descriptions into a not hire a large enough number of
master file. Subject indexing was con- clerk typists and provide them with
sidered impractical because a dictionary enough office space for typing and
\ ,of terms would have to be developed maintaining an active file of complete
and applied systematically to all series inventories of all the record groups.
descriptions. It would make the previ- Therefore, to assist the manual opera-
ously described series descriptions in- tions and reduce the need for addi-
adequate and would require a consid- tional personnel and office space,
erable amount of additional time for NARS management looked toward au-
all future series descriptions. Indexing tomation. The A-l system study, which
would require that an archivist identify cost $70,000,3 concluded that a com-
appropriate index terms for each series puter-assisted system mainly for the
description. This would slow down the purpose of text editing—sometimes
decision-making process during series called "wordprocessing"—should be
description writing. It was estimated implemented instead of a computer
that such a process would triple the se- centered system designed for infor-
ries description processing time. The mation retrieval by subject, which
A-l system was supposed to accelerate would require the use of index terms.
the production of series descriptions,
not retard their production. Further-
System Design and Implementation
more, the A-l system was thought of
as a means of gaining administrative System design of the A-l system in-
rather than intellectual control over volved two different activities: the
the records. The analysts recom- preparation of input forms, and com-
mended that subject retrieval receive puter processing. The following de-
serious attention only after the prob- scription of the A-l system will not
lems of administrative control were deal with the preparation of input
solved. forms.4 Computer processing con-
Initially, automation was seen as the sisted of the conversion of statistical
second of two alternatives; for at first and descriptive data into machine-read-
the problem was conceptualized in able form, the manipulation of the ma-
terms of manual processing. The ana- chine-readable file, and the produc-
lysts estimated the time needed to re- tion of computer printouts. Because
produce old series descriptions accord- the A-1 system study called for a com-
ing to a standard format and to process puter assisted system rather than a
new series descriptions. The cost-ben- computer centered one, batch processing
3
It took four and one-half man-years to complete the system study.
4
GSA Form 6710A, Change of Status Record, was designed for the collection of series description
information. Form design was carried out independently of the system design.The NARS A-l Experience 169
was viewed as the main mode of oper- record contains information describ-
ation. ing the record group as a whole; for
During the design stage of the A-l example, an organizational history.
system, system analysts first consid- Level 9 is the subgroup which is most
ered the purposes and objectives of the embedded in the record group's hier-
computer printouts. NARS manage- archical structure. Subgroups thus
ment decided that the traditional represent such organizational struc-
NARS inventory format should be the tures as "office," "division," and
main output product. Batch process- "branch." Levels 10, 11, and 12 are
ing of the record group descriptions in series, subseries, and sub-subseries, re-
inventory format required the ma- spectively; they may be immediately
chine-readable file to be arranged se- associated under a record group as a
quentially according to a hierarchical whole (level 1) or with any subgroup
numbering scheme (an addressable (levels 2-9). Figure 2 illustrates the
control number) which reflected the operation of the control number illus-
placement of archival series within the trated above.
originating agency's organizational The resulting control number shown
and/or functional structure. The ad- in Figure 1 identifies the address for
dressable control number was a com- the series description of "LETTERS
bination of record group number, sub- RECEIVED."
record group numbers (eight possible In order to avoid the problem of
layers), series number, subseries num- variable length records,5 all the data be-
ber, and sub-subseries number. longing to the same hierarchical ad-
In Figure 1, the term "Level" iden- dress are linked sequentially by a set
tifies the depth of the hierarchically lo- identification code (i.d.) and subset
cated record. Level 1 represents the rec- number. A set is a repeatable line of
ord group number level. A level 1 the same type of information, such as
Figure 1, Control Number
Level 1 2 3 4 5 6 7 8 9 10 11 12
RG SUBGROUPS SERIES SUB-S SUB-SUB-S LEVEL
CON- A B C D D E F H
TROL
No. 127 1 1 1 1 1 1 0 0 18 0 0 10
5
Variable length records require special handling called "list-processing," which involves a com-
puter check for the end of each record each time it processes the file. Multiply one variable length
record times two hundred thousand records to make up the entire file, and it is easy to see that the
computer's job would have been slowed down considerably and been costly. For an example of list-
processing see Alan Calmes, "A PL/I Free-Field Handling System," in Historical Methods Newsletter 8
(December 1974): 39-47.170 The American Archivist—April 1979
Figure 2, Record Hierarchy
RG 127 Records of the U.S. Marine Corps
A. 1 Textual Records
B.I Headquarters U.S. Marine Corps
C.I Office of the Commandant
D.I General Records
E.I Correspondence
S.I REGISTER OF LETTERS RECEIVED
S.18 LETTERS RECEIVED
in Figure 3. Each line of information is may be repeated up to 999,999 times
one subset of a periodic data set, some- by means of the subset number. Figure
times called a "repeating group." In 3 outlines the descriptive narrative and
the control number example above, a location register part of the computer
set i.d. such as "05," may be attached file format.
to the end of the control number to in- A line of text narrative may exist
dicate that the following information alone or be followed by an indefinite
consists of one narrative line. Subset number of additional lines up to
number one attached to the set i.d. in- 999,999. For output-page-width rea-
dicates that the information represents sons, each line of text consists of a
line number one of a paragraph. A set maximum of seventy-six characters.
Figure 3, File Format
Control # + Set i.d. + Subset # + DATA
Repeating (full control #) 05 000001 (narrative line)
group (full control #) 05 000002 (narrative line)
(full control #) 05 000003 (narrative line)
Repeating (full control #) 06 000001 (location)
group (full control #) 06 000002 (location)The NARS A-l Experience 171
X} - T>
cm —
« A C •
L CT
I c 3 0 c
1 gne
octa
e Cl c L
CT « c O 3 i o "O
o o < c
c e •* i
9U • -* •1 u - - 4> * X.
*n «
41
•Jl ) 4J o
> M . 4> HI u • •> c
r [ c t, D 3 O • c
rve
ona
oo X •P 1) L u in >
« t.
- O D
r
m 4J
: •* > t
* 0 <
E
m • * -
J ^ r
cer
fee
l- > 1
c o o c o : j t. C
o 4i i i
ti — 15 X o o 1
L T3 t- o « M o •• » ID172 The American Archivist—April 1979
Within each subset of a set the data oc- retyping whole series entries after each
cupies fixed fields of descriptive infor- change. (An archivist submits a draft,
mation. Most fields contain uncoded it is keyed into the system, and a print-
information. A limited amount of cod- out is returned to the archivist for re-
ing reduces the length of some fields; vision. Input operators key-in the
for example, a "2" might equal the changes; a batch update cycle replaces
phrase "still pictures." or deletes, or inserts data.) With this
The General Services Administra- method, the design and implementa-
tion (GSA) insisted that NARS obtain tion of which cost about $60,000,6 an-
computer services from GSA's Office alysts estimate that it will take the Ar-
of Data Systems. Furthermore, GSA's chives twenty years to input
systems analysts had to design systems approximately 200,000 series descrip-
more from the point of view of their tions of the previously described, the
machinery than according to the needs undescribed, and the newly acces-
of the system. Therefore, in order to sioned archival records, and be up-to-
avoid the limitations imposed on the date with the latter.
system by GSA and to accomplish as
Hardware
much of the A-1 system requirements
at the National Archives as possible, The most attractive aspect of com-
NARS decided to buy a mini-computer . puter assistance, from the point of view
to handle data input and to handle the of the A-l objective, is fast, accurate
relatively small statistical file on each data-entry (the process of typing infor-
record group. Because the mini-com- mation and creating machine-readable
puter is programmable, part of the data).7 About 60 percent of the entire
A-1 system is handled on-line. The Of- system's estimated cost consisted of
fice of the National Archives maintains data-entry. NARS looked for hardware,
statistical control of the record groups i.e., computer machinery, for an in-
(by quantity allocation to custodial house, data-entry facility to carry out
units) by means of the mini-computer; some limited programmable process-
batch processing of the entire inven- ing, especially data verification, edit-
tory file for the compilation of series ing, and reformatting. GSA's Office
descriptions, however, takes place at a of Procurement studied the problem
large computer installation controlled of whether to rent or to buy. On the
by GSA. basis of the A-l long-term needs, the
The systems analysts designed the office decided it would be cost-effec-
series description part of the system tive to buy.8 Over a twenty year pe-
around batched text editing and mag- riod, the annual cost of the in-house,
netic tape storage. Batched text editing mini-computer, data-entry equipment,
is designed to eliminate the need for plus maintenance, will be $10,350 per
6
This figure does not take into account all the GSA, Office of Data Systems, costs which could only
roughly be tabulated because GSA did not bill NARS directly for all costs.
7
Reliable sources of information on mini-computers may be found in Data-Entry Awareness Reports,
published monthly by Management Information Corporation, 140 Barclay Center, Cherry Hill, NJ
08034; and Datapro, special reports published by Datapro Research Corporation, 1805 Underwood
Boulevard, Delvan, NJ 08075.
8
NARS bought a Four Phase, Data IV/90 computer for $147,000. It included a 96K byte processor/
memory, 67.5 million byte direct access disk, 1600 BPI tape drive, 300 lines per minute printer, and
eight CRT/KEY stations, capable of displaying and processing both upper and lower case alphabeticThe NARS A-l Experience 173
year. Rental would have been $41,000 of a range, and for comparisons to
per year. stored values. Automatic field genera-
tion for repetitive and incremented
Software fields, automatic insertion of con-
stants, automatic duplication of data,
The A-1 system uses data-entry soft-
and a key-verify mode10 with immedi-
ware (a collection of computer pro-
ate, direct access correction are other
grams) that came with the hardware.9
required features. In summation, the
The software allows the input operator
hardware/software combination pro-
to see projected on a television-like
vides for quick and accurate data-en-
screen questions (prompts) which ask
try.
for specific information to be read off
the input form and typed into the
computer. The typed data appear on Data Entry Process
the screen and the software checks to A key-entry operator takes the
see if the data meets a set of criteria source document (GSA Form 6710A,
previously programmed into the ma- Change of Status Record) containing
chine. If the data fail the validation- archival descriptions of records, and
check, the machine sounds an alarm. keys the data into a television-like
With the machine in the "search" screen, field by field. Some fields are
mode, the operator makes the com- repetitive and do not require rekeying.
puter locate a particular record stored A format control program edits the
on disk by a unique identification code, data—right-justifies and zero-fills nu-
field, or logical search strategy, after meric fields—and restarts the display
which the operator inserts, corrects, for the next source document. The
changes, or deletes one character at a work-cycle consists of data-entry, veri-
time. Because several input operators fication, and batch input to the master
may be doing a variety of tasks, the data-base.
software/hardware combination has to With four input operators, the A-
allow for simultaneous data-entry into system averaged 6,000 finished series
different formats and into the same descriptions per year during its first
format, while background processing three years of production. There was
takes place and data is being trans- an average of ten lines per series con-
ferred to the tape-drive or printer. Au- sisting of an average of fifty keyed
tomatic editing is another desirable characters per line. Each series, with
feature—checking for alphabetic or identifying subrecord group titles and
numeric characters, checking for limits text, averaged 500 characters. Thus
characters. A short-term five year project would be better off with rented equipment. Due to rapid
changes in computer technology, especially in the mini-computer field, and the decreasing cost of
computer parts, it may be best to rent for two or three years and then obtain a new contract or a new
mini-computer.
9
Four Phase provided a wide variety of software including data-entry formatting, editing, and re-
formatting. It also provided COBOL programming, tape read-and-write instructions, and format in-
structions to the printer. Four Phase trained the data-entry operators and the supervisory computer
operator, and distributed manuals.
10
Key-verify mode consisted of a second typing. The input operator typed over the previously typed
data which was disguised by scrambled letters; and, when the keyed character disagreed, the machine
alerted the operator that an inconsistency existed. The operator would than sight verify the particular
character in question and correct the mistake.174 The American Archivist—April 1979
ten lines times fifty characters, times gram creates a new, updated, sequen-
six thousand series, equals three mil- tial file on tape for storage.
lion characters. The data on the correction/update
tape are replaced and/or merged into
Data Base Maintenance
the addressed sequential slots of the
The master machine-readable file of "father" data-base tape. A new, re-
record group inventories, stored as a vised tape called the "son" tape, in-
data base on magnetic tape at a large cluding the revisions and additions,
computer installation, receives a becomes the latest revision of the data-
monthly update of new and revised base. During the process, the com-
data in batch processing mode. As il- puter also produces printout reports
lustrated in Figure 5, the update trans- of the actions taken. The old data-base
actions cause the data-base tape to be tape remains unchanged as a "father"
read into temporary disk storage. The tape. Backup tapes go back to the
insertions, replacements, and deletions "grandfather" generation. A "great
take place on the disk. Then the pro- grandfather" tape is recycled into a
Figure 5, Flow Chart
"FATHER" , MASTER
I TAPE J
INPUT
TEMPORARY COMPUTER
DISC MANIPULATION
STORAGE
OUTPUT
NEW
"SON" MASTER PRINTOUT
TAPEThe NARS A-l Experience 175
new, current, master, data-base tape in the Archives building, however,
(the "son" tape). After an update eventually it may be possible to main-
cycle, a listing of valid and invalid tain the data in-house and to have on-
transactions instructs the input opera- line, control number access to each se-
tors to make corrections. For example, ries description.
a new line of narrative description
might be entered with a used subset
number. This transaction would be re- Output
jected as an invalid transaction, and
For the most part, output means
the operator would have to change the
printout pages and microfiche. The
subset number to an unused number.
batch mode computer programs at the
The transaction would have to be
GSA-controlled computer produce
reentered with the next update cycle.
printout pages or place print-page im-
This error could be avoided by the op-
ages on magnetic tape. The Archives
V erator's check of the current, master
personnel may either print the mag-
J\ «file. The master file, however, is so
netic tape on paper at the in-house
™ I large that printouts are impractical and
mini-computer or hire a service con-
^ 1 microfiche must be used instead. The
tractor to produce computer output
^ I example of the error above results
microfiche (COM). The entire file is
from misreading one of the thousands
placed into inventory format about
of lines of data identified by the forty-
every three months. Microfiche is made
nine character control-number; mis-
twice a year. Smaller reprints are run
reading one of the digits causes the
locally by the mini-computer, where a
mistake.
great deal of diversity of format is pos-
The A-l data base will grow in size sible. Together, the various forms of
to about one hundred million charac- output cost about $5,000 per year, in-
ters over the next twenty years. This cluding paper and microfiche.
will require the machine-readable file
to be segmented by record group. It
will become inefficient to keep the en-
tire file in one sequential series of tapes Personnel
which will have to be read into the An overwhelming part of the A-l
computer and put on disk. Instead, it system involves people. An in-house
will be better to have a separate tape task force of program analysts carried
for each record group or group of rec- out the initial study. The system re-
ord groups. Segmentation of the file ceived a wide range of participation
will require some program changes. and input from archivists throughout
This will have to be done by GSA pro- the National Archives. Several system
grammers, however, and will consti- analysts worked on the system design.
tute an extra cost and some trial and Computer programmers and opera-
error programming and testing; this tors wrote programs for the GSA-con-
will result in some "down time." Fur- trolled computer. At the Archives, four
thermore, the GSA programmers will key-entry operators and one supervi-
have to make constant adjustments to sor are carrying out the task of coding
the programs after each hardware en- and keying-in the data.
hancement and after each new soft- After three years experience build-
ware release. With the development of ing the machine-readable file of series
the mini-computer base of operations descriptions, the rate of production176 The American Archivist—April 1979
appears to be very slow.11 An average Total Operating Costs
rate of 375 characters in final form per
1 hour is based on the production of six
thousand series descriptions per year
The Archives personnel cost of
$44,500 annually amounted to nearly
60 percent of the total annual costs.
by four input operators during a two
The cost of a series entry was com-
thousand hour work-year. The rate of
puted by dividing the six thousand se-
production is low because the data are
ries placed into the system each year by
keyed twice (once entered and once
the total annual cost, producing $12.39
verified); furthermore, the update
per series. Based on an average of 500
cycle requires re-input of invalid trans-
characters per series, each final char-
actions each time a line of data fails to
acter costs the government almost two
go on the machine-readable file. The
and a half cents ($.0248). Each char-
A-l data-base is constantly being
acter was keyed at least twice (entry
changed; some series descriptions are
and verify) and some three or four
revised several times a year. Finally,
times for revision and correction.
personnel turnover and training also
slow the rate of production, as does
computer down-time. Comparison of the A-l Experience
with the A-l Study Projections
The A-l personnel costs were
$44,500 per year, broken down as fol- The A-1 experience costs were fairly
lows: $8,000 per entry operator, times close to the A-l study projected costs.12
four operators, equaling $32,000 per The total estimated costs, though
year. The supervisor cost $12,500 per aligned somewhat differently from ac-
year. tual costs, are shown in Figure 7.
Figure 6, Annual Cost
System Study* $ 3,500
System Design and Implementation* 3,000
Hardware/Software* 10,350
Data Entry Personnel 44,500
Data-base Maintenance 8,000
Output 5,000
Total Annual Cost $74,350
*Total cost distributed over a twenty-year period.
11
A formula for estimating time and personnel needed to accomplish a comparable job would be:
number of lines per average entry (a) times number of characters to be keyed in each line (b) times
number of entries (c) divided by the constant 750,000 equals the number of man-years required (d) to
produce a finished product, ( a x b x c v 750,000 = d). The constant 750,000 was derived by dividing
the three million characters of the six thousand series descriptions by four input operators.
12
Weiher, vol. 4, chapter 5.The NARS A-l Experience 177
Figure 7, Estimated Costs
TOTAL ANNUAL
System Study $ 70,000 $ 3,500
Program development 103,000 5,150
Data Entry 868,000 43,400
Data base maintenance 8c output 127,725 6,386
$1,168,725 $58,436
(or $9.75
per series)
There was no allowance in the pro- puter assistance, the overall cost of the
jection for data-entry hardware. There system is high. If a fully automated sys-
was mention of an unspecified amount tem with on-line retrieval by index
for "other costs," which covered the terms had been implemented, the cost
projected cost of some sort of type- would have been excessively high, and
writer to produce machine-readable the production rate so slow that it
data. The cost of data-entry equip- would have taken sixty years to catch
ment left out of the study may acount up. The A-l system is a fair warning,
for the difference of $2.65 per series therefore, that automation of archival
between the study and the actual im- finding aids must be approached care-
plementation cost of the A-l system. fully. The size of the data-base and the
cost per keystroke must be calculated
in advance to see if there is enough
Conclusion time, enough people, and enough
Though the automated part of the money to convert the finding aid in-
NARS A-l system is limited to com- formation into machine-readable form.You can also read