Digital Data Delights: 50 years of bits and bytes - Hersh Mann and Louise Corti UK Data Archive

Digital Data Delights:
50 years of bits and bytes

Hersh Mann and Louise Corti
UK Data Archive
University of Essex

50th Anniversary
University of Essex
13 September 2014
Overview
•   What are socioeconomic data?
•   Why the need for an archive?
•   Why the University of Essex?
•   The Archive through the decades
•   Types of data and media over time
•   What types of data collections does the Archive hold?
•   The evolution of our services as technology changes
•   New data landscapes
•   The future of the UK Data Archive?
What do we mean by ‘data’?

• Quantitative
   •   Statistics
   •   Census data
   •   Survey microdata
   •   Macrodata
• Qualitative
   •   Historical documents
   •   Diaries
   •   Interview transcripts
   •   Field notes
   •   Audio recordings
   •   Photographs and video

We are now seeing the emergence of new forms and sources of data, e.g. administrative data and big data.
Survey microdata
Survey microdata

 Percentage of party supporters who believe large
 numbers of people falsely claim benefits

 Source: British Social Attitudes Survey 2010 (weighted data)
Trends in domestic burglary, 1981-2011/12
   Crime Survey for England and Wales

Figure 8 from ‘Crime in England and Wales Quarterly First Release, March 2012’ www.ons.gov.uk
Religion

Source: UK Census
Qualitative data

 http://discover.ukdataservice.ac.uk/QualiBank/?f=CollectionTitle_The%20Edwardians
The types of data that underlie these outputs need to be preserved and made available for secondary analysis.

                So…
Planning for an archive

• The Social and Economic Archive Committee (SEAC)
  was established in 1963 to tackle the problem of data
  being ‘lost’ to British researchers because poor
  communication was leading academics to replicate work
  that was costly and time-consuming
• SEAC was hosted by Political and Economic Planning (PEP)
• Funding was received from the London School of
  Economics (LSE) and the new Social Science Research
  Council (SSRC)
• SEAC was well supported and well connected and
  aimed to establish an archive for social research
The contenders
• Three proposals were submitted
   •   University of Essex
   •   PEP
   •   SSRC
   •   (Strathclyde had been considered a candidate but did not submit
       a proposal)
• The PEP plan was not worked out in detail and mostly
  relied on the argument that such a national resource
  should be located in the capital
• The SSRC bid was more interesting and argued that the
  data should be preserved by the funders because they
  are better placed than a university to obtain data and
  would operate ‘neutrally’ across the sector
Let’s gang up on Essex
• The drawbacks of the SSRC bid were the higher costs of
  locating in London and its lack of computing facilities
• To counter this problem, the leaders of the SSRC bid
  teamed up with Claus Moser at the LSE, who was strongly
  opposed to the new archive being housed at Essex
• They failed
• SEAC chose Essex
   •   it could be established quickly
   •   provided value for money
   •   had the office space
   •   had the computing facilities
The only way is Essex
• Essex was invited to submit an application
• Allen Potter (Head of the Department of Government)
  was named as the Principal Applicant
    •   £33,000 over 5 years
    •   £4,500 p.a. for staff costs
    •   £500 for travel
    •   £1,000 p.a. for magnetic tapes
• The SSRC Data Bank was set up at the University of
  Essex in 1967

"Data [are] deposited with the Bank on a wide range of topics including such
intriguing questions as 'Did you go on a school visit to a coal mine?'"
                                                      Wyvern, 16 February 1968
1960s - The Data Bank

“The creation of Britain’s first memory bank on the computer at the University of
Essex is a tremendous feather in the cap of the University…Hitherto the fame of
the town of Colchester has rested upon the past, specifically its Roman
background. From now on it will rest equally, if not more so, on the University”
                                             Colchester Gazette, 7 February 1968
1970s – The Survey Archive
• The 1970s was a period of growth in empirical social science
  research. By the mid-1970s approximately £50 million per
  annum was being spent on social research in Britain, half in
  universities and the rest in ‘in-house’ government research
  and independent research units.

• In its early years the Data Bank experienced difficulties in
  populating its collection due to:
   • an immature culture of data sharing
   • the high standards it required from deposits
   • restrictions on use attached to certain studies particularly
     government surveys
• The turning point came in the early 1970s, when the
  Government Statistical Service enabled government surveys
  to pass to the Survey Archive (as the Data Bank had been
  renamed in 1972).
1970s - The Survey Archive
1980s - The SSRC/ESRC Data Archive
•   The Survey Archive was renamed the SSRC Data Archive in 1982 to reflect the
    broader range of data resources being collected and stored

•   At this time the work of the SSRC was reviewed by the Government; the Rothschild
    Report supported a stronger focus on empirical research and on research considered
    to be of ‘public concern’. This led to the SSRC being renamed the Economic and
    Social Research Council (ESRC) in January 1984, which in turn brought the
    Archive's second name change in two years: the ESRC Data Archive!

•   Whilst the 1980s could be seen as a low point for the social sciences, in
    retrospect the pressures on funding had both negative and positive impacts on the
    Archive. Less was spent on primary data collection, yet this in turn encouraged
    increased secondary use of research data and a greater acceptance of data
    sharing.

•   The 1980s also saw the Archive branch out through its involvement in a number of
    large co-operative data-orientated projects, chief among them the Domesday
    Project and the Rural Areas Database. This set a trend that has continued to
    the present.
1980s - The SSRC/ESRC Data Archive
1990s - The Data Archive

• The 1993 White Paper on Science and Technology led to an
  emphasis on wealth creation and on the need to establish
  closer and deeper partnerships between academics and the
  users of their research. In line with this, the 1990s
  witnessed an extension of our activity.

• In 1992 the History Data Unit was formed as a specialist unit
  within the Archive, later becoming part of the Arts and
  Humanities Data Service (AHDS)

• In 1996 direct funding from the Joint Information Systems
  Committee (Jisc) was received in recognition of the support
  provided by the Archive for teaching and learning. This led to
  a new name and a logo that complemented that of the
  University.
1990s - The Data Archive
2000s - UK Data Archive

• To reflect both its UK-wide remit and the importance of its role within the
  international data network, we became the UK Data Archive (UKDA)
• A new initiative, the Economic and Social Data Service (ESDS), came into
  operation in 2003, bringing together the Archive and the Institute for Social
  and Economic Research (ISER) at Essex with the Cathie Marsh Centre for Census
  and Survey Research (CCSR) and Manchester Information and Associated
  Services (MIMAS), both located at Manchester.
• In recognition of its role in disseminating and preserving an increasingly
  diverse collection of government data, from 1 January 2005 the UKDA became
  a designated Place of Deposit for public records for The National Archives
  (TNA). This made the deposit of materials a legal requirement for the first
  time, thereby ensuring the supply of key social surveys for future
  research.
• In 2007, the 40th anniversary year, together with ISER, the UKDA moved into a
  new purpose-built social science research centre
2000s - UK Data Archive
2010s - UK Data Archive

•   New look
•   New services
•   Launch of the UK Data Service
•   Big data network
Our Directors over time
The colours, buildings,
computing media and hair styles
change dramatically…
We have gone from this…
…to this, as we adapt to technological
changes and embrace the digital age…
Digital age

As technology advances, so must we. The history of the
Archive is tied to the history of computing.
Inside our ‘data factory’ over time
The process has remained pretty much the same!

“The greatest misconception about survey archives is the belief…that
when data…arrive… their transfer is complete”
                                                     Allen Potter
A united UK Data Service?
• a comprehensive resource funded by the
  Economic and Social Research Council
  (ESRC)
• a single point of access to a wide range
  of secondary social science data
• support, training and guidance
  throughout the data life cycle
• listen to our recorded webinars at
  http://ukdataservice.ac.uk/news-and-events/videos.aspx
UK Data Service

                      Integrates ESDS, Survey
                      Question Bank and
                      Census.ac.uk

ukdataservice.ac.uk
What does the UK Data Service do?
• put together a collection of the most valuable data and
  enhance these over time
• preserve data in the long term for future research
  purposes
• make the data and documentation available for reuse
• provide data management advice for data creators
• provide support for users of the service
• provide information about how data are used
• offer easy access through the website
Who is it for?

• academic researchers and students
• government analysts
• charities and foundations
• business consultants
• independent research centres
• think tanks
• citizen scientists, where skills enable analysis
Our data portfolio

• UK Surveys: large-scale government-funded surveys
• Longitudinal: major UK surveys following individuals over time
• International: multi-nation aggregate databanks and survey data
• Census: census data, 1971-2011
• Business: microdata and administrative data
• Qualitative: a range of multimedia qualitative data sources
How many data collections are there in
  the UK Data Service catalogue?

              A. 4,800
              B. 5,200
              C. 5,700
              D. 6,200
              E. 7,300
UK survey series
• high-quality repeated cross-sectional surveys
• individual- or household-level data
• cover many topics including health, work, crime, social
  attitudes, family expenditure, living costs, housing etc.

• Labour Force Survey
• Crime Surveys
• Health Survey for England
• British Social Attitudes
• Annual Population Survey
…
Longitudinal studies

• British Household Panel Survey and Understanding
  Society
• 1958, 1970, 2000-01 Birth Cohorts
• English Longitudinal Study of Ageing
• Families and Children Study
• Growing Up in Scotland
• Longitudinal Study of Young People in England
International macrodata

• time series data aggregated to
  country/region
• International governmental
  organisations (IMF, OECD, IEA, World
  Bank)
• wide range of socio-economic topics
• regularly updated
• currently limited to UK HE/FE
  institutions
• World Bank data are open access
Figure: French snail imports by country of origin (trade value, US$ thousands; axis from 0 to 9,000). Countries shown include Greece, Romania, Turkey, Poland, Belgium, Hungary, Czech Rep., Indonesia, Bosnia Herzegovina, Lithuania, Cyprus, Bulgaria, Madagascar, Italy, United Kingdom and Syria. Source: UN COMTRADE, 2008. Graph: Celia Russell.
UK census data
•   1971-2011 census data
•   baseline for other statistics
•   detailed combinations of characteristics
•   small geographies
•   Census outputs
    •   aggregate data
    •   boundary data
    •   flow data
    •   microdata
• aggregate data are open access
• some data are restricted to UK HE/FE institutions
Qualitative data
Qualitative data in a number of different formats: interview
transcripts, visual data, focus groups, essays, diaries, online
data, observation notes, documents, audio data, open-ended
survey questions, case notes etc.
Examples of sociology data collections:
 • Family Life and Work Experience before 1918, Middle and Upper
   Class Families in the Early 20th Century, 1870-1977 (SN 5404)
 • Gender Difference, Anxiety and the Fear of Crime, 1995 (SN 4581)
 • Mothers Alone: Poverty and the Fatherless Family, 1955-1966
   (SN 5072)
 • Affluent Worker in the Class Structure, 1961-1962 (SN 6512)
QualiBank
Another example of qualitative data
Ray Pahl, SN 4867: School Leavers Study, 1978

Teachers at a comprehensive school on the Isle of Sheppey
were asked to set a particular essay for the pupils in their
English classes about ten days before those pupils were due
to leave school. The students were asked to imagine that they
were nearing the end of their life, and that something had made
them think back to the time when they left school. They were
then asked to write an imaginary account of their life over the
next 30 or 40 years.

The resulting data: 141 essays handwritten in 1978 by school
leavers aged 15 and 16. These can be browsed online.
Links with other data archives worldwide
Some statistics about our Service

    • 6,000 datasets in the collection

    • 400 new datasets and new editions added
      each year

    • 23,000 registered users

    • 60,000 downloads worldwide per annum

    • 4,000+ user support queries per annum
What do our users do with the data?

• Comparative research, restudy or follow-up study
• Re-analysis/secondary analysis
• Research design and methodological advancement
• Replication of published statistics
• Teaching and learning
Expert advice on creating high quality data
• We have supported research funder data policies since
  1995

• We advise and support grant applicants and award
  holders

• We write guidance for applicants and Data Management
  Planning (DMP) reviewers

• We provide detailed training

• We have published the first researcher-oriented textbook
  on this topic
Our resources for managing and sharing data
•   Online: ukdataservice.ac.uk/manage-data
•   Managing and Sharing Research Data – a Guide to Good Practice:
    www.uk.sagepub.com/books/9781446267264 (SAGE Publications)
•   Training programme
On our horizon

• More data that can be linked to our more traditional
  data
• Big data
• Cloud computing
• Mobile access to data
• Access to powerful data through safe settings
What do we mean by ‘big data’?
 Big data is a buzzword used to describe
 massive volumes of data - structured
 and unstructured - within organisations
 that is so large and moves so quickly
 that it exceeds current processing
 capabilities

 Big data have the potential to help
 society improve operations and allows
 us to make faster, more intelligent
 decisions

Nicole Miskelly, bobsguide, 8 August 2014
http://www.bobsguide.com/guide/news/2014/Aug/8/is-big-data-the-new-normal.html
Big data – the three V’s

High-volume
  • Transaction-based data stored over the years
  • Unstructured data streaming from social media channels
  • Huge amounts of sensor and machine-to-machine data
Big data – the three V’s

High-velocity
• Data are streaming at high speeds and need to be processed
  quickly
• This is a challenge for many organisations
Big data – the three V’s

High-variety
• Data come in structured and unstructured formats
• Numeric data in traditional databases are usually structured
• Text documents, email, video, audio and financial transactions are
  all unstructured
• It is hard to govern, merge and manage these different varieties in terms of:
    •   Formats
    •   Licensing
    •   Dissemination
Three significant changes in big data

• Lower costs
• Cloud storage
• Technological advancement of
   open-source software

“The cost to store a gigabyte of data is ten times cheaper than it was
ten years ago. Open-source tools also now allow users to use
commodity software and link together inexpensive computers instead of
having to buy one big expensive server and Cloud Computing has
enabled companies to borrow servers rather than having to buy and
maintain them, which means they can just pay for what they use and
then give them back.”

                Karl Rieder (Executive Consultant, GFT UK Limited)
How can big data help?

• Can provide organisations with the ability to harness
  relevant data and analyse it to find answers

• Examples in business might be:

    Optimisation of processes
    Cost reduction through efficiency
    New product development
    Smarter business decision making
Will it be possible to share data
collected by your iPhone?
What is our future?

•   New forms of data are emerging
•   Technology watch
•   Collaboration
•   Enable safe and trusted access to data

• We have much more computing power in our pockets
  today than the University had when it was founded. What
  will the picture be like when the University is 100? What
  types of data will the UK Data Archive have in 2064?
UK Data Archive Media Exhibition
Keep connected
• Subscribe to UK Data Service list:
  www.jiscmail.ac.uk/cgi-bin/webadmin?A0=UKDATASERVICE

• Follow UK Data Archive on Twitter: @UKDataArchive
• Follow UK Data Service on Twitter: @UKDataService

• Facebook: https://www.facebook.com/UKDataService

• YouTube: www.youtube.com/user/UKDATASERVICE
Acknowledgements

• Many thanks to Andrew Harrison (Maths@Essex) for his
  slides on new technologies
Contact

UK Data Archive
http://www.data-archive.ac.uk/contact

UK Data Service
http://ukdataservice.ac.uk/help/get-in-touch.aspx
Questions?