School of Science

ISYS1055/1057 Database Concepts
Assignment 3

        Assessment Type: Individual assignment; no group work. Submit online via Canvas→Assignments→Assignment 3. Marks are
        awarded for meeting requirements as closely as possible. Clarifications/updates may be made via announcements/relevant
        discussion forums.
        Due date: Tuesday 09 June 2020, 23:59. Please check Canvas→Syllabus or
        Canvas→Assignments→Assignment 3 for the most up-to-date information.
        As this is a major assignment in which you demonstrate your understanding, a late penalty of 10% of full
        available marks per day or part day applies for up to 5 days late. After 5 days, 0 marks will be awarded.
        Weighting: 30 marks

1. Overview
Database systems are a key technology for the storage, management, manipulation, and retrieval of structured data. In this
assignment you will apply the skills and concepts that you have learned about database systems in the course so far to analyse data,
and then write a report based on your findings.

2. Assessment Criteria
This assessment will determine your ability to:
    1. Follow coding, convention and behavioural requirements provided in this document and in the lessons.
    2. Independently solve problems by using database concepts taught in the course.
    3. Understand the relational model.
    4. Write and understand SQL queries.
    5. Meet deadlines.
Seek clarification from your instructor, when needed, via discussion forums.

This assignment is worth thirty points in total, which accounts for 30% of the overall assessment for the course. The
revised assessment components and weights for the course are:
      Assignment 1                 Assignment 2                      Assignment 3
      20%                          50%                               30%

3. Learning Outcomes

This assessment is relevant to the following Course Learning Outcomes:
    •   CLO 1: Describe various data modelling and database system technologies.
    •   CLO 2: Explain the main concepts for data modelling and characteristics of database systems.
    •   CLO 3: Identify issues with, compare, and justify relational database designs using functional dependency
        concepts.
    •   CLO 4: Apply SQL as a programming language to define database schemas and update database contents.
    •   CLO 5: Apply SQL as a programming language to extract data from databases for specific users’ information
        needs.

It also supports the following Graduate Learning Outcomes:

    •   Enabling Knowledge: You will gain skills as you apply data modelling knowledge effectively in diverse contexts.
    •   Critical Analysis: Analyse and model requirements and data to understand underlying issues.

    •   Problem solving: Design and implement database solutions that accommodate specified requirements and
        constraints, based on analysis or modelling or requirements specification.

4. Submission
Submit your assignment via Canvas→Assignments→Assignment 3. Your submission must be a single .pdf file, with the
filename being your student number (e.g., S1234567.pdf) that contains the following sections:
     1. A report that presents your analysis and findings. Before submitting, check carefully that your report follows all
        of the formatting requirements, and includes all of the section and subsection headings, as detailed in Section 8
        below.
     2. An appendix that contains the SQL code that you developed to analyse the provided data. The SQL must run
        without errors on the RMIT Oracle database server using the SQL Developer interface when entered exactly as
        presented in your appendix.

●   It is your responsibility to correctly submit your file. Please verify that your submission is correctly submitted by
    downloading what you have submitted to see if your .pdf file includes the correct content.
●   Never leave submission to the last minute -- you may have difficulty uploading files.
●   You can submit multiple times – a new submission will override any earlier submissions. However, if your final
    submission is after the due time, late penalties apply.
●   If unexpected circumstances affect your ability to complete the assignment, you can apply for special consideration.
    An outcome of special consideration may be an equivalent assessment, assessing the same knowledge and skills of
    the assignment (time to be arranged by the course coordinator).
●   More information on special consideration is available at
    https://www.rmit.edu.au/students/student-essentials/assessment-and-exams/assessment/special-consideration

5. Academic integrity and plagiarism (standard warning)
Academic integrity is about honest presentation of your academic work. It means acknowledging the work of others
while developing your own insights, knowledge and ideas. You should take extreme care that you have:
    • Acknowledged words, data, diagrams, models, frameworks and/or ideas of others you have quoted (i.e. directly copied),
        summarised, paraphrased, discussed or mentioned in your assessment through the appropriate referencing methods,
    • Provided a reference list of the publication details so your reader can locate the source if necessary. This includes material
        taken from Internet sites.
If you do not acknowledge the sources of your material, you may be accused of plagiarism because you have passed off
the work and ideas of another person without appropriate referencing, as if they were your own.
RMIT University treats plagiarism as a very serious offence constituting misconduct. Plagiarism covers a variety of
inappropriate behaviours, including:
    • Failure to properly document a source
    • Copyright material from the internet or databases
    • Collusion between students
For further information on our policies and procedures, please refer to the University website.

6. Assessment declaration
When you submit work electronically, you agree to the assessment declaration.

7. Rubric/assessment criteria for marking
The detailed rubric and assessment criteria are available online via Canvas→Assignments→Assignment 3.

8. Assignment Questions

Assignment Overview
For this assignment, you will be applying your SQL skills to analyse research data, and write a report that
details your investigations into the question of whether particular variables such as class size and the
perceived attractiveness of teaching staff influence course evaluations.

As part of this assignment, you are likely to need to carry out some research and refer to additional
information beyond what was covered in the course. This is an important skill. Keep a note of any external
references that you use, as these will need to be detailed in your report.

Analysing Variables That Influence University Course Evaluations
At most universities, teaching is evaluated through a process whereby students complete course experience
surveys, rating courses in response to questions regarding the content, clarity of material, presentation, and
other factors. These questions are typically distilled into a single score that is supposed to reflect overall
teaching quality. In most Australian universities including RMIT, this is the Good Teaching Score (or GTS).

Prior research has indicated that many factors can influence student feedback, and these may include things
that are directly asked as part of the surveys (Were the teaching staff good at explaining things? Did the staff
work hard to make the course interesting?) and other factors that are not explicitly asked (Was the lecture
room too crowded and noisy? Did an unexpected event occur part-way through the semester that required
fundamental changes in teaching delivery? Are the teaching staff attractive?).

Daniel Hamermesh and Amy Parker, two researchers from the USA, collected data to investigate the question
of whether teaching evaluations are influenced by the attractiveness of teaching staff [1]. In this assignment,
you will be analysing their data to carry out a preliminary investigation into answering this question. The data
was collected at the University of Texas at Austin, USA, and includes information about 455 courses, taught by
teaching staff in various departments (note that some staff taught multiple courses included in the data set).
Courses were of various sizes in terms of the number of enrolled students. Each course was evaluated using
student surveys, with responses to the question “Overall, this course was…?” being collected on a 5-level
ordinal scale with a minimum score of (1) “very unsatisfactory” and a maximum score of (5) “excellent”.
Information was obtained on each faculty member, based on characteristics including their gender, whether
they are on a tenure track (roughly speaking, working towards being offered a permanent position at their
university), whether they are part of a minority group, and whether they received their education in an
English-speaking country.

Separately, a picture of each teaching staff member was rated by 6 undergraduate students. Hamermesh and
Parker describe the rating process as follows: “The raters were told to use a 1 (lowest) to 10 (highest) rating
scale, to concentrate on the physiognomy of the instructor in the picture, to make their ratings independent
of age, and to keep 5 in mind as an average” [1]. The ratings – subsequently referred to as “beauty” scores –
were then normalised to have a mean score of zero. (This means that someone with a rating greater than zero
was judged to be more “beautiful” than the average, while someone with a negative score was judged to be
less “beautiful” than the average.)

Data File
The raw research data that you need to analyse is in the file profEvaluations.csv, available from the Course
Canvas as part of the Assignment 3 specification.

The file is in comma-separated value or “CSV” format. This is a format for representing table data, where each
row of the file corresponds to a single record (row, or tuple); and the individual data items (attributes/cells)
are separated by the comma (“,”) symbol. The first row gives the column headings (schema). To explore the
file, you can open it in a text editor, or in a spreadsheet program (e.g. MS Excel, Numbers).
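
As a quick illustration of the CSV layout described above, here is a minimal Python sketch that parses a header row and a few records. The three data rows are invented for illustration and are not taken from profEvaluations.csv; the column order is also an assumption based on the variable table in this specification.

```python
import csv
import io

# Hypothetical excerpt in the same shape as profEvaluations.csv.
# The values are invented; the column order is an assumption.
SAMPLE = """id,staffid,age,gender,tenuretrack,nonenglish,beauty,students,division,courseevaluation
1,10,36,f,1,0,0.29,24,U,4.3
2,10,36,f,1,0,0.29,86,L,4.1
3,11,59,m,0,0,-0.74,17,U,3.7
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(SAMPLE)
print(len(rows), "records")
print("columns:", list(rows[0].keys()))
```

Every value is read as a string, so numeric columns need converting before any arithmetic. To read the real file, the same reader can be given an open file handle instead of the in-memory sample.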

Notice that each row of the original file corresponds to observations about a single course, and includes details
such as number of students, and course evaluation score. It also includes information about the teaching staff
member who taught the course, including a staffid, their age, and their educational background. Notice that a
particular teaching staff member can teach more than one course – that is, their individual information may
be repeated for each course that they teach.

The meaning of the variables is explained in the following table. Each variable describes either a course (C) or a
teaching staff member (T), as indicated in the third column.

           Variable         Description                                                            C/T
           id               Course identifier. Each row gives the data for a particular            C
                            course instance.
           staffid          Identifier for a particular member of teaching staff. One staff        T
                            member can teach multiple courses in the dataset.
           age              The age of the staff member.                                           T
           gender           Gender of staff member: female (f), male (m).                          T
           tenuretrack      Whether the staff member is on the tenure track (working               T
                            towards a permanent position): yes (1), no (0).
           nonenglish       Did the staff member complete their undergraduate                      T
                            education in a non-English speaking country: yes (1), no (0).
           beauty           Rating of the staff member’s appearance in a photo, averaged           T
                            across responses by 6 undergraduate students, and
                            normalised to have a mean score of zero.
           students         Total number of students in course                                     C
           division         Is the course lower or upper division: lower (L) [usually first-       C
                            or second-year courses], upper (U) [usually third- or fourth-
                            year courses].
           courseevaluation Mean student course evaluation score on a scale from 1                 C
                            (lowest) to 5 (highest).

Note: The data you will be using is a subset of the original data collected by Hamermesh and Parker.
Therefore, your results will not be identical to those reported in their paper. The identifier variables (id and
staffid) are not necessarily a contiguous sequence. We thank Daniel Hamermesh for supplying the original
data.

Data Preparation
Your first task is to load the raw CSV data into the Oracle database, so that you can analyse it.
   • You need to design the relation schemas for two appropriate tables (to reflect that the data is at two
        levels of granularity). Note that there is data redundancy in the provided (starting) CSV data file.
   • You can use the “import data” function from SQL Developer to import data from .CSV files to tables.
        The following links provide some helpful suggestions.
             o https://docs.oracle.com/database/121/ADMQS/GUID-7068681A-DC4C-4E09-AC95-
                 6A5590203818.htm#ADMQS0826
             o https://www.thatjeffsmith.com/archive/2012/04/how-to-import-from-excel-to-oracle-with-
                 sql-developer/comment-page-5/
Confirm that you have loaded the full research data into your database by comparing the number of rows in
your database tables with the number of rows that you would expect, based on your decomposition of the
source file.
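
The row-count check described above can be sketched as follows. This is only one possible decomposition, not a model answer: the table names `staff` and `course` are hypothetical, the sample rows are invented, and SQLite (via Python's standard library) stands in for Oracle so that the sketch runs anywhere. Oracle data types and the SQL Developer import wizard differ in detail.

```python
import sqlite3

# Invented sample rows (not from the real data); staff member 10 teaches two courses.
# Columns: id, staffid, age, gender, tenuretrack, nonenglish, beauty,
#          students, division, courseevaluation
RAW = [
    (1, 10, 36, "f", 1, 0, 0.29, 24, "U", 4.3),
    (2, 10, 36, "f", 1, 0, 0.29, 86, "L", 4.1),
    (3, 11, 59, "m", 0, 0, -0.74, 17, "U", 3.7),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (
    staffid     INTEGER PRIMARY KEY,
    age         INTEGER,
    gender      TEXT,
    tenuretrack INTEGER,
    nonenglish  INTEGER,
    beauty      REAL
);
CREATE TABLE course (
    id               INTEGER PRIMARY KEY,
    staffid          INTEGER NOT NULL REFERENCES staff(staffid),
    students         INTEGER,
    division         TEXT,
    courseevaluation REAL
);
""")

for (cid, sid, age, gender, tt, ne, beauty, students, division, score) in RAW:
    # INSERT OR IGNORE is SQLite syntax: it keeps the first copy of a repeated staff record.
    conn.execute("INSERT OR IGNORE INTO staff VALUES (?, ?, ?, ?, ?, ?)",
                 (sid, age, gender, tt, ne, beauty))
    conn.execute("INSERT INTO course VALUES (?, ?, ?, ?, ?)",
                 (cid, sid, students, division, score))

# Expected sizes: one course row per source row, one staff row per distinct staffid.
n_course = conn.execute("SELECT COUNT(*) FROM course").fetchone()[0]
n_staff = conn.execute("SELECT COUNT(*) FROM staff").fetchone()[0]
expected_staff = len({row[1] for row in RAW})
print("courses:", n_course, "staff:", n_staff, "expected staff:", expected_staff)
```

The check compares each table's row count with what the source file implies: the number of data rows for the course-level table, and the number of distinct staffid values for the staff-level table.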

Analysis
Now that the data is loaded into a database, you can begin to analyse it. The broad goal is to investigate the
effect that different variables such as age, gender, and beauty, have on course evaluation scores.

In the following subsections, you will be asked to carry out numerical analysis of a particular variable or
variables. For each, you should format your numerical results and present them in a table in your report. You
should also briefly comment on your findings, explaining what the numbers show about the variable(s) in
question. This commentary should be brief, one or two sentences at most for each specific analysis below.

For each analysis, you should consider carefully whether it is at the course level, or at the teaching staff level.
If the analysis is at the teaching staff level, each data point for a staff member may only be included once. If
the analysis is at the course level, data points must be included for each course; this also applies if the analysis
uses both course and teaching staff variables (unless noted otherwise below).
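
To see why the level of analysis matters, the following sketch computes the same statistic two ways on invented data. The tables `staff` and `course` are hypothetical names, and SQLite is used as a stand-in for Oracle. Averaging over joined course rows counts a staff member once per course taught, whereas averaging over the staff table counts each person once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (staffid INTEGER PRIMARY KEY, gender TEXT, beauty REAL);
CREATE TABLE course (
    id               INTEGER PRIMARY KEY,
    staffid          INTEGER REFERENCES staff(staffid),
    courseevaluation REAL
);
-- Invented values: staff member 10 teaches two courses, staff member 11 teaches one.
INSERT INTO staff VALUES (10, 'f', 0.5), (11, 'f', -0.25);
INSERT INTO course VALUES (1, 10, 4.0), (2, 10, 4.5), (3, 11, 3.5);
""")

# Staff level: each staff member contributes exactly one data point.
staff_level = conn.execute(
    "SELECT COUNT(*), AVG(beauty) FROM staff"
).fetchone()

# Course level: staff member 10 contributes twice, once per course taught.
course_level = conn.execute(
    "SELECT COUNT(*), AVG(s.beauty) "
    "FROM course c JOIN staff s ON s.staffid = c.staffid"
).fetchone()

print("staff level :", staff_level)
print("course level:", course_level)
```

The two queries return different counts and different means, which is why each analysis needs to be run against the table (or join) that matches its level.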

Note that the table layouts as shown in the following subsections indicate the formatting that should be used
in your report document for presentation. You should write SQL queries to obtain data to complete the tables.
Your queries do not need to generate tables in the exact format given, and you may sometimes need to write
several SQL queries to complete one analysis table.

Course Sizes – Number of Students
Calculate the minimum, mean and maximum number of students in a course. Present the results in your
report, in a table similar to the following:

                                    Minimum               Mean                Maximum
          Number of students

Course Sizes – Course Evaluation Score

Analyse the minimum, mean, and maximum course evaluation score for groups of courses, binned into size
groups of 18 or less, 19-28, 29-60, and 61 or more. (For example, a course size group of 19-28 includes all
those courses that had from 19 to 28 students enrolled, inclusive.)

           Course size             18 or less         19-28           29-60              61 or more
           Number of courses in
           group
           Minimum course
           evaluation score
           Mean course
           evaluation score
           Maximum course
           evaluation score
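
One way to form the size groups above is a CASE expression combined with GROUP BY. The sketch below is run against SQLite from Python on invented data chosen to sit on the group boundaries; the table name `course` and the output column aliases are hypothetical. SQLite accepts the alias `size_group` in GROUP BY, but Oracle generally requires repeating the CASE expression or wrapping it in a subquery, so treat this as an outline and not a query to paste into SQL Developer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course (id INTEGER PRIMARY KEY, students INTEGER, courseevaluation REAL)"
)
# Invented rows that sit exactly on the boundaries of the four size groups.
conn.executemany("INSERT INTO course VALUES (?, ?, ?)", [
    (1, 18, 4.0), (2, 19, 3.5), (3, 28, 4.5),
    (4, 29, 3.0), (5, 60, 4.0), (6, 61, 2.5),
])

BINNED = """
SELECT CASE
         WHEN students <= 18 THEN '18 or less'
         WHEN students <= 28 THEN '19-28'
         WHEN students <= 60 THEN '29-60'
         ELSE '61 or more'
       END                   AS size_group,
       COUNT(*)              AS n_courses,
       MIN(courseevaluation) AS min_score,
       AVG(courseevaluation) AS mean_score,
       MAX(courseevaluation) AS max_score
FROM course
GROUP BY size_group
ORDER BY MIN(students)
"""

result = conn.execute(BINNED).fetchall()
for row in result:
    print(row)
```

Because CASE branches are tested in order, each upper bound only needs a single comparison; a course with 19 students fails the first test and is caught by the second.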

Division
Analyse minimum, mean, and maximum course evaluation score by division (course level).

                                   No. courses in    Minimum         Mean                Maximum
                                   group
           Upper division
           Lower division

Gender – Course Evaluation Score
Analyse minimum, mean and maximum course evaluation score by gender.

                                  No. courses in     Minimum         Mean                Maximum
                                  group
           Female
           Male

Gender – Beauty
Analyse minimum, mean and maximum beauty by gender.

                                  No. academics in   Minimum         Mean                Maximum
                                  group
           Female
           Male

Tenure track
Analyse minimum, mean and maximum course evaluation by tenure track status.

                                  No. academics in   Minimum         Mean                Maximum
                                  group
           Tenure track
           Not tenure track

Education Background
Analyse minimum, mean and maximum course evaluation by education background.

                                    No. academics in     Minimum           Mean              Maximum
                                    group
         English education
         Non-English education

Interactions between Tenure Track, Gender and Education Background
Analyse course evaluation by gender, tenure track, and education background. Present the results in your
report, in a table similar to the following:

         Tenure track             Gender               Education         No. academics in     Mean
                                                                         group
         Tenure track             Female               English
         Tenure track             Female               Non-English
         Tenure track             Male                 English
         Tenure track             Male                 Non-English
         Not tenure track         Female               English
         Not tenure track         Female               Non-English
         Not tenure track         Male                 English
         Not tenure track         Male                 Non-English

Correlation Analysis
Age, course size, beauty and course evaluation score are variables that take on many different values
(continuous), rather than defining groups (categorical). Therefore, it is useful to analyse the correlation
between these variables. Correlation is a measure of association and gives a numerical value to quantify the
degree of relationship between two variables. Here, you should use the Spearman rank correlation, which
compares the rank ordering of two variables and the extent to which these agree.

Oracle has a built-in Spearman rank correlation aggregate function, CORR_S (Important: NOT the CORR_K
function, or the CORR function):
https://docs.oracle.com/cd/B28359_01/server.111/b28286/functions029.htm#SQLRF06314
There are two return values of interest that you need to consider from the CORR_S function:
    • COEFFICIENT
    • TWO_SIDED_SIG

Broadly speaking, COEFFICIENT measures the strength of the association between two variables, in a range
from +1 to -1:
   • A high positive value indicates that the observations for both variables have a similar rank (i.e. when
       one variable is high, the other also tends to be high)
   • A high negative value indicates that the observations for both variables have an inverse rank (i.e. when
       one variable is high, the other tends to be low)
   • A value close to zero indicates that there is not a strong relationship (i.e. when one variable is high,
       that doesn’t tell us much about the other variable)
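
To make the interpretation of the coefficient concrete, here is a small pure-Python sketch of the textbook Spearman calculation: rank each variable (tied values share the average of their positions), then compute the Pearson correlation of the ranks. It is for building intuition only. It is not Oracle's implementation, and whether CORR_S treats ties in exactly this way is not stated in this specification. The function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


same_order = spearman([1, 2, 3, 4], [10, 20, 30, 45])
reverse_order = spearman([1, 2, 3, 4], [4, 3, 2, 1])
print(same_order, reverse_order)
```

Only the ordering matters: the second list in the first call is not a linear function of the first, yet the two lists rank their items identically, so the coefficient is at its positive extreme.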

TWO_SIDED_SIG is a value returned by a statistical significance test that seeks to establish how confident we
can be that our observed correlation coefficient value is different from zero (indicating no correlation, or no
relationship between the variables). In other words, based on our data, does it seem reasonable to infer that
the correlation coefficient value is showing an actual relationship, beyond random variation or noise in the
data?
    • As a broad rule of thumb, if this value is smaller than 0.05, you can conclude with some confidence
        that the relationship indicated by the Spearman rank correlation coefficient is not just due to chance.

Returning to the data of interest, calculate the Spearman rank correlation between the four pairs of variables,
as follows:
    • Course evaluation score and course size
    • Staff age and beauty
    • Staff age and mean course evaluation score
    • Staff beauty and mean course evaluation score

Note that the first analysis is at the course level, whereas the remaining three should be carried out at the
staff level. In particular, this means that for the third and fourth correlations, you should first calculate the
mean course evaluation scores for each staff member, by aggregating the course level data. (Hint: you can
create Views to complete analysis in several steps if needed.) Report your analysis results in a table similar to
the following:

    Variables                                       Correlation Coefficient        Two-sided Significance
    Course evaluation score & course size
    Staff age & beauty
    Staff age & mean course evaluation score
    Staff beauty & mean course evaluation score
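
The two-step approach hinted at above (aggregate to the staff level first, then correlate) can be sketched with a view. The code below runs on SQLite with invented data; the names `staff`, `course`, `staff_mean_eval` and `mean_eval` are hypothetical. SQLite has no CORR_S, so the Oracle query is shown only as an unexecuted string. Its form follows the Oracle documentation linked earlier and should be checked there before use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (staffid INTEGER PRIMARY KEY, age INTEGER, beauty REAL);
CREATE TABLE course (
    id               INTEGER PRIMARY KEY,
    staffid          INTEGER REFERENCES staff(staffid),
    courseevaluation REAL
);
-- Invented values for illustration only.
INSERT INTO staff VALUES (10, 36, 0.29), (11, 59, -0.74);
INSERT INTO course VALUES (1, 10, 4.0), (2, 10, 3.0), (3, 11, 3.5);

-- Step 1: one row per staff member, carrying their mean course evaluation.
CREATE VIEW staff_mean_eval AS
    SELECT s.staffid, s.age, s.beauty, AVG(c.courseevaluation) AS mean_eval
    FROM staff s JOIN course c ON c.staffid = s.staffid
    GROUP BY s.staffid, s.age, s.beauty;
""")

rows = conn.execute(
    "SELECT staffid, mean_eval FROM staff_mean_eval ORDER BY staffid"
).fetchall()
print(rows)

# Step 2 (Oracle only, NOT executed here): correlate at the staff level using the view.
ORACLE_QUERY = """
SELECT CORR_S(age, mean_eval, 'COEFFICIENT')   AS coefficient,
       CORR_S(age, mean_eval, 'TWO_SIDED_SIG') AS two_sided_sig
FROM staff_mean_eval
"""
```

Keeping the aggregation in a view means the staff-level correlations all read from the same one-row-per-person relation, so no staff member is counted more than once.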

Report
Report Structure
You should aim to make your report as clear and readable as possible. This involves using clear language that
is easy to follow and understand. Your argument should be structured to effectively convey your message.
Include sufficient detail, but avoid adding irrelevant material.

Your report must include the following headings and sub-headings. Details of mark allocation are described in
the marking rubric document.
    • Data Preparation (5 points)
          o A brief opening paragraph that describes the analysis to be undertaken (what is the high-level
              question being studied?).
          o Give the relation schemas (table definitions) with primary keys and any foreign keys annotated,
              and the sizes of tables.
          o Description of the steps for importing data from the .CSV data file into tables in SQL Developer.
          o You must submit all your SQL CREATE TABLE statements as an appendix.

      •   Analysis (22 points)
              o Include a sub-heading corresponding to each of the previously listed analysis sub-sections.
              o For each, include your numerical results, following the given table layouts, and a brief (1 to 2
                 sentences maximum) explanation of what the numbers show about the variable(s) being
                 analysed.
              o You must submit all your SQL queries, including the queries you used to carry out your analysis
                 and obtain your reported results, in the appendix.
      •   Discussion and Conclusions (3 points)
              o Write a paragraph that brings together your overall findings.
              o Write a paragraph that briefly mentions any limitations of the analysis.

Report Formatting
      •   Use at least 11-point font.
      •   Your final report, including tables and references, may be at most 4 pages (A4 page size) in length.
          Any material beyond this will be removed and not considered when assessing your submission. The
          appendix of SQL code does not count towards this limit.
      •   Be sure to include references to any material that you used while completing your assignment.

Referencing
When working on your analysis and writing your report, you are expected to find and refer to additional
materials, as needed. You will need to correctly reference any materials that you use for your assignment. If
you are unfamiliar with referencing requirements, this short video from the RMIT library is a useful starting
point: https://www.rmit.edu.au/library/study/referencing

You should follow the IEEE referencing style. The RMIT library has put together a helpful guide:
https://www.rmit.edu.au/content/dam/rmit/documents/library/referencing/IEEE-referencing-examples.docx
An example of the style is given in the references section at the end of this document.

9. Assignment References
[1]       D. Hamermesh and A. Parker, “Beauty in the classroom: instructors’ pulchritude and putative
          pedagogical productivity,” Economics of Education Review, vol. 24, pp. 369-376, 2005.
