Using LINQ to XML in poly-hierarchical tree structures: A programmatic approach


Jonathan Sexton, Data & Services Developer, UK Data Archive
17 April 2014
V1.0
UK Data Archive
•   based at the University of Essex since 1967
•   curator of the UK’s largest collection of digital data in the
    social sciences
•   currently holds nearly 6,000 data collections for research
    and teaching, both quantitative and qualitative
•   certified to ISO 27001, the international information
    security standard
•   makes these available via the new UK Data Service

Website: www.data-archive.ac.uk
UK Data Service
• the UK Data Service indexes all data collections in the Archive
   – all catalogued at thematic level
   – many indexed at variable level
• also harvests metadata from other sources
• all are available for download via the Discover search-and-browse
  catalogue: discover.ukdataservice.ac.uk
UK Data Service
• led by experts at University of Essex along
  with colleagues at Manchester, Leeds,
  Southampton, Edinburgh and UCL
• also provides access to UK Census data (1971 to 2011)
• source of guidance, training, and support for data users in
  UK and around the world
• currently serves approx. 24,000 registered users
• newly funded to coordinate the Administrative Data
  Research Network, part of UK’s Big Data strategy

Websites: ukdataservice.ac.uk, census.ukdataservice.ac.uk
Lesson Objectives
To discuss the method used to navigate a poly-hierarchical tree
structure in XML, using C# and LINQ to XML, so that the calling
application receives the data it requires, given an XML object to
navigate in and a valid identifier within it.
Scenario
•   An application allows a user to add a ‘suggestion’ to a tree
    view based structure, in this particular case a Thesaurus for
    the Social Sciences (ELSST).

•   The tree view is poly-hierarchical, in that identical terms may
    appear in several places in the structure.

•   The data for the tree view is stored in an XML file, stored in
    the user’s session.

•   The application and its supporting architecture are written in
    .NET v4.5 (C#).
The XML (a snippet of the data)
[Figure: XML snippet of the thesaurus data, annotated with the labels
"Top/Broader Term", "Identifier", "Attributes for Term" and "Narrower Term".]
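The original XML image is not reproduced here, so the sketch below only suggests the shape of the data. It assumes the element and attribute names that the queries in the later Implementation slides rely on ("element", "data", "attr", "class", "BTs", "BT", "BTLex", "BTID"). The class name SampleThesaurus, the term names and the short identifiers are invented for illustration.

```csharp
using System;
using System.Xml.Linq;

public static class SampleThesaurus
{
    // Builds a two-level tree: a top term with one narrower term beneath it.
    // All values are invented sample data, not taken from ELSST.
    public static XElement Build()
    {
        return new XElement("element",
            new XElement("data", "ABILITY"),                        // term name
            new XElement("attr", new XElement("class", "class-1")), // identifier
            new XElement("element",                                 // narrower term
                new XElement("data", "COMMUNICATION SKILLS"),
                new XElement("attr",
                    new XElement("class", "class-2"),
                    new XElement("BTs",
                        new XElement("BT",
                            new XAttribute("BTLex", "ABILITY"),
                            new XAttribute("BTID", "1"))))));
    }

    public static void Main()
    {
        // Prints the tree as indented XML.
        Console.WriteLine(Build());
    }
}
```

In a poly-hierarchical tree the same narrower term would simply appear again, with the same identifier, under a second top term.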
The tree view in action in the application
A user has performed a
search on ‘communication
skills’ as a Preferred Term in
the thesaurus, and the
results are as shown.

We can clearly see two
instances in the hierarchy,
under different ‘Top Terms’,
‘Ability’ and
‘Communication’.

Requirement
•   When the user is making a ‘suggestion’ they may wish to
    alter the position in the hierarchy at which the term in scope
    appears.

•   To do this three dropdown lists are presented to them, valid
    ‘broader’, ‘narrower’ and ‘related’ terms, each appertaining to
    the term in scope.

•   Given the full XML data and the identifier of the term in
    scope, calculate the content of these lists.

Design
•   We need a method that, given the full XML (as a
    System.Xml.Linq.XElement object) and the identifier (as a
    string), will populate three lists (representing valid ‘broader’,
    ‘narrower’ and ‘related’ terms) and return them as an
    IEnumerable<List<string>> (in practice a List of Lists of strings).

•   We will need to use the System.Xml.Linq library to enable
    us to easily navigate through the passed XElement object.

•   We will need to add our method to an existing UK
    Data Archive DLL, namely our UKDA.Utility.Library
    component, in the DataAccess/XmlFileReader class.
Implementation (1 of 12)
•   OK, our new method is going to be:
        public static IEnumerable<List<string>> GetRelevantTermsLists(
                     XElement xElementOriginal,
                     string cid)
        {

    NOTE: XElement is a reference type, so the method receives a reference to
    the caller's object rather than a copy of it. We therefore need to make a
    copy before we do any manipulation on it. The identifier (‘cid’) is OK as
    it is: strings are immutable, so nothing we do here can change the
    caller's value.

•   And let’s get something to work with:
        const string Delimiter = "|";
        const string ClassString = "class-";
        // Work on a deep copy so that the caller's XElement is left untouched.
        var xElement = new XElement(xElementOriginal);
        var el = xElement.DescendantsAndSelf("element").Where(
                      nm => string.Equals(nm.Element("attr").Element("class").Value,
                                             cid,
                                             StringComparison.CurrentCultureIgnoreCase));
        var btList = new List<string>();
        var ntList = new List<string>();
        var rtList = new List<string>();
        var xElements = el as IList<XElement> ?? el.ToList();
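To see why the copy matters, here is a minimal sketch showing that the XElement copy constructor produces an independent deep copy. The class and method names (CopyDemo, CopyIsIndependent) and the sample value are made up for this illustration.

```csharp
using System;
using System.Xml.Linq;

public static class CopyDemo
{
    // Returns true when editing the copy leaves the original untouched.
    public static bool CopyIsIndependent()
    {
        var original = new XElement("element",
            new XElement("data", "ABILITY"));

        // new XElement(XElement) clones the whole subtree.
        var copy = new XElement(original);
        copy.Element("data").Value = "CHANGED";

        return original.Element("data").Value == "ABILITY";
    }

    public static void Main()
    {
        Console.WriteLine(CopyIsIndependent()); // True
    }
}
```

Without the copy, the later step that appends the identifier to each "data" value would alter the XML held by the calling application.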
Implementation (2 of 12)
•   So now let’s put some meat on the bones: firstly we should get the
    ‘broader’, ‘narrower’ and ‘related’ terms in our xElements object and
    store them in our declared lists…

        foreach (var e in xElements)
        {
                     // Read each BT under <attr>/<BTs> as "BTLex|class-BTID".
                     var bts =
                                e.DescendantsAndSelf("element")
                                              .First()
                                              .Elements("attr")
                                              .Elements("BTs")
                                              .Elements("BT");
                    btList.AddRange(
                                bts.Select(bt => bt.Attribute("BTLex").Value + Delimiter + ClassString +
                                                 bt.Attribute("BTID").Value));

                    // ...the 'narrower' and 'related' terms are handled here in the same way...
        }

•   And so on for the ‘narrower’ and ‘related’ terms too.
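The slides do not show the code for the other two lists. The sketch below is one way the pattern might be generalised, assuming the XML names its ‘narrower’ and ‘related’ nodes by analogy with the ‘broader’ ones (NTs/NT with NTLex and NTID, RTs/RT with RTLex and RTID). Those names, the TermReader class, the ReadTerms helper and the sample data are all assumptions, not part of the original.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TermReader
{
    const string Delimiter = "|";
    const string ClassString = "class-";

    // Reads "<prefix>Lex|class-<prefix>ID" strings from <attr>/<prefix>s/<prefix>.
    // prefix is "BT", "NT" or "RT" (the NT and RT names are assumed by analogy).
    public static List<string> ReadTerms(XElement element, string prefix)
    {
        return element.Elements("attr")
                      .Elements(prefix + "s")
                      .Elements(prefix)
                      .Select(t => t.Attribute(prefix + "Lex").Value + Delimiter +
                                   ClassString + t.Attribute(prefix + "ID").Value)
                      .ToList();
    }

    public static void Main()
    {
        // Invented sample data.
        var element = XElement.Parse(
            "<element><data>COMMUNICATION SKILLS</data><attr><class>class-1</class>" +
            "<BTs><BT BTLex=\"ABILITY\" BTID=\"2\" /></BTs>" +
            "<NTs><NT NTLex=\"LISTENING\" NTID=\"3\" /></NTs>" +
            "</attr></element>");

        Console.WriteLine(string.Join(", ", ReadTerms(element, "BT"))); // ABILITY|class-2
        Console.WriteLine(string.Join(", ", ReadTerms(element, "NT"))); // LISTENING|class-3
        Console.WriteLine(ReadTerms(element, "RT").Count);              // 0
    }
}
```

A missing container (no RTs node in this sample) simply yields an empty list, because Elements() on an empty sequence returns an empty sequence.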
Implementation (3 of 12)
•   So far so good, all pretty easy stuff, but now let’s crank-up the gameplay
    a little.

•   We’re now going to have to ‘walk’ up our XElement object, looking for
    parent nodes to add to an exclusions list (because we’re not going to
    want these to be returned from the method).

•   We’re going to need two, and only two, iterations. The first iteration will
    get us all the instances of ‘self’, whereas the second will bring back all the
    ancestors.

•   The following code snippet will hopefully shed some light on what we’re
    trying to achieve here:
Implementation (4 of 12)
var exclusionsList = new List<string>();
// Only TWO iterations are required.
for (var i = 0; i < 2; i++)
{
               var pt = el.Elements("data");
               var ptId = el.Elements("attr").Elements("class");
               var elements = pt as IList<XElement> ?? pt.ToList();
               var ptArray = pt as XElement[] ?? elements.ToArray();
               var id = ptId as IList<XElement> ?? ptId.ToList();
               var ptIdArray = ptId as XElement[] ?? id.ToArray();
               // Add our data into our 'ptArray' variable in the form of:
               // "DATA ANALYSIS|class-E72A058B-8A32-E311-93C1-000BDB5CC6D5".
               for (var j = 0; j < ptArray.Length; j++)
               {
                              if (!ptArray[j].Value.Contains(Delimiter)) { ptArray[j].Value += Delimiter + ptIdArray[j].Value; }
               }
               var xElements1 = pt as IList<XElement> ?? ptArray.ToList();

               // Break out of the outer loop if we encounter an empty list.
               if (!xElements1.Any()) { break; }
               // Loop around and only put in items to exclude if they don't already exist.
               foreach (var p in xElements1.Where(p => !exclusionsList.Contains(p.Value)))
               {
                            exclusionsList.Add(p.Value);
               }
               // Move up: on the next pass 'el' holds every ancestor of the current nodes.
               el = el.Ancestors();
}
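Two passes are enough because Ancestors(), called on a sequence, returns every ancestor of every node in it, all the way up to the root. The reduced sketch below shows just that behaviour. The AncestorWalk class, its method names and the three-level sample tree are invented for the illustration; it collects plain names rather than the "NAME|class-id" strings used in the slide.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class AncestorWalk
{
    // Invented sample: ABILITY > SKILLS > COMMUNICATION SKILLS.
    public static XElement Sample()
    {
        return XElement.Parse(
            "<element><data>ABILITY</data><attr><class>class-1</class></attr>" +
            "<element><data>SKILLS</data><attr><class>class-2</class></attr>" +
            "<element><data>COMMUNICATION SKILLS</data><attr><class>class-3</class></attr>" +
            "</element></element></element>");
    }

    // Collects the <data> text of the matching nodes, then of all their ancestors.
    public static List<string> SelfThenAncestors(XElement root, string cid)
    {
        IEnumerable<XElement> current = root.DescendantsAndSelf("element")
            .Where(e => (string)e.Elements("attr").Elements("class").FirstOrDefault() == cid);

        var names = new List<string>();
        for (var pass = 0; pass < 2; pass++)
        {
            foreach (var name in current.Elements("data").Select(d => d.Value))
            {
                if (!names.Contains(name)) { names.Add(name); }
            }
            // Pass 0 handled the matches themselves; this step switches to
            // every ancestor of those matches for pass 1.
            current = current.Ancestors();
        }
        return names;
    }

    public static void Main()
    {
        var names = SelfThenAncestors(Sample(), "class-3");
        Console.WriteLine(names.Count); // 3
    }
}
```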
Implementation (5 of 12)
•   Still with us? OK, so now we should consider getting the ‘child’ nodes for
    removal, thus:

        var elementsToRemove =
                    from a in xElement.DescendantsAndSelf("element").Where(
                                 s => string.Equals(
                                              s.Element("attr").Element("class").Value,
                                              cid,
                                              StringComparison.CurrentCultureIgnoreCase))
                    select a;

Implementation (6 of 12)
•   Let’s now add this data into our exclusions list, thus:

        foreach (var element in elementsToRemove)
        {
                     var v = from child in element.Descendants("element")
                                  select
                                  new
                                  {
                                               DataToExclude =
                                                          child.Element("data").Value + Delimiter +
                                                          child.Element("attr").Element("class").Value
                                  };

                    exclusionsList.AddRange(v.Select(x => x.DataToExclude));
        }

Implementation (7 of 12)
•   We now need to jump back to our original XElement, the one we were
    originally passed, thus:

        var allEles = xElementOriginal.DescendantsAndSelf("element");

•   And using this object get all the remaining nodes into a big list of strings,
    thus:

        var bigList = (from ele in allEles
                     select ele.Element("data").Value + Delimiter + ele.Element("attr").Element("class").Value
                     into itemToAdd
                     let badFlag = exclusionsList.Any(badEle => badEle == itemToAdd)
                     where !badFlag
                     select itemToAdd).ToList();
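
As an aside, the same filter can be written in method syntax with a HashSet<string> for the exclusions, which may be worth considering for large thesauri because each lookup no longer scans the whole list. This is only a sketch of an alternative, not the code used in the library. The RemainingTerms class, the Build method and the sample data are invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RemainingTerms
{
    // Builds "NAME|class-id" for every <element> that is not in the exclusions.
    public static List<string> Build(XElement root, IEnumerable<string> exclusions)
    {
        var excluded = new HashSet<string>(exclusions);
        return root.DescendantsAndSelf("element")
                   .Select(e => e.Element("data").Value + "|" +
                                e.Element("attr").Element("class").Value)
                   .Where(item => !excluded.Contains(item))
                   .ToList();
    }

    public static void Main()
    {
        // Invented sample data.
        var root = XElement.Parse(
            "<element><data>ABILITY</data><attr><class>class-1</class></attr>" +
            "<element><data>SKILLS</data><attr><class>class-2</class></attr></element>" +
            "</element>");

        var remaining = Build(root, new[] { "SKILLS|class-2" });
        Console.WriteLine(string.Join(", ", remaining)); // ABILITY|class-1
    }
}
```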

Implementation (8 of 12)
•   Let’s now get rid of any duplicates, thus:

        var distinctBigList = bigList.Distinct().ToList();

        btList = btList.Distinct().ToList();
        ntList = ntList.Distinct().ToList();
        rtList = rtList.Distinct().ToList();

•   Pretty simple. We’re now ready to populate our lists that we’ll be
    returning…

Implementation (9 of 12)
•   We will add the items to our lists, for ‘broader’, ‘narrower’ and ‘related’
    terms, thus:

        // 'broader' terms.
        foreach (var item in distinctBigList.Where(item => !btList.Contains(item)))
        {
                    btList.Add(item);
        }

        // 'narrower' terms.
        foreach (var item in distinctBigList.Where(item => !ntList.Contains(item)))
        {
                     ntList.Add(item);
        }

        // 'related' terms.
        foreach (var item in distinctBigList.Where(item => !rtList.Contains(item)))
        {
                     rtList.Add(item);
        }
Implementation (10 of 12)
•   Nearly there. We must now remove the item that was clicked on by the
    user in the application (we won’t be needing this), thus:

       // 'broader' terms, there'll only be one so once we've removed it, exit the loop.
       // Exiting immediately also means we never carry on enumerating a list we have just modified.
       foreach (var bt in btList.Where(bt => bt.Contains(cid)))
       {
                    btList.Remove(bt);
                    break;
       }

       // 'narrower' terms, there'll only be one so once we've removed it, exit the loop.
       foreach (var nt in ntList.Where(nt => nt.Contains(cid)))
       {
                    ntList.Remove(nt);
                    break;
       }

       // 'related' terms, there'll only be one so once we've removed it, exit the loop.
       foreach (var rt in rtList.Where(rt => rt.Contains(cid)))
       {
                    rtList.Remove(rt);
                    break;
       }
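The three loops do the same job, so they could be folded into one small helper. The sketch below is one possible shape for it, using List<T>.FindIndex and RemoveAt so that no list is modified while it is being enumerated. The ListCleanup class and RemoveFirstMatch method are hypothetical names. Unlike the slide's Contains call, the match here ignores case, in line with the case-insensitive comparison used when the term was first located.

```csharp
using System;
using System.Collections.Generic;

public static class ListCleanup
{
    // Removes the first entry that contains the identifier.
    // Returns true if an entry was removed.
    public static bool RemoveFirstMatch(List<string> terms, string cid)
    {
        var index = terms.FindIndex(
            t => t.IndexOf(cid, StringComparison.OrdinalIgnoreCase) >= 0);
        if (index < 0) { return false; }
        terms.RemoveAt(index);
        return true;
    }

    public static void Main()
    {
        // Invented sample data.
        var btList = new List<string> { "ABILITY|class-1", "SKILLS|class-2" };
        RemoveFirstMatch(btList, "class-2");
        Console.WriteLine(string.Join(", ", btList)); // ABILITY|class-1
    }
}
```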
Implementation (11 of 12)
•   Finally, populate a container object with our nice new lists and return it.

        var containerList = new List<List<string>>();

        containerList.Add(btList);
        containerList.Add(ntList);
        containerList.Add(rtList);

        return containerList;
}       // End of the method, yay!
Implementation (12 of 12)
•       And of course, to make all of the code in the previous slides actually
        work, we’re going to need to import the following .NET namespaces (with
        ‘using’ directives) in the class file our method sits in:

    •    System
    •    System.Collections.Generic
    •    System.Globalization
    •    System.IO *
    •    System.Linq
    •    System.Text *
    •    System.Xml
    •    System.Xml.Linq

    * Used for the WriteOutList method, used for testing (next slide).

Testing (1 of 2)
•   It was found to be quite useful to employ a simple private method, that
    could be called to write the content of a list to a text file, for checking and
    comparison purposes.
        private static void WriteOutList(string newFileLoc, IEnumerable<string> dataList, string type = null, string cid = null)
        {
                       // The 'using' block flushes and closes the writer, so no explicit Close() is needed.
                       using (var writeText = new StreamWriter(newFileLoc))
                       {
                                     var sb = new StringBuilder();
                                     // Only write the header when a term type ("BT", "NT" or "RT") is supplied.
                                     if (!string.IsNullOrEmpty(type))
                                     {
                                                     sb.Append("Valid ");
                                                     sb.Append(type);
                                                     sb.Append("'s for class id:'");
                                                     sb.Append(cid);
                                                     sb.Append("'.");
                                                     writeText.WriteLine(sb.ToString());
                                                     writeText.WriteLine("----------------------------------------------------------------------");
                                     }
                                     foreach (var line in dataList) { writeText.WriteLine(line); }
                       }
        }
Testing (2 of 2)
•   Why the optional parameters (‘type’ and ‘cid’) then?

•   For ‘broader’, ‘narrower’ and ‘related’ terms that are associated with an
    identifier, simply call the method thus:
        WriteOutList([file path], [data as a list of strings], ["BT", "NT" or "RT"], [identifier]);

•   For data in the ‘big list’ simply call the method thus:
        WriteOutList([file path], [data as a list of strings]);

•   This saves on the need for overloading, so one size fits all.
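For checking the output without touching the file system, a variant that writes to any TextWriter can be handy. The sketch below is a hypothetical adaptation, not the method from the library. The ListWriter class is an invented name, and the 70-character rule line is an assumption about the width of the original separator.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ListWriter
{
    // Hypothetical variant of WriteOutList that targets any TextWriter
    // (a StreamWriter, Console.Out, a StringWriter in a test, ...).
    public static void WriteOutList(TextWriter writer, IEnumerable<string> dataList,
                                    string type = null, string cid = null)
    {
        // The header is only written when a term type is supplied.
        if (!string.IsNullOrEmpty(type))
        {
            writer.WriteLine("Valid " + type + "'s for class id:'" + cid + "'.");
            writer.WriteLine(new string('-', 70));
        }
        foreach (var line in dataList) { writer.WriteLine(line); }
    }

    public static void Main()
    {
        // Invented sample data.
        var data = new List<string> { "ABILITY|class-1" };
        WriteOutList(Console.Out, data, "BT", "class-3"); // header plus list
        WriteOutList(Console.Out, data);                  // list only
    }
}
```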
Lesson Summary
We have discussed the method used to navigate a
poly-hierarchical tree structure, in XML, using C# and
LINQ to XML, in order to provide the calling application
with the data it requires, given an XML object to
navigate in and a valid identifier within it.

Questions?

Find Out More
   Data Archive – data-archive.ac.uk
   UK Data Service – ukdataservice.ac.uk

Contact Information
   Jonathan Sexton
    UK Data Archive
    Wivenhoe Park, University of Essex
    Colchester CO4 3SQ
    E-mail: jpsexton@essex.ac.uk
