CDAC Technical Challenges & Lessons Learned - Data extraction and conditioning Ontology for each cancer center Project Lead: Swati Mehta

Page created by Russell Chambers
 
CDAC Technical Challenges &
     Lessons Learned
      Data extraction and conditioning
      Ontology for each cancer center

        Project Lead: Swati Mehta

About C‐DAC
Centre for Development of Advanced Computing (C‐DAC) is the premier R&D
organization of the Department of Electronics and Information Technology
(DeitY), Ministry of Communications & Information Technology (MCIT) for
carrying out R&D in IT, Electronics and associated areas. Different areas of
C‐DAC had originated at different times, many of which came out as a result of
identification of opportunities.

About Applied Artificial Intelligence Group
The Applied Artificial Intelligence (AAI) Group of C‐DAC, Pune is involved in a
number of activities such as knowledge‐based understanding systems vis‐à‐vis
Natural Language Processing (NLP), Machine Translation System, Information
Extraction & Retrieval, Speech Technology, e‐Learning / e‐Education &
m‐Learning.
Dr. Hemant Darbari

Executive Director, C‐DAC, Pune and Head of Department ‐ Applied
Artificial Intelligence Group (AAIG), Advanced Computing Training
School (ACTS), and E‐Governance Cell.

Ajai Kumar

Associate Director, Applied Artificial Intelligence Group (AAIG) C‐DAC,
Pune
Swati Mehta

Principal Technical Officer, Applied Artificial Intelligence Group
(AAIG) C‐DAC, Pune

Vivek Koul

Project Engineer, Applied Artificial Intelligence Group (AAIG) C‐DAC,
Pune

Srikanth Jaggari

Project Engineer, Applied Artificial Intelligence Group (AAIG) C‐DAC,
Pune
Dana-Farber Cancer Institute

Name, Specialization, Department, Interests
Fred Hutchinson

Name, Designation/Appointment, Division, Interests, Phone, email, Fax
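
The field lists on these two slides could be captured as a small per-center schema. The Python sketch below is only an illustration: the names `CENTER_FIELDS`, `all_fields` and `to_uniform_record` are hypothetical, and the only facts taken from the slides are the field names themselves.

```python
from typing import Dict, List, Optional

# Field lists exactly as given on the slides for each center.
CENTER_FIELDS: Dict[str, List[str]] = {
    "Dana-Farber Cancer Institute": [
        "Name", "Specialization", "Department", "Interests",
    ],
    "Fred Hutchinson": [
        "Name", "Designation/Appointment", "Division", "Interests",
        "Phone", "email", "Fax",
    ],
}


def all_fields() -> List[str]:
    """Union of every center's fields, in first-seen order."""
    seen: List[str] = []
    for fields in CENTER_FIELDS.values():
        for field in fields:
            if field not in seen:
                seen.append(field)
    return seen


def to_uniform_record(center: str, raw: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Place one raw profile into the union schema; absent fields become None."""
    known = CENTER_FIELDS[center]
    unknown = [key for key in raw if key not in known]
    if unknown:
        raise KeyError(f"fields not defined for {center}: {unknown}")
    record: Dict[str, Optional[str]] = {f: raw.get(f) for f in all_fields()}
    record["Center"] = center
    return record
```

Keeping the fields per center, rather than forcing one fixed list, leaves room for sources that publish different attributes; fields a center does not provide simply stay empty.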
HCGOncology Cancer Center

Only one profile can be accessed at a time.

Doctor Profile in HCGOncology Cancer Center

Doctor Profile in HCGOncology Cancer Center

<table>
      Name: Dr Sanjay Mishra
      Qualification: M.D. (RT)
      Specialisation: Radiation Oncology
      Location: Hubli
</table>

Data Structure: class="txtblue" is the label; class="txtcont" is the content
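
The label/content convention described here lends itself to a small parser. The following is a minimal sketch using only Python's standard `html.parser`; it is not the C-DAC extraction tool. The class name `LabelContentParser`, the helper `extract_fields` and the `<span>` markup in `SAMPLE` are assumptions made for illustration; only the two class names and the field values come from the slide.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class LabelContentParser(HTMLParser):
    """Pair each txtblue label with the txtcont text that follows it."""

    def __init__(self) -> None:
        super().__init__()
        self.fields: Dict[str, str] = {}
        self._mode: Optional[str] = None      # "label", "content" or None
        self._mode_tag: Optional[str] = None  # tag that opened the current mode
        self._label = ""

    def _key(self) -> str:
        # "Name: " -> "Name"
        return self._label.strip().rstrip(":").strip()

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "txtblue" in classes:
            self._mode, self._mode_tag, self._label = "label", tag, ""
        elif "txtcont" in classes:
            self._mode, self._mode_tag = "content", tag

    def handle_endtag(self, tag):
        if tag == self._mode_tag:
            self._mode = self._mode_tag = None

    def handle_data(self, data):
        if self._mode == "label":
            self._label += data
        elif self._mode == "content" and self._key():
            key = self._key()
            self.fields[key] = self.fields.get(key, "") + data


def extract_fields(html: str) -> Dict[str, str]:
    """Return {label: content} with whitespace collapsed."""
    parser = LabelContentParser()
    parser.feed(html)
    parser.close()
    return {key: " ".join(value.split()) for key, value in parser.fields.items()}


# Hypothetical markup: the real page's tags were lost; only the classes are known.
SAMPLE = """
<table>
  <tr><td><span class="txtblue">Name:</span> <span class="txtcont">Dr Sanjay Mishra</span></td></tr>
  <tr><td><span class="txtblue">Qualification:</span><span class="txtcont">M.D. (RT)</span></td></tr>
  <tr><td><span class="txtblue">Specialisation:</span> <span class="txtcont">Radiation Oncology</span></td></tr>
  <tr><td><span class="txtblue">Location:</span> <span class="txtcont">Hubli</span></td></tr>
</table>
"""
```

Calling `extract_fields(SAMPLE)` yields one dictionary per profile page, keyed by whatever labels that page happens to contain.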

Doctor Profile in HCGOncology Cancer Center

Name                  Qualification   Specialization            Other field
N.K.Vinod             AD, PDCCA       Anesthesiologist          Location: Bangalore
Prabha Seshachar      MBBS, DA        Anesthesiologist          Location: Bangalore
H.C.Rajesh            MD              Anesthesiologist          Years of Experience: 16 yrs
Gaurav Dwivedi        MBBS, MD        Anesthesiologist          Location: Delhi
Kshirod Kumar arya    MBBS, MS        Anesthesiologist          Location: Cuttack
Ganesh Nayak          MS              Cardio Thoracic Surgery   Location: Bangalore
B C Bommaiah          MD              Cardiologist              Location: Bangalore
Kshitish Ch. Mishra   MBBS, MD        Clinical Oncology         Location: Cuttack
Structure of Data for Profile in HCGOncology Cancer Center

•   Data of HCG Oncology site is present in the form of embedded tables.
•   Every profile is present in a separate page, so the structure of data and
    pages is difficult to retrieve using DEiXTo.
•   CDAC has developed an extraction tool to get the data from this site.

Profile page HTML (fragment):

<table>
<img src="phpThumb.php?src=uploads/doctors_images/4f840e5181…2.png&amp ”/>
<table >
   Name:
      Dr Sanjay Mishra
   </span>
   Qualification:
      M.D. (RT)
   Specialisation:
      Radiation Oncology
   </span>
   Location:
      Hubli
</table>
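
Because each HCG profile sits on its own page and the fields vary from doctor to doctor, an extractor has to visit pages one by one and tolerate missing fields. The sketch below illustrates that loop in Python under stated assumptions; it does not describe the C-DAC tool. `collect_profiles`, `to_rows`, and the injected `fetch` and `parse` callables are hypothetical names.

```python
from typing import Callable, Dict, Iterable, List

Profile = Dict[str, str]


def collect_profiles(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    parse: Callable[[str], Profile],
) -> List[Profile]:
    """Visit one profile page at a time and keep whatever fields each page has."""
    profiles: List[Profile] = []
    for url in urls:
        try:
            html = fetch(url)
        except OSError as exc:  # network or file errors: skip the page, keep going
            print(f"skipped {url}: {exc}")
            continue
        profile = parse(html)
        profile["source_url"] = url
        profiles.append(profile)
    return profiles


def to_rows(profiles: List[Profile]) -> List[List[str]]:
    """Flatten profiles into a table whose header is the union of all field names."""
    header: List[str] = []
    for profile in profiles:
        for key in profile:
            if key not in header:
                header.append(key)
    return [header] + [[p.get(key, "") for key in header] for p in profiles]
```

Passing `fetch` and `parse` in as arguments keeps the loop independent of how pages are downloaded or parsed, so it can be exercised with stored pages before being pointed at a live site. A profile that has "Years of Experience" but no "Location" simply leaves the "Location" cell empty in the resulting table.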
Researcher Profile In Dana Farber Cancer Institute

Fig: Researcher’s Profile in Dana Farber
Researcher Profile In Dana Farber Cancer Institute

<… class="abcGroup">

<… class="fLeft title">
<a href="/directory/profile.asp?pgt=Gregory+A%2E+Abel%2C+MD%2C+MPH">Gregory A. Abel, MD, MPH</a>
<em>Medical Oncologist, Hematologic Oncology</em>

<… class="oHide">
<strong>Clinical Interest</strong> Leukemia, Myelodysplastic syndromes, Myeloproliferative disorders

<… class="clear">

Fig: Researcher’s Profile in Dana Farber
Structure of Data for Profile in Dana Farber Cancer Center

      Gregory A. Abel, MD, MPH
      Medical Oncologist, Hematologic Oncology

      Clinical Interest
      Leukemia, Myelodysplastic syndromes, Myeloproliferative disorders

....
Structure of Data for Profile in Dana Farber Cancer Center

<… class="abcGroup">
     A

       Gregory A. Abel, MD, MPH
       Medical Oncologist, Hematologic Oncology

           Clinical Interest
           Leukemia, Myelodysplastic syndromes, Myeloproliferative disorders

Observation on Structure of Profile Data Present in Dana Farber Cancer Center

In Dana Farber Cancer Institute, profile data is present in a structured
form, which DEiXTo is able to extract.

Since the data is organized in a structured manner, we can extract it
using the “DEiXTo” tool.

DEiXTo (or ΔEiXTo) is a powerful web data extraction tool that is based on the
W3C Document Object Model (DOM). It allows users to create highly accurate
“extraction rules” (wrappers) that describe what pieces of data to scrape from
a website.
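
DEiXTo rules are built inside that tool, so the following is not a DEiXTo wrapper. It is a rough Python analogue of a DOM-based rule for the Dana Farber listing shown earlier, using only the standard `html.parser`. The names `DirectoryParser` and `extract_directory` are hypothetical, and the `<div>` elements in `SAMPLE` are an assumption, since the original tag names did not survive; the classes, link pattern and text come from the slide.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class DirectoryParser(HTMLParser):
    """One record per profile link; <em> is the title, <strong> names a field."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[Dict[str, str]] = []
        self._target: Optional[str] = None  # key currently receiving text
        self._in_strong = False
        self._label = ""

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "a" and "/directory/profile.asp" in (attr.get("href") or ""):
            self.records.append({"name": ""})  # a profile link starts a new record
            self._target = "name"
        elif tag == "em" and self.records:
            self._target = "title"
        elif tag == "strong" and self.records:
            self._in_strong, self._label, self._target = True, "", None
        elif "clear" in (attr.get("class") or "").split():
            self._target = None  # the "clear" element ends the current field

    def handle_endtag(self, tag):
        if tag in ("a", "em"):
            self._target = None
        elif tag == "strong" and self._in_strong:
            self._in_strong = False
            self._target = self._label.strip() or None

    def handle_data(self, data):
        if self._in_strong:
            self._label += data
        elif self._target and self.records:
            record = self.records[-1]
            record[self._target] = record.get(self._target, "") + data


def extract_directory(html: str) -> List[Dict[str, str]]:
    """Return one dictionary per researcher, with whitespace collapsed."""
    parser = DirectoryParser()
    parser.feed(html)
    parser.close()
    return [{k: " ".join(v.split()) for k, v in r.items()} for r in parser.records]


# Hypothetical markup: <div> is assumed; classes and text are from the slide.
SAMPLE = """
<div class="abcGroup">
  <div class="fLeft title">
    <a href="/directory/profile.asp?pgt=Gregory+A%2E+Abel%2C+MD%2C+MPH">Gregory A. Abel, MD, MPH</a>
    <em>Medical Oncologist, Hematologic Oncology</em>
  </div>
  <div class="oHide">
    <strong>Clinical Interest</strong>Leukemia, Myelodysplastic syndromes, Myeloproliferative disorders
  </div>
  <div class="clear">&nbsp;</div>
</div>
"""
```

The rule works only because every listing entry repeats the same pattern of link, `<em>` and `<strong>`, which is the property that makes this site suitable for wrapper-based extraction.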
Structure of Profile Data Present in TATA Memorial Hospital

Profile page HTML (fragment):

…>
…> Designation:
…>Anaesthesia
… valign="top">Special Interests:
…>rashmikailashsharma@yahoo.co.in
      Phone No. (+9122) 24177044
…>

Observation on Structure of Profile Data Present in TATA Memorial Hospital

•   The profile page data structure is not uniform.
•   The profiles contain insufficient data.
•   The format in which profiles are presented is not uniformly structured.
•   Data was extracted manually.
References
      1. http://www.dana-farber.org/

      2. http://www.hcgoncology.com/

      3. http://tmc.gov.in

      4. http://en.wikipedia.org/wiki/Web_scraping

      5. http://deixto.com/

Future work
Incorporate the medIND and IndMED biomedical journal databases, as well as
PubMed, into VIVO.

Global Cancer Collaboratory Portal
Global Cancer Collaboratory

                    Fig: Global Cancer Collaboratory Researchers

Dana-Farber Cancer Institute [US]                    www.dana-farber.org
Fred Hutchinson Cancer Research Center [US]          www.fhcrc.org/
HealthCare Global Enterprises [INDIA]                www.hcgoncology.com/
Tata Memorial Centre [INDIA]                         http://www.tatamemorialcentre.com
Sample Data of Departments Present in Global Cancer Collaboratory

Sample Data used for Display
Department profile page as presented in Global Cancer Collaboratory
Sample researchers profiles
Sample researchers profiles from HCG Oncology
Sample researchers profiles in Global Cancer Collaboratory

Researcher of University Medical Center Mainz Germany (Ref: http://www.unimedizin-mainz.de/)
Navigate Researchers Profiles
Sample researcher profile page

                Full Profile Link
Authors and Publications
A Link to Co-Authors of the same Article
Co-Author’s Network of the same Article
References:
 http://vivoweb.org/
 www.dana-farber.org
 www.fhcrc.org/
 www.hcgoncology.com/
 http://www.tatamemorialcentre.com
 http://www.unimedizin-mainz.de/