DSCI 558: Building Knowledge Graphs
Units: 4
Term-Day-Time: Fall 2020 – Monday/Wednesday – 3:30-5:30pm
Location: Virtual
Grader: TBD
Contact Info: TBD
Learning Objectives
The learning objectives for this course are:
● Understand the algorithms and techniques for crawling web sites, structured data extraction, and
information extraction from unstructured text.
● Understand the theory and techniques for cleaning, aligning, matching, and linking data.
● Understand the foundations and techniques of the Semantic Web, including RDF, ontologies,
SPARQL, and linked data.
● Understand how to work with graph databases, including how to load massive datasets into such
databases, how to organize the data for efficient access, and how to efficiently query the contents.
● Understand the entire process of how to design, construct, and query a knowledge graph to solve
real-world problems.
● Understand how to apply the big data tools and infrastructure (e.g., Spark) to build and query
knowledge graphs.
Required Preparation:
Prerequisite(s): INF 551 or CSCI 585
INF 552 or CSCI 567
Recommended Background: Experience programming in Python
Course Notes
The course will be run as a lecture class with student participation strongly encouraged. The first 4-5 weeks
of the course are structured as a quickstart to provide a shallow primer on the end-to-end process of
knowledge graph construction, followed by deeper presentations and more technical material for the
remainder of the course. There are weekly readings and students are encouraged to do the readings prior
to the discussion in class. All of the course materials, including the readings, lecture slides, and homeworks
will be posted online. The class project is a significant aspect of this course and at the end of the semester
students will present their projects in class.
Homework Assignments
There will be weekly homework assignments for the first 11 weeks of class. The assignments must be done
individually. The homework assignments are expected to take 8-10 hours per week. Each assignment is
graded on a scale of 0-100 and the specific rubric for each assignment is given in the assignment. The
homework topics are listed in the Course Schedule.
Questions about the grading of homeworks and quizzes may be raised with the TA within two weeks of the
grading date.
Course Project
An integral part of this course is the course project, which builds on the topics and techniques covered in
the class. Students can work in teams of up to two people on this project. They will present their project
proposals in class, conduct the project, and then create a video demonstration of the work and present the
project in class.
Project Timeline:
▪ Week 6: Project proposals presented in class (team members, topic)
▪ Week 8: Project status update due (1 page status report)
▪ Week 10: Project status update due (1 page status report)
▪ Week 12: Project status update due (1 page status report)
▪ Week 13: Project presentation in class (short talk and video demonstration)
Project description: Each project team will build a knowledge graph for a topic of their choice. The
knowledge graph must combine data from at least 3 different sources, and at least 2 of those sources must be
online web sites. The best projects build on many of the topics covered in the class. The homeworks
have been designed so that you can work on your projects in the process of doing your homework.
An example project would be to build a knowledge graph of used bicycles that could be purchased near the
USC campus. This project would combine data from used-bike sources, such as Craig’s List, new-bike sources,
such as BikeNashbar, and bicycle review sites, such as bicycling.com. The project would collect the data
from each of these sources using wrapper techniques, extract the details of the used bicycle ads from
Craig’s List using information extraction techniques, align the data across these various sources to a domain
ontology, link the entities across sources to combine the used-bike data with the reviews from bicycling.com and
prices from BikeNashbar, store all of the data in a graph database such as Elasticsearch, and then build a
simple user interface to show the results by executing queries against the graph database.
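The linking, storing, and querying steps of the example project can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method taught in the course: the records are made-up toy data standing in for already-extracted ads and reviews, the name-matching rule is deliberately naive, and an in-memory set of (subject, predicate, object) triples stands in for a real graph database. All field names and predicate names here are hypothetical.

```python
import re

# Toy records standing in for data already extracted from two sources.
# All values are made up for illustration.
used_ads = [
    {"id": "ad1", "title": "Trek FX 2 - great condition", "price": 350},
    {"id": "ad2", "title": "Giant Escape 3 commuter", "price": 280},
]
reviews = [
    {"id": "rev1", "model": "Trek FX 2", "rating": 4.5},
    {"id": "rev2", "model": "Cannondale Quick 4", "rating": 4.0},
]


def normalize(text):
    """Lowercase and collapse non-alphanumeric runs so names compare cleanly."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def link(ads, revs):
    """Link an ad to a review when the review's model name occurs in the ad title."""
    links = []
    for ad in ads:
        title = f" {normalize(ad['title'])} "
        for rev in revs:
            if f" {normalize(rev['model'])} " in title:
                links.append((ad["id"], rev["id"]))
    return links


def build_triples(ads, revs, links):
    """Represent the records and links as (subject, predicate, object) triples."""
    triples = set()
    for ad in ads:
        triples.add((ad["id"], "type", "UsedBikeAd"))
        triples.add((ad["id"], "price", ad["price"]))
    for rev in revs:
        triples.add((rev["id"], "type", "Review"))
        triples.add((rev["id"], "rating", rev["rating"]))
    for ad_id, rev_id in links:
        triples.add((ad_id, "reviewedIn", rev_id))
    return triples


def query(triples, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (t for t in triples
         if (s is None or t[0] == s)
         and (p is None or t[1] == p)
         and (o is None or t[2] == o)),
        key=str,
    )


links = link(used_ads, reviews)
graph = build_triples(used_ads, reviews, links)
print(query(graph, p="reviewedIn"))
```

A real project would replace the substring match with the entity-linking techniques covered in class and load the triples into an actual store, but the shape of the pipeline (normalize, link, store, query) stays the same.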
Grading Breakdown
Quizzes: There will be weekly quizzes at the start of class based on the material from the week before. The
lowest three quiz grades will be dropped. Missed quizzes will receive a zero grade, and there will be no
make-up quizzes for any reason.
Midterm: There is no mid-term exam for this class.
Homework: There will be weekly homework based on the topics of the class each week.
Final Exam: There is a final exam at the end of the semester covering all of the material covered in the class.
The final exam will be on the date designated by USC in https://classes.usc.edu/term-20203/finals/
Class Project: Each student will do a group class project based on the topics covered in the class. Students
will propose their own project, do the research and build a proof-of-concept, create a video demonstration
of the proof-of-concept, and present the project in class.
Grading Schema:
Quizzes          20%
Homework         20%
Final            25%
Class Project    35%
__________________________________________
Total           100%
Grades will range from A through F. The following is the breakdown for grading:
94 – 100  = A       74 – 76.9 = C
90 – 93.9 = A-      70 – 73.9 = C-
87 – 89.9 = B+      67 – 69.9 = D+
84 – 86.9 = B       64 – 66.9 = D
80 – 83.9 = B-      60 – 63.9 = D-
77 – 79.9 = C+      Below 60 is an F
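As a worked example of how the weights and letter-grade bands combine, the following Python sketch computes a course total and maps it to a letter. It is an illustration only. It assumes every component is scored on a 0-100 scale, that the quiz component is the plain average of the quizzes remaining after the three lowest are dropped, and that a total falling between two bands (for example 93.95) takes the lower grade. The syllabus does not state these details, and the sample scores are made up.

```python
# Weights from the Grading Schema, as fractions of the course total.
WEIGHTS = {"quizzes": 0.20, "homework": 0.20, "final": 0.25, "project": 0.35}

# Lower bound of each letter-grade band, from the breakdown above.
CUTOFFS = [
    (94, "A"), (90, "A-"), (87, "B+"), (84, "B"), (80, "B-"), (77, "C+"),
    (74, "C"), (70, "C-"), (67, "D+"), (64, "D"), (60, "D-"),
]


def quiz_average(scores, drop=3):
    """Average quiz score after dropping the `drop` lowest scores."""
    kept = sorted(scores)[drop:]
    if not kept:
        raise ValueError("need more quiz scores than the number dropped")
    return sum(kept) / len(kept)


def course_total(components):
    """Weighted sum of the component scores (each assumed to be 0-100)."""
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def letter_grade(total):
    """Map a course total to a letter using the lower bound of each band."""
    for cutoff, letter in CUTOFFS:
        if total >= cutoff:
            return letter
    return "F"


# Made-up scores; the missed quiz (0) is among the three dropped.
quizzes = quiz_average([0, 50, 60, 80, 90, 100])
total = course_total(
    {"quizzes": quizzes, "homework": 85, "final": 80, "project": 92}
)
print(round(total, 1), letter_grade(total))
```

With these sample scores the quiz average is 90 and the total is 0.20·90 + 0.20·85 + 0.25·80 + 0.35·92 = 87.2, which falls in the B+ band.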
Course Schedule
12  09/30/2020  Project Proposals
15  10/12/2020  Ontologies and RDF (FI)
    Reading: Frank Manola and Eric Miller. RDF Primer. Technical report, W3C, February 2004.
18  10/21/2020  Semantic Typing / Semantic Models (Minh Pham, Binh Vu)
    Readings: Pham, M.; Alse, S.; Knoblock, C.; and Szekely, P. Semantic labeling: A domain-independent approach. In ISWC 2016.
    Taheriyan, M., Knoblock, C.A., Szekely, P. and Ambite, J.L., 2016. Learning the
22  11/04/2020  Few-shot learning over KGs (MR)
23  11/09/2020  Question Answering (MR)
25  11/16/2020  Review
26  11/18/2020  Project Presentations
27  11/23/2020  Project Presentations
Academic Conduct
Plagiarism – presenting someone else’s ideas as your own, either verbatim or recast in your own
words – is a serious academic offense with serious consequences. Please familiarize yourself with
the discussion of plagiarism in SCampus in Section 11, Behavior Violating University Standards
https://scampus.usc.edu/1100-behavior-violating-university-standards-and-appropriate-sanctions.
Other forms of academic dishonesty are equally unacceptable. See additional information in
SCampus and university policies on scientific misconduct,
http://policy.usc.edu/scientific-misconduct.
Discrimination, sexual assault, and harassment are not tolerated by the university. You are
encouraged to report any incidents to the Office of Equity and Diversity http://equity.usc.edu or to
the Department of Public Safety
Support Systems
A number of USC’s schools provide support for students who need help with scholarly writing.
Check with your advisor or program staff to find out more. Students whose primary language is
not English should check with the American Language Institute http://dornsife.usc.edu/ali, which
sponsors courses and workshops specifically for international graduate students. The Office of
Disability Services and Programs
http://sait.usc.edu/academicsupport/centerprograms/dsp/home_index.html provides certification
for students with disabilities and helps arrange the relevant accommodations. If an officially
declared emergency makes travel to campus infeasible, USC Emergency Information
http://emergency.usc.edu will provide safety and other updates, including ways in which
instruction will be continued by means of Blackboard, teleconferencing, and other technology.