- Instructor: Semih Salihoglu
- Seminar Room: DC 2585
- Seminar Time: Tue 4pm-6:50pm
The quest to build intelligent machines that are capable of logical reasoning, i.e., ones that can induce new information or deduce logical rules from prior information, is as old as the field of computer science. Research topics motivated by this ultimate goal are at the heart of symbolic (as opposed to statistical) artificial intelligence and connect with many sub-fields of computer science, such as data management and the semantic web, as well as other scientific fields, such as linguistics. These systems are generally based on three components: a knowledge base or knowledge graph that represents a set of facts about a real-world application domain, elementary logical rules specifying constraints about the domain, and an inference system that can answer questions using the base facts and the rules. Historically, several periods have popularized the development of such knowledge graph-based applications. The strongest of these have been the expert systems of the 1980s, the semantic web of the 2000s, and the current wave of question answering in search engines, recommender systems, and dataset cataloging/search for extremely heterogeneous large public government and private enterprise data lakes.
This seminar will cover seminal work in the space of knowledge graph representation, querying, and management, as well as past and, primarily, modern applications that are powered by knowledge graphs. Topics include knowledge models, ontologies, query languages, graph data management systems, public knowledge graphs, knowledge graph embeddings, and popular past and present applications.
The seminar is based on weekly paper readings and student presentations, discussions, and a term project.
The schedule below is subject to change:
Week | Date | Topic | Readings |
---|---|---|---|
1 | 9/13 | Introduction (Semih lecturing) | Knowledge Graphs The Semantic Web SWFWO Ch 3, 5-7 |
2 | 9/20 | Guest Lecture: Prof. Grant Weddell Foundations of Knowledge Representation | KRR Ch 2 & 3 (No reviews, but do Exercises 1 and 4 in Ch 2.7 of KRR and submit a PDF, either LaTeX or handwritten). |
3 | 9/27 | Datalog (Semih lecturing online) | PDKBS 3 (from pgs 96 to 139 but I highly recommend 139-164 as well if you have not read on the formalism of relational algebra) |
4 | 10/04 | Query Processing in Deductive DBMS: Magic Sets (Semih lecturing) | PDKBS 12.1-12.8, PDKBS 13.1-13.5 Optional: Magic Sets Original Paper Optional: Magic is Relevant |
-- | 10/11 | No Class (Reading Week) | |
5 | 10/18 | RDF Systems | RDFox RDF3x |
6 | 10/25 | Property Graph Data Management Systems (Semih and Amine lecturing) | The Ubiquity of Large Graphs User Survey FDB Optimizing Subgraph Queries by Combining Binary and WCOJ Optional: Umbra WCOJ Implementation |
7 | 11/01 | Large Public Knowledge Graphs and Ontologies | DBPedia SNODEM: Ch 3, Ch 4.1-4.3.8.3, 4.4 Schema.org |
8 | 11/08 | Enterprise Knowledge Graphs & Management Systems | Sequeda Thesis Ch 3 Sequeda Thesis Ch 4 Optional: Pay-as-you-go Methodology Case Study (Focus on Section 4) |
9 | 11/15 | Natural Language Interfaces to Data | BELA ATHENA Optional: ATHENA++ Optional: IRNet |
10 | 11/22 | Graph Embeddings/Deep Natural Language Embeddings | TransE Q&A With Embeddings Optional: Survey on KG Embeddings |
11 | 11/29 | Guest Lecture on Data Cataloging: Juan Sequeda & Other Enterprise Applications | Datanami Article Saga |
12 | 12/06 | Linked Open Data Movement | Google Dataset Search Table Union Search Optional: Making Open Datasets Transparent |
This seminar's reading will cover chapters from the following surveys and textbooks in addition to research papers, which will be posted in the schedule.
- Knowledge Representation and Reasoning (KRR), Brachman & Levesque, 2004
- Designing and Building Enterprise Knowledge Graphs, Sequeda & Lassila, 2021
- Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases, Weikum, Dong, Razniewski, Suchanek, 2021
- Natural Language Interfaces to Data, Quamar, Efthymiou, Lei, Özcan, 2022
- Semantic Web for the Working Ontologist (SWFWO), Allemang & Hendler, 2008
- The Protégé Project: 1, 2
- Rule-Based Expert Systems: The MYCIN Experiments of the Stanford Heuristic Programming Project, Buchanan, Shortliffe, 1984
- Principles of Database and Knowledge-Base Systems (PDKBS), Ullman, 1989
- Class Participation: 15%
- Paper Reviews: 20%
- Presentation: 15%
- Project: 50%
For each seminar (except the first 2 seminars), you will write two reviews, one for each of two of the papers assigned to that day. If more than two papers are assigned, you can pick any two of them. You are allowed to skip 1 review throughout the term. I am flexible about the format of your reviews. Each review should be 1 page long (if you need more space, take another 0.25 pages, but try not to). The reviews are due at 6pm on the Monday before the seminar. You are expected to (very, very) briefly answer the following 6 questions and finish each review with one question that can start a discussion in class:
- What is the problem?
- Why is it important?
- Why is it hard? Why don't previous methods work?
- What is the solution to the problem the authors propose?
- What interesting research questions does the paper raise?
- (If related) How does the paper relate to other papers we have read?
The first 4 of these questions are from Jennifer Widom's tips for writing introductions to technical papers. I strongly recommend that each one of you read this entire document very carefully (probably multiple times) at some point in your graduate studies. There is no fixed format for the reviews but I recommend: Single column, 1.5 space, 12 pt, in LaTeX.
Ultimately, the main thing I am looking for is a demonstration of serious critical reading of the paper.
Your project has two main deliverables: a 6-page paper and the source code of your project, with instructions for running it.
- Project Paper: The project papers will be 6 pages. You can have extra pages for the references. They will be written in the two-column ACM proceedings format, using one of the ACM SIG Proceedings Templates.
- Project Source Code: Please put your source code on GitHub and include a link in your project writeup. On the GitHub page, please document exactly how to run your source code.
Each student will be doing 1 presentation in the term. Each presentation will be about 25 minutes long. Here are the important points summarizing what you have to do for your presentations.
- You must present with slides. The content in your slides should be your own but you can use others' materials, e.g., figures from the paper we are reading, when necessary and by crediting your source on your slide.
- Please have a separate slide for each of the 4 questions in the summary item in the Paper Review section.
- It is very helpful to demonstrate the ideas in the paper through examples. So try to have examples in your presentation, e.g., a simulation of some code or system component.