Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] New Architecture for Requirement Computation #139

Closed
SamChou19815 opened this issue Sep 17, 2020 · 3 comments
Closed

[RFC] New Architecture for Requirement Computation #139

SamChou19815 opened this issue Sep 17, 2020 · 3 comments

Comments

@SamChou19815
Copy link
Contributor

Current setup and issues

Currently, we compute requirement fulfillment in a quite naive way. We first go through a list of all requirements, attach courses taken by the user to them, and then utilize these local information to decide the fulfillment progress.

The current setup can solve some simple cases, but it's not scalable when we consider these complications:

  • Some classes cannot be double-counted. If we use a local requirement -> class[] map, it would be hard to filter them out.
  • Some requirements have multiple fulfillment options.
  • We want to have the ability for user to choose between several different fulfillment options.

New Architecture

Insights

The main issue with the current implementation is that we are looking at the whole picture only from the perspective of requirements. In reality, the "viewpoint" from a class is also important. (e.g. filter away double counting) Therefore, we natually found an abstraction for a two-way view: GRAPH.

Mathematical definition

The graph is not a general-purposed graph. Instead, we add a few restricions to the graph. Formally, let's define CoursePlanRequirementFulfillmentGraph using the standard notation: (R \union C, E), where R means a set of requirements to satisfy, C means a set of courses that is taken by user, and (r, c) \in E is a tuple denoting the edge between a requirement node r and a course node c. This graph can also be thought as a mathematical relation between R and C. Thus, the problem of computing requirements can be thought as building and refining this graph.

Graph Interface

Programmatically, we can define an interface like:

interface CoursePlanRequirementFulfillmentGraph {
  // These are the minimal required methods.
  getAllRequirements(): readonly Requirement[];
  getAllCourses(): readonly Course[];
  getAllEdges(): readonly (readonly [Requirement, Course])[];
  addEdge(requirement: Requirement, course: Course): void;
  removeEdge(requirement: Requirement, course: Course): void;
  
  // Some helper methods can be added for efficiency or usability, like
  getConnectedCoursesFromRequirement(requirement: Requirement): readonly Course[];
  getConnectedRequirementsFromCourse(course: Course): readonly Requirement[];
  existsEdge(requirement: Requirement, course: Course): boolean;
  // ...
}

Algorithm

The algorithm has a simple interface:

const computeCoursePlanRequirementGraph = (
  requirementsNeedToSatisfy: readonly Requirement[],
  coursesTakenByUser: readonly Course[],
  userChoicesOfRequirementFulfillmentOptions: readonly (readonly [Course, Decision])[])
): CoursePlanRequirementGraph => {
  // ...
}

Internally, we can now describe the new proposed computation algorithm as a series of passes on building and refining the graph:

Phase 1: Building a rough graph

In this phase, we build a coarse graph directly from the pre-computed requirement json.

When we see user takes a class c, we find a subset of requirements R' \in R such that c can potentially make progress to satisfy r \in R'.

Phase 2: Respect user's choices

User may make some decision on how courses are used to satisfy requirements. For example, whether to use MATH 4710 as an elective or as a substitude of CHEM class. In this phase, we will consider the userChoicesOfRequirementFulfillmentOptions parameter.

For each (course, decision) tuple, we inspect the graph from the perspective of course. We can call the method graph.getConnectedRequirementsFromCourse(course) to get a list of requirements that are currently associated with the course. Then we remove the edges that are incompatible with the user's decision.

The representation of decision and the compatibility check are not important in the discussion of the architecture. We can specify it in another design document.

(Note: we can also make the decision keyed by requirement, the flow would be similar).

Phase 3: Making default choices

This is a followup of phase 2. The user might not make choices for every single requirement that need some choices. For example, the user might just added MATH 4710 and has not decided whether it can be used to fulfill elective or engineering's chem requirement.

In this phase, we will take a look at all requirements that have multiple fulfillment strategies and have not been covered by user's decision. For all these requirements, we use the default strategy, call graph.getConnectedCoursesFromRequirement(requirement) to get existing edges, and remove those that are incompatible with default strategies.

Phase 4: Eliminating double counting

In this phase, we inspect each course c. We call graph.getConnectedRequirementsFromCourse(course) to get a list of requirements that can be satisfied by the course without considering double-counting constraint. Let's call this list of requirement R.

Then for all r \in R, we check whether the requirement allow a course to be used for something else. If it does allow, then we will always keep this edge. Now we are left with a subset of R which we call R'. We can then use some simple rule to pick one or zero r' \in R' to keep and discard the rest edges.

Phase 5: Profit

🎉

Some important implementation details

Cross-listed classes

In order to correctly handle cross-listed course, we need to change the course type so that all cross listed classes maps to the same class. e.g. CS 2112 might be represented as:

{
  "id": 1234567,
  "codes": ["CS 2112", "ENGRD 2112"],
  // ...
}

The requirement checking step in phase 1 also need to resolve a course code to a course object and use that to consider all cross-listed possibilities.

Ensuring user never makes bad choices

Here, a bad choice means that a choice that is invalid according to Cornell's rules. For example, using MATH 4710 as both an external spec and a free elective. We can enforce this by giving user only a list of valid choices when we let user make decisions.

The easiest but inefficient way to find a list of valid choices would be considering all invalid choices. For each potential choice, try to add the choice into the decision list and run the algorithm. The algorithm should have a bunch of sanity checks to defend against invalid choices. Then, if the algorithm doesn't throw an exception, we know the choice is valid.

Post-processing

In the end, we got a graph, but the frontend UI actually need a list like

type Fulfillments = { 
  requirement: Requirement;
  courses: Course[];
  progress: number;
  total: number;
}[]

Therefore, we need a post-process pass that flattens the graph into this list.
The step needs to take account both the graph and user's choices, so the signature would look something like

const postProcessRequirementFulfillmentGraphIntoList = (
  graph: CoursePlanRequirementFulfillmentGraph,
  decisions: readonly (readonly [Course, Decision])[])
): Fulfillments => { /* ... */ }
@willespencer
Copy link
Member

willespencer commented Sep 18, 2020

Hey Sam! This looks really great! And thanks for being so thorough, it really helps with my understanding 😄! Got a question for you.

When you say, 'we remove the edges that are incompatible with the user's decision,' what does this mean? As in, what type of course <-> requirement edges would be incompatible? It is just like, if they picked it for a requirement and they cannot double count it, it would be removed from other requirements?

@SamChou19815
Copy link
Contributor Author

SamChou19815 commented Sep 18, 2020

Hey Sam! This looks really great! And thanks for being so thorough, it really helps with my understanding 😄! Got a question for you.

When you say, 'we remove the edges that are incompatible with the user's decision,' what does this mean? As in, what type of course <-> requirement edges would be incompatible? It is just like, if they picked it for a requirement and they cannot double count it, it would be removed from other requirements?

That's a good question, we might need some revision of the phases. I think we will likely only let user make two types of decisions in the system:

  1. When there is a requirement with multiple fulfillment strategies. In this case, we let user choose between several strategies.
  2. When there is a class that can satisfies multiple requirements, but all these requirements do not allow double count. In this case, we let user choose between several requirements.

In case 1, we need to do the computation starting from requirement nodes, and purge req-course link that belongs to a fulfillment strategy that is not chosen by the user.
In case 2, we need to do the computation starting from course nodes, and purge all other req-course link except the one chosen by the user. In this case, it's indeed double-counting aware.

Therefore, the current division of phase 2-4 might not make many sense. Case 1-2 above look very different so that they should be separate phases, and case 2 is also double-counting aware, so it kinda overlaps with phase 4.

Thus, here comes my revised phases:

  • Phase 1: Building a rough graph
  • Phase 2-1: Respect user's choices of fulfillment strategies for courses with multiple fulfillment options
  • Phase 2-2: Make default choices for users for courses with multiple fulfillment options
  • Phase 3-1: Respect user's choices of using one course to satisfy one of the several requirements that don't allow double count
  • Phase 3-2: Make random choices for user for the kind of choices above
  • Phase 4: Profit

SamChou19815 added a commit that referenced this issue Sep 25, 2020
### Summary

This PR starts to implement proposal in #139 by setup the `RequirementFulfillmentGraph` data structure.

I made a generic `RequirementFulfillmentGraph` that doesn't depend on a specific representation of requirement or course object, so that the structure itself can be developed independently to avoid potential changes from requirement or course representation. It contains all the methods I described in #139. All the methods are tested in Jest.

Inside my implementation of graph, we need to lookup all courses by a requirement and all requirements by a course, so I setup two hash maps for O(1) lookup. `addEdge` and `removeEdge` ensures that they are in sync. 

Unlike Java where every object has `equals` and `hashCode`, we don't have the same luxury in JS. Therefore, I also implemented `HashMap` and `HashSet` that takes in a function `getUniqueHash` for the key type to simulate the `hashCode` method in Java. The basic idea is that we call this function to get a string that can uniquely identify a requirement/course, so that we can use that unique identifier as a key for quick lookup using JavaScript's `Map`. (samlang uses the same approach: 
https://github.com/SamChou19815/samlang/blob/86ffed2edcb58573b34dae94bbea83e80c591d98/samlang-core-utils/index.ts#L54-L134)

### Test Plan

Added unit tests for `HashMap`, `HashSet` and `RequirementFulfillmentGraph`
Added a step for `npm run test` in CI.
@SamChou19815
Copy link
Contributor Author

Everything in this issue has been implemented. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants