Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update mermaid graph of onigumo processing with a new approach #237

Merged
merged 7 commits into from
Aug 16, 2024

Conversation

nappex
Copy link
Collaborator

@nappex nappex commented Aug 3, 2024

No description provided.

@nappex
Copy link
Collaborator Author

nappex commented Aug 3, 2024

You can play with graph by live service https://mermaid.live/

It is partially solving #232

Copy link
Owner

@Glutexo Glutexo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the diagram causes a slight mismatch in the description above it: although “Onigumo is composed of three sequentially interconnected components” is still correct in some sense, it doesn’t mention the new Feeder part. I’d say something like “The crawling part of Onigumo is composed of three sequentially interconnected components”. We can do that in a follow-up pull request.

It would be possible to object that the diagram doesn’t reflect the actual architecture of the project at the moment, but it didn’t do so before the change either: there is no Parser (will be resolved by #89), no Operator, no Materializer, no flow from start to end… So I think we don’t need to wait with merging for the actual feeder to emerge.

README.md Outdated Show resolved Hide resolved
Copy link
Owner

@Glutexo Glutexo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The diagrams are very nice! ❤️ I like how the Feeder, the starting point, sits almost in the upper-left corner and the Materialized, the finishing point, is located in the bottom-right corner, nicely flowing like a text on a page.

Comment on lines +21 to +22
spider_parser(🕷️ PARSER)
spider_operator(🕷️ OPERATOR)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of using the emoji to get around GitHub not supporting fontawesome icons in Mermaid diagrams. 💪🏻

README.md Outdated
Comment on lines 26 to 31
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we only use a single space,

Suggested change
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])

or align the items to table columns?

Suggested change
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])

I don’t have preference (at least now), but I’d like it to be at least consistent.

README.md Outdated
Comment on lines 33 to 35
spider_operator -. "<hash>.urls" .-> onigumo_downloader
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
spider_operator -. "<hash>.urls" .-> onigumo_downloader
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator -. "<hash>.urls" .-> onigumo_downloader
spider_parser -. "<hash>.json" .-> spider_operator
Suggested change
spider_operator -. "<hash>.urls" .-> onigumo_downloader
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator. -. "<hash>.urls" .-> onigumo_downloader
spider_parser -. "<hash>.json" .-> spider_operator

README.md Outdated
Comment on lines 52 to 55
onigumo_feeder -- .json --> spider_operator
spider_operator ---> spider_materializer
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
onigumo_feeder -- .json --> spider_operator
spider_operator ---> spider_materializer
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser
onigumo_feeder -- .json --> spider_operator
spider_operator ---> spider_materializer
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser
Suggested change
onigumo_feeder -- .json --> spider_operator
spider_operator ---> spider_materializer
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser
onigumo_feeder -- .json --> spider_operator
spider_operator ---> spider_materializer
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser

README.md Outdated
Comment on lines 53 to 59
spider_operator ---> spider_materializer
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser

spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator -. "<hash>.urls" .-> onigumo_downloader
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
spider_operator ---> spider_materializer
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator -. "<hash>.urls" .-> onigumo_downloader
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator -. "<hash>.urls" .-> onigumo_downloader
spider_operator ---> spider_materializer

Let’s push the spider_operator ---> spider_materializer relation to the bottom, so the Spider subgraph arrows are coupled together

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed 🤝

README.md Outdated
Comment on lines 26 to 31
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])

I‘d split this into smaller groups.

README.md Outdated
Comment on lines 52 to 59
onigumo_feeder -- .json --> spider_operator
spider_operator ---> spider_materializer
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser

spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator -. "<hash>.urls" .-> onigumo_downloader
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
onigumo_feeder -- .json --> spider_operator
spider_operator ---> spider_materializer
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator -. "<hash>.urls" .-> onigumo_downloader
onigumo_feeder -- .json --> spider_operator
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator -. "<hash>.urls" .-> onigumo_downloader
spider_operator ---> spider_materializer

See my comment from my previous review. I’d push the spider_operator --→ spider_materializer relation to the bottom, so the Spider subgraph arrows are coupled together

Copy link
Owner

@Glutexo Glutexo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be worth updating the description “The flowchart below illustrates the flow of data between those parts:” to use a plural form “flowchart” now when there are two. But it’s not necessary: we may consider that to be one flowchart consisting of two parts. 😃

onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
```

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were thinking about adding some title to the second diagram. But I don’t know what it should say.

@Glutexo Glutexo merged commit fc1b54e into Glutexo:master Aug 16, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants