Skip to content

WIP: Bake faster#728

Open
ancho wants to merge 9 commits intojbake-org:masterfrom
ancho:experiment/2-7-0-cuncurrent
Open

WIP: Bake faster#728
ancho wants to merge 9 commits intojbake-org:masterfrom
ancho:experiment/2-7-0-cuncurrent

Conversation

@ancho
Copy link
Copy Markdown
Member

@ancho ancho commented Oct 3, 2021

This is a Proof of Concept to speed up the baking process using parallel execution.

It speeds up the following three core activities:

  • crawling
  • rendering
  • asset management (copy stuff around)

This is just a quick sketch to start a discussion on possible solutions on how to get more performance out of the baking process.

I kept it as simple as possible to just get it working.
But the code needs to be cleaned up/improved a little bit. And there are missing tests regarding possible side effects.
I think it would be better to walk the file tree instead of using a parallel stream with recursive calls for example.

But first I would be happy to know what others think about the current approach.

Copy link
Copy Markdown

@bmarwell bmarwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some quick findings. I wonder if it would be better to use a global ExSvc instead of shutting new ones down all the time (see comments about waiting).

Comment on lines -293 to +302
private DocumentList<DocumentModel> query(String sql) {
private synchronized DocumentList<DocumentModel> query(String sql) {
activateOnCurrentThread();
OResultSet results = db.query(sql);
return DocumentList.wrap(results);
}

private DocumentList<DocumentModel> query(String sql, Object... args) {
private synchronized DocumentList<DocumentModel> query(String sql, Object... args) {
activateOnCurrentThread();
OResultSet results = db.command(sql, args);
return DocumentList.wrap(results);
}

private void executeCommand(String query, Object... args) {
private synchronized void executeCommand(String query, Object... args) {
activateOnCurrentThread();
db.getTransaction().begin();
db.command(query, args);
db.getTransaction().commit();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why use synchronize here? Instead, you could catch OConcurrentModificationException and retry.

See: https://orientdb.org/docs/3.0.x/general/Concurrency.html

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That could be done. But currently we have only this one Contentstore that is passed around everywhere, so all Threads use the same db session, which can lead to a deadlock especially when adding new documents.

I think It would be better to restructure the contentstore to have one single orient instance and each thread uses it's own database session. Like they recommend in the docs.

That's a bit more work that needs to be done. So I decided to keep it simple first and synchronize the potential critical parts.

Comment on lines +51 to +52
try {
try (Writer out = createWriter(outputFile)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this double-nesting needed? I do not see a reason.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup mistake. Can be merged together. Thanks for the hint.

Comment on lines +177 to 178
utensils.getRenderer().shutdown();
errors.addAll(asset.getErrors());
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shutdown() does not wait for the ES to run all threads until finished (ie non-blocking operation). You need for all threads to finish, or you will not get all asset errors.

Comment on lines +186 to +187
} catch (InterruptedException e) {
e.printStackTrace();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Re-Interrupt
  • use logger

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. That's right. A log message would be better. And thinking about it we also miss a shutdown handler to shutdown all executors directly.

* Load {@link RenderingTool} instances and delegate rendering of documents to them
*/
private void renderContent() {
private void renderContent() throws InterruptedException {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The method never checks if it got interrupted. The for loop would be a sane choice.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm....my thought was....if an interrupt happens, better stop processing and fail fast....but yes. could be handled in the loop.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there could be an interrupt in a method call inside the loop. But what happens if the interrupt happens on the "jump back" instruction? I am by no means an expert with Threading and Interrupts, but I think adding a simple check in each iteration would be a good idea:

if (Thread.isInterrupted()) { /* LOG, then */ throw new InterruptedException(); }

Comment on lines +304 to +307
public void shutdown() throws InterruptedException {
this.excutor.shutdown();
this.excutor.awaitTermination(10, TimeUnit.MINUTES);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you made it synchronous. Why wait here? Better name it "shutdownAndWait" then, because usually the shutdown method does not wait.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will rename it to make the intention clearer. cooldown would be another idea to name it to keep the oven analogy.
Like at the end of the baking process you let the oven cooldown and afterwards cleanup the dirt that may have happened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants