Conversation

@ChristofferEmilKristensen
Contributor

Implementation of a source and sink operator for the Java Platform

@zkaoudi requested a review from juripetersen on January 13, 2026 at 21:07
@juripetersen
Contributor

@ChristofferEmilKristensen This PR contains merge conflicts. You will have to resolve the merge conflicts before we can merge it.

Contributor

@juripetersen left a comment


Thanks for your contribution :)

I added some comments regarding the usage of imports and namespaces that need to be addressed.

@ChristofferEmilKristensen
Contributor Author

Hi,

The comments have been resolved. The only places where fully qualified class names remain are the operators that reference both a Wayang Record and an Iceberg Record; there, both are always fully qualified to avoid confusion. Everything else has been updated.

@juripetersen
Contributor

Thank you for addressing my comments, @ChristofferEmilKristensen.

I would like to have some (non-exhaustive) tests for the operators you introduced.
You can see how we do that for sinks/sources here: https://github.com/apache/wayang/blob/60d5d7428915217286f1d8fa6dbe2dd55bdee0f6/wayang-platforms/wayang-java/src/test/java/org/apache/wayang/java/operators/JavaObjectFileSinkTest.java

"wayang.java.parquetsource.load.prepare", javaExecutor.getConfiguration()));
ExecutionLineageNode mainLineageNode = new ExecutionLineageNode(operatorContext);
mainLineageNode.add(LoadProfileEstimators.createFromSpecification(
"wayang.java.parquetsource.load.main", javaExecutor.getConfiguration()));
Contributor


This should be using a different identifier, right?
Most likely, iceberg.
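For illustration, a minimal sketch of what the renamed lineage setup could look like, assuming the keys simply switch to an iceberg-specific prefix. The key names and the variable prepareLineageNode are assumptions, not code from this PR, and whichever names are chosen would also need matching load-profile entries in the Java platform's configuration:

```java
// Assumed renaming: parquetsource -> icebergsource (key names are illustrative only).
prepareLineageNode.add(LoadProfileEstimators.createFromSpecification(
        "wayang.java.icebergsource.load.prepare", javaExecutor.getConfiguration()));
ExecutionLineageNode mainLineageNode = new ExecutionLineageNode(operatorContext);
mainLineageNode.add(LoadProfileEstimators.createFromSpecification(
        "wayang.java.icebergsource.load.main", javaExecutor.getConfiguration()));
```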

@mspruc
Contributor

mspruc commented Jan 15, 2026

Build errors should be fixed with #658.

@ChristofferEmilKristensen
Contributor Author

Hi,

  1. The build error is fixed now. There was an ambiguous reference to the CardinalityEstimator once I removed the full qualification; I have re-added it.

  2. Regarding the tests, I am a bit unsure how to write them. The problem is that writing to an Iceberg table expects a folder location where an Iceberg table has already been created, and it expects some data to write. The data can of course be generated by creating a few Wayang Records, but writing it produces a lot of files in that folder: metadata files, the actual data files, and the various other things an Iceberg table needs.

  • Loading from a table likewise expects that an Iceberg warehouse has already been created. This can definitely be solved, but automating it and ensuring that everything is cleaned up correctly after the test is not something I have looked into (one possible setup is sketched after this comment). I can point to a research paper I wrote with Zoi where you can see the validation. If tests for the operators are a requirement, I would like to set up a meeting to discuss the possibilities; I am not very familiar with writing automated tests and ensuring they do not cause errors for other users. If you need the research paper as a reference for evaluating the operators, let me know and I can send it over.

Let me know how we should proceed.
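One way the warehouse problem described above could be handled in a test is to root an Iceberg HadoopCatalog in a JUnit-managed temporary directory, so all metadata and data files land in a throwaway location and are removed automatically. The sketch below is illustrative only: the class and field names are hypothetical, and it assumes iceberg-core, hadoop-common, and JUnit 5 are on the test classpath.

```java
import java.nio.file.Path;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hadoop.HadoopCatalog;
import org.apache.iceberg.types.Types;
import org.junit.jupiter.api.io.TempDir;

class IcebergTestTableFixture {  // hypothetical helper for an operator test

    @TempDir
    Path warehouseDir;  // JUnit deletes this directory, and every Iceberg file in it, after the test

    Table createTestTable() {
        // File-based catalog rooted in the temp directory, so no external warehouse is needed.
        HadoopCatalog catalog = new HadoopCatalog(new Configuration(), warehouseDir.toString());
        Schema schema = new Schema(
                Types.NestedField.required(1, "id", Types.LongType.get()),
                Types.NestedField.optional(2, "name", Types.StringType.get()));
        return catalog.createTable(TableIdentifier.of("db", "records"), schema);
    }
}
```

With a fixture along these lines, a sink test could write a few Wayang Records and read them back through Iceberg's API, and a source test could pre-populate the table before running the operator, without touching anything outside the temp directory.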

@juripetersen
Contributor


We usually use Mockito, which can be used to mock things that would be impractical to set up in tests:

final OptimizationContext.OperatorContext operatorContext1 = mock(OptimizationContext.OperatorContext.class);

It's basically used to fake the API of the thing you are interacting with, so that unit testing is possible. Do you think this would be feasible for testing?
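To make that concrete, here is a minimal sketch of what such a test could look like, assuming JUnit 5 and Mockito are on the test classpath; the test and operator names are placeholders, and the structure only mirrors the linked JavaObjectFileSinkTest rather than reproducing it:

```java
import static org.mockito.Mockito.mock;

import org.apache.wayang.core.optimizer.OptimizationContext;
import org.junit.jupiter.api.Test;

class JavaIcebergSourceTest {  // placeholder name for the actual operator test

    @Test
    void executeWithMockedOperatorContext() {
        // Mock the OperatorContext so the test needs neither a full optimization run
        // nor a production Iceberg warehouse for the Wayang-side plumbing.
        final OptimizationContext.OperatorContext operatorContext =
                mock(OptimizationContext.OperatorContext.class);

        // From here the test would instantiate the operator under test, call its
        // execution method with the mocked context and prepared channel instances,
        // and assert on the records it produces, following JavaObjectFileSinkTest.
    }
}
```

Combining this with a temporary-warehouse fixture (as sketched earlier in the thread) could cover both the unit-level checks and an end-to-end write/read check.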
