Make all OpenML classes inherit ReprMixin #1567
JATAYU000 wants to merge 15 commits into openml:main from
Codecov Report: ❌ Patch coverage is
@@            Coverage Diff             @@
##             main    #1567      +/-   ##
==========================================
+ Coverage   52.04%   52.98%   +0.94%
==========================================
  Files          36       36
  Lines        4333     4339       +6
==========================================
+ Hits         2255     2299      +44
+ Misses       2078     2040      -38
fkiraly left a comment
Question: have you considered inheriting from `OpenMLBase`, or at least moving the `__repr__`-related logic to a common place, instead of writing a new one?
I'm not saying that this is how it has to be done; I would like to hear your rationale.
I think it might be a good idea, simply because of the DRY principle.

Can we put this in utility functions? I don't think that's going to be clean, then.
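A minimal sketch of the shared-mixin approach discussed above, assuming each class declares the attributes to display through a hypothetical `_repr_fields` class attribute. The names `ReprMixin`, `_repr_fields`, and `ExampleSplit` here are illustrative; the actual mixin in this PR and the existing `OpenMLBase` may use different hooks and output formats.

```python
from __future__ import annotations

from typing import Any, ClassVar


class ReprMixin:
    """Hypothetical mixin: builds __repr__ from a per-class list of field names."""

    # Subclasses override this with the attributes to display (assumed hook name).
    _repr_fields: ClassVar[tuple[str, ...]] = ()

    def __repr__(self) -> str:
        # Collect name=value pairs, skipping attributes that are missing or None.
        parts = []
        for name in self._repr_fields:
            value: Any = getattr(self, name, None)
            if value is not None:
                parts.append(f"{name}={value!r}")
        return f"{type(self).__name__}({', '.join(parts)})"


class ExampleSplit(ReprMixin):
    """Hypothetical stand-in for a class such as OpenMLSplit."""

    _repr_fields = ("name", "description", "repeats")

    def __init__(self, name: str, description: str | None, repeats: int) -> None:
        self.name = name
        self.description = description
        self.repeats = repeats
```

For example, `repr(ExampleSplit("holdout", None, 1))` gives `ExampleSplit(name='holdout', repeats=1)`. The same formatting loop could instead live in a module-level utility function that every `__repr__` calls; the mixin variant simply avoids repeating that call in each class.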
`__repr__` method from `OpenMLSplit`
OpenML classes to inherit ReprMixin
…as non-strict expected fail. (openml#1587)

#### Metadata
* Reference Issue: Temporarily fixes issue openml#1586

#### Details
- Running pytest locally, I found only one failing test: `tests/test_runs/test_run_functions.py::test__run_task_get_arffcontent_2`
- However, the recent CI jobs of several other PRs show many more failing tests. I picked a sample of them and analyzed them; here are my findings:

##### Primary Failure Patterns
1. OpenML Test Server Issues (Most Common)
   The majority of failures are caused by:
   - `OpenMLServerError: Unexpected server error when calling https://test.openml.org/... with Status code: 500`
   - Database connection errors: `Database connection error. Usually due to high server load. Please wait N seconds and try again.`
   - Timeout errors: `TIMEOUT: Failed to fetch uploaded dataset`
2. Cache/Filesystem Issues
   - `ValueError: Cannot remove faulty tasks cache directory ... Please do this manually!`
   - `FileNotFoundError: No such file or directory`
3. Data Format Issues
   - `KeyError: ['type'] not found in axis`
   - `KeyError: ['class'] not found in axis`
   - `KeyError: ['Class'] not found in axis`
…openml#1556)

#### Metadata
* Reference Issue: Fixes openml#1542

#### Details
Fixed sklearn model detection by safely importing openml-sklearn in `openml/runs/__init__.py`.
…#1559)

I have refactored the `OpenMLEvaluation` class from a traditional Python class to use the `@dataclass` decorator, to reduce boilerplate code and improve maintainability.

#### Metadata
* Reference Issue: openml#1540
* New Tests Added: No
* Documentation Updated: No
* Change Log Entry: Refactored the `OpenMLEvaluation` class to use `@dataclass`

#### Details
Edited the `OpenMLEvaluation` class in `openml\evaluations\evaluation.py` to use the `@dataclass` decorator. This significantly reduces the boilerplate code in the following places:

- Instance Variable Definitions

**Before:**
```python
def __init__(
    self,
    run_id: int,
    task_id: int,
    setup_id: int,
    flow_id: int,
    flow_name: str,
    data_id: int,
    data_name: str,
    function: str,
    upload_time: str,
    uploader: int,
    uploader_name: str,
    value: float | None,
    values: list[float] | None,
    array_data: str | None = None,
):
    self.run_id = run_id
    self.task_id = task_id
    self.setup_id = setup_id
    self.flow_id = flow_id
    self.flow_name = flow_name
    self.data_id = data_id
    self.data_name = data_name
    self.function = function
    self.upload_time = upload_time
    self.uploader = uploader
    self.uploader_name = uploader_name
    self.value = value
    self.values = values
    self.array_data = array_data
```
**After:**
```python
run_id: int
task_id: int
setup_id: int
flow_id: int
flow_name: str
data_id: int
data_name: str
function: str
upload_time: str
uploader: int
uploader_name: str
value: float | None
values: list[float] | None
array_data: str | None = None
```

- `_to_dict` Method Simplification

**Before:**
```python
def _to_dict(self) -> dict:
    return {
        "run_id": self.run_id,
        "task_id": self.task_id,
        "setup_id": self.setup_id,
        "flow_id": self.flow_id,
        "flow_name": self.flow_name,
        "data_id": self.data_id,
        "data_name": self.data_name,
        "function": self.function,
        "upload_time": self.upload_time,
        "uploader": self.uploader,
        "uploader_name": self.uploader_name,
        "value": self.value,
        "values": self.values,
        "array_data": self.array_data,
    }
```
**After:**
```python
def _to_dict(self) -> dict:
    return asdict(self)
```

All tests pass with these changes:
```bash
PS C:\Users\ASUS\Documents\work\opensource\openml-python> pytest tests/test_evaluations/
======================================= test session starts =======================================
platform win32 -- Python 3.14.0, pytest-9.0.2, pluggy-1.6.0
rootdir: C:\Users\ASUS\Documents\work\opensource\openml-python
configfile: pyproject.toml
plugins: anyio-4.12.0, flaky-3.8.1, asyncio-1.3.0, cov-7.0.0, mock-3.15.1, rerunfailures-16.1, timeout-2.4.0, xdist-3.8.0, requests-mock-1.12.1
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 13 items

tests\test_evaluations\test_evaluation_functions.py ............ [ 92%]
tests\test_evaluations\test_evaluations_example.py . [100%]

================================= 13 passed in 274.80s (0:04:34) ==================================
```
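To illustrate the `@dataclass` + `asdict` pattern from the commit above, here is a trimmed, hypothetical class with only three of the fields (`EvaluationSketch` is not part of openml). `asdict` returns the fields in declaration order, which is what makes the hand-written dictionary unnecessary.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass


@dataclass
class EvaluationSketch:
    """Hypothetical, trimmed stand-in for OpenMLEvaluation (three fields only)."""

    run_id: int
    value: float | None
    array_data: str | None = None  # fields with defaults must come after those without

    def _to_dict(self) -> dict:
        # asdict() recursively converts the dataclass fields into a plain dict.
        return asdict(self)
```

Besides `__init__`, `@dataclass` also generates `__repr__` and `__eq__`. With the defaults (`eq=True`, `frozen=False`) it sets `__hash__` to `None`, so instances are unhashable unless `frozen=True` or `unsafe_hash=True` is passed.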
…enml#1566)

#### Metadata
* Reference Issue: Fixes openml#1531
* New Tests Added: No
* Documentation Updated: Yes
* Change Log Entry: Update supported Python version range to 3.10–3.14 and extend CI testing to Python 3.14

#### Details
This pull request updates the officially supported Python version range for openml-python from 3.8–3.13 to 3.10–3.14, in line with currently supported Python releases. The following changes were made:
- Updated `pyproject.toml` to reflect the new supported Python range (3.10–3.14).
- Extended GitHub Actions CI workflows (`test.yml`, `dist.yaml`, `docs.yaml`) to include Python 3.14.
- Updated documentation (`README.md`) wherever Python version support is mentioned.

No new functionality or tests were introduced; this is a maintenance update to keep Python version support and CI configuration up to date. This change ensures that users and contributors can use and test openml-python on the latest supported Python versions.
Fixes openml#1598

This PR adds the `@pytest.mark.uses_test_server()` marker to tests that depend on the OpenML test server.

Changes
* Added `uses_test_server` to the relevant test sets.
* Replaced all `server` markers with the `uses_test_server` marker.
* Removed all `@pytest.mark.xfail(reason="failures_issue_1544", strict=False)` markers where the failure was due to race conditions or server connectivity.
fkiraly left a comment
Not ready to merge; the conflicts need to be resolved first.
I have resolved the conflicts, but there seem to be unrelated test failures.
    def __eq__(self, other: Any) -> bool:
        return isinstance(other, OpenMLDataFeature) and self.__dict__ == other.__dict__

    def __hash__(self) -> int:
        return hash(
            (
                self.index,
                self.name,
                self.data_type,
                tuple(self.nominal_values) if self.nominal_values is not None else None,
                self.number_missing_values,
                tuple(self.ontologies) if self.ontologies is not None else None,
            )
        )
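The field-based `__hash__` above is consistent with `__eq__` only if objects that compare equal also hash equal. A reduced, hypothetical class with two of the attributes (`FeatureSketch` is not the real `OpenMLDataFeature`) shows the same list-to-tuple conversion and lets us check that property:

```python
from __future__ import annotations

from typing import Any


class FeatureSketch:
    """Hypothetical stand-in for OpenMLDataFeature with two attributes."""

    def __init__(self, name: str, nominal_values: list[str] | None) -> None:
        self.name = name
        self.nominal_values = nominal_values

    def __eq__(self, other: Any) -> bool:
        return isinstance(other, FeatureSketch) and self.__dict__ == other.__dict__

    def __hash__(self) -> int:
        # Lists are unhashable, so convert them to tuples before hashing.
        return hash(
            (
                self.name,
                tuple(self.nominal_values) if self.nominal_values is not None else None,
            )
        )
```

One caveat of this design: the attributes are mutable, so changing `nominal_values` after an object has been put into a set or used as a dict key changes its hash and can make it impossible to find again.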
I am not so sure about the custom implementation of `__hash__`. I know pre-commit requires it, but we need to make sure we don't just write a bad implementation to satisfy the pre-commit checks.
I think if it can be set to `None`, and that silences the pre-commit check, is the right choice for the code, and no SDK code currently depends on hashing, then we should do it like that.
If we want to implement `__hash__`, given the implementation of `__eq__`, doesn't it make more sense to build the hash from a tuple of tuples, by looping over all (key, value) pairs of `self.__dict__`?
> pairs of `self.__dict__`

`self.__dict__` contains unhashable values (such as lists), which would raise errors; that's why I picked immutable/hashable fields.
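The point above can be demonstrated directly: hashing a tuple built from `self.__dict__.items()` fails as soon as one attribute holds a list. `DictHashSketch` and `is_hashable` below are hypothetical illustrations, not code from this PR.

```python
from __future__ import annotations


class DictHashSketch:
    """Hypothetical class hashing all (key, value) pairs of __dict__."""

    def __init__(self, name: str, nominal_values: list[str] | None) -> None:
        self.name = name
        self.nominal_values = nominal_values

    def __eq__(self, other: object) -> bool:
        return isinstance(other, DictHashSketch) and self.__dict__ == other.__dict__

    def __hash__(self) -> int:
        # A tuple is hashable only if every element is; a list value breaks this.
        # Sorting by key makes the result independent of attribute insertion order.
        return hash(tuple(sorted(self.__dict__.items())))


def is_hashable(obj: object) -> bool:
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True
```

With `nominal_values=None` the hash succeeds, but with `nominal_values=["red"]` it raises `TypeError: unhashable type: 'list'`, which is why a generic `__dict__`-based hash would need per-type conversions anyway.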
> I think if it can be set to None, and that shuts the pre-commit and is right choice in code and no sdk code currently depends on hashing then do it like that
I have set it to `None`, and that does silence the pre-commit failure.
@fkiraly please have a look at this thread.
Is it fine to have `__hash__ = None` for a class?
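For reference, here is what `__hash__ = None` means in Python: instances become unhashable, so `hash()` raises `TypeError` and they cannot be used as set members or dict keys, while `==` keeps working. Python also does this implicitly when a class defines `__eq__` without `__hash__`, so the explicit assignment mainly documents the intent. Both classes below are hypothetical examples.

```python
from collections.abc import Hashable
from typing import Any


class UnhashableSketch:
    """Hypothetical class that compares by value and explicitly opts out of hashing."""

    __hash__ = None  # type: ignore[assignment]

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other: Any) -> bool:
        return isinstance(other, UnhashableSketch) and self.__dict__ == other.__dict__


class ImplicitSketch:
    """Defines __eq__ only; Python sets __hash__ to None automatically."""

    def __eq__(self, other: Any) -> bool:
        return isinstance(other, ImplicitSketch)
```

So the main risk of `__hash__ = None` is only to code that puts these objects into sets or uses them as dict keys; `collections.abc.Hashable` also reports such instances as not hashable.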
Metadata
- `__repr__` Methods to `OpenMLSplit` #1563
- `ReprMixin` for common `__repr__` formatting across all `OpenML` classes
- `ReprMixin` to share `__repr__` formatting #1595