Update to function as out-of-the-box test server #13
base: main
Changes from all commits
c4d4927
d1673a0
456586a
4ab1036
273d8e9
7ac2879
b2e71be
0b185d8
582b732
eeaa47b
73baa3e
99f2e0a
848c73f
a2e5844
5515006
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1 +1 @@ | ||
| server=http://nginx:80/api/v1/xml | ||
| server=http://nginx:8000/api/v1/xml |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,4 @@ | ||
| CONFIG=api_key=AD000000000000000000000000000000;server=http://php-api:80/ | ||
| CONFIG=api_key=abc;server=http://php-api:80/ | ||
|
I don't understand, here the api key is set from
Author (Contributor): The evaluation engine needs administrator access currently. |
||
| JAVA=/usr/bin/java | ||
| JAR=/usr/local/lib/evaluation-engine.jar | ||
| LOG_DIR=/logs | ||
| LOG_DIR=/logs | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,16 @@ | ||
| #!/bin/sh | ||
|
|
||
| # We need to remove the default 127.0.0.1 localhost map to | ||
| # ensure the remap to the static nginx ip address is respected. | ||
| # Updating /etc/hosts in place isn't always allowed ("Resource Busy"), | ||
| # directly overwriting it instead seems to bypass that protection. | ||
| cp /etc/hosts /etc/hosts.new | ||
| sed -i '/^127.0.0.1.*localhost/d' /etc/hosts.new | ||
| sed -i -E 's/^(::1\t)localhost (.*)$/\1\2/g' /etc/hosts.new | ||
| cat /etc/hosts.new > /etc/hosts | ||
| rm /etc/hosts.new | ||
|
|
||
|
Author (Contributor): For other containers, updating /etc/hosts through configuration was sufficient. |
||
| printenv | grep -v HOME >> /etc/environment | ||
|
|
||
| touch /cron.log | ||
| /usr/sbin/crond -l 4 && tail -f /cron.log | ||
| /usr/sbin/crond -l 4 && tail -f /cron.log | ||
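
The `sed` edits in the entrypoint script above can be hard to read at a glance. The following is a minimal Python sketch of the same filtering and is not part of this PR. The function name `strip_default_localhost` is hypothetical, and unlike the `sed` pattern it escapes the dots in `127.0.0.1`, so it is slightly stricter.

```python
import re

# Lines such as "127.0.0.1<TAB>localhost" are dropped entirely.
LOOPBACK_V4 = re.compile(r"^127\.0\.0\.1.*localhost")
# On the "::1" line only the leading "localhost " alias is removed.
LOOPBACK_V6 = re.compile(r"^(::1\t)localhost (.*)$")


def strip_default_localhost(hosts_text: str) -> str:
    """Return hosts file content without the default localhost mappings."""
    kept = []
    for line in hosts_text.splitlines():
        if LOOPBACK_V4.search(line):
            continue  # mirrors: sed '/^127.0.0.1.*localhost/d'
        # mirrors: sed -E 's/^(::1\t)localhost (.*)$/\1\2/'
        kept.append(LOOPBACK_V6.sub(r"\1\2", line))
    trailing_newline = "\n" if hosts_text.endswith("\n") else ""
    return "\n".join(kept) + trailing_newline
```

An entry such as `172.28.0.2	localhost`, which `extra_hosts` is expected to add, passes through unchanged, so `localhost` then resolves to the nginx container.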
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,2 +1,2 @@ | ||
| apikey=AD000000000000000000000000000000 | ||
| server=http://nginx:80/api/v1/xml | ||
| apikey=normaluser | ||
| server=http://localhost:8000/api/v1/xml | ||
|
Comment on lines +1 to +2:
... and here the api key is set from So far, these were the keys for developers: has anything changed here? Also what are the api keys for
Author (Contributor): This configuration is just for when you spin up a
Author (Contributor): The Python-based REST API uses the keys that are in the database. The server is unaffected, but I will need to update the keys that are used in its tests. |
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,6 +1,6 @@ | ||
| services: | ||
| database: | ||
| image: "openml/test-database:20240105" | ||
| image: "openml/test-database:v0.1.20260204" | ||
| container_name: "openml-test-database" | ||
| environment: | ||
| MYSQL_ROOT_PASSWORD: ok | ||
|
|
@@ -54,11 +54,15 @@ services: | |
| context: config/nginx | ||
| container_name: openml-nginx | ||
| ports: | ||
| - "8000:80" | ||
| - "8000:8000" | ||
| networks: | ||
| default: | ||
| ipv4_address: 172.28.0.2 | ||
|
Author (Contributor): The static IP address is required so that we can add entries to |
||
|
|
||
|
|
||
| php-api: | ||
| profiles: ["all", "minio", "rest-api", "frontend", "evaluation-engine"] | ||
| image: openml/php-rest-api:v1.2.1 | ||
| image: openml/php-rest-api:v1.2.4 | ||
| container_name: "openml-php-rest-api" | ||
| ports: | ||
| - "8080:80" # also known as /api (nginx) | ||
|
|
@@ -78,6 +82,8 @@ services: | |
| start_interval: 5s | ||
| timeout: 3s | ||
| interval: 1m | ||
| extra_hosts: | ||
| - "localhost=172.28.0.2" | ||
|
|
||
| email-server: | ||
| profiles: ["all", "frontend"] | ||
|
|
@@ -95,7 +101,7 @@ services: | |
|
|
||
| frontend: | ||
| profiles: ["all", "frontend"] | ||
| image: openml/frontend:dev_v2.0.20251111 | ||
| image: openml/frontend:v2.1.1 | ||
| container_name: "openml-frontend" | ||
| ports: | ||
| - "8081:5000" # also known as / (nginx) | ||
|
|
@@ -108,7 +114,7 @@ services: | |
|
|
||
| minio: | ||
| profiles: ["all", "minio", "evaluation-engine"] | ||
| image: openml/test-minio:v0.1.20241110 | ||
| image: openml/test-minio:v0.1.20260204 | ||
| container_name: "openml-minio" | ||
| ports: | ||
| - "9000:9000" # also known as /data (nginx) | ||
|
|
@@ -133,6 +139,8 @@ services: | |
| depends_on: | ||
| php-api: | ||
| condition: service_healthy | ||
| extra_hosts: | ||
| - "localhost=172.28.0.2" | ||
|
|
||
| croissants: | ||
| profiles: ["all"] | ||
|
|
@@ -157,7 +165,15 @@ services: | |
| depends_on: | ||
| php-api: | ||
| condition: service_healthy | ||
| extra_hosts: | ||
| - "localhost=172.28.0.2" | ||
|
|
||
| networks: | ||
| default: | ||
| name: openml-services | ||
| ipam: | ||
| driver: default | ||
| config: | ||
| - subnet: 172.28.0.0/16 | ||
| ip_range: 172.28.1.0/24 | ||
|
|
||
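
A quick way to sanity-check the addressing in the compose file above: the static nginx address should lie inside the network's `subnet` but outside `ip_range`, which, to my understanding of Docker's IPAM options, is the pool used for automatically assigned container addresses. The sketch below is illustrative only. `is_safe_static_address` is a hypothetical helper and relies solely on the standard-library `ipaddress` module.

```python
import ipaddress

# Values copied from the compose file in this diff.
SUBNET = ipaddress.ip_network("172.28.0.0/16")
DYNAMIC_RANGE = ipaddress.ip_network("172.28.1.0/24")
NGINX_IP = ipaddress.ip_address("172.28.0.2")


def is_safe_static_address(address, subnet, dynamic_range) -> bool:
    """True if `address` is in `subnet` but not in the dynamic pool."""
    return (
        dynamic_range.subnet_of(subnet)   # the pool must be part of the subnet
        and address in subnet             # static address is routable on the network
        and address not in dynamic_range  # and cannot collide with auto-assigned IPs
    )


if __name__ == "__main__":
    print(is_safe_static_address(NGINX_IP, SUBNET, DYNAMIC_RANGE))
```

Under this assumption, keeping `172.28.0.2` outside `172.28.1.0/24` means another container should not be handed the nginx address before nginx starts.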
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,135 @@ | ||
| #!/bin/bash | ||
| # This test assumes services are running locally: | ||
| # `docker compose --profile all up -d` | ||
| # | ||
| # It tests some of the most important services but is by no means comprehensive. | ||
| # At a minimum, also check the front page in a browser (http://localhost:8000). | ||
|
|
||
| set -e | ||
|
|
||
| assert_contains() { | ||
| if echo "$1" | grep --ignore-case -q "$2"; then | ||
| echo "PASS: output contains '$2'" | ||
| else | ||
| echo "FAIL: output does not contain '$2'" | ||
| echo "Full output:" | ||
| echo "$1" | ||
| exit 1 | ||
| fi | ||
| } | ||
|
|
||
| assert_url_exists() { | ||
| if curl --output /dev/null --silent --head --fail --location "$1"; then | ||
| echo "PASS: $1 exists" | ||
| else | ||
| echo "FAIL: $1 does not exist" | ||
| exit 1 | ||
| fi | ||
| } | ||
|
|
||
| # nginx redirects request to the home page | ||
| HOME_PAGE=$(curl -s http://localhost:8000) | ||
| assert_contains "$HOME_PAGE" "OpenML is an open platform for sharing datasets" | ||
|
|
||
| DATASET_URL=http://localhost:8000/minio/datasets/0000/0020/dataset_37_diabetes.arff | ||
| DESCRIPTION_URL=http://localhost:8000/api/v1/json/data/20 | ||
|
|
||
| # The JSON response may contain escaped slashes (e.g. http:\/\/), so strip them | ||
| DESCRIPTION=$(curl -s "$DESCRIPTION_URL" | sed 's/\\//g') | ||
| assert_contains "$DESCRIPTION" "diabetes" | ||
|
|
||
| wget "$DATASET_URL" -O dataset.arff | ||
| assert_contains "$(cat dataset.arff)" "@data" | ||
| rm dataset.arff | ||
|
|
||
| if [ -d .venv ]; then | ||
| echo "Using existing virtual environment for dataset upload." | ||
| else | ||
| echo "Creating virtual environment for dataset upload." | ||
| python -m venv .venv | ||
| source .venv/bin/activate | ||
| python -m pip install uv | ||
| uv pip install openml numpy | ||
| fi | ||
|
|
||
| echo "Attempting dataset upload" | ||
|
|
||
| DATA_ID=$(.venv/bin/python -c " | ||
| import numpy as np | ||
| import openml | ||
| from openml.datasets import create_dataset | ||
|
|
||
| openml.config.server = 'http://localhost:8000/api/v1/xml' | ||
| openml.config.apikey = 'normaluser' | ||
|
|
||
| data = np.array([[1, 2, 3], [1.2, 2.5, 3.8], [2, 5, 8], [0, 1, 0]]).T | ||
| attributes = [('col_' + str(i), 'REAL') for i in range(data.shape[1])] | ||
|
|
||
| dataset = create_dataset( | ||
| name='test-data', | ||
| description='Synthetic dataset created from a NumPy array', | ||
| creator='OpenML tester', | ||
| contributor=None, | ||
| collection_date='01-01-2018', | ||
| language='English', | ||
| licence='MIT', | ||
| default_target_attribute='col_' + str(data.shape[1] - 1), | ||
| row_id_attribute=None, | ||
| ignore_attribute=None, | ||
| citation='None', | ||
| attributes=attributes, | ||
| data=data, | ||
| version_label='test', | ||
| original_data_url='http://openml.github.io/openml-python', | ||
| paper_url='http://openml.github.io/openml-python', | ||
| ) | ||
| dataset.publish() | ||
| print(dataset.id) | ||
| ") | ||
|
|
||
| # Make sure DATA_ID is an integer, and not some Python error output | ||
| if ! echo "$DATA_ID" | grep -q '^[0-9]\+$'; then | ||
| echo "FAIL: DATA_ID is not an integer: '$DATA_ID'" | ||
| exit 1 | ||
| fi | ||
|
|
||
| NEW_DATASET_URL=$(curl -s "http://localhost:8000/api/v1/json/data/${DATA_ID}" | jq -r ".data_set_description.url") | ||
| assert_url_exists "$NEW_DATASET_URL" | ||
| wget "$NEW_DATASET_URL" -O new_dataset.arff | ||
| assert_contains "$(cat new_dataset.arff)" "@data" | ||
| rm new_dataset.arff | ||
|
|
||
| # Wait for the dataset to become active, polling every 10 seconds for up to 2 minutes | ||
| WAITED=0 | ||
| while [ "$WAITED" -lt 120 ]; do | ||
| DATASET_STATUS=$(curl -s "http://localhost:8000/api/v1/json/data/${DATA_ID}") | ||
| if echo "$DATASET_STATUS" | grep -q "active"; then | ||
| echo "PASS: dataset $DATA_ID is active (after ${WAITED}s)" | ||
| break | ||
| fi | ||
| echo "Waiting for dataset $DATA_ID to become active... (${WAITED}s elapsed)" | ||
| sleep 10 | ||
| WAITED=$((WAITED + 10)) | ||
| done | ||
|
|
||
| if [ "$WAITED" -ge 120 ]; then | ||
| echo "FAIL: dataset $DATA_ID did not become active within 120s" | ||
| echo "Full output:" | ||
| echo "$DATASET_STATUS" | ||
| exit 1 | ||
| fi | ||
|
|
||
| echo "Checking parquet conversion" | ||
| PADDED_ID=$(printf "%04d" "$DATA_ID") | ||
| NEW_PARQUET_URL="http://localhost:8000/minio/datasets/0000/${PADDED_ID}/dataset_${DATA_ID}.pq" | ||
| wget "$NEW_PARQUET_URL" | ||
| DATA_SHAPE=$(.venv/bin/python -c "import pandas as pd; df = pd.read_parquet(\"dataset_${DATA_ID}.pq\"); print(df.shape)") | ||
| assert_contains "${DATA_SHAPE}" "(3, 4)" | ||
| rm "dataset_${DATA_ID}.pq" | ||
|
|
||
| CROISSANT_URL="http://localhost:8000/croissant/dataset/${DATA_ID}" | ||
| CROISSANT_NAME=$(curl -s "${CROISSANT_URL}" | jq -r ".name") | ||
| assert_contains "${CROISSANT_NAME}" "test-data" | ||
|
|
||
| ES_RESPONSE=$(curl -s "http://localhost:8000/es/data/_doc/${DATA_ID}") | ||
| assert_contains "${ES_RESPONSE}" "test-data" |
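
The polling loop in the script above could also be expressed in Python. This is a sketch under assumptions, not code from the PR: `wait_until_active` is a hypothetical helper, and the status fetch and the sleep function are passed in as callables so the logic can be exercised without a running server.

```python
import time
from typing import Callable


def wait_until_active(
    fetch_status: Callable[[], str],
    timeout_s: int = 120,
    interval_s: int = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until the response mentions "active"; return the seconds waited.

    Mirrors the shell loop: check first, then sleep, and give up once
    `timeout_s` seconds of waiting have accumulated.
    """
    waited = 0
    while waited < timeout_s:
        # Same substring check as `grep -q "active"` in the shell script.
        if "active" in fetch_status():
            return waited
        sleep(interval_s)
        waited += interval_s
    raise TimeoutError(f"dataset did not become active within {timeout_s}s")
```

In practice `fetch_status` could wrap an HTTP GET against the same `/api/v1/json/data/<id>` endpoint the script queries with `curl`. Raising `TimeoutError` replaces the separate `-ge 120` check that the shell version needs after the loop.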
These removed updates are now embedded in the state of the database on the new image