
dscc-admin-ch/lomas


Lomas: The Data Oases Hidden Behind the Mist.

Lomas is a platform for remote data science, enabling sensitive data to be queried remotely while staying protected by a layer of differential privacy.
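To make the idea of "a layer of differential privacy" concrete, here is a textbook sketch of the Laplace mechanism applied to a count query. It only illustrates the general principle and is not how Lomas computes its results. The function names `laplace_noise` and `dp_count` are hypothetical.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(values: list, epsilon: float, rng: random.Random) -> float:
    """Return an epsilon-differentially private count of `values`.

    A count has sensitivity 1 (adding or removing one record changes it by
    at most 1), so Laplace noise with scale 1 / epsilon gives epsilon-DP.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return len(values) + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # The analyst only ever sees the noisy value, never the raw records.
    print(dp_count(list(range(100)), epsilon=1.0, rng=rng))
```

A smaller epsilon means more noise and therefore stronger privacy. This trade-off is why the server tracks a per-user privacy budget.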

Technical Overview:

The lomas platform follows a classic server/client model. On the client side, the user prepares queries for statistical analyses, which are sent to the service's REST API via HTTP. The user never has direct access to the sensitive data. On the server side, the service is implemented as a micro-service architecture whose two central parts are the administration database and the client-facing HTTP server (which we call server for brevity) that implements the service logic. The server processes client requests and updates both its own state and the administrative data (user data, budgets, query archives, etc.) in the administration database.
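The server's job of "updating administrative data (user data, budgets, query archives)" can be sketched as a check-then-update step. This version stores records in a Python `shelve` file, because the Server section describes the administration database as a Python Shelf. The record layout and the names `init_user`, `charge_query` and `BudgetExceededError` are hypothetical, not Lomas's actual schema or API.

```python
import shelve
from datetime import datetime, timezone


class BudgetExceededError(Exception):
    """Raised when a query would exceed the user's allocated DP budget."""


def init_user(db_path: str, user: str, dataset: str, allocated_epsilon: float) -> None:
    """Create a (hypothetical) user record with access to one dataset."""
    with shelve.open(db_path) as db:
        db[user] = {
            "datasets": {dataset: {"allocated": allocated_epsilon, "spent": 0.0}},
            "archives": [],
        }


def charge_query(db_path: str, user: str, dataset: str, epsilon: float, query: str) -> float:
    """Check access and budget, record the spend, archive the query.

    Returns the remaining budget for this user and dataset.
    """
    with shelve.open(db_path) as db:
        record = db[user]  # shelve returns a copy, so it must be written back
        if dataset not in record["datasets"]:
            raise PermissionError(f"{user} has no access to {dataset}")
        budget = record["datasets"][dataset]
        if budget["spent"] + epsilon > budget["allocated"]:
            raise BudgetExceededError(
                f"{user}: requested {epsilon}, spent {budget['spent']}, "
                f"allocated {budget['allocated']} on {dataset}"
            )
        budget["spent"] += epsilon
        record["archives"].append(
            {
                "dataset": dataset,
                "query": query,
                "epsilon": epsilon,
                "timestamp": datetime.now(timezone.utc).isoformat(),
            }
        )
        db[user] = record  # persist the updated record to disk
        return budget["allocated"] - budget["spent"]
```

Rejecting the query before any result is computed ensures that a user can never spend more than the budget they were allocated.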

The service is not responsible for storing and managing private datasets; these are usually already stored on the provider's infrastructure.

Detailed description:

For a detailed description, please see the links below.

Client package lomas_client

The lomas_client library is a client for interacting with the Lomas server. It is available on PyPI. Researchers and data scientists who use the service to query the sensitive data only interact with the client, never with the server.

We strongly recommend using this client library to query and interact with the server. It takes care of all the necessary tasks, such as serialization, deserialization and REST API calls, and it ensures that the other required libraries are installed correctly. In short, it enables seamless interaction with the server.
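The kind of work the client library hides can be sketched with the standard library alone: the client serializes a query to JSON, wraps it in an HTTP POST, and deserializes the JSON reply. The endpoint path `/query`, the `X-User-Name` header and the payload keys are hypothetical placeholders, not the real Lomas REST API. Use lomas_client for actual requests.

```python
import json
import urllib.request


def build_query_request(base_url: str, user_name: str, payload: dict) -> urllib.request.Request:
    """Serialize a query payload to JSON and wrap it in an HTTP POST request.

    The "/query" path and "X-User-Name" header are hypothetical.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/query",
        data=body,
        headers={"Content-Type": "application/json", "X-User-Name": user_name},
        method="POST",
    )


def parse_response(raw: bytes) -> dict:
    """Deserialize the JSON body returned by the server."""
    return json.loads(raw.decode("utf-8"))
```

To send the request against a running server, call `urllib.request.urlopen(req)` and then pass the response body to `parse_response`.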

For additional information about the client, please see the client's README.md, and for additional examples, see Demo_Client_Notebook.ipynb.

Server

The server is implemented in a micro-service architecture and is thus split into multiple parts:

  • The client-facing HTTP server (which we call server for brevity) handles incoming requests and manages the administration database (Python Shelf).
  • The administration database: as stated above, it is directly managed by the server and persisted on local disk (Python Shelf). It serves as a repository for users and for metadata about the datasets. User-related data include access permissions to specific datasets, allocated and used DP budgets, and query archives (past executed queries and their results). The user role (i.e. admin or standard user) is also stored in the database. Dataset-related data include dataset names, links to the credentials for accessing the sensitive datasets, and dataset metadata for DP-related operations.
  • The workers run user queries.
  • RabbitMQ acts as a queue between the server and the workers. It is also used to implement RPC calls from the workers to the server (e.g. admin database calls).
  • The admin dashboard provides a graphical interface for Lomas administrators to interact with the server. User creation, budget updates as well as dataset updates can all be executed through the dashboard.
  • Telemetry: all components send metrics and logs to an OpenTelemetry Collector. The Grafana dashboard can be used to visualize the collected data.
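The interplay between server, queue and workers, including the RPC calls from workers back to the server, can be sketched in a single process. In this sketch, standard-library `queue.Queue` objects and threads stand in for RabbitMQ. Each worker attaches a private reply queue to its request, much like the reply-to queue pattern in RabbitMQ's RPC tutorial. All names (`rpc_server`, `worker`, `run`, `get_remaining_budget`) are hypothetical.

```python
import queue
import threading

# Sentinel telling a consumer thread to stop.
STOP = None


def rpc_server(rpc_requests: queue.Queue, budgets: dict) -> None:
    """Answer RPC-style requests from workers (stands in for admin DB calls)."""
    while True:
        msg = rpc_requests.get()
        if msg is STOP:
            break
        method, user, reply_queue = msg
        if method == "get_remaining_budget":
            reply_queue.put(budgets.get(user, 0.0))
        else:
            reply_queue.put(None)  # unknown method


def worker(jobs: queue.Queue, rpc_requests: queue.Queue, results: queue.Queue) -> None:
    """Consume query jobs until the stop sentinel arrives."""
    while True:
        job = jobs.get()
        if job is STOP:
            break
        # Private reply queue, analogous to a RabbitMQ reply-to queue.
        reply_queue: queue.Queue = queue.Queue(maxsize=1)
        rpc_requests.put(("get_remaining_budget", job["user"], reply_queue))
        remaining = reply_queue.get()
        ok = remaining is not None and job["epsilon"] <= remaining
        results.put((job["id"], "accepted" if ok else "rejected"))


def run(job_list: list, budgets: dict, n_workers: int = 2) -> dict:
    """Enqueue jobs, let the workers process them, and collect the results."""
    jobs: queue.Queue = queue.Queue()
    rpc_requests: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()

    rpc_thread = threading.Thread(target=rpc_server, args=(rpc_requests, budgets))
    rpc_thread.start()
    workers = [
        threading.Thread(target=worker, args=(jobs, rpc_requests, results))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()

    for job in job_list:
        jobs.put(job)  # the "server" enqueues user queries
    for _ in workers:
        jobs.put(STOP)  # one stop sentinel per worker
    for w in workers:
        w.join()
    rpc_requests.put(STOP)
    rpc_thread.join()

    collected = {}
    while not results.empty():
        job_id, status = results.get()
        collected[job_id] = status
    return collected
```

With a real broker, the queues survive process restarts and the workers can run on separate machines. The message flow, however, is the same as in this sketch.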

Lomas is not responsible for storing and managing private datasets; these are usually already stored on the provider's infrastructure (the private database in the sketch above). We currently implement adapters for S3 storage, HTTP file downloads and local files.
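The adapter idea, one interface and several storage back ends selected from dataset metadata, can be sketched as follows. Only the local-file and HTTP adapters are shown, because an S3 adapter would need a third-party client such as boto3. The class names, the `read_bytes` method and the `storage` metadata key are hypothetical and not taken from the Lomas codebase.

```python
import urllib.request
from abc import ABC, abstractmethod
from pathlib import Path


class DataAdapter(ABC):
    """Common interface: return the raw bytes of a private dataset."""

    @abstractmethod
    def read_bytes(self) -> bytes: ...


class LocalFileAdapter(DataAdapter):
    """Read a dataset stored on the server's local disk."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)

    def read_bytes(self) -> bytes:
        return self.path.read_bytes()


class HttpAdapter(DataAdapter):
    """Download a dataset over HTTP(S)."""

    def __init__(self, url: str, timeout: float = 30.0) -> None:
        self.url = url
        self.timeout = timeout

    def read_bytes(self) -> bytes:
        with urllib.request.urlopen(self.url, timeout=self.timeout) as resp:
            return resp.read()


def make_adapter(metadata: dict) -> DataAdapter:
    """Pick an adapter from (hypothetical) dataset metadata."""
    kind = metadata["storage"]
    if kind == "LOCAL":
        return LocalFileAdapter(metadata["path"])
    if kind == "HTTP":
        return HttpAdapter(metadata["url"])
    raise ValueError(f"unsupported storage type: {kind}")
```

Because the rest of the server depends only on `DataAdapter`, you can add a new back end without touching the query-processing code.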

Deployment

We aim to make it easy to configure, deploy and test the platform on IT infrastructure commonly available to national statistical offices (NSOs) and other potential users. To this end, we provide two Helm charts for deploying the server components and a client development environment in a Kubernetes cluster.

For detailed information on deployment, please refer to our online documentation.

Disclaimer

Lomas is a Proof of Concept that is still under development.

Overall infrastructure security is not our current priority. While attention has been given to the 'logical' aspects within the server, many security aspects are not handled; for example, user authentication is not implemented. However, Lomas can be integrated into other secure infrastructures.

We welcome any feedback or suggestions for future improvements. External input is valuable as we continue to enhance the security and functionality of Lomas. Please open a bug report or issue here: https://github.com/dscc-admin-ch/lomas/issues#open.

History

The starting point of our platform was the code shared with us by Oblivious, who originally developed a client/server platform for the UN PET Lab Hackathon 2022.
