Created by Eduardo Davalos, Yike Zhang, Namrata Srivastava, Yashvitha Thatigotla, Jorge A. Salas, Sara McFadden, Sun-Joo Cho, Amanda Goodwin, Ashwin TS, and Gautam Biswas from Vanderbilt University, Trinity University, and St. Mary's University.
As detailed in our paper, the Python version of WebEyeTrack performs a two-step training process: (1) autoencoder training and (2) MAML. The first step aims to learn a gaze-aware feature extractor whose gaze embedding translates effectively to point-of-gaze (PoG) estimation. The second step performs Model-Agnostic Meta-Learning (MAML) to turn the gaze estimation model into a meta-learner that adapts more readily to new, unseen persons.
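To give a feel for what the meta-learning step is after, here is a minimal toy sketch. It is not the repository's training code: it uses a first-order approximation of MAML (the outer update ignores second-order terms), a one-parameter linear model instead of a gaze network, and made-up "tasks" (slopes) that stand in for different persons. All function names and hyperparameters are hypothetical.

```python
# Toy first-order MAML on 1-D linear regression (y = slope * x).
# Each task stands in for one person; every number here is illustrative.

def grad(w, xs, ys):
    """Gradient of the mean squared error of y_hat = w * x with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

def mse(w, xs, ys):
    """Mean squared error of y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def adapt(w, xs, ys, inner_lr=0.1):
    """Inner loop: a single gradient step on a task's support set."""
    return w - inner_lr * grad(w, xs, ys)

def meta_train(task_slopes, steps=100, inner_lr=0.1, outer_lr=0.01):
    """Outer loop: move w so that one inner step works well on every task."""
    support_x, query_x = [1.0, 2.0], [3.0, 4.0]
    w = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for slope in task_slopes:
            support_y = [slope * x for x in support_x]
            query_y = [slope * x for x in query_x]
            w_task = adapt(w, support_x, support_y, inner_lr)
            # First-order approximation: query gradient taken at the adapted weight
            meta_grad += grad(w_task, query_x, query_y)
        w -= outer_lr * meta_grad / len(task_slopes)
    return w

w_meta = meta_train([1.0, 2.0, 3.0])

# Adapt to an unseen task (slope 2.5) with one gradient step
new_x = [1.0, 2.0]
new_y = [2.5 * x for x in new_x]
w_new = adapt(w_meta, new_x, new_y)
print("error before adaptation:", mse(w_meta, new_x, new_y))
print("error after adaptation: ", mse(w_new, new_x, new_y))
```

The point of the sketch is the structure: an inner step that adapts to one task from a few samples, and an outer step that picks an initialization from which that adaptation works well across tasks.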
For this repo, make sure to use TensorFlow/Keras 2.x and not the latest TensorFlow/Keras 3.x. Keras 3 isn't fully supported in TensorFlow.js, which causes issues when using the LayersModel required for on-device model adaptation.
Tested setup:
- Ubuntu 24.04
- Python 3.10
- Tensorflow 2.11.1
- TensorflowJS 3.21.0
Aside from this warning, run the following commands to install the webeyetrack package:
git clone https://github.com/redforestai/webeyetrack
cd webeyetrack/python
pip install .
The demo is located within the demo directory. It uses OpenCV to visualize the webcam and a 1:1 display of the screen location in an image. Make sure to install the webeyetrack package first. To run the demo, run the following commands:
cd demo
pip install -r requirements.txt
python main.py
To use the webeyetrack package, you need to create a WebEyeTrack tracker instance:
from webeyetrack import WebEyeTrack, WebEyeTrackConfig
wet = WebEyeTrack(
    WebEyeTrackConfig(
        # Add your screen's meta information here
        # You can use ``screeninfo`` (Windows/Linux)
        # or ``Quartz`` for MacOS
        screen_px_dimensions=(1920, 1080),
        screen_cm_dimensions=(32, 18),
    )
)
Below is a simple example of reading from the webcam via OpenCV and processing the frame.
import cv2
from webeyetrack import TrackingStatus
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if ret:
        # Process the frame with WebEyeTrack
        status, gaze_result, detection = wet.process_frame(frame)
        if status == TrackingStatus.SUCCESS:
            # If successful, perform your logic here
            print(gaze_result.norm_pog)
To perform new-person adaptation, use one of the adapt methods, such as adapt_from_gaze_results. Simply provide a list of prior gaze_results with their ground-truth normalized PoG.
gaze_results = [...] # list of prior gaze results
norm_pogs = [...] # list of matching ground-truth PoG
wet.adapt_from_gaze_results(gaze_results, norm_pogs)
The WebEyeTrack object returns a GazeResults object along with a status regarding the tracking of the person's face. Below is a breakdown of the attributes of GazeResults:
| Field Name | Type | Shape / Values | Description |
|---|---|---|---|
| facial_landmarks | np.ndarray | [N, 5] | Detected facial landmarks |
| face_rt | np.ndarray | [4, 4] | Face rotation-translation matrix |
| face_blendshapes | np.ndarray | [N, 1] | Blendshape coefficients for facial expression |
| eye_patch | np.ndarray | [H, W, 3] | RGB image of the eye region |
| head_vector | np.ndarray | [3,] | Head direction vector in camera coordinates |
| face_origin_3d | np.ndarray | [3,] (X, Y, Z) | 3D coordinates of the face origin |
| metric_face | np.ndarray | [N, 486] | Reconstructed 3D face mesh |
| metric_transform | np.ndarray | [4, 4] | Transformation matrix applied to metric face |
| gaze_state | Literal | 'open' or 'closed' | Blink state of the eye |
| norm_pog | np.ndarray | [2,] (X, Y) | Normalized point-of-gaze on the screen |
| durations | dict[str, float] | N/A | Metadata timing for different processing stages |

The accompanying tracking status (TrackingStatus) takes one of the following values:

| Value | Meaning |
|---|---|
| 0 | FAILED |
| 1 | SUCCESS |
The normalized PoG lies in the range [-0.5, 0.5] along both axes, with the origin (0, 0) at the center of the screen. The positive Y axis points downward and the positive X axis points to the right.
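As an illustration of this convention, the sketch below converts between a normalized PoG and pixel coordinates. It assumes pixel coordinates with the origin at the top-left corner of the screen and Y increasing downward; the two helper names are hypothetical and are not part of the webeyetrack API.

```python
def norm_pog_to_pixels(norm_pog, screen_px_dimensions):
    """Map a normalized PoG in [-0.5, 0.5] to pixels (assumed origin: top-left)."""
    x, y = norm_pog
    width, height = screen_px_dimensions
    return ((x + 0.5) * width, (y + 0.5) * height)

def pixels_to_norm_pog(pixel_xy, screen_px_dimensions):
    """Inverse mapping: pixels (assumed origin: top-left) to normalized PoG."""
    px, py = pixel_xy
    width, height = screen_px_dimensions
    return (px / width - 0.5, py / height - 0.5)

screen = (1920, 1080)
print(norm_pog_to_pixels((0.0, 0.0), screen))   # center of the screen
print(pixels_to_norm_pog((480, 270), screen))   # a point in the upper-left quadrant
```

The inverse mapping can be handy when building ground-truth normalized PoG values for adaptation from known on-screen target positions.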
For training, we downloaded the original datasets provided by MPIIFaceGaze, EyeDiap, GazeCapture, and Eye of the Typer (WebGazer) and performed our own preprocessing so that the data normalization can be matched in the browser. Functions such as cv2.solvePnP are challenging to port to the browser and should therefore be avoided.
Before downloading the datasets, make sure you have more than 1.25 TB of free disk space to account for all of them. Keep in mind that you will be downloading zip archives, and unzipping them roughly doubles the space they occupy.
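A quick way to check the available space before starting is sketched below using only the Python standard library. The helper name and the choice of decimal terabytes are assumptions; point `path` at the drive where the data directory will live.

```python
import shutil

# 1.25 TB expressed in bytes, assuming decimal terabytes (10**12 bytes)
REQUIRED_BYTES = int(1.25 * 1000**4)

def has_enough_space(path=".", required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem containing `path` has enough free space."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_bytes

print("Enough space for all datasets:", has_enough_space("."))
```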
Download the following original datasets using the links below:
- GazeCapture: ~200 GB
- MPIIFaceGaze: ~6 GB
- EyeDiap:
- Eye of the Typer:
For GazeCapture, MPIIFaceGaze, and EyeDiap, place these datasets within the data directory at the root of the GitHub repository.
Eye of the Typer is more challenging to preprocess and must be placed within the WebGazer repository to prevent failing relative imports. Make sure to git clone the WebGazer repository and place the Eye of the Typer dataset within its www/data/src directory.
Similar to other gaze estimation methods, we perform a data normalization step to isolate key features (eye patch, head vector, and face position), which makes model training faster and more efficient. However, many prior preprocessing steps were either too slow or dependent on functions that aren't easily portable to JavaScript. We therefore modified the data normalization from Park et al. 2019 to make it faster and suitable for real-time use.
To perform the preprocessing for GazeCapture, MPIIFaceGaze, and EyeDiap, run the following commands:
cd python/scripts/preprocessing
python preprocess.py --dataset MPIIFaceGaze
python output_space.py --dataset MPIIFaceGaze
python preprocess.py --dataset GazeCapture
python output_space.py --dataset GazeCapture
python preprocess.py --dataset EyeDiap
python output_space.py --dataset EyeDiap
Beware that these preprocessing routines can take hours and will consume additional storage. The resulting preprocessed datasets will be stored in the data/generated directory.
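Since the same two scripts run for every dataset, the commands above can also be driven from a small Python helper. This is only a convenience sketch: it assumes it is executed from python/scripts/preprocessing, and `build_commands` and `run_all` are hypothetical names, not part of the repository.

```python
import subprocess
import sys

DATASETS = ["MPIIFaceGaze", "GazeCapture", "EyeDiap"]

def build_commands(datasets=DATASETS):
    """Build the preprocess + output_space command pair for each dataset."""
    commands = []
    for name in datasets:
        commands.append([sys.executable, "preprocess.py", "--dataset", name])
        commands.append([sys.executable, "output_space.py", "--dataset", name])
    return commands

def run_all(commands):
    """Run the commands in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

# To launch the full preprocessing, call: run_all(build_commands())
```

Using `check=True` stops the batch as soon as one script fails, so a broken dataset does not go unnoticed after hours of processing.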
For Eye of the Typer dataset preprocessing, follow the instructions provided on the dataset webpage.
For training, we use configuration files to specify each run's parameters and to keep a record of the configuration of prior runs, since the config is stored with the logs. The training losses, metrics, and resulting trained models are stored in TensorBoard format within the logs directory inside ml_routines. Below is how to run the autoencoder training for all image datasets:
cd python/scripts/ml_routines
python train_autoencoder.py --config configs/train/autoencoder_mpiifacegaze_config.yml
python train_autoencoder.py --config configs/train/autoencoder_gazecapture_config.yml
python train_autoencoder.py --config configs/train/autoencoder_eyediap_config.yml
For MAML training, make sure to change the weight_fp parameter to point to your autoencoder checkpoint. To make this work, place the generated log directory inside the saved_models directory within ml_routines so that the prior checkpoint can be loaded.
cd python/scripts/ml_routines
python train_maml.py --config configs/train/maml_mpiifacegaze_config.yml
python train_maml.py --config configs/train/maml_gazecapture_config.yml
python train_maml.py --config configs/train/maml_eyediap_config.yml
To observe the training logs, run TensorBoard with the following command:
cd python/scripts/ml_routines
tensorboard --logdir logs
Similar to training, the evaluation uses configuration files to help explore parameters. Make sure to move the trained model directories inside the saved_model directory and modify weight_fp in the config files so that the trained model checkpoint loads properly for evaluation. To run the evaluations, use the following commands:
# Image Datasets
cd python/scripts/ml_routines
python eval_maml.py --config configs/eval/maml_mpiifacegaze_config.yml
python eval_maml.py --config configs/eval/maml_gazecapture_config.yml
python eval_maml.py --config configs/eval/maml_eyediap_config.yml
The research reported here was supported by the Institute of Education Sciences, U.S. Department of Education, through Grants R305A150199 and R305A210347 to Vanderbilt University. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.
If you use this work in your research, please cite us using the following:
@misc{davalos2025webeyetrack,
  title={WEBEYETRACK: Scalable Eye-Tracking for the Browser via On-Device Few-Shot Personalization},
  author={Eduardo Davalos and Yike Zhang and Namrata Srivastava and Yashvitha Thatigotla and Jorge A. Salas and Sara McFadden and Sun-Joo Cho and Amanda Goodwin and Ashwin TS and Gautam Biswas},
  year={2025},
  eprint={2508.19544},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.19544}
}
WebEyeTrack is open-sourced under the MIT License, which permits personal, academic, and commercial use with proper attribution. Feel free to use, modify, and distribute the project.