This is a placeholder with no clear deliverable :).
Mostly for open convos to get some thoughts on how helpful it would be.
I think reclist would benefit from the ability to compare models in the context of list output.
The reasoning is that reclist currently lets you compare models only at a "pointwise" level (i.e., if you want to use nDCG for 2 models, you compute it on each model independently and then compare the results), which gives only partial information.
For example, we don't know how different the recommendations themselves are when it comes to the content.
A naive approach, for example, is to compute the Jaccard similarity between the lists of the 2 models. Other metrics could be the weighted Kendall tau correlation or RBO, rank-biased overlap (https://github.com/changyaochen/rbo).
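To make the idea concrete, here is a rough sketch in plain Python of what a list-level comparison could look like. The function names `jaccard_similarity` and `truncated_rbo`, and the example item ids, are hypothetical and not part of reclist. The RBO variant is the simple truncated sum without the extrapolation term, so it is not the same computation as the linked `rbo` package, and both functions assume each list contains no duplicate items.

```python
from typing import Hashable, Sequence


def jaccard_similarity(list_a: Sequence[Hashable], list_b: Sequence[Hashable]) -> float:
    """Set overlap of two recommendation lists; ignores ranking order."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        return 1.0  # convention chosen here: two empty lists count as identical
    return len(set_a & set_b) / len(union)


def truncated_rbo(list_a: Sequence[Hashable], list_b: Sequence[Hashable], p: float = 0.9) -> float:
    """Rank-biased overlap summed up to the depth of the shorter list.

    Plain truncated sum, without extrapolation: two identical lists of
    depth k score 1 - p**k rather than 1. Smaller p puts more weight on
    agreement at the top of the lists.
    """
    depth = min(len(list_a), len(list_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(list_a[d - 1])
        seen_b.add(list_b[d - 1])
        # Fraction of the top-d items that the two lists share.
        agreement = len(seen_a & seen_b) / d
        score += (p ** (d - 1)) * agreement
    return (1 - p) * score


# Hypothetical top-4 recommendations from two models for the same user.
model_a = ["item_1", "item_2", "item_3", "item_4"]
model_b = ["item_2", "item_1", "item_5", "item_4"]

print(jaccard_similarity(model_a, model_b))  # 3 shared items out of 5 distinct -> 0.6
print(truncated_rbo(model_a, model_b))
```

Jaccard only says how much the two lists overlap as sets, while the rank-weighted score also drops when the same items appear at different positions, which is the part that pointwise metrics do not show.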
At a high level, this is to answer the question of how different these 2 models are, by looking at the content and ranking differences between them rather than comparing metrics computed on each model independently.
Let me know if people have any thoughts on whether that's an approach we should consider.
If so, I am happy to take a stab at it, but would welcome some small guidance so it follows whatever framework you would like.