Data and Algorithmic Bias in the Web - Ricardo Baeza-Yates - WCCI 2016
The Web is the largest public big data repository that humankind has created. In this overwhelming data ocean we need to be aware of data quality and, in particular, of the biases that exist in this data, such as redundancy and spam. These biases affect the machine learning algorithms that we design to improve the user experience. The problem is further exacerbated by biases that the algorithms themselves add, especially in the context of recommendation systems. We give several examples and their relation to sparsity, novelty, and privacy, stressing the importance of the user context to avoid these biases.