Machine learning algorithms have recently been found to be somewhat prejudiced. This might sound strange, but after a little thought it makes a lot of sense. A machine learning algorithm used to decide whether a person is eligible, whether they are a risk, or to make some other decision (a classification problem), say for a mortgage or for insurance, requires historical data to learn a decision model. The algorithm is trained on past data to find the model that makes decisions most similar to those made in the past; do you see the problem already? In the past, decisions have been made based on race, gender and other features in ways that today are unethical, if not illegal [The racist housing policy that made your neighbourhood]. This means that algorithms are using yesterday's (sometimes unethical) decisions to learn how best to decide for today and tomorrow.

This has caused a bit of a stir in government, with the White House publishing a report on ethics in Big Data to address the issue [White House techies explore the intersection of big data and ethics]. Some have suggested simply not using gender or race directly in the algorithms, and can't see the bigger problem. The problem is much more complicated than that, because there are many dependencies within the data through which discrimination on race, gender and other sensitive features can happen indirectly. The redlining example above (where African-American neighbourhoods in the US were literally outlined in red on maps used for insurance decisions) shows how: a decision made on address becomes an indirect decision based on race. This paper addresses decision making on these sensitive features, taking indirect relations into account and adjusting the amount of noise added to each feature based on how strongly it is related to the sensitive features.
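To make the idea of dampening proxy features a little more concrete, here is a minimal sketch of correlation-scaled noise. This is my own illustration under simple assumptions (Gaussian noise, Pearson correlation, the made-up function name `add_correlation_scaled_noise`), not the paper's actual algorithm: each non-sensitive feature is perturbed with noise whose magnitude grows with that feature's correlation to the sensitive attribute, so strong proxies like address carry less usable signal.

```python
# Illustrative sketch only -- not the paper's method.
# Each feature gets Gaussian noise scaled by its (absolute) Pearson
# correlation with the sensitive attribute, so strongly correlated
# "proxy" features are blurred the most.
import numpy as np

def add_correlation_scaled_noise(X, sensitive, noise_scale=1.0, rng=None):
    """X: (n_samples, n_features) array of non-sensitive features.
    sensitive: (n_samples,) numeric encoding of the sensitive attribute.
    Returns a copy of X with per-feature noise proportional to
    |corr(feature, sensitive)| times the feature's standard deviation."""
    rng = np.random.default_rng() if rng is None else rng
    X_noisy = X.astype(float).copy()
    for j in range(X.shape[1]):
        corr = np.corrcoef(X[:, j], sensitive)[0, 1]
        strength = 0.0 if np.isnan(corr) else abs(corr)
        std = noise_scale * strength * X[:, j].std()
        X_noisy[:, j] += rng.normal(0.0, std, size=X.shape[0])
    return X_noisy
```

A feature that is uncorrelated with the sensitive attribute passes through almost untouched, while a near-perfect proxy is heavily noised before the classifier ever sees it; the paper's contribution is a more principled version of this trade-off that accounts for indirect relations.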
For further reading on this interesting topic: