The ethical questions surrounding Natural Language Processing (NLP) concern how NLP systems are used, whether their output is perceived as human rather than machine-generated, and who has access to them. Because NLP depends on large amounts of publicly available text and massive computational power, it raises concerns about privacy, consent, and sustainability. Large public data sets are human-generated and therefore carry human biases, which can surface in the predictions of NLP models and skew how representative they are. Finally, NLP prioritizes widely spoken languages, above all English, which can widen the global digital divide. Ensuring the ethical development of NLP requires human intervention: asking who will benefit from a system and who might be harmed, examining whether the raw data is representative or reinforces bias, and keeping NLP model training objective, among other strategies.
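To make the data-bias point concrete, one common diagnostic compares how closely word embeddings associate occupation words with gendered words, in the spirit of the Word Embedding Association Test (WEAT). The sketch below is purely illustrative: the 3-dimensional "embeddings" are invented toy values, not vectors from any real model.

```python
# Illustrative sketch of measuring association bias in word embeddings,
# in the spirit of WEAT. The vectors below are toy values invented for
# illustration only; real embeddings would come from a trained model.
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-d "embeddings" (toy values for illustration only).
emb = {
    "doctor": [0.9, 0.1, 0.3],
    "nurse":  [0.2, 0.9, 0.3],
    "he":     [1.0, 0.0, 0.2],
    "she":    [0.0, 1.0, 0.2],
}

def association(word, attr_a, attr_b):
    """How much closer `word` sits to attr_a than to attr_b."""
    return cosine(emb[word], emb[attr_a]) - cosine(emb[word], emb[attr_b])

# A positive score means the word leans toward "he"; negative, toward "she".
# If "doctor" leans "he" and "nurse" leans "she", the embedding has absorbed
# a gender-occupation bias from the human-generated text it was trained on.
print(association("doctor", "he", "she"))
print(association("nurse", "he", "she"))
```

In real audits the same comparison is run over full word lists with permutation tests for significance; the toy version only shows the mechanism by which bias in the training text resurfaces as skewed model predictions.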
Dan Goldwasser is an Associate Professor in the Department of Computer Science at Purdue University. He is broadly interested in connecting natural language with real-world scenarios and using those scenarios to guide natural language understanding.
Leading Ethically in the Age of AI and Big Data is the result of a grant to Purdue University’s College of Liberal Arts from the Lilly Endowment, Inc. Learn more about the project at https://bit.ly/LeadingEthically. A new Purdue Bachelor of Arts major in Artificial Intelligence will teach students to ask fundamental questions about intelligence and the ethical principles guiding the development of AI. Learn more at https://bit.ly/PurdueAI.