Historical Genealogies of Data Colonialism: From Colonial Censuses to Digital AI Systems
- DOI
- 10.2991/978-2-38476-533-1_59
- Keywords
- Linguistic Imperialism; AI Ethics; Indigenisation
- Abstract
Artificial intelligence is widely celebrated as a transformative force in the twenty-first century, yet its foundations reveal profound continuities with older systems of domination. Far from representing a clean rupture with the past, AI reproduces logics of appropriation and erasure that can be traced back to colonial regimes of knowledge extraction. This paper interrogates these continuities through the lens of data colonialism, situating contemporary AI practices within a longer genealogy of exploitation. During the British colonial period, the census, ethnographic surveys, and cartographic projects served as technologies of governance, converting social and cultural complexity into rigid categories designed for control. These archives, far from neutral, functioned as instruments of epistemic violence, shaping political hierarchies and silencing indigenous knowledge systems. In the present, AI operates through parallel mechanisms: the harvesting of massive datasets from online platforms, social media, and user-generated content without consent; the unauthorised use of personal images and creative works for training models; and the commodification of cultural production for algorithmic reproduction. Such practices echo colonial disregard for agency, positioning human life and creativity as resources to be mined. The persistence of linguistic imperialism compounds these dynamics, with English entrenched as the dominant medium of AI. This linguistic asymmetry marginalises indigenous epistemologies, distorts representation in underrepresented languages, and privileges Eurocentric perspectives. AI’s dependence on frequently searched and widely circulated content further amplifies mainstream narratives, producing outputs that reinforce silences around marginalised communities. The paper argues that addressing these challenges requires the indigenisation of AI, embedding linguistic diversity, centering cultural autonomy, and enforcing consent-driven data practices. 
By recovering the historical roots of extraction and domination, the study demonstrates that decolonial perspectives are indispensable for reimagining AI ethics and ensuring that intelligent systems foster plural, inclusive, and socially just futures, rather than reproducing the epistemic violence of colonial archives.
- Copyright
- © 2025 The Author(s)
- Open Access
- Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
Cite this article
TY - CONF
AU - Susmita Banerjee
PY - 2025
DA - 2025/12/31
TI - Historical Genealogies of Data Colonialism: From Colonial Censuses to Digital AI Systems
BT - Proceedings of the International Conference on Smart Systems and Social Management (ICSSSM-2 2025)
PB - Atlantis Press
SP - 991
EP - 1004
SN - 2352-5398
UR - https://doi.org/10.2991/978-2-38476-533-1_59
DO - 10.2991/978-2-38476-533-1_59
ID - Banerjee2025
ER -