
Ethical & Legal AI Data Collection

Cem Dilmegani
updated on Aug 29, 2025

Disruptive technologies, such as AI, ML, the Internet of Things (IoT), and computer vision, require various types of data to operate. This data often includes biometric data, such as facial images and voice recordings. Collecting and managing such data requires multiple ethical and legal considerations, which, if disregarded, can lead to expensive lawsuits and significant reputational damage.

This article covers the data collection ethics and legal practices that organizations must consider when sourcing and gathering data to develop and deploy AI/ML solutions, backed by real-world case studies and current regulatory requirements.

How to achieve data collection ethics? (Best practices)

Extensive research 1 has been conducted on data collection ethics and how to achieve it; however, there is no single formula that guarantees ethical data collection.

Ethics is more of a process and a culture that needs to be adopted by all contributors (data collectors, developers, decision-makers, sales, marketing, executives, etc.) in the development and implementation of an AI/ML solution. 

1. Ethics training

Providing sufficient training about data collection ethics can be beneficial in promoting and adopting the culture. A best practice to ensure the instructions are followed is to use an ethics checklist that staff should tick off whenever they collect data. 
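
As a minimal sketch of what such a checklist could look like in practice (the item names below are illustrative assumptions, not taken from any standard framework):

```python
# Hypothetical ethics checklist; the item names are illustrative, not a standard.
from dataclasses import dataclass, fields


@dataclass
class EthicsChecklist:
    consent_obtained: bool = False
    purpose_explained: bool = False
    data_minimized: bool = False           # only data that is actually needed is collected
    retention_period_defined: bool = False
    risk_assessment_done: bool = False

    def incomplete_items(self) -> list[str]:
        """Return the names of any items that have not been ticked off yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


checklist = EthicsChecklist(consent_obtained=True, purpose_explained=True)
missing = checklist.incomplete_items()
if missing:
    print(f"Collection should not proceed; unchecked items: {missing}")
```

The point of encoding the checklist is that a collection job can be blocked automatically until every item has been confirmed, rather than relying on memory.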

Modern ethics training should also include:

  • Understanding of current AI-specific regulations like the EU AI Act
  • Recognition of algorithmic bias and fairness issues
  • Knowledge of emerging privacy-preserving technologies like federated learning
  • Regular updates on evolving legal requirements and case law

You can also check our article on data collection services.

2. Consent

Obtaining consent is one of the most critical parts of data collection ethics. It is part of the agreement between the data owner and the collector, and it should happen before any data is collected. For instance, if a smart home device gathers voice data from its user, a notification should be displayed during app setup, giving the user the option to provide consent.

However, consent must be:

  • Freely given and not bundled with service access
  • Specific to each type of data processing
  • Informed with clear explanations of AI training purposes
  • Withdrawable at any time without penalty
  • Documented and auditable for regulatory compliance
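
As a minimal sketch of how these properties could be captured in practice (the record structure and field names below are hypothetical, not taken from any specific product or regulation):

```python
# Hypothetical consent record; field names and structure are illustrative only.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConsentRecord:
    user_id: str
    purpose: str               # specific processing purpose, e.g. "voice model training"
    explanation_shown: str     # the plain-language text the user actually saw
    granted_at: datetime
    withdrawn_at: Optional[datetime] = None

    def withdraw(self) -> None:
        """Record withdrawal instead of deleting the record, so the history stays auditable."""
        self.withdrawn_at = datetime.now(timezone.utc)

    @property
    def active(self) -> bool:
        return self.withdrawn_at is None


# One record per purpose: consenting to analytics does not imply consenting to AI training.
consent = ConsentRecord(
    user_id="user-123",
    purpose="voice model training",
    explanation_shown="Your voice clips may be used to improve speech recognition.",
    granted_at=datetime.now(timezone.utc),
)
consent.withdraw()
print(consent.active)  # False
```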

3. Clarity and understanding

This means that when collectors require user consent, their request should be stated clearly in easily understandable language. Data collectors should ensure that users fully understand what they are permitting.

Best practices include:

  • Using plain language instead of legal jargon
  • Providing examples of how AI models will use the data
  • Offering multi-language support for global audiences
  • Testing comprehension through user studies

4. Trust and consistency

This means that ethical and security practices during data collection should be applied consistently to build trust with data providers. For instance, if there are 500 data providers, all 500 should be subject to the same ethical considerations.

Modern trust-building requires:

  • Regular third-party privacy audits
  • Transparent reporting on data usage and sharing
  • Consistent global privacy standards, not just minimum compliance
  • Clear data governance policies accessible to users

5. Awareness and transparency

The data collection process should be transparent and open. The data provider should be aware of what data is being collected, who will have access to it, and how it will be utilized. 

Additionally, data providers should have control over how their data is used. For instance, if a data provider wants their data to no longer be used or shared in the future, they should be able to opt out easily (a minimal opt-out sketch follows the list below).

Enhanced transparency now includes:

  • AI model cards explaining training data sources
  • Regular transparency reports showing data usage statistics
  • Precise opt-out mechanisms that actually work
  • Notification of any data breaches within 72 hours (GDPR requirement)
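
The opt-out and breach-notification points above could translate into logic like the following rough sketch (the in-memory dataset registry and function names are assumptions made for illustration, not a real system):

```python
# Rough sketch only; the in-memory dataset registry stands in for real storage systems.
from datetime import datetime, timedelta, timezone

GDPR_BREACH_NOTIFICATION_WINDOW = timedelta(hours=72)  # GDPR Art. 33 notification window


def breach_notification_deadline(detected_at: datetime) -> datetime:
    """Latest time the supervisory authority should be notified after a breach is detected."""
    return detected_at + GDPR_BREACH_NOTIFICATION_WINDOW


def handle_opt_out(user_id: str, datasets: dict[str, set[str]]) -> None:
    """Remove the user from every downstream dataset so the opt-out actually takes effect."""
    for name, users in datasets.items():
        if user_id in users:
            users.discard(user_id)
            print(f"{user_id} removed from {name}")


datasets = {
    "training_set": {"user-123", "user-456"},
    "analytics": {"user-123"},
}
handle_opt_out("user-123", datasets)
print(breach_notification_deadline(datetime.now(timezone.utc)))
```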

6. Risk consideration

Another critical point to consider is that the risk of future problems cannot be eliminated. Therefore, the data collector must assess the risk of such unforeseen events and prepare a mitigation plan. Additionally, the data collector should communicate this risk to the data provider.

Modern risk assessment should include:

  • Algorithmic bias testing and mitigation strategies (a minimal check is sketched after this list)
  • Data breach impact assessments
  • Cross-border data transfer risks
  • Evolving regulatory compliance requirements
  • Potential for data re-identification in AI systems
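
As one small illustration of the bias-testing item above, a team might start with a simple demographic parity check. The outcome data, group labels, and threshold below are invented for the example; a real audit needs domain and legal review:

```python
# Illustrative bias check: demographic parity difference between two groups.
# The outcome data and the 0.2 threshold are made up for this example.
def positive_rate(outcomes: list[int]) -> float:
    return sum(outcomes) / len(outcomes)


def demographic_parity_difference(group_a: list[int], group_b: list[int]) -> float:
    """Absolute gap between the groups' positive-outcome rates (0 means parity)."""
    return abs(positive_rate(group_a) - positive_rate(group_b))


# 1 = the model approved the claim/application, 0 = it denied it
group_a = [1, 1, 0, 1, 1, 0, 1, 1]   # e.g. majority group
group_b = [1, 0, 0, 0, 1, 0, 0, 1]   # e.g. protected group

gap = demographic_parity_difference(group_a, group_b)
print(f"parity gap: {gap:.2f}")
if gap > 0.2:
    print("Flag for review: a mitigation plan is needed before deployment")
```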

Major data collection lawsuits: Real cases with financial impact

1. Meta (Facebook) Cambridge Analytica – $725 Million Settlement

One of the largest data privacy settlements in history occurred when Meta, Facebook’s parent company, paid $725 million to settle a class-action lawsuit over the Cambridge Analytica scandal. The lawsuit claimed the social media giant gave third parties access to user data without their consent, and the settlement represents the “largest recovery ever achieved in a data privacy class action and the most Facebook has ever paid.”2

The case involved:

  • 87 million users’ data improperly shared without consent
  • Political profiling and targeted manipulation
  • Data sold to third parties for AI-powered analytics
  • Additional penalties: $5 billion FTC fine and $50 million Australian settlement

2. Clearview AI Biometric Privacy Violations – $51.75 Million Settlement

Clearview AI, which scraped billions of photos from social media to build facial recognition databases, settled for $51.75 million in 2025. The settlement resolved claims that the company’s automatic collection, storage, and use of biometric data violated various privacy laws, including Illinois’ biometric privacy law.3

Key violations:

  • Scraped over 3 billion photos without consent
  • Sold access to law enforcement and private companies
  • Violated the Illinois Biometric Information Privacy Act (BIPA)
  • Failed to obtain proper consent for biometric data collection

3. Healthcare AI Claims Denial – Cigna, Humana, UnitedHealth (2025 Ongoing)

In early 2025, major health insurers, including Cigna, Humana, and UnitedHealth Group, were sued for allegedly using AI algorithms to wrongfully deny medical claims. One filing cited Cigna’s internal process, in which an algorithm reviewed and rejected over 300,000 claims in just two months.4

Allegations include:

  • AI systems denying claims without proper medical review
  • Lack of transparency in algorithmic decision-making
  • Potential violations of patient care standards
  • Discriminatory impact on vulnerable populations

4. AI Training Data Lawsuits – OpenAI, Microsoft, Google (2024-2025)

Multiple newspaper publishers and content creators have sued AI companies for using copyrighted content without permission to train large language models. On April 30, 2024, eight newspapers, including The New York Daily News, Chicago Tribune, Denver Post, Mercury News, and Orange County Register, filed a lawsuit against OpenAI and Microsoft, alleging that the companies had purloined millions of copyrighted news articles to train their AI.5

Current litigation involves:

  • Unauthorized use of copyrighted content for AI training
  • Claims of fair use vs. commercial exploitation
  • Demands for licensing fees and attribution
  • Potential damages in the billions

5. U.S. Immigration and Customs Enforcement (ICE) Facial Recognition

The Washington Post reported that U.S. Immigration and Customs Enforcement unconstitutionally collected facial image data to track the activities of immigrants, without proper legal authority or consent.6

Watch the video to see how JFK Airport only gathers facial images of foreigners:

Key issues:

  • Targeting specific ethnic groups
  • Lack of constitutional protections for non-citizens
  • No opt-out mechanisms
  • Potential for discriminatory enforcement

To learn more about facial recognition, check out this quick read.

6. Voice Data Collection by Smart Home Devices – Amazon Alexa

Similarly, brands that offer smart home devices have also been under scrutiny for unethically collecting voice (biometric) data from their users.

For instance, Amazon faced a lawsuit over Alexa collecting user voice data without consent. The practice was identified in a collaborative study by researchers from the University of Washington and three other institutions, which led to the lawsuit.7

Watch this video to see how smart home devices gather user data:

Latest regulations on data collection and protection

| Regulation | Jurisdiction | Enacted | Scope | Key Requirements |
|---|---|---|---|---|
| EU AI Act | EU | 2024 | High-risk AI systems | Mandatory risk assessments, transparency requirements, human oversight |
| Data Security Law & Cybersecurity Law | China | 2021/2017 | Data localization, critical infrastructure | Data localization, security assessments for “important data,” network operator obligations |
| PIPL | China | 2021 | Personal information of Chinese citizens | Explicit consent, DPIA for critical data, cross-border transfer approvals |
| CCPA & CPRA | California, USA | 2020 | Personal information of residents | Right to opt out of sale, deletion request handling, privacy notice updates |
| GDPR | EU | 2018 | All personal data | Consent, DPIA, breach notification within 72 hrs, data subject rights |
| UK Data Protection Act 2018 | UK | 2018 | Mirrors GDPR | Data protection principles, UK-specific derogations, ICO enforcement powers |
| GINA | USA | 2008 | Genetic data | Prohibits use by insurers/employers, requires written consent |
| BIPA | Illinois, USA | 2008 | Biometric data | Written consent, retention limits, private right of action |
| COPPA | USA | 1998 | Data of children under 13 | Parental consent, clear privacy policy, data minimization |

  • Europe’s General Data Protection Regulation (GDPR) 8 grants individuals the right to delete their data from the systems where it was uploaded and has resulted in fines exceeding €2 billion since 2018.
  • The EU AI Act (2024) introduces the world’s first comprehensive AI regulation, requiring risk assessments, transparency measures, and human oversight for high-risk AI systems.
  • The Children’s Online Privacy Protection Act (COPPA) 8 protects children’s data in the U.S. It includes the dos and don’ts of gathering and using children’s data, such as when to obtain consent from a guardian and where the data must not be used.
  • The Genetic Information Nondiscrimination Act (GINA) 9 in the U.S. protects people’s genetic data from being used by insurance companies, hospitals, and other organizations that might exploit it.
  • The Federal Trade Commission (FTC) Act 10 in the U.S. also protects consumer data, and the FTC has imposed billions of dollars in AI-related fines.
  • The Data Protection Act 2018 11 is the UK’s version of the GDPR.
  • The Illinois Biometric Information Privacy Act (BIPA) has become a model for biometric data protection, resulting in hundreds of millions of dollars in settlements, including the $51.75 million Clearview AI case.
  • 3 main laws regulate data governance in China:
    • The Data Security Law (DSL)12
    • Personal Information Protection Law (PIPL)13
    • China’s Cybersecurity Law14
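
As a rough, non-authoritative illustration of how the table above might feed into an internal compliance triage step (the mapping below is a simplification based only on the table, and the function is hypothetical, not legal advice):

```python
# Simplified mapping from data characteristics to the regulations listed above.
# Assumption-laden sketch based on the table; real triage needs legal review.
def applicable_regulations(data_type: str, jurisdictions: set[str]) -> set[str]:
    regs: set[str] = set()
    if "EU" in jurisdictions:
        regs.add("GDPR")
    if "UK" in jurisdictions:
        regs.add("UK Data Protection Act 2018")
    if "California" in jurisdictions:
        regs.add("CCPA/CPRA")
    if "China" in jurisdictions:
        regs.update({"PIPL", "Data Security Law", "Cybersecurity Law"})
    if data_type == "biometric" and "Illinois" in jurisdictions:
        regs.add("BIPA")
    if data_type == "genetic" and "USA" in jurisdictions:
        regs.add("GINA")
    if data_type == "children_under_13" and "USA" in jurisdictions:
        regs.add("COPPA")
    return regs


print(applicable_regulations("biometric", {"Illinois", "EU"}))
# e.g. {'BIPA', 'GDPR'}
```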


Reference Links

1
Ethical Machine Learning in Healthcare | Annual Reviews
2
CNBC – Meta Settlement
3
Regulatory Oversight – Clearview Settlement
4
Traverse Legal – AI Healthcare Litigation
5
TechTarget – AI Lawsuits Explained
6
FBI, ICE find state driver’s license photos are a gold mine for facial-recognition searches - The Washington Post
7
Lawsuit alleges Amazon uses Alexa interactions for ad targeting without users' knowledge or consent – GeekWire
8
General Data Protection Regulation - Wikipedia
Children’s Online Privacy Protection Act - Wikipedia
9
Genetic Information Nondiscrimination Act - Wikipedia
10
About the FTC | Federal Trade Commission
11
https://www.gov.uk/data-protection
12
Data Security Law of the People's Republic of China - Wikipedia
13
Personal Information Protection Law of the People's Republic of China - Wikipedia
14
Cybersecurity Law of the People's Republic of China - Wikipedia
