Ethical & Compliant Web Data Benchmark in 2026

updated on Jan 27, 2026

As enterprises scale their web data operations, compliance, data, and risk executives increasingly evaluate the associated ethical, reputational, and legal risks.

We benchmarked 5 leading web data collection services across 3 dimensions and tested each service with more than 20 potentially unethical scenarios.

Our work helps you assess the ethical standing of your data collection practices and understand the potential consequences of unethical approaches. We also provide guidelines for ethical web data collection and assess web data collection services from an ethics and compliance perspective:

Assessment of web data collection services

We evaluated leading web data collection services (also called web data providers or web data infrastructure) using our ethical web data checklist. These scores represent maturity levels with 5 being the highest level:

Providers	Summary	Ethical use by customers	Ethical supply	External certification	Insurance coverage shared**
Bright Data	Level 5	Level 5	Level 5	Data security, PII processing. IP sources whitelisted. Ethical practices evaluated.	✅
Apify	Level 1	Level 1	Level 1	Data security	✅
Zyte	Level 1	Level 1	Level 1	Data security	✅
NetNut	Level 1	Level 1	Level 0	Data security	TBD
Nimble	Level 1	Level 1	Level 0	Data security	❌

* These are codes for vendor names. These vendors did not want to be referenced in this report and are listed at the bottom of the list until we resolve this issue.

** ✅ indicates that the company chose to share its insurance certificates with AIMultiple. ❌ indicates that the company decided not to share its insurance certificates with us, so we couldn’t validate its insurance coverage. Insurance coverage is the only category in which our evaluation relied on the participation of the web data collection companies.

Sorted by summary score.

Scoring model for ethical web data

Below, we outline how these scores are derived. You can also see the rationale for selecting these scoring dimensions.

In the first 2 categories, we identified 5 competencies, and companies received scores based on the number of competencies that they satisfied. Level 5 represents the highest maturity observed in the market, reflecting current best practices rather than perfection.

Capabilities for ethical use by customers

Effective processes for ethical use: We assess each provider’s ability to prevent unethical use of their residential proxy services through controlled testing scenarios. This capability is achieved if the provider blocks at least one of our requests.
Improved processes for ethical use: Similar to “effective processes for ethical use”. However, this capability denotes that the service provider blocked more than one of our attempts to use their services for unethical use cases.
Best practice processes for ethical use: Similar to “effective processes for ethical use”. However, this capability denotes that the service provider blocked most of our attempts to use their services for unethical use cases.
Abuse management foundation: The provider publishes an abuse management policy and a method to report abuse.
Responsive abuse management: We measured how companies responded to multiple abuse reports. Even if there was no hotline for reporting abuse, we used the emails listed by the company to reach their team. If we didn’t receive a response to our report within a week, we considered the company unresponsive.
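The five competencies above can be expressed as a simple count. The Python sketch below is illustrative only: the `EthicalUseObservations` fields and the reading of "most" as more than half of the attempted scenarios are assumptions, not part of the published methodology.

```python
from dataclasses import dataclass


@dataclass
class EthicalUseObservations:
    """What was observed for one provider (field names are illustrative)."""
    scenarios_tested: int         # potentially unethical scenarios attempted without KYC
    scenarios_blocked: int        # how many of those attempts the provider blocked
    publishes_abuse_policy: bool  # abuse policy plus a way to report abuse
    replied_within_week: bool     # an abuse report was answered within 7 days


def ethical_use_level(obs: EthicalUseObservations) -> int:
    """Return the number of satisfied competencies (0 to 5)."""
    competencies = [
        obs.scenarios_blocked >= 1,                        # effective processes
        obs.scenarios_blocked > 1,                         # improved processes
        obs.scenarios_blocked > obs.scenarios_tested / 2,  # best practice ("most", assumed > 50%)
        obs.publishes_abuse_policy,                        # abuse management foundation
        obs.replied_within_week,                           # responsive abuse management
    ]
    return sum(competencies)  # True counts as 1, False as 0


# Hypothetical provider: blocked 1 of 21 attempts, has a policy, replied quickly.
print(ethical_use_level(EthicalUseObservations(21, 1, True, True)))  # 3
```

Counting booleans keeps each competency independent, which mirrors how the level is described: a provider with strong abuse handling but weak blocking still earns partial credit.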

Capabilities for ethical supply

Ethical supply involves acquiring IP addresses in an ethical manner. Our market analysis identified the following levels of transparency regarding ethical IP supply:

Level 1: Published IP sourcing policy.
Level 2: Disclosed at least one source (e.g. a mobile app) that supplies IPs in an ethical manner. Disclosed sources should have a total of at least 10k reviews on third-party platforms, including the Google, Apple and Amazon app stores and Trustpilot.
Level 3: Same as Level 2 but with 100k reviews
Level 4: Same as Level 3 but with 1M reviews
Level 5: Same as Level 4 but with 10M reviews

Reviews are an indicator of the popularity of apps and are an important signal for this assessment. Web data collection services need to work with popular applications to be able to satisfy the IP needs of their customers.
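Because each level raises the review threshold tenfold, the level can be read off the total review count. The sketch below assumes the levels are cumulative (a Level 2 provider also has a published sourcing policy) and uses hypothetical app and platform names. It does not model the consent, value and privacy checks described next, which are assessed manually.

```python
# Review thresholds from the methodology: Level 2 starts at 10k reviews and
# each higher level needs 10x more. Checked from the highest level down.
REVIEW_THRESHOLDS = [(5, 10_000_000), (4, 1_000_000), (3, 100_000), (2, 10_000)]


def total_reviews(disclosed_apps: dict[str, dict[str, int]]) -> int:
    """Sum review counts over every disclosed app and every platform."""
    return sum(sum(platforms.values()) for platforms in disclosed_apps.values())


def supply_level(has_sourcing_policy: bool, disclosed_apps: dict[str, dict[str, int]]) -> int:
    """Map published sourcing info to a 0-5 ethical supply level (levels assumed cumulative)."""
    if not has_sourcing_policy:
        return 0
    reviews = total_reviews(disclosed_apps)
    for level, minimum in REVIEW_THRESHOLDS:
        if reviews >= minimum:
            return level
    return 1  # policy published, but disclosed sources fall short of 10k reviews


# Hypothetical disclosure: one app with reviews on two stores.
apps = {"example_app": {"Google Play Store": 80_000, "App Store": 30_000}}
print(supply_level(True, apps))  # 3
```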

For qualification, the disclosed apps should follow these best practices. We do not check every disclosed app; instead, we check a few randomly selected ones:

Informed consent:
- Users need to opt in before sharing their internet connection. The opt-in screen should outline:
  - The provider
  - The service
  - How their IP will be used
- Users should be able to access detailed info on
  - How their internet connection will be used
  - Privacy policy
Value: Users must receive some value from the app (e.g. payment, ability to skip ads or some other functionality)
Privacy: Limited and transparent user data collection.
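The per-app best practices above can be recorded as a checklist. This is a hypothetical Python sketch: the field names are ours and the example app is invented.

```python
from dataclasses import dataclass, fields


@dataclass
class AppChecklist:
    """Best practices checked for one randomly selected app (names are illustrative)."""
    opt_in_required: bool           # user opts in before sharing the connection
    opt_in_names_provider: bool
    opt_in_names_service: bool
    opt_in_explains_ip_use: bool
    usage_details_accessible: bool  # detailed info on how the connection is used
    privacy_policy_accessible: bool
    user_receives_value: bool       # e.g. payment, ad skipping, other functionality
    limited_transparent_data_collection: bool


def unmet_practices(check: AppChecklist) -> list[str]:
    """Return the names of the best practices the app does not satisfy."""
    return [f.name for f in fields(check) if not getattr(check, f.name)]


# Hypothetical app whose opt-in screen does not name the provider.
app = AppChecklist(True, False, True, True, True, True, True, True)
print(unmet_practices(app))  # ['opt_in_names_provider']
```

An app qualifies only when the returned list is empty, so a single missing element (here, the provider's name) is enough to flag it.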

External certification

We evaluated external certification based on whether companies acquired these certificates relevant to enterprise-grade security and compliance.

PII certification: Demonstrated capability to manage PII by acquiring ISO/IEC 27018 certification
Data security certification: Demonstrated data security practices by acquiring one of these certificates: SOC 2 or ISO/IEC 27001
IP source whitelisted: External certification providers like McAfee certify either:
- Specific 3rd party apps that supply IPs
- SDK that collects IPs from 3rd party apps
Ethical practices evaluated: An ISAE 3000 assurance project can be completed to evaluate internal compliance and ethics practices.

Insurance

We asked vendors to provide the following insurance documents:

Professional liability insurance certificate providing coverage for vendors’ liabilities in case of issues in the service
Cyber insurance certificate providing coverage for vendors’ liabilities in case of information security-related issues.

Summary score

This score is the sum of all scores divided by 3. The scores are:

0 to 5 for capabilities for ethical use by customers
0 to 5 for capabilities for ethical supply
0 to 3 for external certification
0 to 2 for insurances
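As a worked example, a provider with the maximum score in every dimension gets (5 + 5 + 3 + 2) / 3 = 5. The Python sketch below computes the raw score under that formula. How the result is rounded to the level shown in the table is not specified here, so the sketch returns the unrounded value.

```python
MAXIMUMS = {"ethical_use": 5, "ethical_supply": 5, "certification": 3, "insurance": 2}


def summary_score(**scores: int) -> float:
    """Sum of the four dimension scores divided by 3 (maximum 15 / 3 = 5)."""
    if set(scores) != set(MAXIMUMS):
        raise ValueError(f"expected scores for {sorted(MAXIMUMS)}")
    for name, value in scores.items():
        if not 0 <= value <= MAXIMUMS[name]:
            raise ValueError(f"{name} must be between 0 and {MAXIMUMS[name]}, got {value}")
    return sum(scores.values()) / 3


print(summary_score(ethical_use=5, ethical_supply=5, certification=3, insurance=2))  # 5.0
```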

Leading web data collection services

AIMultiple selected the 7 largest web data collection services in terms of employees on LinkedIn. We chose this metric since it is both public and should be correlated with the company’s revenues and enterprise-readiness. Better metrics such as revenues or the number of employees on payroll are not publicly available for these private companies.

All of the selected companies had more than 100 employees connected to their LinkedIn profile pages as of April 2025. Currently, 5 of the 7 selected companies are displayed on this page; the remaining 2 chose not to be included in the report.

Web data collection products in focus

These companies provide a range of products including proxies, data scraping APIs and datasets. While all products can be examined from an ethical perspective, we initially focused on the product that provides the highest level of flexibility and powers most other products: Residential proxies.

Web data collection products can be considered a hierarchy in which proxies form the core layer on which all other services are built. Proxies let machines access the internet through many different IP addresses, providing the large and diverse set of internet connections that data collection requires. Therefore, proxies are the most capable web data collection product: they can carry out functions that would not be possible with datasets or data scraping APIs.

Among proxies, residential proxies are the hardest for websites to identify as proxies. Other proxies, such as datacenter proxies, are easy to identify because their IP addresses belong to data center networks. Therefore, residential proxies power most other web data products like data scraping APIs.

Verify: Is your web data collection compliant & ethical?

Your business is most probably leveraging web data. However, the industry faces limited regulation, making it important to choose an ethical and compliant provider. To achieve that, we prepared a holistic framework to take into account different aspects of web data collection including ethical sourcing, ethical usage and external certification.

Web data is a common operational asset

As an enterprise, your business partially relies on web data because of its numerous use cases like:

Dynamic pricing for retail & e-commerce
Real-time alt data for investment funds
KYC process in commercial banking
AI model training or fine-tuning
AI inference or RAG
Market research

With AI, web data is now more important

Though web data collection is as old as the web, its importance increased drastically after the rise of generative AI models. Builders of these models, such as OpenAI and Anthropic, started out without any significant content partnerships and mainly used online data to build their initial models, which led to the rise of the trillion-dollar AI industry.

Limited regulatory oversight

Although AI regulation is in the spotlight, the data collection industry remains largely unregulated in most countries. Clearly illegal online activities are well defined. However, there are few regulatory requirements for industry players to proactively prevent users from misusing their services.

It is up to the platforms themselves to set best practices and compliance standards to ensure ethical data collection and proxy usage. Therefore, your choice of vendor matters more in data collection compared to heavily regulated industries like banking where every service provider is required to abide by numerous regulations.

Your suppliers’ ethical stance is part of your company’s reputation

Regardless of whether you collect or consume the data, you are responsible for its acquisition process.

Enterprises’ responsibilities for unlawful activities in their supply chain depend on the jurisdiction. For example, in Germany, enterprises are required to carry out Know Your Supplier (KYS) and risk management activities to identify and prevent harms caused by their supply chain. Even when companies are not legally responsible for harms caused by their supply chain, they can suffer reputational damage.

What is the cost of unethical & noncompliant data collection?

Reputational risk

If it becomes public that an enterprise is leveraging a web data collection service which engages in unethical behavior or actions that endanger its data security, this can lead to significant reputational damage such as lost business, customer churn, talent churn and loss of investor confidence.

Real-life examples of enterprise suppliers leading to reputational loss:

Nike has suffered reputational damage numerous times due to its suppliers’ unethical labor practices.¹
Many enterprises like EY lost their customers’ trust when they were affected by the MOVEit managed file transfer software breach.²

Legal risk

Reputational loss, especially when it leads to public outrage, is typically followed by lawsuits from the company’s customers or other stakeholders who have been harmed by the unethical practices.

Real-life example: Starbucks is one of the recent brands to be sued over sourcing from companies with unethical practices.³

Ethical web data checklist

Enterprise web data needs to satisfy 3 requirements to be ethical:

Ethical use by customers

As part of their Know Your Supplier processes, enterprises avoid using services that enable unethical activities. Using such services exposes businesses to reputational harm.

Real-world example: In cases where a provider was documented allowing its platform to be used for unethical activities, numerous enterprises distanced themselves from the provider until it improved its practices.⁴

How this relates to web data: Web data is collected via different IP addresses. These addresses can also be used for unlawful activities such as DDoS attacks that disrupt the delivery of digital services, unauthorized collection of non-public data, or ad fraud. Bad actors need IPs to power their actions, and web data infrastructure/proxy providers are the largest suppliers of IPs to retail users.

Ethical supply

Services used for ethical purposes can still involve unethical and harmful practices during their production. For example, brands like Nike and Nestle suffered reputational harm and faced lawsuits due to their contractors’ use of child labor.

How this relates to web data:

Businesses need access to a large number of diverse bandwidth sources for rapid, global data collection. This requires residential proxies: while collecting public data is legal under many conditions,⁵ websites can also choose to block some of their visitors. For example, they can block their competitors’ crawlers. In such cases, businesses need to rely on a large number of connections from retail users or other 3rd parties to collect web data.

Proxy providers collect millions of internet connections from various sources and provide them to businesses which use IP addresses to access these connections. Some of these IPs originate from residential users’ devices. Collecting these connections can be legal or unlawful:

Legal: Legally compliant practices involve obtaining informed user consent, providing compensation, and offering opt-out mechanisms in accordance with local regulations. The web data provider should
- Inform users about how their bandwidth would be used
- Get their consent digitally
- Compensate them in return
- Allow them to opt out at any time
Illegal: Bad actors can gain access to users’ devices and use their internet connection without permission or compensation. This can happen through malware apps, compromised devices, masked installations, automatic opt-in and other methods that can put the device owner at risk.

Businesses using illegally obtained proxies can inadvertently pay bad actors for unauthorized access to devices.

Real-life examples:

Routers and IoT devices have been compromised for botnet operations and sold as residential proxies.⁶ ⁷
Certain proxy providers promote their services in forums frequented by bad actors. These IPs are likely to be illegally obtained.⁸
VPN apps on Google Play Store have also been used to acquire residential IPs without user consent.⁹

Though these operations have been shut down, it is likely that bad actors are still accessing residential IPs without consent via botnets and compromised or malicious applications.

External certification

Enterprise buyers need secure, enterprise-ready solutions. We identified the ingredients for a mature web data organization which can be documented via external certification:

Data security

Lack of data security in a supplier’s systems can erode an enterprise’s competitive advantage or lead to data loss and system downtime. Loss of system functionality can erode trust and lead to the devaluation of an enterprise.

System intrusion

Data collection services are not as deeply integrated into an enterprise’s systems as core digital services (e.g. a system of record like CRM). Therefore, their security credentials are not reviewed as thoroughly as those of a core system. However, data security is critical for data collection services’ customers since these services:

Are sometimes integrated into more central systems like pricing engines.
Can infect enterprise systems even when they are not integrated into such systems. Using a data collection service involves receiving data from that service, and even some of the most secure forms of data transfer carry risks.

System intrusion can also lead attackers to target the devices that supply residential IPs to a proxy service. This can result in reputational harm to that proxy service’s customers.

Real-life vulnerability example in a residential proxy provider:

Operators of Kimwolf botnet bought proxy services from residential proxy provider IPIDEA. Using malicious commands, they infected the internal networks of devices supplying IPs to IPIDEA. These networks were then scanned and other vulnerable devices on these local networks were also infected.

Kimwolf is estimated to have spread to more than 2 million devices with this method. Data collected by IPIDEA’s customers also flowed through these infected networks.¹⁰

Data loss

Without data security, bad actors can gain access to data collected by enterprises to identify their activities and strategies leading to a loss of competitive advantage or business opportunities.

Real-life example:

Though web data is public, businesses can use web data in novel ways for competitive advantage. For example, investors spend up to 10% of their market data budget on alt data¹¹, but they rarely disclose their strategies since they believe that it can help them gain an advantage compared to their competitors. A data leak can lead to their strategies being exposed and therefore replicated by their competitors.

PII management

Web data can include private data behind logins or PII that was accidentally or purposefully disclosed on public websites. If web data collection services fail to manage PII correctly, such data can be acquired by bad actors. This can lead to reputational harm for the web data collection service and its customers.

Application security

Applications or intermediate programs like SDKs that supply a web data collection service’s IPs can be whitelisted by external certification providers like McAfee. This increases enterprises’ trust in the service’s ethical supply practices.

Insurance coverage

Enterprises typically require the following insurance from digital providers:

Professional liability insurance
Cyber insurance

Detailed benchmark: Assessment of web data infrastructure providers

Benchmark: Ethical use by customers

Here we aim to answer the question: Does the company ensure that use of its solution is ethical and in-line with applicable laws and regulations? Summary of our findings:

* Not applicable: Since Zyte and Apify buy proxies from their suppliers rather than collecting them directly from residential users, website owners would not contact them about abuse, so they do not need to provide a contact form for websites.

First, we reviewed policies:

Acceptable use policy review

All vendors prohibit illegal activities and provide examples like DoS attacks, unsolicited bulk messages, impersonation or spoofing.

In addition, some vendors also highlight that they prohibit activities which are likely to be illegal. Below, we list the prohibited activities based on the acceptable use policies and their addendums (e.g. data processing addendum) for each vendor.

We looked for terms that would prohibit activities that are likely to be illegal and can be identified based on user activity. For example, a significant share of users using proxies to take paid surveys could be using proxies to mislead survey providers about their actual location. Therefore, this activity is both likely to be illegal and can be identified based on user activity (i.e. when a user logs into a paid survey website).

Though clearly identifying prohibited activities is beneficial, it is not a requirement and does not impact our scores. Companies may choose to mention that they don’t allow illegal activities rather than mentioning every possible instance of illegal activities.

Mentioning an activity as prohibited doesn’t mean that such activities will be reviewed or blocked. Our scores rely on how these policies are implemented as outlined below:

Processes for ethical use

While some categories outlined in the acceptable use policies are quite broad (e.g. unauthorized data scraping or access), others are specific enough to be converted into preventative actions (e.g. blocking access) that data collection services can implement for users that have not completed their KYC process.

Based on these specific prohibited uses, we prepared an extensive list of uses which are likely to be illegal uses of proxies. For each use case, we identified scenarios including relevant web domains and actions. For example, in the scenario for artificial social media engagement, we attempted to log into a social network using a proxy to like an existing post.

Then, to test whether companies allow unethical use by customers, we created an account on each provider’s service using a non-AIMultiple email address. We did not complete a KYC process with this account and proceeded to use the services to understand what anonymous users can achieve with each service. KYC is a crucial step during which the user submits data to validate the legal entity that they represent. This links user activity to a legal entity:

That can be held accountable.
Whose rationale for online actions (e.g. using proxies to log into government websites) can be examined. For example, after understanding their use case, a researcher or government agency can be allowed to log in to a government website using a proxy.

We expected these use cases to trigger a KYC process, but at most vendors that didn’t happen. A check mark indicates that the request was blocked for users that hadn’t yet completed the KYC process:

For clarity, data collection service companies have no legal obligation to block these websites, and some of these scenarios may be part of legal use. For example, a researcher may want to leverage proxies to run a controlled social media experiment. However, given the abuse potential in these scenarios, we expected data collection services to block them for users that have not completed the KYC process.
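One way a provider could implement the behavior we expected is a domain-category check that runs before a request leaves the proxy network. The sketch below is hypothetical: the domains, categories and `blocked_category` function are illustrative placeholders, not any vendor's actual list or API.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical domain-to-category list; a real provider would maintain its own.
RESTRICTED_DOMAINS = {
    "social-network.example": "artificial social media engagement",
    "paid-surveys.example": "paid surveys",
    "gov-portal.example": "government website login",
}


def blocked_category(url: str, kyc_completed: bool) -> Optional[str]:
    """Return the restricted category if a non-KYC request should be blocked, else None."""
    if kyc_completed:
        return None  # use case already reviewed; allow the request
    host = (urlsplit(url).hostname or "").lower()
    for domain, category in RESTRICTED_DOMAINS.items():
        # Match the registered domain itself and any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return category
    return None


print(blocked_category("https://www.social-network.example/login", kyc_completed=False))
# artificial social media engagement
print(blocked_category("https://www.social-network.example/login", kyc_completed=True))
# None
```

Returning the category rather than a bare boolean lets the provider tell the user which restriction applied and point them to the KYC process, similar in spirit to the error message quoted in the robots.txt test below.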

How brands communicate domains that they block

Bright Data lists restricted domain categories in their acceptable use policy.

Respecting websites’ preferences regarding automated data collection

What is robots.txt?

robots.txt is the file that websites use to implement the Robots Exclusion Protocol. Websites use this protocol to indicate the parts of the site that the owner prefers bots not to visit. Adherence to robots.txt is voluntary.

Pros and cons of adhering to robots.txt

➕ Respects website preferences.

➖ May not have been updated recently and may therefore be outdated.

➖ It typically indicates that the website owner prefers bots not to access certain public sections of the website.

Robots.txt may also give bots uneven access. For example, website owners may indicate that they prefer answer engines’ bots not to visit certain URLs that search engines’ bots are allowed to visit.
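Python's standard library ships a robots.txt parser, `urllib.robotparser.RobotFileParser`, which makes this kind of uneven, per-agent access easy to see. The robots.txt content, bot names and URL below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that treats two bots differently.
ROBOTS_TXT = """\
User-agent: SearchBot
Disallow: /private/

User-agent: AnswerBot
Disallow: /articles/
Disallow: /private/

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "https://news.example/articles/ai-regulation"
for agent in ("SearchBot/1.0", "AnswerBot/1.0"):
    # can_fetch() reports the site's stated preference; nothing enforces it.
    print(agent, parser.can_fetch(agent, url))
# SearchBot/1.0 True
# AnswerBot/1.0 False
```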

Robots.txt is not a legal document, and it can ask bots not to access pages that are legally:

allowed to be scraped (e.g. public data) or
not allowed to be scraped (e.g. data behind a login where the website owner’s terms of service prohibit scraping such data).

Web data collection service providers may request residential proxy users to complete a KYC process and prove that they have a legal and ethical use case before these users can disregard robots.txt.
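A provider could enforce this for non-KYC users by checking robots.txt before forwarding a request. The sketch below is a hypothetical gate built on `urllib.robotparser`. It is not Bright Data's or any other vendor's implementation, and a production system would cache parsed robots.txt files per domain rather than parsing them on every request.

```python
from urllib.robotparser import RobotFileParser


def allow_request(robots_txt: str, url: str, user_agent: str, kyc_completed: bool) -> bool:
    """Hypothetical gate: users without KYC must respect robots.txt."""
    if kyc_completed:
        # The user has proved a legal and ethical use case, so robots.txt is not enforced.
        return True
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Minimal robots.txt excerpt with a single /terms rule.
ROBOTS = "User-agent: *\nDisallow: /terms\n"
print(allow_request(ROBOTS, "https://www.example.com/terms", "ExampleCrawler/1.0", False))  # False
print(allow_request(ROBOTS, "https://www.example.com/terms", "ExampleCrawler/1.0", True))   # True
print(allow_request(ROBOTS, "https://www.example.com/world", "ExampleCrawler/1.0", False))  # True
```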

For testing, we sent requests to pages in subfolders that are requested to be blocked by robots.txt. The domains that we used were aimultiple.com and 5 web domains among the top 100 most visited web domains. Only Bright Data blocked these requests:

CNN example

CNN’s robots.txt blocks the folder /terms.¹² For testing, we navigated to that folder with residential proxies and received HTTP 200 responses containing the page’s data from all providers except Bright Data. Bright Data’s response is: “Residential Failed (bad_endpoint): Requested site is not available for immediate residential (no KYC) access mode in accordance with robots.txt. To get full residential access for targeting this site, fill in the KYC form: https://brightdata.com/cp/kyc”.

Abuse management

We outlined a methodology to evaluate abuse management practices of vendors and collected data to fulfill our evaluation criteria:

* Not applicable: Zyte buys proxies from other proxy providers and therefore when Zyte’s service is used for abuse, website owners would reach its proxy providers rather than Zyte.

All vendors provide means for 3rd parties or their customers to reach them. The following are important for issue resolution:

Public abuse policy
A dedicated email address to report abuse
An alternate contact method (e.g. webform or messaging interface) that allows reporters to reach the company. This is helpful as emails can get filtered and may fail to reach the inbox.
Responsiveness to messages

3 providers in the benchmark, including Bright Data, provided an email for reporting abuse. All of these providers also outlined their policies in this domain.

We expect all other providers to do the same and this to become a widespread industry practice in the short term.

Finally, we evaluated abuse management responsiveness by emailing abuse reports from third-party domains (i.e. non-AIMultiple) and measuring response times. If we could not find an abuse email address, we sent the report through the general contact form. We tested this via 3 batches of emails sent on:

Friday, May 2, 2025, from:
- A ticket sales service with ~30k monthly traffic
- A law firm with ~1k monthly traffic
May 17, 2025, from the ticket sales service.
May 24, 2025, from a social media agency with limited online traffic.

The first emails sent on May 2, 2025 were sent to companies that provided dedicated emails. Later, we expanded our list and included more general email addresses listed in the contact sections of all benchmarked web data collection services. If a company responded to our emails, we stopped sending them further emails.

In our emails, we mentioned that our websites received suspected bot traffic via proxies and asked for their support in identifying the source of proxies. We were able to get all compliance teams except one to answer us. Almost all responses were received on the same day.
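The one-week responsiveness rule can be applied mechanically to the send and reply timestamps of each report batch. The sketch below assumes a provider counts as responsive if any of its reports was answered within seven days; the timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

RESPONSE_WINDOW = timedelta(days=7)  # "within a week" in the methodology


def answered_in_time(sent_at: datetime, first_reply_at: Optional[datetime]) -> bool:
    """True if a reply to one abuse report arrived within the response window."""
    return first_reply_at is not None and first_reply_at - sent_at <= RESPONSE_WINDOW


def is_responsive(reports: list[tuple[datetime, Optional[datetime]]]) -> bool:
    """A provider counts as responsive if any of its abuse reports was answered in time."""
    return any(answered_in_time(sent, reply) for sent, reply in reports)


# Hypothetical timestamps for two report batches sent to one provider.
reports = [
    (datetime(2025, 5, 2, 10, 0), None),                           # no reply
    (datetime(2025, 5, 17, 9, 0), datetime(2025, 5, 17, 15, 30)),  # same-day reply
]
print(is_responsive(reports))  # True
```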

Usage transparency

Historically, website owners, which provide web data, and web data collection services have not exchanged data about data collection activities. To limit crawling activities, website owners could either:

Contact web data collection services to report abuse
Work with bot management providers like Cloudflare to make crawling more challenging.

Now, there are initiatives for more structured data exchange between these parties. Bright Data launched the Bright Data Webmaster console for webmasters to monitor crawling activities on their websites. More transparency is likely to improve web data collection practices.

Our experience with Webmaster console

We signed up by verifying our domain ownership and adding a collectors.txt file on the domain.

We now have access to the bot activity from Bright Data on our website:

Benchmark: Ethical supply

* Reviews on these 3rd party platforms were included: Amazon Appstore, App Store, Google Play Store, Trustpilot. For convenience, this value was calculated for 5 major apps for Bright Data, not all 120 apps featured on their website.

Partner transparency

Bandwidth required by web data infrastructure companies can be supplied in an ethical manner by providing benefits (e.g. payments, features like the ability to skip ads) in exchange for consent to share one’s internet connection. However, it is also possible to gain unauthorized access to retail users’ systems and sell their connections.

Web data infrastructure providers can formulate policies and processes, run external audits and publish their approach and audit findings to create transparency around how they acquire their internet connections. This can foster trust in the ethical supply of their service.

We created a framework for supply-side transparency in web data and rated vendors using this framework. We applied this framework regardless of whether a web data collection service acquired residential IPs itself or through other proxy providers. Our aim is to bring transparency to the entire supply chain of IPs since unethical practices can originate at any point in the supply chain.

Here you can find our detailed results:

Bright Data

Bright Data is classified as Level 5 since they publish

Their sourcing approach and how app developers can work with them via their SDK¹³ ¹⁴
Details on 120 suppliers were shared publicly. We could check reviews of these suppliers on 3rd party platforms to estimate how popular they are. ¹⁵

Review of selected apps

Bright Data lists 120 apps on their website. Apps like Bright VPN are certified by 3rd parties on their disclosure and UX.¹⁶ We also downloaded these apps to see them in more detail:

Bright VPN
EarnApp
Sling Kong

Opt-in form with obligation not to collect personally identifiable data: Consent form with clear explanation from Bright VPN:

EarnApp:

Sling Kong:

User is presented with the offer during the game:

Opt-in:

Additional info during opt-in:

Opt-out:

Value provided by apps:

Bright VPN: Free VPN service
EarnApp: Payments
Sling Kong: In-game virtual currency

Others

While most providers are aware of ethics in web scraping and have published on the topic (e.g. ¹⁷), we haven’t identified their specific commitments on this front except for Zyte.¹⁸

We expect this to change and most providers to move to at least Level 1 in the short term.

External certification

* Indicates that the company achieved all external certifications in this category

It is crucial for vendors to have the right systems, personnel & processes to protect clients’ data and secure the apps that supply their IPs. See our external certification measurement methodology for the logic behind our scoring.

All vendors publicly claim to be compliant with both data privacy regulations. Therefore, this was not included in scoring.

How we measured organizational maturities

Based on the capabilities that we identified in this domain, we checked for the existence of these certificates at each provider using their public statements:

Data security certification & PII certification: ¹⁹ ²⁰ ²¹ ²² ²³
IP source whitelisted: ²⁴
Ethical practices evaluated: ²⁵

Some providers that do not hold ISO 27018 certificates claimed that they should be considered certified since they use cloud service providers that hold ISO 27018 certificates. Our cybersecurity advisor’s opinion was that while this would facilitate certificate acquisition, these providers would still need to have their own policies and controls certified to acquire the certificate.

Insurance coverage

3 web data collection companies shared their insurance certificates. We do not publish the certificates but reviewed the documents to ensure that:

they covered these 2 insurance categories
the insurance limit in each category was at least in the multi-million US$ range.
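The insurance dimension (0 to 2) can be read as one point per required coverage category. In the sketch below, the category keys and the $2M threshold are hypothetical stand-ins for "multi-million scale"; the methodology does not state an exact threshold.

```python
REQUIRED_COVERAGE = ("professional_liability", "cyber")
MIN_LIMIT_USD = 2_000_000  # hypothetical reading of "multi-million scale"


def insurance_score(limits_usd: dict[str, int]) -> int:
    """Return 0-2: one point per required category with a sufficient limit.

    limits_usd maps a coverage category to the limit stated on the certificate;
    a vendor that shares no certificates passes an empty dict and scores 0.
    """
    return sum(1 for category in REQUIRED_COVERAGE
               if limits_usd.get(category, 0) >= MIN_LIMIT_USD)


print(insurance_score({"professional_liability": 5_000_000, "cyber": 3_000_000}))  # 2
print(insurance_score({}))  # 0
```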

Disclaimers and recommendations for next steps

All of the providers in this benchmark except Nimble are customers of AIMultiple. As always, we followed our ethical commitments during this research.

We have completed an exhaustive review of ethical web data collection and while we are satisfied with the scope of this benchmark, we would love to increase its participation. We thank these companies for sharing their insurance coverage: Apify, Bright Data, Zyte.

We are waiting for responses from NetNut and Nimble and will update the report as soon as we hear from them. 2 vendors have chosen not to participate in this iteration of the benchmark. We will update this report whenever any of these 7 companies suggests changes that are fact-based, fair to all vendors and help enterprises make better decisions.

This is the first report to focus on ethical web data according to our research. We hope that this transparency can help the web data industry find creative solutions to its challenges. These solutions will need to balance the interests of web data collectors, web automation users, website owners and residential users that supply their IPs to the industry.

Reference Links

1. Workers Fainted at Nike Clothing Factory Despite a Vow to Reform (ProPublica)
2. 2023 MOVEit data breach (Wikipedia, contributors to Wikimedia projects)
3. https://www.courthousenews.com/wp-content/uploads/2024/01/starbucks-labor-rights-violations-suit.pdf
4. Google faces questions over videos on YouTube (The Times)
5. Court Rules in Favor of Bright Data in Meta v. Bright Data Case (Bright Data)
6. https://media.defense.gov/2024/Sep/18/2003547016/-1/-1/0/CSA-PRC-LINKED-ACTORS-BOTNET.PDF
7. Internet Crime Complaint Center (IC3) | Home Internet Connected Devices Facilitate Criminal Activity
8. A Look at the Residential Proxy Market (Intel 471)
9. Satori Threat Intelligence Alert: PROXYLIB and LumiApps Transform Mobile Devices into Proxy Nodes (HUMAN Security)
10. Kimwolf Botnet Lurking in Corporate, Govt. Networks (Krebs on Security)
12. https://edition.cnn.com/robots.txt
13. Ethically Sourcing Residential Proxies (Bright Data)
14. homepage (Bright SDK)
15. How Bright Data Obtains Its Residential IPs (Bright Data)
16. Bright VPN Compliance with guidelines (Google Sheets)
17. What is ethical scraping and how do you do it? (Apify Blog)
18. Web Scraping Data Compliance (Zyte)
19. Page not found (Bright Data)
20. Security | Platform (Apify Documentation)
21. https://netnut.com/wp-content/uploads/2024/01/NetNut-ISO.pdf
22. Nimble Trust Center | Security, Compliance & Reliability
23. Trust Center (Zyte)
24. Bright SDK Compliance with Guidelines (Google Sheets)
25. pwc-report (Bright Data)

Principal Analyst

Cem Dilmegani


Cem has been the principal analyst at AIMultiple since 2017. According to Similarweb, AIMultiple informs hundreds of thousands of businesses every month, including 55% of the Fortune 500.

Cem's work has been cited by leading global publications such as Business Insider, Forbes, and the Washington Post; global firms such as Deloitte and HPE; NGOs such as the World Economic Forum; and supranational organizations such as the European Commission. You can see more reputable companies and resources that referenced AIMultiple.

Throughout his career, Cem served as a tech consultant, tech buyer and tech entrepreneur. He advised enterprises on their technology decisions at McKinsey & Company and Altman Solon for more than a decade. He also published a McKinsey report on digitalization.

He led technology strategy and procurement at a telco while reporting to the CEO. He also led the commercial growth of deep tech company Hypatos, which grew from zero to 7-digit annual recurring revenue and a 9-digit valuation within 2 years. Cem's work at Hypatos was covered by leading technology publications like TechCrunch and Business Insider.

Cem regularly speaks at international technology conferences. He graduated from Bogazici University as a computer engineer and holds an MBA from Columbia Business School.
