Analytics vendors and non-technical employees are democratizing data science. Gartner predicts that more than 40% of data science tasks will be automated by 2020. Organizations are looking to turn non-technical employees into data scientists so that they can combine their domain expertise with data science technology to solve business problems.
What does citizen data scientist mean?
Citizen data scientist is a term coined by Gartner, which defines it as “a person who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.”
In short, citizen data scientists are non-technical employees who can use data science tools to solve business problems.
Citizen data scientists can provide business and industry domain expertise that many data science experts lack. Their business experience and awareness of business priorities enable them to effectively integrate data science and machine learning output into business processes.
Why are citizen data scientists important now?
Interest in citizen data science nearly tripled between 2012 and 2020, as seen below.
Reasons for this growing interest are:
- Demand for analytics is rising with the popularity of data-driven decision making, but data science talent is in short supply. As of 2020, there were three times as many data science job postings as job searches.
- As with any scarce resource, data science talent is expensive. According to the U.S. Bureau of Labor Statistics, the average data scientist salary is $101k.
- Analytics tools are now easier to use, which reduces reliance on data scientists.
Most industry analysts are also highlighting the increased role of citizen data scientists in organizations:
- IDC big data analytics and AI research director Chwee Kan Chua mentions in an interview: “Lowering the barriers to allow even non-technical business users to be ‘data scientists’ is a great approach.”
- Gartner coined the term and actively promotes it.
What are the tools used by citizen data scientists?
Various solutions help businesses democratize AI and analytics:
- Citizen data scientists first need to understand business data and access it from various systems. Metadata management solutions like data catalogs or self-service data reporting tools can help citizen data scientists with this.
- Automated Machine Learning (AutoML): AutoML solutions can automate manual and repetitive machine learning tasks to empower citizen data scientists. ML tasks that AutoML tools can automate include:
  - Data pre-processing
  - Feature engineering
  - Feature extraction
  - Feature selection
  - Algorithm selection & hyperparameter optimization
- Augmented analytics / AI-driven analytics: ML-led analytics in which tools extract insights from data in two forms:
  - Search-driven: Software returns results in various formats (reports, dashboards, etc.) in response to citizen data scientists’ queries.
  - Auto-generated: ML algorithms identify patterns in the data to automate insight generation.
- No-code/low-code and RPA solutions minimize coding through drag-and-drop interfaces, which help citizen developers put the models they build into production.
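To make the last AutoML task listed above ("algorithm selection & hyperparameter optimization") more concrete, here is a minimal sketch of what such a tool does behind the scenes: try every candidate algorithm with every value in its hyperparameter grid and keep the combination that scores best on held-out validation data. The toy data, the two candidate models (a threshold rule and k-nearest-neighbors), and all function names are hypothetical illustrations, not any vendor's API; real AutoML products also handle pre-processing and feature engineering and use smarter search strategies than this exhaustive grid.

```python
# Hypothetical toy data: (feature, label) pairs for a binary classification task.
TRAIN = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
VALID = [(0.8, 0), (1.8, 0), (3.2, 1), (4.2, 1)]


def make_threshold(train, threshold):
    """Predict 1 when x exceeds a fixed threshold (train is unused; kept for a uniform signature)."""
    return lambda x: int(x > threshold)


def make_knn(train, k):
    """k-nearest-neighbors: majority vote over the k training points closest to x."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return int(votes * 2 > k)
    return predict


# Search space: each candidate algorithm paired with its hyperparameter grid.
SEARCH_SPACE = {
    "threshold": (make_threshold, [1.0, 2.5, 4.0]),
    "knn": (make_knn, [1, 3, 5]),
}


def accuracy(model, data):
    """Fraction of (x, y) pairs the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def auto_select(train, valid, space):
    """Exhaustively try every (algorithm, hyperparameter) pair; keep the best validation score.

    Ties are resolved in favor of the first candidate tried.
    """
    best = None
    for name, (factory, grid) in space.items():
        for param in grid:
            score = accuracy(factory(train, param), valid)
            if best is None or score > best[2]:
                best = (name, param, score)
    return best


if __name__ == "__main__":
    name, param, score = auto_select(TRAIN, VALID, SEARCH_SPACE)
    print(f"best: {name} (param={param}), validation accuracy={score:.2f}")
    # prints "best: threshold (param=2.5), validation accuracy=1.00"
```

On this toy data several candidates reach perfect validation accuracy, so the first one found wins; a production tool would typically add cross-validation and a tie-breaking preference for simpler models.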
If you are looking for vendors for these solutions, feel free to check out related vendor lists:
- Self-service reporting software
- Business intelligence software
- AutoML software
- Low code platforms
- RPA software
What are best practices for citizen data science projects?
Create a workspace where citizen data scientists and data science experts can work collaboratively
Most citizen data scientists are not trained in the foundations of data science. They rely on tools to generate reports, analyze data, and create dashboards or models. To maximize their value, organizations should build supporting teams that include data engineers and expert data scientists.
Train citizen data scientists
Although citizen data scientists’ business knowledge is an advantage, their inexperience in data science makes projects prone to errors. Citizen data scientists can be trained in the following areas:
- use of BI/AutoML tools for maximum efficiency
- data security training to maintain data compliance
- detecting AI biases and creating standards for model trust and transparency so that citizen data scientists can establish explainable AI (XAI) systems.
Classify datasets based on accessibility
Due to data compliance requirements, not all data types should be accessible to all employees. Classifying datasets by the level of access they require helps enforce this.
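One way to picture dataset classification is as an ordered set of sensitivity levels, with each role cleared up to some maximum level. The sketch below assumes four hypothetical levels, a hypothetical catalog, and hypothetical role names; in practice these rules usually live in a data catalog or the data platform's access-control settings rather than in application code.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical classification levels, ordered from least to most restricted."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical catalog: dataset name -> classification level.
CATALOG = {
    "web_traffic_daily": Sensitivity.PUBLIC,
    "sales_by_region": Sensitivity.INTERNAL,
    "customer_contacts": Sensitivity.CONFIDENTIAL,
    "payroll": Sensitivity.RESTRICTED,
}

# Hypothetical role clearances: the highest level each role may read.
CLEARANCE = {
    "citizen_data_scientist": Sensitivity.INTERNAL,
    "data_engineer": Sensitivity.CONFIDENTIAL,
    "compliance_officer": Sensitivity.RESTRICTED,
}


def accessible_datasets(role):
    """Return, sorted by name, the datasets whose level does not exceed the role's clearance."""
    # Unknown roles fall back to PUBLIC-only access.
    limit = CLEARANCE.get(role, Sensitivity.PUBLIC)
    return sorted(name for name, level in CATALOG.items() if level <= limit)


def can_read(role, dataset):
    """Check one dataset; datasets missing from the catalog are treated as RESTRICTED."""
    level = CATALOG.get(dataset, Sensitivity.RESTRICTED)
    return level <= CLEARANCE.get(role, Sensitivity.PUBLIC)


if __name__ == "__main__":
    print(accessible_datasets("citizen_data_scientist"))
    # prints ['sales_by_region', 'web_traffic_daily']
```

Defaulting unknown roles to the lowest level and unclassified datasets to the highest keeps mistakes on the side of denying access, which suits the compliance concern raised above.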
Create a sandbox for testing
Sandboxes are software testing environments that contain synthetic data and are not connected to production environments. They let citizen data scientists quickly test their models before rolling them out to production.
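As a rough illustration of the synthetic-data side of a sandbox, the sketch below generates fake rows that mimic the shape of a production table without containing any real records, then scores a simple candidate rule against them. The `CustomerRow` schema, region values, and the churn-probability rule used to fabricate labels are all hypothetical assumptions; this generator is not statistically faithful to any real dataset, and dedicated synthetic-data tools would be needed for that.

```python
import random
from dataclasses import dataclass


@dataclass
class CustomerRow:
    """Hypothetical schema mirroring a production table, filled only with fake values."""
    customer_id: int
    region: str
    monthly_spend: float
    churned: bool


REGIONS = ["north", "south", "east", "west"]  # hypothetical category values


def synthetic_customers(n, seed=42):
    """Generate n fake rows; a fixed seed makes sandbox runs reproducible."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        spend = round(rng.uniform(10.0, 500.0), 2)
        # Assumed labeling rule for the fake data: low spenders churn more often.
        churned = rng.random() < (0.6 if spend < 100 else 0.2)
        rows.append(CustomerRow(i + 1, rng.choice(REGIONS), spend, churned))
    return rows


def rule_accuracy(rule, rows):
    """Fraction of rows where a candidate churn rule matches the synthetic label."""
    return sum(rule(r) == r.churned for r in rows) / len(rows)


if __name__ == "__main__":
    sandbox = synthetic_customers(1000)
    # A citizen data scientist's candidate rule, tried safely on fake data only.
    low_spend_rule = lambda row: row.monthly_spend < 100
    print(f"rule accuracy on synthetic data: {rule_accuracy(low_spend_rule, sandbox):.2f}")
```

Because nothing here reads from a production system, a flawed rule or a buggy script can only affect the sandbox copy.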