Early this year, when I published Unleash the True Power of Data — Welcome to the New Era of Data Strategy, I could not have imagined that 2020 would go down in history as one of the worst years, as we continue to battle this pandemic. However, the experience has taught us a great deal about the critical importance of digital technologies. Most organizations, from small and medium-sized businesses to large enterprises, now understand that digital capabilities and digital transformation are essential to succeed in today’s digital economy.
Today’s economy, society, and workforce are awash in data that is created, collected, analyzed, and shared at massive scale across different platforms. This digital ecosystem poses new and more significant risks for organizations and society. The question is: Do we have enough best practices for data governance, data sharing, insight, and analytics to cover these risks? The answer is undoubtedly no, because most organizations lack the modern data governance frameworks and risk mitigation strategies needed to cover the various categories of risk.
In this data-driven, digital world, the creation and collection of data do not pose a major risk. However, when collected data is analyzed for insight and consumers act on that insight, the action poses a new risk for the organization. And these risks are real. In a new study by NTT DATA, one-quarter of executives and 36% of employees say they have encountered an AI application ignoring a command. Worse, about a fifth of respondents revealed that an AI app offered them suggestions that worked against a marginalized group.
AI bias, like every AI program, starts with the data, good or bad. So, coming back to data ethics, the new risk is whether or not the organization is ethical in sharing insight from this data. The questions to ask include:
- Is this data insight legal?
- Is this insight going to create social and economic injustice?
- Does the organization have consent for using or disclosing the insight?
The answers to some of these questions lie in identifying, prioritizing, and managing the risks associated with data sharing and data insights, and in following best practices for data ethics. Data Ethics is certainly the most important trait for the success of any organization when protecting customer data and insight. If an organization fails to be ethical, it can sabotage trust and damage the brand. Once trust is broken, it isn't easy to rebuild. As the saying goes, “Good ethics is good business.” To summarize with a standard definition, Data Ethics is a code of behavior that describes what is right and what is wrong in collecting, storing, and sharing data insight.
The relationship between Data Ethics and Data Science
Data Analytics and Data Science technologies are ethically neutral when it comes to sharing insight generated from machine learning algorithms. Technology cannot tell right from wrong, or good from bad. These technologies might be meant to find data patterns and generate insight for social justice and good causes, but they remain ethically neutral. Technologies cannot have a value system or a framework of values, but every organization must have a value system to ensure that best practices for ethics are followed.
Machine learning and deep learning are going to rule the data-driven analytics world in this digital age. But machines generate insight based on the data they are trained on: a model trained on certain data will produce certain patterns and diagnoses. This is why Data Ethics has become a critical ingredient for producing the right insight from data science.
If an organization lacks data governance (and ethics for collecting, handling, and sharing data), the practitioners and data scientists working with the algorithms fall back on their own ethics and methods, which can put the organization at risk. It is vital for an organization to follow a data-first approach and handle data ethically to avoid any adverse impact on the organization’s business, products, and people.
Why do we have Data Ethics challenges? In my view, it is because ethics relies on an individual’s perception of what is right or wrong. For any data scientist, the training dataset for the machine learning algorithm is the most important element, and it is the data scientist’s responsibility to ensure that this dataset does not contain biased decisions, discrimination, or partiality. For a data scientist, the main ethical challenges are human bias, discrimination, and the handling of sensitive data.
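As a rough illustration of what checking a training dataset for bias can look like in practice, the sketch below compares how often each group receives a positive label. It is only a minimal example under stated assumptions: the field names `group` and `approved`, the toy rows, and the helper functions are all hypothetical, and a real review would cover far more than a single ratio.

```python
from collections import defaultdict


def selection_rates(records, group_key, label_key):
    """Return the share of positive labels (label == 1) for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[label_key] == 1:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Ratio of the lowest to the highest group rate (1.0 means parity)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group has a positive label, so all rates are equal
    return min(rates.values()) / highest


# Hypothetical toy data; 'group' and 'approved' are illustrative field names.
training_rows = [
    {"group": "A", "approved": 1},
    {"group": "A", "approved": 1},
    {"group": "A", "approved": 1},
    {"group": "A", "approved": 0},
    {"group": "B", "approved": 1},
    {"group": "B", "approved": 0},
    {"group": "B", "approved": 0},
    {"group": "B", "approved": 0},
]

rates = selection_rates(training_rows, "group", "approved")
ratio = disparate_impact_ratio(rates)
print(rates)
print(f"disparate impact ratio: {ratio:.2f}")
```

In this toy data, group A is approved three times as often as group B, so the ratio is well below 1.0. A ratio under roughly 0.8 is sometimes used as a rule-of-thumb warning sign (the "four-fifths" rule), but a low ratio is a prompt for human review of how the data was collected and labeled, not proof of discrimination on its own.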
Who is responsible for Data Ethics?
Data Ethics for Data Science requires modern data governance frameworks and ethical data practices at every step of a data science project. Some government regulations, such as the General Data Protection Regulation (GDPR), the Health Insurance Portability and Accountability Act (HIPAA), the Family Educational Rights and Privacy Act (FERPA), and the California Consumer Privacy Act (CCPA), define and enforce Data Ethics rules. Various groups also provide guidelines for organizations, including Bloomberg, BrightHive, and Data for Democracy. Together, these groups have developed a code of Data Ethics called the Community Principles on Ethical Data Sharing (CPEDS) to codify Data Ethics for data scientists. In many organizations, a Data Governance team and legal counsel are responsible for overseeing compliance with Data Ethics rules and remediation of breaches.
At NTT DATA Services, we address Data Ethics and data security by removing bias from datasets as part of our Data Management Strategy.