
First Principles: The Anticipated Non-Personal Data Governance Framework

A committee constituted by the Ministry of Electronics and Information Technology issued a draft report on non-personal data governance in India in July 2020 which it revised in December 2020 following the receipt of feedback from stakeholders. The December 2020 report envisages parallel regimes for the governance of personal and non-personal data although it does not appear to take into account how fluid the two forms of data are. The report's underlying principles also seem fairly ambiguous; the document suggests that 'regulation in India to establish rights over non-personal data collected and created in India' be 'simple, digital and unambiguous' in § 3.4(v) without specifying what it means by 'digital' or, indeed, why the regulations should be digital. 

The comments contained in this write-up seek to engage with the principles which appear to underlie the December 2020 Draft Report by the Committee of Experts on Non-Personal Data Governance Framework, and which are set to underpin the framework for the governance of non-personal data in India. They do not deal with the details of terminology and responsible authorities currently in vogue since, at this stage, those details are still relatively malleable.
  
Sovereignty 

Running through discussions on data governance in recent years has been the theme of 'data sovereignty' and, true to form, this report too states that 'sovereignty' is one of its guiding principles, in § 3.4(i), defining it by stating: "India has rights over data of India, its people and organisations." This is a somewhat unorthodox approach to the subject of sovereignty, and the claim that India has rights over 'data of India' suggests that the conception of the data to which India has rights could be construed in a manner that is unsustainably (and perhaps illegitimately) broad, including rights to all data pertaining to India regardless of where it had been generated and all data generated in India regardless of by whom it had been generated or collated.

Considering what the report's definition of 'sovereignty' reveals, it would perhaps be more useful to understand the term as a negative: what does it mean not to be colonised? 

In the context of data, and in the broader context of intellectual property laws, non-colonisation would likely mean leveraging Indian data for the benefit of Indians and protecting them from predatory foreign influences. Such an approach, although it says little about invasive intra-state approaches or unfair competition by dominant players within the domestic market, would at least have the benefit of being mindful of individual autonomy and agency without having the state seeming to make claims to 'its people and organisations' which sound strangely paternalistic and proprietary in what is a democratic republic. 

Understanding 'sovereignty' through the lens of non-colonisation in this context would have the added benefit of being in consonance with the stated aims of the anticipated data governance framework under discussion which are relevant to the subject: 'to create an enforcing framework that establishes rights of India and its communities over its non-personal data' and 'to create an enabling framework that ensures unlocking economic benefit from non-personal data for India and its people' per § 3.5 of the report. 

What is inescapable on reading these aims, however, is that the primary concern of the anticipated governance framework for non-personal data is the community and not the individual. While this could be read as reflecting Indian culture often prioritising the community over the individual, it is more likely that the emphasis on community rights stems from a theorised bifurcation of data into personal and non-personal data reflected in the report's text.

Parallel Regimes 

The report is crystal clear about its envisaging two exclusive regimes operating to govern what it sees as two entirely different forms of data: personal and non-personal data, going so far as to partially explain what the latter is by saying: "When the data is not ‘Personal Data’ (as defined under the PDP Bill), or the data is without any Personally Identifiable Information (PII), it is considered Non-Personal Data," in § 4.1. 

The explanation contained in § 4.1 is entirely dependent on the PDP Bill and does not adequately take into account hybrid data or non-personal data generated from personal data. Going through the report, it is, for example, not immediately clear what steps an individual could meaningfully take to address the harm caused by the personalisation of non-personal data especially since the focus of the anticipated non-personal data regime is simply not on individual rights. 

Further, the compartmentalisation of the two forms of data appears to be far from watertight. 'Government hospital collecting health data of a patient (anonymised)' features in examples of non-personal data collected by public entities to take just one item from a table in § 4 of the report. The table goes on to suggest that the 'data collecting mechanisms, devices, instruments, sensors, etc.' in this case are public, and that the data is collected from or about a subject in a private space. It is entirely unclear what this is intended to mean not least because patient data cannot be anonymous at the point of initial collection. 

It is possible that the two regimes governing personal and non-personal data will ultimately be complementary, and that concerns about accountability and harm reversal, where possible, are misplaced but given that the proposals relating to both regimes are currently fluid, it is probably not prudent to simply trust that problems which may arise under one of them will somehow be addressed by the other. These comments therefore attempt to consider the non-personal data regime envisaged in the December 2020 report independently particularly since it is expected to function independently of the personal data regime. 

The report does recognise that the two forms of data can become enmeshed with each other; it states that 'mixed datasets that typically have inextricably linked personal and non-personal data will be governed by the PDP Bill' in § 5.1(v) and clarifies that de-anonymised data will be governed by the PDP regime in § 5.2. However, at an enabling level, this does not meaningfully acknowledge how permeable the compartments holding the two forms of data are or how easy it is for one form to be converted into the other while, at the enforcement level, the absence of any system to alert data principals of the de-anonymisation of their data means that individuals whose data has been compromised after anonymisation would likely not know that a problem had arisen. 

The report 'recommends that data collectors at the time of collecting personal data should provide a notice and offer the data principal the option to opt out of data anonymization' in § 5.4. The opt-out mechanism is an expression of the privacy by consent model and allows for the default to be that data would be collected for depersonalisation and unspecified post-anonymisation use. However, in a country where technological literacy is limited, it is perhaps worth prioritising a privacy by design model which would, instead of having data principals opt out of having their data be de-personalised and used, require that the default be to disallow the de-personalisation and subsequent use of once-personal data without the informed consent of relevant data principals having been obtained. 

The Business of Data 

The report labels businesses as 'Data Business' if they collect or manage either personal or non-personal data above a certain threshold, and it anticipates their being required to share metadata and underlying data under appropriate regulations. To exemplify, it states that metadata which hospitals collect about each patient such as their name, age, weight, and symptoms may be metadata shareable under a framework detailed in § 8. 

There are two primary concerns which the recommendations relating to 'Data Business' raise. Firstly, as the report itself explicitly states in § 6.1(iii), a Data Business 'is a horizontal classification and not an independent industry sector' and, "Existing businesses in various sectors that collect data will get categorized as a Data Business." This essentially means that there are no special provisions made for persons whose entire business is the collection, compilation, and perhaps analysis, of existing data, and it is entirely unclear whether failing to also treat data businesses as a separate vertical could ultimately result in businesses which deal exclusively in data becoming unsustainable as businesses.

Secondly, while the example of health metadata in the report is relatively straightforward to the extent that one would expect hospitals to collect the sort of data mentioned in the example, it is also true that the determination of data fields in some businesses is the result of conducting extensive market surveys and analyses, perhaps looking into consumer preferences. In such circumstances, forcing a company to share metadata could allow third parties to unfairly free-ride on the expertise, investment, and labour of whoever has compiled the metadata. Such compilations are also potentially protectable under copyright law, and it is not inconceivable that a requirement to share them could conflict with the rights granted to copyright owners under the 1957 Indian Copyright Act.

Further, § 6.3(i) of the report states: "The meta-data about data being collected, stored and processed by the Data Business is stored digitally in meta-data directories in India. Open access is provided within India to these meta-data directories." The meaning of 'within India' is not clear at all, and could be read to mean either access by anyone within India or access for use within India. In particular, the use of data obtained via the sharing mandates of the anticipated NPD regime is not restricted to non-competitive purposes, due to which it has the potential to ride roughshod over the principles of competition law. Later, § 6.4(i) of the report clarifies that organizations registered in India will have open access to a proposed meta-data repository, and that they can query the repository but not download metadata. Unfortunately, this does not resolve concerns about a level playing field or anti-competitive practices or, for that matter, explain either why individuals have been restricted from requesting data or how generally proscribing the download of metadata offers adequate protection to those who have generated it.

Accountability 

That data is non-rivalrous and amenable to progressive use appears to have been accepted as axiomatic by the report. The idea appears to underpin virtually every aspect of the anticipated non-personal data governance framework, and it is reflected in the enthusiasm with which the report suggests that data be available for sharing. In fact, § 7.2(ii) of the report, amongst other things, states: "Data being non-rivalrous, the value of data may be consumed by several organizations and communities, without degrading its value to the relevant community."

The mechanisms to protect data from being used for purposes that would subvert social harmony or derogate from intellectual property laws are not especially robust although the report does acknowledge and seek to be compatible with the latter (not least in § 8.6), and to prevent harms arising from the unrestrained use of data by suggesting processes through which proposed uses of data may be vetted. 

Significantly, there are no provisions worth mentioning in the report about affixing any sort of accountability, let alone liability, for the mishandling of non-personal data. The lack of potential liability is a theme that appears to run through the length of the report not just in relation to data principals (who are largely beyond the purview of the NPD regime) but also in relation to data- and metadata sets though it is not always a significant concern. For example, it is not clear what penalties a company's failure to disclose or share metadata could attract but, considering that the requirement to share metadata may itself be open to challenge possibly on account of incompatibility with intellectual property laws, affixing liability for non-disclosure is probably a secondary issue. In other cases, however, the lack of accountability is a concern. § 7.4(iv) of the report states that 'data custodians have a responsibility towards responsible data stewardship and a 'duty of care' to the concerned community in relation to handling non-personal data related to it', for example, but apart from requiring custodians to have mechanisms to swiftly remedy accidental harms resulting from leveraging non-personal data, there are no clear mechanisms through which data custodians who fail in the discharge of their responsibilities may be held to account. So too is the case of the data trustee who fails in its responsibility, defined in § 7.7(ii) of the report, to ensure that persons are not harmed by the de-anonymisation of non-personal data. 

There are several issues that the report does not deal with, such as the sharing of data for a business purpose (as § 8.3 of the report highlights). It isn't clear why, when a data governance set-up is being formulated from scratch, certain transactions such as those involving data sharing between businesses should seemingly be left to self-regulation, despite the fact that doing so also removes access to such protections as the law may offer to those whose data is mishandled. Currently, the framework to support the development of a data protection and governance regime is nowhere near complete.

The idea that data is apolitical in itself and endlessly available to propel human progress into previously uncharted territories is, frankly, terrifying not least because it is unsupportable. Unfortunately, this is the understanding of data that seems to be creeping into policy discourse which, though it acknowledges that data can potentially wreak havoc, fails to productively engage with how best to prevent and address possible harms. 

This post is by Nandita Saikia and was first published at IN Content Law