A committee constituted by the Ministry of Electronics and Information Technology issued a draft report on non-personal data governance in India in July 2020, which it revised in December 2020 following the receipt of feedback from stakeholders. The December 2020 report envisages parallel regimes for the governance of personal and non-personal data, although it does not appear to take into account how fluid the two forms of data are. The report's underlying principles also seem fairly ambiguous; the document suggests that 'regulation in India to establish rights over non-personal data collected and created in India' be 'simple, digital and unambiguous' in § 3.4(v) without specifying what it means by 'digital' or, indeed, why the regulations should be digital.
The comments in this write-up engage with the principles which appear to underlie the December 2020 Draft Report by the Committee of Experts on Non-Personal Data Governance Framework, and which are set to underpin the framework for the governance of non-personal data in India. They do not deal with the details of terminology and responsible authorities currently in vogue since, at this stage, those details remain relatively malleable.
Sovereignty
A theme that has run through discussions on data governance in recent years is
'data sovereignty' and, true to form, this report too states that
'sovereignty' is one of its guiding principles, in § 3.4(i), defining it by stating: "India has rights over data of India, its people and organisations." This
is a somewhat unorthodox approach to the subject of sovereignty, and the
claim that India has rights over 'data of India' implies that the
data to which India has rights could be construed in a manner
that is unsustainably (and perhaps illegitimately) broad, including rights to all
data pertaining to India regardless of where it was generated and all data
generated in India regardless of who generated or collated it.
Considering what the report's definition of 'sovereignty' reveals, it would perhaps
be more useful to understand the term as a negative: what does it mean not to
be colonised?
In the context of data, and in the broader context of intellectual
property laws, non-colonisation would likely mean leveraging Indian data for the
benefit of Indians and protecting them from predatory foreign influences. Such
an approach, although it says little about invasive intra-state approaches or
unfair competition by dominant players within the domestic market, would at least have the benefit of being
mindful of individual autonomy and agency without the state seeming to
make claims to 'its people and organisations' which sound strangely
paternalistic and proprietary in what is a democratic republic.
Understanding
'sovereignty' through the lens of non-colonisation in this context would have
the added benefit of being in consonance with the stated aims of the anticipated data governance framework under discussion which are relevant to the subject: 'to create an enforcing framework
that establishes rights of India and its communities over its non-personal data'
and 'to create an enabling framework that ensures unlocking economic benefit
from non-personal data for India and its people' per § 3.5 of the report.
What
is inescapable on reading these aims, however, is that the primary concern of the
anticipated governance framework for non-personal data is the community and not the individual. While this
could be read as reflecting Indian culture often prioritising the community over the
individual, it is more likely that the emphasis on community rights stems from
a theorised bifurcation of data into personal and non-personal data reflected in the report's text.
Parallel Regimes
The report is crystal clear about its envisaging two exclusive regimes operating
to govern what it sees as two entirely different forms of data: personal and
non-personal data, going so far as to partially explain what the latter is by
saying: "When the data is not ‘Personal Data’ (as defined under the PDP Bill),
or the data is without any Personally Identifiable Information (PII), it is
considered Non-Personal Data," in § 4.1.
The explanation contained in § 4.1 is
entirely dependent on the PDP Bill and does not adequately take into account
hybrid data or non-personal data generated from personal data. Going through the
report, it is, for example, not immediately clear what steps an individual could
meaningfully take to address the harm caused by the personalisation of
non-personal data especially since the focus of the anticipated non-personal data regime is simply not on individual rights.
Further, the compartmentalisation of the two forms of
data appears to be far from watertight. 'Government hospital collecting health
data of a patient (anonymised)' features in examples of non-personal data
collected by public entities to take just one item from a table in § 4 of the
report. The table goes on to suggest that the 'data collecting mechanisms,
devices, instruments, sensors, etc.' in this case are public, and that the data
is collected from or about a subject in a private space. It is entirely unclear
what this is intended to mean not least because patient data cannot be anonymous
at the point of initial collection.
It is possible that the two regimes
governing personal and non-personal data will ultimately be complementary, and
that concerns about accountability and harm reversal, where possible, are
misplaced but, given that the proposals relating to both regimes are currently
fluid, it is probably not prudent to simply trust that problems which may arise
under one of them will somehow be addressed by the other. These comments therefore attempt to consider the non-personal data regime envisaged in the December 2020 report on its own terms, particularly since it is expected to function independently of the
personal data regime.
The report does recognise that the two forms of data can become enmeshed with each other; it states that 'mixed datasets that typically
have inextricably linked personal and non-personal data will be governed by the
PDP Bill' in § 5.1(v) and clarifies that de-anonymised data will be governed by the
PDP regime in § 5.2. However, at an enabling level, this does not meaningfully
acknowledge how permeable the compartments holding the two forms of data are or
how easy it is for one form to be converted into the other while, at the
enforcement level, the absence of any system to alert data principals of the
de-anonymisation of their data means that individuals whose data has been
compromised after anonymisation would likely not know that a problem had arisen.
The report 'recommends that data collectors at the time of collecting personal
data should provide a notice and offer the data principal the option to opt out
of data anonymization' in § 5.4. The opt-out mechanism is an expression of the privacy by consent
model and allows for the default to be that data would be collected for
depersonalisation and unspecified post-anonymisation use. However, in a country
where technological literacy is limited, it is perhaps worth prioritising a
privacy by design model which would, instead of having data principals opt out
of having their data be de-personalised and used, require that the default be
to disallow the de-personalisation and subsequent use of once-personal data
without the informed consent of relevant data principals having been obtained.
The Business of Data
The report labels businesses as 'Data Business' if they collect or manage either
personal or non-personal data above a certain threshold, and it anticipates
their being required to share metadata and underlying data under appropriate
regulations. To exemplify, it states that metadata which hospitals collect about
each patient such as their name, age, weight, and symptoms may be metadata
shareable under a framework detailed in § 8.
There are two primary concerns
which the recommendations relating to 'Data Business' raise. Firstly, as the
report itself explicitly states in § 6.1(iii), a Data Business 'is a horizontal
classification and not an independent industry sector' and, "Existing businesses
in various sectors that collect data will get categorized as a Data Business."
This essentially means that there are no special provisions made for persons
whose entire business is the collection, compilation, and perhaps analysis, of
existing data, and it is entirely unclear whether failing to also treat data
businesses as a separate vertical could ultimately result in businesses which deal exclusively in data becoming
unsustainable as businesses.
Secondly, while the example of health metadata in the report is
relatively straightforward to the extent that one would expect hospitals to collect the sort
of data mentioned in the example, it is also true that the determination of data
fields in some businesses is the result of conducting extensive market surveys
and analyses perhaps looking into consumer preferences. In such circumstances,
forcing a company to share metadata could potentially allow third parties to
unfairly freeride on the expertise, investment, and labour of whoever has
compiled the metadata. Moreover, such compilations are potentially
protectable under copyright law, and it is not inconceivable that a requirement
to share them could conflict with the rights granted to copyright
owners under the 1957 Indian Copyright Act.
Further, § 6.3(i) of the report states: "The
meta-data about data being collected, stored and processed by the Data Business
is stored digitally in meta-data directories in India. Open access is provided
within India to these meta-data directories." The meaning of 'within India' is
not clear at all, and could be read to mean to anyone within India or for use
within India. In particular, the use of data obtained via the sharing mandates of the anticipated non-personal data (NPD) regime is not restricted to non-competitive purposes, and so has the potential to ride roughshod over the principles of competition
law. Later, § 6.4(i) of the report clarifies that organizations registered in India will
have open access to a proposed meta-data repository, and that they can query the
repository but not download metadata. Unfortunately, this does not
resolve concerns about a level playing field or anti-competitive practices or,
for that matter, explain either why individuals have been restricted from requesting data or how
generally proscribing the download of metadata offers adequate protection to those who
have generated it.
Accountability
That data is non-rivalrous and amenable to progressive use appears to have been
accepted as axiomatic by the report. The idea appears to underpin virtually
every aspect of the anticipated non-personal data governance framework, and it
is reflected in the enthusiasm with which the report suggests that data be
available for sharing. In fact, § 7.2(ii) of the report, amongst other things, states: "Data
being non-rivalrous, the value of data may be consumed by several organizations
and communities, without degrading its value to the relevant community."
The
mechanisms to protect data from being used for purposes that would subvert
social harmony or derogate from intellectual property laws are not especially
robust although the report does acknowledge and seek to be compatible with the
latter (not least in § 8.6), and to prevent harms arising from the unrestrained
use of data by suggesting processes through which proposed uses of data may be
vetted.
Significantly, there are no provisions worth mentioning in the report
about affixing any sort of accountability, let alone liability, for the
mishandling of non-personal data. The lack of potential liability is a theme that
appears to run through the length of the report not just in relation to data
principals (who are largely beyond the purview of the NPD regime) but also in relation to data- and metadata sets though it is not
always a significant concern. For example, it is not clear what penalties a
company's failure to disclose or share metadata could attract but, considering
that the requirement to share metadata may itself be open to challenge possibly on account of incompatibility with intellectual property laws, affixing
liability for non-disclosure is probably a secondary issue. In other cases, however, the lack of
accountability is a concern. § 7.4(iv) of the report states that 'data
custodians have a responsibility towards responsible data stewardship and a
'duty of care' to the concerned community in relation to handling non-personal
data related to it', for example, but apart from requiring custodians to have
mechanisms to swiftly remedy accidental harms resulting from leveraging
non-personal data, there are no clear mechanisms through which data custodians
who fail in the discharge of their responsibilities may be held to account. So
too is the case of the data trustee who fails in its responsibility, defined in
§ 7.7(ii) of the report, to ensure that persons are not harmed by the de-anonymisation of
non-personal data.
There are several issues that the report does not deal with
such as the sharing of data for a business purpose (as § 8.3 of the report highlights). It isn't
clear why, when a data governance set-up is being formulated from scratch,
certain transactions such as those involving data sharing between businesses
should seemingly be left to self-regulation despite the fact that doing so also removes access to
such protections as the law may offer to those whose data is mishandled. Currently, the framework to support the development of a data protection and governance regime is nowhere near complete.
The idea that data is apolitical in itself and endlessly available to propel human progress into previously uncharted territories is, frankly, terrifying not least because it is unsupportable. Unfortunately, this is the understanding of data that seems to be creeping into policy discourse which, though it acknowledges that data can potentially wreak havoc, fails to productively engage with how best to prevent and address possible harms.
This post is by Nandita Saikia and was first published at IN Content Law.