The Legality of Making Sense of Data

Data ‘sovereignty’ has become an inescapable buzzword in Indian discussions on data and its use. We’re told, quite accurately, that India with its large population generates what is essentially an untapped goldmine through data, and that we should, as a country, make the most of it. Unfortunately, it isn't entirely clear what that means in its specifics or how we could achieve sovereignty whilst protecting both national interests and individual rights without subsuming one into the other. In this context, the term 'data localisation' pops up often enough but it isn't obvious that the term is tremendously meaningful with reference to contemporary technology particularly since it may not be easily implementable, if at all. Compounding the issue are doubts about our having a plethora of privacy shields at our disposal comparable to the EU-US one which protects transatlantic data flows.

It’s been a while now since we’ve been grappling with large quantities of data: as the cost of saving data has fallen and the ease of doing so has grown, our tendency to store all the data we can has increased. After all, as a hoarder might explain: what’s retained just might come in useful one day, which would be all well and good if we limited ourselves to storing only our own data. That, however, is not the case: as individuals we may also store others’ data and contribute to their data being publicly exposed without their consent.

What else is the enthusiastic automated suggestion to tag one’s friends in SocMed photos than the barely-consensual breach of another person’s privacy? It’s true enough that a person can opt to disallow others from tagging them but the process is far from intuitive, and the default is often that such tagging is permissible through the standard terms of use of SocMed platforms. Without a certain degree of techno-legal savvy, the choice not to be tagged and to have at least that one aspect of one’s privacy be protected is largely illusory.

Ultimately, the choices we make online tend to become the substance of data mining processes which find patterns in large amounts of data. Developing the parameters upon which these patterns are discerned relies heavily on one’s own beliefs, which is why data mining is not neutral.

Consider something as simple as determining what length of hemline a dress sold in a particular area should have: you have data about women, hemline lengths worn in the region, age, the percentage of married women, and religion, all of which possibly play a role in the decisions women make. Except that maybe you fail to factor in ‘climate’ which, in that area, is the single greatest determinant of the choices women make. And so you present yourself with a series of assumptions about why women choose the clothes they do without realising that the length of hemlines may have far less to do with socio-sexual practices than with the simple desire to avoid either heat-stroke or hypothermia. And that, of course, is a mistake which it is all too easy to see software techie dudebros making: the industry is not known to be especially welcoming of women.

Big data, it could be argued, presents not so much the opportunity to eliminate our personal prejudices through the use of technology but to express them at scale. And where data is collected indiscriminately, nobody is exempt from the consequences of attempts to analyse big data, which is why it is important that we have clear data protection rules and a transparent understanding of what we’re doing with data.

Legal Recognition

The intersection of human choice and technology is one which the law has been trying to traverse safely for some time now. In the landmark 2017 case of Puttaswamy v UoI in which the Supreme Court recognised that privacy is a fundamental right, it said:

Data mining with the object of ensuring that resources are properly deployed to legitimate beneficiaries is a valid ground for the state to insist on the collection of authentic data. But, the data which the state has collected has to be utilised for legitimate purposes of the state and ought not to be utilised unauthorizedly for extraneous purposes. This will ensure that the legitimate concerns of the state are duly safeguarded while, at the same time, protecting privacy concerns. Prevention and investigation of crime and protection of the revenue are among the legitimate aims of the state. Digital platforms are a vital tool of ensuring good governance in a social welfare state. Information technology – legitimately deployed is a powerful enabler in the spread of innovation and knowledge.  A distinction has been made in contemporary literature between anonymity on one hand and privacy on the other. Both anonymity and privacy prevent others from gaining access to pieces of personal information yet they do so in opposite ways. Privacy involves hiding information whereas anonymity involves hiding what makes it personal. An unauthorised parting of the medical records of an individual which have been furnished to a hospital will amount to an invasion of privacy. On the other hand, the state may assert a legitimate interest in analysing data borne from hospital records to understand and deal with a public health epidemic such as malaria or dengue to obviate a serious impact on the population. If the State preserves the anonymity of the individual it could legitimately assert a valid state interest in the preservation of public health to design appropriate policy interventions on the basis of the data available to it.
The recognition of these issues, however, did not result in immediate legislative action: a contention made before the Madras High Court in a case between the Tamil Nadu Chemists and Druggists Association and Union of India, decided in 2017, suggested that the rules under the Drugs and Cosmetics Act, 1940, governing the sale of medicines online were inadequate, and that "There is no guarantee for data privacy if the medicines are sold on-line. Disease and treatment are the private information of the patients, which cannot be made available for data mining and for commercial use by on-line pharmacists." Although the court did not delve deeply into the fear of data mining, a comparable issue made its way to the US Supreme Court in Sorrell v. IMS Health Inc., decided in 2011. In that matter, the court struck down Vermont's Prescription Confidentiality Law saying:
Vermont law restricts the sale, disclosure, and use of pharmacy records that reveal the prescribing practices of individual doctors. Vt. Stat. Ann., Tit. 18, §4631 (Supp. 2010). Subject to certain exceptions, the information may not be sold, disclosed by pharmacies for marketing purposes, or used for marketing by pharmaceutical manufacturers. Vermont argues that its prohibitions safeguard medical privacy and diminish the likelihood that marketing will lead to prescription decisions not in the best interests of patients or the State. It can be assumed that these interests are significant. Speech in aid of pharmaceutical marketing, however, is a form of expression protected by the Free Speech Clause of the First Amendment. As a consequence, Vermont’s statute must be subjected to heightened judicial scrutiny. The law cannot satisfy that standard.
There was also a dissenting opinion in the matter by Justice Breyer who was joined by Justice Ginsburg and Justice Kagan:
The Vermont statute before us adversely affects expression in one, and only one, way. It deprives pharmaceutical and data-mining companies of data, collected pursuant to the government’s regulatory mandate, that could help pharmaceutical companies create better sales messages. In my view, this effect on expression is inextricably related to a lawful governmental effort to regulate a commercial enterprise. The First Amendment does not require courts to apply a special “heightened” standard of review when reviewing such an effort. And, in any event, the statute meets the First Amendment standard this Court has previously applied when the government seeks to regulate commercial speech. For any or all of these reasons, the Court should uphold the statute as constitutional.
It is not difficult to see that there are valid arguments to be made regardless of which 'side' one is on, and sooner or later, they are arguments which India will have to determine for itself. We are not going to be able to sidestep them in the push for Digital India although, so far, limited digitalisation has meant that we have been able to watch how these issues have played out in other jurisdictions without making firm commitments ourselves.

Discovery of Information in Litigation

One of the first issues that strikes one when it comes to data mining is how to balance personal privacy against public interest. For example, in litigation, are parties allowed to collect each other's data and use it in an attempt to disprove their opponents' claims? There is very little opposition to, say, insurance companies trawling through accessible images of people's holidays should they attempt to claim compensation for having their holidays ruined by a tummy bug while their online updates tell quite a different story. However, the standards we apply as a society to holiday insurance fraud are unlikely to be the same as those which would be applied to, say, rape. Would it be fair to require an alleged victim to turn over all of their communications to a third party in order to have them sifted through to either corroborate or negate their allegations? What if those communications were publicly accessible anyway; could they then be used?

In the US, in Vasquez-Santos v Mathew, 2019 NY Slip Op 00541, decided on January 24, 2019, a semi-professional basketball player claimed that he had become disabled as the result of an automobile accident. The court allowed eDiscovery, noting that 'private social media information can be discoverable to the extent it "contradicts or conflicts with [a] plaintiff's alleged restrictions, disabilities, and losses, and other claims" (Patterson v Turner Const. Co., 88 AD3d 617, 618 [1st Dept 2011])', although it limited access to the plaintiff's accounts and devices: in time, to those items posted or sent after the accident and, in subject matter, to those items discussing or showing the plaintiff engaging in basketball or other similar physical activities.

At the moment, in India, we have no clear understanding of what is permissible and what isn't, much less of what should be permissible, and we tend to 'play it by ear' and hope for the best, so to speak. We certainly recognise electronic documents but tend not to be certain of how to handle eDiscovery as a process.

Data Quality and Consent

Amongst the most important issues when it comes to data mining are the quality of data and consent to the data having been made available. Take the basketball player in Vasquez-Santos v Mathew, for example. The relevant court order states: "Although plaintiff testified that pictures depicting him playing basketball, which were posted on social media after the accident [which he claims made him disabled], were in games played before the accident, defendant is entitled to discovery to rebut such claims and defend against plaintiff's claims of injury. That plaintiff did not take the pictures himself is of no import. He was "tagged," thus allowing him access to them, and others were sent to his phone."

The court's order seems to indicate that the data available could be bad, in which case, it would be useless to the defendant. This highlights the importance of data being kept up-to-date. However, apart from sparse provisions in the Privacy Rules, 2011, there is little in India which allows individuals to ensure that their data is in fact correct and up to date and even those provisions would do little to aid anyone in a case such as this. What data quality Rules exist tend to be piecemeal in a number of different instruments: for example, the Drugs and Cosmetics (Amendment) Rules, 2019, were notified by the Central Government on 10 January 2019, to make the following insertion into the law:

84AB. Information to be uploaded by the licensee on online portal SUGAM. (1) The licensee granted license under this Part shall register with portal SUGAM and upload information, as per the format provided in the said portal, pertaining to the licences granted for manufacture for sale or distribution of drugs and the information so provided shall be updated from time to time. (2) The information uploaded by the licensee with SUGAM portal under sub-rule (1), shall be verified by the concerned Licensing Authority.
Useful though they are in terms of helping to maintain accurate databases, disparate laws do little to enhance data quality in general, which could prove to be problematic.

The second issue which the basketball player's case highlights is one of agency and autonomy. The individual did not appear to have control over what data existed about him, possibly in part because he allowed himself to be tagged by others in SocMed posts. In essence, a version of his identity was being created by others.

While there are instances where one might have little sympathy for a person whose rights to privacy are violated by the commentary of others on their lives (would we want an adult criminal's history to be entirely swept under the carpet, for example?), the fact that an identity can be so constructed by others highlights the need to ensure a basic standard of privacy by design rather than to merely facilitate privacy by consent through standard form check-box contracts. That line of thought must, however, work side by side with an understanding of the fact that the right to control what is known of one may be outweighed by others' right to information.

In other words, one's right to be forgotten or at least not to be indexed by a search engine may be superseded by the public's right to know. This issue was considered by the ECJ in the case of Google Spain SL, Google Inc. v Agencia Española de Protección de Datos (AEPD), Mario Costeja González, which veered towards making personal information inaccessible but, with the fight to have it be made inaccessible becoming as interesting as it did, the information sought to be hidden is now inescapable.

In India too, courts have tended to respect the right to privacy where there is no overriding public interest in having information be made public, as was seen in the case of a rape victim who wanted her name redacted from a judgment that had been reproduced online. There is no clear statutory basis to enforce the right to be forgotten in India, though, and authorities are not limited to having information be de-indexed.

Offline too, similar dynamics emerge. In the 2016 case of Mrs S Uppal vs Ministry Of Health & Family, the Central Information Commission upheld the denial of access to a doctor's service book which had been sought via an RTI application along with copies of certain pages stating:
After hearing parties and perusal of record, the Commission observes that the query under RTI seeking copies of some pages of service book enmasse is nothing short of data mining and indeed is an invasion on the privacy of an individual. During the hearing the Appellant has stated that he seeks only those information from the service book which do not fall within the ambit of personal information of the employee and hence the personal information may be redacted while supplying remaining information. However, the RTI application filed by the appellant is not accordingly worded. Hence, it is advised that the Appellant may file fresh RTI application indicating his exact query. In terms of the celebrated decision of the Apex Court in the case of Girish Ramchandra Deshpande, there is no doubt that: "....the performance of an employee/officer in an organization is primarily a matter between the employee and the employer and normally those aspects are governed by the service rules which fall under the expression "personal information", the disclosure of which has no relationship to any public activity or public interest. On the other hand, the disclosure of which would cause unwarranted invasion of privacy of that individual...
Cases that make it to the courts tend to be outliers but incidents in our own ordinary, everyday lives are not. It isn't at all common these days for a new acquaintance not to run a quick online search of our names to learn more about us. Ideally, we'd like to be able to have some degree of control over what they find but that can be hard when one's doting aunt posts pictures of one covered with cake at the age of three. The photos themselves may be innocuous but they could well be images which we'd perhaps prefer not to share in, say, certain professional settings.

The Treatment of Data Mining

Thus far, Indian courts have had limited opportunity to consider data mining. In the 2016 case of Karmanya Singh Sareen v. Union of India, the Delhi High Court tangentially indicated that what would hold sway would be consent. This was in a matter that developed upon Facebook's acquisition of WhatsApp giving rise to fears that users' data would be mined and misused.

Privileging consent is largely in line with current Indian jurisprudence and legal mandate: data mining is usually preceded by the acquisition of large data sets whether by buying databases or scraping websites for data. This could potentially give rise to a variety of actions under tort law, civil law, and criminal law including those related to breach of contract, commercial misappropriation and unfair competition, unjust enrichment, breaches of privacy and, possibly as a subset of contractual breaches, violations of confidentiality, and the violation of intellectual property rights not least in terms of ‘moral rights’ violations and copyright infringement (assuming the data acquired were copyrightable) as well as in terms of trade mark violations including reverse passing off.

Critically, unauthorised website scraping does seem to be frowned upon by statute, specifically Section 43 of the Information Technology Act, 2000, which forms a solid though perhaps misplaced foundation upon which to privilege consent above all else. Given that consent may come to mean little in a world where almost no one reads EULAs and other standard form contracts, and fewer still realise that doing so may be prudent, it would probably make sense to guarantee individuals baseline rights, the violation of which would render an agreement unconscionable and consequently unenforceable, so that consent does not have the opportunity to become the be all and end all of individual rights.

Consent is easily co-opted and choice invariably inhibited. No one should be able to accidentally consent to having their own lives be derailed which, given that big data is now ubiquitous, is a real fear.

(This post is by Nandita Saikia and was first published at IN Content Law.)

Note, August 2022: The Indian position on text and data mining (TDM) is unclear. In 2019, I'd written: "Indian courts have had limited opportunity to consider data mining. In the 2016 case of Karmanya Singh Sareen v. Union of India, the Delhi High Court tangentially indicated that what would hold sway would be consent. .... [I'd commented:] Consent is easily co-opted and choice invariably inhibited. No one should be able to accidentally consent to having their own lives be derailed which, given that big data is now ubiquitous, is a real fear."

However, TDM isn't just about techno-capitalism but also about research. 

In India, there has occasionally been talk of large-scale projects to facilitate TDM of academic papers for the purpose of research despite the absence of a statutory provision to explicitly support such an endeavour. 

Section 52 of the Indian Copyright Act certainly does contain an exception for research but that exception is limited, and it doesn't appear to envisage infringement by any one person on a titanic scale to facilitate access to mined information by any class of people. In the absence of an explicit provision allowing it, having a person facilitate TDM for researchers would merely amount to an unauthorised change of gatekeeping: even if, unlike traditional publishers, a person engaging in such a project were to make access widely available, that person would still be a gatekeeper. After all, gatekeepers remain gatekeepers whether they slam doors shut or open them wide. And our law simply does not envisage changes in the identity of gatekeepers by usurpation or the like.

In this context, it's interesting to see that the UK appears to be considering amending its copyright law to introduce a broad TDM exception to copyright infringement (for any purpose), one of the factors in its decision being the fact of user-researchers otherwise being priced out of access to academic works. (See: Consultation outcome Artificial Intelligence and Intellectual Property: copyright and patents: Government response to consultation (Updated 28 June 2022); Paras. 7, 41 & 44)

This is probably an amendment in the works worth following closely given our own situation and per capita income here in India.