Data Breaches – So How Bad is it Getting?

Now that I am back poking around cybersecurity after a year's hiatus, the first order of business is figuring out just how bad it is really getting in terms of the number and scope of data breaches.

But let’s first define what a “data breach” is. I found this to be a reasonable definition: “a data breach is a confirmed incident in which sensitive, confidential or otherwise protected data has been accessed and/or disclosed in an unauthorized fashion. Data breaches may involve personal health information (PHI), personally identifiable information (PII), trade secrets or intellectual property.”

Before I give you the stats I was able to dig up, the first thing that became readily apparent in doing this exercise is that there is really no definitive source of breach reporting, at least for the US. The one exception is the US healthcare industry, thanks to HIPAA (more on that below and in future blogs). For other verticals there is no federal statute requiring breach notification, and the patchwork of state laws has plenty of holes and little enforcement, so, as Politifact notes, “it [is] nearly impossible to know the full extent to which data breaches have impacted American consumers and businesses.” That means I have to present a number of stats from a number of sources. Many of these sources appear to have some bias, and none of them can see the full extent of the breaches occurring, because they all rely on public disclosure. But onwards …

Some of the more frequently cited research into data breach trends comes from the Identity Theft Resource Center (ITRC). Statista took their data and graphed it, per the screenshot below. ITRC calculated that the number of publicly disclosed breaches in the US actually decreased from 2017 to 2018, but the number of records compromised doubled over the same period, to a whopping 446 million compromised records. Even so, the number of breaches in 2018 is the second highest since 2005. ITRC is a non-profit that appears to be sponsored by a number of cybersecurity and identity theft protection vendors, so in theory it would be motivated to count as many breaches and exposed records as possible. But ITRC admits that “the actual total number of exposed records likely exceeds the reported number substantially” because its report is based on what organizations have disclosed.


Another source of breach data I found is a site called the Breach Level Index, which appears to be part of a marketing awareness campaign by cybersecurity vendor Gemalto. Like the ITRC report, it relies on public disclosures, but it factors in worldwide breaches (vs. just the US). And like the ITRC report, it shows the number of publicly disclosed breaches leveling off while the number of records exposed grows each year, i.e. same number of breaches, but each breach has a bigger bang for the buck. I took their numbers and created the spreadsheet below. Unfortunately their data analysis stopped midway through 2018 (maybe that part of the marketing budget got cut?).

Another security vendor, Risk Based Security (RBS), does an annual report based on their scouring the internet for reports of breaches. They too found that the number of breaches on a worldwide basis has flattened out and 2018 saw a drop in the number of records exposed.


But this may be because the number of records exposed in 2016 and 2017 was so massive compared to prior years. In those years we saw some big whale breaches (e.g. Yahoo at 3.5 billion and RiverCity at 1.5 billion; see chart below). So in theory it would be hard for 2018 to match (but ironically it almost did). And RBS recently reported that, comparing the first six months of 2019 with the first six months of 2018, the number of reported breaches was up 54% and the number of exposed records was up 52%.


Finally, the Privacy Rights Clearinghouse documents that the number of publicly disclosed breaches has remained the same over the last few years …


… and there has been a dip in 2018 in the number of breached records from the all-time high in 2016 (thanks Yahoo!), but a significant rise from 2010-2013 levels. And note we are still talking about 1.5 billion records in 2018.

Same source as above (Politifact)

So what is there to conclude?

  1. At first glance, based on these four sources, the number of breaches appears to have held steady over the last 2-3 years while the number of compromised records is growing dramatically, i.e. the hackers are getting more bang for their buck.
  2. Then again, one could argue that because there is more pressure on entities to report their breaches (due to laws such as Europe’s General Data Protection Regulation aka GDPR, the upcoming California Consumer Privacy Act aka CCPA and other recently enacted state laws), the numbers from pre-2017 may have been understated due to lack of public disclosure. So maybe the growth in the number of compromised records and breaches is not as dramatic as one might think. Sure, I can buy that. But the numbers of breached records are still just insane. And the one initial data point in the post-GDPR world, i.e. the numbers from the RBS first-half 2019 report, does show breach activity still going up roughly 50% year over year.
  3. Speaking of insanity, it is insane that in the US we don’t have a central and authoritative repository to which breaches must be reported. We have federal statistics on livestock, the weather, etc. but not on one of the biggest threats to our economy and national security. We need this to fully understand the magnitude of the problem and to use the data to help solve it by looking at attack vectors. But probably equally important, it would be great if Americans had a centralized location to see if they were breached, or, at the very least, if one of their service providers has been breached.

Let’s talk about point #3 a bit more. It is interesting that in the European Union, with the implementation of the GDPR and its requirements for breach reporting, the European Data Protection Board can be very exact about the number of breaches across all member countries (the EU’s ability to fine entities for not reporting data breaches no doubt motivates disclosure). They reported the specific number of 89,271 from May of 2018 to May of 2019 (the first full year of GDPR being enforced). This number in Europe is as much as 80x the numbers that the above studies have shown for US companies in roughly the same time period, so we are clearly not getting an accurate look into the breach landscape without a national breach notification law.
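To make the scale of that gap concrete, here is a quick back-of-the-envelope sketch of the arithmetic. It uses only the two figures quoted above (89,271 notifications and "as much as 80x"); the function and constant names are my own, purely for illustration, and the result is a rough implied figure, not a number taken from any of the reports.

```python
def implied_baseline(reported: int, multiple: float) -> float:
    """Return the count that `reported` is `multiple` times larger than."""
    if multiple <= 0:
        raise ValueError("multiple must be positive")
    return reported / multiple


# Figures quoted in this post; the names are illustrative only.
EU_GDPR_NOTIFICATIONS = 89_271  # EDPB count, first year of GDPR enforcement
MAX_MULTIPLE = 80               # "as much as 80x" the US numbers

us_implied = implied_baseline(EU_GDPR_NOTIFICATIONS, MAX_MULTIPLE)
print(f"Implied US publicly disclosed breaches: about {us_implied:,.0f}")
```

In other words, the US studies are counting on the order of a thousand breaches over a period in which EU regulators logged nearly ninety thousand notifications.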

Ironically, in the US we have great visibility into healthcare breaches via HIPAA. Look at the level of detail on this webpage for breaches in this vertical: there is a requirement to notify the Department of Health and Human Services (HHS), and failure to do so can lead to fines. I posted a few screenshots below.

US Healthcare Breaches. Source:
US healthcare records breached by year. Source: same as above
Source: same as above

Sigh, would love to be able to have the same across the board, not just for healthcare. I will definitely be blogging more on breach notifications when I drill into HIPAA, GDPR and the CCPA, and my belief that we need a federal Privacy and Data Protection Act. But next up I think I will drill down on trends in cybersecurity spending to see what directions those trend lines are going.

