Sources of the Most Popular Leak Databases

Where does the information on leak database websites come from? Insiders and malicious hackers who break into private systems and steal valuable personal information distribute the data from these breaches on the Internet. Some websites offer a single service, such as searching the Instagram leaks or the Myspace leaks. Others aggregate as much data as they can: HaveIBeenPwned, a leak database, stores over 5.9 billion credentials, which it compares against lookups from users and clients. The individual databases taken from thousands of websites serve as the tier-one sources for these leak databases:

• Webhosting services are a major source of user accounts in the Leakprobe leak database. The most popular webhosting companies represented are 000webhost (1,041,662 user accounts), Hostinger (1,000,002 user accounts), and Freedom Hosting II (380,830 user accounts).

• Another source is social networking sites, dating back as far as Myspace.com. The Myspace breach is the largest single data breach that was aggregated, with 359,420,698 user accounts. Other social networking sources are Twitter (40,000,000 user accounts), Facebook (5,000,000 user accounts), Instagram (6,000,000 user accounts), and LinkedIn (40,453,112 user accounts).

• Perhaps the largest category of sources for user accounts from data breaches is pornographic websites. These contribute at least 10% of the leak database and include Brazzers (928,072 user accounts), xHamster (378,991 user accounts), and YouPorn (1,327,567 user accounts). Related niches such as adult dating (AdultFriendFinder [3,867,997 user accounts]) and chat websites (Stickcam [531,000 user accounts]) contribute to the leak database as well.

Of the 100 most visited websites, Leakprobe has aggregated databases from dozens, including Dropbox, Google, Yahoo, and Apple. Further categorization studies can be performed on the data currently available, and such analysis could show which websites are the greatest sources of data breaches and help predict what kinds of websites are vulnerable. The scope of the problem is twofold: a single database can expose a very large number of users (the entire Yahoo database is up for sale but has not yet been added to Leakprobe), and data breaches continue to occur up to the present day (the MyFitnessPal data breach occurred very recently).
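As a rough illustration of the kind of categorization study mentioned above, the following sketch tallies the account counts quoted in this article by category. It is only a minimal example under stated assumptions: the category labels and the function names `totals_by_category` and `largest_breach` are hypothetical, the Facebook figure is the rounded "5 million" given in the text, and the tally covers only the sites named here, so it says nothing about sources the article does not list.

```python
from collections import defaultdict

# (site, category, user accounts) as quoted in the article.
# The category labels are our own grouping, not Leakprobe's.
BREACHES = [
    ("000webhost", "webhosting", 1_041_662),
    ("Hostinger", "webhosting", 1_000_002),
    ("Freedom Hosting II", "webhosting", 380_830),
    ("Myspace", "social", 359_420_698),
    ("Twitter", "social", 40_000_000),
    ("Facebook", "social", 5_000_000),  # rounded figure in the text
    ("Instagram", "social", 6_000_000),
    ("Linkedin", "social", 40_453_112),
    ("Brazzers", "adult", 928_072),
    ("Xhamster", "adult", 378_991),
    ("Youporn", "adult", 1_327_567),
    ("Adultfriendfinder", "adult dating", 3_867_997),
    ("Stickcam", "chat", 531_000),
]


def totals_by_category(breaches):
    """Sum the quoted account counts for each category."""
    totals = defaultdict(int)
    for _site, category, accounts in breaches:
        totals[category] += accounts
    return dict(totals)


def largest_breach(breaches):
    """Return the (site, category, accounts) row with the most accounts."""
    return max(breaches, key=lambda row: row[2])


if __name__ == "__main__":
    totals = totals_by_category(BREACHES)
    # Print categories from largest to smallest total.
    for category, accounts in sorted(
        totals.items(), key=lambda item: item[1], reverse=True
    ):
        print(f"{category:<14}{accounts:>14,}")
    site, _category, accounts = largest_breach(BREACHES)
    print(f"Largest single breach listed: {site} ({accounts:,} accounts)")
```

Extending the `BREACHES` list with more of the aggregated databases, or with a breach date per row, would allow the same grouping to be applied across the full data set.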