NetworkTigers on data mining and what it means for cybersecurity.
Data mining is a controversial topic in many circles. While many companies and research entities have made powerful use of this data collection and analysis technique, consumer confidence in companies that utilize data mining is easily shaken. Privacy violations, concerns over monetization, and the high potential for misuse all contribute to common cybersecurity concerns regarding data mining.
What is data mining?
Data mining is the practice of collecting and categorizing subsets of information. Data packets generated online by users can be assembled and used to predict trends, forecast upcoming events, and understand existing patterns. In and of itself, data mining is simply a collection and analysis tool that businesses and governments often use to understand their consumers and constituents.
It is important to understand that data mining differs from a data breach, even though the two can go hand in hand. Data breaches often occur when third parties, usually cybercriminal groups or hackers, access mined information. Data breaches can involve copying, viewing, accessing, or using stolen or private data. Data mining, on the other hand, is simply the collection and analysis of raw data. It can be broken down into the following five steps:
- Collection and accumulation: Data can be mined through user activity or purchased through third-party collectors or external sources like social media sites. Data may be warehoused for periods before it is subjected to computer processing.
- Preparation: Data may be formatted differently or involve incorrect statistics when it is collected. During the preparation phase, data will be standardized for analysis.
- Analysis: Data will be grouped into clusters by researchers or companies. Data points will be used to create predictive or analytical models.
- Evaluation: This can be understood as the checking phase. Data will be compared against other industry assessments and metrics.
- Interpretation and consultation: Data can then be presented to facilitate decision-making, such as where investment should occur and what areas need improvement.
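The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular company's pipeline: the sample records, the benchmark figures, and the function names (prepare, analyze, evaluate, interpret) are all hypothetical, and collection is represented by a hard-coded list.

```python
from collections import defaultdict
from statistics import mean

# Collection: hypothetical raw records (user_id, category, amount as text).
RAW_RECORDS = [
    ("u1", "Groceries", "12.50"),
    ("u1", "groceries ", "7.25"),
    ("u2", "ELECTRONICS", "199.99"),
    ("u2", "electronics", "n/a"),  # unparseable value, dropped in preparation
    ("u3", "Groceries", "30.00"),
]

# Hypothetical industry benchmarks used in the evaluation step.
BENCHMARKS = {"groceries": 15.0, "electronics": 100.0}


def prepare(records):
    """Preparation: standardize category labels and drop bad amounts."""
    cleaned = []
    for user_id, category, amount in records:
        try:
            value = float(amount)
        except ValueError:
            continue
        cleaned.append((user_id, category.strip().lower(), value))
    return cleaned


def analyze(cleaned):
    """Analysis: group amounts by category and compute the average spend."""
    groups = defaultdict(list)
    for _, category, value in cleaned:
        groups[category].append(value)
    return {category: mean(values) for category, values in groups.items()}


def evaluate(averages, benchmarks, tolerance=0.5):
    """Evaluation: flag averages that differ from the benchmark by more
    than `tolerance`, expressed as a fraction of the benchmark."""
    flagged = {}
    for category, average in averages.items():
        benchmark = benchmarks.get(category)
        if benchmark is not None:
            flagged[category] = abs(average - benchmark) / benchmark > tolerance
    return flagged


def interpret(averages, flagged):
    """Interpretation: turn the results into lines a decision-maker can read."""
    lines = []
    for category in sorted(averages):
        if category not in flagged:
            note = "no benchmark available"
        elif flagged[category]:
            note = "needs review"
        else:
            note = "in line with benchmark"
        lines.append(f"{category}: average {averages[category]:.2f} ({note})")
    return lines


cleaned = prepare(RAW_RECORDS)
averages = analyze(cleaned)
flagged = evaluate(averages, BENCHMARKS)
for line in interpret(averages, flagged):
    print(line)
```

In a real deployment each stage would be far larger, but the shape is the same: raw records go in, standardized records are grouped and modeled, and the results are checked against outside metrics before anyone acts on them.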
Is data mining illegal?
In most circumstances, data mining is not illegal. However, it has come under scrutiny in recent years because user data has been collected and sold to third parties without users' knowledge or consent. Most data mining is not currently regulated by federal law in the United States. However, the European Union, through the General Data Protection Regulation (GDPR), and the state of California, through the California Consumer Privacy Act (CCPA), have passed laws to create transparency over what kinds of data are being collected and how they are being used.
Is data mining good?
Whether or not data mining is a good practice depends on who you ask. Data mining can be used to personalize wearable devices, improve user experience, and even save lives through advancements in medical technology and preemptive treatments. Data mining is increasingly used in city planning, as when Uber shared its user data with City of Boston officials to analyze common traffic patterns and improve transportation options. It is also being incorporated more seamlessly into everyday life as the Internet of Things expands. One easy example is the Google Nest thermostat, which adjusts its settings autonomously as it learns its user’s preferences through data collection.
On the flip side, many consumers and internet users are concerned about how much information they are handing to companies that seek to profit from it, when they even realize the transaction is taking place. A Harvard Business Review study shows that only a minority of people are aware of the kinds of data they share:
- 27% realize that they are sharing their social network friends list
- 25% are aware when they are sharing their location through connected apps and websites
- 23% are aware that their web searches are subject to data mining
- 18% are aware that their communication history, including chat logs, can be subject to collection
- 17% know their IP address is not always private
- 14% realize their web-surfing activity can be data mined
Data mining feels too much like surveillance to be a comfortable concept for many. For instance, Target raised eyebrows as far back as 2012 when it sent coupons for cribs and maternity wear to the households of women who had chosen to keep their pregnancies private, effectively revealing them. The company inferred that the women were pregnant, and estimated how far along they were, from whether they had been flagged as purchasing a certain number of 25 “pregnancy indicator” products, such as unscented lotion or specific vitamins.
Understanding data mining and cybersecurity
Data mining is a powerful tool for cybersecurity efforts. Aggregating data can help identify the weak spots in cybersecurity practices and create relevant data sets for AI security methods. Data mining can pinpoint commonly used or recycled passwords, find patterns in phishing attempts or attacks by threat actors, and detect illegal activity happening over the internet. As long as data is appropriately collected, sorted, and prepared, it can be valuable in detecting fraud and filtering spam.
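As one small illustration of the recycled-password case, the sketch below counts how many accounts share the same password in a hypothetical audit sample. The credential list, the function names, and the min_accounts threshold are assumptions made for the example. SHA-256 is used here only so that the report does not contain plaintext; it is not a recommendation for how passwords should be stored.

```python
import hashlib
from collections import Counter

# Hypothetical sample of (account, password) pairs from a credential audit.
CREDENTIALS = [
    ("alice", "Summer2023!"),
    ("bob", "password123"),
    ("carol", "password123"),
    ("dave", "Summer2023!"),
    ("erin", "x9$Lq!vR2m"),
    ("frank", "password123"),
]


def fingerprint(password):
    """Return a SHA-256 hex digest so the report never shows plaintext."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def recycled_passwords(credentials, min_accounts=2):
    """Count accounts per password fingerprint and keep the fingerprints
    shared by at least `min_accounts` accounts."""
    counts = Counter(fingerprint(password) for _, password in credentials)
    return {digest: n for digest, n in counts.most_common() if n >= min_accounts}


report = recycled_passwords(CREDENTIALS)
for digest, n in report.items():
    print(f"{digest[:12]}... is shared by {n} accounts")
```

The same counting approach extends to other patterns mentioned above, such as tallying sender domains or subject lines across reported phishing messages.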
On the other hand, data mining creates a vulnerability in systems. Aggregating data, especially when collections are not adequately protected, places consumer information at greater risk of a third-party attack. Data mining collections represent the mother lode for hackers, because a single successful breach gives access to a large volume of aggregated data. Such a collection can expose passwords, personal information, financial data, consumer habits, medical histories, location information, and more, all in one fell swoop. A breach or improper use of consumer data can open a company to legal action, as in the Cambridge Analytica scandal of the 2010s, which led Facebook to agree in 2019 to a $100 million settlement with the U.S. Securities and Exchange Commission.
The takeaway on data mining
The pros of data mining are extensive, but so are the cons. Data mining done right is expensive. Data may be relatively easy to collect on today’s internet, but storing, sorting, and protecting consumer data is another practice altogether. Businesses seeking to engage in data mining, mainly to track down unexplained payments and protect against cyberthreats, must be prepared to protect their own collection and analytical systems for data mining to be practical and effective.
