Big Data refers to very large volumes of data. Every day we produce a great deal of it: emails, social media posts, online articles and videos, GPS signals and more. Taken together, this data is what we call Big Data.
Faced with this huge amount of data, we must be able to navigate and process it.
In practice, Big Data is also the ability to process large volumes of information with increasingly standard computing resources.
These volumes of data are of interest to many sectors, such as tourism, commerce, advertising, genetics, astronomy and human resources. Big Data can be considered the new black gold of the digital age.
To manage Big Data, it is necessary to collect, process and analyze the data, then act on the results of that analysis.
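To process Big Data, we often use NoSQL databases (key/value, column-oriented, document or graph stores). For very large or loosely structured datasets, these can scale better than traditional SQL databases, which organize data into tables of rows and columns.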
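As a rough illustration of the difference between the two data models, the sketch below stores the same record first in a relational table (using Python's built-in `sqlite3` module) and then in a key/value layout. The `users` table, the `user:1` key, the field names and the use of a plain dictionary as a stand-in for a real key/value store are all assumptions made for the example, and the sketch says nothing about performance.

```python
import json
import sqlite3

# Relational model: a fixed schema, rows stored in a table, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute(
    "INSERT INTO users (id, name, city) VALUES (?, ?, ?)",
    (1, "Alice", "Paris"),
)
row = conn.execute("SELECT name, city FROM users WHERE id = ?", (1,)).fetchone()

# Key/value model: an opaque value looked up by its key, with no fixed schema.
# A dict stands in for a real key/value store here (assumption).
kv_store = {}
kv_store["user:1"] = json.dumps(
    {"name": "Alice", "city": "Paris", "tags": ["travel"]}
)
value = json.loads(kv_store["user:1"])

print(row)
print(value["city"])
```

In the key/value version, the extra `tags` field can be added to one record without changing any schema, which is one reason this model suits varied data.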
The Big Data phenomenon can be characterized by the five Vs: Volume, Velocity, Variety, Veracity and Value.
Volume is the mass of information produced every second. In 2000, 20% of data was digital and the rest was analog. By 2015, 98% of data was digital. This data is produced by personal computers, smartphones, tablets and other devices.
Every minute, we produce:
- 216,000 photos on Instagram
- 270,000 tweets
- 30 billion instant messages
- 200 million emails
Much of this data is collected by two companies:
- Google, with Gmail, the Google search engine, Android and YouTube
- Facebook, with Instagram and WhatsApp
Both companies accumulate and process this data in order to retain users, which in turn lets them gather as much data as possible and monetize it through their advertisers.
Velocity refers to the speed at which new data is generated, collected and processed.
Variety refers to the different types of data, such as images, videos, text, voice recordings and others. About 80% of this data is unstructured; the remaining 20% is structured data stored in relational tables.
Veracity represents the credibility and reliability of the data collected. Because such a large amount of data is collected, not all of it is authentic or accurate. For example, on Twitter, some messages may contain typos, abbreviations or colloquial language.
Value is the profit that can be derived from the use of Big Data.
Now that we know what Big Data is, it is worth noting that in practice the volume of data is not the priority. The priority is to have the right data, which we could call Right Data.
Uber and Netflix are two companies that illustrate this principle.
Uber is an American company that develops mobile applications connecting users with drivers for transportation services. Uber collects a massive amount of data from the mobile application used by its drivers and customers, but it does not simply collect everything: it focuses on the relevant data that lets it connect customers (consumers) and drivers (service providers). Identifying the customer’s need (getting a car) and the geographic location of that need (where to pick the customer up) are the two pieces of Right Data that allowed Uber to seriously challenge traditional taxis.
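A minimal sketch of how a ride request and a location could be enough to pair a customer with a driver. This is not Uber's actual matching algorithm: the driver names, the coordinates and the rule "pick the driver with the smallest great-circle distance" are assumptions chosen for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_driver(customer, drivers):
    """Return the name of the driver closest to the customer."""
    return min(drivers, key=lambda name: haversine_km(*customer, *drivers[name]))


# Hypothetical positions as (latitude, longitude) pairs in Paris.
customer = (48.8584, 2.2945)  # near the Eiffel Tower
drivers = {
    "driver_a": (48.8606, 2.3376),  # near the Louvre
    "driver_b": (48.8738, 2.2950),  # near the Arc de Triomphe
    "driver_c": (48.8530, 2.3499),  # near Notre-Dame
}

print(nearest_driver(customer, drivers))  # driver_b
```

Only two inputs are needed here, the request itself and a pair of coordinates, which is the point of the Right Data idea: a small amount of relevant data is enough to make the match.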
Netflix is an American company that offers an online streaming platform for movies and television series. In 2016, 71 million people used the Netflix streaming service. These users generate data that is collected and analyzed to better understand viewers’ habits. Users’ preferences for particular types of film are Netflix’s Right Data: the company has set up a recommendation engine and assigned keywords to each television series. Based on what users have liked the most, it offers them keyword-based suggestions.
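The idea of keyword-based suggestions can be sketched in a few lines. This is a toy illustration, not Netflix's recommendation engine: the titles, the keywords and the scoring rule (count the keywords shared with titles the user liked) are all hypothetical.

```python
# Hypothetical catalog: each title is tagged with a set of keywords.
catalog = {
    "Space Saga": {"sci-fi", "adventure", "space"},
    "Courtroom Nights": {"drama", "legal"},
    "Galaxy Patrol": {"sci-fi", "space", "comedy"},
    "Kitchen Stories": {"documentary", "food"},
}


def recommend(liked, catalog, top_n=2):
    """Rank unseen titles by how many keywords they share with liked titles."""
    # Collect every keyword attached to a title the user liked.
    liked_keywords = set().union(*(catalog[title] for title in liked))
    # Score each title the user has not liked yet by keyword overlap.
    scores = {
        title: len(keywords & liked_keywords)
        for title, keywords in catalog.items()
        if title not in liked
    }
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    # Titles with no shared keyword are never suggested.
    return [title for title in ranked if scores[title] > 0][:top_n]


print(recommend(["Space Saga"], catalog))  # ['Galaxy Patrol']
```

A user who liked "Space Saga" is offered "Galaxy Patrol" because the two share the keywords "sci-fi" and "space", while titles with no keyword in common are left out.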
In conclusion, it is better to collect relevant data that can be processed effectively than a large body of data in which relevant and irrelevant data are mixed together.