Data Dictionary Part 1: A Beginner’s Guide to Big Data Terminology

Data Dictionary is a recurring series that will help you explore key terminology surrounding Big Data.

Welcome to our Data Dictionary series, where we will explore the terms you hear flying around news stories and blog posts about Big Data.

Learning about Big Data involves familiarizing yourself with the many admittedly confusing terms thrown around in the tech industry. For those who may be a bit overwhelmed with the jargon, here are some core terms and concepts that will hopefully help simplify the language:

Cloud computing: When we say “the cloud,” we usually mean remote servers that are accessed through the internet. Rather than being stored locally on a user’s hard drive, data is stored on those servers. Cloud computing takes this one step further: software, as well as data, is stored on and run from these servers.

Internet of things: One term you may hear quite often these days is the internet of things (IoT). Closely tied to Big Data, the IoT describes appliances, machines, medical devices and other forms of smart technology that are connected to the internet, transmitting and using data to assist their functionality. Self-driving cars are a prominent example.

Machine learning: Big Data enables more than comprehensive analysis and internet-enabled products; it also can be used by developers to create programs that can improve themselves as they analyze and absorb this data. Machine learning is a core concept in artificial intelligence and data mining.
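
As a rough illustration of a program that improves as it absorbs data, the sketch below estimates a single number (a slope) from examples. The observations are made up for this example and the function name `fit_slope` is hypothetical; real machine learning systems are far more elaborate, but the idea that more data tends to give a better estimate is the same.

```python
def fit_slope(examples):
    """Least-squares slope w for the model y = w * x, learned from (x, y) pairs."""
    numerator = sum(x * y for x, y in examples)
    denominator = sum(x * x for x, _ in examples)
    return numerator / denominator

# Made-up observations that roughly follow y = 2x, with some noise.
observations = [(1, 2.6), (2, 3.7), (3, 6.2), (4, 7.9), (5, 10.1)]

slope_from_one = fit_slope(observations[:1])  # learned from a single example
slope_from_all = fit_slope(observations)      # learned from all five examples

print(slope_from_one, slope_from_all)
```

With only one example the estimate is thrown off by the noise in that example; with all five, it lands much closer to the underlying value of 2.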

Data mining: This term refers to the use of algorithms that comb through large collections of data and are programmed to seek out and extract specific types of information, such as patterns and relationships. By mining data with efficient algorithms, analysts can find what they need even in very large repositories.
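
One simple kind of information to extract is which items tend to appear together. The sketch below counts pairs of items across a handful of shopping baskets; the baskets and the function name `frequent_pairs` are invented for illustration, and real data mining tools use more sophisticated methods on far larger data sets.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_count):
    """Return item pairs that appear together in at least min_count baskets."""
    pair_counts = Counter()
    for basket in baskets:
        # Sort so that ("bread", "milk") and ("milk", "bread") count as one pair.
        for pair in combinations(sorted(set(basket)), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_count}

# Made-up purchase data.
baskets = [
    ["bread", "milk", "eggs"],
    ["bread", "milk"],
    ["milk", "eggs"],
    ["bread", "milk", "butter"],
]

print(frequent_pairs(baskets, min_count=2))
# {('bread', 'milk'): 3, ('eggs', 'milk'): 2}
```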

Analytics: An umbrella term for the many different ways of analyzing data. Analytics are used to create insights that inform decision makers. Typically carried out with algorithms that help provide evidence to support a decision, analytics are used in sports, business, healthcare and other such areas.

Algorithm: For those new to this term, which is nearly ubiquitous in discussions of computer technology today, it helps to think of an “algorithm” as a formula or set of instructions that a computer can use to execute a specific task. Big Data researchers use algorithms to help manage, search through and understand the information available to them.
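
To make "a set of instructions" concrete, here is a short sketch of one classic algorithm, binary search, which finds a value in a sorted list by repeatedly halving the range it has to look at. It is offered only as an example of what an algorithm looks like in code, not as something specific to Big Data.

```python
def binary_search(sorted_values, target):
    """Return the index of target in sorted_values, or -1 if it is absent."""
    low, high = 0, len(sorted_values) - 1
    while low <= high:
        middle = (low + high) // 2
        if sorted_values[middle] == target:
            return middle
        if sorted_values[middle] < target:
            low = middle + 1   # target can only be in the upper half
        else:
            high = middle - 1  # target can only be in the lower half
    return -1

print(binary_search([1, 3, 5, 7, 9], 7))  # 3
print(binary_search([1, 3, 5, 7, 9], 4))  # -1
```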

Hadoop: Big Data is, as you might expect, big: Large data sets can take up a tremendous amount of space if stored in one location. Hadoop is a software framework that addresses this by dividing large data sets into multiple parts so that they can be stored in more than one location, while still enabling effective analysis of the data set as a whole.
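
The idea of dividing a data set into parts and still analyzing it as a whole can be sketched in a few lines of plain Python. This is not Hadoop's API; the function names below are hypothetical, and everything runs on one machine, whereas Hadoop spreads the parts across many. The sketch only shows the pattern: split the data, process each part separately, then merge the partial results.

```python
from collections import Counter

def split_into_blocks(records, block_size):
    """Divide a list of records into consecutive blocks of at most block_size."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def count_words(block):
    """Process one block on its own: count the words it contains."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts

def merge_counts(partial_counts):
    """Combine the per-block results into one result for the whole data set."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

lines = ["big data is big", "data is everywhere", "big ideas"]
blocks = split_into_blocks(lines, block_size=2)
totals = merge_counts(count_words(block) for block in blocks)

print(totals["big"], totals["data"])  # 3 2
```

Because each block is counted independently, the per-block work could in principle happen on different machines, with only the small partial results being sent back for merging.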

Natural Language Processing: This term refers to the algorithms that developers use to help computers better understand everyday human language, as opposed to only being able to understand specific terms and artificial sentences.

SQL / NoSQL: SQL (Structured Query Language) is the language used to store and retrieve data in relational databases, which organize data into tables. In contrast, NoSQL refers to databases that do not rely on the relational table model and instead store data in other forms, such as documents, key-value pairs or graphs.
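
The difference between tables and documents is easier to see side by side. The sketch below uses Python's built-in `sqlite3` module as a stand-in for a relational database, and a plain JSON document as a stand-in for what a document-oriented NoSQL database might store. The table name, field names and values are all invented for illustration.

```python
import json
import sqlite3

# Relational style: rows in a table with fixed columns, queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Ada", "London"))
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Grace", "New York"))
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ?", ("London",)
).fetchall()
print(rows)  # [('Ada',)]

# Document style: one self-contained record that can nest related data.
document = {
    "name": "Ada",
    "city": "London",
    "orders": [{"item": "laptop", "qty": 1}],
}
serialized = json.dumps(document)   # the form in which it might be stored
restored = json.loads(serialized)
print(restored["orders"][0]["item"])  # laptop
```

In the table, every customer has the same columns and related data such as orders would live in a separate table. In the document, the orders travel with the customer record, and different documents are free to carry different fields.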

Software as a Service (SaaS): A phenomenon increasingly found in the tech world today, SaaS refers to a company’s software that is effectively leased to customers through a subscription service and accessed through the internet rather than sold in its entirety.

Structured and Unstructured Data: Structured data is information that can be organized into a predefined format, such as the rows and columns of a relational table. Unstructured data is the opposite: information that does not fit a predefined format and cannot be labelled or understood under a single heading, such as free-form text.

