What is ETL? Why do I care?


The term ETL stands for Extract-Transform-Load. It may sound technical but there are valuable business applications that rely on it. ETL represents a process or set of processes to conduct a number of data management activities. Most companies around the world will use ETL services directly or indirectly through a vendor. From a high level overview, the process normally conducts an extract of data from a source, transforms that data and loads it into a system or Data Warehouse.

That’s great, but how does this ETL process help companies generate value? ETL is like any industrial process taking a raw material and building something of greater value from it. In the case of ETL the raw material starts with a single or multiple source of data sets. The data can be a list of customer investment accounts or inventory of products sold, etc. There is an endless amount of possibilities when it comes to data sources in the world today.

The extract stage is just the beginning. We take our raw material (source data) and extract it into a system that can be used in the next stage of the ETL process. The complexity of the extract stage can be as simple as copying from one file to another or a simple data import directly to a Data Base.  The complexity begins when you have multiple data sources, different file formats, the list goes on.

Another critical activity that can be conducted during the extract stage is to validate the data. If the source data used within the ETL process changes or the data is corrupted somehow, the entire downstream process will be negatively affected. Therefore it’s important that quality checks are used to confirm the source data is in the correct format and accurate to the expectations of the downstream transformation stage.

Our next stage is to transform the validated data in some manner. Transforming data is normally used to strip out unnecessary data fields, combine data into a new structure, perform calculations or some other type of manipulation. This is normally where the magic happens and our raw material begins the process of creating value. This stage is normally the most difficult and error prone stage of the three.

Moving into the third stage, the transformed data is loaded into a system that can be used by other applications or directly by individuals to derive the value the transformed data presents. Typically, this data is loaded into a large system called a Data Warehouse. A Data Warehouse can be connected to multiple applications that can use the data in reports, new calculations, the list goes on.

At InvestorCOM some of the main systems we use for our transformed client data are document composition applications which create professional documents for the investment industry. These documents can be printed, emailed or displayed on various types of web portals we create. By automating the entire ETL process InvestorCOM is able to produce thousands of documents a day and distribute them to our clients’ customers around Canada using various mediums.

In the end, the reason you may care about this is that likely every type of data that is sent to you or viewed on your screen could have at one point in time been run through an ETL process. Your company may also have raw data that has the potential for greater value – it’s just waiting for the right ETL process to be applied.