Here at Magento BI, we're more than just data wranglers and handlers. We consider ourselves explorers. Every day, we dive deep into the databases of a wide variety of business types to help clients like you draw actionable insights from raw data.
The primary benefit of using an operational database for business intelligence is that nothing needs to be built or modified in order to collect data. Valuable information is already there; all you need to do is unlock it.
That being said, how can you get the most out of your raw data? We put together some recommendations that'll help you optimize your database for analysis.
Don't delete data
The local and international laws that affect your business (and your own terms of service) may affect what types of data you can retain and how long you can retain it for. Compliance with these laws should be your first priority.
When an order is cancelled, a user deactivates their account, or a product is discontinued, it's tempting to delete the associated information in the database. Tables grow and eliminating clutter seems like a prudent idea. However, deleting rows means that this information is lost forever or that you'll need to dig through old backups to find it.
What can you do instead? You can add a status column to the table to which will indicate when the row is no longer active or relevant. We also recommend adding a column that stores the date the change was made, or creating a log for historical changes. If tables grow large enough that performance begins to suffer, consider archiving the old data to a table used for analytics.
Rarely overwrite data
We can't stress this enough: overwriting data should be done sparingly and with caution.
Let's use login dates as an example. Many companies will store the last login date rather than a table of historical logins. While you may only need the last login date for functional purposes, that overwritten data is a huge loss from an analytics perspective. Not keeping a complete log of these actions eliminates the ability to see how many users stayed away for long periods of time and then reactivated. It also makes it impossible to build things like user engagement cohort analyses based on logins.
As a general rule, if you're updating a record due to some kind of user action, we don't recommend overwriting information about a previous or separate user action.
Include updated_at columns for data updated over time
If a table's rows will have changing values over time - for example, order_status changes from 'processing' to 'complete' - include an updated_at column to record when the latest change occurs. Ensure that an updated_at value is available when first inserting the new data row, at which time the updated_at date corresponds to the created_at date.
In addition to optimizing for analysis, updated_at columns also allow you to use Incremental Replication methods, which can help shorten the length of your update cycles.
Store user acquisition source
One of the most common mistakes we see is the user acquisition source, or UAS, not being stored in the operational database. In the majority of situations where this is an issue, UAS is only being tracked through Google Analytics or some other web analytics tool. While these tools can be extremely valuable, there are some drawbacks to storing UAS in them exclusively.
Why do we say this? Because, more often than not, you can't extract user-level data from these tools. When it is possible, it's usually a difficult process. It should be easy to get this information and marry it with data from other sources, such as the behavioral and transactional information also stored in your database.
Storing UAS in your own database is often the largest improvement an online business can make to its analytical capabilities. This allows for the analysis of sales, user engagement, payback periods, customer lifetime value, churn, and other critical metrics by UAS. This data is crucial when deciding where to invest marketing resources.
Too many companies focus solely on finding channels that provide new users at the lowest cost, but if you aren't tracking the quality of users acquired from each channel, you run the risk of attracting users who don't generate business value.
Data table setup
Set a primary key
A primary key is an unchanging column (or set of columns) that produces unique values within a table. Primary keys are incredibly important, as they ensure that your tables are properly replicated in Magento BI.
When building primary keys, use an integer data type for the column that auto-increases. We also recommend you avoid using multiple column primary keys where possible.
If your table is an SQL view, add a column that can act as a primary key. Magento BI will be able to automatically identify this column as a primary key.
Assign a data type to your data column
If a data column doesn't have an assigned data type, Magento BI will guess what data type to use. If the system guesses incorrectly, you may not be able to perform the relevant analyses until our support team adjusts the column to the proper data type. For example, if a date column is guessed as a numeric data type, you won't be able to trend over time using that date dimension.
Add prefixes to your data tables if you have multiple databases
If you have more than one database connected to Magento BI, we recommend you add prefixes to your tables to avoid confusion. Prefixes will help you remember where metrics or data dimensions are sourced from.