Hi, the quality of the data we get from our suppliers varies hugely, and we often have to run lots of data cleansing processes to make it suitable for use in our store.
Typically we are correcting spelling mistakes, trimming leading or trailing spaces, removing stray characters, re-mapping supplier values to the best match in our own attribute values, or splitting a product name apart so we can extract attribute values from it. Most of these processes are fairly easy to define, some are much more complex, and many are similar across suppliers and data sets. Because of the size of the data sets and the number of operations each one needs, this is well beyond what is sensible to do in Excel or other basic data tools, and it has become unmanageable.
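To make the kind of step concrete, here is a rough Python sketch of two of the operations described above (tidying up text and re-mapping a supplier value to the closest store value), chained as reusable steps. It is purely illustrative: the function names, the `STORE_COLOURS` list and the 0.6 cutoff are made up for the example, and it only uses the standard library (`unicodedata`, `re`, `difflib`).

```python
import difflib
import re
import unicodedata

# Hypothetical attribute values as they exist in the store.
STORE_COLOURS = ["Red", "Blue", "Green", "Black"]


def clean_text(value: str) -> str:
    """Normalise Unicode, drop invisible/control characters, tidy whitespace."""
    value = unicodedata.normalize("NFKC", value)
    # Drop control/format characters (category C*), but keep tabs and
    # newlines for now so they can be collapsed into single spaces below.
    value = "".join(
        ch for ch in value
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    return re.sub(r"\s+", " ", value).strip()


def map_to_store_value(value: str, allowed: list[str], cutoff: float = 0.6):
    """Return the closest allowed value, or None if nothing is close enough."""
    lookup = {a.lower(): a for a in allowed}
    matches = difflib.get_close_matches(
        value.lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


def run_steps(value, steps):
    """Apply each cleansing step in order; the same list can be reused per supplier."""
    for step in steps:
        value = step(value)
    return value


# Example pipeline for a hypothetical supplier "colour" column.
colour_steps = [clean_text, lambda v: map_to_store_value(v, STORE_COLOURS)]
cleaned = [run_steps(v, colour_steps) for v in ["  bleu\u00a0", "Grren", "xyz"]]
```

Values that cannot be matched come back as `None`, so they can be set aside for manual review rather than guessed at. Defining and running a handful of steps like this is easy enough; managing hundreds of them across suppliers is the part we are struggling with.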
Are any of you aware of any software toolkits that we could use to create and manage these big data sanitising jobs in a better way?