Now, most open source software starts with this realization: “There has got to be a better way!” This is no different… Pull out hair (not an option in my case).Ignore the problem, it’s clearly too hard to do.
It’s probably something along those lines. A senior ETL consultant (with evil tendencies) will try to convince the customer that the requirement is “ off scope” or shift it to the next “iteration”.
The ones having heard of the PDI “Split Field to rows” step might know (from our forum) how to solve the problem by reading the whole line and by splitting it into un-pivoted form with a wad of JavaScript.
In our sample we have one column for each date since the beginning of this year. Because of that they contain a varying number of columns with a dimension value in the column header. Spreadsheets like these are usually automatically generated by some kind of pivoting program. Let’s assume that this spreadsheet describes the number of products sold on a given date. However, we can clearly do a lot better by extending our initiative to a few more steps: “Microsoft Excel Input” (which can also read ODS by the way), “Row Normalizer” and “Row De-normalizer”.īelow I’ll describe an actual (obfuscated) example that you will probably recognize as it is equally hideous as simple in it’s horrible complexity. Already with support for “CSV Input” and “Select Values” we could do a lot of dynamic things. Since then we received a lot of positive feedback on this functionality which encouraged me to extend it to a few more steps. Last year, right after the summer in version 4.1 of Pentaho Data Integration, we introduced the notion of dynamically inserted ETL metadata (Youtube video here).