In the first part of the series we basically dealt with the possibility of programming loops similar to those of other programming languages in M for Power Query. We looked at the concept of recursive functions, how recursions work in the M-language, why they work, and the issues of using them.
You don't know the first part of the series yet? Then get to know it here:
In this part I want to show you an alternative way to create loops in Power Query to overcome the issues of traditional recursive functions. The function List.Generate().
The basic functionality of List.Generate()
The question of what this function actually does is answered quickly:
"List.Generate() generates a list of values based on up to four parameters."
So List.Generate() expects up to four parameters to be handed over to generate a list. All these parameters are in turn functions. Let's take a closer look at the function definition and the individual parameters:
The "initial" parameter is a function that specifies the start value of the loop. In addition to a scalar, i.e. a number, this start value can also be a "Structured Value", i.e. a "Table", "Record" or a "List".
The "condition" parameter is a function that makes List.Generate() a kind of "Do-Loop" loop. Here you have to specify the execution condition of the loop. As long as the condition is met, the loop will continue.
The "next" parameter is a function that defines which operation will be performed on each new loop iteration. The start value from "initial" or the intermediate result of the last iteration is changed in each loop pass. This happens as long as "condition" is fulfilled.
The "selector" parameter is the only optional parameter of List.Generate(). Specifying "selector" you can set the structure in which the final result should be returned. For example, if you process a "record" with several columns using List.Generate(), you can specify which of these columns should be included in the output of the final result. If you do not specify "selector", the complete structure will be returned.
Let's take a simple example to illustrate how this works:
The following example uses List.Generate() to create a list of numbers. The first parameter passed in "initial" is a function that specifies the start value of the loop. In our example, the start value is 10. Because List.Generate() only accepts functions as parameters, this start value is preceded by the syntax of a function definition ("()=>"):
The second parameter "condition" contains a function that determines the condition under which the loop is continued. In the example, the loop is to be continued as long as the result of the previous loop pass is greater than or equal to -5. In the third section, the parameter "next" is passed. This parameter contains the operation to be performed in each loop pass. In our case, the resulting value of each iteration should be reduced by 2. The function therefore generates a list with the numbers from 10 to -4 performing eight loop iterations. We do not need the optional fourth parameter "selector" in this example.
Let us now change the execution parameter "condition":
In the example below, we set the condition to ">=10" instead of ">=-5". A look at the resultset shows that only the start value "10" is returned, because the condition is just fulfilled for this value.
Therefore there is no further loop iteration performed. This means that List.Generate() is a sort of header-controlled loop. In contrast to foot-operated loops, this loop is characterized by the fact that its execution condition is checked before the first loop pass. While foot-controlled loops perform at least one pass, head-controlled loops terminate if the execution condition is not fulfilled right at the beginning. List.Generate() is thus comparable to a "Do-While-Loop" loop.
Closures, "each" and the environment concept of M
The first parameter of List.Generate() is introduced by the typical function definition of the M-language followed by the start value of the loop (in the last example "()=> 10"). You may have noticed that this function has no parameters. The reason for this is that the function ()=>10 is used as parameter for another function, in this case List.Generate(), and is executed within that function.
If it would expect a parameter input like the one in the picture above ("StartValue"), it would cause an error because there is no way of passing the required parameter to this function as long as it is executed within List.Generate(). Using an optional parameter, e.g. (optional StartValue)=>, would not cause an error, but would not add any value, because still the parameter could never be used.
Does this mean that you always have to store the start value of the loop as a fixed value in the first parameter of List.Generate()? No, because this is where M's environment concept kicks in!
(We already discussed the basics of this concept in the first part of this series. For further information regarding environments in M I want to refer to a series of Lars Schreiber and Imke Feldmann again: The Environment concept in M for Power Query and Power BI)
Expressions in the body of a function in M can access all variables that are in the scope of that function, so to speak in its environment, when the function is initialized. So it's possible to define the start value in a separate variable or even in a distinct query, as long as this variable is located in the environment of the variable "CreateList", as in the example above, or as long as it is part of the global environment and "CreateList" can access its values.
If you use List.Generate() within another function, as in the following example, you can additionally rely on the construct of the so-called "closures":
The other parameter functions of List.Generate() do not start with "()=>", but of course they are all functions. They are introduced by "each".
The expression "each" in M is a simplified declaration of a function with a single parameter, the parameter "_". The use of "each" is therefore synonymous with the function declaration "( _ )=>", where "_" is a purely formal parameter, which could also be named quite differently. Accordingly, the three definitions in the following listing are identical and are providing the same result:
The parameter of "each", i.e. "_", always takes the value of the current line of the generated list within List.Generate(), this is the intermediate result of the last loop iteration.
This is particularly vivid if you do not use the "_" parameter in the third section of the function body of List.Generate().
In this case the intermediate result of the last loop iteration is not further processed in the third parameter and therefore the start value 10 is not reduced by 2 in each iteration, so the "condition" ">=-5" is never met.
What else can you use List.Generate() for?
List.Generate() does not only allow you to change numbers as start values. In addition to scalar values, it can also take "Structured Values" such as "Tables", "Lists" or "Records" as a starting point and then process them within the loop. This possibility extends the field of application of List.Generate() enormously. All over the web you will find a lot of examples, in which List.Generate() is used to perform different tasks. Two are mentioned here as examples:
Chris Webb uses List.Generate() in this post to iterate over various tables. Table A contains texts in which words are to be replaced. Table B contains the corresponding assignments for the words to be inserted. In the loops, the replacement is performed by determining which words in table A are to be replaced by which words from table B. Chris also shows the use of the optional fourth parameter in his article to return only one field of the created record at the end of the process.
In another post on "Excelando" loops with List.Generate are used to optimize the call of a Web API. When retrieving the API, individual pages must be addressed, whereby it is not known in advance how many pages there are to be read. The loop-construction used checks whether there is still data on a page and aborts as soon as the first empty page is called. Instead of calling a fixed, sufficiently large number of pages, the API call can be reduced to the number of calls necessary to get all the data. This avoids possible error messages when calling empty pages and makes the use of the API much more efficient.
These two examples are only intended to illustrate the breadth of the area in which List.Generate() can be used. When using List.Generate(), there are hardly any limits to your creativity!
List.Generate() and performance considerations
Even at the risk of repeating myself, I would like to renew my recommendation from the first part of the series on "Loops in M" at this point: Loops should only be considered as a solution in M as one of the last options. They should only be an option if the problem cannot be solved with the standard functions of the M library. Processing entire data sets is always preferable to handling rows individually.
In the first part of this series, we criticized recursive functions for not supporting tail call elimination, which led to an error at about 11,000 loop iterations and thus to the termination of the loop. List.Generate() does not have this problem. The example above shows that even billions of loop passes are possible without any issues, while providing considerably improved performance compared to recursive functions.
An additional performance tip: If you access "Tables" or "Lists" within List.Generate(), you should move them to memory using Table.Buffer("Your Table") or List.Buffer("Your List"). In this way, you prevent the "Table" or "List" from being re-evaluated every time you run a loop, thus increasing performance. However, you buy this advantage at the cost of "query folding" no longer happening after that point. In the end, the use of Table.Buffer() or List.Buffer() is a trade-off. Since "query folding" is only relevant for certain data sources, such as databases (SQL Server, Oracle, etc.), and does not play a role with other sources, such as Excel or flat files, there is not always a disadvantage.
What we have learned and how we proceed?
With List.Generate() we got to know a way to create "Do-While-Loops" in M for Power Query. It is a good alternative to the classic recursive function, because it has clear performance advantages and also supports tail call elimination.
List.Generate() has four parameters, all of which are functions themselves. The parameter "initial" is the start value of the loop. The start value can be scalar values or structured values like "Tables", "Records" or "Lists". These do not have to be stored as fixed values in List.Generate(), but can also be obtained from variables or parameters outside of List.Generate().
The parameter "condition" contains the execution condition of the loop, while the parameter "next" defines what should happen in each iteration with the intermediate result of the previous loop pass. The output of the final result can be influenced via the optional parameter "selector".
The third and last part of the series deals with the creation of "For-Next" loops in Power Query. For this we will use the function List.Accumulate(). If you are also interested in the other parts of this series, you can learn more here:
Do you know the other parts of the series?
I hope you liked the article and will also read the other parts of the series. If that's the case, please also don't forget to share this article with other people!
See you next time!
Many greetings from Hamburg
Alejandro Bedoya (Sonntag, 13 September 2020 21:38)
Mil gracias por la información es muy practica para lo que quiero realizar.
Maarten (Freitag, 26 Februar 2021 13:36)
Really awesome, thourough and clear description of the mechanics of looping in M script. Have not encountered a better source yet. Thank you so much.