Power BI Dataflows example - part 1

With Power BI Dataflows out in public preview and everyone exclaiming how absolutely amazing they are, I decided to put together a bit of an example of how you might use it to “hide” some of the data prep work that goes into cleaning a dataset. To do this I will build on a blog post from Erik Svensen that he wrote a few years ago where he uses data from Statistics Sweden to work through a JSON API. Exactly what data I use here isn’t quite as important as how I use it, but the idea is to provide a prepared dataset ready-to-use for an analyst.

The steps are as follows:

  1. Find a datasource containing the data to be analyzed
  2. Using Power BI Desktop, ingest the datasource and apply the necessary transformations to make the data usable
  3. Create a Power BI Dataflow to provide this prepared dataset as a datasource for anyone to use

Let’s say we want to provide a dataset that shows population numbers for Sweden. This can be found at the Statistics Sweden site here. In order to get this data we could either save it as a CSV or XLS file and toss it into Power BI Desktop, but as that will make things unnecessarily cumbersome whenever new data is made available, we should instead aim to use the  API provided. This is where Erik’s excellent blog post comes into play and explain how to get this data into Power BI. We grab the API URL and JSON query displayed at the bottom of the Statistics Sweden site here:

and then modify the JSON query using Notepad++ like this:

Then we put the whole shebang into Power BI Desktop using a blank query and the following Power Query code (also from Erik’s blog post linked above):

let

PostContents= “{
““query””: [
{
““code””: ““Alder””,
““selection””: {
““filter””: ““agg:Ålder10år””,
““values””: [
“"-4"”,
““5-14"”,
““15-24"”,
““25-34"”,
““35-44"”,
““45-54"”,
““55-64"”,
““65-74"”,
““75-84"”,
““85-94"”,
““95+””
]
}
},
{
““code””: ““Kon””,
““selection””: {
““filter””: ““item””,
““values””: [
““1"”,
““2"”
]
}
}
],
““response””: {
““format””: ““csv””
}
}
“,

Source = Web.Contents(“http://api.scb.se/OV0104/v1/doris/sv/ssd/START/BE/BE0101/BE0101A/BefolkningR1860",[Content=Text.ToBinary(PostContents)]),
#“Imported CSV” = Csv.Document(Source,null,”,",null,1252)
in
#“Imported CSV”

This will give us the raw resultset from Statistics Sweden, but in order to be able to do any useful analysis we need to clean and format the data. We start with promoting the headers, unpivot the years in order to get them to one column, split some text away, replace a few values and hide and rename a few columns. Basic stuff that most often needs to be done to pretty much all datasets. The end Power Query code look like this:

let

PostContents= “{
““query””: [
{
““code””: ““Alder””,
““selection””: {
““filter””: ““agg:Ålder10år””,
““values””: [
“"-4"”,
““5-14"”,
““15-24"”,
““25-34"”,
““35-44"”,
““45-54"”,
““55-64"”,
““65-74"”,
““75-84"”,
““85-94"”,
““95+””
]
}
},
{
““code””: ““Kon””,
““selection””: {
““filter””: ““item””,
““values””: [
““1"”,
““2"”
]
}
}
],
““response””: {
““format””: ““csv””
}
}
“,

Source = Web.Contents(“http://api.scb.se/OV0104/v1/doris/sv/ssd/START/BE/BE0101/BE0101A/BefolkningR1860",[Content=Text.ToBinary(PostContents)]),
#“Imported CSV” = Csv.Document(Source, [Delimiter = “,”, Encoding = 1252]),
#“Promoted Headers” = Table.PromoteHeaders(#“Imported CSV”, [PromoteAllScalars = true]),
#“Changed Type” = Table.TransformColumnTypes(#“Promoted Headers”, {{“ålder”, type text}, {“kön”, type text}, {“Folkmängd 1860”, Int64.Type}, {“Folkmängd 1861”, Int64.Type}, {“Folkmängd 1862”, Int64.Type}, {“Folkmängd 1863”, Int64.Type}, {“Folkmängd 1864”, Int64.Type}, {“Folkmängd 1865”, Int64.Type}, {“Folkmängd 1866”, Int64.Type}, {“Folkmängd 1867”, Int64.Type}, {“Folkmängd 1868”, Int64.Type}, {“Folkmängd 1869”, Int64.Type}, {“Folkmängd 1870”, Int64.Type}, {“Folkmängd 1871”, Int64.Type}, {“Folkmängd 1872”, Int64.Type}, {“Folkmängd 1873”, Int64.Type}, {“Folkmängd 1874”, Int64.Type}, {“Folkmängd 1875”, Int64.Type}, {“Folkmängd 1876”, Int64.Type}, {“Folkmängd 1877”, Int64.Type}, {“Folkmängd 1878”, Int64.Type}, {“Folkmängd 1879”, Int64.Type}, {“Folkmängd 1880”, Int64.Type}, {“Folkmängd 1881”, Int64.Type}, {“Folkmängd 1882”, Int64.Type}, {“Folkmängd 1883”, Int64.Type}, {“Folkmängd 1884”, Int64.Type}, {“Folkmängd 1885”, Int64.Type}, {“Folkmängd 1886”, Int64.Type}, {“Folkmängd 1887”, Int64.Type}, {“Folkmängd 1888”, Int64.Type}, {“Folkmängd 1889”, Int64.Type}, {“Folkmängd 1890”, Int64.Type}, {“Folkmängd 1891”, Int64.Type}, {“Folkmängd 1892”, Int64.Type}, {“Folkmängd 1893”, Int64.Type}, {“Folkmängd 1894”, Int64.Type}, {“Folkmängd 1895”, Int64.Type}, {“Folkmängd 1896”, Int64.Type}, {“Folkmängd 1897”, Int64.Type}, {“Folkmängd 1898”, Int64.Type}, {“Folkmängd 1899”, Int64.Type}, {“Folkmängd 1900”, Int64.Type}, {“Folkmängd 1901”, Int64.Type}, {“Folkmängd 1902”, Int64.Type}, {“Folkmängd 1903”, Int64.Type}, {“Folkmängd 1904”, Int64.Type}, {“Folkmängd 1905”, Int64.Type}, {“Folkmängd 1906”, Int64.Type}, {“Folkmängd 1907”, Int64.Type}, {“Folkmängd 1908”, Int64.Type}, {“Folkmängd 1909”, Int64.Type}, {“Folkmängd 1910”, Int64.Type}, {“Folkmängd 1911”, Int64.Type}, {“Folkmängd 1912”, Int64.Type}, {“Folkmängd 1913”, Int64.Type}, {“Folkmängd 1914”, Int64.Type}, {“Folkmängd 1915”, Int64.Type}, {“Folkmängd 1916”, Int64.Type}, {“Folkmängd 1917”, Int64.Type}, {“Folkmängd 1918”, Int64.Type}, {“Folkmängd 1919”, Int64.Type}, {“Folkmängd 1920”, Int64.Type}, {“Folkmängd 1921”, Int64.Type}, {“Folkmängd 1922”, Int64.Type}, {“Folkmängd 1923”, Int64.Type}, {“Folkmängd 1924”, Int64.Type}, {“Folkmängd 1925”, Int64.Type}, {“Folkmängd 1926”, Int64.Type}, {“Folkmängd 1927”, Int64.Type}, {“Folkmängd 1928”, Int64.Type}, {“Folkmängd 1929”, Int64.Type}, {“Folkmängd 1930”, Int64.Type}, {“Folkmängd 1931”, Int64.Type}, {“Folkmängd 1932”, Int64.Type}, {“Folkmängd 1933”, Int64.Type}, {“Folkmängd 1934”, Int64.Type}, {“Folkmängd 1935”, Int64.Type}, {“Folkmängd 1936”, Int64.Type}, {“Folkmängd 1937”, Int64.Type}, {“Folkmängd 1938”, Int64.Type}, {“Folkmängd 1939”, Int64.Type}, {“Folkmängd 1940”, Int64.Type}, {“Folkmängd 1941”, Int64.Type}, {“Folkmängd 1942”, Int64.Type}, {“Folkmängd 1943”, Int64.Type}, {“Folkmängd 1944”, Int64.Type}, {“Folkmängd 1945”, Int64.Type}, {“Folkmängd 1946”, Int64.Type}, {“Folkmängd 1947”, Int64.Type}, {“Folkmängd 1948”, Int64.Type}, {“Folkmängd 1949”, Int64.Type}, {“Folkmängd 1950”, Int64.Type}, {“Folkmängd 1951”, Int64.Type}, {“Folkmängd 1952”, Int64.Type}, {“Folkmängd 1953”, Int64.Type}, {“Folkmängd 1954”, Int64.Type}, {“Folkmängd 1955”, Int64.Type}, {“Folkmängd 1956”, Int64.Type}, {“Folkmängd 1957”, Int64.Type}, {“Folkmängd 1958”, Int64.Type}, {“Folkmängd 1959”, Int64.Type}, {“Folkmängd 1960”, Int64.Type}, {“Folkmängd 1961”, Int64.Type}, {“Folkmängd 1962”, Int64.Type}, {“Folkmängd 1963”, Int64.Type}, {“Folkmängd 1964”, Int64.Type}, {“Folkmängd 1965”, Int64.Type}, {“Folkmängd 1966”, Int64.Type}, {“Folkmängd 1967”, Int64.Type}, {“Folkmängd 1968”, Int64.Type}, {“Folkmängd 1969”, Int64.Type}, {“Folkmängd 1970”, Int64.Type}, {“Folkmängd 1971”, Int64.Type}, {“Folkmängd 1972”, Int64.Type}, {“Folkmängd 1973”, Int64.Type}, {“Folkmängd 1974”, Int64.Type}, {“Folkmängd 1975”, Int64.Type}, {“Folkmängd 1976”, Int64.Type}, {“Folkmängd 1977”, Int64.Type}, {“Folkmängd 1978”, Int64.Type}, {“Folkmängd 1979”, Int64.Type}, {“Folkmängd 1980”, Int64.Type}, {“Folkmängd 1981”, Int64.Type}, {“Folkmängd 1982”, Int64.Type}, {“Folkmängd 1983”, Int64.Type}, {“Folkmängd 1984”, Int64.Type}, {“Folkmängd 1985”, Int64.Type}, {“Folkmängd 1986”, Int64.Type}, {“Folkmängd 1987”, Int64.Type}, {“Folkmängd 1988”, Int64.Type}, {“Folkmängd 1989”, Int64.Type}, {“Folkmängd 1990”, Int64.Type}, {“Folkmängd 1991”, Int64.Type}, {“Folkmängd 1992”, Int64.Type}, {“Folkmängd 1993”, Int64.Type}, {“Folkmängd 1994”, Int64.Type}, {“Folkmängd 1995”, Int64.Type}, {“Folkmängd 1996”, Int64.Type}, {“Folkmängd 1997”, Int64.Type}, {“Folkmängd 1998”, Int64.Type}, {“Folkmängd 1999”, Int64.Type}, {“Folkmängd 2000”, Int64.Type}, {“Folkmängd 2001”, Int64.Type}, {“Folkmängd 2002”, Int64.Type}, {“Folkmängd 2003”, Int64.Type}, {“Folkmängd 2004”, Int64.Type}, {“Folkmängd 2005”, Int64.Type}, {“Folkmängd 2006”, Int64.Type}, {“Folkmängd 2007”, Int64.Type}, {“Folkmängd 2008”, Int64.Type}, {“Folkmängd 2009”, Int64.Type}, {“Folkmängd 2010”, Int64.Type}, {“Folkmängd 2011”, Int64.Type}, {“Folkmängd 2012”, Int64.Type}, {“Folkmängd 2013”, Int64.Type}, {“Folkmängd 2014”, Int64.Type}, {“Folkmängd 2015”, Int64.Type}, {“Folkmängd 2016”, Int64.Type}, {“Folkmängd 2017”, Int64.Type}}),
#“Unpivoted Columns” = Table.UnpivotOtherColumns(#“Changed Type”, {“ålder”, “kön”}, “Attribute”, “Value”),
#“Split Column by Delimiter” = Table.SplitColumn(#“Unpivoted Columns”, “Attribute”, Splitter.SplitTextByEachDelimiter({” “}, QuoteStyle.Csv), {“Attribute.1”, “Attribute.2”}),
#“Changed Type1” = Table.TransformColumnTypes(#“Split Column by Delimiter”, {{“Attribute.1”, type text}, {“Attribute.2”, Int64.Type}}),
#“Renamed Columns” = Table.RenameColumns(#“Changed Type1”, {{“Attribute.2”, “Year”}, {“kön”, “Sex”}, {“ålder”, “Age”}}),
#“Split Column by Delimiter1” = Table.SplitColumn(#“Renamed Columns”, “Age”, Splitter.SplitTextByEachDelimiter({” “}, QuoteStyle.Csv), {“Age.1”, “Age.2”}),
#“Changed Type2” = Table.TransformColumnTypes(#“Split Column by Delimiter1”, {{“Age.1”, type text}, {“Age.2”, type text}}),
#“Renamed Columns1” = Table.RenameColumns(#“Changed Type2”, {{“Age.1”, “Age”}}),
#“Removed Columns” = Table.RemoveColumns(#“Renamed Columns1”, {“Age.2”, “Attribute.1”}),
#“Replaced Value” = Table.ReplaceValue(#“Removed Columns”, “män”, “Male”, Replacer.ReplaceText, {“Sex”}),
#“Replaced Value1” = Table.ReplaceValue(#“Replaced Value”, “kvinnor”, “Female”, Replacer.ReplaceText, {“Sex”}),
#“Inserted conditional column” = Table.AddColumn(#“Replaced Value1”, “AgeSorter”, each if [Age] = “0-4” then “1” else if [Age] = “5-14” then “2” else if [Age] = “15-24” then “3” else if [Age] = “25-34” then “4” else if [Age] = “35-44” then “5” else if [Age] = “45-54” then “6” else if [Age] = “55-64” then “7” else if [Age] = “65-74” then “8” else if [Age] = “75-84” then “9” else if [Age] = “85-94” then “10” else if [Age] = “95+” then "” else null),
#“Changed column type” = Table.TransformColumnTypes(#“Inserted conditional column”, {{“AgeSorter”, Int64.Type}})
in
#“Changed column type”

Looking at the data, it’s much more useful for further analysis.

In order to reuse the work we just went through, we create a Power BI Dataflow. It is done from the Power BI service here, and note that it can only be done from an app workspace - not your own “my workspace”.

Then it’s time to create the dataflow by choosing “add new entities”.

 

There are quite a few data sources available already, even though some that are available in Power BI Desktop have yet to make it the the service (Data Lake Storage, for example). As we already have our Power Query code done, we’ll go ahead and choose a blank query that lets us write the Power Query code ourselves.

Next it’s just a question of pasting the Power Query code from before into the window like so, and click next:

If all goes according to plan (remember that this IS a preview feature and some parts of Power Query have yet to be implemented) the result should be identical to what we saw earlier in Power BI Desktop:

 

Let’s change the name of the dataset from “Query” to “Population” and click done.

There, that was the first entity of the dataflow done! We could add more entities, but for now we’ll just go ahead and save the dataflow by clicking save.

We’ll give the dataflow a useful name, a description and click save again.

Upon saving we have the opportunity to set the refresh schedule of the dataflow if we so desire. Unless the dataflow was refreshed at least once, there will be no data available when we go and try to use the dataset. The dataflow refresh settings can also be accessed here:

 

With this freshly minted dataflow in hand, let’s go back to Power BI Desktop and see how we can use it. Start by clicking “Get Data - Power BI - Power BI dataflows (Beta)”

 After logging in we are presented with the available dataflows:

Unless the dataflow is refreshed at least once, we will receive an error upon clicking the entity:

Provided the dataflow is refreshed, however, it looks exactly like any other datasource and we can go ahead and load or edit the query as usual:

So far we’ve removed some of the tedious work of prepping the data for further analysis, as anyone wanting to use this dataset no longer have to do all the preparation steps to get the data into a useful format.