Podcast posts

In a very short while I’ll be removing the podcast posts from this blog as they’ve been concentrated on https://www.kneedeepintech.com for some time. Christmas is coming, and with it a feeling of wanting to clean house, even on this blog 😛

The road to London

It’s time for me to go on the road again, and this time I’m headed to London and the UK Cloud Infrastructure User Group where I will be delivering a brand new session on self-service BI from an infrastructure perspective.

This session is not only brand new, it is also a bit of a presentation-style experiment. I will be delivering the session in no less than three different voices – as in, three differing points of view. The subtitle for this session is “arguing with myself” for a reason…

My goal with the 60-minute session is to provide a walk-through of what self-service BI is, what makes it so potentially awesome, how it can (and will!) affect the infrastructure people, and what can be done to turn a disaster into an opportunity.
It is primarily focused on Power BI as this is my weapon of choice these days, but the information applies just as well to other tools. It’s more about the mindset than the specific implementation – something I find is most often the case.

The user group meeting is held at WeWork Waterhouse Square in Holborn (the closest tube station should be Chancery Lane) at 6.30 on Monday. Register here and come join!

Power BI Dataflows example – part 2

There are several more use cases for a dataflow, but one that is very useful is the ability to share a dataset between apps. Previously we had to duplicate the dataset to each and every app that needed to use it, increasing the risk that one dataset was ignored, not refreshed properly or otherwise out of sync with reality. By using dataflows we can have several apps rely on the same dataflow (via a dataset), and thus it is quite possible to have a “master dataset”.

Here is one way to do it:

1. Create an app workspace to keep all your dataflows that you are planning on sharing to different app workspaces. This way you have things neatly ordered and you don’t have to go searching for the dataflow in several app workspaces.

2. Create a dataflow in said app workspace.

3. Create a new app workspace for your app.

4. Create a file using Power BI Desktop that connects to the dataflow from step 2 and create all the visualizations you need.

5. Publish this report to the app workspace from step 3.

6. Publish the app to your users.


7. Access the app from “Get Apps > My organization” as a user who has been granted access to the app in step 6.
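The workspace setup in steps 1 and 3 can also be scripted. The sketch below uses the real “Groups – Create Group” endpoint of the Power BI REST API, but the workspace names are made up for illustration, and an actual call would additionally need an Azure AD bearer token in the Authorization header:

```python
# Sketch only: describing the two workspace-creation calls from steps 1 and 3.
# "Dataflow Store" and "Sales App" are hypothetical names; nothing is sent here.
import json

API_ROOT = "https://api.powerbi.com/v1.0/myorg"

def create_workspace_call(name: str) -> dict:
    """Describe the 'Groups - Create Group' request for a new app workspace."""
    return {
        "method": "POST",
        "url": f"{API_ROOT}/groups",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"name": name}),
    }

calls = [
    create_workspace_call("Dataflow Store"),  # step 1: holds the shared dataflows
    create_workspace_call("Sales App"),       # step 3: holds the app itself
]
for call in calls:
    print(call["method"], call["url"], call["body"])
```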

The beauty of an app is that the dataset underneath has its own refresh cycle apart from the app. That means that the app itself doesn’t have to be refreshed or updated in order for the dataset to update. However – as there is no automatic refresh of a dataset on top of a dataflow, refresh schedules have to be set for both the dataflow AND the dataset. Failure to refresh either (or both!) can lead to some confusion. This all works in Pro as well as Premium.
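To make the double-refresh requirement concrete, here is a sketch of the two refreshes expressed as Power BI REST API calls. The endpoints (“Dataflows – Refresh Dataflow” and “Datasets – Refresh Dataset”) are real, but the GUIDs are placeholders, a real call needs an Azure AD bearer token, and scheduled refresh is normally configured in the service UI rather than triggered like this:

```python
# Sketch only: the two refreshes that BOTH need to happen. The GUIDs below
# are hypothetical placeholders for the workspace, dataflow and dataset IDs.
API_ROOT = "https://api.powerbi.com/v1.0/myorg"

GROUP_ID = "00000000-0000-0000-0000-000000000001"     # app workspace
DATAFLOW_ID = "00000000-0000-0000-0000-000000000002"  # the shared dataflow
DATASET_ID = "00000000-0000-0000-0000-000000000003"   # the dataset on top of it

# Order matters: refresh the dataflow first, then the dataset that reads it,
# otherwise the dataset just reloads the dataflow's stale data.
refresh_urls = [
    f"{API_ROOT}/groups/{GROUP_ID}/dataflows/{DATAFLOW_ID}/refreshes",
    f"{API_ROOT}/groups/{GROUP_ID}/datasets/{DATASET_ID}/refreshes",
]
for url in refresh_urls:
    print("POST", url)
```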

Power BI Dataflows example – part 1

With Power BI Dataflows out in public preview and everyone exclaiming how absolutely amazing they are, I decided to put together a bit of an example of how you might use them to “hide” some of the data prep work that goes into cleaning a dataset. To do this I will build on a blog post Erik Svensen wrote a few years ago, where he uses data from Statistics Sweden to work through a JSON API. Exactly what data I use here isn’t as important as how I use it – the idea is to provide a prepared, ready-to-use dataset for an analyst.

The steps are as follows:

  1. Find a datasource containing the data to be analyzed
  2. Using Power BI Desktop, ingest the datasource and apply the necessary transformations to make the data usable
  3. Create a Power BI Dataflow to provide this prepared dataset as a datasource for anyone to use

Let’s say we want to provide a dataset that shows population numbers for Sweden. This can be found at the Statistics Sweden site here. To get this data we could save it as a CSV or XLS file and toss it into Power BI Desktop, but as that becomes unnecessarily cumbersome every time new data is made available, we should instead aim to use the API provided. This is where Erik’s excellent blog post comes into play and explains how to get this data into Power BI. We grab the API URL and JSON query displayed at the bottom of the Statistics Sweden site here:

and then modify the JSON query using Notepad++ like this:

Then we put the whole shebang into Power BI Desktop using a blank query and the following Power Query code (also from Erik’s blog post linked above):

let

PostContents= "{
  ""query"": [
    {
      ""code"": ""Alder"",
      ""selection"": {
        ""filter"": ""agg:Ålder10år"",
        ""values"": [
          ""-4"",
          ""5-14"",
          ""15-24"",
          ""25-34"",
          ""35-44"",
          ""45-54"",
          ""55-64"",
          ""65-74"",
          ""75-84"",
          ""85-94"",
          ""95+""
        ]
      }
    },
    {
      ""code"": ""Kon"",
      ""selection"": {
        ""filter"": ""item"",
        ""values"": [
          ""1"",
          ""2""
        ]
      }
    }
  ],
  ""response"": {
    ""format"": ""csv""
  }
}
",

  Source = Web.Contents("http://api.scb.se/OV0104/v1/doris/sv/ssd/START/BE/BE0101/BE0101A/BefolkningR1860", [Content = Text.ToBinary(PostContents)]),
  #"Imported CSV" = Csv.Document(Source, null, ",", null, 1252)
in
  #"Imported CSV"
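The same POST request can be tried out outside Power BI as well. Below is a sketch in Python that builds the identical JSON query; the commented-out lines show how it could be sent to the SCB endpoint with the third-party requests library (the network call is left out here):

```python
# Build the same JSON query the M code embeds in PostContents. The structure
# mirrors the query above; sending it would return the CSV that Csv.Document
# parses in the M code.
import json

query = {
    "query": [
        {
            "code": "Alder",
            "selection": {
                "filter": "agg:Ålder10år",
                "values": ["-4", "5-14", "15-24", "25-34", "35-44", "45-54",
                           "55-64", "65-74", "75-84", "85-94", "95+"],
            },
        },
        {
            "code": "Kon",
            "selection": {"filter": "item", "values": ["1", "2"]},
        },
    ],
    "response": {"format": "csv"},
}

body = json.dumps(query, ensure_ascii=False)
print(body)

# import requests  # uncomment to actually call the API
# csv_text = requests.post(
#     "http://api.scb.se/OV0104/v1/doris/sv/ssd/START/BE/BE0101/BE0101A/BefolkningR1860",
#     data=body.encode("utf-8"),
# ).text
```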

This will give us the raw resultset from Statistics Sweden, but in order to do any useful analysis we need to clean and format the data. We start by promoting the headers, unpivot the years to get them into a single column, split some text away, replace a few values, and remove and rename a few columns. Basic stuff that needs to be done to pretty much every dataset. The final Power Query code looks like this:

let

PostContents= "{
  ""query"": [
    {
      ""code"": ""Alder"",
      ""selection"": {
        ""filter"": ""agg:Ålder10år"",
        ""values"": [
          ""-4"",
          ""5-14"",
          ""15-24"",
          ""25-34"",
          ""35-44"",
          ""45-54"",
          ""55-64"",
          ""65-74"",
          ""75-84"",
          ""85-94"",
          ""95+""
        ]
      }
    },
    {
      ""code"": ""Kon"",
      ""selection"": {
        ""filter"": ""item"",
        ""values"": [
          ""1"",
          ""2""
        ]
      }
    }
  ],
  ""response"": {
    ""format"": ""csv""
  }
}
",

  Source = Web.Contents("http://api.scb.se/OV0104/v1/doris/sv/ssd/START/BE/BE0101/BE0101A/BefolkningR1860",[Content=Text.ToBinary(PostContents)]),
  #"Imported CSV" = Csv.Document(Source, [Delimiter = ",", Encoding = 1252]),
  #"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV", [PromoteAllScalars = true]),
  #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", {{"Ålder", type text}, {"kön", type text}} & List.Transform({1860..2017}, each {"Folkmängd " & Number.ToText(_), Int64.Type})),
  #"Unpivoted Columns" = Table.UnpivotOtherColumns(#"Changed Type", {"Ålder", "kön"}, "Attribute", "Value"),
  #"Split Column by Delimiter" = Table.SplitColumn(#"Unpivoted Columns", "Attribute", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv), {"Attribute.1", "Attribute.2"}),
  #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter", {{"Attribute.1", type text}, {"Attribute.2", Int64.Type}}),
  #"Renamed Columns" = Table.RenameColumns(#"Changed Type1", {{"Attribute.2", "Year"}, {"kön", "Sex"}, {"Ålder", "Age"}}),
  #"Split Column by Delimiter1" = Table.SplitColumn(#"Renamed Columns", "Age", Splitter.SplitTextByEachDelimiter({" "}, QuoteStyle.Csv), {"Age.1", "Age.2"}),
  #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1", {{"Age.1", type text}, {"Age.2", type text}}),
  #"Renamed Columns1" = Table.RenameColumns(#"Changed Type2", {{"Age.1", "Age"}}),
  #"Removed Columns" = Table.RemoveColumns(#"Renamed Columns1", {"Age.2", "Attribute.1"}),
  #"Replaced Value" = Table.ReplaceValue(#"Removed Columns", "män", "Male", Replacer.ReplaceText, {"Sex"}),
  #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value", "kvinnor", "Female", Replacer.ReplaceText, {"Sex"}),
  #"Inserted conditional column" = Table.AddColumn(#"Replaced Value1", "AgeSorter", each if [Age] = "0-4" then "1" else if [Age] = "5-14" then "2" else if [Age] = "15-24" then "3" else if [Age] = "25-34" then "4" else if [Age] = "35-44" then "5" else if [Age] = "45-54" then "6" else if [Age] = "55-64" then "7" else if [Age] = "65-74" then "8" else if [Age] = "75-84" then "9" else if [Age] = "85-94" then "10" else if [Age] = "95+" then "11" else null),
  #"Changed column type" = Table.TransformColumnTypes(#"Inserted conditional column", {{"AgeSorter", Int64.Type}})
in
  #"Changed column type"
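For readers more at home outside Power Query, the same reshaping (unpivot, split, translate) can be sketched in plain Python. The column names and label values below follow the real SCB data, but the sample rows and population numbers are invented for illustration:

```python
# Sketch of the same cleanup on an invented sample of the SCB result.
# The real data has one "Folkmängd <year>" column per year from 1860 to 2017.
rows = [
    {"Ålder": "0-4 år", "kön": "män",     "Folkmängd 1860": 100, "Folkmängd 1861": 102},
    {"Ålder": "0-4 år", "kön": "kvinnor", "Folkmängd 1860": 95,  "Folkmängd 1861": 97},
]

SEX = {"män": "Male", "kvinnor": "Female"}  # the M "Replaced Value" steps

tidy = []
for row in rows:
    for col, value in row.items():
        if not col.startswith("Folkmängd"):
            continue  # only the year columns get unpivoted
        tidy.append({
            "Age": row["Ålder"].split(" ")[0],   # "0-4 år" -> "0-4" (split step)
            "Sex": SEX[row["kön"]],              # translate to English
            "Year": int(col.split(" ")[1]),      # "Folkmängd 1860" -> 1860
            "Value": value,
        })
# tidy now holds one row per (age, sex, year), like the unpivoted M output
print(tidy)
```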

Looking at the data now, it’s in much better shape for further analysis.

In order to reuse the work we just went through, we create a Power BI Dataflow. This is done from the Power BI service here – note that it can only be done in an app workspace, not your personal “My workspace”.

Then it’s time to create the dataflow by choosing “add new entities”.


There are quite a few data sources available already, even though some that exist in Power BI Desktop have yet to make it to the service (Data Lake Storage, for example). As we already have our Power Query code done, we’ll go ahead and choose a blank query that lets us write the Power Query code ourselves.

Next it’s just a question of pasting the Power Query code from before into the window like so, and clicking next:

If all goes according to plan (remember that this IS a preview feature and some parts of Power Query have yet to be implemented) the result should be identical to what we saw earlier in Power BI Desktop:


Let’s change the name of the entity from “Query” to “Population” and click done.

There, that was the first entity of the dataflow done! We could add more entities, but for now we’ll just go ahead and save the dataflow by clicking save.

We’ll give the dataflow a useful name and a description, and click save again.

Upon saving we have the opportunity to set the refresh schedule of the dataflow if we so desire. Unless the dataflow has been refreshed at least once, there will be no data available when we try to use it. The dataflow refresh settings can also be accessed here:


With this freshly minted dataflow in hand, let’s go back to Power BI Desktop and see how we can use it. Start by clicking “Get Data – Power BI – Power BI dataflows (Beta)”

After logging in we are presented with the available dataflows:

Unless the dataflow has been refreshed at least once, we will receive an error upon clicking the entity:

Provided the dataflow has been refreshed, however, it looks exactly like any other datasource and we can go ahead and load or edit the query as usual:

So far we’ve removed some of the tedious work of prepping the data for further analysis, as anyone wanting to use this dataset no longer has to go through all the preparation steps to get the data into a useful format.


Hard choices

I was recently accepted to speak at a SQL Server-related event in Europe. As I relish the opportunity to speak, I was very happy and started preparing in earnest, booking the trip and polishing my session. However, when I took a look at the schedule, I found the name of another speaker I had not expected to see anymore. This person has been tossed out of one of the most important Microsoft programs, allegedly (as I don’t have personal, first-hand knowledge of the reasons) due to repeatedly misbehaving in general and being sexist in particular – behavior I have personally witnessed multiple times. As is the case with most people who behave this way, this person has shown no regret, or even understanding of how this behavior could possibly be interpreted as wrong. As this information has not been made very public, I reached out to the organizers of the event to ask if they knew about this person’s history and behavior, and if they really wanted someone like that speaking at their event.

While I don’t know exactly what answer I was expecting, I was not expecting the one I received. The answer was basically the organizers putting their heads in the sand, explaining that they treat everyone with equal respect and expect the same in return. Unless someone is convicted of a crime, they would not be kept from speaking at the event, as the organizers could not act on hearsay alone. They were, in my opinion, hiding behind a legal straw man that has little or nothing to do with the question at hand.

As it was abundantly clear that the organizers do not share my values when it comes to community, I was faced with the choice of either speaking at this event and implicitly accepting this individual’s behavior, or withdrawing to make a point. I chose the latter, as I feel the need to stand up to sexism and generally deplorable behavior. I can’t change the world, but I can take a personal stand, despite it costing me an important speaking opportunity.

Our community is too precious to not make a stand.


*updated with the word “allegedly” as I am not privy to Microsoft’s reasoning.*