Apache Superset: Onboarding Guidebook
Superset is hosted on CDL’s servers and can be accessed at https://supersetv2.civicdatalab.in/login/

Every bandhu will have a Superset account. If you don’t have one, contact Sai Krishna Dammalapati <saikrishna@civicdatalab.in> or Abhinav Singh <abhinav@civicdatalab.in>.
Upload a CSV
You can bring a dataset into Superset either by connecting to a database or by uploading a CSV. The data pillar will already have uploaded the important datasets. If you have a CSV of your own that you want to explore, upload it, and make sure it is the final version of the dataset.

The KHOJ dataset is used as the example in this handbook. Clicking “Upload CSV to database” opens a settings window. You can leave most settings at their defaults, but take note of the following:
Table Exists:

Parse dates:

Once the CSV is uploaded, a Dataset is created in Superset. You can check all the uploaded CSVs here:

By default every column in the Dataset will be GROUPABLE and FILTERABLE. However, if there are dates/times in any of the columns, those columns should be identified as TEMPORAL. It will help in building time series charts.
To enable such operations on each of the columns, edit the table that you uploaded. You can do that by clicking on the pen button at the end of the table row. Hover the cursor under the Actions column to find it.
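Before marking a column as TEMPORAL, it can help to confirm that every value in it actually parses as a date. The following is a minimal sketch using only the Python standard library. The sample data, the column names and the `%Y-%m-%d` format are assumptions made for illustration and are not taken from the KHOJ dataset.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample standing in for a CSV you plan to upload.
SAMPLE_CSV = """name,date_of_appointment,state
A,2015-04-01,Assam
B,2017-11-23,Kerala
C,not available,Goa
"""


def unparseable_dates(csv_text, column, date_format="%Y-%m-%d"):
    """Return (row number, value) pairs in `column` that do not match `date_format`."""
    bad = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        value = row[column]
        try:
            datetime.strptime(value, date_format)
        except ValueError:
            bad.append((row_number, value))
    return bad


# Any rows listed here need cleaning before the column is treated as a date.
print(unparseable_dates(SAMPLE_CSV, "date_of_appointment"))
```

If the list comes back empty, the column is likely safe to treat as a date column.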

With that, you have successfully uploaded a CSV to Superset.
Naming convention
**The naming convention for uploaded CSVs:**
1. The dataset’s name should be self-explanatory.
2. Indicate the time range of the data within the name of the dataset. Eg: UnionBudget_FY17_18
3. Include “tmp” in the name if the CSV is uploaded for a temporary use case. Delete the file after using it. Eg: AssamBudget_FY19_20_tmp
4. Delete duplicated CSVs if they are not used in a dashboard.
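The convention can be checked mechanically before uploading. The sketch below is one possible check, not an official CDL tool. It assumes the time range is always written in the `FY17_18` style shown in the examples, which the convention itself does not require. `check_dataset_name` and `NAME_PATTERN` are hypothetical names.

```python
import re

# Assumed pattern: a descriptive name, a time range such as FY17_18,
# and an optional "_tmp" suffix for temporary uploads.
NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9]*_FY\d{2}_\d{2}(_tmp)?$")


def check_dataset_name(name):
    """Report whether a name fits the assumed pattern and whether it is temporary."""
    match = NAME_PATTERN.match(name)
    return {
        "valid": match is not None,
        "temporary": bool(match and match.group(1)),
    }


print(check_dataset_name("UnionBudget_FY17_18"))
print(check_dataset_name("AssamBudget_FY19_20_tmp"))
print(check_dataset_name("budget"))
```

A name flagged as temporary is a reminder to delete the file once you are done with it.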
Making your first chart
Create a new chart from the Charts window.

Select the Dataset you uploaded and also the type of chart (visualization). We commonly use bar charts, Sankey diagrams, line charts, partition charts, area charts and simple table/pivot table charts.

Alternatively, you can directly click on the table name to open the chart window. The chart window looks like this:


You can add more “SIMPLE” METRICS such as SUM, AVG and MAX by clicking on the METRICS field. You can also write CUSTOM SQL for advanced metrics.

And the first chart is created. We visualized the number of judges in each state based on their state of origin. We also want to see a gender split in each state. So we choose “Gender” as a “DIMENSION”. A bar chart is an appropriate visualization for this. Save the chart.
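Conceptually, the chart described above is a COUNT metric grouped by two dimensions. The sketch below reproduces that aggregation with Python's built-in `sqlite3` module so you can see the shape of the result. The table name, column names and rows are made up for illustration, and the query Superset actually generates will look different.

```python
import sqlite3

# Hypothetical rows standing in for the judges dataset: (state of birth, gender).
rows = [
    ("Assam", "Female"),
    ("Assam", "Male"),
    ("Assam", "Male"),
    ("Kerala", "Female"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE judges (state_of_birth TEXT, gender TEXT)")
conn.executemany("INSERT INTO judges VALUES (?, ?)", rows)

# COUNT(*) plays the role of the METRIC; state_of_birth and gender are the DIMENSIONS.
query = """
    SELECT state_of_birth, gender, COUNT(*) AS judge_count
    FROM judges
    GROUP BY state_of_birth, gender
    ORDER BY state_of_birth, gender
"""
result = conn.execute(query).fetchall()
print(result)
```

Each returned row corresponds to one bar segment: a state, a gender, and the number of judges in that combination.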

Now that we have created a new chart, we can customise it further.
Customise your chart
Customising a chart makes it more meaningful and visually appealing. Continuing with the bar chart from the previous step, suppose we now want a stacked bar chart. To customise a chart, click on the CUSTOMIZE tab.

The resultant stacked bar chart looks like this. Similarly, each type of visualisation will have different customisation options. Explore them.

Other customisation options that we generally use are:
1. COLOR SCHEME: To select the colour palette for the legend
2. X BOUNDS and Y BOUNDS: To set limits on the X-axis and Y-axis
3. X AXIS and Y AXIS LABELS: To write self-explanatory labels for the X-axis and Y-axis
4. FORMAT: To change the format in which numbers are shown on the chart. Eg: You can decide to show only 2 decimal places.
These options also vary from one visualisation type to another. Explore them for each visualisation to build the best possible chart.
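Superset's number FORMAT options are D3 format strings, and the D3 format syntax is modelled on Python's format mini-language. You can therefore preview the effect of common specifiers in Python, as in the sketch below. Treat it as an approximation only: not every specifier behaves identically in both, and the values here are arbitrary.

```python
value = 1234567.891

two_decimals = format(value, ".2f")   # fixed point with 2 decimal places
with_commas = format(value, ",.2f")   # same, with thousands separators
as_percent = format(0.1234, ".1%")    # multiplied by 100, with a percent sign

print(two_decimals)  # 1234567.89
print(with_commas)   # 1,234,567.89
print(as_percent)    # 12.3%
```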
Example charts
Important visualization types that are generally added to our dashboards, with a few examples:
Treemaps: For a treemap, we select just one DIMENSION and a METRIC (COUNT) to apply to it. Here, we count the number of judges from each “Parent High Court”.

Sankey Diagram: They are helpful in understanding the flow of metrics from one set of variables to another. For instance, we want to see how many judges belonging to a particular state end up working in the High Court of the same state. We choose Sankey Diagram as visualisation; SOURCE and TARGET variables as “State of Birth” and “Parent High Court” and METRIC as COUNT. We thus get the following diagram.
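A Sankey diagram is drawn from a list of (source, target, value) links. The sketch below shows how such links come out of counting source-target pairs. The records are invented for illustration and are not taken from the KHOJ dataset.

```python
from collections import Counter

# Hypothetical records: (state of birth, parent high court).
judges = [
    ("Assam", "Gauhati High Court"),
    ("Assam", "Gauhati High Court"),
    ("Assam", "Delhi High Court"),
    ("Kerala", "Kerala High Court"),
]

# Each distinct (source, target) pair becomes one link; its count is the link width.
flows = Counter(judges)
for (source, target), count in sorted(flows.items()):
    print(f"{source} -> {target}: {count}")
```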

Big Number: The query for Big Number visualization is simple. You just have to select the Metric that you want to show in Big Numbers. Here, we show the COUNT of the Total number of Judges.

Here are a few examples of other charts:
1. Histograms
2. Dual axis line charts
3. Bullet charts
4. Sunburst chart
5. Chord diagram
6. Graph charts
7. Bubble charts
8. Nightingale rose charts
9. Word clouds
Making your first dashboard
Create a new dashboard from the dashboards window.

The newly created empty dashboard looks like this.

Click on the “Edit Dashboard” button to insert charts, text blobs etc. They can be just dragged and dropped into the dashboard. Use columns and rows (LAYOUT ELEMENTS) to organize the dashboard well.
Use of filters
There are two ways in which you can add filters to the dashboard.
FILTER BOX: If you want filters to be visible on the dashboard, use a filter box. It lists the columns on which filters will be applied. A FILTER BOX is itself a “Chart”.

You can also add filters directly on the dashboard.

SCOPING is useful to select the charts on which the filter is applicable. By default, the filter will be applicable to all charts on the dashboard. For instance, we made the “State of Birth” filter inapplicable to the Sankey and Treemap charts that we created.

Use Filter Boxes only when it is necessary to see filters on the dashboard. Otherwise, adding filters directly on the dashboard is easier and more versatile due to the SCOPING functionality.
We have created our first dashboard with filters!

Best practices for a CDL dashboard
Don’t overpopulate the dashboard. Use the “TABS” feature to divide a dashboard into multiple pages. Each tab/page can present data at one hierarchical level. For example: one tab for all of India, one tab for the state, and one tab for district-level analysis.

Use the first tab of the dashboard to provide context about the dataset, especially if the dashboard is used by external partners. Use the text box features to write text.
Rename Metric Names as and when necessary to make them self-explanatory.

To present hierarchical information, use a partition diagram. This is the standard practice at CDL.

Use self-explanatory chart names.
Bar chart hygiene checks:
- Consider using top/bottom or thresholds to limit the number of entries.
- Axis names should always be indicated.
- Rotate the X-axis labels if there are many bars, to avoid clutter.
Every CDL dashboard contains a Table at the end. Use the Pagination and Search feature if there are many records in the table.
When the dashboard is ready to be shared, write a readable URL slug for it. The slug can be set in the Edit Properties section of the dashboard. Also, choose a suitable colour scheme for the entire dashboard.
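A readable slug is usually lowercase, with words joined by hyphens and no special characters. The sketch below shows one way to derive such a slug from a dashboard title. `make_slug` is a hypothetical helper, not part of Superset, and the title is an invented example.

```python
import re


def make_slug(title):
    """Lowercase the title and replace each run of non-alphanumeric characters with a hyphen."""
    slug = title.strip().lower()
    slug = re.sub(r"[^a-z0-9]+", "-", slug)
    return slug.strip("-")


print(make_slug("Assam Budget FY 2019-20 (v2)"))  # assam-budget-fy-2019-20-v2
```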

Dashboards to refer to
1. SCADL Dashboard (v2)
External guidebooks to refer to
1. Best Practices for Data Visualisation - Royal Statistical Society