Scraping data with import.io

Extracting data from websites for use in stories is often difficult and time-consuming. import.io is a web-based tool that parses web pages and turns their content into tabular data.

I’m going to use it to get a list of all Cardiff Councillors.

Simply copy the URL of the list of councillors on the Cardiff Council website and paste it into import.io.

And…er…that’s it. You can edit the table in the browser and then download it as a CSV file.
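
Once you have the CSV, a few lines of Python are enough to start asking questions of it. The sketch below is only an illustration: the sample data and the column names (name, ward, party) are assumptions standing in for whatever headings import.io detects on the page, and the two helper functions are hypothetical rather than anything import.io provides.

```python
import csv
import io

# Hypothetical sample standing in for the CSV downloaded from import.io.
# The column names are assumptions; a real export will use whatever
# headings import.io detects on the page.
SAMPLE_CSV = """name,ward,party
A. Example,Ward One,Party A
B. Example,Ward Two,Party B
C. Example,Ward One,Party B
"""


def load_rows(csv_file):
    """Read a CSV file object into a list of dicts keyed by column heading."""
    return list(csv.DictReader(csv_file))


def count_by(rows, column):
    """Count how many rows share each value in the given column."""
    counts = {}
    for row in rows:
        counts[row[column]] = counts.get(row[column], 0) + 1
    return counts


rows = load_rows(io.StringIO(SAMPLE_CSV))
ward_counts = count_by(rows, "ward")
print(ward_counts)
```

To run this against a real download, pass an opened file instead of the sample, for example open("councillors.csv", newline="", encoding="utf-8"), where the filename is whatever you saved the export as.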

Good governance and active citizenship require open data

 

Cobi Smith, Australian National University

Australia is a nation that cares about promoting transparency, empowering citizens, fighting corruption and using new technologies to strengthen governance. At least, this can be assumed given our commitment to the global Open Government Partnership.

But if this is so, then why is Australia slow to act, according to data on that very website? Our commitment to promoting transparency, supporting active citizenship or fighting corruption can also be questioned.

However, there is progress both in open access and open data, so perhaps it’s a lack of prioritising such international commitments within our growing “open” movement.

Assuming good faith and a belief in open data, it may be simply that people lack the knowledge, time or resources to make data accessible in line with open data standards.

Given different guidance depending on data type (consider for example Australia’s open council data standards and the International Aid Transparency Initiative) uncertainty about how to progress may be stalling release of open data in ways that support participatory governance.

Others have discussed how the costs of opening data are low compared to future benefits. Even if we all agree data should be open, where to from here?

Priorities

All Australians should have free and open access to information about how our society operates. This supports us in being active citizens who care about what’s happening and feel empowered to play positive roles in Australian society.

Progress is already happening, as demonstrated in GovHack, which is an annual competition that encourages members of the public to find creative ways to use government data – and is encouraged by the government.

More transparent budget data has been a welcome change in recent years. As open activist Rosie Williams notes, analysis and engagement are intended impacts of transparency:

The more educated people are, the less opportunity there is for budget information to be used for political purposes and the better political discourse will be for it.

But what happens when data are presented in ways that do serve political purposes, catching out even the most scientifically literate Australians?

It is important to consider the values and priorities behind data release and presentation. Open Knowledge Australia participants have recently been discussing strategies for prioritising open data and discussing what makes a “high value” dataset, thanks to Cassie Findlay.

There are uncontroversial, non-sensitive datasets that can be low-hanging fruit and can lead to fun tools that many of us might find useful. For example openbinmap.org is a new tool that – if your council shares its data – tells you if you should put the bins out tonight.

Its creator, Steve Bennett, has been involved in online discussion about what makes datasets valuable and higher priority. He notes that uniqueness is valuable:

[…] to what extent are there no other sources of this information? A council’s collection of street information is valuable but there’s a lot of overlap with OpenStreetMap, for instance. But no one else could have the garbage collection zone boundaries.

Up-to-date datasets are more valuable than poorly maintained ones. Steve noted that reusability is also valuable:

[…] was the data being collected with a general purpose in mind, or are there limitations due to the original purpose for which it was collected?

This is related to the concept of frictionless data, which means open data that can be reused and integrated beyond a given tool or project, typically using simple web standards.

Open by default?

Another Australian open knowledge leader, Steven De Costa, advocates for common web standards and notes that comparability makes data more valuable:

The value is not measured by any single dataset release, but the combined value of having all datasets available within a topic area so that comparisons can be made.

This gives meaning to portals such as the Open Data Index and International Aid Transparency Index.

Rebecca Cameron, who led an open data initiative in the Queensland government, noted that records of requests for information both under freedom of information laws and directly from researchers helped prioritise valuable datasets to be published. She said investigating which pages of government websites had the most hits and downloads of data helped inform which datasets to release.

In theory some datasets could by academic definition be high value, but if no one even opens the dataset it has little value.

Once released, she advocated waiting several months to see which datasets were still being used regularly, beyond an initial flurry of downloads.

Prioritisation is essential for progress but it inevitably involves value judgements. Decisions about which data are released first and how data are presented are not neutral. What we prioritise indicates our values. This is why transparency about values is important.

It’s nearly two years since an international charter called for government data to be “open by default”. What values do omissions of open data indicate?

Sarah Barnes questioned where open data was heading in Australia. As she cautioned two years ago, citizen engagement is more than a double-click away.

Active citizenship involves allowing everyday Australians to understand the values and priorities associated with how data are shared. This allows us to make our own decisions about whether those values and priorities reflect our own and, if they don’t, to work out how we can engage in shaping an Australia that better reflects our values.


Note: most quotes in this article came from conversations on the Open Knowledge Australia mailing list.

Cobi Smith, PhD researcher, Australian National University

This article was originally published on The Conversation. Read the original article.

Mapping Polling Stations with import.io and Silk

Cardiff Council publish a list of polling stations as a PDF.

Cardiff Hyperlocals wanted to map this data and present it in a more accessible format.

They used a tool called import.io to scrape the data from the PDF into a spreadsheet.

They then used a service called Silk to geocode the address column and map the polling stations.

This map can be embedded in another website and the data can be filtered by electoral division.
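
The same kind of filtering can be done on the scraped spreadsheet itself. The following is a minimal sketch, not a description of how Silk works: the rows, the column names (station, address, division) and the filter_by_division helper are all hypothetical, chosen only to show the idea.

```python
# Hypothetical rows standing in for the scraped polling station spreadsheet.
# The station names, addresses and division names are placeholders.
stations = [
    {"station": "Example Hall", "address": "1 Example Street", "division": "Division A"},
    {"station": "Example School", "address": "2 Example Road", "division": "Division B"},
    {"station": "Example Library", "address": "3 Example Lane", "division": "Division A"},
]


def filter_by_division(rows, division):
    """Return the rows whose division matches, ignoring case and stray spaces."""
    wanted = division.strip().lower()
    return [row for row in rows if row["division"].strip().lower() == wanted]


for row in filter_by_division(stations, "division a"):
    print(row["station"], "-", row["address"])
```

Matching on a lower-cased, stripped value is a small safeguard, since scraped text often carries inconsistent capitalisation or trailing spaces.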

 

Publishing Open Data with PowerBI

Microsoft’s PowerBI is a new tool for creating reports and visualisations.

The desktop application is currently free and publishing data is very easy.

Step 1

Open your data from the wide selection of available sources. We’re going to use OData from StatsWales. This data is published in JSON format.
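
If you want to look at the feed before loading it into PowerBI, it helps to know that OData JSON responses carry their records in a top-level "value" array. The sketch below parses a made-up response of that shape; the field names (Year_Code, Data) and the figures are placeholders rather than real StatsWales output, and odata_rows and series are hypothetical helpers.

```python
import json

# Made-up response in the general shape of an OData JSON feed.
# The field names and numbers are placeholders, not real StatsWales data.
SAMPLE_RESPONSE = """
{
  "value": [
    {"Year_Code": "2015", "Data": 12.5},
    {"Year_Code": "2014", "Data": 10.0}
  ]
}
"""


def odata_rows(payload):
    """Return the list of records held in the "value" array of an OData response."""
    return json.loads(payload).get("value", [])


def series(rows, x_field, y_field):
    """Build (x, y) pairs sorted by x, ready to plot as a line chart."""
    return sorted((row[x_field], row[y_field]) for row in rows)


points = series(odata_rows(SAMPLE_RESPONSE), "Year_Code", "Data")
print(points)
```

For a live feed, the payload would come from an HTTP request (for instance with urllib.request.urlopen) to the dataset's OData address, and the field names would need to be swapped for the ones that dataset actually uses.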

Step 2

Choose the data to display from the available fields and the type of visualisation. We’re going to use a line chart.

Step 3

Publish the chart and display it using an iframe or a link.


Welcome to Hyperlocal Open Data

Welcome to our guide to open data for hyperlocal journalists.

We built it as part of Cardiff University’s “Hack the Local” hackathon.

If you don’t know what open data is, read our guide for a basic outline. Once you’re ready to take a look, browse some sources of open data and get inspired.

Open data can generate news for local sites or provide information for stories you’re already writing.

There are pitfalls and caveats that you must consider, but we’ll do our best to point them out for you.