Releasing your data under the Open Data program

Get step-by-step information about how to assess, prepare and share your data asset on DataVic, the Victorian Open Data Portal.

Before you start

If you are a frequent user, or plan to become a frequent publisher, you will need to create an account. You can do this by contacting the DataVic team. Once your account has been created you can follow the steps below to make your data asset open on DataVic.

In certain exceptional circumstances, the DataVic service desk can publish your data on your behalf, once you have assessed, classified and prepared it. Contact DataVic for more information. 

  1. All data assets must be thoroughly assessed for risks before being made open under the Open Data program.

    Critical risk assessments include:

    • Personal information (privacy check): data must not contain any identifying information. This restriction also covers data that has been de-identified (a change from the previous Open Data policy).
    • Public health information: data must not contain information that adversely affects public safety.
    • Inappropriate disclosure risk (cabinet protections): Data assets with a security classification limiting sharing with the public must not be shared. The Victorian Protective Data Security Framework and the Victorian Protective Data Security Standards provide more guidance.
    • Business impact levels: Business Impact Levels assess the severity of harm and the consequences of harm if information held by the government is inappropriately released. Data that is released to the public through the Open Data Program should have zero or minimal negative impact.
    • Legal document checks: sometimes, legal documents include usable data assets. If you or your organisation is considering sharing legal documents, you must ensure that the data is appropriately cleansed.
    • Retirement checks: Retention and disposal authorities (RDAs) are standards issued by the Public Record Office Victoria (PROV) and are a legal instrument authorising the retirement of public records. To review the full list of available RDAs, visit the Public Record Office Victoria website.
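Some of these checks can be supported by simple tooling. The Python sketch below is a hypothetical first-pass screen for the privacy check: it flags column names that often indicate personal information so that a person can review them. The keyword list and the `suspect_columns` function are illustrative assumptions, and a clean result does not show that a data asset is free of identifying information.

```python
# Hypothetical keyword list: column-name fragments that often indicate personal information.
# A real screen would be tailored to the data asset being assessed.
SUSPECT_FRAGMENTS = ("name", "address", "email", "phone", "dob", "birth", "medicare")


def suspect_columns(header: list[str]) -> list[str]:
    """Return the column names that contain any suspect fragment (case-insensitive)."""
    return [
        column
        for column in header
        if any(fragment in column.lower() for fragment in SUSPECT_FRAGMENTS)
    ]


if __name__ == "__main__":
    # Prints ['Email', 'DOB'] for this hypothetical header row.
    print(suspect_columns(["record_id", "Email", "suburb", "DOB"]))
```

Substring matching deliberately errs towards false positives (for example, `suburb_name` is flagged), since a missed identifier is the costlier mistake.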
  2. Check your organisation’s Information Asset Register

    All Victorian Public Service organisations are required to keep an Information Asset Register. It is a good starting point for the information you will need to assess your data asset for publication. The Information Asset Register may already contain important details you need about data ownership and data custodianship. Your organisation’s information management professionals will be able to assist you with these enquiries. 

    Establish/confirm the IP status

    All data assets are considered intellectual property under the Victorian Government’s Intellectual Property Policy. The policy is administered by the Department of Treasury and Finance for the whole of Victorian Government.

    The Intellectual Property Policy has two main goals:

    • The Victorian Government grants rights to its intellectual property as a public asset in a way that creates the most impact, value, accessibility, and benefit in line with the public interest; and
    • The Victorian Government buys or uses other people’s intellectual property in a transparent and efficient way, making sure that the Government upholds the law and manages any risks properly.

    The Intellectual Property Policy requires government departments and agencies to apply a licence that promotes sharing and re-use of data assets. The most common licences are Creative Commons licences. These licences improve access to public sector information.

    The Department of Treasury and Finance encourages departments and agencies to apply Creative Commons 4.0 licences to their websites, publications and other material provided to the public. Refer to Chapter 5 of the IP Guidelines for further information on applying Creative Commons copyright notices. To seek support, email the DTF IP Policy team.

  3. Ensure your data is in a machine-readable format

    Datasets need to be made available in a machine-readable, open format. Machine-readable refers to a data format that can be processed automatically by a computer and that follows a recognised, published standard. A machine-readable file has a structure that allows its contents to be queried easily, and it can be used with spreadsheet software, statistics software and custom-written code.

    Unstructured documents, such as PDFs and Word documents, make extracting data inconsistent and time-consuming. However, PDFs and other document files can be included in their original format to add context where this makes the dataset easier to understand or reuse.
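To illustrate why a structured format is easier to interrogate, the Python sketch below reads a small CSV file using only the standard library and totals one column by name. The column names and sample rows are hypothetical.

```python
import csv
import io

# Hypothetical sample data; in practice this would be the published CSV file.
SAMPLE_CSV = """suburb,year,incidents
Ballarat,2020,14
Bendigo,2020,9
Ballarat,2021,11
"""


def total_by_suburb(csv_text: str) -> dict[str, int]:
    """Sum the 'incidents' column for each suburb."""
    totals: dict[str, int] = {}
    # DictReader uses the header row as field names, so columns are addressed by name.
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["suburb"]] = totals.get(row["suburb"], 0) + int(row["incidents"])
    return totals


if __name__ == "__main__":
    print(total_by_suburb(SAMPLE_CSV))  # {'Ballarat': 25, 'Bendigo': 9}
```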

    Data is arranged in many different formats depending on its use. The following list provides a guide for some of the preferred open formats for datasets.

    Common database formats

    • CSV (comma separated values) for simple spreadsheets and simple databases. CSV files can be previewed within the Open Data Portal without the need to download the file, enabling end users to decide if the file is suitable for their purposes.
    • XML (extensible mark-up language) is a general‑purpose mark-up language for complex datasets, standardised by the World Wide Web Consortium (W3C), the main international standards organisation for the World Wide Web.
    • XBRL (extensible business reporting language) is a freely available, global, standards‑based way to communicate and exchange business information between business systems.
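Because XML is hierarchical, it can be queried by structure rather than by position. The sketch below uses Python's standard `xml.etree.ElementTree` module on a hypothetical extract; the element and attribute names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML extract; element and attribute names are illustrative only.
SAMPLE_XML = """<stations>
  <station id="S1" name="Melbourne">
    <reading year="2020">12.5</reading>
    <reading year="2021">13.1</reading>
  </station>
  <station id="S2" name="Geelong">
    <reading year="2021">11.8</reading>
  </station>
</stations>"""


def readings_for_year(xml_text: str, year: str) -> dict[str, float]:
    """Return {station name: reading value} for every reading taken in the given year."""
    root = ET.fromstring(xml_text)
    result: dict[str, float] = {}
    for station in root.iter("station"):
        # ElementTree supports a limited XPath subset, including [@attribute='value'] filters.
        for reading in station.findall(f"reading[@year='{year}']"):
            result[station.get("name")] = float(reading.text)
    return result
```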

    Common text-based formats

    • ODT (OpenDocument Text) is the word processing file type of the OpenDocument Format (ODF), a zip-compressed, XML-based file format for spreadsheets, charts, presentations and word processing documents.
    • XML (extensible mark-up language) is a general‑purpose mark-up language for complex datasets, standardised by the World Wide Web Consortium (W3C), the main international standards organisation for the World Wide Web.
    • JSON (JavaScript Object Notation) is an open standard file format and data interchange format, that uses human-readable text to store and transmit data objects.
    • HTML (Hypertext Markup Language) is the standard markup language for documents designed to be displayed in a web browser.
    • RTF (Rich Text Format) is a proprietary document file format developed by Microsoft Corporation for cross-platform document interchange with Microsoft products.
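JSON maps directly onto the objects, lists, numbers, strings and booleans used by most programming languages. A minimal Python round trip with the standard `json` module, using a hypothetical record:

```python
import json

# Hypothetical record; field names are illustrative only.
record = {
    "name": "Example Library",
    "open_weekends": True,
    "branches": 3,
    "tags": ["library", "public"],
}

# Serialise to human-readable JSON text (Python's True becomes JSON's true).
text = json.dumps(record, indent=2)

# Parse the text back; the result is equal to the original record.
restored = json.loads(text)

if __name__ == "__main__":
    print(text)
```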

    Common tabular data formats

    • CSV (comma separated values) for simple spreadsheets and simple databases. CSV files can be previewed within the Open Data Portal without the need to download the file, enabling end users to decide if the file is suitable for their purposes.
    • ODS (OpenDocument Spreadsheet) is the spreadsheet file type of the OpenDocument Format, an open, XML-based alternative to proprietary spreadsheet formats.

    Common geospatial data formats

    • SHP (Shapefile format) is a geospatial vector data format for geographic information systems software.
    • GeoJSON is an open standard format, based on JSON, designed for representing simple geographical features along with their non-spatial attributes.
    • GeoTIFF is a public domain metadata standard that allows georeferencing to be embedded within a TIFF (Tag Image File Format) file.
    • KML (formerly Keyhole Markup Language) is an XML language focused on geographic visualisation, including annotation of maps and images.
    • WMS (Web Map Service) is a protocol that allows georeferenced map images to be served over the web.
    • WFS (Web Feature Service) is a protocol that allows geographical features to be requested and retrieved over the web as data, rather than as map images.
    • WCS (Web Coverage Service Interface Standard) provides access to coverage data in forms that are useful for client‑side rendering, as input into scientific models, and for other clients.
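GeoJSON can be produced without specialist GIS software. The sketch below builds a one-point GeoJSON FeatureCollection with Python's standard library. GeoJSON (RFC 7946) orders coordinates as longitude, then latitude. The feature and its properties are hypothetical.

```python
import json


def point_feature(lon: float, lat: float, properties: dict) -> dict:
    """Build a GeoJSON Point Feature; coordinates are [longitude, latitude]."""
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": properties,
    }


def feature_collection(features: list[dict]) -> dict:
    """Wrap features in a GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": features}


# Hypothetical example: approximate location of Melbourne's CBD.
fc = feature_collection([point_feature(144.9631, -37.8136, {"name": "Melbourne CBD"})])
geojson_text = json.dumps(fc)
```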

    Common data compression container formats

    • GZIP is a file format and software application used for file compression and decompression, originally developed for the GNU Project.
    • ZIP is an archive file format that supports lossless data compression. A ZIP file may contain one or more files or directories that may have been compressed.
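GZIP is typically used to compress a single file, whereas ZIP is an archive of one or more named files. A sketch using Python's standard `gzip` and `zipfile` modules, with hypothetical file names and contents:

```python
import gzip
import io
import zipfile


def gzip_roundtrip(data: bytes) -> bytes:
    """Compress bytes with GZIP and decompress them again (lossless)."""
    return gzip.decompress(gzip.compress(data))


def zip_files(files: dict[str, bytes]) -> bytes:
    """Pack several named files into an in-memory ZIP archive and return its bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


def list_zip(archive_bytes: bytes) -> list[str]:
    """Return the file names stored in a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.namelist()


if __name__ == "__main__":
    # Hypothetical data file plus a description of its columns.
    packed = zip_files({"data.csv": b"suburb,incidents\nBallarat,14\n", "README.txt": b"incidents: count per year"})
    print(list_zip(packed))  # ['data.csv', 'README.txt']
```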

    Ensure the licence is open for re-use without restrictions

    For data to be ‘open’ it must be released with an open data licence that allows people to use the data in any way they want, including transforming, combining and sharing it with others, even commercially.

    The most appropriate form of licence for Victorian Government-owned data containing copyright material will vary depending on the type of material involved and the circumstances surrounding making the dataset available.

    It is recommended that agencies apply the Creative Commons Attribution 4.0 International licence (CC BY 4.0) to datasets released under the Policy, as it is the least restrictive of the Creative Commons licences. The Victorian Government Intellectual Property Policy Intent and Principles and supporting Guidelines contain more detailed information on licensing.

    There are six CC licences available. Each licence is identified by a combination of symbols, an acronym and a descriptive label. Creative Commons provides a licence chooser to help you select the right one.
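The acronyms encode the licence conditions, so the two conditions set out above for 'open' data (commercial use and transformation) can be checked mechanically. In the hypothetical sketch below, NC (NonCommercial) rules out commercial use and ND (NoDerivatives) rules out sharing adapted versions; the function name is illustrative only.

```python
# The six Creative Commons licences, identified by their acronyms.
CC_LICENCES = ["CC BY", "CC BY-SA", "CC BY-NC", "CC BY-ND", "CC BY-NC-SA", "CC BY-NC-ND"]


def permits_commercial_and_derivative_use(acronym: str) -> bool:
    """Hypothetical check: True if the licence allows commercial use and transformation.

    NC (NonCommercial) prohibits commercial use; ND (NoDerivatives) prohibits sharing
    adapted versions of the material.
    """
    elements = acronym.removeprefix("CC ").split("-")
    return "NC" not in elements and "ND" not in elements


if __name__ == "__main__":
    # Prints ['CC BY', 'CC BY-SA'].
    print([name for name in CC_LICENCES if permits_commercial_and_derivative_use(name)])
```

CC BY-SA passes both conditions but adds a share-alike requirement on adapted versions, which is one reason CC BY is the less restrictive choice.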

    It is recommended that departments and agencies seek internal legal advice before using other CC licences, which may involve issues that these Guidelines do not address.

    If access to a dataset is to be restricted to an internal audience only, then any re-use restrictions can be applied to a data record. In these circumstances it is recommended that data administrators and custodians clearly articulate the conditions of use in the relevant Data Directory fields for each data record.

    Identify the key contact for the dataset

    Each dataset must have an assigned data administrator and data custodian to ensure that data released through the DataVic website is managed appropriately throughout its lifecycle.

    To support a consistent approach to data management, it is preferred that the method of publishing to DataVic uses existing processes within your department.

    It is within these structured governance models that appropriate resources can be identified for roles within the DataVic publishing environment, which includes the treatment of user feedback.

    Ensure the dataset is hosted on a publicly available server

    DataVic has been created to simplify discovery and access to a range of Victorian government data. DataVic is essentially a data discovery tool that is populated with a collection of metadata records. DataVic is not a data storage facility.

    All data records published on DataVic point back either to the dataset URL hosted on the department’s or agency’s own server environment, or to a page on the agency website that provides direct access to datasets via a data tool.
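Because a DataVic record only points to the data, the URL must be reachable by the public. A hypothetical pre-publication check in Python, using only the standard library, can catch some obvious mistakes, such as linking to an internal address. It does not resolve host names or confirm that the server responds, so the link still needs to be tested from outside the organisation's network.

```python
import ipaddress
from urllib.parse import urlparse


def looks_publicly_hosted(url: str) -> bool:
    """Hypothetical check that a dataset URL is not obviously internal-only.

    Rejects non-HTTP(S) schemes, 'localhost', and literal private or loopback IP addresses.
    It does not resolve DNS names or check that the server actually responds.
    """
    parts = urlparse(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    host = parts.hostname
    if host == "localhost":
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return True  # A DNS name; checking it further would require resolving it.
    return ip.is_global
```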

    When a dataset URL changes, the relevant department or agency is responsible for updating it, either through the Data Directory or through the service desk function (depending on the supply model in operation).

    Review the relevant procedure manuals for a detailed publishing workflow process. For a copy of the procedure manuals or specific support, contact DataVic.

  4. DataVic is built on open source software called CKAN. This data registry software allows for descriptions of data assets. The metadata statement is how we describe our data assets in CKAN. 
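Because DataVic is built on CKAN, its records can also be read through CKAN's Action API, which returns JSON. The sketch below builds a `package_search` request and extracts dataset titles from a response. It assumes the portal's CKAN base URL is `https://discover.data.vic.gov.au`; confirm the address with DataVic before relying on it. The sample response is hypothetical and trimmed to the fields used.

```python
import json
from urllib.parse import urlencode

# Assumption: the CKAN base URL for DataVic. Confirm before use.
BASE_URL = "https://discover.data.vic.gov.au"


def package_search_url(query: str, rows: int = 10) -> str:
    """Build a CKAN Action API package_search URL (GET /api/3/action/package_search)."""
    return f"{BASE_URL}/api/3/action/package_search?" + urlencode({"q": query, "rows": rows})


def titles_from_response(body: str) -> list[str]:
    """Return dataset titles from a package_search JSON response body."""
    payload = json.loads(body)
    if not payload.get("success"):
        raise RuntimeError("CKAN reported an unsuccessful request")
    return [dataset["title"] for dataset in payload["result"]["results"]]


# Hypothetical, trimmed response in the shape CKAN returns.
SAMPLE_RESPONSE = json.dumps({
    "success": True,
    "result": {"count": 2, "results": [
        {"name": "example-bike-paths", "title": "Example bike paths"},
        {"name": "example-bike-counts", "title": "Example bike counts"},
    ]},
})
```

A live request could be made with `urllib.request.urlopen(package_search_url("bicycle"))`, decoding the response body before passing it to `titles_from_response`.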

    Metadata means descriptive information about the content, context, structure and management of data assets.

    Data assets differ a little from other records that the Victorian Government produces. Information records are often about recording a decision, action or other government business at a specific time as an administrative memory. Data assets are generally more active. Data is used and consumed. Data tools are deployed and work actively to host, transfer and transform machine-readable data assets. This means when you are describing your data asset you should think about its purpose, why you created it, how it is consumed to get jobs done, how it is stored and maintained, and the useful characteristics of this data asset that would help another person to understand its potential power.

    The Victorian Government uses a minimum metadata statement to describe data assets. Learn more about the minimum metadata statement.
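In CKAN, a dataset's metadata is held as named fields such as `name`, `title`, `notes` (the description), `owner_org`, `license_id` and a list of `resources`, each with a `url`. The sketch below checks a record before submission. The set of required fields is a hypothetical stand-in; the minimum metadata statement defines the actual list. The rule for `name` follows CKAN's documented constraint on dataset names.

```python
import re

# Hypothetical required fields; the minimum metadata statement defines the real list.
REQUIRED_FIELDS = ("name", "title", "notes", "owner_org", "license_id", "maintainer_email", "resources")

# CKAN dataset names: 2-100 characters, lowercase letters, digits, '-' and '_' only.
NAME_PATTERN = re.compile(r"[a-z0-9_-]{2,100}")


def metadata_problems(record: dict) -> list[str]:
    """Return a list of problems found in a CKAN-style dataset record (empty if none)."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not record.get(field)]
    name = record.get("name")
    if name and not NAME_PATTERN.fullmatch(name):
        problems.append("invalid name")
    for index, resource in enumerate(record.get("resources") or []):
        if not resource.get("url"):
            problems.append(f"resource {index} has no url")
    return problems


# Hypothetical example record; valid license_id values depend on the portal's licence list.
EXAMPLE_RECORD = {
    "name": "example-bike-counts",
    "title": "Example bike counts",
    "notes": "Daily counts of bicycles passing hypothetical counter sites.",
    "owner_org": "example-department",
    "license_id": "cc-by",
    "maintainer_email": "data@example.vic.gov.au",
    "resources": [{"url": "https://data.example.org/bike-counts.csv", "format": "CSV"}],
}
```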

  5. Self-publishing

    There are two distinct self-publishing processes available to departments and agencies:

    • a harvest
    • manual creation of the metadata.

    A harvest links the relevant platform to DataVic, so that all the necessary workflow and publishing decisions are made within the department’s or agency’s own environment. Once the integration has been created, all publishing choices are made outside of the DataVic environment.

    Alternatively, a department or agency can commit to take an active role in the creation of DataVic records by managing a whole organisational environment within the Data Directory.

    The first step will be to get departmental or agency approval for a custodianship model that contains roles for both content creators and content approvers. This will include nominating appropriate staff members to fill these roles.

    Once this model has been established and the roles assigned, contact the DataVic operations team to help you create the initial publishing environment for the organisation and apply roles for the nominated staff.

    Training will be required for nominated staff to learn the responsibilities within an organisation as well as the publishing workflow process.

    Published by DataVic

    This method has been designed to accommodate the one-off and infrequent publishing needs of departments and agencies. The process starts with the completion of a standard metadata template containing all the relevant details of the data record to be published. Every request for publishing must include the details of both the person submitting the request as well as the data administrator who has made the publishing approval.

    Ensure that the completed template includes all the details of the person who has approved the publishing of the record on DataVic.

    Submit the completed template to the DataVic service desk (signup required).
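Because a request is only complete when it identifies both the submitter and the approving data administrator, a simple completeness check can be run before submission. The field names below are hypothetical and do not reflect the actual DataVic template.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contact:
    name: str
    email: str


@dataclass
class PublishingRequest:
    # Hypothetical fields; the DataVic metadata template defines the real ones.
    dataset_title: str
    submitter: Optional[Contact]
    approving_administrator: Optional[Contact]


def missing_details(request: PublishingRequest) -> list[str]:
    """List the details still missing from a publishing request (empty if complete)."""
    missing = []
    if not request.dataset_title.strip():
        missing.append("dataset title")
    for label, contact in (
        ("submitter", request.submitter),
        ("approving data administrator", request.approving_administrator),
    ):
        if contact is None or not contact.name.strip() or not contact.email.strip():
            missing.append(f"{label} details")
    return missing
```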

  6. It is your responsibility to ensure that your data asset is maintained and kept current or retired where appropriate. Learn about managing online records or find more information for data publishers.
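One way to support this is to compare each record's last update against its expected update frequency. The sketch below uses hypothetical field names (`title`, `last_updated`, `update_every_days`); how update frequency is recorded in practice depends on your metadata.

```python
from datetime import date, timedelta


def overdue_titles(records: list[dict], today: date) -> list[str]:
    """Return titles of records not updated within their expected update interval."""
    return [
        record["title"]
        for record in records
        if today - record["last_updated"] > timedelta(days=record["update_every_days"])
    ]


# Hypothetical records.
RECORDS = [
    {"title": "Monthly counts", "last_updated": date(2021, 1, 1), "update_every_days": 31},
    {"title": "Annual summary", "last_updated": date(2020, 7, 1), "update_every_days": 366},
]
```

Records flagged this way are candidates for an update or, where the data is no longer maintained, for retirement.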

    Data assets that have been released under Freedom of Information requests

    Requests for Victorian Government data assets may be made under the Freedom of Information Act 1982. Data assets that are released under Freedom of Information laws should be considered for appropriate sharing with the public.

    Appropriately sharing data assets with the public is consistent with the purpose of the Freedom of Information laws. The data assets must still meet the purposes of this Policy and satisfy the restrictions against inappropriate sharing. The Freedom of Information laws set timelines for providing information; these timelines do not apply to the Policy.

Reviewed 05 May 2021