Data collection challenges and improvements

Organisations may face a number of challenges in collecting consistent and quality data. To develop methods to improve data collection practices, it is necessary to first identify barriers to consistent data collection. This section identifies common data collection challenges, as well as those more specific to collecting data on family violence, and from priority communities. The section also provides advice about how to address some of these challenges and improve data collection. Government departments, agencies and service providers with responsibility for data collections should consider these challenges and improvement opportunities as part of implementation planning. Additional data collection challenges that are specific to particular communities are discussed in the priority community sections of this paper.

Challenges in current data collection practices

Inconsistent data collection standards

Data standards outline how common data items and demographic information should be collected. Established standards typically contain data definitions, standardised questions and accepted response options which guide consistent collection practices. Currently, there are many national and state-wide data standards which are used for collecting administrative data. These standards are not always broadly applied, and may themselves be inconsistent, and this can impact the comparability of data collections.

For example, many specialist and non-disability specific support services collect data on disability drawing on definitions used by the National Disability Insurance Scheme (NDIS), the National Disability Agreement and state governing bodies. Different types of services may apply different standards depending on what is most relevant for their service provision. For example, medical services may be more likely to collect disability information by way of diagnoses and medical history, while non-disability specific services may be more interested in collecting information concerning support needs or a need for reasonable adjustments. As a result, the scope and detail of information collection may not be consistent across services, making it complex to compare data between services, or to population level data sets.

The absence to date of a co-ordinated effort between government, service providers and other agencies to standardise data collection practices means there is considerable variation in how information is collected and recorded in Victoria in relation to family violence and to priority communities.

Context of data collection

Data collection from clients may occur in a variety of situations and settings where it can be difficult to obtain complete and accurate information, and the amount of information gathered may vary depending on the context of the situation. In most cases, the person responsible for collecting data has a primary role that focusses on the provision of a service (for example, as a police officer, support worker or medical practitioner) and, although they collect data as part of these roles, data collection is not necessarily the primary function of their role. Contexts where certain data collection may be limited include crisis or emergency situations, where workers are prioritising the safety of an individual, or situations where an individual’s privacy may be compromised by asking about family violence, such as in a busy waiting room.

Further, in some cases, organisations may not be resourced to provide services to specific cohorts, which can mean there is little incentive to improve the data collected on these individuals in an administrative setting. For example, some family violence services are not specifically resourced to meet children’s needs and may therefore not collect detailed information on this cohort. Conversely, improved data collection on priority communities can help build an evidence base from which to consider different funding models.

Data collection is not core to business function

The core functions of an organisation and time pressures in service delivery can impact the type and quality of data that an organisation collects. Administrative data are typically collected as a by-product of operational requirements or to meet an internal business need and may only include core information needed to perform a service, such as a client’s contact details. In such cases, information on an individual’s sexual orientation, cultural background or disability may not be seen as an operational requirement for organisations that do not offer specialised services. As a result, organisations may only collect a narrow range of data items, which lack sufficient detail needed for broader secondary use purposes, such as conducting state-wide service analysis, monitoring or research.

A perception that the collection of certain demographic information is not relevant to core business functions can impact data quality and comparability for all priority communities discussed in this framework. For example, while many services are legally obliged to collect information on a person’s requirement for an interpreter, other information on their cultural background may not be deemed as relevant to service delivery, resulting in partial information on CALD communities. Concerns have also been raised about how the collection of Aboriginal information may contravene an organisation’s commitment to equitable service delivery⁵, despite the fact that a person’s response to this question should not impact the standard of service they receive.

Complexity

In some cases, adequate information about a person’s background cannot be ascertained through one data item, for example for CALD and LGBTI communities, and for people with disabilities. Where this is attempted, it often under-represents those who face heightened risks and barriers to accessing services. It also has the potential to add confusion regarding different concepts that may not be fully understood by people outside of specific communities. For example, grouping diverse people and communities into a single ‘LGBTI’ group, or using the need for an interpreter as a marker of CALD communities, does not recognise and represent these communities accurately, and decreases data integrity.

Lack of training in data collection

As the primary role of front-line service and clinical staff will generally not be data collection, they may not receive training in this area. If staff do not receive training or understand why they need to collect particular data, they may feel less confident to ask the associated questions, or ask them in a different way. A lack of training in how and why to collect certain kinds of data can particularly impact the priority communities discussed in this framework. For example, given the personal nature surrounding questions about sexual orientation or intersex variation, organisations may be reluctant to ask for this information, particularly if there are concerns that this may cause a person to be offended or experience discomfort. A fear of causing offence may also impact staff willingness to ask questions about a person’s disability, cultural background or Indigenous status, and lead them to make assumptions based on observation or on information being volunteered. Staff training in the benefit of collecting these data items, and in sensitive or culturally appropriate ways to do so, can build staff understanding of the value of these types of data, and assist in building data quality and consistency.

Lack of quality assurance processes

There may be limited opportunities to confirm information with a person who has been in contact with a service, meaning that the data initially collected cannot be verified. Additionally, the sophistication of record keeping systems can vary and data quality is often reliant on the person entering the data correctly. Depending on the resourcing of an organisation, time may not permit staff to review information for completeness and obtain missing data.

Changes to definitions and policies and maintaining data comparability

Over time, best practice data collection policies and procedures change. Agencies and their staff may not be aware of these changes and how they affect them, meaning that they inadvertently follow outdated practices. This issue tends to be more prevalent in large organisations, particularly if information is not communicated widely and consistently throughout the workplace. Also, if training is not provided to reinforce changes in practice, staff may continue to follow the procedures they are most familiar with.

Organisations changing data collection systems and processes also need to be aware of the need to ensure continuity of reporting using existing data items. For example, many service providers are bound by the requirements of their funding body to provide particular data fields on a regular basis. Furthermore, in some cases these minimum requirements are established at the federal level, rather than by Victorian state government departments. Longitudinal analysis of service usage based on common data items, and comparability to national data sets, such as those of the ABS, are another consideration when updating data collections.

Economic and IT restrictions

Some organisations may not have the capacity or infrastructure to prioritise improvements to data collection systems and processes. This may be due to a backlog of paper-based records to be digitised, a small workforce to input and maintain data, or a lack of budget to upgrade records management systems. It is also important to note that many IT systems are provided by government departments, which also carry the responsibility for resourcing and conducting system updates. These updates can be expensive and take time. In some cases, these IT systems may have limited capacity to include multiple response values or dynamic questioning that supports sophisticated data collection. The introduction of multiple response options may also present problems for exporting and analysing data.

Improving data collection

The remainder of this section provides information on improving data collection practices in general. It includes a range of processes that can be implemented at the organisational level, and through changes in infrastructure and data collection practices. It also provides advice on interim processes for improving data quality for analysis and reporting purposes, and information on privacy and security requirements.

Organisational Practices

Commitment from all levels of an organisation to improve data collection

Improving data collection and the quality of data holdings requires a concerted effort from an entire organisation, and should begin with a top-down commitment for change. This includes identifying priority areas for improvement and barriers to improvement, adopting best practice procedures for collecting quality data, using data standards where available (including those recommended in this framework), ensuring IT infrastructure is kept up to date and allows for efficient and effective data collection, and providing training where needed to those collecting data to ensure confidence and consistency in data collection practices.

Training

It is important to provide training to staff involved in the collection of data. Training should emphasise why it is important to collect data and highlight the benefits of data for operations, planning, research and evaluation. If staff understand the rationale for collecting certain information, they will feel more confident to ask for these data items and to explain why it is important. Training should include how to phrase questions, clarify answers and record responses.

Using data-related Key Performance Indicators (KPIs)

Setting KPIs linked to data and evidence can be a motivating factor for organisations to ensure improvement in their data collection practices. KPIs can target many aspects of data quality including completeness (how many records have a recorded value), and precision (how many records have a meaningful or valid value). Organisations should set reasonable KPIs that aim to improve the quality of their data, but not create perverse incentives that could undermine data quality or service delivery.

For example, an organisation finds that only 50% of the clients contained within their record management system have a recorded gender. The organisation sets a KPI for 100% of clients to have a recorded gender, and they monitor this goal over the course of a six-month period to ensure that improvements made are effectively moving towards this goal.
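As a rough illustration of how the completeness and precision measures described above might be calculated, the following Python sketch works on a small list of client records. The field name `gender`, the set of values treated as valid and the sample records are all assumptions made for illustration; they are not drawn from the data standards proposed in this framework.

```python
# Hypothetical set of values treated as meaningful for this illustration only.
VALID_GENDER_VALUES = {"female", "male", "self-described"}


def completeness(records, field):
    """Share of records with any recorded value for the field."""
    if not records:
        return 0.0
    recorded = sum(1 for r in records if r.get(field) not in (None, ""))
    return recorded / len(records)


def precision(records, field, valid_values):
    """Share of records whose value is one of the meaningful or valid values."""
    if not records:
        return 0.0
    valid = sum(1 for r in records if r.get(field) in valid_values)
    return valid / len(records)


# Illustrative records: two have no recorded gender, one is recorded as 'unknown'.
clients = [
    {"id": 1, "gender": "female"},
    {"id": 2, "gender": None},
    {"id": 3, "gender": "unknown"},
    {"id": 4, "gender": ""},
]

completeness_rate = completeness(clients, "gender")
precision_rate = precision(clients, "gender", VALID_GENDER_VALUES)
print(f"Completeness: {completeness_rate:.0%}, precision: {precision_rate:.0%}")
```

Tracking both figures over a monitoring period may help show whether a completeness target is being met with meaningful values rather than placeholder entries.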

Conducting audits and business process reviews

If possible, it is recommended that audits of datasets are conducted at regular intervals to ensure accuracy and completeness of recorded data. Audits may illuminate systemic or recurring issues in data collection that can be addressed once identified. Similarly, reviews of business processes can identify difficulties in data collection and assist an organisation to understand the barriers to quality data collection. Conducting audits and business review processes can also be a component of evaluating the success of KPIs.

Infrastructure and Collection Practices

Data items have pre-defined responses

Where appropriate, it is recommended that data items have a pre-defined set of response options at the point of entry into a data management system. This reduces the potential for typographical errors and enables more efficient data collection and subsequent analysis. However, there may be instances where a free text field should be provided. Recommended response options and instances where free text coding should be allowed will be discussed in the data collection standards proposed in this framework.
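One way a pre-defined set of response options with an accompanying free text field could be represented is sketched below in Python. The data item, its response options and the function name `record_response` are hypothetical and used only for illustration; the recommended response options are those set out in the data collection standards proposed in this framework.

```python
from enum import Enum


class LanguageAtHome(Enum):
    """Hypothetical short-list of response options for a single data item."""
    ENGLISH = "English"
    MANDARIN = "Mandarin"
    OTHER = "Other"


def record_response(selected, free_text=None):
    """Store a response, accepting free text only alongside the 'Other' option."""
    option = LanguageAtHome(selected)  # raises ValueError for an unlisted value
    if option is LanguageAtHome.OTHER:
        return {"option": option.value, "free_text": (free_text or "").strip()}
    return {"option": option.value, "free_text": None}


stored = record_response("Other", free_text=" Dinka ")
print(stored)
```

Because only listed values are accepted, a mistyped entry is rejected at the point of entry rather than being discovered later during analysis.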

Accommodation of multiple response options

There are some priority data items where it is not ideal to collect only one response from a person. For example, when asking a person to describe their disability, a person may disclose that they are blind and have mobility difficulties. In this case, it may be restrictive to ask someone to choose between response options when recording their disability type. It is acknowledged that allowing for multiple response options can create complexities both for IT infrastructure and for analysis of the collected data; however, it is recommended that for certain data items, agencies and services consider approaches which accommodate multiple response options.
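One possible approach to the export and analysis complexity noted above is to store each multiple-response answer as a set and flatten it into one true or false column per option. The Python sketch below assumes a hypothetical list of options (`DISABILITY_OPTIONS`) and a hypothetical function name; neither reflects the categories proposed in this framework.

```python
# Hypothetical response options; not the categories proposed in this framework.
DISABILITY_OPTIONS = ["vision", "mobility", "hearing"]


def to_indicator_row(client_id, selected):
    """Flatten a multiple-response answer into one True/False column per option."""
    unknown = set(selected) - set(DISABILITY_OPTIONS)
    if unknown:
        raise ValueError(f"Unrecognised response options: {sorted(unknown)}")
    row = {"client_id": client_id}
    for option in DISABILITY_OPTIONS:
        row[option] = option in selected
    return row


# A person who discloses that they are blind and have mobility difficulties.
row = to_indicator_row(101, {"vision", "mobility"})
print(row)
```

Each row then has a fixed set of columns regardless of how many options a person selects, which may simplify export to systems that expect one value per field.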

Creating mandatory data fields

Where appropriate, it is advised that service providers and agencies update their data collection infrastructure to utilise mandatory data fields (or, at minimum, prompts) on all non-optional data items. With mandatory data fields, the person inputting the data cannot move to the next screen without entering a response. Incorporating mandatory data fields into a records management system ensures that all non-optional data items receive a response. However, it is important to remember that people have the right to not respond to a question.

For example, to improve their collection of gender information, an organisation updates their data entry system so that a response for gender must be recorded when entering details about a new client before the new entry can be completed. The organisation now finds that 100% of all new client records have a recorded response for gender.
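A minimal Python sketch of a mandatory field check is shown below. The field list, the function names and the use of an in-memory list as the record store are assumptions for illustration only. In this sketch a value such as ‘prefer not to say’ counts as a recorded response, so the check does not remove a person’s right to not respond to a question.

```python
# Hypothetical list of mandatory data items; an organisation would set its own.
MANDATORY_FIELDS = ["gender"]


def missing_mandatory_fields(entry):
    """Return the mandatory fields that have no recorded response."""
    return [f for f in MANDATORY_FIELDS if entry.get(f) in (None, "")]


def save_entry(entry, store):
    """Add the entry to the store only when every mandatory field has a response."""
    missing = missing_mandatory_fields(entry)
    if missing:
        raise ValueError(f"Response required for: {', '.join(missing)}")
    store.append(entry)


store = []
# 'prefer not to say' is a recorded response, so this entry is accepted.
save_entry({"name": "Client A", "gender": "prefer not to say"}, store)
print(len(store))
```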

Guidance for collecting data in written form and verbally

Regardless of whether data collection is written (form-based) or verbal, using the question phrasing and response options outlined in each data item is recommended. Further, it is generally recommended that data are collected directly from a client, rather than by proxy, particularly for sensitive information such as a person’s sexual identity or orientation.

Collecting data via a third party

Although it is preferable for data to be gathered directly from a person, this may not always be possible. Agencies and services should have their own policies which dictate where it is acceptable for communication (and by proxy, data collection) to take place through an agreed upon third party. Agencies and services should be aware that in some cases a person’s guardian or representative may be the perpetrator of family violence against that person. If such a circumstance is suspected, agencies and services are encouraged to have protocols in place to assist victims so that the offending party is not speaking on behalf of the victim.

Where data are collected from a respondent verbally, questions should be asked as they are written, and data collectors should describe the response options available for each question. Detailing the full list of response options across data items can have a range of benefits, including communicating that an organisation is inclusive of a broad range of identities, as well as assisting respondents with choosing the most appropriate category for them. In some cases, such as for the disability data items, respondents may need additional information to understand the scope of categories, and data collectors should provide information and guidance to assist with understanding each of the categories.

Where possible, all available categories in a data item or classification being used should be read or provided to a respondent rather than a short-list. However, in cases where the context of the data collection does not allow for this, the question can be asked on its own. Managing long lists of response options is particularly relevant to questions about country of birth and language spoken at home, where there is an extensive range of options a person may choose. If a short-list is being used, and a respondent’s answer is a category in the short-list, this category should be selected. Where their answer is not in the short-list of response categories, the data collector can select ‘other’ and, where possible, enter the response in the text field.

Because communities may have unique terms, or an extensive range of terms, to describe identities such as sexual identity or ethnicity, it may be necessary to clarify the term used with the respondent. For example, if a client responds to a question on gender identity as ‘non-binary’, the data collector should confirm with the client that this aligns with the ‘self-described’ response category, and write ‘non-binary’ in the free text field which accompanies that option. Responses should not be questioned or assumed based on a person’s appearance or other information that has already been disclosed.

Including response options for ‘prefer not to say’, ‘question unable to be asked’ and ‘no response’

Inevitably, there are circumstances where it is not possible to obtain certain information. It is recommended to include response options for priority items that will help provide details about why information was not able to be collected. This can be used to evaluate collection practices, and determine if solutions can be implemented to better address gaps in data. There are some data items that may involve the disclosure of sensitive information. Hence, it is recommended to provide respondents with a ‘prefer not to say’ option, which respects a person’s choice not to disclose particular information. Including this as an option also enables analysis to distinguish cases where the question was asked and the person chose not to answer from cases where the response is missing or unknown for other reasons.

When questions are asked of people verbally, it is possible that the data collector will be unable to gain all the required information. This may be due to the context in which the information is being gathered (for example, an emergency event), or to other unexpected events (for example, a client abruptly hangs up the phone). In these cases, it is ideal to include the response option ‘question unable to be asked’, which explains why the information was unable to be recorded.

When questions are asked of people by form or online, a person may choose to not complete all questions. In these cases, it is recommended to include a response option for ‘no response’, for when this information is subsequently lodged in a data records system.

For example, an organisation is pleased to find that they have achieved their goal of 100% of all clients having a recorded gender. Upon closer analysis of the records however, the organisation finds that 40% of the records have a gender recorded as ‘unknown’. As all data are collected verbally from clients, they update their records management system to allow for response options for ‘prefer not to say’ and ‘question unable to be asked’. Over time, they find that for 30% of all clients, a question about their gender is unable to be asked.
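The kind of breakdown described in this example could be produced with a simple tally of recorded values. The Python sketch below uses made-up responses and a hypothetical function name, and is intended only to show how separate non-response options make the reasons for missing information visible.

```python
from collections import Counter


def response_shares(responses):
    """Return each recorded value's share of all responses."""
    if not responses:
        return {}
    counts = Counter(responses)
    return {value: count / len(responses) for value, count in counts.items()}


# Made-up gender responses for ten clients, for illustration only.
recorded = (
    ["female"] * 3
    + ["male"] * 3
    + ["question unable to be asked"] * 3
    + ["prefer not to say"]
)

shares = response_shares(recorded)
for value, share in shares.items():
    print(f"{value}: {share:.0%}")
```

With a single ‘unknown’ value, the last two groups in this illustration would be indistinguishable; recording them separately indicates whether the gap arises from the collection context or from a person’s choice not to disclose.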

Following up to obtain additional data

It is acknowledged that certain situations do not permit the collection of comprehensive data. Where feasible, it is recommended that missing data are followed up at an appropriate time. In particular, organisations that work with people in a crisis or emergency situation should obtain data required to deliver the immediate service, and then follow up for further information once the crisis or emergency has been managed. The follow up to address data gaps could accompany an existing operational task, including a routine call to ensure the welfare of a patient or client following a service.

For example, an organisation finds that 30% of their client records have a response value of ‘question unable to be asked’ for gender. After consulting with front-line staff, it emerges that information is typically gained from clients while they are accessing an emergency service. The organisation may decide it is appropriate to implement a follow up later which includes collecting missing data items.

Interim improvements for analysis and reporting purposes

Improving the quality of demographic data within one data source

In cases where clients present on multiple occasions and there is partial coverage of a data item, there are post-hoc data improvement processes that can be implemented to superficially improve the quality of the data. It is important to note that historical responses to data items should remain unchanged, and the application of any of the methods below should update only the most current status. There are three options for improving the quality of demographic data, all of which have advantages and disadvantages. The rules summarised below are drawn from the CSA’s ‘Consultation paper: Improving recorded crime statistics for Victoria’s Aboriginal community’⁶. However, the approach is more broadly relevant to other demographic identifiers:

Application of an ‘ever-identified’ rule - Using this method, a person who has identified at one point in time as being Aboriginal would then be given this status across all of their other records in the database.
Application of a ‘most recent identification’ rule - Using this method, a person’s most recent Indigenous status would be applied across all of their other records in the database.
Application of a ‘most frequent’ rule - Using this method, a person’s most frequent response to the Standard Indigenous Question (SIQ) would be applied across all of their other records in the database.
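The three rules above can be sketched in Python as functions that derive a single status from a person’s recorded history. This is an illustrative sketch only and not the CSA’s implementation: the record layout (a date string and a status), the status labels, the fallback used by `ever_identified` when the person has never identified, and the tie-breaking in `most_frequent` are all assumptions. Each function returns a derived status and leaves the historical responses unchanged.

```python
from collections import Counter


def _known_statuses(history):
    """Return recorded statuses in date order, skipping missing values."""
    ordered = sorted(history, key=lambda record: record[0])
    return [status for _, status in ordered if status is not None]


def most_recent(history):
    """'Most recent identification' rule: the latest recorded status."""
    known = _known_statuses(history)
    return known[-1] if known else None


def ever_identified(history, target="Aboriginal"):
    """'Ever-identified' rule: the target status if it was ever recorded."""
    known = _known_statuses(history)
    if target in known:
        return target
    # Assumption: otherwise fall back to the most recent recorded status.
    return known[-1] if known else None


def most_frequent(history):
    """'Most frequent' rule: the status recorded most often."""
    known = _known_statuses(history)
    if not known:
        return None
    counts = Counter(known)
    highest = max(counts.values())
    # Assumption: ties are resolved in favour of the most recently recorded status.
    for status in reversed(known):
        if counts[status] == highest:
            return status


# Made-up history for one person; ISO date strings sort chronologically.
history = [
    ("2015-02-10", "Non-Aboriginal"),
    ("2016-07-01", "Aboriginal"),
    ("2017-01-15", None),
    ("2018-03-22", "Non-Aboriginal"),
]

print(ever_identified(history), most_recent(history), most_frequent(history))
```

For this made-up history the ‘ever-identified’ rule gives a different result from the other two rules, which illustrates why the choice of rule has advantages and disadvantages that should be weighed for each dataset.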

For the purposes of analysis and reporting on Victoria Police crime data, the CSA applied the ‘most frequent’ rule to Indigenous status to improve the quality of the Indigenous status variable in victim and offender analyses. Overall, feedback received by the CSA indicated that this rule was the favoured methodology.⁷ This concept can be applied across other datasets where individual clients can be identified across a database and would provide an interim measure while other data improvement processes are developed and implemented.

Improving the quality of data across multiple data sources

In addition to using one of the methods outlined above, it is also possible to use multiple data sources to identify certain demographic or community data items for a person, even where these are not directly collected as part of a service or business process. This involves linking a person across the datasets using key pieces of information and attributing the data category recorded in one data source to their profile in another source.
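A simplified sketch of this kind of linkage is shown below in Python. It matches records on exact agreement of a set of key fields, which is an assumption made for illustration; this framework does not specify a linkage method, and the field names and function name are hypothetical. Any linkage of this kind would also need to comply with the privacy and information sharing requirements that apply to the organisation.

```python
def link_attribute(target_records, source_records, key_fields, attribute):
    """Fill a missing attribute in target records from matching source records."""

    def make_key(record):
        parts = [str(record.get(f) or "").strip().lower() for f in key_fields]
        # Records with an incomplete key are never matched.
        return tuple(parts) if all(parts) else None

    lookup = {}
    for record in source_records:
        key = make_key(record)
        if key is not None and record.get(attribute) is not None:
            lookup.setdefault(key, record[attribute])

    linked = []
    for record in target_records:
        enriched = dict(record)  # copy so the original record is left unchanged
        key = make_key(record)
        if enriched.get(attribute) is None and key is not None:
            enriched[attribute] = lookup.get(key)
        linked.append(enriched)
    return linked


# Made-up records from two hypothetical data sources.
service_a = [{"name": "Sam Lee", "dob": "1990-04-02", "country_of_birth": None}]
service_b = [{"name": "sam lee ", "dob": "1990-04-02", "country_of_birth": "Vietnam"}]

linked = link_attribute(service_a, service_b, ["name", "dob"], "country_of_birth")
print(linked)
```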

Privacy and security considerations

This section outlines some of the privacy principles that public sector organisations should be aware of when collecting and storing data. If policies and procedures regarding the secure storage and transfer of data are not already in place, organisations may be reluctant to collect personal information if it is not imperative for their operations. However, this section is not intended to provide extensive privacy and security guidance; instead organisations should refer to any relevant legislative, regulatory and administrative provisions for further information.

Information Privacy Principles (IPPs) and Health Privacy Principles (HPPs) are privacy principles which govern the way that public sector organisations, including contracted service providers, collect, use and handle personal and health information. The IPPs apply to personal information under the Privacy and Data Protection Act 2014 (Vic) (PDPA), while the HPPs apply to health information under the Health Records Act 2001 (Vic) (HRA).⁸ Privacy principles that are particularly relevant to the framework include: IPP/HPP 1 (Collection), IPP/HPP 3 (Data quality), IPP/HPP 4 (Data security), IPP/HPP 8 (Anonymity), and IPP 10 (Sensitive information).

The Office of the Victorian Information Commissioner has developed the Victorian Protective Data Security Framework (VPDSF), which provides comprehensive information on managing data security risks from the point of data collection and throughout the information lifecycle.⁹ The standards in the VPDSF relate to data governance, information security, personnel security, Information Communications Technology security, and physical security. It is recommended that organisations review and adopt these protocols prior to data collection.

In addition to the IPPs and HPPs, there are a number of other policies and laws that make up the Victorian information management landscape, which agencies should consider when developing their own privacy and security policies. In particular, organisations should turn to their enabling legislation as a starting point in determining the information they are permitted to collect.

In particular, government departments, agencies and service providers should be aware of their obligations where prescribed under the Family Violence Information Sharing Scheme (FVIS Scheme) and the Child Information Sharing Scheme (CIS Scheme)¹⁰. The FVIS Scheme and CIS Scheme are aimed at removing barriers to information sharing to allow professionals to work together across the service system, to make more informed decisions and better respond to the safety and wellbeing needs of individuals, children and families. The requirements of the Schemes, including record keeping requirements, are detailed in the Family Violence Information Sharing Guidelines and the Child Information Sharing Guidelines. When services are sharing information under these schemes, the Victorian Protective Data Security Standards will continue to apply.

Organisations are encouraged to seek advice prior to implementing the data items proposed in this framework to ensure compliance with relevant privacy legislation. Individual organisations are responsible for ensuring that their business practices are compliant with State and Commonwealth privacy requirements and information sharing schemes, and should seek guidance from privacy advisors, legal teams, the Office of the Victorian Information Commissioner and/or the Health Complaints Commissioner when unsure about their obligations.

5 AIHW 2010, National best practice guidelines for collecting Indigenous status in health data sets, Cat. no. IHW 29, Canberra, p.3.
6 CSA 2016, Consultation paper: Improving recorded crime statistics for Victoria’s Aboriginal community, viewed 22 June 2018, www.crimestatistics.vic.gov.au/about-the-data/consultation-paper-improving-recorded-crime-statistics-for-victorias-aboriginal
7 CSA 2016, Outcomes of recent public consultation: proposed methods to improve Victorian Aboriginal and Torres Strait Islander recorded crime statistics, viewed 22 June 2018, https://www.crimestatistics.vic.gov.au/media-centre/news/outcomes-of-recent-public-consultation-proposed-methods-to-improve-victorian
8 Commissioner for Privacy and Data Protection 2016, Information sheet: Information Privacy Principles and Health Privacy Principles – May 2016, viewed 12 June 2018, www.cpdp.vic.gov.au/menu-resources/resources-privacy/resources-privacy-…
9 Office of the Victorian Information Commissioner 2018, Victorian Protective Data Security Framework – March 2018, viewed 20 June 2018, www.cpdp.vic.gov.au/menu-data-security/victorian-protective-data-security-framework/vpdsf
10 Department of Education and Training, The Child Information Sharing (CIS) Scheme and The Family Violence Information Sharing (FVIS) Scheme, available at https://engage.vic.gov.au/child-information-sharing-scheme

Updated 19 October 2020