For years now, people have complained about being swamped by email (or tweets, messages, etc.), but this decade is clearly becoming an era of mass data of all kinds.

We keep finding new ways of collecting and creating data, e.g. fitness measurements from consumer devices and increasingly granular “smart” meter readings. So even the techno-illiterates among us are using the term “big data”, generally referring to its high-volume aspect.

But is collection or creation enough? Of course not: data has no real value by itself. The value comes when we can deduce or infer some real intelligence from the disparate data sources, which may involve sharing the data with other parties. In the healthcare world, it seems we are still at a very early stage.

Of course, we also need to address the ongoing challenges of protecting the data in storage, in transmission and in use, which include:
* identification and protection of really valuable or sensitive data – the suitability of techniques such as tokenisation and encryption should be understood
* deciding and controlling who has access to each data set
* unless the data is completely “open data”, some control should be placed over its distribution – but not so much that potential users are deterred
* design and operation of “data pooling” environments that are suitably secure and fit for purpose
* when data sets are brought together (or aggregated), the total may be more sensitive than the sum of the parts, so extra measures may be required to protect access to it and/or prevent unintended identification of individuals
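To illustrate the last point, here is a minimal Python sketch of how combining two data sets can make individuals easier to single out. Everything in it is an assumption made for illustration: the two toy data sets, their field names (`district`, `birth_year`) and the helper `smallest_group` are hypothetical, and counting the smallest group of look-alike records is only one rough way of gauging re-identification risk.

```python
from collections import Counter


def smallest_group(records, keys):
    """Size of the smallest group of records sharing the same values for `keys`."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return min(counts.values())


# Hypothetical data set A: fitness readings with a postcode district
fitness = [
    {"id": 1, "district": "N1", "steps": 9000},
    {"id": 2, "district": "N1", "steps": 4000},
    {"id": 3, "district": "E2", "steps": 7000},
    {"id": 4, "district": "E2", "steps": 12000},
]

# Hypothetical data set B: clinic records with a birth year
clinic = [
    {"id": 1, "birth_year": 1970},
    {"id": 2, "birth_year": 1985},
    {"id": 3, "birth_year": 1985},
    {"id": 4, "birth_year": 1970},
]

# Bring the two sets together on the shared identifier
clinic_by_id = {r["id"]: r for r in clinic}
combined = [{**f, **clinic_by_id[f["id"]]} for f in fitness]

print(smallest_group(fitness, ["district"]))                  # 2
print(smallest_group(clinic, ["birth_year"]))                 # 2
print(smallest_group(combined, ["district", "birth_year"]))   # 1
```

In each data set on its own, every person shares their district or birth year with at least one other person. Once the sets are joined, every combination of district and birth year is unique, so each record points to a single individual.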

Furthermore, I would like to make the following suggestions for maturing the world of healthcare data:
* as in other sectors, ownership of data needs to be clarified – this is not currently obvious in the case of medical records
* likewise, geographical location of data (and any copies) may dictate what data is made available and how it is treated
* consider standard good (but not onerous) security practices from an early stage – to avoid breaches and to provide some assurance to potential data providers
* interoperability is a prerequisite for effective sharing/reuse of data – possibly through standards and APIs
* easy-to-understand communications are vital to encourage participation, e.g. cooperation of data providers

Finally, what impact does this have on the skills required of the people working with this data and the related technologies? Clearly, we are talking about a combination of expertise here rather than a focus on data scientists or IT personnel alone. That could mean specialists collaborating in teams, or hybrid individuals who can think holistically.

For one example of an initiative that aims to facilitate much of the above, have a look at the UK/Singapore “Data City, Data Nation” project at the Digital Catapult Centre (www.digitalcatapultcentre.org.uk).