Ethics and open data collection

I started this question multiple times over the last couple days. Since not everyone is on Facebook or in the areas of Facebook where I had this discussion, I figured I’d start the conversation again here – hoping that people will jump in here with their thoughtful comments, questions, and suggestions.

I will start by saying there is no one ‘correct’ or ‘right’ answer to this. It just isn’t that simple. I can see multiple sides and multiple perspectives. It is in part why I feel the need to grapple with the idea a little more before deciding what I think, but also figuring out what aspects I want to really defend.

My intention for my dissertation research is to use some form of open data collection. This may mean using data that is currently in the open, or it may mean using a blog as a way to collect data. The exact mechanism of this is not certain. I’m not 100% sure why I’m committed to this idea, but I kind of am. In part it is because I think it is an important discussion to be having, and an important next evolution in doing research in digital spaces. It is in part a mixing of the idea of doing research on open data sources (so what is available in the public) and doing participatory research.

I am studying the learning that occurs in reading, writing, or participating in the breast cancer blogosphere. It is a rather vague and broad question. I think it is important for a variety of reasons. One is that I think peer learning has a direct impact on healthcare in situations of critical illness. I also think that the effect that peer learning does have (and the place of blogs) is poorly understood (and not considered legitimate) within the healthcare community. As a result, patients are often told to “not use the internet” when they are diagnosed with critical illness (such as breast cancer), when the internet may be exactly where they need to go to learn what questions to ask their doctors, to gain a better understanding of their disease, and to help them make important decisions about their treatment. The healthcare system doesn’t have the resources to educate every patient to the level that they need to be educated. In addition, healthcare providers who have never experienced the illness don’t necessarily know some of the things that patients might want to know – they may dismiss them as unimportant, where a patient may actually think them very important. Lived experience and peer learning have their place. Anyways, that is not the point of this post, this is about ethics and data collection …

Here are some of the rules: (1) if data is already publicly accessible (no password protected sites) on the Internet, then it is considered in the ‘public’, (2) information in the public can be used in research without the need for IRB (institutional review board, also known as research ethics board) approval. Based upon this, publicly accessible blogs are in the public and researching what is on the blogs (and comments) does not require IRB approval.

Whether IRB approval is required depends in part on whether or not the research is deemed ‘human subjects research’. The problem is, ‘human subjects research’ is open to an awful lot of interpretation. In most cases an anonymous survey does not require full IRB review – it is typically deemed exempt because no personally identifiable information is collected.

There are emerging protocols on how to manage research data that is collected on the internet. Again, anonymous surveys are typically pretty easy. You put in a simple informed consent page at the beginning and you are done. Analyzing data that is publicly available on the internet is also pretty straightforward (although there are questions about ethical attributions when studying blogs and other online communities – that is a discussion for a different blog post).

Where things are getting murky is open data collection as an insider within the community. As a blogger, I can ask any question I want on my blog. I’m doing it now. I own my space on the internet. You can choose whether or not you read it, and whether or not you reply to me personally or publicly. If you reply publicly and I publish it on my blog, it is now information in the ‘public’. This does not pose a problem when I am asking a question for interest’s sake. It does pose a potential problem if I ask a question with the explicit intent of doing academic research. Why? Because the IRB can say that I cannot solicit data without first getting IRB approval.

When I’m an outsider to the community it is pretty easy to hold back. You just avoid asking the question until the study has received approval. It is more challenging within the blogging community. Part of the authenticity of the community itself is that ability to ask questions whenever they arise. It feels dishonest to my readers, but also dishonest to myself, to hold back on the question while waiting for IRB approval. Truth be told, I can almost never hold back on blogging my ideas. But it also feels unethical to ask questions that I would normally ask knowing that I might want to use the answers to those questions within research. Or, even more so, to ask a question because I know I want to use the answers in my research.

One might just say, no big deal, just apply for IRB approval. However, applying for IRB approval at my institution is a big deal. It is horribly time consuming. It requires submitting multiple printed versions of the 30+ page form. When not required it is both a waste of my time and a waste of the time of everyone involved in the IRB process. Oh ya, and they only accept applications the first three working days of the month, and they take July off – so it is not only time consuming in terms of getting the application together, but also time consuming in terms of the calendar / research delay. In addition, the logical processes mean that consent forms and such need to be in a specific format, which eases the approval process but IMHO can make the process that much more cumbersome for participants. Frankly, I sadly see it as more red tape than something that makes my research more ethical.

So, I’m stuck again on what is ethical and how I move forward with insider research within a care community (the breast cancer blogosphere). On one side I’m told that if I use what is currently publicly available, I don’t need to get IRB approval. But what if that information is publicly available because I already asked the question? For example, when I created the course “Should I blog”, I asked other cancer bloggers to answer the question “why they blog”. Rather than having them tell me why directly, I asked them to write a blog post about it. Another question I asked them was “where do you draw the line?” – that is, what do you choose to share and not share? I used this information to help create the course. Now, as it turns out, that information is of potential value to my research question. Those posts, which I solicited for a different purpose, may now be of research interest. Since my motives were not for research, this might be deemed “secondary use of data”, which is a much easier IRB process, but it could also easily be argued that the data already exists in the public.

But if I ask the question now … then is that question being asked for the purposes of my research? The nature of the blogosphere is that any question that I ask (both here and on my other blogs) is likely to ripple through the blogosphere, with other bloggers picking up and replying to it. All that information will end up ‘in the public’ – so in the open. But the act of asking for it is really soliciting data for research purposes, and therefore should be subject to IRB approval.

So I’m back to the question / moral dilemma of soliciting feedback for research purposes from within a community with which I already have an established relationship.

What are your thoughts? Does / should blog solicitation of data (e.g. interviews / focus groups that happen in the publicly accessible/open blogosphere) be subject to the same ethical guidelines as interviews / focus groups that happen within closed face-to-face research? Does the nature of the researcher as an insider in the community change your answer? 

Footnote for the sake of transparency: Any response to this inquiry may be used in the preparation and presentation of academic articles/dissertation chapters and/or conference presentations. 
