Love Data Week - Day 3 - Read about re-use of open data and the OpenPaths project

We are halfway through Love Data Week, and we have more data stories to share with you!

Research data at DTU:


“What motivates me to be a FAIR ambassador is a possibility to disseminate principles that will enhance the transparency and traceability of the work in the data lifecycle and boost the importance of the role of data creators in the research environment.”

Nikola Vasiljevic, researcher at DTU Wind Energy and FAIR ambassador

Stories about data:

 
Re-use of open data in Engineering Design

Due to the increased focus on integrity and transparency in research, the term ‘open data’ is becoming increasingly relevant. In some research areas, especially within engineering disciplines, making data openly available or re-using open data is less common than in bioinformatics, physics or computational science, for example.

We talked to Pedro Parraguez, postdoc at DTU Management Engineering, who has investigated the implications and applications of using already available data in the Engineering Design discipline. He recently published the paper “Data-driven engineering design research: Opportunities using open data” (ISSN: 2220-4334), and we asked him why open data is not common practice in this research area.

What is Engineering Design and what type of data do you use in this research discipline?
Engineering Design is a discipline that studies how we design services, products and, more generally, systems of an engineering nature. For example, it can range from the design of a very small component of a machine all the way up to a large system like a bridge or a space shuttle.

The type of data that we use is as diverse as the research area. It can be technical data about measurements or data about material resistance. We also use data about the people who are involved in the process of designing an artifact.

From the title of your publication, one can infer that ‘open data’ and the re-use of open data are not common in Engineering Design. If that is correct, why?
When the discipline started, most data was not available by default. In the 1970s and 1980s there was no widely available digital trace of what was going on during the design process, so you had to gather the data yourself for each new study. Getting the data was a kind of handcrafted process.

Nowadays, the main constraint is that while the data might already exist, it is usually not collected for the purpose of research, and it is the property of the company that generated it. As a result, data tends to be proprietary, and the researcher is allowed to collect it only after obtaining the company's permission. This means that there are many constraints related to privacy and confidentiality that you need to navigate and respect. Many times, we are simply not allowed to make the data publicly available, or if we are, we normally need to anonymize it heavily.

However, it is becoming increasingly possible to creatively exploit new open data that was generated in other contexts but is also relevant for Engineering Design research. For example, developing open source software is an engineering design activity, because there is a design process in creating this software, and in many cases you can collect digital traces from publicly available GitHub repositories. There are also communities of people who design objects for 3D printing, and sometimes their data is available online in varying degrees of quality, but at least you might be able to get the data. You also have patents and a number of other sources that you can creatively combine and use in Engineering Design.

What would be the benefit of using open data in your research area?
All disciplines are under a lot of pressure to be as transparent as possible and to enable everybody to check whether what you are reporting is correct. Issues related to replicability, validity and reliability are very important in any discipline. In disciplines like Engineering Design, there is the extra challenge of privacy and confidentiality that I mentioned before, which means that achieving transparency is usually not an easy task. That is why using open data creatively can allow us to move the discipline towards easier ways of checking validity and reliability. It does not mean that we will ever get to be 100% open, but at least we can move in that direction.

Pedro Parraguez Ruiz, Postdoc,
Engineering Systems division, DTU Management Engineering,
ppru@dtu.dk, ORCID: 0000-0002-0017-4057

We are data:

 
The OpenPaths project

Massive amounts of our private data are collected and stored by various corporations; this is old news. The location data from our phones is just one example. Why not manage, visualize and use this data ourselves?

If you are looking for datasets with which to explore the world of data science and the tools available, take a look at OpenPaths (openpaths.cc), an interesting project from The New York Times Labs (@NYTLabs).

“Using our mobile apps you can track your location, visualize where you've been, and upload your data to the OpenPaths website. You can then download your data from the website in a variety of friendly formats, including KML, JSON, and CSV.” (Source: OpenPaths website)
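As a small illustration of what you could do with such a download, here is a minimal Python sketch that adds up the distance travelled between consecutive points in a location CSV file. The column names `lat` and `lon`, the chronological row order and the sample rows are assumptions made for this example; the real export may be laid out differently, so adjust the names to match your own file.

```python
import csv
import io
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def total_distance_km(csv_file):
    """Sum the distances between consecutive rows of a location CSV.

    Assumes columns named 'lat' and 'lon' (hypothetical names) and
    rows that are already sorted in chronological order.
    """
    reader = csv.DictReader(csv_file)
    points = [(float(row["lat"]), float(row["lon"])) for row in reader]
    return sum(haversine_km(*a, *b) for a, b in zip(points, points[1:]))


# Made-up sample data standing in for a downloaded file.
sample = io.StringIO(
    "lat,lon,date\n"
    "55.7856,12.5214,2018-02-14 08:00:00\n"
    "55.6761,12.5683,2018-02-14 09:00:00\n"
)
print(f"Distance travelled: {total_distance_km(sample):.1f} km")
```

To try it on your own data, replace `sample` with an opened file, for example `open("my_locations.csv", newline="")`, where the file name is again only a placeholder.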

 
