Self guided tour of databases

MG-RAST / SRA / SILVA / Greengenes / RDP / VAMPS / PHAST / UNITE

Author: Tracy Teal with updates by Ashley Shade

Objectives

  • Gain familiarity with some common databases
  • What questions might you use this database to address?
  • How would you get data in?
  • Can you interact with data on the site?
  • How can you get data out?
  • Is the 'help' helpful?
  • How can you get data in, out, or processed for more than one sample or sequence?

The Tour

Introduction

Welcome to the self-guided tour of genomic/metagenomic databases. We know that you're in for a fun ride ahead! Think of this as a walking tour / choose your own adventure. Take your time, and try a different path if the one you chose led you to perish in a dark cave or if you just want to try a different route.

The questions below are meant to help you explore the database and figure out how to interact with it. Use them as you wish.

Go to the website

The database probably has an obvious URL. If it doesn't, you should be able to find it by searching for its name.

Accounts

  • Do you have to create an account to interact with this site? Would creating an account give you access to features you wouldn't otherwise have? Why might a database ask you to create an account?

  • If you can create an account and the process doesn't request your firstborn, go ahead and create one. Was there an approval process? Did it ask a lot of questions? Was it too terrible?

Often the need to create an account is the biggest barrier to new people using a site. They don't want to create an account before they know the site will be useful to them, or the process is time-consuming and doesn't seem worth it.

Data availability

  • Is there publicly available or example data that you can use to explore the features of the site?

Uploading data

  • If this database takes new data, what data format does it require? Do you know how to provide data in this format? If the site has particular guidelines, is the 'help' on data format clear and helpful? Are there templates?

Another of the biggest barriers to working with tools is data format. If a site asks for data in a format other than the ones that are generally produced (FASTA, FASTQ, BIOM), then you will have to figure out how to get your data into that format. Sometimes that requires finding a converter. The site might have one, or have good documentation on how to do the conversion. Or you might need to write your own script. This is called "data wrangling". It's not particularly fun, but it's a part of many genomic/metagenomic processes.
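As a small illustration of what a "write your own script" conversion can look like, here is a minimal Python sketch that turns FASTQ records into FASTA. It assumes every record is exactly four lines (header, sequence, `+` line, quality), which is the common layout but not something every tool guarantees. The function name `fastq_to_fasta` is made up for this example. For real work, an established library is usually a safer choice than a hand-rolled parser.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from FASTQ input.

    Assumption: each record is exactly four lines
    (header, sequence, '+' separator, quality string).
    """
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not record:
            continue  # ignore blank lines between records
        record.append(line)
        if len(record) == 4:
            header, sequence, separator, _quality = record
            if not header.startswith("@") or not separator.startswith("+"):
                raise ValueError(f"Malformed FASTQ record near: {header!r}")
            # FASTA keeps the read name and sequence and drops the qualities.
            yield ">" + header[1:]
            yield sequence
            record = []
    if record:
        raise ValueError("Truncated FASTQ record at end of input")


if __name__ == "__main__":
    example = ["@read1 sample=A\n", "ACGTACGT\n", "+\n", "IIIIIIII\n"]
    for fasta_line in fastq_to_fasta(example):
        print(fasta_line)
```

Because the function takes any iterable of lines, it works on an open file handle as well as on a list, so a large file never has to be loaded into memory all at once.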

Questions

Formulate a question that you might ask using the data or features of this site.

  • If there is the ability to analyze or explore data, try to answer that question on the web site.

  • If the site allows for or requires downloading data, how would you download the data to address your question?

Downloading data

  • How would you download data from this site? What format is the data in once you download it? Is this a humongous download? Is it reasonable to download the data this way?
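One way to find out whether a download is humongous before committing to it is to ask the server for the file size first. The sketch below is only one possible approach. It assumes the data is served over HTTP(S) and that the server answers a `HEAD` request with a `Content-Length` header, which not every site does. The helper names `human_size` and `remote_size` are invented for this example.

```python
import urllib.request
from typing import Optional


def human_size(num_bytes: int) -> str:
    """Format a byte count with a binary-scaled unit, e.g. 1536 -> '1.5 KB'."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"  # not reached; keeps the return type explicit


def remote_size(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the size in bytes reported by the server, or None if unknown.

    Assumption: the server supports HEAD and reports Content-Length.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None
```

A call such as `remote_size(url)` with a download link copied from the site, followed by `human_size(...)` on the result, gives a quick sense of whether the file will fit on your laptop or belongs on a cluster.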

Alternative ways of interacting with the site

  • Is there another way to upload/download/analyze data for this database other than through the web interface? If so, is there good documentation on how to do so at the site? If there's not, is that documentation somewhere else? Could you Google for it?

  • If you can access data another way, can you figure out how to do it?
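Many databases offer a web API alongside the browser interface, and using one usually comes down to building a URL with query parameters and reading back structured text such as JSON. The sketch below shows that general pattern only. The base URL, the `samples` resource, the `biome` and `limit` parameters, and the shape of the JSON response are all hypothetical placeholders, so check the documentation of the database you are touring for its real endpoints.

```python
import json
from typing import List
from urllib.parse import urlencode

# Hypothetical endpoint, used for illustration only.
BASE_URL = "https://example.org/api/v1"


def build_query_url(resource: str, **params: str) -> str:
    """Build a request URL for a resource with URL-encoded query parameters."""
    query = urlencode(sorted(params.items()))  # sorted for a stable URL
    url = f"{BASE_URL}/{resource}"
    return f"{url}?{query}" if query else url


def extract_ids(response_text: str) -> List[str]:
    """Pull record identifiers out of a JSON response.

    Assumption: the response looks like {"data": [{"id": "..."}, ...]}.
    """
    payload = json.loads(response_text)
    return [item["id"] for item in payload.get("data", [])]


if __name__ == "__main__":
    print(build_query_url("samples", biome="soil", limit="10"))
    print(extract_ids('{"data": [{"id": "s1"}, {"id": "s2"}]}'))
```

Keeping URL construction and response parsing in separate functions makes it easy to try each piece by hand before sending any real requests.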

Help

  • If you haven't already, go to the Help, FAQ or Forum section of the web site. Is there helpful 'help' documentation? Is there an active user community that could help address issues or questions you might have?

Automation

  • In the sections above, you worked to address some questions, probably for just one sample or sequence. What would it take to do this for more than one sample or sequence, or even for hundreds or thousands? Is there a more automated way to do this, or does it all need to be done 'by hand'?
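Once files have been downloaded, moving from one sample to many usually means wrapping the single-sample step in a function and looping over a directory. Here is a minimal sketch of that idea, assuming one FASTA file per sample with a `.fasta` extension. The function names and the choice of counting sequences as the per-sample step are illustrative only.

```python
import tempfile
from pathlib import Path
from typing import Dict


def count_fasta_records(path: Path) -> int:
    """Count sequences in one FASTA file by counting '>' header lines."""
    with path.open() as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_directory(directory: Path, pattern: str = "*.fasta") -> Dict[str, int]:
    """Apply the single-file step to every matching file in a directory."""
    return {
        path.name: count_fasta_records(path)
        for path in sorted(directory.glob(pattern))
    }


if __name__ == "__main__":
    # Build two tiny example files in a temporary directory and summarize them.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "sampleA.fasta").write_text(">s1\nACGT\n>s2\nGGCC\n")
        (folder / "sampleB.fasta").write_text(">s1\nTTAA\n")
        for name, count in summarize_directory(folder).items():
            print(name, count)
```

The same loop-over-files structure scales from two samples to thousands. Only the body of the per-file function needs to change to match the question you are asking.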

Overall experience

You made it to the end! You have your data / answered your question / uploaded your data for archival purposes. Congratulations!

  • How was the experience? Would you want to use this database again?

  • If this were how your data was provided to other researchers, would you be happy? Could others easily access your data?

  • Are there other common alternatives to this database?

  • Do you have thoughts on what could change? If it's not a huge database, developers often appreciate constructive feedback. It could be worth filing an issue or making a comment, especially if something isn't working.

  • Bonus question: Are there more general alternatives to this database, ones that aren't as specific to this type of data? Dryad and Figshare are two examples.

  • Bonus question: If this is a database that takes data and shares it publicly, how are you getting credit for that data? Is there a DOI? Something else?

  • Bonus question: What happens to the data if this database goes away? How do you think this database is funded?