Self-guided tour of databases

MG-RAST / SRA / SILVA / Greengenes / RDP / VAMPS / PHAST / UNITE

Author: Tracy Teal with updates by Ashley Shade

Objectives

Gain familiarity with some common databases
What questions might you use this database to address?
How would you get data in?
Can you interact with data on the site?
How can you get data out?
Is the 'help' helpful?
How can you get data in or out, or process it, for more than one sample or sequence?

The Tour

Introduction

Welcome to the self-guided tour of genomic/metagenomic databases. We know that you're in for a fun ride ahead! Think of this as a walking tour / choose your own adventure. Take your time, and try a different path if the one you chose leads you to perish in a dark cave or you just want to try another route.

The questions below are meant to help you explore a database and figure out how to interact with it. Use them as you wish.

Go to the website

The database probably has an obvious URL. If not, you should be able to find it by searching for its name.

Accounts

Do you have to create an account to interact with this site? Would creating an account give you access to features you wouldn't otherwise have? Why might a database ask you to create an account?
If there is an ability to create an account and it doesn't request your first born as a part of that process, go ahead and create an account. Was there an approval process? Did it ask a lot of questions? Was it too terrible?

Often the need to create an account is the biggest barrier to new people using a site. They don't want to create an account before they know the site will be useful to them, or the process is time-consuming and doesn't seem worth it.

Data availability

Is there publicly available or example data that you can use to explore the features of the site?

Uploading data

If this database takes new data, what data format does it require? Do you know how to provide data in this format? If it has particular guidelines, is the 'help' on data format clear and helpful? Are there templates?

Data format is another of the biggest barriers to working with tools. If a site asks for data in a format other than the ones commonly produced (FASTA, FASTQ, BIOM), then you will have to figure out how to get your data into that format. Sometimes that requires finding a converter. The site might have one, or have good documentation on how to do it. Or you might need to write your own script. This is called "data wrangling". It's not particularly fun, but it's a part of many genomic/metagenomic processes.
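If you do end up writing your own script, it can be quite short. Here is a minimal sketch of a FASTQ to FASTA converter in Python. It assumes the simple four-line FASTQ layout (header, sequence, `+` line, quality) with no wrapped sequences, and the function name `fastq_to_fasta` and the example reads are made up for illustration. For real data, an established library such as Biopython is usually the safer choice.

```python
from typing import Iterable, Iterator, List


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from four-line FASTQ records (no wrapped sequences)."""
    record: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line and not record:
            continue  # skip blank lines between records
        record.append(line)
        if len(record) == 4:
            header, sequence, separator, _quality = record
            if not header.startswith("@") or not separator.startswith("+"):
                raise ValueError(f"malformed FASTQ record: {header!r}")
            yield ">" + header[1:]  # FASTA headers start with '>' instead of '@'
            yield sequence          # quality scores are dropped
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


# Made-up example reads, standing in for the contents of a FASTQ file.
example = [
    "@read1 sample=A\n",
    "ACGTACGT\n",
    "+\n",
    "IIIIHHHH\n",
    "@read2 sample=A\n",
    "TTGCA\n",
    "+\n",
    "IIHH#\n",
]
fasta_lines = list(fastq_to_fasta(example))
print("\n".join(fasta_lines))
```

To convert a real file, you could pass an open file handle in place of `example` and write the yielded lines to a new file.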

Questions

Formulate a question that you might ask using the data or features of this site.

If there is the ability to analyze or explore data, try to answer that question on the web site.
If the site allows for or requires downloading data, how would you download the data to address your question?

Downloading data

How would you download data from this site? What data format does the data come in once you download it? Is this a humungous download? Is it reasonable to download the data this way?

Alternative ways of interacting with the site

Is there another way to upload/download/analyze data for this database other than through the web interface? If so, is there good documentation on how to do so at the site? If there's not, is that documentation somewhere else? Could you Google for it?
If you can access data another way, can you figure out how to do it?

Help

If you haven't already, go to the Help, FAQ or Forum section of the web site. Is there helpful 'help' documentation? Is there an active user community that could help address issues or questions you might have?

Automation

In the sections above, you worked to address some questions, probably for one sample or sequence. What would it take to do this for more than one sample or sequence, or even hundreds or thousands? Is there a more automated way to do this, or does it all need to be done 'by hand'?
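If the site offers a programmatic route, automation often comes down to a loop over identifiers. The sketch below shows that general shape in Python. Everything specific in it is a placeholder: the URL pattern, the accession names, and the `fake_download` function are invented for illustration and do not belong to any of the databases listed above. Check the database's own documentation for its real download URLs or API.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical URL pattern; replace it with the one the database documents.
URL_TEMPLATE = "https://example.org/download/{accession}.fasta"


def build_url(accession: str) -> str:
    """Fill the placeholder URL pattern with one accession."""
    return URL_TEMPLATE.format(accession=accession)


def process_all(
    accessions: Iterable[str],
    handler: Callable[[str], object],
) -> Tuple[Dict[str, object], Dict[str, str]]:
    """Run handler on every accession's URL, recording failures instead of stopping."""
    successes: Dict[str, object] = {}
    failures: Dict[str, str] = {}
    for accession in accessions:
        try:
            successes[accession] = handler(build_url(accession))
        except Exception as error:  # one bad sample should not end the batch
            failures[accession] = str(error)
    return successes, failures


def fake_download(url: str) -> str:
    """Stand-in for a real download step; makes no network request."""
    if "BAD" in url:
        raise ValueError("not found")
    return "would fetch " + url


# Made-up accession names for illustration only.
done, failed = process_all(["SAMPLE_001", "BAD_002", "SAMPLE_003"], fake_download)
print(sorted(done))
print(failed)
```

Keeping a record of which samples failed matters more as the batch grows: with thousands of samples, you want to retry a handful of failures, not rerun everything.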

Overall experience

You made it to the end! You have your data / answered your question / uploaded your data for archival purposes. Congratulations!

How was the experience? Would you want to use this database again?
If this were how your data was provided to other researchers, would you be happy? Could others easily access your data?
Are there other common alternatives to this database?
Do you have thoughts on what could change? If it's not a huge database, developers often appreciate constructive feedback. It could be worth filing an issue or making a comment, especially if something isn't working.
Bonus question: Are there other alternatives to this database? Maybe ones that aren't as specific to this type of data, such as Dryad or Figshare?
Bonus question: If this is a database that takes data and shares it publicly, how are you getting credit for that data? Is there a DOI? Something else?
Bonus question: What happens to the data if this database goes away? How do you think this database is funded?
