Missing chromosome column?

Hi all,

Shouldn’t the ensembl_gene, ensembl_transcript, and ensembl_translation tables also have a chromosome column? I see that each of them has a seq_region_start and a seq_region_end column but no chromosome column. I suspect some people will need it for various applications, or simply for plotting.

Thanks for taking a look

Hi @joreyna,

Thank you for raising this point. We have already included the common metadata in the metadata tables (Gene, Transcript, and Protein). The current database contains the ensembl_gene, ensembl_transcript, and ensembl_translation tables, which are due to be upgraded to the gene, transcript, and protein tables, respectively.

After the upgrade, the new tables will look like this:

https://staging.cmi-pb.org/db/gene
https://staging.cmi-pb.org/db/transcript
https://staging.cmi-pb.org/db/protein

I suggest using the tables above for this purpose for now. The chromosome information is available in the gene table.

Best,
Pramod

Note: Proper documentation will be provided with the new database version.
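For anyone who needs the chromosome next to transcript coordinates, a join against the gene table could look roughly like the sketch below. It is a minimal, self-contained illustration using an in-memory SQLite database. The `gene_id`, `transcript_id`, and `chromosome` column names and all rows are made up for the example and may not match the actual schema of the new tables, so please check them against the staging pages above.

```python
import sqlite3

# Hypothetical schema and rows, for illustration only; the real gene and
# transcript tables may use different column names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gene (gene_id TEXT PRIMARY KEY, chromosome TEXT);
    CREATE TABLE transcript (
        transcript_id TEXT PRIMARY KEY,
        gene_id TEXT,
        seq_region_start INTEGER,
        seq_region_end INTEGER
    );
    INSERT INTO gene VALUES ('GENE_A', '1'), ('GENE_B', 'X');
    INSERT INTO transcript VALUES
        ('TX_A1', 'GENE_A', 100, 500),
        ('TX_B1', 'GENE_B', 2000, 2600);
""")


def transcripts_with_chromosome(connection):
    """Return (transcript_id, chromosome, start, end) rows via a gene join."""
    query = """
        SELECT t.transcript_id, g.chromosome,
               t.seq_region_start, t.seq_region_end
        FROM transcript AS t
        JOIN gene AS g ON g.gene_id = t.gene_id
        ORDER BY t.transcript_id
    """
    return connection.execute(query).fetchall()


rows = transcripts_with_chromosome(conn)
for row in rows:
    print(row)
```

The same join pattern should carry over to the protein table, assuming it links back to a gene or transcript identifier.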

I just loaded that other table and it looks good. Thanks for the quick reply,

Joaquin

Great :)

On a semi-related note, when I compare the chr, start, and end values in the gene table with those on the Ensembl website, I find slightly different values. Ensembl is using GRCh38, but I’m not sure which reference we are using. Here is an example:

From our database:

From the Ensembl database:

The length is the same in both data sources (435), but the positions are about 200 kb apart. What could be going on?

Thanks!

We are using the latest GRCh37 here, primarily because the RNA-seq data is mapped to GRCh37.

I found the Ensembl GRCh37 version of ENSG00000188403:

The gene metadata available in these two Ensembl versions differs. I think GRCh38 is more enriched than GRCh37.
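For anyone who wants to check this kind of discrepancy themselves, here is a rough sketch that builds lookup requests against the two public Ensembl REST servers (the main one for GRCh38 and the GRCh37 archive). The helper names are my own, and `same_length_shifted` is only a simple heuristic I am assuming is useful here: equal length on the same chromosome but a different start often hints at an assembly mismatch rather than a data error.

```python
import json
import urllib.request

# Ensembl REST hosts: the main server serves GRCh38, and a separate
# server serves the GRCh37 archive.
REST_HOSTS = {
    "GRCh38": "https://rest.ensembl.org",
    "GRCh37": "https://grch37.rest.ensembl.org",
}


def lookup_url(gene_id, assembly):
    """Build the lookup URL for a stable ID on the given assembly."""
    host = REST_HOSTS[assembly]
    return f"{host}/lookup/id/{gene_id}?content-type=application/json"


def fetch_coordinates(gene_id, assembly, timeout=30):
    """Fetch (chromosome, start, end) for a gene; needs network access."""
    url = lookup_url(gene_id, assembly)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        record = json.load(resp)
    return record["seq_region_name"], record["start"], record["end"]


def same_length_shifted(a, b):
    """True if two (chromosome, start, end) records share chromosome and
    length but start at different positions."""
    chrom_a, start_a, end_a = a
    chrom_b, start_b, end_b = b
    same_length = (end_a - start_a) == (end_b - start_b)
    return chrom_a == chrom_b and same_length and start_a != start_b
```

Calling `fetch_coordinates("ENSG00000188403", "GRCh37")` and again with `"GRCh38"`, then passing both results to `same_length_shifted`, would be one way to confirm that a row in the gene table matches one assembly and not the other.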

Thanks Pramod. I think this would be really important to document, since some people will want to Google or look up these IDs and will run into the same discrepancies. I can definitely move forward with the tasks, but I just wanted to point this out.

Joaquin

Thanks @joreyna for raising this point. We revisited the choice between GRCh38 and GRCh37. As you pointed out, GRCh38 is the obvious choice since it is the default option at Ensembl.

We will be using GRCh38 from now on, and the tables (gene, transcript, and protein) will be available for access very soon. I will post updates here.

Best,
Pramod
