read_table() has strange behavior reading doubles greater than 100M #977

Closed
wwgordon opened this issue Mar 13, 2019 · 2 comments

Comments

@wwgordon

I've been trying to read in chromosomal positions with read_table(), and it seems to drop the leading digits of values greater than 100M:

library(tidyverse)
rm(list = ls())
path = "<path>"

gwas_base = read.table(path, header = TRUE)
gwas_tidy = read_table(path, col_names = TRUE)
#> Parsed with column specification:
#> cols(
#>   CHR = col_double(),
#>   SNP = col_character(),
#>   BP = col_double(),
#>   A1 = col_character(),
#>   TEST = col_character(),
#>   NMISS = col_double(),
#>   BETA = col_double(),
#>   SE = col_double(),
#>   `L95      U95` = col_character(),
#>   STAT = col_double(),
#>   P = col_double()
#> )

summary(gwas_base$BP)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#>     36787  31568379  69147960  78449518 115057896 249202567
summary(gwas_tidy$BP)
#>     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
#>      827 18564087 38138482 42432636 65551527 99997478
dim(gwas_base)
#> [1] 207089     12
dim(gwas_tidy)
#> [1] 207089     11
gwas_base[1, ]
#>   CHR       SNP      BP A1 TEST NMISS  BETA    SE     L95   U95  STAT
#> 1   1 rs6687776 1030565  T  ADD  1700 1.701 1.243 -0.7357 4.137 1.368
#>        P
#> 1 0.1715
gwas_tidy[1, ]
#> # A tibble: 1 x 11
#>     CHR SNP       BP A1    TEST  NMISS  BETA    SE `L95      U95`  STAT
#>   <dbl> <chr>  <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr>          <dbl>
#> 1     1 rs66… 1.03e6 T     ADD    1700  1.70  1.24 -0.7357    4.…  1.37
#> # … with 1 more variable: P <dbl>

Created on 2019-03-13 by the reprex package (v0.2.1)

I suspect this happens because the first few hundred positions are much smaller, but silently truncating values like this, with no warning, seems dangerous.

Additionally, read_table() is treating the L95 and U95 columns as one column.
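
A check along these lines can catch this kind of silent truncation. This is only a minimal sketch in base R: the toy file, its values, and the names `tmp`, `toy_base`, and `bp_raw` are made up for illustration and are not taken from the original data. It re-derives the BP field by splitting each raw line on whitespace and compares that with the parsed column and the column count.

```r
# Hypothetical toy file: the BP field gets wider in later rows, as in the report.
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "CHR SNP        BP      L95     U95",
  "  1 rs1     36787  -0.7357   4.137",
  "  1 rs2   1030565  -0.1200   2.500",
  "  2 rs3 249202567  -1.0000   3.000"
), tmp)

# read.table() splits on runs of whitespace, so field width does not matter.
toy_base <- read.table(tmp, header = TRUE, stringsAsFactors = FALSE)

# Independent check: split each raw data line on whitespace and take field 3 (BP).
raw_lines <- readLines(tmp)[-1]
fields <- strsplit(trimws(raw_lines), "\\s+")
bp_raw <- as.numeric(vapply(fields, function(x) x[3], character(1)))

# Fail loudly if a column was merged or any BP value lost digits.
stopifnot(ncol(toy_base) == length(fields[[1]]))
stopifnot(identical(as.numeric(toy_base$BP), bp_raw))
```

The same comparison could be run against the BP column of whatever reader is being tested (for example `gwas_tidy$BP` above), so a mismatch raises an error instead of passing silently.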

@jimhester
Collaborator

This is a duplicate of #976

@lock

lock bot commented Dec 10, 2019

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Dec 10, 2019