Merge sf object and data frame [Help] #193

Closed
brendaprallon opened this issue Jan 30, 2017 · 14 comments

@brendaprallon

Hi, I'm sorry if this issue doesn't apply, but I wasn't sure where else to ask.
I'm having trouble merging an sf object and a data frame by a common column, i.e. "NAME". The result is a data frame instead of an sf object, which I didn't expect since I did not change anything in the geometry. When I try to convert it back to sf with st_as_sf(), the geometry appears to have changed, because the plot is different. Here's the example I constructed below:

library(sf)
nc = st_read(system.file("shape/nc.shp", package="sf"))
# build the columns directly: cbind() would coerce RANDOM_NUMBER to character
df_example = data.frame(NAME = c("Ashe", "Alleghany", "Surry", "Currituck", "Northhampton", "Salt Lake City", "Atlanta", "New York City"),
                        RANDOM_NUMBER = c(3, 4, 5, 2, 0, 7, 6, 20))
merged = merge(nc, df_example, by = "NAME")
class(nc)
[1] "sf"         "data.frame"
class(merged)
[1] "data.frame"
plot(nc)
plot(merged)
#### plots are different ####

Is there a simple solution to this or am I doing something wrong? When I try merging these using a SpatialPolygonsDataFrame object and a data.frame, it works. Thank you so much!
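A likely reason for the lost class can be seen without sf at all: base R's merge() dispatches to merge.data.frame(), which calls as.data.frame() on its inputs and so works on plain data frames from then on. A minimal sketch, where "myclass" is a hypothetical subclass standing in for "sf" and the data are made up:

```r
# "myclass" is a hypothetical subclass used only as a stand-in for "sf"
x <- data.frame(NAME = c("Ashe", "Surry", "Wake"), id = 1:3)
class(x) <- c("myclass", "data.frame")

y <- data.frame(NAME = c("Ashe", "Surry"), RANDOM_NUMBER = c(3, 5))

# no merge.myclass() exists, so merge.data.frame() is used
merged <- merge(x, y, by = "NAME")
class(x)       # "myclass" "data.frame"
class(merged)  # "data.frame": the subclass is gone
```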

@eivindhammers

Both merged <- st_as_sf(merged) and st_geometry(merged) <- merged$geometry after merging should work, I think.

> st_geometry(merged) <- merged$geometry
> class(merged)
[1] "sf"         "data.frame"

If by "plots are different" you mean that your merge drops all geometries whose NAME has no match in df_example (i.e. where RANDOM_NUMBER would be NA), add all.x = TRUE to your merge() (or use left_join()).
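The dropped rows come from plain merge() behaviour: the default is an inner join, so rows without a matching key disappear, while all.x = TRUE keeps every row of the first argument and fills the new columns with NA. A small sketch on ordinary data frames with made-up values:

```r
counties <- data.frame(NAME = c("Ashe", "Alleghany", "Surry", "Wake"), id = 1:4)
extra    <- data.frame(NAME = c("Ashe", "Surry", "Atlanta"), RANDOM_NUMBER = c(3, 5, 6))

inner <- merge(counties, extra, by = "NAME")                # default: matched rows only
left  <- merge(counties, extra, by = "NAME", all.x = TRUE)  # keep all rows of `counties`

nrow(inner)  # 2: Alleghany and Wake are dropped
nrow(left)   # 4: Alleghany and Wake get RANDOM_NUMBER = NA
```

Note that merge() also sorts its result by the key column by default (sort = TRUE), so the row order can differ from that of the input.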

@edzer
Member
edzer commented Jan 31, 2017

Thanks; I think we should make this automatic, by providing merge and left_join methods for sf objects.
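The idea of a class-preserving merge method can be illustrated with plain S3 dispatch. This is not the sf implementation; merge.myclass() and the class "myclass" are hypothetical, and the sketch only shows the mechanism: merge as a plain data frame, then restore the subclass.

```r
# Hypothetical S3 method: merge as a plain data.frame, then re-attach the subclass
merge.myclass <- function(x, y, ...) {
  plain <- x
  class(plain) <- "data.frame"       # avoid re-dispatching to merge.myclass()
  out <- merge(plain, y, ...)        # handled by merge.data.frame()
  class(out) <- class(x)             # restore c("myclass", "data.frame")
  out
}

x <- data.frame(NAME = c("Ashe", "Surry", "Wake"), id = 1:3)
class(x) <- c("myclass", "data.frame")
y <- data.frame(NAME = c("Ashe", "Surry"), RANDOM_NUMBER = c(3, 5))

merged <- merge(x, y, by = "NAME", all.x = TRUE)
class(merged)  # "myclass" "data.frame"
```

A real sf method has more to do: sf records which column holds the geometry in the "sf_column" attribute, so copying the class alone would not be enough there.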

@brendaprallon
Author

Thank you very much, all.x = TRUE did the trick to preserve the geometry.

@eivindhammers

Would you add a duplicateGeoms argument to merge.sf, like in merge.Spatial?

@edzer
Member
edzer commented Jan 31, 2017

Well, that doesn't do much, apart from suppressing a warning in case of multiple matches. My gut feeling is that it will be more useful to try to support the *_join functions in dplyr, and include spatial matches.
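The "multiple matches" case can be checked before merging: if the key is duplicated in the attribute table, merge() repeats the matching feature (and therefore its geometry) once per match. A base R sketch with made-up data:

```r
counties <- data.frame(NAME = c("Ashe", "Surry"), id = 1:2)
extra    <- data.frame(NAME = c("Ashe", "Ashe", "Surry"), RANDOM_NUMBER = c(3, 4, 5))

# keys that occur more than once in the table being joined
dup_keys <- unique(extra$NAME[duplicated(extra$NAME)])
dup_keys  # "Ashe"

merged <- merge(counties, extra, by = "NAME")
table(merged$NAME)  # Ashe appears twice, Surry once
```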

@tiernanmartin
tiernanmartin commented Jan 31, 2017

While it would be great to eventually have a set of spatial *_join functions, I expect that may take considerable attention and time to develop. In the meantime, could we get an interim version where the sf class isn't dropped when an sf object is included in a *_join?

Example:

library(dplyr, warn.conflicts = F, quietly = T)
library(sf)
#> Linking to GEOS 3.5.0, GDAL 2.1.1, proj.4 4.9.3
demo(nc, ask = FALSE, echo = FALSE, verbose = FALSE)

nc_df <- nc %>% unclass %>% as_data_frame() %>% select(NAME, BIR74)

nc %>% select(-BIR74) %>% left_join(nc_df, by = "NAME") %>% class
#> [1] "data.frame"

edzer added a commit that referenced this issue Jan 31, 2017
@edzer
Member
edzer commented Jan 31, 2017

@brendaprallon please test.

@edzer
Member
edzer commented Feb 1, 2017

@tiernanmartin pls test the non-spatial *_join.sf methods.

Now, an empty GEOMETRYCOLLECTION is put in wherever no geometry is available; this might be improved to, e.g., put an empty LINESTRING when all the geometries are LINESTRINGs. For POINT geometries there is no empty version:

a = data.frame(a = 1:3, b = 5:7)
st_geometry(a) = st_sfc(st_point(c(0,0)), st_point(c(1,1)), st_point(c(2,2)))
b = data.frame(x = c("a", "b", "c"), b = c(2,5,6))
full_join(a, b)
# Joining, by = "b"
# Simple feature collection with 4 features and 3 fields (of which 1 is empty)
# geometry type:  GEOMETRY
# dimension:      XY
# bbox:           xmin: 0 ymin: 0 xmax: 2 ymax: 2
# epsg (SRID):    NA
# proj4string:    NA
#    a b    x             geometry
# 1  1 5    b           POINT(0 0)
# 2  2 6    c           POINT(1 1)
# 3  3 7 <NA>           POINT(2 2)
# 4 NA 2    a GEOMETRYCOLLECTION()

@edzer
Member
edzer commented Feb 1, 2017

@tiernanmartin 3f7c25d now adds st_join for spatial join with flexible geometry predicates, besides the *_join dplyr join methods for non-spatial joins; #200 #50 #42 -- please test!

@tiernanmartin

@edzer my limited tests of the non-spatial *_join functions all returned the expected results - thanks for the quick turn around! If I encounter anything unusual I will post a test here.

@tiernanmartin

I tried st_join with a slightly modified version of your test above and got an error:

library(sf)
#> Linking to GEOS 3.5.0, GDAL 2.1.1, proj.4 4.9.3

a = data.frame(a = 1:3, b = 5:7)
st_geometry(a) = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)))

b = data.frame(x = c("a", "b", "c"), b = c(2, 5, 6))
st_geometry(b) = st_sfc(st_point(c(0, 0)), st_point(c(1, 1)), st_point(c(2, 2)))


st_join(a, b)
#> Error: length(setdiff(names(value), nv)) == 0 is not TRUE

Running st_join in debug mode indicates that the error occurs when subsetting x:

if (missing(FUN)) {
    if (left) {
        i = lapply(i, function(x) {
            if (length(x) == 0)
                NA_integer_
            else x
        })
        ix = rep(seq_len(nrow(x)), sapply(i, length))
    }
    st_sf(cbind(as.data.frame(x[ix, ]),   # <- error occurs here
                y[unlist(i), , drop = FALSE]))
}
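The left-join bookkeeping in that snippet can be reproduced in base R. In this sketch the list i is a hypothetical stand-in for the sparse result of a geometry predicate (element k holds the row numbers of y that match row k of x); empty elements become NA_integer_ so that unmatched rows of x survive with NA attributes.

```r
x <- data.frame(a = 1:3)
y <- data.frame(x_name = c("a", "b", "c"), b = c(2, 5, 6))

# Hypothetical predicate result: row 1 of x matches y rows 1 and 2,
# row 2 matches nothing, row 3 matches y row 3
i <- list(c(1L, 2L), integer(0), 3L)

# Left join: give unmatched x rows a single NA match
i <- lapply(i, function(m) if (length(m) == 0) NA_integer_ else m)
ix <- rep(seq_len(nrow(x)), lengths(i))  # repeat each x row once per match

# drop = FALSE keeps one-column data frames from collapsing to vectors
joined <- cbind(x[ix, , drop = FALSE], y[unlist(i), , drop = FALSE])
rownames(joined) <- NULL
joined  # 4 rows; the row for a = 2 has NA in x_name and b
```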

edzer added a commit that referenced this issue Feb 2, 2017
@brendaprallon
Author

@edzer Thank you so much for the quick response. However, I'm having a lot of trouble installing the GitHub version for some reason. I'll test it as soon as I get around that. Thanks!

@edzer
Member
edzer commented Feb 18, 2017

Things should work now with 0.3-4 on CRAN; could you pls check?

@brendaprallon
Author

Thank you, I tested it with my data and it is working perfectly!

@edzer edzer closed this as completed Feb 24, 2017

4 participants