Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How can one convert a DataFrames into a raster without knowing the resolution and extent? #572

Closed
Ujjwal4CULS opened this issue Dec 5, 2023 · 38 comments

Comments

@Ujjwal4CULS
Copy link

Could someone please assist in converting the attached data into a raster format, given that the resolution and extent are unknown?
mo

@rafaqz
Copy link
Owner

rafaqz commented Dec 5, 2023

Looks like you want rasterize. Check the docs for usage. At minimum you need to define the resolution manually with the res keyword as its not always easy to guess something sensible. But the extent can be detected from the points.

The easiest way to do it is to combine your Latitude and Longitude columns into the GeoInterface.jl standard "geometry" column, as a tuple - which is recognised as a point.

Something like this should work:

df.geometry = collect(zip(df.Longitude, df.Latitude)))
rasterize(sum, df; res=your_res, crs=your_crs)

Its not so well documented but you could also just do:

rasterize(sum, df; res=your_res, crs=your_crs, geomcolumn=(:Longitude, :Latitude))

You can also specify size instead of res if that's easier. Also, you could use e.g. maximum instead of sum if you expect multiple points in each pixel. If there is never multiple points per pixel it will make no difference.

@rafaqz rafaqz closed this as completed Dec 5, 2023
@rafaqz rafaqz reopened this Dec 5, 2023
@Ujjwal4CULS
Copy link
Author

Hi,

Thank you very much for your reply. However, I don't have the resolution and size of the data in this case. How can I perform the above code without this information?

@rafaqz
Copy link
Owner

rafaqz commented Dec 5, 2023

Maybe you can write an algorithm to guess a resolution from the spacing of the points?

Is it originally coming from a square raster or just a scatter? I have not idea what this data is.

(to me it doesn't make sense to do this autmatically in Rasters.jl. The failure mode from guessing pixels too small is to use all of your RAM, and there are so many ways people could disagree with any guess Rasters makes - so its better to let people specify what they want. The main way to do this is use another known raster as the to keyword and just take all its spatial properties - an exception could be for perfectly spaced points that originally came from a raster)

@Ujjwal4CULS
Copy link
Author

The data comes from a square raster with a 300-meter resolution. This data was generated from the Sentinel-3 Water (Full Resolution) dataset.

@Ujjwal4CULS
Copy link
Author

Ujjwal4CULS commented Dec 6, 2023

The error from your code is attached here as a screenshot.
mos2

@rafaqz
Copy link
Owner

rafaqz commented Dec 6, 2023

I need a MWE to be able to investivage this: code I can run that reproduces your example completely, including generating the data.

@Ujjwal4CULS
Copy link
Author

I am providing a minimal example dataset and accompanying code for your review. This should help in investigating the issue at hand. Additionally, the dataset (gt_dt.arrow) is available via the provided link .

Here is a snippet of the code with error:
`using ArchGDAL, RasterDataSources, GeoData, HDF5, DataTables, DataFramesMeta, Makie, GeoMakie, GDAL, GDAL_jll, GeoArrays, Arrow, DataFrames, NCDatasets, Statistics, CairoMakie, GeoStats, GeoStatsBase

arrow_table = Arrow.Table("gt_dt.arrow")
reordered_df = DataFrame(arrow_table)
reordered_df.geometry = collect(zip(reordered_df.lon, reordered_df.lat))
dt_ras = Rasters.rasterize(sum, reordered_df; res=0.1000005, crs=EPSG(4326))
ERROR: UndefKeywordError: keyword argumentto` not assigned
Stacktrace:
[1] top-level scope
@ none:1

@Ujjwal4CULS
Copy link
Author

@rafaqz
Copy link
Owner

rafaqz commented Dec 6, 2023

First, I'm wondering if you have an old version of Rasters.jl? thats not the error I get (try an ] update).

What is actually missing is the fill keyword! - we need to specify using the :value column in your dataframe as our fill value. So:

using Rasters, Arrow, ArchGDAL, DataFrames
arrow_table = Arrow.Table("/home/raf/Downloads/gt_dt.arrow")
reordered_df = DataFrame(arrow_table)
reordered_df.geometry = collect(zip(reordered_df.lon, reordered_df.lat))
dt_ras = Rasters.rasterize(sum, reordered_df; res=0.1000005, crs=EPSG(4326), fill=:value)

You can plot with Makie to verify:

using GLMakie
p = plot(dt_ras)

plot

@rafaqz
Copy link
Owner

rafaqz commented Dec 6, 2023

And make sure that sum is the aggregation that you actually want before using this.

@Ujjwal4CULS
Copy link
Author

Could you please update this for the newer version of Rasters? I am unable to access or install the older version of Rasters. Does this line dt_ras = Rasters.rasterize(sum, reordered_df; res=0.1000005, crs=EPSG(4326), fill=:value) generate a 255-band raster stack data?

@rafaqz
Copy link
Owner

rafaqz commented Dec 6, 2023

This is for the newer version! I'm saying I think you don't have the newest version

@Ujjwal4CULS
Copy link
Author

After updating, I see the same error. I think it's an issue with the package version. Could you please let me know your Rasters package version? Also, does this line dt_ras = Rasters.rasterize(sum, reordered_df; res=0.1000005, crs=EPSG(4326), fill=:value) generate the 255 layers/bands raster stack data?
jj_1

@rafaqz
Copy link
Owner

rafaqz commented Dec 6, 2023

This worked for me both in [email protected] and on Rasters#main

What is the output of ] status Rasters for you?

@rafaqz
Copy link
Owner

rafaqz commented Dec 6, 2023

On an older version Rasters did not automatically generate the axes using res or size, you had to pass them in using to.

@Ujjwal4CULS
Copy link
Author

test
I think it's an issue with the package update, isn't it?

@rafaqz
Copy link
Owner

rafaqz commented Dec 6, 2023

That's an incredibly old version of Rasters.jl - its at 0.9 now and soon 0.10. You must have some other old packages in your system that are holding it back.

Try to make a new clean environment in a new folder using Pkg.activate and install Rasters in it.

You can also try doing ] add [email protected] and seeing what the error is - it should point to some package that is holding it back.

@Ujjwal4CULS
Copy link
Author

I think I am still getting the same error after the update.
jl_2

@rafaqz
Copy link
Owner

rafaqz commented Dec 6, 2023

Did you restart julia?

@Ujjwal4CULS
Copy link
Author

After restarting, I still encountered the same error.

@rafaqz
Copy link
Owner

rafaqz commented Dec 6, 2023

Did you activate the environment again? (immediately after restart - if you already loaded rasters it is too late, start again)

I made the plot on Rasters 0.9 with the exact code I pasted above so it almost certainly works.

@Ujjwal4CULS
Copy link
Author

Thank you! Now it is working following your suggestion, but it generates a single-band raster. It should be a raster stack with 255 bands, and each band name should be a date. In this context, could you please tell me how to generate the raster stack data using my given data?

@rafaqz
Copy link
Owner

rafaqz commented Dec 6, 2023

You will need to manually group your data using DataFrames.jl by date, and rasterize each group and make a vector of all of them. I can't teach you how to do this.

Then, concatenate all the rasters together on a Band dimension like concat_rasters = cat(rasters_vec...; dims=Band) should work. Then you can write(filename, concat_rasters) to a geotiff.

@rafaqz rafaqz closed this as completed Dec 6, 2023
@Ujjwal4CULS
Copy link
Author

Thank you very much. Your suggestion is working for the RasterStack data.

@Ujjwal4CULS
Copy link
Author

Ujjwal4CULS commented Dec 7, 2023

arrow_table` = Arrow.Table("gt_dt.arrow")
reordered_df = DataFrame(arrow_table)
reordered_df.geometry = collect(zip(reordered_df.lon, reordered_df.lat))
dt_ras = Rasters.rasterize(sum, reordered_df; res=0.1000005, crs=EPSG(4326), fill=:value)
grouped = groupby(reordered_df, :date)
band_names = [string(g.date[1]) for g in grouped]
rasters = []
for g in grouped
    raster = Rasters.rasterize(sum, g; res=0.1000005, crs=EPSG(4326), fill=:value)
    push!(rasters, raster)
end
concat_rasters = cat(rasters...; dims=Band)
#setdimnames!(concat_rasters, Band, band_names)
gt = RasterStack(concat_rasters)

Could you please advise me on how to set band names for a RasterStack in my concat_rasters dataset to correspond with dates? I've tried using #setdimnames!(concat_rasters, Band, band_names), but this line isn't working. Unfortunately, I haven't found relevant documentation on this matter. Any assistance would be greatly appreciated.

@rafaqz
Copy link
Owner

rafaqz commented Dec 7, 2023

You cant just try random commands ;)

When you make the Band dimension in cat you can do ...; dims=Band(your_names_vector))

@Ujjwal4CULS
Copy link
Author

Ujjwal4CULS commented Dec 7, 2023

Thank you very much for your reply. How can I set the names of the bands in a stacked .tif file? This means setting the name for g.
,,,
g = RasterStack("x.tif")
,,,

@rafaqz
Copy link
Owner

rafaqz commented Dec 7, 2023

new_g = set(g, Band => your_band_names)

@Ujjwal4CULS
Copy link
Author

Thank you very much!

@Ujjwal4CULS
Copy link
Author

Sorry for my last question. How can raster and rasterstack files be converted into dataframes?

@rafaqz
Copy link
Owner

rafaqz commented Dec 7, 2023

DataFrame(rasterstack) ;)

@Ujjwal4CULS
Copy link
Author

The code provided above only generates values for rows*columns. My question is whether the DataFrame will create columns for x, y, band names, and values, akin to the terra package's as.data.frame(s, xy=TRUE) function in R?

@rafaqz
Copy link
Owner

rafaqz commented Dec 7, 2023

Thats what it does tho. Each dimension is also a column

@rafaqz
Copy link
Owner

rafaqz commented Dec 7, 2023

Yes, that is exactly what it outputs?

@Ujjwal4CULS
Copy link
Author

The output is a dataframe in R. However, the output in Julia is different, lacking longitude and latitude column values, as shown in the attached file
gm

@rafaqz
Copy link
Owner

rafaqz commented Dec 7, 2023

Again you must have ancient versions of something to get that result. Please check you are using new packages

@Ujjwal4CULS
Copy link
Author

Your suggestion is working. Thanks for this discussion. I hope this will be helpful for many people, including myself and those who love Julia's performance in comparison to R and Python.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants