I’m a data engineer, use parquet all the time and absolutely love love love it as a format!
Arrow (an in-memory data format) + Parquet is particularly powerful, and lets you:
- Only read the columns you need (with a CSV, your computer has to parse all the data even if you discard all but one column afterwards).
- Use metadata to only read relevant files. This is particularly cool and probably needs some unpacking. Say you’re reading 10 files, but only want data where “column-a” is greater than 5. A Parquet reader can look at the min/max statistics in each file’s metadata (stored in the file footer) at run time, and figure out whether a file has any column-a values over 5. If it doesn’t, the file never has to be read!
- Have data in an unambiguous format that can be read by multiple programming languages. Since CSV is text, anything reading it will look at a value like “2022-04-05” and say “oh, this text looks like a date, let’s see what happens if I read it as a date”. Parquet contains actual data type information, so it will always be read consistently.
If you’re handling a lot of data, this kind of stuff can wind up making a huge difference.
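To unpack the metadata and data type points a bit, here's a toy sketch in plain Python. It isn't real Parquet: `FileStats`, `files_to_read` and the file names are all made up for illustration, standing in for the min/max statistics a Parquet file keeps about each column.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date


@dataclass
class FileStats:
    """Hypothetical stand-in for the min/max a Parquet file stores for column-a."""
    path: str
    min_a: int
    max_a: int


def files_to_read(stats, threshold):
    """Keep only the files that could contain a column-a value above threshold."""
    return [s.path for s in stats if s.max_a > threshold]


# Made-up statistics for three files.
stats = [
    FileStats("part-0.parquet", min_a=0, max_a=4),
    FileStats("part-1.parquet", min_a=3, max_a=9),
    FileStats("part-2.parquet", min_a=1, max_a=5),
]

# Only part-1 can hold a value over 5, so the other two are never opened.
selected = files_to_read(stats, threshold=5)
print(selected)

# The CSV side of the type problem: every field comes back as text,
# and it's up to the reader to guess (or be told) that it's a date.
row = next(csv.DictReader(io.StringIO("id,day\n1,2022-04-05\n")))
print(type(row["day"]).__name__)
parsed_day = date.fromisoformat(row["day"])
```

With pyarrow, the real version of the first two bullets is roughly `pyarrow.parquet.read_table(path, columns=[...], filters=[("column-a", ">", 5)])`, where the column list and filter values are whatever your query needs.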
I feel like in a lot of ways, most languages are great candidates for this, for lots of different reasons!
Buuuuut, Rust’s compilation can be pretty resource intensive, so if you’re actually developing on limited hardware:
Then there’s the fact that it’s a home server, so always on, meaning you actually have generous resources in some ways, because any available CPU is kinda just there to use so:
And then why not go whole hog into the world of experimental languages:
And then we’re forgetting about:
But that doesn’t factor in:
Plus:
Edit: My actual serious answer is that Rust + Rocket would be great fun if you’re interested in learning something new, and you’d get very optimised code. If you just want it to use less memory than Java and don’t want to spend too much time learning new things, then Python is probably fine and very quick to learn. Go is a nice halfway point.