- What are the pros and cons of the Apache Parquet format compared to . . .
Parquet has gained significant traction outside of the Hadoop ecosystem. For example, the Delta Lake project is being built on Parquet files. Arrow is an important project that makes it easy to work with Parquet files from a variety of languages (C, C++, Go, Java, JavaScript, MATLAB, Python, R, Ruby, Rust), but it doesn't support Avro.
- Reading / Fixing a corrupt parquet file - Stack Overflow
Reading / Fixing a corrupt parquet file. Asked 1 year, 3 months ago. Modified 6 months ago. Viewed 2k times.
- Python: save pandas data frame to parquet file - Stack Overflow
Is it possible to save a pandas data frame directly to a parquet file? If not, what would be the suggested process? The aim is to be able to send the parquet file to another team, which they can
- Is it possible to read parquet files in chunks? - Stack Overflow
The Parquet format stores the data in chunks, but there isn't a documented way to read it in chunks like read_csv. Is there a way to read parquet files in chunks?
- Inspect Parquet from command line - Stack Overflow
How do I inspect the content of a Parquet file from the command line? The only option I see now is `$ hadoop fs -get my-path local-file` followed by `$ parquet-tools head local-file | less`. I would like to avoid
- How do I get schema column names from parquet file?
Also, Cloudera (which supports and contributes heavily to Parquet) has a nice page with examples on the usage of hangxie's parquet-tools. An example from that page for your use case: `parquet-tools schema part-m-00000.parquet`. Check out the Cloudera page: Using Apache Parquet Data Files with CDH - Parquet File Structure.
- How to append new data to an existing parquet file?
I have parquet files with some data in them. I want to add more data to them frequently, every day. I want to do this without having to load the object into memory and then concatenate and write again.
- Extension of Apache parquet files, is it .pqt or .parquet?
I wonder if there is a consensus regarding the extension of parquet files. I have seen a shorter .pqt extension, which has the typical 3 letters (like in csv, tsv, txt, etc.), and then there is a rather long (therefore unconventional (?)) .parquet extension, which is widely used.