PySpark cheat sheet
1. Get the distinct values of a column in a PySpark DataFrame:

       jdf.select('State').distinct().show()

   Here jdf is the DataFrame and 'State' is one of its columns. The command displays the distinct values of the State column.
   Reference: https://www.datasciencemadesimple.com/distinct-value-of-a-column-in-pyspark/

2. Load data or a pickle file from Azure Databricks into an Azure Storage account container using the Azure Storage SDK for Python (example notebook):
   https://github.com/mohanish12/sparkNotebooks/blob/main/Upload_databricks_AzureStorage.ipynb?short_path=521a0c3

3. Write a DataFrame as Parquet to an Azure Storage account container:

       blob_account_name = ""
       blob_container_name = ""
       blob_sas_token = "https://*.blob.core.windows.net/weather?sp="
       blob_relative_path = "snwd_NJ3.parquet"
       wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
       spark.conf.set('fs.az...
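For item 2, a rough sketch of the upload step, assuming the v12 azure-storage-blob SDK, where a BlobServiceClient hands out a blob client via get_blob_client(container=..., blob=...) and that client's upload_blob(data, overwrite=True) writes the bytes. The helper upload_pickle and the InMemoryBlobService stand-in are hypothetical names used only so the sketch can be dry-run without an Azure account; they are not taken from the linked notebook.

```python
import pickle
from typing import Any, Dict, Tuple


def upload_pickle(blob_service_client: Any, container: str, blob_name: str, obj: Any) -> int:
    """Pickle obj and upload it as a blob; returns the number of bytes uploaded.

    blob_service_client is assumed to behave like azure.storage.blob.BlobServiceClient.
    """
    data = pickle.dumps(obj)
    blob_client = blob_service_client.get_blob_client(container=container, blob=blob_name)
    # overwrite=True replaces an existing blob with the same name.
    blob_client.upload_blob(data, overwrite=True)
    return len(data)


class _InMemoryBlob:
    """Hypothetical stand-in for a blob client: stores uploads in a dict."""

    def __init__(self, store: Dict[Tuple[str, str], bytes], key: Tuple[str, str]) -> None:
        self.store = store
        self.key = key

    def upload_blob(self, data: bytes, overwrite: bool = False) -> None:
        if not overwrite and self.key in self.store:
            raise ValueError("blob already exists")
        self.store[self.key] = data


class InMemoryBlobService:
    """Hypothetical stand-in for BlobServiceClient, for a local dry run."""

    def __init__(self) -> None:
        self.blobs: Dict[Tuple[str, str], bytes] = {}

    def get_blob_client(self, container: str, blob: str) -> _InMemoryBlob:
        return _InMemoryBlob(self.blobs, (container, blob))


# Dry run: upload a small object, then read it back from the fake store.
fake = InMemoryBlobService()
n_bytes = upload_pickle(fake, "models", "model.pkl", {"weights": [0.1, 0.2]})
restored = pickle.loads(fake.blobs[("models", "model.pkl")])
```

Against a real account, the fake would be replaced with `BlobServiceClient.from_connection_string(conn_str)` from `azure.storage.blob`, where conn_str is assumed to be the storage account's connection string (ideally read from a Databricks secret rather than hard-coded).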