Results for pyspark
I have a problem with my PySpark dataframe: I created a column with collect_list() using a normal groupBy/agg, and I want to write something that w...
I'm using the query below to get the start and end of the week. Although dayofweek is part of pyspark.sql.functions in this format within a SQL qu...
I am running PySpark 3.3.1. In my Window function, I noticed that the max output is not what I expect. Why is that? What is happening in the window gr...
Consider the following schema in a PySpark dataframe df: root |-- mydoc: array (nullable = true) | |-- element: struct (containsNull = true) | ...
I am running the PySpark query below, and it throws the error below. I want dynamic values to be passed via a variable into a CASE statement: data1 = 'HI' d...
I have a PySpark dataframe with a column containing bytes in a nested dictionary, so the data looks like this: Col_name: "{"bytes":"\u0014o...
I have a function that takes a dataframe as a parameter and calculates the NULL value counts and NULL value percentage and returns a dataframe with co...
I am creating a PySpark application to do the following: read a collection of points from a CSV file; read a collection of polygons from a CSV file; ru...
I have a dataframe df1 like this: A B AA [a,b,c,d] BB [a,f,g,c] CC [a,b,l,m] And another one as df2 like: C D XX [a,b,c,n] YY [a,m,r,s] UU [...
I have been trying to flatten the row in pyspark dataframe after a group by My dataset looks like this |member_id|age|gender| date|cost| +-------...
I am submitting my Spark job using spark-submit CLI with --py-files (wheel file) as an argument. I want to list all the packages included in the wheel...
I'm using the following function (partly from a code snippet I got from this post: Compute size of Spark dataframe - SizeEstimator gives unexpected re...
I wrote a function I'd like to modify to have an argument that can take one or multiple parameters but I'm having trouble making it work correctly. de...
I have a dataframe like below in pyspark df = spark.createDataFrame([ ['red', 1, 'blue', 1]], schema=['orc_color', 'orc_numbr', 'hive_color', 'hi...
I want to be able to take a dataframe like this with a features column containing a list of dicts: {"id": 1, "label": 1, "features": [{"key1": 1}, {"k...
I want to sum the arrays within a column of arrays by element - the column of arrays should be aggregated to one array. The below code gives the desir...
The schemas of my input and output datasets are the same. When running this script, I want to first create a new dataset using the filter_data func...
I have docker-compose.yaml version: '3' services: spark-master: build: context: ./spark-master ports: - "7077:7077" - "808...
I am spinning up on Python and PySpark, installed using Anaconda on Windows 10. For now, I'm working through sparkbyexamples.com pages, e.g., here, h...
I have a dataframe with many columns and in one of the columns I have the logical operation which I need to perform on the dataframe. As an example lo...
I have a pyspark dataframe having below types of date time values (string type) - |text|date_filing| |AAA|1998-12-22| |BBBB|2023-08-30 12:03:17.814757...
So I have this weird issue. I'm using a huge dataset that has dates and times in it represented by a single string. This data can be easily converte...
I have employee data like below. I want to group the below data by EMP_ID and if 'Status' of this grouped EMP_ID has the value 'Not Done' then entire ...
I am loading a predefined schema from a JSON file for a specific dataset I ingest into an Azure Data Lake. The JSON file that contains the schema is al...
I have leading and trailing characters in a PySpark dataframe, but trim and regex_replace don't seem to work on them. There are probably some null ...