# How can I separate data into different groups for the Cumulative Distribution Function in R?

I'm new to R. I'm trying to modify a TIBCO-provided TERR function from https://community.tibco.com/modules/cdf-data-function-tibco-spotfirer. The script works fantastically for one dataset. However, when I add the line "group = as.factor(rep(c("group1", "group2"), each = N/2))" to the script, a "group" column is created as a factor variable with two levels, "group1" and "group2", each repeated N/2 times. The resulting CDF plot is shown in Picture 2.

It doesn't separate the data into different groups. How can I create a separate CDF for each group, as shown in Picture 1?

Picture 1

--R script--

    library(ggplot2)

    x = sort(analysisColumn, decreasing=FALSE, na.last=NA) # Sort increasing, removing missing values.
    N=length(x)

    cdfTable = data.frame(
      value = x,
      prob = ((1:N)-1)/(N-1),
      group = as.factor(rep(c("group1", "group2"), each = N/2))
    )

    ggplot(cdfTable, aes(x=value, y=prob, color = group)) +
      geom_line() +
      ggtitle("Multiple Cumulative Distribution Curves")

Picture 2

Picture 3

##### Share on other sites

Could you elaborate on what your goal is? You have divided the data into two halves in the same step that calculates the CDF, so you are creating group1 for the first N/2 rows and group2 for the rest, which is what you are getting. Did you mean to use a separate column to divide the data into two groups, and then calculate the CDF for each group?

If so, what is the name of the grouping column in your example?
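To illustrate the point, here is a minimal sketch with made-up values (the vector `x` below is hypothetical, not the poster's data). Because `x` is sorted before the labels are attached, `rep(..., each = N/2)` labels the lower half of the values group1 and the upper half group2, so the plot shows a single CDF drawn in two colors rather than one CDF per group.

```r
# Hypothetical sample values standing in for analysisColumn.
x <- sort(c(0.31, 0.12, 0.28, 0.25, 0.35, 0.30), decreasing = FALSE, na.last = NA)
N <- length(x)

cdfTable <- data.frame(
  value = x,
  prob  = ((1:N) - 1) / (N - 1),  # one CDF computed over all N values
  group = as.factor(rep(c("group1", "group2"), each = N / 2))
)

# The labels follow the sort order, not any property of the data:
# every group1 value is smaller than every group2 value.
halvesSplit <- max(cdfTable$value[cdfTable$group == "group1"]) <
  min(cdfTable$value[cdfTable$group == "group2"])
print(halvesSplit)
```

Note that `each = N/2` only lines up with the number of rows when N is even.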

##### Share on other sites

Thanks for your reply. Yes, you're right. I would like to use a separate column to divide the data into two or more groups, to get the CDF shown in Picture 1. The grouping column is named "group". Is my existing approach correct for this? Or should I use the facet_wrap() function to create a panel for each group, instead of dividing the data into two halves in the same step?

##### Share on other sites

You have uploaded a sample dataset, so if you give me the name of the grouping column I can try. I understand you have a column of data and another column that divides that data into separate groups.

If you mean facets as used by ggplot2, there are no facets involved, as Spotfire will take care of the plot.

##### Share on other sites

Yes, I mean the facet functions from the ggplot2 library. Did you mean Spotfire can handle the multiple panels without the library?

Sorry, I thought you were referring to the grouping column names highlighted in yellow in the picture below. The names of the groups that I would like to create the multiple panels for are "group1", "group2", "group3", "group4" and "group5".

##### Share on other sites

I meant the name of the actual column containing the values group1, etc. You have 105 columns in your dataset, which one should be used for grouping the values of the first column (x)?

##### Share on other sites

Got it. I uploaded the dataset as a .csv file (with the data separated into five groups: "group1", "group2", "group3", "group4", "group5") and also the .dxp file.

For the first column (x), values between 0.0996 and 0.2736 belong to "group 1".

For "group 2", the values are between 0.2736 and 0.2832.

For "group 3", the values are between 0.2832 and 0.2964.

For "group 4", the values are between 0.2964 and 0.3192.

For "group 5", the values are between 0.3192 and 0.3516.

I wonder whether the plot fails to show a separate CDF for each group, as in Picture 1, because of duplicated values in my dataset? Or does something need to change in the formula "group = as.factor(rep(c("group1", "group2"), each = N/2))" to create a CDF plot for each group?
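One way to build a grouping column from the value ranges listed above is base R's `cut()`. This is only a sketch: the sample values are made up, and because adjacent ranges share an endpoint (for example 0.2736), it assumes that a boundary value belongs to the lower group.

```r
# Hypothetical values standing in for the first column (x).
x <- c(0.0996, 0.2500, 0.2736, 0.2800, 0.2900, 0.3000, 0.3400, 0.3516)

# Range boundaries as quoted in the post.
breaks <- c(0.0996, 0.2736, 0.2832, 0.2964, 0.3192, 0.3516)

# right = TRUE puts a boundary value in the lower group (an assumption);
# include.lowest = TRUE keeps the minimum 0.0996 inside group1.
group <- cut(x, breaks = breaks,
             labels = paste0("group", 1:5),
             right = TRUE, include.lowest = TRUE)

print(table(group))
```

A column built this way depends only on each value's range, not on the row order, unlike `rep(..., each = N/2)`.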

##### Share on other sites

Attached .csv file

##### Share on other sites

I still don't quite understand how you wish to group your data. Anyway I have modified the original dxp that was on TIBCO Exchange by adding a grouping column. Note that the inputs to the data function are now different.

I used the original example and I added a grouping column (the only potential candidate I could use for that dataset was RAD).

The data function works by creating a separate CDF for each group, then concatenating them into the final table.

The resulting table can be displayed via a line chart separated by group.

You don't need ggplot2 as that is the R library for data visualization, which is taken care of by Spotfire.
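The attached data function is not reproduced in this thread, so the following is only a sketch of the approach described above, not the actual script. It splits the values by a grouping column, computes an empirical CDF within each group using the same `((1:N)-1)/(N-1)` formula as the original script, and concatenates the pieces into one table. The function name `groupedCdf` and the sample inputs are hypothetical.

```r
# Hypothetical helper: one CDF per group, stacked into a single table.
groupedCdf <- function(values, groups) {
  keep <- !is.na(values) & !is.na(groups)  # drop rows with missing data
  pieces <- lapply(split(values[keep], groups[keep], drop = TRUE), function(v) {
    v <- sort(v)
    n <- length(v)
    # Same formula as the original script; a single-point group gets prob 1.
    prob <- if (n > 1) ((1:n) - 1) / (n - 1) else 1
    data.frame(value = v, prob = prob)
  })
  # Concatenate the per-group tables, tagging each row with its group name.
  out <- do.call(rbind, Map(function(d, g) cbind(d, group = g),
                            pieces, names(pieces)))
  rownames(out) <- NULL
  out
}

# Made-up inputs for illustration only.
analysisColumn <- c(0.12, 0.25, 0.20, 0.30, 0.33, 0.31, NA)
groupColumn    <- c("A", "A", "A", "B", "B", "B", "B")
cdfTable <- groupedCdf(analysisColumn, groupColumn)
print(cdfTable)
```

A table shaped like `cdfTable` (value, prob, group) could then be returned as the data function's output and drawn in Spotfire as a line chart colored by `group`.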

##### Share on other sites

I would like to group my data the way you plotted it. This is great! My question is now solved, thank you so much! I only realized I didn't need ggplot2 once you pointed it out. I modified your data function, and it now creates a separate CDF for each group for my data. I learnt something today. Thanks a lot!

##### Share on other sites

• 1 year later...
On 1/30/2023 at 4:53 AM, Gaia Paolini said:

I still don't quite understand how you wish to group your data. Anyway I have modified the original dxp that was on TIBCO Exchange by adding a grouping column. Note that the inputs to the data function are now different.

I used the original example and I added a grouping column (the only potential candidate I could use for that dataset was RAD).

The data function works by creating a separate CDF for each group, then concatenating them into the final table.

The resulting table can be displayed via a line chart separated by group.

You don't need ggplot2 as that is the R library for data visualization, which is taken care of by Spotfire.

Can you share the R script for this, or provide the dxp attachment? I'm not sure why there is no file in this thread.