I'm new to the Statistica platform, and I need to present some requirements to my company to show whether the software can meet them.
Could you help me?
- Read the file homesite.train.csv (note: this file was provided in one of Kaggle's competitions).
- Convert the columns containing categorical data (strings) to numeric values.
- Save the converted data to Hadoop HDFS.
- From Spark, read the data from HDFS and load it into an RDD. Cache it and display it in the Spark console.
- Randomly split the RDD into training and validation sets (70%/30%).
- Train a model on the training data using Spark's RandomForestClassifier.
- Validate the trained model on the validation data.
- Calculate the AUC-ROC score in Spark and display it in the solution interface.
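To make three of the steps listed above concrete, here is a minimal, library-free Python sketch of string-to-number encoding, a random 70/30 split, and the AUC-ROC score. It does not use Statistica or Spark; it could serve as a small reference to sanity-check what Spark produces (in Spark these steps roughly correspond to `StringIndexer`, `randomSplit` and `BinaryClassificationEvaluator` with `metricName="areaUnderROC"`). The helper names `index_strings`, `random_split` and `auc_roc` are hypothetical and are not part of any product API.

```python
import random
from collections import Counter


def index_strings(values):
    """Map each distinct string to a float index, most frequent first
    (ties broken alphabetically). Similar in spirit to a frequency-ordered
    string indexer; this is an illustrative helper, not Spark's code."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    mapping = {v: float(i) for i, v in enumerate(ordered)}
    return [mapping[v] for v in values], mapping


def random_split(rows, train_fraction=0.7, seed=42):
    """Send each row to the training set with probability train_fraction,
    otherwise to the validation set. Like a weighted random split, the
    resulting sizes are only approximately 70/30."""
    rng = random.Random(seed)
    train, valid = [], []
    for row in rows:
        (train if rng.random() < train_fraction else valid).append(row)
    return train, valid


def auc_roc(labels, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive example (label 1) scores higher than a random negative one
    (label 0); ties count as half. O(P*N), fine for a small sketch."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    codes, mapping = index_strings(["B", "A", "B", "C", "B", "A"])
    print(codes)    # [0.0, 1.0, 0.0, 2.0, 0.0, 1.0]
    print(mapping)  # {'B': 0.0, 'A': 1.0, 'C': 2.0}

    train, valid = random_split(list(range(1000)), 0.7, seed=1)
    print(len(train) + len(valid))  # 1000

    print(auc_roc([1, 0, 1, 0], [0.9, 0.3, 0.4, 0.4]))  # 0.875
```

The pairwise AUC definition gives the same value as integrating the ROC curve with the trapezoidal rule, which is why it is convenient for cross-checking a reported score on a small validation sample.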