Hackathon 2 - problem statement related queries

wangsherpa · July 1, 2021, 4:30am

Are the training data in order?

User 32327’s buying order: phones → fashion → home decor → …?

nasir.afroze · July 1, 2021, 4:31am

No, the training data has no preference order.

22.rohan.rathore · July 1, 2021, 5:23am

I read the page but still don’t understand the dataset.

1. For Mean Reciprocal Rank (MRR), the order in which the products are shown matters, i.e. MRR is different if ‘phone’ is in first place compared to when ‘phone’ is in third. In the test CSV, the order is not provided.
2. train_data.csv contains only one product bought per user ID, while in the submission we are expected to predict 3 per user ID. Is that correct?
3. If I output more than 3 products in ‘products bought’, will only the first 3 be considered, or will it be scored 0?

@nasir.afroze

nasir.afroze · July 1, 2021, 6:05am

Hi @22.rohan.rathore

1. MRR: this metric emphasises the order of your predictions. You have to predict 3 categories in decreasing order of probability.

2. train_data.csv has multiple transactions for an individual user_id, but each row gives only one transaction (one product category and one aov).

3. If you output more than 3 products, your submission won’t fit the format demonstrated in the sample submission, so please predict only the top 3.

I hope it resolves your doubts.
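
To make the point about ordering concrete, here is a minimal sketch of a per-user reciprocal rank. It assumes the standard definition (1 divided by the position of the correct category, 0 if it is missing); the function name `reciprocal_rank` and the example categories are illustrative, and the official scorer may differ in detail.

```python
def reciprocal_rank(predicted, actual, k=3):
    """Return 1/rank of `actual` within the first k predictions, or 0.0 if absent."""
    for rank, category in enumerate(predicted[:k], start=1):
        if category == actual:
            return 1.0 / rank
    return 0.0


# Same three categories, different order: the score changes.
first = reciprocal_rank(["phones", "fashion", "home decor"], "phones")
third = reciprocal_rank(["fashion", "home decor", "phones"], "phones")
print(first, third)
```

With ‘phones’ as the true category, the first ordering scores 1 and the second scores 1/3, which is why the predictions should be listed in decreasing order of probability.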

k.kishore · July 1, 2021, 8:05am

I have made about 6 submissions in the past few days. Whenever I submit, I find that MRR and precision are the same, which seems doubtful to me. Can anyone clarify this?

amang251314 · July 1, 2021, 11:44am

The same happened to me. I am waiting for the leaderboard and will start again once it is up.

sam · July 3, 2021, 6:10am

Hi @nasir.afroze
How are we supposed to use the ‘training data target’? Does it only have future assumptions? I ask because aov is negative for every instance.

vatangl14 · July 4, 2021, 8:04am

Hi @nasir.afroze,

Would it be possible for you to share the evaluation metric code which is being used to evaluate our submissions? This will help to validate the model locally before making a submission.

Prashant_Raj · July 4, 2021, 8:43am

Hey, can you explain aov? Is it the cost of the product or its value to the user?

nasir.afroze · July 4, 2021, 11:54am

The evaluation metric code cannot be shared, but you can easily code it yourself. The evaluation metrics are explained clearly enough.

A hint: try writing a find-in-string function.
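
One possible reading of this hint, as a sketch only: if the ‘products bought’ field is a single delimited string, the rank of the correct category can be found by searching that string. The comma separator, the function name `rank_in_string` and the sample values are assumptions, not the organisers’ format or code.

```python
def rank_in_string(products_bought, actual, sep=","):
    """1-based position of `actual` in a delimited prediction string, 0 if missing."""
    # Cheap substring check first: if the text is not there at all, stop early.
    if products_bought.find(actual) == -1:
        return 0
    # Split into exact items so that "phone" does not match inside "phones".
    items = [item.strip() for item in products_bought.split(sep)]
    return items.index(actual) + 1 if actual in items else 0


print(rank_in_string("phones, fashion, home decor", "fashion"))
```

A rank of 0 means a miss; any non-zero rank counts as a hit for precision and contributes 1/rank to MRR.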

nasir.afroze · July 4, 2021, 11:54am

AOV - The amount of money a customer pays to buy a product.

ee18btech11026 · July 6, 2021, 4:25am

Hi @nasir.afroze, can you explain what the training data target file is?

ee18btech11026 · July 6, 2021, 4:52am

Just for clarification: the max value of precision can be 3 and the max value of MRR will be 1. Please correct me if I am wrong.

nasir.afroze · July 6, 2021, 5:01am

The maximum value of precision is 1. Precision checks whether any of your predictions matches the correct category.

Yes, the maximum value of MRR is also 1.
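
Written out, and assuming the usual definitions consistent with the description above (the exact formulas used by the scorer are not given in this thread), with N users, true category c_u, ranked predictions p_{u,1}, p_{u,2}, p_{u,3}, and r_u the position of the correct category when it is present:

```latex
\[
\text{Precision} = \frac{1}{N}\sum_{u=1}^{N} \mathbf{1}\!\left[\, c_u \in \{p_{u,1},\, p_{u,2},\, p_{u,3}\} \,\right],
\qquad
\text{MRR} = \frac{1}{N}\sum_{u=1}^{N} s_u,
\qquad
s_u =
\begin{cases}
1/r_u & \text{if } c_u = p_{u,r_u} \text{ for some } r_u \in \{1,2,3\},\\
0 & \text{otherwise.}
\end{cases}
\]
```

Every term in both sums lies between 0 and 1, so neither average can exceed 1. Since s_u is never larger than the hit indicator, MRR is at most Precision under these definitions, and the two are equal only when every hit is at rank 1.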

nasir.afroze · July 6, 2021, 5:03am

The training data target file contains the correct category that should be predicted for the training data set. You can validate your model by treating it as the ground truth and scoring your predictions against it.
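
A rough sketch of such a local validation loop follows. The frequency baseline `top3_baseline` only stands in for a real model, and the tuple layout `(user_id, category)` and both function names are assumptions about how the CSVs might be loaded, not the actual schema.

```python
from collections import Counter, defaultdict


def top3_baseline(transactions):
    """Map user_id -> up to 3 most frequent categories in that user's history."""
    history = defaultdict(Counter)
    for user_id, category in transactions:
        history[user_id][category] += 1
    return {u: [c for c, _ in counts.most_common(3)] for u, counts in history.items()}


def validate(predictions, targets):
    """Score predictions against the target categories; returns (precision, mrr)."""
    hits, rr_sum = 0, 0.0
    for user_id, actual in targets.items():
        preds = predictions.get(user_id, [])[:3]
        if actual in preds:
            hits += 1
            rr_sum += 1.0 / (preds.index(actual) + 1)
    n = len(targets)
    return (hits / n, rr_sum / n) if n else (0.0, 0.0)


# Toy data only: user 1 bought phones twice and fashion once, user 2 bought home decor.
train = [(1, "phones"), (1, "phones"), (1, "fashion"), (2, "home decor")]
target = {1: "fashion", 2: "phones"}
print(validate(top3_baseline(train), target))
```

Swapping the baseline for a real model’s ranked output gives a local estimate before submitting.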

paradocs · July 6, 2021, 5:17am

Hi @nasir.afroze,
I started the hackathon only today and have a couple of doubts regarding the data:

1. What do the duplicate rows in the training data signify? (That the user bought the same product again at the same price? Strange!) I found 151 rows that have a duplicate, each duplicated just once.
2. The Train Target dataset has the next purchase of ~13k users out of the ~29k users mentioned in the Train Data. Is it meant to be that way (because that would mean the data of the remaining ~16k users is kinda useless)? Or am I missing something?

data007 · July 6, 2021, 9:32am

@nasir.afroze What weights will the different metrics (Precision, Recall and MRR) have in deciding the private leaderboard? Or will it be based solely on Precision, which is currently used to display the rank on the public leaderboard?

sagarjounkani · July 6, 2021, 11:30am

I have the same question

nasir.afroze · July 6, 2021, 3:55pm

Hi @paradocs,
Welcome to the second hackathon of GHF.
1. Out of the 257406 transactions listed in training, could 151 repeat transactions happen or not? I am giving an open-ended answer; I hope you get what I am saying.

2. You are asked to predict the future transactions of a certain number of the ~16k user_ids not given in train_data_target. You can see that there is no user_id common to test_data and train_data_target.

I hope it helps.
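
Both observations can be checked locally. The sketch below assumes the rows have already been read into plain Python sequences; the helper names `duplicated_rows` and `common_user_ids` and the toy values are hypothetical.

```python
from collections import Counter


def duplicated_rows(rows):
    """Return each row that occurs more than once, mapped to its count."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}


def common_user_ids(ids_a, ids_b):
    """Return the set of user_ids present in both collections."""
    return set(ids_a) & set(ids_b)


# Toy rows of (user_id, category, aov); the first row appears twice.
rows = [(1, "phones", 250.0), (1, "phones", 250.0), (2, "fashion", 40.0)]
print(duplicated_rows(rows))
print(common_user_ids([1, 2, 3], [4, 5]))
```

An empty result from `common_user_ids` on the test and target user IDs would confirm that the two sets do not overlap.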

nasir.afroze · July 6, 2021, 3:58pm

Recall > Precision > MRR is the order used for ranking. The same criteria apply to the private leaderboard.
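
One way to read this order, purely as an assumption, is as a tie-break priority: sort by Recall first, then Precision, then MRR. The sketch below illustrates that reading; the function `rank_leaderboard`, the tuple layout and the scores are made up, and the actual leaderboard logic is not shown in this thread.

```python
def rank_leaderboard(entries):
    """Sort (name, recall, precision, mrr) tuples, best first.

    Assumes a lexicographic priority: recall, then precision, then mrr.
    """
    return sorted(entries, key=lambda e: (e[1], e[2], e[3]), reverse=True)


# Illustrative scores only.
board = [
    ("team_a", 0.60, 0.50, 0.40),
    ("team_b", 0.60, 0.55, 0.30),
    ("team_c", 0.70, 0.40, 0.35),
]
for name, *_ in rank_leaderboard(board):
    print(name)
```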