Hackathon 2 - problem statement related queries

nasir.afroze · June 28, 2021, 11:15am

Hello everyone, please post any questions about the Hackathon 2 problem statement in this thread.

sangram · June 29, 2021, 2:51am

Hi @nasir.afroze
A few queries about the dataset:

1. There are duplicate rows in the training data and test data (302 rows are duplicates, i.e. same user_id, aov and category). What is the context? Or could they be dropped?
2. Is there a hidden test dataset that will be used for calculating the metrics and the leaderboard?
3. Can external data or domain knowledge be used to make predictions?
4. I am unable to post a topic or reply to posts in the GHF hackathon 2 category, hence I am posting this on Site-Feedback.
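
For the first query, here is a minimal sketch of how such duplicates could be counted using only the Python standard library. The column names user_id, aov and category come from the query itself; the rows are made-up toy data, and `count_duplicate_rows` is a hypothetical helper, not part of any hackathon code.

```python
from collections import Counter

def count_duplicate_rows(rows, keys=("user_id", "aov", "category")):
    """Return how many rows repeat an earlier row on the given key columns."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    # Each distinct key combination is allowed once; every extra copy is a duplicate.
    return sum(n - 1 for n in counts.values())

# Toy rows, invented for illustration only (not from the hackathon dataset).
rows = [
    {"user_id": 1, "aov": 250.0, "category": "Fashion"},
    {"user_id": 1, "aov": 250.0, "category": "Fashion"},  # exact repeat
    {"user_id": 2, "aov": 90.0, "category": "Phones"},
]

duplicate_count = count_duplicate_rows(rows)
print(duplicate_count)
```

Whether such rows should be dropped depends on what a repeated row means in the data, which is exactly the context being asked about.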

amang251314 · June 29, 2021, 3:52am

I am also looking for a way to calculate the given metrics, but the target feature has only one category.

nasir.afroze · June 29, 2021, 4:01am

Based on that one category you have to calculate the three different metrics. We have already explained the metrics in the problem statement.

amang251314 · June 29, 2021, 4:02am

But to calculate the metric, we need three true values.

nasir.afroze · June 29, 2021, 4:04am

You have to predict the top three categories from which a person is most likely to buy.

amang251314 · June 29, 2021, 4:05am

Yes, but sir, to calculate the metrics we need three true values.

nasir.afroze · June 29, 2021, 4:42am

You predict the top 3 in order of decreasing probability. The metrics check three things:
1. Did you make a prediction?
2. Is one of the predictions actually the correct category?
3. What is the rank of the correctly predicted category, if any?

For example, take user_id = 123456. The correct category for it is “Fashion”. You predicted “Phones, Cars, Fashion” (in this particular order).

1. Recall is 1 (because you made a prediction).
2. Precision is also 1 (because one of your predicted categories is the correct category).
3. MRR is 1/3 (because the correct category was the third recommendation you gave).
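
A minimal sketch of this per-user scoring, following the explanation in this post only. It is not the backend scoring code; `score_user` is a hypothetical helper, and scoring a miss as 0 for precision and MRR is an assumption.

```python
def score_user(predictions, correct):
    """Score one user's ranked predictions against the single correct category."""
    # Recall: 1 if any prediction was made for this user.
    recall = 1.0 if predictions else 0.0
    # Precision: 1 if the correct category is among the predictions.
    precision = 1.0 if correct in predictions else 0.0
    # MRR: reciprocal of the 1-based rank of the correct category, 0 on a miss (assumed).
    mrr = 1.0 / (predictions.index(correct) + 1) if correct in predictions else 0.0
    return recall, precision, mrr

# The example above: correct category "Fashion", predicted in this order.
recall, precision, mrr = score_user(["Phones", "Cars", "Fashion"], "Fashion")
print(recall, precision, mrr)
```

So a single true category per user is enough: the three predictions are each compared against that one value.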

I hope it helps.

amang251314 · June 29, 2021, 4:44am

Thanks. It clarifies all my doubts.

amang251314 · June 29, 2021, 4:57am

My precision and MRR scores are the same number when I make a submission.

nasir.afroze · June 29, 2021, 5:01am

It is a coincidence.

amang251314 · June 29, 2021, 5:05am

I made 8 submissions, and all of them give the same value for precision and MRR.

nasir.afroze · June 29, 2021, 5:10am

Have you given the top 3 predictions for each user_id or just the top 1 prediction?

amang251314 · June 29, 2021, 5:12am

I predicted the top 3 categories.

nasir.afroze · June 29, 2021, 5:23am

I think this is possible only when the correct category, whenever it appears in your top 3, is always your first prediction. In that case, for any given example your precision and MRR are either both 1 or both 0.

It is a result of your approach. No need to worry.
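
A minimal sketch of why the two numbers can coincide, assuming the leaderboard averages per-user precision and MRR as described earlier in this thread. The helper name and the toy predictions are hypothetical.

```python
def mean_precision_and_mrr(all_predictions, all_correct):
    """Average per-user precision and reciprocal rank over all users."""
    precisions, reciprocal_ranks = [], []
    for predictions, correct in zip(all_predictions, all_correct):
        if correct in predictions:
            precisions.append(1.0)
            reciprocal_ranks.append(1.0 / (predictions.index(correct) + 1))
        else:
            # A miss contributes 0 to both metrics (assumed).
            precisions.append(0.0)
            reciprocal_ranks.append(0.0)
    n = len(precisions)
    return sum(precisions) / n, sum(reciprocal_ranks) / n

# Toy data, invented for illustration only.
correct = ["Fashion", "Phones", "Books", "Cars"]

# Every hit is at rank 1, every other user is a miss: both means are equal.
rank_one_hits = [
    ["Fashion", "Phones", "Cars"],
    ["Phones", "Cars", "Fashion"],
    ["Fashion", "Phones", "Cars"],
    ["Phones", "Fashion", "Books"],
]
same_p, same_m = mean_precision_and_mrr(rank_one_hits, correct)

# Hits at ranks 1, 2 and 3 plus one miss: MRR falls below precision.
mixed_rank_hits = [
    ["Fashion", "Phones", "Cars"],
    ["Cars", "Phones", "Fashion"],
    ["Fashion", "Phones", "Cars"],
    ["Phones", "Fashion", "Cars"],
]
mixed_p, mixed_m = mean_precision_and_mrr(mixed_rank_hits, correct)
print(same_p, same_m, mixed_p, mixed_m)
```

If the second and third predictions never contain the correct category, the two averages will match on every submission.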

nasir.afroze · June 29, 2021, 6:05am

Hi @Gaikwad_Sangram_Dash, thanks for raising these queries.

1. Regarding the duplicates: when I inner join the training and test data, I am not getting any duplicates.
2. For the leaderboard calculation, we have backend code running that matches your answers against the correct answers.
3. Yes, you can use external data and domain knowledge to make predictions.
4. This issue has been resolved.
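
A minimal sketch of the train/test overlap check mentioned in the first point, written with plain Python sets rather than any particular join tool. The key columns are taken from the original query; `shared_rows` and the toy rows are hypothetical, and the real check may have joined on different columns.

```python
def shared_rows(train_rows, test_rows, keys=("user_id", "aov", "category")):
    """Return the distinct key combinations present in both datasets."""
    train_keys = {tuple(row[k] for k in keys) for row in train_rows}
    test_keys = {tuple(row[k] for k in keys) for row in test_rows}
    # Set intersection gives the keys an inner join on these columns would match.
    return train_keys & test_keys

# Toy rows, invented for illustration only.
train = [
    {"user_id": 1, "aov": 250.0, "category": "Fashion"},
    {"user_id": 2, "aov": 90.0, "category": "Phones"},
]
test = [
    {"user_id": 3, "aov": 40.0, "category": "Cars"},
]

overlap = shared_rows(train, test)
print(len(overlap))
```

Note that this only detects rows shared between the two files; repeated rows within a single file need a separate count.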

amang251314 · June 29, 2021, 6:35am

When can we expect the opening of the leaderboard?

Prabhnoor_Singh · June 29, 2021, 9:29am

@nasir.afroze
I am getting an “Unable to compute score.” error message while making submissions. I have also tried submitting the SAMPLE SUBMISSIONS file (without altering it); however, the error remains the same.

nasir.afroze · June 29, 2021, 10:13am

Hi @Prabhnoor_Singh, we have resolved the issue. Please try making a submission now.

nasir.afroze · June 29, 2021, 10:20am

The leaderboard will be opened in a couple of days!