Torch-GRU4Rec

  • Out-of-the-box (OOB): a single typo had to be fixed before the code would run.

  • Small differences from the official implementation (e.g. the missing momentum parameter) add up to a noticeable gap when Torch-GRU4Rec is compared to the official implementation run with matching features (the “Torch-GRU4Rec params” variant).

  • Negative items are sampled only after the scores of all items have been computed, which defeats the purpose of sampling and slows down training considerably. This bug is so deeply rooted in the code that we did not fix it.
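The cost of the ordering described in the last bullet can be illustrated with a minimal pure-Python sketch. It is not code from either implementation: the function names, the toy sizes and the uniform negative sampling are assumptions. Both paths return identical scores for the target and the sampled negatives, but scoring everything first needs `n_items` dot products per hidden state instead of `1 + n_sample`.

```python
import random

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def scores_then_sample(h, item_emb, target, negatives):
    """Inefficient order: score ALL items, then keep only the needed ones."""
    all_scores = [dot(h, w) for w in item_emb]      # n_items dot products
    wanted = [target] + negatives
    return [all_scores[i] for i in wanted], len(item_emb)

def sample_then_score(h, item_emb, target, negatives):
    """Efficient order: score only the target and the sampled negatives."""
    wanted = [target] + negatives
    return [dot(h, item_emb[i]) for i in wanted], len(wanted)

random.seed(0)
n_items, dim, n_sample = 10_000, 8, 32              # toy sizes (hypothetical)
item_emb = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_items)]
h = [random.gauss(0, 1) for _ in range(dim)]         # one hidden state
target = 42
negatives = random.sample(range(n_items), n_sample)  # uniform sampling (assumption)

s1, cost1 = scores_then_sample(h, item_emb, target, negatives)
s2, cost2 = sample_then_score(h, item_emb, target, negatives)
print(s1 == s2, cost1, cost2)  # → True 10000 33
```

In a real model the same saving applies per minibatch, which is why sampling is normally done before the output layer is evaluated.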

Rees46

Metrics (BPR-max loss)

| Implementation | Variant | Recall@1 | MRR@1 | Recall@5 | MRR@5 | Recall@10 | MRR@10 | Recall@20 | MRR@20 |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 0.1027 | 0.1027 | 0.2897 | 0.1680 | 0.4027 | 0.1831 | 0.5206 | 0.1913 |
| GRU4Rec Official | Torch-GRU4Rec params | 0.0968 | 0.0968 | 0.2801 | 0.1607 | 0.3923 | 0.1757 | 0.5112 | 0.1839 |
| Torch-GRU4Rec | OOB | 0.0954 | 0.0954 | 0.2774 | 0.1588 | 0.3894 | 0.1737 | 0.5081 | 0.1820 |
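For reference, Recall@K and MRR@K in next-item evaluation are commonly computed from the rank of the true next item among the scored items, as in the sketch below. This is a generic formulation assumed for illustration, not code from either implementation (tie handling in particular may differ). It also shows why Recall@1 and MRR@1 are identical in every table: with K = 1 a hit always has rank 1, so its reciprocal rank is 1.

```python
def recall_mrr_at_k(ranks, k):
    """ranks: 1-based rank of the true next item for each test event."""
    hits = [r for r in ranks if r <= k]
    recall = len(hits) / len(ranks)                 # share of events hit in top-k
    mrr = sum(1.0 / r for r in hits) / len(ranks)   # misses contribute 0
    return recall, mrr

ranks = [1, 3, 12, 2, 25, 1, 7]  # hypothetical ranks for seven test events
for k in (1, 5, 10, 20):
    print(k, recall_mrr_at_k(ranks, k))
```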

Metrics (cross-entropy loss)

| Implementation | Variant | Recall@1 | MRR@1 | Recall@5 | MRR@5 | Recall@10 | MRR@10 | Recall@20 | MRR@20 |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 0.1108 | 0.1108 | 0.3000 | 0.1772 | 0.4126 | 0.1922 | 0.5291 | 0.2003 |
| GRU4Rec Official | Torch-GRU4Rec params | 0.0825 | 0.0825 | 0.2484 | 0.1401 | 0.3551 | 0.1543 | 0.4716 | 0.1624 |
| Torch-GRU4Rec | OOB | 0.0814 | 0.0814 | 0.2459 | 0.1383 | 0.3531 | 0.1525 | 0.4688 | 0.1606 |

Metric difference compared to the “Best params” variant with the same loss (BPR-max)

| Implementation | Variant | Recall@1 Diff | MRR@1 Diff | Recall@5 Diff | MRR@5 Diff | Recall@10 Diff | MRR@10 Diff | Recall@20 Diff | MRR@20 Diff |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | | | | | | | | |
| GRU4Rec Official | Torch-GRU4Rec params | -5.78% | -5.78% | -3.34% | -4.34% | -2.59% | -4.04% | -1.80% | -3.83% |
| Torch-GRU4Rec | OOB | -7.12% | -7.12% | -4.28% | -5.50% | -3.32% | -5.11% | -2.40% | -4.86% |
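The percentages in these tables are relative differences, (variant - best) / best * 100. The helper below recomputes two of them from the four-decimal values in the Rees46 BPR-max Metrics table. Because those inputs are rounded, the results can differ from the published percentages by a few hundredths of a percentage point; the published figures were presumably computed from unrounded metrics.

```python
def relative_diff(value, best):
    """Relative difference of `value` vs. the Best-params value `best`, in percent."""
    return (value - best) / best * 100.0

# Rees46, BPR-max loss, Recall@1 (rounded values copied from the Metrics table)
best, matching, oob = 0.1027, 0.0968, 0.0954
print(f"{relative_diff(matching, best):.2f}%")  # -5.74% here; the table reports -5.78%
print(f"{relative_diff(oob, best):.2f}%")       # -7.11% here; the table reports -7.12%
```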

Metric difference compared to the “Best params” variant with the same loss (cross-entropy)

| Implementation | Variant | Recall@1 Diff | MRR@1 Diff | Recall@5 Diff | MRR@5 Diff | Recall@10 Diff | MRR@10 Diff | Recall@20 Diff | MRR@20 Diff |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | | | | | | | | |
| GRU4Rec Official | Torch-GRU4Rec params | -25.58% | -25.58% | -17.21% | -20.94% | -13.94% | -19.73% | -10.87% | -18.94% |
| Torch-GRU4Rec | OOB | -26.58% | -26.58% | -18.05% | -21.96% | -14.42% | -20.63% | -11.40% | -19.83% |

Hyperparameters used in the experiment (BPR-max loss)

| Parameter | GRU4Rec Official (Best params) | GRU4Rec Official (Torch-GRU4Rec params) | Torch-GRU4Rec (OOB) |
|---|---|---|---|
| loss | bpr-max | bpr-max | bpr-max |
| optim | adagrad | adagrad | adagrad |
| constrained_embedding | True | False | False |
| embedding | 0 | 512 | 512 |
| final_act | elu-0.5 | elu-0.5 | elu-0.5 |
| layers | 512 | 512 | 512 |
| batch_size | 32 | 32 | 32 |
| dropout_p_embed | 0.1 | 0.1 | 0.1 |
| dropout_p_hidden | 0 | 0 | 0 |
| learning_rate | 0.03 | 0.03 | 0.03 |
| momentum | 0.55 | 0 | N/A |
| n_sample | 2048 | 2048 | 2048 |
| sample_alpha | 0.2 | 0.2 | 0.2 |
| bpreg | 0.75 | 0.75 | 0.75 |
| logq | 0 | 0 | N/A |
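The “Torch-GRU4Rec params” variant differs from “Best params” only in the settings Torch-GRU4Rec cannot reproduce. A small sketch that lists the differing keys of two configurations makes this explicit; the dictionaries simply transcribe the Rees46 BPR-max columns of the table above, and the `changed_keys` helper is hypothetical, not part of either code base.

```python
best_params = {
    "loss": "bpr-max", "optim": "adagrad", "constrained_embedding": True,
    "embedding": 0, "final_act": "elu-0.5", "layers": 512, "batch_size": 32,
    "dropout_p_embed": 0.1, "dropout_p_hidden": 0.0, "learning_rate": 0.03,
    "momentum": 0.55, "n_sample": 2048, "sample_alpha": 0.2, "bpreg": 0.75,
    "logq": 0.0,
}
# Same configuration, restricted to features that Torch-GRU4Rec also supports.
torch_gru4rec_params = {**best_params, "constrained_embedding": False,
                        "embedding": 512, "momentum": 0.0}

def changed_keys(a, b):
    """Return {key: (a_value, b_value)} for every key whose value differs."""
    return {k: (a[k], b[k]) for k in a if a[k] != b[k]}

for key, (old, new) in changed_keys(best_params, torch_gru4rec_params).items():
    print(f"{key}: {old} -> {new}")
```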

Hyperparameters used in the experiment (cross-entropy loss)

| Parameter | GRU4Rec Official (Best params) | GRU4Rec Official (Torch-GRU4Rec params) | Torch-GRU4Rec (OOB) |
|---|---|---|---|
| loss | cross-entropy | cross-entropy | cross-entropy |
| optim | adagrad | adagrad | adagrad |
| constrained_embedding | True | False | False |
| embedding | 0 | 512 | 512 |
| final_act | softmax | softmax | softmax |
| layers | 512 | 512 | 512 |
| batch_size | 240 | 240 | 240 |
| dropout_p_embed | 0.45 | 0.45 | 0.45 |
| dropout_p_hidden | 0 | 0 | 0 |
| learning_rate | 0.065 | 0.065 | 0.065 |
| momentum | 0 | 0 | N/A |
| n_sample | 2048 | 2048 | 2048 |
| sample_alpha | 0.5 | 0.5 | 0.5 |
| bpreg | 0 | 0 | 0 |
| logq | 1 | 0 | N/A |
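In the cross-entropy setups the official “Best params” configuration uses `logq=1`, an option Torch-GRU4Rec does not have. As commonly described, logQ correction subtracts the log of each item's sampling probability from its score before the softmax, which compensates for the bias that popularity-based sampling introduces into a sampled softmax; with `sample_alpha`, the sampling probability is assumed here to be proportional to item support raised to that power. The sketch follows that textbook formulation with made-up supports and scores; the official code may differ in details such as how in-batch targets are treated.

```python
import math

def sampling_probs(supports, alpha):
    """Popularity-based sampling: p_i proportional to support_i ** alpha."""
    weights = [s ** alpha for s in supports]
    total = sum(weights)
    return [w / total for w in weights]

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def logq_corrected(scores, probs, logq=1.0):
    """logQ correction: subtract logq * log(p_i) from each sampled item's score."""
    return [s - logq * math.log(p) for s, p in zip(scores, probs)]

supports = [1000, 100, 10, 1]                  # hypothetical item popularity counts
probs = sampling_probs(supports, alpha=0.5)    # sample_alpha=0.5 as in the table
scores = [2.0, 2.0, 2.0, 2.0]                  # equal raw scores, for illustration

print(softmax(scores))                         # uniform over the four items
print(softmax(logq_corrected(scores, probs)))  # popular items are down-weighted
```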

Runtime metrics (BPR-max loss)

| Implementation | Variant | Avg. epoch time (s) | Epoch time vs. Best params | Epoch time vs. Torch-GRU4Rec params | Avg. minibatches/s | Avg. events/s |
|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 1956.80 | | | 916.45 | 29326.00 |
| GRU4Rec Official | Torch-GRU4Rec params | 1816.76 | 0.93 x | | 987.09 | 31587.00 |
| Torch-GRU4Rec | OOB | 30689.54 | 15.68 x | 16.89 x | 58.43 | 1869.86 |
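The derived runtime columns can be cross-checked from the raw numbers. Judging by the values, the “x” columns are ratios of average epoch times, and events per second is approximately minibatches per second multiplied by the batch size (32 for this configuration); these semantics are inferred from the numbers rather than documented. A quick check with the Rees46 BPR-max figures:

```python
best_epoch, matching_epoch, oob_epoch = 1956.80, 1816.76, 30689.54
batch_size, best_mb_per_s, best_e_per_s = 32, 916.45, 29326.00

print(f"{matching_epoch / best_epoch:.2f} x")  # Torch-GRU4Rec params vs. Best params
print(f"{oob_epoch / best_epoch:.2f} x")       # OOB vs. Best params
print(f"{oob_epoch / matching_epoch:.2f} x")   # OOB vs. Torch-GRU4Rec params
# Minibatches/s times batch size should roughly reproduce events/s.
print(round(best_mb_per_s * batch_size, 1), best_e_per_s)
```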

Runtime metrics (cross-entropy loss)

| Implementation | Variant | Avg. epoch time (s) | Epoch time vs. Best params | Epoch time vs. Torch-GRU4Rec params | Avg. minibatches/s | Avg. events/s |
|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 367.41 | | | 650.91 | 156189.00 |
| GRU4Rec Official | Torch-GRU4Rec params | 381.71 | 1.04 x | | 626.53 | 150339.00 |
| Torch-GRU4Rec | OOB | 7192.88 | 19.58 x | 18.84 x | 33.25 | 7978.05 |

Yoochoose

Metrics (BPR-max loss)

| Implementation | Variant | Recall@1 | MRR@1 | Recall@5 | MRR@5 | Recall@10 | MRR@10 | Recall@20 | MRR@20 |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 0.1745 | 0.1745 | 0.4346 | 0.2675 | 0.5664 | 0.2851 | 0.6799 | 0.2931 |
| GRU4Rec Official | Torch-GRU4Rec params | 0.1748 | 0.1748 | 0.4298 | 0.2662 | 0.5603 | 0.2837 | 0.6769 | 0.2919 |
| Torch-GRU4Rec | OOB | 0.1755 | 0.1755 | 0.4271 | 0.2654 | 0.5560 | 0.2826 | 0.6711 | 0.2907 |

Metrics (cross-entropy loss)

| Implementation | Variant | Recall@1 | MRR@1 | Recall@5 | MRR@5 | Recall@10 | MRR@10 | Recall@20 | MRR@20 |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 0.1797 | 0.1797 | 0.4457 | 0.2757 | 0.5698 | 0.2924 | 0.6804 | 0.3002 |
| GRU4Rec Official | Torch-GRU4Rec params | 0.1710 | 0.1710 | 0.4301 | 0.2633 | 0.5571 | 0.2803 | 0.6690 | 0.2882 |
| Torch-GRU4Rec | OOB | 0.1686 | 0.1686 | 0.4268 | 0.2598 | 0.5528 | 0.2768 | 0.6671 | 0.2847 |

Metric difference compared to the “Best params” variant with the same loss (BPR-max)

| Implementation | Variant | Recall@1 Diff | MRR@1 Diff | Recall@5 Diff | MRR@5 Diff | Recall@10 Diff | MRR@10 Diff | Recall@20 Diff | MRR@20 Diff |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | | | | | | | | |
| GRU4Rec Official | Torch-GRU4Rec params | 0.14% | 0.14% | -1.09% | -0.49% | -1.07% | -0.48% | -0.45% | -0.39% |
| Torch-GRU4Rec | OOB | 0.53% | 0.53% | -1.72% | -0.77% | -1.84% | -0.89% | -1.30% | -0.82% |

Metric difference compared to the “Best params” variant with the same loss (cross-entropy)

| Implementation | Variant | Recall@1 Diff | MRR@1 Diff | Recall@5 Diff | MRR@5 Diff | Recall@10 Diff | MRR@10 Diff | Recall@20 Diff | MRR@20 Diff |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | | | | | | | | |
| GRU4Rec Official | Torch-GRU4Rec params | -4.81% | -4.81% | -3.50% | -4.52% | -2.23% | -4.14% | -1.67% | -4.00% |
| Torch-GRU4Rec | OOB | -6.14% | -6.14% | -4.25% | -5.77% | -2.99% | -5.35% | -1.95% | -5.14% |

Hyperparameters used in the experiment (BPR-max loss)

| Parameter | GRU4Rec Official (Best params) | GRU4Rec Official (Torch-GRU4Rec params) | Torch-GRU4Rec (OOB) |
|---|---|---|---|
| loss | bpr-max | bpr-max | bpr-max |
| optim | adagrad | adagrad | adagrad |
| constrained_embedding | True | False | False |
| embedding | 0 | 448 | 448 |
| final_act | linear | linear | linear |
| layers | 448 | 448 | 448 |
| batch_size | 48 | 48 | 48 |
| dropout_p_embed | 0.25 | 0.25 | 0.25 |
| dropout_p_hidden | 0 | 0 | 0 |
| learning_rate | 0.075 | 0.075 | 0.075 |
| momentum | 0.1 | 0 | N/A |
| n_sample | 2048 | 2048 | 2048 |
| sample_alpha | 0.2 | 0.2 | 0.2 |
| bpreg | 0.5 | 0.5 | 0.5 |
| logq | 0 | 0 | N/A |
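Momentum is one of the features Torch-GRU4Rec lacks (N/A in the OOB column), while the official BPR-max configuration here uses 0.1 on top of Adagrad. One common way to combine Adagrad with classical momentum is sketched below: the gradient is first scaled by the Adagrad accumulator and then fed into a velocity update. This is an assumed formulation for illustration, not a transcription of the official code; the function name and `eps` value are hypothetical.

```python
import math

def adagrad_momentum_step(param, grad, acc, vel, lr=0.075, momentum=0.1, eps=1e-6):
    """One update for a list of scalar parameters.

    acc: running sum of squared gradients (Adagrad accumulator)
    vel: velocity buffer for the momentum term
    Returns the new (param, acc, vel) lists.
    """
    new_acc = [a + g * g for a, g in zip(acc, grad)]
    scaled = [g / math.sqrt(a + eps) for g, a in zip(grad, new_acc)]  # Adagrad scaling
    new_vel = [momentum * v - lr * s for v, s in zip(vel, scaled)]     # classical momentum
    new_param = [p + v for p, v in zip(param, new_vel)]
    return new_param, new_acc, new_vel

# Minimise f(x) = x^2 starting from x = 1.0 (gradient 2x).
x, acc, vel = [1.0], [0.0], [0.0]
for _ in range(200):
    x, acc, vel = adagrad_momentum_step(x, [2 * x[0]], acc, vel)
print(x[0])
```

With `momentum=0` the update reduces to plain Adagrad, which is the only option available in Torch-GRU4Rec.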

Hyperparameters used in the experiment (cross-entropy loss)

| Parameter | GRU4Rec Official (Best params) | GRU4Rec Official (Torch-GRU4Rec params) | Torch-GRU4Rec (OOB) |
|---|---|---|---|
| loss | cross-entropy | cross-entropy | cross-entropy |
| optim | adagrad | adagrad | adagrad |
| constrained_embedding | True | False | False |
| embedding | 0 | 480 | 480 |
| final_act | softmax | softmax | softmax |
| layers | 480 | 480 | 480 |
| batch_size | 48 | 48 | 48 |
| dropout_p_embed | 0 | 0 | 0 |
| dropout_p_hidden | 0.2 | 0.2 | 0.2 |
| learning_rate | 0.07 | 0.07 | 0.07 |
| momentum | 0 | 0 | N/A |
| n_sample | 2048 | 2048 | 2048 |
| sample_alpha | 0.2 | 0.2 | 0.2 |
| bpreg | 0 | 0 | 0 |
| logq | 1 | 0 | N/A |

Runtime metrics (BPR-max loss)

| Implementation | Variant | Avg. epoch time (s) | Epoch time vs. Best params | Epoch time vs. Torch-GRU4Rec params | Avg. minibatches/s | Avg. events/s |
|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 487.51 | | | 919.23 | 44121.00 |
| GRU4Rec Official | Torch-GRU4Rec params | 461.22 | 0.95 x | | 971.64 | 46636.00 |
| Torch-GRU4Rec | OOB | 2164.21 | 4.44 x | 4.69 x | 207.09 | 9939.82 |

Runtime metrics (cross-entropy loss)

| Implementation | Variant | Avg. epoch time (s) | Epoch time vs. Best params | Epoch time vs. Torch-GRU4Rec params | Avg. minibatches/s | Avg. events/s |
|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 451.75 | | | 991.99 | 47613.00 |
| GRU4Rec Official | Torch-GRU4Rec params | 439.19 | 0.97 x | | 1020.37 | 48975.00 |
| Torch-GRU4Rec | OOB | 2082.11 | 4.61 x | 4.74 x | 215.24 | 10330.85 |

Coveo

Metrics (BPR-max loss)

| Implementation | Variant | Recall@1 | MRR@1 | Recall@5 | MRR@5 | Recall@10 | MRR@10 | Recall@20 | MRR@20 |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 0.0501 | 0.0501 | 0.1464 | 0.0835 | 0.2172 | 0.0928 | 0.3123 | 0.0994 |
| GRU4Rec Official | Torch-GRU4Rec params | 0.0484 | 0.0484 | 0.1430 | 0.0811 | 0.2107 | 0.0901 | 0.2995 | 0.0962 |
| Torch-GRU4Rec | OOB | 0.0479 | 0.0479 | 0.1400 | 0.0794 | 0.2067 | 0.0883 | 0.2960 | 0.0944 |

Metrics (cross-entropy loss)

| Implementation | Variant | Recall@1 | MRR@1 | Recall@5 | MRR@5 | Recall@10 | MRR@10 | Recall@20 | MRR@20 |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 0.0489 | 0.0489 | 0.1418 | 0.0814 | 0.2085 | 0.0901 | 0.2947 | 0.0960 |
| GRU4Rec Official | Torch-GRU4Rec params | 0.0473 | 0.0473 | 0.1379 | 0.0786 | 0.2015 | 0.0871 | 0.2848 | 0.0928 |
| Torch-GRU4Rec | OOB | 0.0445 | 0.0445 | 0.1343 | 0.0755 | 0.1949 | 0.0835 | 0.2835 | 0.0897 |

Metric difference compared to the “Best params” variant with the same loss (BPR-max)

| Implementation | Variant | Recall@1 Diff | MRR@1 Diff | Recall@5 Diff | MRR@5 Diff | Recall@10 Diff | MRR@10 Diff | Recall@20 Diff | MRR@20 Diff |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | | | | | | | | |
| GRU4Rec Official | Torch-GRU4Rec params | -3.52% | -3.52% | -2.27% | -2.86% | -2.98% | -2.99% | -4.09% | -3.21% |
| Torch-GRU4Rec | OOB | -4.43% | -4.43% | -4.35% | -4.88% | -4.84% | -4.92% | -5.23% | -5.01% |

Metric difference compared to the “Best params” variant with the same loss (cross-entropy)

| Implementation | Variant | Recall@1 Diff | MRR@1 Diff | Recall@5 Diff | MRR@5 Diff | Recall@10 Diff | MRR@10 Diff | Recall@20 Diff | MRR@20 Diff |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | | | | | | | | |
| GRU4Rec Official | Torch-GRU4Rec params | -3.29% | -3.29% | -2.74% | -3.34% | -3.35% | -3.38% | -3.37% | -3.37% |
| Torch-GRU4Rec | OOB | -8.98% | -8.98% | -5.30% | -7.14% | -6.55% | -7.33% | -3.80% | -6.61% |

Hyperparameters used in the experiment (BPR-max loss)

| Parameter | GRU4Rec Official (Best params) | GRU4Rec Official (Torch-GRU4Rec params) | Torch-GRU4Rec (OOB) |
|---|---|---|---|
| loss | bpr-max | bpr-max | bpr-max |
| optim | adagrad | adagrad | adagrad |
| constrained_embedding | True | False | False |
| embedding | 0 | 512 | 512 |
| final_act | elu-1 | elu-1 | elu-1 |
| layers | 512 | 512 | 512 |
| batch_size | 144 | 144 | 144 |
| dropout_p_embed | 0.35 | 0.35 | 0.35 |
| dropout_p_hidden | 0 | 0 | 0 |
| learning_rate | 0.05 | 0.05 | 0.05 |
| momentum | 0.4 | 0 | N/A |
| n_sample | 2048 | 2048 | 2048 |
| sample_alpha | 0.2 | 0.2 | 0.2 |
| bpreg | 1.85 | 1.85 | 1.85 |
| logq | 0 | 0 | N/A |
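In the BPR-max configurations, `bpreg` weights the score-regularization term of the BPR-max loss. Following the formulation from the GRU4Rec “top-k gains” work as we understand it, the loss for one target with score r_i and sampled negatives with scores r_j is -log(sum_j s_j * sigmoid(r_i - r_j)) + bpreg * sum_j s_j * r_j^2, where s_j is the softmax of the negative scores. A minimal single-example sketch in pure Python (real implementations compute this for a whole minibatch at once):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_max_loss(target_score, negative_scores, bpreg):
    """BPR-max loss for one target item and its sampled negatives."""
    m = max(negative_scores)
    exps = [math.exp(r - m) for r in negative_scores]
    z = sum(exps)
    s = [e / z for e in exps]  # softmax weights over the negatives
    ranking = -math.log(sum(sj * sigmoid(target_score - rj)
                            for sj, rj in zip(s, negative_scores)))
    reg = bpreg * sum(sj * rj * rj for sj, rj in zip(s, negative_scores))
    return ranking + reg

negatives = [0.5, -1.0, 2.0, 0.0]
print(bpr_max_loss(3.0, negatives, bpreg=1.85))   # target above every negative
print(bpr_max_loss(-1.0, negatives, bpreg=1.85))  # target below the hardest negative
```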

Hyperparameters used in the experiment (cross-entropy loss)

| Parameter | GRU4Rec Official (Best params) | GRU4Rec Official (Torch-GRU4Rec params) | Torch-GRU4Rec (OOB) |
|---|---|---|---|
| loss | cross-entropy | cross-entropy | cross-entropy |
| optim | adagrad | adagrad | adagrad |
| constrained_embedding | True | False | False |
| embedding | 0 | 512 | 512 |
| final_act | softmax | softmax | softmax |
| layers | 512 | 512 | 512 |
| batch_size | 32 | 32 | 32 |
| dropout_p_embed | 0.4 | 0.4 | 0.4 |
| dropout_p_hidden | 0.15 | 0.15 | 0.15 |
| learning_rate | 0.03 | 0.03 | 0.03 |
| momentum | 0 | 0 | N/A |
| n_sample | 2048 | 2048 | 2048 |
| sample_alpha | 0 | 0 | 0 |
| bpreg | 0 | 0 | 0 |
| logq | 1 | 0 | N/A |

Runtime metrics (BPR-max loss)

| Implementation | Variant | Avg. epoch time (s) | Epoch time vs. Best params | Epoch time vs. Torch-GRU4Rec params | Avg. minibatches/s | Avg. events/s |
|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 12.38 | | | 704.14 | 100615.00 |
| GRU4Rec Official | Torch-GRU4Rec params | 12.23 | 0.99 x | | 712.45 | 101803.00 |
| Torch-GRU4Rec | OOB | 32.18 | 2.60 x | 2.63 x | 270.97 | 38719.00 |

Runtime metrics (cross-entropy loss)

| Implementation | Variant | Avg. epoch time (s) | Epoch time vs. Best params | Epoch time vs. Torch-GRU4Rec params | Avg. minibatches/s | Avg. events/s |
|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 37.17 | | | 1047.47 | 33505.00 |
| GRU4Rec Official | Torch-GRU4Rec params | 36.81 | 0.99 x | | 1057.66 | 33830.00 |
| Torch-GRU4Rec | OOB | 91.28 | 2.46 x | 2.48 x | 426.78 | 13650.91 |

Retailrocket

Metrics (BPR-max loss)

| Implementation | Variant | Recall@1 | MRR@1 | Recall@5 | MRR@5 | Recall@10 | MRR@10 | Recall@20 | MRR@20 |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 0.1224 | 0.1224 | 0.3196 | 0.1928 | 0.4181 | 0.2060 | 0.5187 | 0.2131 |
| GRU4Rec Official | Torch-GRU4Rec params | 0.0979 | 0.0979 | 0.2672 | 0.1576 | 0.3631 | 0.1703 | 0.4639 | 0.1773 |
| Torch-GRU4Rec | OOB | 0.0926 | 0.0926 | 0.2413 | 0.1451 | 0.3336 | 0.1574 | 0.4201 | 0.1633 |

Metrics (cross-entropy loss)

| Implementation | Variant | Recall@1 | MRR@1 | Recall@5 | MRR@5 | Recall@10 | MRR@10 | Recall@20 | MRR@20 |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 0.1150 | 0.1150 | 0.3026 | 0.1814 | 0.4019 | 0.1947 | 0.4953 | 0.2012 |
| GRU4Rec Official | Torch-GRU4Rec params | 0.0907 | 0.0907 | 0.2462 | 0.1456 | 0.3331 | 0.1571 | 0.4249 | 0.1635 |
| Torch-GRU4Rec | OOB | 0.0806 | 0.0806 | 0.2206 | 0.1300 | 0.2994 | 0.1405 | 0.3834 | 0.1464 |

Metric difference compared to the “Best params” variant with the same loss (BPR-max)

| Implementation | Variant | Recall@1 Diff | MRR@1 Diff | Recall@5 Diff | MRR@5 Diff | Recall@10 Diff | MRR@10 Diff | Recall@20 Diff | MRR@20 Diff |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | | | | | | | | |
| GRU4Rec Official | Torch-GRU4Rec params | -20.04% | -20.04% | -16.41% | -18.23% | -13.15% | -17.33% | -10.56% | -16.77% |
| Torch-GRU4Rec | OOB | -24.35% | -24.35% | -24.51% | -24.70% | -20.22% | -23.61% | -19.00% | -23.35% |

Metric difference compared to the “Best params” variant with the same loss (cross-entropy)

| Implementation | Variant | Recall@1 Diff | MRR@1 Diff | Recall@5 Diff | MRR@5 Diff | Recall@10 Diff | MRR@10 Diff | Recall@20 Diff | MRR@20 Diff |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | | | | | | | | |
| GRU4Rec Official | Torch-GRU4Rec params | -21.12% | -21.12% | -18.63% | -19.73% | -17.11% | -19.29% | -14.20% | -18.73% |
| Torch-GRU4Rec | OOB | -29.89% | -29.89% | -27.10% | -28.36% | -25.50% | -27.83% | -22.60% | -27.27% |

Hyperparameters used in the experiment (BPR-max loss)

| Parameter | GRU4Rec Official (Best params) | GRU4Rec Official (Torch-GRU4Rec params) | Torch-GRU4Rec (OOB) |
|---|---|---|---|
| loss | bpr-max | bpr-max | bpr-max |
| optim | adagrad | adagrad | adagrad |
| constrained_embedding | True | False | False |
| embedding | 0 | 224 | 224 |
| final_act | elu-0.5 | elu-0.5 | elu-0.5 |
| layers | 224 | 224 | 224 |
| batch_size | 80 | 80 | 80 |
| dropout_p_embed | 0.5 | 0.5 | 0.5 |
| dropout_p_hidden | 0.05 | 0.05 | 0.05 |
| learning_rate | 0.05 | 0.05 | 0.05 |
| momentum | 0.4 | 0 | N/A |
| n_sample | 2048 | 2048 | 2048 |
| sample_alpha | 0.4 | 0.4 | 0.4 |
| bpreg | 1.95 | 1.95 | 1.95 |
| logq | 0 | 0 | N/A |
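`constrained_embedding=True` together with `embedding=0` is understood here to mean that there is no separate input embedding: the same item vectors embed the input item and score the candidate items, so their size equals the hidden size (224 here). The “Torch-GRU4Rec params” variant, which Torch-GRU4Rec can reproduce, instead uses a separate 224-dimensional input embedding. A toy sketch of the difference; the GRU is replaced by a placeholder and all names are hypothetical.

```python
import math
import random

random.seed(1)
n_items, hidden = 5, 4  # toy sizes (the table uses layers=224)

def rand_matrix(rows, cols):
    """Small random matrix as a list of row vectors."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Item vectors of the output layer, used to score every candidate item.
output_items = rand_matrix(n_items, hidden)

def toy_cell(x):
    """Placeholder for a GRU step (elementwise tanh), NOT a real GRU."""
    return [math.tanh(v) for v in x]

def next_item_scores(prev_item, input_table):
    """Embed the previous item with `input_table`, then score all items."""
    h = toy_cell(input_table[prev_item])
    return [sum(a * b for a, b in zip(h, w)) for w in output_items]

# constrained_embedding=True: the output item vectors double as input embeddings.
tied_scores = next_item_scores(2, output_items)

# constrained_embedding=False: a separate input embedding table with its own weights.
input_items = rand_matrix(n_items, hidden)
untied_scores = next_item_scores(2, input_items)

# Item-vector parameters only (recurrent-layer weights are not counted).
tied_params = n_items * hidden
untied_params = 2 * n_items * hidden
print(tied_params, untied_params)
```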

Hyperparameters used in the experiment (cross-entropy loss)

| Parameter | GRU4Rec Official (Best params) | GRU4Rec Official (Torch-GRU4Rec params) | Torch-GRU4Rec (OOB) |
|---|---|---|---|
| loss | cross-entropy | cross-entropy | cross-entropy |
| optim | adagrad | adagrad | adagrad |
| constrained_embedding | True | False | False |
| embedding | 0 | 192 | 192 |
| final_act | softmax | softmax | softmax |
| layers | 192 | 192 | 192 |
| batch_size | 240 | 240 | 240 |
| dropout_p_embed | 0.5 | 0.5 | 0.5 |
| dropout_p_hidden | 0.05 | 0.05 | 0.05 |
| learning_rate | 0.085 | 0.085 | 0.085 |
| momentum | 0.3 | 0 | N/A |
| n_sample | 2048 | 2048 | 2048 |
| sample_alpha | 0.3 | 0.3 | 0.3 |
| bpreg | 0 | 0 | 0 |
| logq | 1 | 0 | N/A |

Runtime metrics (BPR-max loss)

| Implementation | Variant | Avg. epoch time (s) | Epoch time vs. Best params | Epoch time vs. Torch-GRU4Rec params | Avg. minibatches/s | Avg. events/s |
|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 6.86 | | | 1019.34 | 80807.00 |
| GRU4Rec Official | Torch-GRU4Rec params | 6.53 | 0.95 x | | 1071.09 | 84909.00 |
| Torch-GRU4Rec | OOB | 26.39 | 3.85 x | 4.04 x | 265.19 | 21022.30 |

Runtime metrics (cross-entropy loss)

| Implementation | Variant | Avg. epoch time (s) | Epoch time vs. Best params | Epoch time vs. Torch-GRU4Rec params | Avg. minibatches/s | Avg. events/s |
|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 2.77 | | | 880.71 | 199935.00 |
| GRU4Rec Official | Torch-GRU4Rec params | 2.45 | 0.88 x | | 996.78 | 226283.00 |
| Torch-GRU4Rec | OOB | 10.67 | 3.85 x | 4.36 x | 229.70 | 52144.58 |

Diginetica

Metrics (BPR-max loss)

| Implementation | Variant | Recall@1 | MRR@1 | Recall@5 | MRR@5 | Recall@10 | MRR@10 | Recall@20 | MRR@20 |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 0.0688 | 0.0688 | 0.2304 | 0.1237 | 0.3533 | 0.1399 | 0.4995 | 0.1500 |
| GRU4Rec Official | Torch-GRU4Rec params | 0.0643 | 0.0643 | 0.2113 | 0.1143 | 0.3204 | 0.1287 | 0.4597 | 0.1383 |
| Torch-GRU4Rec | OOB | 0.0636 | 0.0636 | 0.2110 | 0.1137 | 0.3185 | 0.1278 | 0.4550 | 0.1372 |

Metrics (cross-entropy loss)

| Implementation | Variant | Recall@1 | MRR@1 | Recall@5 | MRR@5 | Recall@10 | MRR@10 | Recall@20 | MRR@20 |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 0.0647 | 0.0647 | 0.2220 | 0.1181 | 0.3414 | 0.1339 | 0.4874 | 0.1440 |
| GRU4Rec Official | Torch-GRU4Rec params | 0.0552 | 0.0552 | 0.1927 | 0.1020 | 0.2967 | 0.1157 | 0.4255 | 0.1246 |
| Torch-GRU4Rec | OOB | 0.0541 | 0.0541 | 0.1894 | 0.1002 | 0.2921 | 0.1137 | 0.4245 | 0.1229 |

Metric difference compared to the “Best params” variant with the same loss (BPR-max)

| Implementation | Variant | Recall@1 Diff | MRR@1 Diff | Recall@5 Diff | MRR@5 Diff | Recall@10 Diff | MRR@10 Diff | Recall@20 Diff | MRR@20 Diff |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | | | | | | | | |
| GRU4Rec Official | Torch-GRU4Rec params | -6.58% | -6.58% | -8.30% | -7.53% | -9.32% | -7.97% | -7.97% | -7.76% |
| Torch-GRU4Rec | OOB | -7.64% | -7.64% | -8.42% | -8.08% | -9.85% | -8.63% | -8.92% | -8.54% |

Metric difference compared to the “Best params” variant with the same loss (cross-entropy)

| Implementation | Variant | Recall@1 Diff | MRR@1 Diff | Recall@5 Diff | MRR@5 Diff | Recall@10 Diff | MRR@10 Diff | Recall@20 Diff | MRR@20 Diff |
|---|---|---|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | | | | | | | | |
| GRU4Rec Official | Torch-GRU4Rec params | -14.78% | -14.78% | -13.17% | -13.69% | -13.08% | -13.55% | -12.70% | -13.47% |
| Torch-GRU4Rec | OOB | -16.45% | -16.45% | -14.67% | -15.19% | -14.44% | -15.04% | -12.90% | -14.71% |

Hyperparameters used in the experiment (BPR-max loss)

| Parameter | GRU4Rec Official (Best params) | GRU4Rec Official (Torch-GRU4Rec params) | Torch-GRU4Rec (OOB) |
|---|---|---|---|
| loss | bpr-max | bpr-max | bpr-max |
| optim | adagrad | adagrad | adagrad |
| constrained_embedding | True | False | False |
| embedding | 0 | 512 | 512 |
| final_act | elu-1 | elu-1 | elu-1 |
| layers | 512 | 512 | 512 |
| batch_size | 128 | 128 | 128 |
| dropout_p_embed | 0.5 | 0.5 | 0.5 |
| dropout_p_hidden | 0.3 | 0.3 | 0.3 |
| learning_rate | 0.05 | 0.05 | 0.05 |
| momentum | 0.15 | 0 | N/A |
| n_sample | 2048 | 2048 | 2048 |
| sample_alpha | 0.3 | 0.3 | 0.3 |
| bpreg | 0.9 | 0.9 | 0.9 |
| logq | 0 | 0 | N/A |
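The `final_act` values `elu-1` and `elu-0.5` in the BPR-max configurations are read here as an ELU applied to the output scores with alpha = 1 and alpha = 0.5 respectively (an interpretation of the parameter naming, not something this page states), whereas the cross-entropy configurations use a softmax output. ELU keeps positive scores unchanged and squashes negative scores towards -alpha. The `parse_final_act` helper below is hypothetical.

```python
import math

def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def parse_final_act(name):
    """Hypothetical parser for strings such as 'elu-0.5' or 'elu-1'."""
    kind, _, param = name.partition("-")
    if kind != "elu":
        raise ValueError(f"unsupported final_act: {name}")
    alpha = float(param) if param else 1.0
    return lambda x: elu(x, alpha)

act = parse_final_act("elu-0.5")
print([round(act(s), 4) for s in (-10.0, -1.0, 0.0, 2.5)])
```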

Hyperparameters used in the experiment (cross-entropy loss)

| Parameter | GRU4Rec Official (Best params) | GRU4Rec Official (Torch-GRU4Rec params) | Torch-GRU4Rec (OOB) |
|---|---|---|---|
| loss | cross-entropy | cross-entropy | cross-entropy |
| optim | adagrad | adagrad | adagrad |
| constrained_embedding | True | False | False |
| embedding | 0 | 192 | 192 |
| final_act | softmax | softmax | softmax |
| layers | 192 | 192 | 192 |
| batch_size | 128 | 128 | 128 |
| dropout_p_embed | 0.45 | 0.45 | 0.45 |
| dropout_p_hidden | 0.15 | 0.15 | 0.15 |
| learning_rate | 0.1 | 0.1 | 0.1 |
| momentum | 0 | 0 | N/A |
| n_sample | 2048 | 2048 | 2048 |
| sample_alpha | 0 | 0 | 0 |
| bpreg | 0 | 0 | 0 |
| logq | 1 | 0 | N/A |

Runtime metrics (BPR-max loss)

| Implementation | Variant | Avg. epoch time (s) | Epoch time vs. Best params | Epoch time vs. Torch-GRU4Rec params | Avg. minibatches/s | Avg. events/s |
|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 8.02 | | | 639.87 | 81757.00 |
| GRU4Rec Official | Torch-GRU4Rec params | 7.71 | 0.96 x | | 665.42 | 85021.00 |
| Torch-GRU4Rec | OOB | 36.89 | 4.60 x | 4.78 x | 139.19 | 17783.97 |

Runtime metrics (cross-entropy loss)

| Implementation | Variant | Avg. epoch time (s) | Epoch time vs. Best params | Epoch time vs. Torch-GRU4Rec params | Avg. minibatches/s | Avg. events/s |
|---|---|---|---|---|---|---|
| GRU4Rec Official | Best params | 4.52 | | | 1134.52 | 144959.00 |
| GRU4Rec Official | Torch-GRU4Rec params | 4.48 | 0.99 x | | 1146.91 | 146542.00 |
| Torch-GRU4Rec | OOB | 17.74 | 3.92 x | 3.96 x | 289.38 | 36974.80 |