%%% -*-LaTeX-*-
%%% cardshuffling.tex.orig
%%% Prettyprinted by texpretty lex version 0.02 [21-May-2001]
%%% on Wed Sep 16 08:38:45 2020
%%% for Steve Dunbar (sdunbar@family-desktop)
\documentclass[12pt]{article}
\input{../../../../etc/macros}
\input{../../../../etc/mzlatex_macros}
%% \input{../../../../etc/pdf_macros}
\bibliographystyle{plain}
\begin{document}
\myheader \mytitle
\hr
\sectiontitle{Card Shuffling as a Markov Chain}
\hr
\usefirefox
\hr
% \visual{Study Tip}{../../../../CommonInformation/Lessons/studytip.png}
% \section*{Study Tip}
% \hr
\visual{Rating}{../../../../CommonInformation/Lessons/rating.png}
\section*{Rating} %one of
% Everyone: contains no mathematics.
% Student: contains scenes of mild algebra or calculus that may require guidance.
Mathematically Mature: may contain mathematics beyond calculus with
proofs. % Mathematicians Only: prolonged scenes of intense rigor.
\hr
\visual{Section Starter Question}{../../../../CommonInformation/Lessons/question_mark.png}
\section*{Section Starter Question}
Why shuffle a deck of cards? What kind of shuffle do you use? How many
shuffles are sufficient to achieve the purpose of shuffling?
\hr
\visual{Key Concepts}{../../../../CommonInformation/Lessons/keyconcepts.png}
\section*{Key Concepts}
\begin{enumerate}
\item
Card deck shuffles are a family of possible re-orderings with
probability distributions, leading to transition probabilities,
and thus Markov processes. The most well-studied type of
shuffle is the riffle shuffle and that is the main focus here.
\item
Going from card order \( \pi \) to \( \tau \) is the same as
composing \( \pi \) with the permutation \( \pi^{-1} \circ \tau \).
Now identify shuffles as functions on \( \set{1, \dots, n} \) to \(
\set{1, \dots, n} \), that is, permutations.
Since a particular shuffle is one of a whole family of shuffles,
chosen with a probability distribution \( Q \) from the family,
the transition probabilities are
\[
p_{\pi \tau} = \Prob{X_t = \tau \given X_{t-1} = \pi}
= Q(\pi^{-1} \circ \tau).
\]
\item
The identification of shuffles or operations with permutations
gives a probability distribution on \( S_n \).
\item
A \defn{Top-to-Random Shuffle},%
\index{top-to-random-shuffle}
takes the top card from a stack of \( n \) cards and inserts it
in the gap between the \( (k-1) \)th card and the \( k \)th card
in the deck.
\item
The Top-To-Random-Shuffle demonstrates the cut-off phenomenon
for the Total Variation distance of the Markov chain
distribution from the uniform distribution as a function of the
number of steps.
\item
One realistic model of shuffling a deck of cards is the \defn{riffle
shuffle}.
\item
The set of cuts and interleavings in a riffle shuffle induces in
a natural way a density on the set of permutations. Call this a
\defn{riffle shuffle} and denote it by \( R \). That is, \( R(\pi)
\) is the sum of probabilities of each cut and interleaving that
gives the rearrangement of the deck corresponding to \( \pi \).
\item
\( 7 \) shuffles of the 3-card deck get very close to the
uniform density, which turns out to be the stationary density.
\item
The probability of achieving a permutation \( \pi \) when doing
an \( a \)-shuffle is
\[
\frac{1}{a^n} \binom{n + a - r}{n},
\] where \( r \) is the number of rising sequences in \( \pi \).
\item
The eigenvalues of the transition probability matrix for a
riffle shuffle are \( 1 \), \( \frac{1}{2} \), \( \frac{1}{4} \),
\( \dots \), \( \frac{1}{2^{n-1}} \). The second largest eigenvalue
determines the rate of convergence to the stationary
distribution. For riffle shuffling, this eigenvalue is \( \frac
{1}{2} \).
\item
For a finite, irreducible, aperiodic Markov chain \( Y_t \)
distributed as \( Q^t \) at time \( t \) and with stationary
distribution \( \pi \), if \( \tau \) is a strong stationary
time, then
\[
\| Q^{t} - \pi \|_{TV} \le \Prob{\tau \ge t}.
\]
\item
Set \( d_n(t) = \| P^{t} - U \|_{TV} \) for the Top-to-Random Shuffle on \( n \) cards. Then
for \( \epsilon > 0 \),
\begin{enumerate}
\item
\( d_{n}(n \log n + n \log \epsilon^{-1} )\le \epsilon \)
for \( n \) sufficiently large.
\item
\( d_{n}(n \log n - n \log (C \epsilon^{-1})) \ge 1-\epsilon
\) for \( n \) sufficiently large.
\end{enumerate}
\end{enumerate}
\hr
\visual{Vocabulary}{../../../../CommonInformation/Lessons/vocabulary.png}
\section*{Vocabulary}
\begin{enumerate}
\item
A \defn{Top-to-Random Shuffle},%
\index{top-to-random-shuffle}
takes the top card from a stack of \( n \) cards and inserts it
in the gap between the \( (k-1) \)th card and the \( k \)th card
in the deck.
\item
The \defn{total variation distance} of \( \mu \) from \( \nu \)
is%
\index{total variation distance}
\[
\| \mu - \nu \|_{TV} = \max_{A \subset \Omega} \abs{ \mu(A)
- \nu(A)} = \frac{1}{2} \sum\limits_{x \in \Omega} \abs{ \mu
(x) - \nu(x)}.
\]
\item
The random time \( \tau_{\text{top}} \) is a \defn{strong stationary time}%
\index{strong stationary time}
for \( X_t \), \( t \ge 0 \), if \( X_{\tau_{\text{top}}+1} \sim
\operatorname{unif}
(S_n) \), and \( X_{\tau_{\text{top}}+1} \) is independent of \(
\tau_{\text{top}} \).
\item
The \defn{riffle shuffle} first cuts the deck randomly into two
packets, one containing \( k \) cards and the other containing \(
n-k \) cards. Choose \( k \), the number of cards cut according
to the binomial density. Once the deck is cut into two packets,
interleave the cards from each packet in any possible way, such
that the cards from each packet keep their own relative order.
\item
A special case of this is the \defn{perfect shuffle}, also known
as the \defn{faro shuffle}, wherein the deck is split exactly in
half and the cards from the two packets alternate.
\item
A \defn{rising sequence} of a permutation is a maximal
consecutive increasing subsequence.
\item
An \defn{\( a \)-shuffle} is another probability density on \( S_n
\). Let \( a \) be any positive integer. Cut the deck into \(
a \) packets of nonnegative sizes \( m_1, m_2, \dots, m_a \)
with \( m_1 + \cdots + m_a = n \), where some of the \( m_i \) may
be zero. Interleave the cards from each packet in any way, so
long as the cards from each packet keep the relative order among
themselves. With a fixed packet structure, consider all
interleavings equally likely.
\end{enumerate}
\hr
\visual{Mathematical Ideas}{../../../../CommonInformation/Lessons/mathematicalideas.png}
\section*{Mathematical Ideas}
\subsection*{General Setting}
An unopened deck of cards has the face-up order (depending on
manufacturer, but typically in the U.S.), starting with the Ace of
Spades:
\begin{itemize}
\item
Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King of Spades,
\item
Ace, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King of Diamonds,
\item
King, Queen, Jack, 10, 9, 8, 7, 6, 5, 4, 3, 2, Ace of Clubs,
then
\item
King, Queen, Jack, 10, 9, 8, 7, 6, 5, 4, 3, 2, Ace of Hearts.
\end{itemize}
Call this the initial order of the deck. Knowing this order is
essential for some sleight of hand tricks performed by a magician. For
card players, shuffling the deck to remove this order is essential so
that cards dealt from the deck come ``at random'', that is, in an order
uniformly distributed over all possible deck orders. The main question
here is: Starting from this order, how many shuffles are necessary to
obtain a ``random'' deck order from the uniform distribution?

In terms of Markov processes, the questions are: What is the state
space, what is an appropriate transition probability matrix, what is the
steady state distribution, hopefully uniform, and how fast does the
Markov process approach the steady state distribution?

For simplicity and definiteness, let the cards in the initial deck order
above be numbered \( 1 \) to \( 52 \). It will also be convenient to
study much smaller decks of cards having \( n \) cards. The set of
states for a Markov process modeling the order of the deck is \( S_n \),
the set of permutations on \( n \) cards. For convenience, set the
initial state \( X_0 \) to be the identity permutation with probability \(
1 \). In other words, the initial distribution puts all of its
probability on the unshuffled deck.

Consider a shuffle, that is, a re-ordering operation on a state that
takes an order to another order. For example, the riffle shuffle, also
called a dovetail shuffle or leafing the cards, is a common type of
shuffle that interleaves packets of cards. A perfect riffle shuffle,
also called a faro shuffle, splits the deck exactly in half, then
interleaves cards alternately from each half. A perfect riffle shuffle is
difficult to perform, except for practiced magicians. More commonly,
packets of adjacent cards from unevenly split portions interleave,
creating a new order for the deck that nevertheless preserves some of
the previous order in each packet. Thus a particular riffle shuffle is
one of a whole family of riffle shuffles, chosen with a probability
distribution on the family. This probability distribution then induces
a transition probability from state to state, and thus a Markov process.

Other types of shuffles have colorful names such as the Top-to-Random
shuffle, Hindu shuffle, pile shuffle, Corgi shuffle, Mongean shuffle,
and Weave shuffle. Some shuffle types are a family of possible
re-orderings with probability distributions different from the riffle
shuffle, leading to different transition probabilities, and thus
different Markov processes.

Going from card order \( \pi \) to \( \sigma \) is the same as composing \(
\pi \) with the permutation \( \pi^{-1} \circ \sigma \). Now identify
shuffles as functions on \( \set{1, \dots n} \) to \( \set{1, \dots n} \),
that is, permutations.%
\index{permutation}
Since a particular riffle shuffle is one of a whole family of riffle
shuffles, chosen with a probability distribution \( Q \) from the
family, the transition probabilities are \( p_{\pi \sigma} = \Prob{X_t =
\sigma \given X_{t-1} = \pi} = Q(\pi^{-1} \circ \sigma) \). So now the goal
is to describe the probability distribution \( Q \) and apply it to the
Markov process.
\begin{remark}
This section uses a list notation for permutations. For example,
the notation \( \pi = [231] \) represents the permutation with \(
\pi(1) = 2 \), \( \pi(2) = 3 \) and \( \pi(3) = 1 \). A common
alternative explicit notation for the same permutation is
\[
\begin{pmatrix}
1 & 2 & 3 \\
2 & 3 & 1
\end{pmatrix}
.
\] Writing the permutation in matrix form makes finding the inverse
obvious, \( \pi^{-1} = [312] \).
Recall also that sequential permutations are applied from right to
left. Composing \( \pi \) with the permutation \( \pi^{-1} \circ
\sigma \) gives \( \pi \circ (\pi^{-1} \circ \sigma) = \sigma \). If \(
\sigma = [132] \), then \( \pi^{-1} \circ \sigma = [321] \) and \( [132]
= [231] \circ [321] \).
This section does not use cycle notation for permutations.
\end{remark}
\subsection*{Top to Random Shuffle} A particularly simple shuffle is the
\defn{Top-to-Random Shuffle},%
\index{top-to-random-shuffle}
abbreviated TTRS\@. The TTRS takes the top card from a stack of \( n \)
cards and inserts it in the gap between the \( (k-1) \)th card and the \(
k \)th card in the deck. See Figure~%
\ref{fig:cardshuffling:cards1}. Note that \( k = 1 \) is possible, in
which case the top card returns to the top. Likewise, \( k = n+1 \) is
also permitted, in which case the top card moves to the bottom of the
card stack.

Consider the order of the cards to be a permutation on \( n \) symbols.
The TTRS is naturally a finite Markov chain \( X_t \) for \( t \ge 0 \)
with \( X_t \in S_n \). Set \( X_0 = \sigma_0 \), the identity
permutation. The transition probabilities are
\[
\Prob{X_{t+1} = \sigma' \given X_t = \sigma} =
\begin{cases}
\frac{1}{n} & \text{\( \sigma' \) is a TTRS of \( \sigma \)}\\
0 & \text{otherwise}
\end{cases}
\] defining the transition probability matrix \( P \). Then after \( t \)
TTRS shuffles, the order of the deck has a probability distribution \(
P^t X_0 \) on \( S_n \), where with an overload of notation \( X_0
\) is the vector with a \( 1 \) in the position for \( \sigma_0 \) and
\( 0 \) elsewhere, representing the initial state.

The Markov chain \( X_t \) induced by the TTRS
is irreducible; see the exercises. It is also immediate that \( X_t \)
is aperiodic since the top card can be returned to the top in a single
step. Therefore, this Markov chain must converge to a stationary
distribution and this section will later prove that \( P^t X_0 \to
\operatorname{unif}
(S_n) \).
\begin{example}
The transition matrix for the TTRS on a deck with three cards is
\[
\bordermatrix{ & [123] & [213] & [231] & [132] & [312] & [321]
\cr
[123] & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 & 0 & 0 \cr
[213] & \frac{1}{3} & \frac{1}{3} & 0 & \frac{1}{3} & 0 & 0 \cr
[231] & 0 & 0 & \frac{1}{3} & 0 & \frac{1}{3} & \frac{1}{3} \cr
[132] & 0 & 0 & 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \cr
[312] & \frac{1}{3} & 0 & 0 & \frac{1}{3} & \frac{1} {3} & 0 \cr
[321] & 0 & \frac{1}{3} & \frac{1}{3} & 0 & 0 & \frac{1}{3} \cr
}.
\]
If the card deck is initially in order \( 1 \) to \( n \) from top
to bottom, how many TTRS shuffles does it take for the deck to be
sufficiently shuffled? Starting with the identity ordering, the
density of the permutations after \( 7 \) top-to-random shuffles is
the first row of \( P^7 \). Numerically,
\[
P^7 =
\begin{pmatrix}
0.16690 & 0.16690 & 0.16690 & 0.16644 & 0.16644 & 0.16644 \\
0.16690 & 0.16690 & 0.16644 & 0.16690 & 0.16644 & 0.16644 \\
0.16644 & 0.16644 & 0.16690 & 0.16644 & 0.16690 & 0.16690 \\
0.16644 & 0.16644 & 0.16644 & 0.16690 & 0.16690 & 0.16690 \\
0.16690 & 0.16644 & 0.16644 & 0.16690 & 0.16690 & 0.16644 \\
0.16644 & 0.16690 & 0.16690 & 0.16644 & 0.16644 & 0.16690 \\
\end{pmatrix}
.
\] That is, \( 7 \) shuffles of the 3-card deck get close to the
stationary density, which turns out to be the uniform density. The
eigenvalues of \( P \) are \( 1, \frac{1}{3}, \frac{1}{3}, \frac {1}
{3}, 0, 0 \).
\end{example}
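\begin{remark}
The numerical entries of \( P^7 \) in the example can be checked by
hand. The following is only a sketch; the symbol \( J \) is
introduced here for this check and denotes the \( 6 \times 6 \)
matrix with every entry equal to \( 1 \). Multiplying the first row
of \( P \) by \( P \) gives \( P^2 = \frac{1}{3} P + \frac{1}{9} J \)
in that row, and the other rows follow the same pattern because the
transition probability from \( \pi \) to \( \sigma \) depends only
on \( \pi^{-1} \circ \sigma \). Since every row of \( P \) sums to \(
1 \), \( P J = J \), and induction on \( t \) gives
\[
P^t = \frac{1}{3^{t-1}} P + \frac{1}{6} \left( 1 - \frac{1}{3^{t-1}}
\right) J, \qquad t \ge 1.
\]
Hence an entry of \( P^t \) equals \( \frac{1}{6} + \frac{1}{2 \cdot
3^{t}} \) where \( P \) has the entry \( \frac{1}{3} \), and \( \frac
{1}{6} - \frac{1}{2 \cdot 3^{t}} \) where \( P \) has the entry \( 0
\). For \( t = 7 \) these are \( \frac{1}{6} \pm \frac{1}{4374} \),
approximately \( 0.16690 \) and \( 0.16644 \), in agreement with
the matrix displayed above. The total variation distance of any row
of \( P^t \) from the uniform density is then \( \frac{1}{2} \cdot 6
\cdot \frac{1}{2 \cdot 3^{t}} = \frac{1}{2 \cdot 3^{t-1}} \), which
is about \( 0.00069 \) for \( t = 7 \).
\end{remark}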
\begin{figure}
\centering
\begin{asy}
size(5inches);
real myfontsize = 12;
real mylineskip = 1.2*myfontsize;
pen mypen = fontsize(myfontsize, mylineskip);
defaultpen(mypen);
real eps = 0.1;
pair vert = (0, eps);
defaultpen(5);
path card = (0,0)--(1,0);
label("Stack Position", shift(3*vert)*(1.1,0));
draw(shift(2*vert)*card); label("$1$", shift(2*vert)*(1.1,0));
draw(shift(vert)*card); label("$2$", shift(vert)*(1.1,0));
draw(card); label("$3$",(1.1,0));
label("$\Huge{\vdots}$", -vert);
draw(shift(-2vert)*card); label("$k-1$", shift(-2*vert)*(1.1,0));
draw(shift(-3*vert)*card); label("$k$", shift(-3*vert)*(1.1,0));
label("$\Huge{\vdots}$", -4*vert);
draw(shift(-5*vert)*card); label("$n$", shift(-5*vert)*(1.1,0));
draw( arc( (1.15, -eps/4), r = 2.20*eps, angle1=90, angle2=-90),
arrow=Arrow(), red+1bp);
\end{asy}
\caption{Schematic drawing of the Top-to-Random-Shuffle.}%
\label{fig:cardshuffling:cards1}
\end{figure}
\begin{lemma}
At any time \( t \), if \( k \) cards appear beneath the card
labeled \( n \), then these cards appear in any order with equal
probability.
\end{lemma}
\begin{proof}
The proof is by induction on \( t \). The base case \( t = 0 \) is
trivial. Suppose that the claim is true for some \( t \ge 0 \). In
the transition to \( t + 1 \), two cases can occur; see Figure~%
\ref{fig:cardshuffling:cards2} for a schematic diagram. In the first
case, the top card is placed above the card labeled \( n \), wherever
that card is in the stack. Then the cards beneath card \( n \) are
unchanged and the claim still holds. Otherwise, the top card is
placed in one of the \( k+1 \) available spaces below the card
labeled \( n \). The probability of any particular one of
these arrangements is
\[
\frac{1}{k!} \cdot \frac{1}{k+1} = \frac{1}{(k+1)!}
\] where \( \frac{1}{k!} \) comes from the induction hypothesis and
the \( \frac{1}{k+1} \) comes from the TTRS\@. The proof is
complete.
\end{proof}
\begin{figure}
\centering
\begin{asy}
size(5inches);
real myfontsize = 12;
real mylineskip = 1.2*myfontsize;
pen mypen = fontsize(myfontsize, mylineskip);
defaultpen(mypen);
real eps = 0.1;
pair vert = (0, eps);
defaultpen(2);
path card = (0,0)--(1,0);
picture p = new picture;
size(p, 2inches);
label(p, "Card Number", shift(3*vert)*(-0.1,0));
draw(p, shift(2*vert)*card);
draw(p, shift(vert)*card);
draw(p, card);
label(p, "$\Large{\vdots}$", -vert);
label(p, "$n$", shift(-2*vert)*(-0.1,0));
draw(p, shift(-2*vert)*card);
draw(p, shift(-3*vert)*card);
label(p, "$\Large{\vdots}$", -4*vert);
draw(p, shift(-5*vert)*card);
draw(p, (1.15, -2*eps)--(1.15, -5*eps),
arrow=Arrows(),
bar= Bars(), black+1bp );
label(p, "$k$ cards", (1.15, -3.5*eps), align=E );
draw(p, arc( (1.15, 0), r = eps, angle1=90, angle2=-90),
arrow=Arrow(), red+1bp);
picture q = new picture;
size(q, 2inches);
label(q, "Card Number", shift(3*vert)*(-0.1,0));
draw(q, shift(2*vert)*card);
draw(q, shift(vert)*card);
draw(q, card);
label(q, "$\Large{\vdots}$", -vert);
label(q, "$n$", shift(-2*vert)*(-0.1,0));
draw(q, shift(-2*vert)*card);
draw(q, shift(-3*vert)*card);
label(q, "$\Large{\vdots}$", -4*vert);
draw(q, shift(-5*vert)*card);
draw(q, (1.15, -2*eps)--(1.15, -5*eps),
arrow=Arrows(),
bar= Bars(), black+1bp );
label(q, "$k$ cards", (1.15, -3.5*eps), align=E );
draw(q, arc( (1.15, -eps), r = 3*eps, angle1=90, angle2=-90),
arrow=Arrow(), red+1bp);
add(p.fit(),(0,0), (0,0) );
add(q.fit(),(0,0), (100,0) );
\end{asy}
\caption{Schematic diagram of the proof of the Lemma.}%
\label{fig:cardshuffling:cards2}
\end{figure}
\begin{theorem}
\label{thm:cardshuffling:tautop} Let \( \tau_{\text{top}} \) be the
first time that card \( n \) reaches the top of the deck. Then \( P^
{\tau_{\text{top}}+1}X_0 \) is uniform on \( S_n \). Furthermore,
whatever permutation arises at time \( \tau_{\text{top}}+1 \) is
independent of \( \tau_{\text{top}} \).
\end{theorem}
\begin{proof}
The proof follows from the Lemma, since at time \( \tau_{\text{top}}
\) the \( n-1 \) cards below card \( n \) will be uniformly
distributed over the \( (n-1)! \) possible permutations. Then at
time \( \tau_{\text {top}}+ 1 \) card \( n \) is inserted uniformly
at random in the deck.
\end{proof}
\begin{remark}
Waiting for \( \tau_{\text{top}} \) is the same as waiting for
completion in the ``coupon collector's problem in reverse''. More
precisely, collecting a coupon here is putting the top card below
the card labeled \( n \). The first card is hard to put under \( n \);
in fact, it happens with probability \( \frac{1}{n} \), but it gets
easier as time goes on. This motivates the later assertions that \( \E{\tau_
{\text{top}} + 1} = \Theta(n \log n) \) and that \( \Prob{\tau_{\text
{top}}+1 \ge n \log n + c n} \le \EulerE^{-c} \) for all \( c \ge 0 \).
See below for more details.
\end{remark}
\begin{definition}
If \( \mu \) and \( \nu \) are probability distributions on \(
\Omega \), the \defn{total variation distance} of \( \mu \) from \(
\nu \) is%
\index{total variation distance}
\[
\| \mu - \nu \|_{TV} = \sup_{A \subset \Omega} \abs{ \mu(A) -
\nu(A)} = \frac{1}{2} \sum\limits_{x \in \Omega} \abs{ \mu(x) -
\nu(x)}.
\]
\end{definition}
\begin{remark}
Probability distributions \( \mu \) and \( \nu \) are far apart in
total variation distance if there is a ``bad event'' \( A \) such
that \( \mu \) and \( \nu \) measure \( A \) differently.
\end{remark}
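\begin{example}
As a small illustration of the definition, with distributions chosen
only for this example, let \( \Omega = \set{1, 2, 3} \), let \( \mu
\) assign probabilities \( \frac{1}{2}, \frac{1}{3}, \frac{1}{6} \)
to the points \( 1, 2, 3 \) respectively, and let \( \nu \) be
uniform on \( \Omega \). The sum formula gives
\[
\| \mu - \nu \|_{TV} = \frac{1}{2} \left( \abs{\frac{1}{2} - \frac
{1}{3}} + \abs{\frac{1}{3} - \frac{1}{3}} + \abs{\frac{1}{6} - \frac
{1}{3}} \right) = \frac{1}{2} \left( \frac{1}{6} + 0 + \frac{1}{6}
\right) = \frac{1}{6}.
\]
The event \( A = \set{1} \), on which \( \mu \) exceeds \( \nu \),
plays the role of the ``bad event'' and attains the supremum, since \(
\mu(A) - \nu(A) = \frac{1}{2} - \frac{1}{3} = \frac{1}{6} \).
\end{example}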
\begin{definition}
Define \( \tau_{\text{top}} \) as a \defn{strong stationary time}%
\index{strong stationary time}
for \( X_t \), \( t \ge 0 \) if \( X_{\tau_{\text{top}}+1} \sim
\operatorname{unif}
(S_n) \), and \( X_{\tau_{\text{top}}+1} \) is independent of \(
\tau_{\text{top}} \).
\end{definition}
\begin{remark}
A \emph{stopping time} is a rule which tells the process to ``stop''
depending on the current value of the process. The stopping time is
strong stationary if conditional on stopping after \( t+1 \) steps the
value of the process is uniform on the state space.
\end{remark}
\begin{lemma}
\label{lem:cardshuffling:stoptime} Let \( Q \) be a probability
distribution on a finite group \( G \) inducing an irreducible and
aperiodic Markov chain
with transition probabilities \( Q(\pi^{-1} \circ \sigma) \)
from \( \pi \) to \( \sigma \). Let \( \tau \) be a strong
stationary time for \( Q \) and \( U \) the uniform distribution. Then
\[
\| Q^{k} - U \|_{TV} \le \Prob{\tau > k}
\] for all \( k \ge 0 \).
\end{lemma}
\begin{remark}
The hypotheses irreducible and aperiodic may not be strictly
necessary, but occur here because both are common in theorems about
Markov chains.
\end{remark}
\begin{proof}
For any \( A \subset G \)
\begin{align*}
Q^{k}(A) &= \Prob{X_k \in A} \\
&= \sum_{j \le k} \Prob{X_k \in A, \tau = j} + \Prob{X_k \in A,
\tau >k} \\
&= \sum_{j \le k} U(A) \Prob{\tau = j} + \Prob{X_k \in A \given
\tau >k} \Prob{\tau > k} \\
&= U(A) + \left( \Prob{X_k \in A \given \tau >k} - U(A) \right)
\Prob{\tau > k}
\end{align*}
and because \( \abs{\Prob{X_k \in A \given \tau >k} - U(A)} \le 1 \)
for every \( A \), taking the maximum over \( A \subset G \) gives
\[
\| Q^{k} - U \|_{TV} \le \Prob{\tau > k}.
\]
\end{proof}
\begin{lemma}
\label{lem:cardshuffling:coupon} Sample uniformly with replacement
from an urn with \( n \) balls. Let \( V \) be the number of draws
required until each ball has been drawn at least once. Then
\[
\Prob{V > n \log n + c n} \le \EulerE^{-c}
\] for \( c \ge 0 \) and \( n \ge 1 \).
\end{lemma}
\begin{remark}
The lemma statement is another formulation of the coupon collector's
problem.%
\index{coupon collectors problem}
The usual formulation has \( n \) different types of coupons or
prizes in a cereal box. On each draw, one obtains a coupon or prize
equally likely to be any one of the \( n \) types. The goal is to
find the expected number of coupons one needs to gather before
obtaining a complete set of at least one of each type.
\end{remark}
\begin{proof}
Let \( m = n \log n + c n \). For each ball \( b \) let \( A_b \)
be the event ``ball \( b \) is not drawn in the first \( m \) draws''.
Then
\[
\Prob{ V > m} = \Prob{ \bigcup_{b=1}^n A_b } \le \sum_{b=1}^n \Prob{A_b} =
n \left( 1 - \frac{1}{n} \right)^m \le n \EulerE^{-m/n} = \EulerE^
{-c}.
\]
See the exercises for a proof of the second inequality.
\end{proof}
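\begin{example}
To get a feel for the size of the bound, take \( n = 52 \) and \( c
= 3 \); these values are chosen only for illustration. Then \( n
\log n \approx 205.46 \), so \( n \log n + c n \approx 361.46 \).
Using the integer \( m = 362 \) in the chain of inequalities from the
proof gives
\[
\Prob{V > 362} \le 52 \left( 1 - \frac{1}{52} \right)^{362} \le 52
\EulerE^{-362/52} \approx 0.049,
\]
which is just below \( \EulerE^{-3} \approx 0.050 \). In words,
with \( 52 \) balls the chance that more than \( 362 \) draws are
needed to see every ball is less than \( 1 \) in \( 20 \).
\end{example}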
% \begin{theorem}[Aldous, Diaconis]
% For a finite, irreducible, aperiodic Markov chain \( Y_t \)
% distributed as \( Q^t \) at time \( t \) and with stationary
% distribution \( \pi \), and \( \tau \) is a strong stationary time,
% then
% \[
% \| Q^{\tau} - \pi \|_{TV} \le \Prob{\tau \ge t}.
% \]
% \end{theorem}
% Then immediately, \( \| P^{\tau_{\text{top}}+1} - U \|_{TV} \le \Prob{\tau_
% {\text{top}+1} \le \EulerE^{-c}} \). This is like the coupon collector
% having \( n \) coupons.
For simplicity in what follows, set \( d_P(t) = \| P^t - U \|_{TV} \).
Then \( d_P(t) \) measures how close \( t \) repeated shuffles get the
deck to being shuffled according to the uniform density.
\begin{theorem}
For the TTRS shuffle
\begin{enumerate}
\item
\( d_P(n \log n + n \log \epsilon^{-1} )\le \epsilon \) for \(
n \) sufficiently large.
\item
\( d_P(n \log n - n \log (C \epsilon^{-1})) \ge 1-\epsilon \)
for \( n \) sufficiently large.
\end{enumerate}
\end{theorem}
\begin{proof}
\begin{enumerate}
\item
Theorem~%
\ref{thm:cardshuffling:tautop} shows that \( \tau_{\text{top}}
\), the first time that the original bottom card has come to
the top and been inserted into the deck, is a strong stationary
time for the TTRS\@.
\item
The goal is to show that \( \tau_{\text{top}} \) has the
same distribution as \( V \) in Lemma~%
\ref{lem:cardshuffling:coupon}. Then the upper bound
follows from Lemma~%
\ref{lem:cardshuffling:coupon} and Lemma~%
\ref{lem:cardshuffling:stoptime}.
\item
Write
\[
\tau_{\text{top}} = \tau_1 + (\tau_2 - \tau_1) + \cdots
+ (\tau_{n-1} - \tau_{n-2}) + (\tau_{\text{top}} - \tau_
{n-1})
\] where \( \tau_i \) is the time until the \( i \)th card is
placed under the original bottom card.
\item
When exactly \( i \) cards are under the original bottom
card \( b \), the chance that the current top card is
inserted below \( b \) is \( \frac{i+1}{n} \) and hence the
random variable \( (\tau_{i+1} - \tau_i) \) has geometric
distribution
\[
\Prob{(\tau_{i+1} - \tau_i) = j} = \frac{i+1}{n}\left(1
- \frac{i+1}{n} \right)^{j-1}
\] for \( j \ge 1 \).
\item
The random variable \( V \) in Lemma~%
\ref{lem:cardshuffling:coupon} can be written as
\[
V = (V - V_{n-1}) + (V_{n-1} - V_{n-2}) + \cdots + (V_2
- V_1) + V_1
\] where \( V_i \) is the number of draws required until \(
i \) distinct balls have been drawn at least once.
\item
After \( i \) distinct balls have been drawn, the chance that
a draw produces a not-previously-drawn ball is \( \frac{n-i}
{n} \). So \( V_{i+1} - V_i \) has distribution
\[
\Prob{V_{i+1} - V_i = j} = \frac{n-i}{n} \left( 1 -
\frac{n-i}{n} \right)^{j-1}
\] for \( j \ge 1 \).
\item
Comparing, the corresponding terms \( (\tau_{i+1} - \tau_i) \)
and \( V_{n-i} - V_{(n-i)-1} \) have the same distribution. Since
the summands in each sum are independent, it follows that
the sums \( \tau_{\text{top}} \) and \( V \) have the same distribution,
as required.
\item
To prove the lower bound, fix \( j \) and let \( A_j \) be the
set of configurations of the deck such that the bottom \( j \)
original cards stay in their original relative order.
Plainly \( U(A_j) = \frac{1}{j!} \).
\item
Let \( k = k(n) = n \log n - c_n n \) where \( c_n \to
\infty \). The goal is to show \( P^{k(n)}(A_j) \to 1 \) as \( n
\to \infty \) for fixed \( j \). Then \( d_P(k(n)) \ge P^{k(n)}(A_j)
- U(A_j) \to 1 - \frac{1}{j!} \) as \( n \to \infty \). Since \( j \)
is arbitrary, \( d_P(k(n)) \to 1 \), establishing the lower bound.
\item
To prove \( P^{k(n)}(A_j) \to 1 \) as \( n \to \infty \), note \(
P^{k(n)}(A_j) \ge \Prob{\tau- \tau_{j-1} > k} \) because \( \tau
- \tau_{j-1} \) is distributed as the time for the card
initially \( j \)th from the bottom to come to the top and
be inserted. If this has not happened by time \( k(n) \), then
the original bottom \( j \) cards must still be in their
relative order at time \( k \).
\item
It suffices to show that \( \Prob{\tau- \tau_{j-1} \le k}
\to 0 \) as \( n \to \infty \) for fixed \( j \). This
follows from Chebyshev's inequality. Note that
\begin{align*}
\E{(\tau_{i+1} - \tau_i)} &= \frac{n}{i+1} \\
\Var{(\tau_{i+1} - \tau_i)} &= \left( \frac{n}{i+1}
\right)^2 \left( 1 - \frac{i+1}{n} \right)
\end{align*}
and so
\[
\E{(\tau - \tau_{j-1})} = \sum\limits_{i=j-1}^{n-1} \frac{n}{i+1}
= n \log n + O(n)
\] and
\[
\Var{(\tau - \tau_{j-1})} = \sum\limits_{i=j-1}^{n-1} \left(
\frac{n}{i+1} \right)^2 \left( 1 - \frac{i+1}{n} \right)=
O(n^2).
\] Then using Chebyshev's inequality gives \( \Prob{\tau-
\tau_{j-1} \le k} \to 0 \) as \( n \to \infty \) for fixed \(
j \).
\end{enumerate}
\end{proof}
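\begin{example}
As a rough numerical illustration for a standard deck, with the
values \( n = 52 \) and \( \epsilon = 0.01 \) chosen only for this
example, sum the means \( \frac{n}{i+1} \) of the geometric waiting
times from the proof over \( i = 0, \dots, n-1 \). The expected
number of shuffles until the original bottom card has risen to the
top and been inserted is
\[
\sum_{i=0}^{n-1} \frac{n}{i+1} = n \left( 1 + \frac{1}{2} + \cdots +
\frac{1}{n} \right) \approx 52 \cdot 4.538 \approx 236.
\]
For the upper bound, \( n \log n + n \log \epsilon^{-1} \approx
205.5 + 239.5 = 445 \). The theorem is stated for \( n \)
sufficiently large, so this should be read only as an indication
that on the order of \( 445 \) top-to-random shuffles bring a \( 52
\)-card deck within total variation distance about \( 0.01 \) of
uniform.
\end{example}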
\begin{remark}
The strong stationary time property of \( \tau \) played no role in
establishing the lower bound. The proof gets lower bounds by
guessing some set \( A \) for which \( P^k(A) - U(A) \) should be
large and then using
\[
d(k) = \| P^k - U \|_{\text{TV}} \ge \abs{P^k(A) - U(A)}.
\]
\end{remark}
Note that \( n \log n + n \log \epsilon^{-1} = n \log n (1 + o(1)) \)
and \( n \log n - n \log \epsilon^{-1} = n \log n (1 - o(1)) \). This
gives the sense that \( n \log n \) shuffles is about the right number
of shuffles needed to bring the deck close to being uniformly shuffled.
This gives a cut-off phenomenon, that is, \( n \log n \) is a critical
number of shuffles such that, for any fixed \( \delta > 0 \),
\( d_P((1 + \delta) n \log n) \to 0 \) but \( d_P((1 - \delta) n \log n)
\to 1 \) as \( n \to \infty \). The distance from the stationary density
changes abruptly at some value, see Figure~%
\ref{fig:cardshuffling:cards3}.
\begin{figure}
\centering
\begin{asy}
import graph;
size(5inches);
real myfontsize = 12;
real mylineskip = 1.2*myfontsize;
pen mypen = fontsize(myfontsize, mylineskip);
defaultpen(mypen);
real f( real x) {
real a = 0.6;
real k = 50.0;
real term = exp(-k*(x -a));
return term/( 1 + term);
}
draw( graph(f, 0,1));
xaxis("$t$", Arrow);
xtick(Label("$n \log n (1 -o(1))$", (0.4,0), 2S), (0.4, 0), S);
xtick(Label("$n \log n $", (0.6, 0), 2N), (0.6, 0), N);
xtick(Label("$n \log n (1 +o(1))$", (0.8,0), 2S), (0.8, 0), S);
yaxis("$\| P^t - U \|_{TV}$", Arrow);
\end{asy}
\caption{Schematic graph of the cut-off phenomenon for the Total
Variation distance of the Markov chain distribution from the uniform
distribution as a function of the number of steps.}%
\label{fig:cardshuffling:cards3}
\end{figure}
Note that this is quite different from the asymptotics of
\( d_P(t) = \| P^t - U \|_{\text{TV}} \) as the number of steps \( t \to
\infty \) for a fixed deck size \( n \). Perron-Frobenius theory says
\( d_P(t) \asympt a \lambda^t \) where \( \lambda \) is the second
largest eigenvalue in absolute value, but the long-time asymptotics miss
the cut-off.
% The
% justification is to find a ``bad event'' and use it to measure the
% total variation distance. In fact, let \( A_j \) be the event that
% the bottom \( j \) cards of the deck appear in correct relative order.
% Then \( U(A_j) = 1/j! \). while \( P^t(A_j) \to 1\).
\subsection*{The Riffle Shuffle}
A more realistic model of shuffling a deck of cards is the commonly used
\defn{riffle shuffle}.%
\index{riffle shuffle}
The riffle shuffle is sometimes called the GSR shuffle since Gilbert and
Shannon, and independently Reeds, first analyzed it. First cut the deck
randomly into two packets, one containing \( k \) cards and the other
containing \( n-k \) cards. Choose the number of cards cut, \( k \),
according to the binomial density with parameters \( n \) and \( 1/2 \),
meaning that the probability of the cut occurring after \( k \) cards is
exactly \( \frac{1}{2^n}\binom{n}{k} \).
Once the deck is cut into two packets, interleave the cards from each
packet in any possible way, such that the cards from each packet keep
their own relative order. This means the cards originally in positions \(
1, 2, 3, \dots, k \) must still be in the same order after shuffling,
even if there are other cards in between. The same goes for cards
originally in positions \( k+1, k+2, \dots, n \). This requirement is
quite natural, considering how a person shuffles two packets of cards,
one in each hand. The cards in the left hand must still be in the same
relative order in the shuffled deck, no matter how they interleave with
the cards in the other packet, because the cards drop in order while
shuffling. The same goes for the cards in the right hand. See Figure~%
\ref{fig:cardshuffling:riffle} for an illustration of a riffle shuffle
on a \( 10 \)-card deck.
\begin{figure}
\centering
\begin{asy}
size(5inches);
real myfontsize = 12;
real mylineskip = 1.2*myfontsize;
pen mypen = fontsize(myfontsize, mylineskip);
defaultpen(mypen);
real eps = 0.1;
pair vert = (0, eps);
pair left = (-eps, 0);
pair right = (1.25, 0);
defaultpen(5);
path card = (0,0)--(1,0);
for(int i=0; i<6; ++i) {
draw( shift(i * vert) * card);
}
for(int i=6; i<10; ++i) {
draw( shift(left) * shift(i * vert) * card, red);
}
draw( shift(right) * shift( 0 * vert) * card );
draw( shift(right) * shift( 1 * vert) * card );
draw( shift(right) * shift( 2 * vert) * card );
draw( shift(left) * shift(right) * shift( 3 * vert) * card, red );
draw( shift(right) * shift( 4 * vert) * card );
draw( shift(left) * shift(right) * shift( 5 * vert) * card, red );
draw( shift(left) * shift(right) * shift( 6 * vert) * card, red );
draw( shift(right) * shift( 7 * vert) * card );
draw( shift(left) * shift(right) * shift( 8 * vert) * card, red );
draw( shift(right) * shift( 9 * vert) * card );
int[] Pi = {2, 4, 5, 7, 1, 3, 6, 8, 9, 10};
for( int i=0; i<10; ++i) {
label(string(10-i )+"$\qquad$"+string(Pi[9-i]), (2.75, eps * i));
}
label("$i\qquad\pi_i$", (2.75, eps * 10));
\end{asy}
\caption{A riffle shuffle on a $ 10 $-card deck cut into a top
packet of $ 4 $ cards and bottom packet of $ 6 $ cards.}%
\label{fig:cardshuffling:riffle}
\end{figure}
A special case of this is the \defn{perfect shuffle},%
\index{perfect shuffle}
also known as the \defn{faro shuffle}, in which the two packets are
completely interleaved, each card from one hand followed by one card from
the other hand. A perfect shuffle is easy to describe but difficult to
perform, except for practiced magicians.
Choose uniformly among all possible interleavings: select \( k \)
locations among the \( n \) places for the cards of the first packet,
which fixes the locations for the cards of the other packet. This is the
well-known ``stars and bars'' counting argument, with the first packet
playing the role of the ``stars'' and the second packet the ``bars'',
giving \( \binom{n}{k} \) possible interleavings. With uniform choice,
any one interleaving has probability \( 1/\binom{n}{k} \) of
occurring. Hence the probability of any particular cut followed by any
particular interleaving is \( \frac{1}{2^n}\binom{n}{k} \cdot 1/\binom{n}
{k} = \frac{1}{2^n} \). Note that this probability carries no information
about the cut or the interleaving. The density on possible cuts and
interleavings is uniform.
The uniform density on the set of cuts and interleavings now induces in
a natural way a density on the set of permutations. Call the density a
\emph{riffle shuffle} and denote it by \( R \). That is, \( R(\pi) \)
is the sum of probabilities of each cut and interleaving that gives the
rearrangement of the deck corresponding to \( \pi \). In short, the
chance of any arrangement of cards occurring under riffle shuffling is
the proportion of cuts and interleavings that give that arrangement.
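As a small worked count, offered as a sketch under the definitions above
rather than as part of the original argument, the number of cut and
interleaving pairs is
\[
\sum_{k=0}^{n} \binom{n}{k} = 2^n,
\]
consistent with each pair having probability \( 1/2^n \). For each of the
\( n+1 \) possible cut sizes \( k = 0, 1, \dots, n \), exactly one
interleaving, the one placing the first packet in positions \( 1, \dots,
k \), leaves the deck unchanged. Writing \( \text{id} \) for the identity
permutation (notation assumed here),
\[
R(\text{id}) = \frac{n+1}{2^n},
\]
which gives \( R(\text{id}) = 4/8 = 1/2 \) for a \( 3 \)-card deck.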
\begin{example}
Consider the riffle shuffle on a \( 3 \)-card deck as a Markov
chain. The probability distribution for \( R \) is in Table~%
\ref{tab:cardshuffling:riffle3}. To obtain the entries in the
table, systematically go through the \( 8 \) possible cuts and
interleavings. Cutting three cards into the left
packet and none into the right packet, the only possible interleaving
trivially leaves the deck unchanged. With a cut into \( 2 \) cards
on the left and \( 1 \) card on the right, one interleaving drops the
right packet card on the bottom and the left packet cards as the top \(
2 \), leaving the deck unchanged. The two other interleavings move the
card in the right packet, the formerly bottom card labeled \( 3 \), to
the middle or the top, leaving cards \( 1 \) and \( 2 \) in that order
in the shuffled deck. The other two
cuts are symmetric to the cuts described above, so \( 4 \) of the \(
8 \) cuts and interleavings keep the deck in the original order, and
the remaining \( 4 \) each produce a different permutation. A single
riffle shuffle cannot reverse the order of the deck.
\begin{table}
\centering
\caption{Probability distribution for a riffle shuffle
on a $ 3 $ card deck.}
\begin{tabular}{ccccccc}
$\pi$ & $[123]$ & $[213]$ & $[231]$ & $[132]$ & $[312]$ & $[321]$ \\
$R(\pi)$ & $\frac{1}{2}$ & $\frac{1}{8}$ & $\frac{1}{8}$ & $\frac{1}{8}$ & $\frac{1}{8}$ & $0$ \\
\end{tabular}%
\label{tab:cardshuffling:riffle3}
\end{table}
To obtain the transition probability matrix from the entries in Table~%
\ref{tab:cardshuffling:riffle3}, do the computation for a typical
element of the matrix, say \( p_{\pi,\sigma} \)
with \( \pi = [213] \) and \( \sigma = [132] \). Then \( \pi^{-1} = [213]
\) and \( \pi^{-1} \circ \sigma = [231] \). Now \( R([231]) = \frac{1}
{8} \), giving \( p_{[213] [132]} = \frac{1}{8} \) in the
probability transition matrix.
The full probability transition matrix under this ordering of the
permutations is
\[
\bordermatrix{ & [123] & [213] & [231] & [132] & [312] & [321]
\cr
[123] & \frac{1}{2} & \frac{1}{8} & \frac{1}{8} & \frac{1}
{8} & \frac{1}{8} & 0 \cr
[213] & \frac{1}{8} & \frac{1}{2} & \frac{1}{8} & \frac{1}
{8} & 0 & \frac{1}{8} \cr
[231] & \frac{1}{8} & \frac{1}{8} & \frac{1}{2} & 0
& \frac{1}{8} & \frac{1}{8} \cr
[132] & \frac{1}{8} & \frac{1}{8} & 0 & \frac{1}{2}
& \frac{1}{8} & \frac{1}{8} \cr
[312] & \frac{1}{8} & 0 & \frac{1}{8} & \frac{1}{8}
& \frac{1}{2} & \frac{1}{8} \cr
[321] & 0 & \frac{1}{8} & \frac{1}{8} & \frac{1}{8}
& \frac{1}{8} & \frac{1}{2} \cr
}.
\] In this case, the \( n=3 \) riffle shuffle, the matrix
is symmetric. This is not true in general: the riffle shuffle transition
matrix for deck sizes greater than \( 3 \) is not symmetric; see the
exercises.
\end{example}
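As a further check of the same computation, here is a second entry worked
out. The sketch uses the convention read off from the example, namely
that the entry \( p_{\pi,\sigma} \) is \( R(\pi^{-1} \circ \sigma) \) with
the right-hand permutation applied first. Take \( \pi = [213] \) and \(
\sigma = [312] \), so \( \pi^{-1} = [213] \) and
\begin{align*}
(\pi^{-1} \circ \sigma)(1) &= \pi^{-1}(3) = 3, \\
(\pi^{-1} \circ \sigma)(2) &= \pi^{-1}(1) = 2, \\
(\pi^{-1} \circ \sigma)(3) &= \pi^{-1}(2) = 1.
\end{align*}
Hence \( \pi^{-1} \circ \sigma = [321] \) and \( p_{[213] [312]} = R([321])
= 0 \), matching the zero in row \( [213] \), column \( [312] \) of the
matrix.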
First note that the Markov chain for riffle shuffling is regular, that
is, every permutation has a positive probability of appearing after
sufficiently many shuffles; see the exercises. In fact, any number of
shuffles greater than \( \log_2 n \) will do. Since the riffle shuffle
Markov chain is regular, there is a unique stationary density. Because
the transition probability matrix is doubly stochastic, this stationary
density is the uniform density on \( S_n \).
For the \( 3 \)-card deck, starting with the identity ordering, the
density of the permutations after \( 7 \) riffle shuffles is the first
row of \( P^7 \). Carrying out the matrix multiplication shows that the
density is nearly uniform. In fact,
\[
P^7 =
\begin{pmatrix}
0.17059 & 0.16666 & 0.16666 & 0.16666 &
0.16666 & 0.16278 \\
0.16666 & 0.17059 & 0.16666 & 0.16666 &
0.16278 & 0.16666 \\
0.16666 & 0.16666 & 0.17059 & 0.16278 &
0.16666 & 0.16666 \\
0.16666 & 0.16666 & 0.16278 & 0.17059 &
0.16666 & 0.16666 \\
0.16666 & 0.16278 & 0.16666 & 0.16666 &
0.17059 & 0.16666 \\
0.16278 & 0.16666 & 0.16666 & 0.16666 &
0.16666 & 0.17059 \\
\end{pmatrix}
.
\] That is, \( 7 \) shuffles of the \( 3 \)-card deck get close to the
uniform stationary density.
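To quantify how close, here is a rough computation of the total variation
distance from the first row of the displayed matrix. It assumes the
usual half-\( L^1 \) formula for the total variation distance between
densities on a finite set, and it uses the five-digit rounded entries, so
the result is only approximate:
\begin{align*}
\| P^7 - U \|_{\text{TV}} &= \frac{1}{2} \sum_{\sigma \in S_3} \abs{
(P^7)_{[123],\sigma} - \tfrac{1}{6} } \\
&\approx \frac{1}{2} \left( \abs{0.17059 - 0.16667} + 4 \abs{0.16666 -
0.16667} + \abs{0.16278 - 0.16667} \right) \\
&= \frac{1}{2} \left( 0.00392 + 0.00004 + 0.00389 \right) \approx 0.004.
\end{align*}
Almost all of the remaining distance comes from the identity, which is
slightly too likely, and the reversal \( [321] \), which is slightly too
unlikely.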
\subsection*{Probability of a Permutation Under Riffle Shuffle}
Define a \defn{rising sequence}%
\index{rising sequence}