Expected number I will be on after drawing cards until I get an ace, 2, 3, and so forth

12

I am having some trouble solving the following problem.

You draw cards from a standard 52-card deck without replacement until you get an ace. You then draw from what is left until you get a 2. You continue with 3, and so on. What is the expected number you will be on after the whole deck runs out?

It was natural to let

$T_i = \text{first position of card whose value is }i$
$U_i = \text{last position of card whose value is }i$

So the problem essentially amounts to finding the probability that you will be on $k$ when the deck runs out, which is

$Pr(T_1<\cdots<T_k \cap U_{k+1} < T_k)$

I can see that

$Pr(T_1<\cdots<T_k) = 1/k! \\ \text{and} \\ Pr(U_{k+1} < T_k) = 1/70$

but could not get any further...
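As a sanity check on the second probability, the eight cards of two adjacent values can be enumerated exhaustively. The R sketch below is not part of the original question, and the name `all_before` is an illustrative assumption. It counts the arrangements of four $(k+1)$s and four $k$s in which every $(k+1)$ precedes every $k$, which is the event $U_{k+1} < T_k$.

```r
# Enumerate every way to choose which 4 of the 8 relative positions
# hold the cards of value k+1; the other 4 hold the cards of value k.
positions <- combn(8, 4)

# U_{k+1} < T_k means the last (k+1) comes before the first k,
# i.e. the largest (k+1) position is below the smallest k position.
all_before <- apply(positions, 2,
                    function(pos) max(pos) < min(setdiff(1:8, pos)))

n_total <- ncol(positions)   # number of equally likely arrangements
n_hits  <- sum(all_before)   # arrangements with every (k+1) first
c(total = n_total, hits = n_hits, probability = n_hits / n_total)
```

Only the relative order of these eight cards matters for the event, so the other 44 cards can be ignored in the count.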

— Bill

1

What happens if you have already drawn all the $2$s by the time you get your first ace?

— gung - Reinstate Monica

Does the "expected" number really mean the "most likely" number?

— whuber

This is an interesting problem, but I am not sure about the math you write after "the problem essentially amounts to". In the first statement, did you mean to write $\cap$ rather than $\cup$? Even so, I am not sure the statement is correct. Consider a sequence beginning 2AAA2. We have $T_1=2, T_2=1$ and so $T_1 > T_2$, but if I understand your text description correctly, we can still pick the ace at the second position and then the 2 at the fifth position? So $T_1 < T_2$ is not a necessary condition?

— TooTone

@TooTone Oh, I meant $\cap$ as you said, and you are right; $T_1 < T_2$ is not a necessary condition...

— Bill

@gung In that case your deck will run out and you will still be on 2.

— Bill

0

Following @gung's idea, I believe the expected value would be 5.84? From my interpretation of the comments, I am assuming that "A" is a nearly impossible value (unless the last four cards in the deck are all aces). Here are the results of a 100,000-iteration Monte Carlo simulation:

results
    2     3     4     5     6     7     8     9     J     K     Q     T 
 1406  7740 16309 21241 19998 15127  9393  4906   976   190   380  2334

And here is the R code, in case you would like to play with it.

# monte carlo card-drawing functions from here
# http://streaming.stat.iastate.edu/workshops/r-intro/lectures/5-Rprogramming.pdf

# create a straightforward deck of cards
create_deck <-
    function( ){
        suit <- c( "H" , "C" , "D" , "S" )
        rank <- c( "A" , 2:9 , "T" , "J" , "Q" , "K" )
        deck <- NULL
        for ( r in rank ) deck <- c( deck , paste( r , suit ) )
        deck
    }

# construct a function to shuffle everything
shuffle <- function( deck ){ sample( deck , length( deck ) ) }

# draw one card at a time
draw_cards <-
    function( deck , start , n = 1 ){
        cards <- NULL

        for ( i in start:( start + n - 1 ) ){
            if ( i <= length( deck ) ){
                cards <- c( cards , deck[ i ] )
            }
        }

        return( cards )
    }

# create an empty vector for your results
results <- NULL

# run your simulation this many times..
for ( i in seq( 100000 ) ){
    # create a new deck
    sdeck <- shuffle( create_deck() )

    d <- sdeck[ grep('A|2' , sdeck ) ]
    e <- identical( grep( "2" , d ) , 1:4 )

    # loop through ranks in this order
    rank <- c( "A" , 2:9 , "T" , "J" , "Q" , "K" )

    # start at this position
    card.position <- 0

    # start with a blank current.draw
    current.draw <- ""

    # start with a blank current rank
    this.rank <- NULL

    # start with the first rank
    rank.position <- 1

    # keep drawing until you find the rank you wanted
    while( card.position < 52 ){

        # increase the position by one every time
        card.position <- card.position + 1

        # store the current draw for testing next time
        current.draw <- draw_cards( sdeck , card.position )

        # if you draw the current rank, move to the next.
        if ( grepl( rank[ rank.position ] , current.draw ) ) rank.position <- rank.position + 1

        # if you have gone through every rank and are still not out of cards,
        # should it still be a king?  this assumes yes.
        if ( rank.position == length( rank ) ) break        

    }

    # store the rank for this iteration.
    this.rank <- rank[ rank.position ]

    # at the end of the iteration, store the result
    results <- c( results , this.rank )

}

# print the final results
table( results )

# make A, T, J, Q, K numerics
results[ results == 'A' ] <- 1
results[ results == 'T' ] <- 10
results[ results == 'J' ] <- 11
results[ results == 'Q' ] <- 12
results[ results == 'K' ] <- 13
results <- as.numeric( results )

# and here's your expected value after 100,000 simulations.
mean( results )

— Anthony Damico

Why is `A` impossible? Consider, for example, a sequence of 48 cards followed by `AAAA`.

— TooTone

You are right: it is one in 270725, or in R code `1/prod( 48:1 / 52:5 )`.

— Anthony Damico

1

This answer is incorrect. Consider the count for "2": because this can result only when all the 2s are encountered before any of the 1s, its probability is one in every $\binom{8}{4}=70$, and therefore its expectation in your simulation is $10^5/\binom{8}{4}\approx 1428.6$ with a standard error of $37.5$. Your output of $1660$ is over six standard errors too high, making it almost certainly erroneous. An accurate value for the mean (based on a different simulation with $10^6$ iterations) is $5.833\pm 0.004$.

— whuber

1

Your heavily documented code is unfortunately several times longer and slower than it needs to be. I have shown that its output is incorrect; although I wish I had the time to debug your code, I do not, and it is not my task to do so. My argument is this: you will still be working on "2" at the end if and only if all the "2"s precede all the "A"s. Among the $\binom{4+4}{4}=70$ equally likely ways of arranging the four "2"s and four "A"s, exactly one satisfies this criterion. Therefore your value in `results` under the heading "2" should be close to $10^5/70=1429$, but it is not.

— whuber

1

Even moderators cannot remove other people's votes :-). A chi-squared test suggests that your results now agree with mine, but it would be nice to know how you tested your simulation, because that would improve confidence in your answer. Following an edit you made to the first paragraph of your answer, both our results are now wrong: as I have interpreted your question, it is never possible to still be working on an ace when all the cards are exhausted.

— whuber

7

For a simulation it is crucial to be correct as well as fast. Both objectives suggest writing code that targets the core capabilities of the programming environment, as well as code that is as short and simple as possible, because simplicity lends clarity and clarity promotes correctness. Here is my attempt to achieve both in `R`:

#
# Simulate one play with a deck of `n` distinct cards in `k` suits.
#
sim <- function(n=13, k=4) {
  deck <- sample(rep(1:n, k)) # Shuffle the deck
  deck <- c(deck, 1:n)        # Add sentinels to terminate the loop
  k <- 0                      # Count the cards searched for
  for (j in 1:n) {
    k <- k+1                          # Count this card
    deck <- deck[-(1:match(j, deck))] # Deal cards until `j` is found
    if (length(deck) < n) break       # Stop when sentinels are reached
  }
  return(k)                   # Return the number of cards searched
}

Applying this in a reproducible way can be done with the `replicate` function after setting the random number seed, as in

> set.seed(17);  system.time(d <- replicate(10^5, sim(13, 4)))
   user  system elapsed 
   5.46    0.00    5.46

That is slow, but fast enough to conduct fairly lengthy (and therefore precise) simulations repeatedly without waiting. There are several ways we can exhibit the result. Let's start with its mean:

> n <- length(d)
> mean(d)
[1] 5.83488

> sd(d) / sqrt(n)
[1] 0.005978956

The latter is the standard error: we expect the simulated mean to be within two or three SEs of the true value. That places the true expectation somewhere between $5.817$ and $5.853$.
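The interval can be reproduced from the two printed numbers. This is a minimal sketch that hard-codes the simulated mean and standard error shown in the output rather than recomputing them from `d`; the variable names are illustrative.

```r
sim_mean <- 5.83488       # mean(d) from the run above
sim_se   <- 0.005978956   # sd(d) / sqrt(n) from the run above

# Three standard errors on either side of the simulated mean
interval <- sim_mean + c(-3, 3) * sim_se
round(interval, 3)
```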

We might also wish to see a tabulation of the frequencies (and their standard errors). The following code prettifies the tabulation a little:

u <- table(d)
u.se <- sqrt(u/n * (1-u/n)) / sqrt(n)
cards <- c("A", "2", "3", "4", "5", "6", "7", "8", "9", "T", "J", "Q", "K")
dimnames(u) <- list(sapply(dimnames(u), function(x) cards[as.integer(x)]))
print(rbind(frequency=u/n, SE=u.se), digits=2)

Here is the output:

                2       3      4      5      6      7       8       9       T       J       Q       K
frequency 0.01453 0.07795 0.1637 0.2104 0.1995 0.1509 0.09534 0.04995 0.02249 0.01009 0.00345 0.00173
SE        0.00038 0.00085 0.0012 0.0013 0.0013 0.0011 0.00093 0.00069 0.00047 0.00032 0.00019 0.00013

How can we know the simulation is even correct? One way is to test it exhaustively on smaller problems. For that reason this code was written to attack a small generalization of the problem, replacing $13$ distinct cards by `n` and $4$ suits by `k`. For testing, however, it is important to be able to feed the code a deck in a predetermined order. Let's write a slightly different interface to the same algorithm:

draw <- function(deck) {
  n <- length(sentinels <- sort(unique(deck)))
  deck <- c(deck, sentinels)
  k <- 0
  for (j in sentinels) {
    k <- k+1
    deck <- deck[-(1:match(j, deck))]
    if (length(deck) < n) break
  }
  return(k)
}

(It is possible to use `draw` in place of `sim` everywhere, but the extra work done at the beginning of `draw` makes it twice as slow as `sim`.)

We can use this by applying it to every distinct shuffle of a given deck. Since the purpose here is just a few one-off tests, efficiency in generating those shuffles is unimportant. Here is a quick brute-force way:

n <- 4 # Distinct cards
k <- 2 # Number of suits
d <- expand.grid(lapply(1:(n*k), function(i) 1:n))
e <- apply(d, 1, function(x) var(tabulate(x))==0)
g <- apply(d, 1, function(x) length(unique(x))==n)
d <- d[e & g,]

Now `d` is a data frame whose rows contain all the shuffles. Apply `draw` to each row and count the results:

d$result <- apply(as.matrix(d), 1, draw)
(counts <- table(d$result))

The output (which we will use in a formal test momentarily) is

   2    3    4 
 420  784 1316

(The value of $420$ is easy to understand, by the way: we would still be working on card $2$ if and only if all the twos preceded all the aces. The chance of this happening (with two suits) is $1/\binom{2+2}{2} = 1/6$. Of the $2520$ distinct shuffles, $2520/6 = 420$ have this property.)
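The arithmetic in this aside is easy to confirm. A minimal sketch in base R, using the values $n=4$ and $k=2$ from the enumeration above (the variable names are illustrative):

```r
n_values <- 4   # distinct card values in the test deck
n_suits  <- 2   # copies of each value

# Distinct shuffles of the multiset: 8! / (2!)^4
n_shuffles <- factorial(n_values * n_suits) / factorial(n_suits)^n_values

# Chance that both twos precede both aces: 1 / choose(4, 2)
p_stuck_on_two <- 1 / choose(2 * n_suits, n_suits)

c(shuffles = n_shuffles, stuck_on_two = n_shuffles * p_stuck_on_two)
```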

We can test the output with a chi-squared test. To this end I apply `sim` $10,000$ times to this case of $n = 4$ distinct cards in $k = 2$ suits:

> set.seed(17)
> d.sim <- replicate(10^4, sim(n, k))
> print((rbind(table(d.sim) / length(d.sim), counts / dim(d)[1])), digits=3)

         2     3     4
[1,] 0.168 0.312 0.520
[2,] 0.167 0.311 0.522

> chisq.test(table(d.sim), p=counts / dim(d)[1])

    Chi-squared test for given probabilities

data:  table(d.sim) 
X-squared = 0.2129, df = 2, p-value = 0.899

Because $p$ is so high, we find no significant difference between what `sim` says and the values computed by exhaustive enumeration. Repeating this exercise for some other (small) values of $n$ and $k$ produces comparable results, giving us reason to trust `sim` when applied to $n=13$ and $k=4$.

Finally, a two-sample chi-squared test compares the output of `sim` with the output reported in another answer:

> y <- c(1660,8414,16973,21495,20021,14549,8957,4546,2087,828,313,109)
> chisq.test(cbind(u, y))

data:  cbind(u, y) 
X-squared = 142.2489, df = 11, p-value < 2.2e-16

The enormous chi-squared statistic produces a p-value that is essentially zero: without a doubt, `sim` disagrees with the other answer. There are two possible resolutions of the disagreement: one (or both!) of these answers is incorrect, or they implement different interpretations of the question. For instance, I have interpreted "after the deck runs out" to mean after observing the last card and, if allowable, updating the "number you will be on" before terminating the procedure. Conceivably that last step was not meant to be taken. Perhaps some such subtle difference of interpretation explains the disagreement, at which point we can modify the question to make it clearer what is being asked.
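To make the two readings concrete, here is a hedged sketch of a single play written as a plain loop. It is not the code used for the results above; the function `walk_deck` and its `update_on_last` switch are hypothetical names introduced only to show where the interpretations diverge.

```r
# Walk through a deck once; `target` is the value currently being sought.
# `update_on_last` (hypothetical switch) controls whether a match on the
# very last card still advances the target. The target never moves past
# the highest value `n`.
walk_deck <- function(deck, n = max(deck), update_on_last = TRUE) {
  target <- 1
  last <- length(deck)
  for (i in seq_along(deck)) {
    if (deck[i] == target && target < n && (update_on_last || i < last)) {
      target <- target + 1
    }
  }
  target
}

deck <- c(2, 1, 3, 3, 1, 2)   # the final card is the 2 being sought
walk_deck(deck, update_on_last = TRUE)    # advances on the last card
walk_deck(deck, update_on_last = FALSE)   # stays on the value sought
```

The two calls differ only for decks whose final card is the one currently sought, which is why such a difference in reading would shift the distribution only slightly.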

— whuber

4

There is an exact answer (in the form of a matrix product, presented in point 4 below). A reasonably efficient algorithm to compute it follows from these observations:

1. A random shuffle of $N+k$ cards can be generated by randomly shuffling $N$ cards and then randomly interspersing the remaining $k$ cards within them.

2. By shuffling only the aces and then (applying the first observation) interspersing the twos, then the threes, and so on, this problem can be viewed as a chain of thirteen steps.

3. We need to keep track of more than the value of the card we are seeking. In doing so, however, we do not need to account for the position of the mark relative to all the cards, but only its position relative to cards of equal or smaller value.

   Imagine placing a mark on the first ace, then marking the first two found after it, and so on. (If at any stage the deck runs out without showing the card we are currently seeking, we leave all cards unmarked.) Let the "place" of each mark (when it exists) be the number of cards of equal or lower value that had been dealt when the mark was made (including the marked card itself). The places contain all the essential information.

4. The place after the $i^\text{th}$ mark is made is a random number. For a given deck, the sequence of these places forms a stochastic process. It is in fact a Markov process (with variable transition matrix). An exact answer can therefore be calculated from twelve matrix multiplications.
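Observations 1 and 2 can be sketched directly in R. The functions `intersperse` and `shuffle_by_interspersing` below are illustrative names, not part of the implementation presented later; they build a shuffled deck one card value at a time by choosing uniformly which positions the new cards occupy. The sketch assumes every entry of `k` is at least 1.

```r
# Insert `k` copies of `value` among the cards in `old`, keeping the
# relative order of `old`; every choice of positions is equally likely.
intersperse <- function(old, value, k) {
  total <- length(old) + k
  pos <- sample.int(total, k)   # positions taken by the new cards
  out <- integer(total)
  out[pos] <- value
  out[-pos] <- old              # old cards fill the rest, in order
  out
}

# `k` gives the number of cards of each value, lowest value first.
shuffle_by_interspersing <- function(k) {
  deck <- rep(1L, k[1])
  for (v in seq_along(k)[-1]) deck <- intersperse(deck, v, k[v])
  deck
}

set.seed(17)
deck <- shuffle_by_interspersing(rep(4, 13))
table(deck)   # each value should appear four times
```

Because the suits are ignored, shuffling "only the aces" needs no randomness at all, which is why the sketch starts from a block of identical ones.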

Using these ideas, this machine obtains a value of $5.8325885529019965$ (computing in double precision floating point) in $1/9$ second. This approximation of the exact value

$\frac{1982600579265894785026945331968939023522542569}{339917784579447928182134345929899510000000000}$

is accurate to all digits shown.

The rest of this post provides details, presents a working implementation (in `R`), and concludes with some comments about the question and the efficiency of the solution.

Generating random shuffles of a deck

It is actually conceptually clearer and no more complicated mathematically to consider a "deck" (aka multiset) of $N = k_1+k_2+\cdots+k_m$ cards of which there are $k_1$ of the lowest denomination, $k_2$ of the next lowest, and so on. (The question as asked concerns the deck determined by the $13$-vector $(4,4,\ldots,4)$.)

A "random shuffle" of $N$ cards is one permutation taken uniformly at random from the $N! = N\times(N-1)\times\cdots\times 2\times 1$ permutations of the $N$ cards. These shuffles fall into groups of equivalent configurations because permuting the $k_1$ "aces" among themselves changes nothing, permuting the $k_2$ "twos" among themselves also changes nothing, and so on. Therefore each group of permutations that look identical when the suits of the cards are ignored contains $k_1!\times k_2!\times \cdots \times k_m!$ permutations. These groups, whose number is therefore given by the multinomial coefficient

$\binom{N}{k_1,k_2,\ldots,k_m} = \frac{N!}{k_1!k_2!\cdots k_m!},$

are called "combinations" of the deck.

There is another way to count the combinations. The first $k_1$ cards can form only $k_1!/k_1! = 1$ combination. They leave $k_1+1$ "slots" between and around them in which the next $k_2$ cards can be placed. We could indicate this with a diagram where "$*$" designates one of the $k_1$ cards and "$\_$" designates a slot that can hold between $0$ and $k_2$ additional cards:

$\underbrace{\_*\_*\_\cdots\_*\_}_{k_1\text{ stars}}$

When $k_2$ additional cards are interspersed, the pattern of stars and new cards partitions the $k_1+k_2$ cards into two subsets. The number of distinct such subsets is $\binom{k_1+k_2}{k_1,k_2} = \frac{(k_1+k_2)!}{k_1!k_2!}$.

Repeating this procedure with $k_3$ "threes," we find there are $\binom{(k_1+k_2)+k_3}{k_1+k_2,k_3}= \frac{(k_1+k_2+k_3)!}{(k_1+k_2)!k_3!}$ ways to intersperse them among the first $k_1+k_2$ cards. Therefore the total number of distinct ways to arrange the first $k_1+k_2+k_3$ cards in this manner equals

$1\times\frac{(k_1+k_2)!}{k_1!k_2!}\times\frac{(k_1+k_2+k_3)!}{(k_1+k_2)!k_3!} = \frac{(k_1+k_2+k_3)!}{k_1!k_2!k_3!}.$

After finishing the last $k_m$ cards and continuing to multiply these telescoping fractions, we find that the number of distinct combinations obtained equals the total number of combinations as previously counted, $\binom{N}{k_1,k_2,\ldots,k_m}$. Therefore we have overlooked no combinations. That means this sequential process of shuffling the cards correctly captures the probabilities of each combination, assuming that at each stage each possible distinct way of interspersing the new cards among the old is taken with uniformly equal probability.
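The telescoping product can be checked numerically for any small deck. The sketch below uses an arbitrary example vector `k` (an assumption made for illustration) and compares the stage-by-stage count with the multinomial coefficient.

```r
k <- c(2, 3, 4)   # example deck: two aces, three twos, four threes

# Stage-by-stage count: choose(k1 + ... + kj, kj) ways to intersperse
# the j-th group among the cards already placed.
sequential <- prod(choose(cumsum(k), k))

# Direct count of combinations: N! / (k1! k2! ... km!)
direct <- factorial(sum(k)) / prod(factorial(k))

c(sequential = sequential, direct = direct)
```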

The place process

Initially, there are $k_1$ aces and obviously the very first is marked. At later stages there are $n = k_1 + k_2 + \cdots + k_{j-1}$ cards, the place (if a marked card exists) equals $p$ (some value from $1$ through $n$ ), and we are about to intersperse $k=k_j$ cards around them. We can visualize this with a diagram like

$\underbrace{\_*\_*\_\cdots\_*\_}_{p-1\text{ stars}}\odot\underbrace{\_*\_\cdots\_*\_}_{n-p\text{ stars}}$

where " $\odot$ " designates the currently marked symbol. Conditional on this value of the place $p$ , we wish to find the probability that the next place will equal $q$ (some value from $1$ through $n+k$ ; by the rules of the game, the next place must come after $p$ , whence $q\ge p+1$ ). If we can find how many ways there are to intersperse the $k$ new cards in the blanks so that the next place equals $q$ , then we can divide by the total number of ways to intersperse these cards (equal to $\binom{n+k}{k}$ , as we have seen) to obtain the transition probability that the place changes from $p$ to $q$ . (There will also be a transition probability for the place to disappear altogether when none of the new cards follow the marked card, but there is no need to compute this explicitly.)

Let's update the diagram to reflect this situation:

$\underbrace{\_*\_*\_\cdots\_*\_}_{p-1\text{ stars}}\odot\underbrace{**\cdots*}_{s\text{ stars}}\ \vert\ \underbrace{\_*\_\cdots\_*\_}_{n-p-s\text{ stars}}$

The vertical bar " $\vert$ " shows where the first new card occurs after the marked card: no new cards may therefore appear between the $\odot$ and the $\vert$ (and therefore no slots are shown in that interval). We do not know how many stars there are in this interval, so I have just called it $s$ (which may be zero). The unknown $s$ will disappear once we find the relationship between it and $q$.

Suppose, then, we intersperse $j$ new cards around the stars before the $\odot$ and then--independently of that--we intersperse the remaining $k-j-1$ new cards around the stars after the $\vert$ . There are

$\tau_{n,k}(s,p) = \binom{(p-1)+j}{j}\binom{(n-p-s) + (k-j)-1}{k-j-1}$

ways to do this. Notice, though--this is the trickiest part of the analysis--that the place of $\vert$ equals $p+s+j+1$ because

There are $p$ "old" cards at or before the mark.
There are $s$ old cards after the mark but before $\vert$ .
There are $j$ new cards before the mark.
There is the new card represented by $\vert$ itself.

Thus, $\tau_{n,k}(s,p)$ gives us information about the transition from place $p$ to place $q=p+s+j+1$ . When we track this information carefully for all possible values of $s$ , and sum over all these (disjoint) possibilities, we obtain the conditional probability of place $q$ following place $p$ ,

${\Pr}_{n,k}(q|p) = \left(\sum_j \binom{p-1+j}{j}\binom{n+k-q}{k-j-1}\right) / \binom{n+k}{k}$

where the sum starts at $j=\max(0, q-(n+1))$ and ends at $j=\min(k-1, q-(p+1))$. (The variable length of this sum suggests there is unlikely to be a closed formula for it as a function of $n, k, q,$ and $p$, except in special cases.)
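As a worked check of this formula (an illustrative calculation, not part of the derivation itself), take the first transition for the standard deck: $n=4$ aces with the mark at $p=1$, and $k=4$ twos to be interspersed. Since $\binom{p-1+j}{j}=1$ when $p=1$, the numerators for $q=2,3,\ldots,8$ are

$\sum_j \binom{8-q}{3-j} = 20,\ 20,\ 14,\ 8,\ 4,\ 2,\ 1 \quad (q = 2, 3, \ldots, 8),$

which sum to $69$. Dividing by $\binom{4+4}{4}=70$ gives a total probability of $69/70$ that some two is marked, leaving $1/70$ for the event that no two follows the first ace. This matches the value $0.01428571$ reported for "2" in the output of the implementation.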

The algorithm

Initially there is probability $1$ that the place will be $1$ and probability $0$ it will have any other possible value in $2, 3, \ldots, k_1$ . This can be represented by a vector $p_1 = (1, 0, \ldots, 0)$ .

After interspersing the next $k_2$ cards, the vector $p_1$ is updated to $p_2$ by multiplying it (on the left) by the transition matrix $(\Pr_{k_1,k_2}(q|p), 1\le p\le k_1, 1\le q\le k_1+k_2)$. This is repeated until all $k_1+k_2+\cdots+k_m$ cards have been placed. At each stage $j$, the sum of the entries in the probability vector $p_j$ is the chance that some card has been marked. Whatever remains to make the value equal to $1$ therefore is the chance that no card is left marked after step $j$. The successive differences in these values therefore give us the probability that we could not find a card of type $j$ to mark: that is the probability distribution of the value of the card we were looking for when the deck runs out at the end of the game.

Implementation

The following R code implements the algorithm. It parallels the preceding discussion. First, calculation of the transition probabilities is performed by t.matrix (without normalization with the division by $\binom{n+k}{k}$ , making it easier to track the calculations when testing the code):

t.matrix <- function(q, p, n, k) {
  # Range of j allowed by the limits of the sum in the text
  j <- max(0, q-(n+1)):min(k-1, q-(p+1))
  return (sum(choose(p-1+j,j) * choose(n+k-q, k-1-j)))
}

This is used by transition to update $p_{j-1}$ to $p_j$ . It calculates the transition matrix and performs the multiplication. It also takes care of computing the initial vector $p_1$ if the argument p is an empty vector:

#
# `p` is the place distribution: p[i] is the chance the place is `i`.
#
transition <- function(p, k) {
  n <- length(p)
  if (n==0) {
    q <- c(1, rep(0, k-1))
  } else {
    #
    # Construct the transition matrix.
    #
    t.mat <- matrix(0, nrow=n, ncol=(n+k))
    #dimnames(t.mat) <- list(p=1:n, q=1:(n+k))
    for (i in 1:n) {
      t.mat[i, ] <- c(rep(0, i), sapply((i+1):(n+k), 
                                        function(q) t.matrix(q, i, n, k)))
    }
    #
    # Normalize and apply the transition matrix.
    #
    q <- as.vector(p %*% t.mat / choose(n+k, k))
  }
  names(q) <- 1:(n+k)
  return (q)
}

We can now easily compute the non-mark probabilities at each stage for any deck:

#
# `k` is an array giving the numbers of each card in order;
# e.g., k = rep(4, 13) for a standard deck.
#
# NB: the *complements* of the p-vectors are output.
#
game <- function(k) {
  p <- numeric(0)
  q <- sapply(k, function(i) 1 - sum(p <<- transition(p, i)))
  names(q) <- names(k)
  return (q)
}

Here they are for the standard deck:

k <- rep(4, 13)
names(k) <- c("A", 2:9, "T", "J", "Q", "K")
(g <- game(k))

The output is

         A          2          3          4          5          6          7          8          9          T          J          Q          K 
0.00000000 0.01428571 0.09232323 0.25595013 0.46786622 0.66819134 0.81821790 0.91160622 0.96146102 0.98479430 0.99452614 0.99818922 0.99944610

According to the rules, if a king was marked then we would not look for any further cards: this means the value of $0.9994461$ has to be increased to $1$ . Upon doing so, the differences give the distribution of the "number you will be on when the deck runs out":

> g[13] <- 1; diff(g)
          2           3           4           5           6           7           8           9           T           J           Q           K 
0.014285714 0.078037518 0.163626897 0.211916093 0.200325120 0.150026562 0.093388313 0.049854807 0.023333275 0.009731843 0.003663077 0.001810781

(Compare this to the output I report in a separate answer describing a Monte-Carlo simulation: they appear to be the same, up to expected amounts of random variation.)
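That comparison can be made explicit. The following sketch copies the simulated frequencies and standard errors from the simulation answer and the exact probabilities printed above, then expresses each difference in standard-error units; the variable names are illustrative.

```r
cards <- c(2:9, "T", "J", "Q", "K")

# Simulated frequencies and standard errors (from the Monte Carlo answer)
freq <- c(0.01453, 0.07795, 0.1637, 0.2104, 0.1995, 0.1509,
          0.09534, 0.04995, 0.02249, 0.01009, 0.00345, 0.00173)
se   <- c(0.00038, 0.00085, 0.0012, 0.0013, 0.0013, 0.0011,
          0.00093, 0.00069, 0.00047, 0.00032, 0.00019, 0.00013)

# Exact probabilities from diff(g)
exact <- c(0.014285714, 0.078037518, 0.163626897, 0.211916093,
           0.200325120, 0.150026562, 0.093388313, 0.049854807,
           0.023333275, 0.009731843, 0.003663077, 0.001810781)

# Difference between simulation and exact value, in SE units
z <- setNames((freq - exact) / se, cards)
round(z, 2)
```

If the two answers agree, nearly all of these values should lie within two or three standard errors of zero.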

The expected value is immediate:

> sum(diff(g) * 2:13)
[1] 5.832589

All told, this required only a dozen lines or so of executable code. I have checked it against hand calculations for small values of $k$ (up to $3$ ). Thus, if any discrepancy becomes apparent between the code and the preceding analysis of the problem, trust the code (because the analysis may have typographical errors).

Remarks

Relationships to other sequences

When there is one of each card, the distribution is a sequence of reciprocals of whole numbers:

> 1/diff(game(rep(1,10)))
[1]      2      3      8     30    144    840   5760  45360 403200

The value at place $i$ is $i! + (i-1)!$ (starting at place $i=1$ ). This is sequence A001048 in the Online Encyclopedia of Integer Sequences. Accordingly, we might hope for a closed formula for the decks with constant $k_i$ (the "suited" decks) that would generalize this sequence, which itself has some profound meanings. (For instance, it counts sizes of the largest conjugacy classes in permutation groups and is also related to trinomial coefficients.) (Unfortunately, the reciprocals in the generalization for $k\gt 1$ are not usually integers.)
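The stated pattern is simple to confirm against the printed values. A minimal sketch using base R's `factorial`; the vector `printed` is copied from the output above.

```r
i <- 1:9
printed <- c(2, 3, 8, 30, 144, 840, 5760, 45360, 403200)

# i! + (i-1)! for i = 1, ..., 9
formula <- factorial(i) + factorial(i - 1)

isTRUE(all.equal(formula, printed))
```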

The game as a stochastic process

Our analysis makes it clear that the initial $i$ coefficients of the vectors $p_j$ , $j\ge i$ , are constant. For example, let's track the output of game as it processes each group of cards:

> sapply(1:13, function(i) game(rep(4,i)))

[[1]]
[1] 0

[[2]]
[1] 0.00000000 0.01428571

[[3]]
[1] 0.00000000 0.01428571 0.09232323

[[4]]
[1] 0.00000000 0.01428571 0.09232323 0.25595013

...

[[13]]
 [1] 0.00000000 0.01428571 0.09232323 0.25595013 0.46786622 0.66819134 0.81821790 0.91160622 0.96146102 0.98479430 0.99452614 0.99818922 0.99944610

For instance, the second value of the final vector (describing the results with a full deck of 52 cards) already appeared after the second group was processed (and equals $1/\binom{8}{4}=1/70$ ). Thus, if you want information only about the marks up through the $j^\text{th}$ card value, you only have to perform the calculation for a deck of $k_1+k_2+\cdots+k_j$ cards.

Because the chance of not marking a card of value $j$ is getting quickly close to $1$ as $j$ increases, after $13$ types of cards in four suits we have almost reached a limiting value for the expectation. Indeed, the limiting value is approximately $5.833355$ (computed for a deck of $4 \times 32$ cards, at which point double precision rounding error prevents going any further).

Timing

Looking at the algorithm applied to the $m$-vector $(k,k, \ldots, k)$, we see its timing should be proportional to $k^2$ and--using a crude upper bound--not any worse than proportional to $m^3$. By timing all calculations for $k=1$ through $7$ and $n=10$ through $30$ (writing $n$ for the number $m$ of distinct card values), and analyzing only those taking relatively long times ($1/2$ second or longer), I estimate the computation time is approximately $O(k^2 n^{2.9})$, supporting this upper-bound assessment.

One use of these asymptotics is to project calculation times for larger problems. For instance, seeing that the case $k=4, n=30$ takes about $1.31$ seconds, we would estimate that the (very interesting) case $k=1, n=100$ would take about $1.31(1/4)^2(100/30)^{2.9}\approx 2.7$ seconds. (It actually takes $2.87$ seconds.)
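The projection is a one-line scaling calculation. A sketch, taking the fitted exponents and the reference timing from the text as given (the variable names are illustrative):

```r
ref_time <- 1.31           # seconds observed for k = 4, n = 30
k_ref <- 4;  n_ref <- 30
k_new <- 1;  n_new <- 100

# Scale the reference time by (k ratio)^2 * (n ratio)^2.9
projected <- ref_time * (k_new / k_ref)^2 * (n_new / n_ref)^2.9
round(projected, 1)
```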

— whuber

0

Hacked a simple Monte Carlo in Perl and found approximately $5.8329$ .

#!/usr/bin/perl

use strict;

my @deck = (1..13) x 4;

my $N = 100000; # Monte Carlo iterations.

my $mean = 0;

for (my $i = 1; $i <= $N; $i++) {
    my @d = @deck;
    fisher_yates_shuffle(\@d);
    my $last = 0;
    foreach my $c (@d) {
        if ($c == $last + 1) { $last = $c }
    }
    $mean += ($last + 1) / $N;
}

print $mean, "\n";

sub fisher_yates_shuffle {
    my $array = shift;
    my $i = @$array;
    while (--$i) {
        my $j = int rand($i + 1);
        @$array[$i, $j] = @$array[$j, $i];
    }
}

— Zen

Given the sharp discrepancy between this and all the previous answers, including two simulations and a theoretical (exact) one, I suspect you are interpreting the question in a different way. In the absence of any explanation on your part, we just have to take it as being wrong. (I suspect you may be counting one less, in which case your 4.8 should be compared to 5.83258...; but even then, your two significant digits of precision provide no additional insight into this problem.)

— whuber

1

Yep! There was an off-by-one mistake.

— Zen