### Abstract

This paper is a self-contained study of the probability that people in a group share a birthday. It defines the problem, analyses the collected data, and sets out the key findings in a structured discussion of the birthday paradox.

The following sections evaluate the results of the data analysis to determine whether the data conforms to the “probability theory” of birthdays. The theory concerns the chance that, in a set of people chosen at random, at least one pair has birthdays falling on the same day. By the Dirichlet (pigeonhole) principle, that chance reaches 100% only when the group is larger than the number of possible birthdays: 367 participants are needed if all 366 days of a leap year are allowed. The probabilities quoted for smaller groups additionally assume that each day of the year has an equal chance of holding a birthday.

The paper is organized so that the theme is covered in full.

### Research Methods

The data for this project was collected from seven groups of students, who were all asked to give their birthdays for the purpose of the research. Two datasets were used: the junior school group and the senior school group. Within these datasets, the students were divided into year groups. In the junior school dataset the year groups were “Foundation 1”, “Foundation 2”, and “Year 1” to “Year 6”, covering students aged 4 to 10. In the senior school dataset the year groups ran from “Year 7” to “Year 13”, covering students aged 12 to 18.

The first group in the senior school consisted of forty-nine students drawn at random from the Y10 category, aged 15 to 16. The second group, Y11, consisted of fifty students from different year groups. For each group, the analysis of the birthdays gave the probability that at least two individuals in the group shared a birthday:

| Group | Year group | Participants | Probability of a shared birthday |
| --- | --- | --- | --- |
| 1 | Y10 | 49 | 97% |
| 2 | Y11 | 50 | 97% |
| 3 | Y12 | 42 | 80% |
| 4 | Y6 | 21 | 50% |
| 5 | Y7 | 54 | 97% |
| 6 | Y8 | 52 | 98% |
| 7 | Y9 | 70 | 99% |

These results formed the basis of the theoretical analysis of the “Birthday Paradox”.

The birthday paradox is the seemingly paradoxical statement that, in a group of 23 or more people, the probability that at least two members share a birthday (day and month) exceeds 50%. For 60 or more people the probability of such a coincidence exceeds 99%, although by the Dirichlet (pigeonhole) principle it reaches 100% only when the group contains at least 367 people (taking leap years into account).

The statement can seem counter-intuitive, because the probability that two given people share a birthday (1/365 = 0.27%) multiplied by the 23 people in the group gives only 23/365 = 6.3%. This reasoning is incorrect, because the number of possible pairs (253) considerably exceeds the number of people in the group. The statement is therefore not a paradox in the strict scientific sense: it contains no logical contradiction, and the paradox lies only in the gap between a person's intuitive perception of the situation and the result of the mathematical calculation.

### Data Research

The analysis of the data involved computing the approximate probability that at least two individuals in a group of N people share a birthday. The whole analysis assumes that each day of the year is equally likely to be a birthday. This is only an approximation, since real-life birthdays are not uniformly distributed. Consider a group of 20 people and let P(A) be the probability that at least two members of the group share a birthday. The probability P(A') that no two members share a birthday is then 1 − P(A). For this research data, P(A') can be described through 20 events, one per person, where P(k) is the probability that the birthday of person k differs from those of all the people considered before; the chance of all these events occurring equals the product of the individual probabilities. For the group of twenty people it is calculated as follows:

$$P(A') = P(1) \cdot P(2) \cdot P(3) \cdot P(4) \cdots P(20)$$

One way to see intuitively why the probability of a shared birthday is so high in a group of 23 people is to note that the statement concerns any two people in the group, so what matters is the number of pairs that can be formed from the 23 people. Since the order within a pair does not matter, the total number of pairs equals the number of combinations of 2 out of 23, that is 23 × 22/2 = 253 pairs. With this many pairs, it is easy to accept that the probability of at least one pair sharing a birthday is fairly high.
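The pair count can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and the last estimate treats the 253 pairs as if they were independent, which is a heuristic that is not part of the original argument and is used here only to show the order of magnitude.

```python
from math import comb

group_size = 23
pairs = comb(group_size, 2)      # 23 * 22 / 2 unordered pairs

# Naive estimate criticised in the text: one comparison per person.
naive = group_size / 365         # about 0.063

# Heuristic (assumption): every pair is an independent 1/365 chance.
pair_estimate = 1 - (364 / 365) ** pairs

print(pairs, round(naive, 3), round(pair_estimate, 3))
```

The heuristic lands close to one half, which is why counting pairs rather than people gives the right intuition.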

The key point is that the birthday paradox speaks of a coincidence of birthdays between any two members of a group. A common mistake is to confuse this with another case, similar at first sight, in which one person is chosen from the group and one estimates the probability that somebody else in the group shares the birthday of the chosen person. In the latter case the probability of a coincidence is much lower.

The calculation of the probability that a group of n people contains at least two shared birthdays assumes that birthdays are uniformly distributed: there are no leap years and no twins, and the birth rate does not depend on the day of the week, the season, or other factors. In reality this is not quite so: usually more children are born in summer, and in some countries more children are born on certain days of the week because of the way hospitals work. However, a non-uniform distribution can only increase the probability of shared birthdays, not reduce it: if all people were born on only 3 days out of 365, the probability of shared birthdays would be very high.
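The effect of a non-uniform distribution can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than part of the study: the function name `estimate_shared`, the number of trials, and the seed are all hypothetical choices, and the skewed distribution is the extreme "3 days out of 365" case mentioned above.

```python
import random

def estimate_shared(n, weights, trials=5000, seed=0):
    """Monte Carlo estimate of P(at least one shared birthday) among n people."""
    rng = random.Random(seed)
    days = range(len(weights))
    hits = 0
    for _ in range(trials):
        sample = rng.choices(days, weights=weights, k=n)
        if len(set(sample)) < n:   # fewer distinct days than people
            hits += 1
    return hits / trials

uniform = [1] * 365                  # every day equally likely
three_days = [1, 1, 1] + [0] * 362   # everyone born on one of 3 days

uniform_est = estimate_shared(23, uniform)
three_day_est = estimate_shared(23, three_days)
print(uniform_est, three_day_est)
```

With only three possible days, 23 people can never all have different birthdays, so the second estimate is exactly 1; the uniform estimate should stay near one half.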

First, let us calculate the probability $\bar{p}(n)$ that the birthdays of all n people in a group are different.

If n > 365, then by the Dirichlet principle this probability is zero. If n ≤ 365, we argue as follows. Take one person from the group at random and note his birthday. Then take a second person at random: the probability that his birthday does not coincide with the birthday of the first person is 1 − 1/365. Then take a third person: the probability that his birthday does not coincide with the birthdays of the first two is 1 − 2/365. Continuing by analogy, we reach the last person, whose birthday differs from all the previous ones with probability 1 − (n − 1)/365. Multiplying all these probabilities gives the probability $\bar{p}(n)$ that all birthdays in the group are different:

$$\bar{p}(n) = \left(1 - \frac{1}{365}\right)\left(1 - \frac{2}{365}\right)\cdots\left(1 - \frac{n-1}{365}\right) = \frac{365!}{365^{n}\,(365-n)!}$$

Therefore, the probability that the birthdays of at least two people coincide is:

$$p(n) = 1 - \left(1 - \frac{1}{365}\right)\left(1 - \frac{2}{365}\right)\cdots\left(1 - \frac{n-1}{365}\right)$$
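Written out, the probability of at least one coincidence is one minus the product of the factors 1 − k/365 for k = 1, …, n − 1. A minimal Python sketch of that calculation follows; the function name `p_shared` and the `days` parameter are illustrative, and the model assumes a 365-day year with equally likely birthdays.

```python
def p_shared(n, days=365):
    """Probability that at least two of n people share a birthday."""
    if n > days:
        return 1.0               # pigeonhole: more people than days
    all_different = 1.0
    for k in range(n):
        all_different *= (days - k) / days
    return 1.0 - all_different

# Smallest group size for which a shared birthday is more likely than not.
threshold = next(n for n in range(1, 366) if p_shared(n) > 0.5)
print(threshold, round(p_shared(23), 3))   # prints: 23 0.507
```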

The value of p(n) first exceeds 1/2 at n = 23, where the probability of a coincidence is approximately 50.7%. The probabilities for some values of n are shown in the following table:

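| n | p(n) |
| --- | --- |
| 10 | 12% |
| 20 | 41% |
| 30 | 70% |
| 50 | 97% |
| 100 | 99.99997% |
| 200 | 99.9999999999999999999999999998% |
| 300 | $(1 - 6 \times 10^{-82}) \times 100\%$ |
| 350 | $(1 - 3 \times 10^{-131}) \times 100\%$ |
| 367 | 100% |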
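For large n the probability is so close to 100% that ordinary floating-point arithmetic rounds it to exactly 1. One way to inspect the remaining gap is to compute the complement, the probability that all birthdays differ, with exact rational arithmetic. The sketch below assumes the 365-day uniform model; the function name `all_different_exact` is illustrative.

```python
from fractions import Fraction

def all_different_exact(n, days=365):
    """Exact probability that n birthdays are all different."""
    prob = Fraction(1)
    for k in range(n):
        prob *= Fraction(days - k, days)
    return prob

# Print the gap between p(n) and 100% for a few large groups.
for n in (200, 300, 350):
    print(n, f"{float(all_different_exact(n)):.1e}")
```

Converting the exact fraction to a float only at the end keeps the tiny values from being lost to rounding.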

When person number two is analysed, one person has already been analysed. The probability that the birthday of person number two falls on a different date from that of person number one is therefore 364/365: it must fall on one of the 364 remaining days of the year, because the birthday of person number one has been removed from the count. The same applies to person number three, for whom two days have already been taken by persons one and two, giving 363/365. Continuing up to person number twenty, the probability that this person does not share a birthday with any of the nineteen individuals analysed previously is 346/365. P(A') equals the product of these individual probabilities up to that of person number twenty:

$$P(A') = \frac{365}{365} \cdot \frac{364}{365} \cdot \frac{363}{365} \cdots \frac{346}{365} \approx 0.589$$

so that P(A) = 1 − P(A') ≈ 0.411, or about 41%, in agreement with the value for n = 20 in the table above.
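The same product can be evaluated for the group sizes reported in the Research Methods section, so that the theoretical values can be set against the reported percentages. This is only a sketch under the 365-day uniform model; the function name and the dictionary layout are illustrative, while the group sizes are those given in the text.

```python
def p_shared(n, days=365):
    """Probability that at least two of n people share a birthday."""
    all_different = 1.0
    for k in range(n):
        all_different *= (days - k) / days
    return 1.0 - all_different

# Group sizes as reported in the Research Methods section.
group_sizes = {"Y10": 49, "Y11": 50, "Y12": 42, "Y6": 21,
               "Y7": 54, "Y8": 52, "Y9": 70}

theoretical = {name: p_shared(n) for name, n in group_sizes.items()}
for name, value in theoretical.items():
    print(f"{name}: n = {group_sizes[name]}, p = {value:.1%}")

p_twenty = p_shared(20)   # the worked example for twenty people
```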

For comparison, let q(n) be the probability that, in a group of n people, at least one other member shares the birthday of one member chosen in advance, so that $q(n) = 1 - (364/365)^{n-1}$. Substituting n = 23 gives q(n) of approximately 5.9%, only slightly better than one chance in 17. For the probability of a match with the chosen person to exceed 50%, that person must be compared with no fewer than 253 other people. This number is appreciably more than half the days in a year (365/2 = 182.5), because the birthdays of the other members of the group can coincide among themselves, which reduces the probability that one of them coincides with the birthday of the chosen person.
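The figures for a match with one pre-chosen person can be checked with a short sketch. It assumes, as an interpretation of the 5.9% figure, that a group of 23 means the chosen person plus 22 others; the function name `q_same_as_chosen` is illustrative.

```python
def q_same_as_chosen(others, days=365):
    """Probability that at least one of `others` people matches a fixed birthday."""
    return 1 - ((days - 1) / days) ** others

q_23 = q_same_as_chosen(22)   # group of 23: the chosen person plus 22 others

# Smallest number of other people giving better than even odds of a match.
needed = next(m for m in range(1, 1000) if q_same_as_chosen(m) > 0.5)
print(round(q_23, 3), needed)   # prints: 0.059 253
```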

### Analysis of the Junior Dataset

For the junior category, which comprised a total of 551 students, the analysis is essentially the same, only with more factors. The probability that the first two students have different birthdays is 365/365 × 364/365. This value is close to 1, because almost the whole year is still free. It falls as further students are added: for the first four students the probability that all birthdays differ is 365/365 × 364/365 × 363/365 × 362/365. The probability that no two students in the whole group share a birthday is the product of all such factors. Because 551 exceeds the number of days in a year, the factor for the 366th student is 0/365, so the product is zero and a shared birthday somewhere in the junior dataset is certain.

### Analysis of the Senior Dataset

For the senior category, which comprised a total of 345 students, the analysis is the same as for the junior group. The probability that the first two students have different birthdays is 365/365 × 364/365, which is close to 1 because almost the whole year is still free. It falls as further students are added: for the first three students it is 365/365 × 364/365 × 363/365. The probability that no two students in the whole group share a birthday is the product of all such factors. Because 345 is fewer than 365, this product is not exactly zero, but it is so small that a shared birthday somewhere in the senior dataset is practically certain.
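For datasets of this size the probability that all birthdays differ is too small to read comfortably as a percentage, so it can help to look at its order of magnitude instead. The sketch below works in logarithms under the 365-day uniform model from the Data Research section; the function name `log10_all_different` is illustrative, and the dataset sizes are those given in the text.

```python
from math import lgamma, log

def log10_all_different(n, days=365):
    """log10 of P(all n birthdays differ); -inf when n exceeds days."""
    if n > days:
        return float("-inf")     # pigeonhole: a shared birthday is certain
    # ln( days! / ((days - n)! * days**n) ), using lgamma(x + 1) = ln(x!)
    ln_p = lgamma(days + 1) - lgamma(days - n + 1) - n * log(days)
    return ln_p / log(10)

for name, n in (("junior", 551), ("senior", 345)):
    print(name, n, log10_all_different(n))
```

A value of minus infinity corresponds to a probability of exactly zero that all birthdays differ, while a large negative finite value corresponds to a shared birthday that is practically, but not strictly, certain.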