One of the most important functions of the working statistician/mathematician is to independently investigate a particular advanced topic and then distill it down to its most fundamental elements so that it can be easily and effectively disseminated to others.
The purpose of this project is two-fold: to demonstrate semester-long cumulative knowledge of the theory of mathematical statistics, and to investigate the framework and application of a particular statistical method or procedure. As part of your investigation, you should apply the method to a particular data set to answer a research question of interest.
You will work either individually or in small groups to research a topic in mathematical statistics beyond the scope of what we’ve covered in class, and then share your results in a 5 - 10 page research paper.
This is an opportunity to either investigate an area of interest that isn’t covered in class this semester, or to dive deeper into a particular topic that was only dealt with briefly in class. A list of possible topics can be found at the end of this document.
Your final research paper should accomplish each of the following:
Discuss any necessary background information needed to understand your topic.
Provide context, motivation, and history indicating how your topic fits into the field of mathematical statistics.
Incorporate significant material from one or more references beyond our class textbook.
Give an explicit and rigorous statement of the subject of your research, as it appears in the literature.
Present your own informal interpretation of what the topic means or represents.
Apply your method or procedure to at least one data set (either simulated or real), in order to answer a research question of interest.
Discuss and provide proof or derivation of one significant theorem or result related to your topic.
Explain how your method relates to other topics studied in our course.
Suggest related areas for further research on the topic.
A 1-page proposal for your research project is due by 11:59pm on Friday, April 14th. This proposal should outline the scope and intent of your project. Although changes to the direction of the project are possible after this point, this timeline should give both you and me the opportunity to ensure the project is appropriate for this assignment.
Only one proposal needs to be submitted per group. The proposal should include:
A list of all members of your group.
The topic your group intends to research.
A 1-2 paragraph description of the topic you wish to explore, which should describe briefly how this topic fits with other topics we’ve explored in this class, and also address why someone might be interested in your investigation.
A 1 paragraph description of data that you plan to analyze using your method. This data can either represent simulated data according to a particular model, or can be acquired from a real source. In the former case, you should describe how you plan to simulate the data. In the latter case, your group should explicitly identify the data and verify that it can be obtained and analyzed in R.
A description of several possible research questions involving the data that might be answerable using your method or procedure, along with a brief indication of why you believe your method can be used to answer these questions.
A list of at least two potential sources other than our course textbook that can be used in your investigation. Full citation is not necessary, but you should include enough information about the source so that it can be found online or in the library without too much difficulty.
The research paper is your opportunity to provide a technical and in-depth treatment of your research topic. The target audience for your paper is your classmates, who you can assume are familiar with the topics discussed throughout STA 335 - 336, have a significant level of statistical and mathematical maturity, and can follow detailed technical arguments comparable to those presented in the textbook.
However, you should not assume that your audience has any in-depth knowledge of your particular research topic, or knows the precise statements of definitions, theorems and examples relevant to your topic.
A draft of your research paper is due by 11:59pm on Friday, May 12th. The final draft of your research paper is due by 5pm on Friday, May 19th.
During finals week, your classmates will have the opportunity to review and reflect on your research paper.
Only one research paper needs to be completed and submitted per group. However, each group member is responsible for completing a 1-page project reflection, due the last day of finals week.
In addition to accomplishing the objectives listed in the Overview section above, your paper must…
Be typed single-spaced using 10 - 12 pt font.
Be at least 5 pages long, with 2 additional pages per group member beyond the first (i.e., the minimum length for a group of 3 people is 9 pages, while the minimum length for a paper authored by a single person is 5 pages).
Include a graph, diagram, chart, or some other visual aid, which can be either computer-generated or hand-drawn.
Make use of legible mathematical typesetting where appropriate (it is not necessary to use LaTeX, but any equations used should be displayed in a readable and unambiguous manner).
Include citations to at least one relevant source beyond our textbook per group member (i.e., a group of 3 must include at least 3 references other than our course textbook).
Be written using an appropriate style for a professional academic publication, with correct grammar, spelling and punctuation.
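As a restatement of the length and citation requirements above (not an additional rule), let \(n\) denote the number of group members; the symbol \(n\) is introduced here only for illustration:

\[
\text{minimum pages} = 5 + 2(n - 1) = 2n + 3, \qquad \text{minimum outside references} = n .
\]

For example, \(n = 1\) gives 5 pages and 1 reference, while \(n = 3\) gives 9 pages and 3 references, matching the cases listed above.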
You have considerable latitude in choosing your topic, and what you choose to research will depend on your background and interests. The topic you choose should be something relatively new to you (although it’s fine if you’ve seen some parts of it before, as long as the core component of the project leads you to discover something you didn’t know before).
A list of potential topics can be found below. For those topics appearing in our Probability and Statistics textbook, the relevant section is listed. Otherwise, you can contact Prof. Wells for an external source (although a good starting place is to skim the relevant article on Wikipedia).
Topics have been loosely arranged by theme, although some topics may easily fit within more than one listed theme.
You may also choose a topic not listed below, although I’d advise you to consult with Prof. Wells before delving too deeply into the topic to make sure it is of appropriate scope.
Conjugate priors using the exponential family: External source
Invariance principle for Bayesian estimates, the Jeffreys prior: External source
The principle of maximum entropy: External source
Alternative loss functions and decision rules for Bayesian estimators: External source
Posterior predictive distributions: External source
Sufficient statistics: P&S 7.7
Fisher information of a sample: P&S 8.8
Modes of convergence of sequences of random variables: External source
The delta method for asymptotic distributions: P&S 6.3
Asymptotic properties of the MLE; consistency, efficiency, and/or normality: External source
Minimum variance unbiased estimators: External source
The Neyman-Pearson Lemma for simple hypotheses: P&S 9.2
Uniformly most powerful tests using monotone likelihood ratio: P&S 9.3
Evaluating interval estimators; optimizing length and/or loss: External source
The likelihood principle: External source
The chi-squared test for goodness-of-fit: P&S 10.1
Fisher’s Exact Test for goodness-of-fit: External source
Goodness-of-fit tests for composite hypotheses: P&S 10.2
The chi-squared test for independence and homogeneity: P&S 10.3,10.4
The Kolmogorov-Smirnov Test for normality: P&S 10.6
Robust estimators: P&S 10.7
1-sample nonparametric tests: External source
2-sample nonparametric tests: External source
The EM Algorithm for approximating MLEs: P&S 7.6
The \(t\)-percentile bootstrap confidence interval: External source
Bias-corrected bootstrap confidence intervals: External source
Permutation-based hypothesis tests: External source
Importance sampling in Monte Carlo methods: External source
Markov Chain Monte Carlo, The Gibbs Sampler: P&S 12.5
Markov Chain Monte Carlo, The Metropolis-Hastings Algorithm: External source
Logistic regression for binary data: External source
Poisson regression for count data: External source
Generalized linear regression using the exponential family of distributions: External source
Two-way ANOVA: P&S 11.7, 11.8