13.7 The R Statistical Analysis Tool

R is an open-source statistical analysis tool that is widely used in the finance industry. It provides an integrated suite of functions for data analysis, graphing, and statistical programming. Because R is open source, the user community constantly adds new features, and its use for data analysis and statistics continues to grow. R runs on many different computing platforms and can be downloaded at The R Project for Statistical Computing (https://openstax.org/r/The-R-Project).

Key Terms

arithmetic mean: a measure of center of a data set, calculated by adding up the data values and dividing the sum by the number of data values
bar graph: a chart that presents categorical data in a summarized form based on frequency or relative frequency
bivariate data: paired data in which each value of one variable is paired with a value of a second variable
data visualization: the use of graphical displays, such as bar charts, histograms, and scatter plots, to help interpret patterns and trends in a data set
empirical rule: a rule that provides the percentages of data values falling within one, two, and three standard deviations from the mean for a bell-shaped (normal) distribution
expected value: a weighted average of the values of a variable where the weights are the associated probabilities
exponential distribution: a continuous probability distribution that is useful for calculating probabilities involving the time between events
frequency distribution: a method of organizing and summarizing a data set that provides the frequency with which each value in the data set occurs
geometric mean: a measure of center of a data set, calculated by multiplying the data values and then raising the product to the exponent 1/n, where n is the number of data values
histogram: a graphical display of continuous data showing class intervals on the horizontal axis and frequency or relative frequency on the vertical axis
interquartile range (IQR): a number that indicates the spread of the middle half, or middle 50%, of the data; the difference between the third quartile (Q3) and the first quartile (Q1)
median: the middle value in an ordered data set
mode: the most frequently occurring data value in a data set
normal distribution: a bell-shaped distribution curve that is used to model many measurements, including IQ scores, salaries, heights, weights, blood pressures, etc.
outliers: data values that are significantly different from the other data values in a data set
percentiles: numbers that divide an ordered data set into hundredths; often used to indicate position of a data value in a data set
population data: data representing all the outcomes or measurements that are of interest
portfolio: a collection of financial investments, such as stocks, bonds, mutual funds, certificates of deposit, etc.
probability distribution: a mathematical function that assigns probabilities to various outcomes
quartiles: numbers that divide an ordered data set into quarters; the second quartile is the same as the median
sample data: data representing outcomes or measurements collected from a subset or part of a population
scatter plot (or scatter diagram): a graphical display that shows the relationship between a dependent variable and an independent variable
standard deviation: a measure of the spread of a data set that indicates how far a typical data value is from the mean
time series graph: a graphical display used to show measurement data plotted versus time, where time is displayed on the horizontal axis
variance: the measure of the spread of data values calculated as the square of the standard deviation
weighted mean: a measure of center of a data set in which each data value has a corresponding weighting
x-axis: the horizontal axis in a rectangular coordinate system
y-axis: the vertical axis in a rectangular coordinate system
z-score (or z-value): a measure of the position of a data value in the data set, calculated by subtracting the mean from the data value and then dividing the difference by the standard deviation

CFA Institute

This chapter supports some of the Learning Outcome Statements (LOS) in this CFA® Level I Study Session (https://openstax.org/r/cfa-level1-study-session). Reference with permission of CFA Institute.

Multiple Choice

1. A data set of salaries contains an outlier salary. The best measure of center to use for this data set is the ________.
   a. mean
   b. median
   c. mode
   d. standard deviation

2. A portfolio includes shares of United Airlines stock that were purchased at different times and different prices. Which measure is best to determine the average cost of the shares of the stock?
   a. mean
   b. median
   c. weighted mean
   d. standard deviation

3. Standard deviation is a measure of the ________.
   a. center of a data set
   b. position of a data value in a data set
   c. area under a normal curve
   d. spread of a distribution

4. How are standard deviation and variance related?
   a. The two measures are equal to one another.
   b. Variance is the square root of the standard deviation.
   c. Standard deviation is the square root of the variance.
   d. The two squared measures are equal to one another.

5. Which of the following is the best definition of a z-score?
   a. the distance of a data value from the mean
   b. the number of standard deviations that a data value is from the mean
   c. the distance of a data value from the mean divided by the sample size
   d. the number of quartiles that a data value is from the mean

6. The results of a standardized test indicate that you are in the 85th percentile. What is the best interpretation of this result?
   a. You scored in the top 85% of all students taking the test.
   b. You scored in the top 15% of all students taking the test.
   c. Your score on the test is 85 when measured on a scale from 0 to 100.
   d. You scored in the bottom 15% of all students taking the test.

7. The interquartile range is ________.
   a. the middle 50% of a data set
   b. the upper 50% of a data set
   c. the lower 50% of a data set
   d. equal to the median

8. In a frequency distribution table, the sum of the relative frequencies must be equal to ________.
   a. the sample size
   b. 1, or 100%
   c. zero
   d. the standard deviation of the distribution

9. A change in the standard deviation of a normal distribution will result in ________.
   a. a change in the location of the peak of the curve
   b. a change in the area under the curve
   c. a change in the shape of the curve
   d. a change that shifts the graph to the left or the right

10. When calculating an expected value, ________.
   a. the result should always be 1
   b. the result should always be a positive value
   c. the result should always be a negative value
   d. the result can be a positive or negative value

11. The area under a normal curve between a z-score of -2 and a z-score of +2 is ________.
   a. 0.68
   b. 0.95
   c. 0.997
   d. dependent on the mean and standard deviation

12. A scatter plot is a visualization for ________.
   a. univariate data only
   b. bivariate data only
   c. either univariate or bivariate data
   d. test scores

13. Which of the following is NOT a benefit of using the R statistical analysis tool?
   a. Additional features are constantly being added by the user community.
   b. It can be used on many computer platforms, including Mac, Windows, and Linux.
   c. It is free to download.
   d. Users pay an annual subscription fee.

Review Questions

1. Explain the considerations that determine whether the mean or the median is the best measure of central tendency for a data set.
2. Explain the difference between a mean and a weighted mean.
3. Explain why the standard deviation of a data set cannot be a negative value.
4. Explain what a negative z-score, a positive z-score, and a z-score of zero imply.
5. Explain how quartiles can be used to detect outliers in a data set.

Problems

1. You purchased 1,000 shares of a stock for $12 per share. Then, two months later, you purchased an additional 500 shares of the same stock at $9 per share. Calculate the weighted mean of the purchase price for the total of 1,500 shares.
2. You score a 60 on a biology test. The mean test grade is 70, and the standard deviation is 5. Calculate and interpret your corresponding z-score.
3. You score a 60 on a biology test. The mean test grade is 70, and the standard deviation is 5. What percentile does your grade correspond to?
4. A fast-food restaurant has measured service time for customers waiting in line, and the service time follows an exponential distribution with a mean waiting time of 1.9 minutes. The restaurant has a guarantee that if customers wait in line for more than 5 minutes, their meal is free. What is the probability that a customer will receive a free meal?
5. The total value of your portfolio consists of approximately 65% stock assets, 25% bonds, and 10% cash equivalents. Historical returns have shown that stocks provide a return of 12%, bonds provide a return of 3.5%, and cash savings provide a return of 1.5%. What is the expected value of the return on this portfolio?
6. The distribution of the average annual return of the S&P 500 over a 50-year time period follows a normal distribution with a mean rate of return of 10.5% and a standard deviation of 14.3%. What is the probability that an average annual return will fall between -3.8% and 24.8%?
7. Write a short R program to find the expected return for the data set in Table 13.16.

   Historical Return on United Airlines Stock    Associated Probability
   12%                                           15%
   5%                                            35%
   2%                                            25%
   -5%                                           14%
   -10%                                          11%

   Table 13.16

Video Activity

Normal Distribution Stock Return Calculations (https://openstax.org/r/stock-return)

1. Assume the return on stocks follows a normal distribution. Is it more likely that a stock will return between -1 and +1 standard deviations from the mean or between -2 and +2 standard deviations from the mean? Why?
2. Would an investor be likely to prefer a stock that has a smaller standard deviation for annual stock returns or one with a larger standard deviation for annual stock returns? Why?

Portfolio Weights (https://openstax.org/r/port-weights)

3. What are the reasons for calculating portfolio weights? What useful information does this provide to the investor?
4. What are the advantages and disadvantages of the equal weighting approach and the market cap weighting approach for portfolio allocation strategy?

Chapter 13, Statistical Analysis in Finance. Access for free at openstax.org.
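
The Key Terms definitions of weighted mean, geometric mean, z-score, and expected value each reduce to a line or two of arithmetic. The following is a minimal sketch of those four definitions, written in Python rather than the R tool the chapter describes and using only the standard library. The function names and every input number are hypothetical illustrations; none of them come from the chapter's problems or from Table 13.16.

```python
import math


def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def geometric_mean(values):
    """Product of the n data values raised to the exponent 1/n."""
    return math.prod(values) ** (1 / len(values))


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev


def expected_value(outcomes, probabilities):
    """Weighted average of outcomes, weighted by their probabilities."""
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(outcomes, probabilities))


# Hypothetical inputs, chosen only to exercise each function.
print(weighted_mean([10.0, 15.0], [200, 300]))   # prices weighted by share counts
print(geometric_mean([2, 8]))                    # two positive data values
print(z_score(82, 75, 3.5))                      # data value, mean, standard deviation
print(expected_value([0.08, 0.02, -0.04], [0.5, 0.3, 0.2]))  # returns and probabilities
```

The probability check in `expected_value` is a design choice of this sketch, not something the chapter prescribes: it guards against a probability column that does not total 100% before the weighted sum is taken.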