Applications of Quantitative Methods
Sociology 740


John Fox
Department of Sociology


Sociology 740 is a second course in social statistics, with a focus on regression analysis, linear models, and generalized linear models, such as logistic regression and Poisson regression. An introductory statistics course covering the elements of descriptive statistics and statistical inference (such as Sociology 3H06 or 6Z03) is a prerequisite. Regression analysis will be developed from first principles, although I expect that the topic is at least somewhat familiar. The course will be conducted at a relatively low level of mathematical and statistical sophistication: Emphasis will be placed on the practical application of statistical methods and students will have considerable opportunity to apply these methods to real data. One of the goals of the course is to introduce students to modern statistical computing.
The class will meet on Mondays for four hours each week, 12:30-2:30 and 3:30-5:30, in room KTH712. Class time will be devoted primarily to lectures, but also to solution of homework problems, to questions, and to computing instruction. For as long in the semester as students are interested in attending it, we will have an optional computer-lab session following the class, also in room KTH712, and I will be available during this period to provide assistance with computing and other course-related issues. You can, if you wish, use the computing period to work on your homework assignments (which will be the initial focus of the labs), or you can work independently on the homework.
The course is divided into weekly topics (see the course outline below). Each topic is associated with readings, some of which are optional. All of the readings are from the two course texts, both available from the university bookstore:
In addition, I will post my lecture slides to the course web site, with links in the course outline.
Using R, you will be able to do the computing for the course in the computing lab or on your own computer.
R is a free open-source implementation of the S statistical programming language and computing environment, which has become a kind of lingua franca of statistical computing. (The commercial implementation of S, called S-PLUS, has been eclipsed by R.) R has facilities for constructing publication-quality statistical graphs and incorporates a wide range of statistical data-analysis capabilities. About 5000 contributed R 'packages,' freely available at the Comprehensive R Archive Network (CRAN) website, greatly expand the capabilities of R.
Several years ago, Ashley Vance, a New York Times technology writer, published an article on R. Although it's not entirely accurate, the article is an index of the growing influence of R. Somewhat more recently, Steve McNally of Forbes posted an article entitled "Names you need to know in 2011: R statistical analysis software."
R is command-driven: Statistical models and graphs are specified by typing statements in the S language. A point-and-click interface to some of the capabilities of R, including those that we will need for this course, is available in the Rcmdr (R Commander) package for R. I recommend that you learn and use commands, but you may use the Rcmdr interface if you wish. If you do install the Rcmdr package, make sure to read the installation notes for your operating system.
Versions of R are available for PCs running Windows, Unix workstations (including Linux), and Macintosh computers (under OS X). The latest version of R, for Windows and other platforms, is always available for download from the Internet. I recommend that you also install the RStudio interactive development environment ('IDE') for R, which, like R, is open-source and freely available on the Internet. RStudio works on Windows, Mac OS X, and Linux/Unix systems.
I have prepared instructions for downloading and installing R (along with RStudio and the R packages that you will need for the course) on Windows, Mac OS X, and Linux computers. If you attend the computing labs, you should bring your laptop with R (and preferably RStudio) installed.
There will be small weekly homework assignments, including data analysis to be carried out on the computer. Although I will supply data sets for use in these assignments, students always have the option of substituting their own data. Homework assignments will be collected and corrected, but not graded. Students are encouraged to work collectively on the homework, if collective effort facilitates their progress in the course. Homework will be collected each week and returned the next week. Ordinarily, homework that is more than one week late will not be accepted (see below).
I recommend that you prepare your homework assignments as "R Markdown" documents in RStudio. R Markdown allows you to intermix R commands and explanatory text to produce HTML (web) documents, which can then be printed. I've prepared an R Markdown homework template to help you get started, and will demonstrate its use in class. More information (and probably more than you really need to know) is available on the RStudio web site. To use R Markdown, you will need to install the knitr package for R. Students who submit all of their homework assignments using R Markdown will receive an extra two percentage points on their final grade.
There will be two straightforward in-class open-book exams, each three hours in length. The first exam will cover material in the first part of the course; the second exam will cover the remainder of the course. There will be a two-hour review session prior to each exam.
The student's homework record will contribute 20 percent of his or her final grade. Students will receive full credit for homework assignments that show reasonable effort and that are submitted on time. Homework up to one week late will receive half credit. As mentioned, students submitting all of their homework assignments using R Markdown documents will receive an additional two percent for the homework. Each exam will be worth 40 percent of the final grade. The standard grading scheme will be used to translate percentage grades on the exams into letter grades. Please note that it is possible to get a grade of A+ in this class, but I give this grade based upon my judgment rather than by mechanical translation of percentages into letter grades; in particular, a grade of 90 percent is generally not sufficient to receive an A+ in this course.
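The weighting described above can be sketched as a short calculation. This is a hypothetical illustration, not an official grade calculator: the function names are invented, and it assumes that every homework assignment counts equally, that an assignment more than a week late earns no credit, and that the R Markdown bonus adds two points to the final percentage. It does not model the instructor's judgment in awarding an A+.

```python
from __future__ import annotations

# Hypothetical sketch of the grade weighting in the syllabus.
# Assumptions (not stated explicitly): all homework assignments count
# equally, and the R Markdown bonus adds 2 points to the final percentage.

# Credit per assignment: full if on time, half if up to one week late,
# none if more than a week late (not accepted) or not submitted.
CREDIT = {"on_time": 1.0, "late": 0.5, "missing": 0.0}


def homework_credit(status: str) -> float:
    """Return the credit (0 to 1) earned by one homework assignment."""
    return CREDIT[status]


def final_percentage(homework: list[str], exam1: float, exam2: float,
                     all_rmarkdown: bool) -> float:
    """Combine homework (20%) and two exams (40% each) into a percentage."""
    hw_fraction = sum(homework_credit(s) for s in homework) / len(homework)
    total = 20 * hw_fraction + 0.4 * exam1 + 0.4 * exam2
    if all_rmarkdown:
        total += 2  # assumed form of the R Markdown bonus
    return total
```

For example, nine on-time assignments and one late one, with exam scores of 80 and 90, give 20(0.95) + 0.4(80) + 0.4(90) = 87, or 89 if every assignment was submitted in R Markdown.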
I am committed to helping students who experience difficulty with the class. If you need help, please do not hesitate to contact me. You may see me during my office hours (including by appointment), and you may contact me by email at jfox AT mcmaster.ca; by phone at 905-525-9140 x23604; by phone at home (to be disclosed in class); or by posting a message to the class email list, soc740list AT mailman.mcmaster.ca.
It is your responsibility to understand what constitutes academic dishonesty. For information on the various kinds of academic dishonesty please refer to the Academic Integrity Policy, specifically Appendix 3, located at http://www.mcmaster.ca/policy/StudentsAcademicStudies/AcademicIntegrity.pdf.
The following illustrates only three forms of academic dishonesty:
1. Plagiarism, e.g. the submission of work that is not one's own or for which other credit has been obtained.
2. Improper collaboration in group work.
3. Copying or using unauthorized aids in tests and examinations.
The instructor and university reserve the right to modify elements of the course during the term. The university may change the dates and deadlines for any or all courses in extreme circumstances. If either type of modification becomes necessary, reasonable notice and communication with the students will be given with explanation and the opportunity to comment on changes. It is the responsibility of the student to check his/her McMaster email and course websites weekly during the term and to note any changes.
Dates | Topic[a] | Reading (starred sections in the texts are optional) | Homework[b]
Jan. 6 | Introduction (R script file) (Duncan.txt data file). Bring your laptop if you need help installing R. | AR Ch. 1, 2; CAR Preface, Ch. 1, 2 (Sec. 2.1-2.3) | Exercises 1.1, D2.1, and to be distributed (answers, corresponding .Rmd file)
Jan. 13 | Examining and Transforming Data (script) | AR Ch. 3, 4; CAR Ch. 3 | to be distributed (answers, .Rmd file)
Jan. 20 | Linear Least-Squares Regression (script) | AR Ch. 5; CAR Ch. 4 (Sec. 4.1, 4.2.1, 4.2.2, 4.3.4). Optional: Ch. 9 (Sec. 9.2); Appendix B (Sec. B.1.1-B.1.3) | Exercise 5.6 (optional) and to be distributed (answers, .Rmd file)
Jan. 27 | Statistical Inference for Regression (script) | AR Ch. 6; CAR Ch. 4 (Sec. 4.3.1, 4.4.1, 4.4.2, 4.4.3, 4.4.4). Optional: AR Ch. 9 (except 9.1.1, 9.1.2), Ch. 10 (except 10.4); Appendix B (Sec. B.2, B.3); CAR (rest of Sec. 4.3 and 4.4) | Exercises D6.3, D6.5 (answers, .Rmd file)
Feb. 3 | Dummy-Variable Regression and Analysis of Variance (script) | AR Ch. 7; Ch. 8 (Sec. 8.1, 8.2.1); CAR Ch. 4 (Sec. 4.2.3, 4.7, 4.8). Optional: AR Ch. 8 (remainder); Ch. 9 (9.1.1, 9.1.2), Ch. 10 (10.4) | to be distributed (answers, .Rmd file)
Feb. 10 | Regression Diagnostics (1): Unusual and Influential Data (script) | AR Ch. 11; CAR Ch. 6 (Sec. 6.1, 6.2.3, 6.3) | to be distributed (answers, .Rmd file)
Feb. 17 | Reading week: no class | |
Feb. 24 | Review (12:30-2:30 PM, KTH712) and Exam 1 (3:30-6:30 PM, Togo Salmon Hall 122): Introduction through dummy regression | |
Mar. 3 | Regression Diagnostics (2): Collinearity and Model Selection (script) | AR Ch. 13, 22 (22.1, 22.3); CAR Ch. 6 (Sec. 6.7) | to be distributed (answers, .Rmd file)
Mar. 10 | Regression Diagnostics (3): Nonlinearity and Other Ills (script) | AR Ch. 12; CAR Ch. 6 (Sec. 6.2.1, 6.2.2, 6.4, 6.5). Optional: CAR Ch. 17 (Sec. 17.1, 17.2), Appendix D (Sec. D.6) | Exercise D12.2(a) (answers, .Rmd file)
Mar. 17 | Logit and Probit Models for Dichotomous Categorical Responses (script) | AR Ch. 14 (Sec. 14.1); CAR Ch. 5 (Sec. 5.3, 5.4) | to be distributed (answers, .Rmd file)
Mar. 24 | Logit and Probit Models for Polytomous Categorical Responses (script) | AR Ch. 14 (Sec. 14.2); CAR Ch. 5 (Sec. 5.7, 5.8, 5.9) | to be distributed (answers, .Rmd file)
Mar. 31 | Generalized Linear Models: An Introduction (script) | AR Ch. 15; CAR Ch. 5 (Sec. 5.1, 5.2, 5.5, 5.6, 5.10, 5.11), Ch. 6 (Sec. 6.6) | to be distributed (answers, .Rmd file)
Apr. 7 | Missing Data (optional lecture) (script) | Optional: Ch. 20 (Sec. 20.1, 20.2, 20.4) |
Apr. 14 | Review (12:30-2:30 PM, KTH712) and Exam 2 (3:30-6:30 PM, DeGroote School of Business room B107): Regression diagnostics (1) through generalized linear models (regression diagnostics overview) | |
[a] Links to lecture notes under Topic are to PDF (Portable Document Format) files. If you do not have a PDF viewer, you can download the Adobe Reader viewer free from the Adobe web site.
[b] Unless otherwise noted, homework exercises numbered chapter.problem are in AR; those numbered Dchapter.problem (data-analysis exercises) are on the web site for AR, as are data sets used in the homework. Please pay attention to the presence or absence of the "D."
R Markdown homework template: homeworktemplate.Rmd.