EDF
5481 METHODS OF EDUCATIONAL RESEARCH
INSTRUCTOR:
DR. SUSAN CAROL LOSH
FALL 2002
|
WHY EXAMINE WEB-BASED DATABASES?
|
As you have already learned, it is expensive
and time-consuming to collect data, especially datasets that are sizable
or comprehensive. In the early 1970s, the United States Federal government
initiated a series of what have come to be called "Social Indicators."
The idea was to collect data from different domains (education, health,
the status of women and ethnic minorities, public opinion, etc.) and to
continue these series over time, thereby tracking change and continuity
among Americans. At the same time, other countries, particularly Canada,
Western Europe, and Japan, also began indicator series, thus making possible
international comparisons. One example is the Third International Mathematics
and Science Study (TIMSS). Data were collected in 42 countries in 1995
and in 38 countries in 1999. A recent addition addresses experience with
computers and the World Wide Web.
Considerable effort has been devoted
to making many of these indicator series compatible over time:
-
Questions are asked in the same way
-
Changes to questions are established via "split-ballot"
testing, i.e., experiments to see whether the revised questions work the
same way as the original questions. A good indicator series NEVER
shifts question format (or open question codes) arbitrarily.
-
Variables are defined in the same way
-
Coding categories remain constant
-
If coding changes are made, care is taken
to make new coding systems compatible with the old, such as the detailed
United States Census three-digit occupational codes.
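As a rough illustration of what a split-ballot comparison might involve, the sketch below compares the share of respondents giving a target answer under the original wording and under the revised wording, using a simple two-proportion z test. The function name and the counts in the usage note are hypothetical, and real indicator series may rely on other or more elaborate tests.

```python
import math


def two_proportion_z(agree_a, n_a, agree_b, n_b):
    """Compare the share giving a target answer under two question wordings.

    agree_a / n_a: count and sample size for the original wording (form A).
    agree_b / n_b: count and sample size for the revised wording (form B).
    Returns (z, two_sided_p_value).
    """
    p_a = agree_a / n_a
    p_b = agree_b / n_b
    # Pooled proportion under the hypothesis that both wordings work the same way.
    pooled = (agree_a + agree_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Everyone (or no one) gave the target answer on both forms.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, `two_proportion_z(300, 500, 250, 500)` uses made-up counts in which 60% agree on form A and 50% agree on form B. A large absolute z (and a small p-value) would suggest that the revised question does not behave like the original.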
A series may have an "oversight board."
These boards monitor the content and form of the indicator series. Thus,
principal investigators cannot capriciously change either content or form
without input from a panel of expert professionals.
The number of data archives is already
HUGE and it seems to be growing by the minute. Some of the large archives,
such as The Roper Center or the Howard W. Odum Institute for Research in
Social Science at the University of North Carolina, are simply staggering
in the amount of data that they hold.
As you look through some of the pages,
you will see that several times I have given the warning: "set aside a
day to explore this archive." Do take this warning seriously!
One of these archives may hold the answer to your proposed dissertation
or provide the basis for a nice conference paper or article. They are definitely
worth exploring.
With resources such as these, the novice--and
even the experienced--researcher should seriously reconsider whether they
really want to gather all of their own data from scratch.
|
WHY THESE ARCHIVES ARE IMPORTANT
TO YOU
|
-
There is no point in "reinventing the wheel."
Why
do a small local study when data already exist on regional, national or
even international levels? An example is using the "CIRP" to look at college
student beliefs, attitudes, and accomplishments instead of convenience
samples of your buddy's classes.
-
"There is plenty of gold in them thar hills."
Most of these databases are so huge that no one investigator could ever
analyze everything in them. With each successive year, the possibilities
for analysis grow. Further, other researchers may have ideas for analysis
that did not occur to the original Principal Investigator. In other words,
there is plenty of data for you to do an original analysis--without all
the backbreaking work of collecting the data too.
-
Many of these archives offer an unprecedented
opportunity to track trends over time. How did computer use change
from the early 1980s to the late 1990s? What kind of educational preparation
do students receive who rise to eminence later on? What are the average
student characteristics in research universities as opposed to liberal
arts colleges, and how did these characteristics change over time? What
are gender differences in Internet use over time?
-
YOUR time, resources, and energy. Many
researchers, especially junior faculty, have limited resources. With one
eye on the tenure clock, junior faculty have limited time too. It takes
time, often A LOT of time, to gather your own data. If existing archives
have variables that are directly pertinent to your research interests,
it is often in your best professional interests to use these archives.
Obviously, using pre-existing archives
is not for everyone. Many students in disciplines that lend themselves
to "quick and dirty" experiments can quickly collect data with relatively
little financial investment. However, even these researchers may be interested
in "triangulation" with survey data or historical records.
|
CLICK HERE
TO ENTER THE ONLINE DATABASE MENU
|
|
QUESTIONS YOU SHOULD CONSIDER
ABOUT ONLINE DATABASES
|
-
What is the unit of analysis? Is it
an individual? An organization, such as a college or university? A time
point for a country or state series? Archives vary and the unit is not
always an individual.
-
What kinds of variables does the archive
cover? Degree attainment? Health practices? Drug or alcohol usage?
Attitudes?
-
What is the time frame covered by the archive?
Examples:
the average school FCAT scores for 1998 to 2001 or The General Social Survey
from 1972 to 2000.
-
What is the geographic frame covered by
the archive (state? local? United States? international?)
-
Who were the sponsor(s) of the archive
(e.g., NSF? NCES? United Faculty of Florida?)
-
How did the archive come to be?
-
Were the data collected especially for
the archive (such as IPEDS)? Or were the data compiled from other sources
(such as Web CASPAR)?
-
Does the archive contain any tutorials
that instruct how to use it (online or otherwise)?
-
How are the data available? Are they
ready for online analysis? Are the data available to download into your
computer? Are the data contained in .pdf format tables? Are there
alternative ways to obtain the data (such as a CD-ROM)? If so, how can the
data be obtained?
-
Can you simply download the data or must
you obtain a CD-ROM or other device from the archive agency?
-
How "clean" are the data? One good
example is the U.S. government's famous "Falling Through the Net" data
(EDF 5400 students will remember this one from Spring 2002) about the "Digital
Divide" in Computer and Internet Usage. This is one of the most cited datasets
about the Digital Divide and it is appallingly "dirty." The "age" variable
contains 3 year olds and 10 year olds and considerable data are missing
on racial identification (this was an in-person survey so this variable
should have been mostly complete). Apparently, the government was in such
a rush to put up the dataset that the data contain a LOT
of careless errors. As a result, I consider any estimates from these data
to be VERY unreliable.
-
Is there a charge for the data? If
so, what is the cost? Most archival costs are surprisingly reasonable,
when you consider the effort involved in the first place. For example,
the cost of the ENTIRE General Social Survey archive, from 1972 to 2000,
over 40,000 interviews, in SPSS ready format, and including a hard copy
of the Codebook is less than $300. Compare this with the millions
of dollars it cost to gather the data. Don't forget: you will incur time
and financial costs to gather and process your own data. It may, indeed,
turn out to be cheaper to use the archive.
-
What kinds of analyses can be done online?
Frequency
distributions? Cross-tabulations? Multiple regression or other multivariate
analyses? See if the archive uses the California-Berkeley Survey Documentation
and Analysis (SDA) program, which is simple to use, covers most basic
statistics, and is unbelievably fast (including on a dial-up system like
my home computer). Many online datasets are now directly linked to the
SDA system. (Gossip has it that SPSS wants to develop and distribute a
competing online analysis system--but you can bet that SPSS will charge
to use it! The SDA system is FREE!)
-
Is a questionnaire available or some other
original document describing each variable in detail? Maybe it is available
as a separate link or as a .pdf document (did you remember to download
the Adobe Acrobat Reader?)
-
What is mentioned about coverage or response
rate? For example, data are missing from several states in early data
series about abortion. Some surveys have completed interviews with fewer
than half of the originally contacted respondents. In other cases, such
as the CIRP, response rates can vary considerably from college to college.
-
Do you need any kind of license from the
data agency? Many data sets at the National Science Foundation, the
National Center for Education Statistics, and other agencies require
you to have a license if you work with what is called the "unit record"
data. Unit record data are the "raw data" where each record is an individual
or an institution. This means the person or institution could plausibly
be identified. Obtaining a license is typically not a problem for legitimate
researchers but it does necessitate some paperwork so be prepared to check
about this and budget some time accordingly.
-
What was the mode of data collection? In-person
surveys may give different results than telephone surveys. The top administrator
of a university may access different data than a rank-and-file faculty
member.
-
How recently has the database been monitored
or updated? See if you can find a date on the page, typically at the
very top or the very bottom of the page. "Old pages" may have missing links,
unfixed errors, omit the most recent updates to files, or simply may not
work.
-
Were the data gathered over time by different
agencies or different principal investigators? If
so, changes in variables, definitions, or coding may have occurred. You
may find differences attributable to these changes, rather than to changes
in the concepts you are studying. Such artifacts are threats to internal validity.
-
How far back does the data series extend?
The
longer the series, the more likely you are to encounter strange alphabetic
and non-alphanumeric codes, or inconsistencies in definitions or measures.
And the more likely the original data are to be flat out MISSING.
-
Were data compiled from different agencies
into a single archive? Again, check for consistency in definitions
(even of the same variable!) across agencies.
-
See if the description of the archive notes
any problems or missing information.
-
What are your computer skills? Some
databases are in ASCII format which you can probably download into a spreadsheet
such as Quattro Pro or Excel. But the field delimiters vary widely: some
use spaces, others use commas, still others rely on a format statement
so that the data can be read. Do you know how to analyze data using a spreadsheet
program? If not, do you know how to transfer spreadsheet data into a statistical
program such as SPSS or SAS? Do you have file management skills so that
you can insert value labels, variable labels and missing data codes? In
other cases, you may have to save or print tabular displays and hand enter
the data into a spreadsheet (very carefully). As you can see, it is VERY
helpful to have good computer skills--or to have some good friends who
do!
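To make the point about field delimiters concrete, here is a minimal Python sketch that reads the same two records from a comma-delimited file, a space-delimited file, and a fixed-width file that needs a format statement. The sample records, the function names, and the column layout (id, age, item) are hypothetical; a real archive's codebook would supply the actual layout.

```python
import csv
import io


def read_comma(text):
    """Read comma-delimited ASCII data into a list of rows."""
    return [row for row in csv.reader(io.StringIO(text)) if row]


def read_space(text):
    """Read space-delimited ASCII data into a list of rows."""
    return [line.split() for line in text.splitlines() if line.strip()]


def read_fixed(text, columns):
    """Read fixed-width ASCII data.

    columns plays the role of the format statement: a list of
    (name, start, end) tuples, with 0-based start and exclusive end.
    """
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        records.append(
            {name: line[start:end].strip() for name, start, end in columns}
        )
    return records


# Hypothetical sample data: the same two respondents in three layouts.
comma_text = "101,34,2\n102,51,1\n"
space_text = "101 34 2\n102 51 1\n"
fixed_text = "101342\n102511\n"

# Hypothetical layout: id in columns 1-3, age in 4-5, item in column 6.
layout = [("id", 0, 3), ("age", 3, 5), ("item", 5, 6)]

comma_rows = read_comma(comma_text)
space_rows = read_space(space_text)
fixed_rows = read_fixed(fixed_text, layout)
```

All three calls recover the same values as text strings. Variable labels, value labels, and missing data codes would still have to be added afterward in a spreadsheet or statistical program.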
Any original problems when the data
were first gathered will STILL be there when the data are archived. See
what you can find out about issues with question format, sampling, coding
categories, and other sources of bias and random error. Sometimes (for
example: the General Social Survey) there will be considerable information
about matters such as response rate; sometimes there is not.
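One quick way to get a feel for how "clean" a variable is, sketched below in Python, is to count missing values and list values that fall outside a plausible range. The function name, the sample ages, the missing data code of 99, and the valid range of 18 to 97 are all assumptions made for illustration; the archive's codebook should tell you the real codes and the intended population.

```python
def screen_variable(values, low, high, missing_codes=(None, "")):
    """Count missing values and collect out-of-range values for one variable."""
    missing = 0
    out_of_range = []
    for value in values:
        if value in missing_codes:
            missing += 1
        elif not (low <= value <= high):
            out_of_range.append(value)
    n = len(values)
    return {
        "n": n,
        "missing": missing,
        "pct_missing": 100 * missing / n if n else 0.0,
        "out_of_range": out_of_range,
    }


# Hypothetical ages from a survey assumed to cover adults only;
# 99 and None are assumed to be missing data codes.
ages = [34, 3, 51, None, 10, 27, 99]
report = screen_variable(ages, low=18, high=97, missing_codes=(None, 99))
```

Here `report["out_of_range"]` holds the two implausible ages (3 and 10) and `report["missing"]` counts the two missing entries. A high percentage missing or a long out-of-range list is a signal to read the documentation closely before trusting any estimates.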
Always remember this classic cliché:
do the best you can with what you got. Despite any problems, online
databases and archives are a terrific resource for us all.
|
|
WHERE
TO START HUNTING FOR ONLINE ARCHIVES
|
-
Professional associations in your field
(check
out those resources and links to professional sites in Blackboard)
-
The FSU on-line library system
-
Search engines using your topic of interest
(see
McMillan who does a nice job in this area)
-
Major US government or state Web sites
(if
you are an International Student, check out sites from your home country).
The National Center for Education Statistics, the National Science Foundation,
the Centers for Disease Control--and even the State of Florida website
all contain links to many, many databases. You will find several of them
in our course database menu.
-
Major archives such as the Pew Research Center
for the People & the Press, or the Roper Center in Connecticut.
-
One link leads to another. I found
the International Social Survey Program link from the General Social Survey
Web site.
-
Check with faculty and graduate students
in Information and Library Sciences
-
Many recent textbooks have online supplements
or Web sites that list archives
-
Check McMillan, chapters 3 and 4 for information
on Subject Directories and Search Engines (pp. 86-87; 90; 93; 96-97).
|
|
November 19 2002
Susan Carol Losh
Always
under construction as new databases are entered.