The Association's conference will be held on Monday, 24.5.2010, at the Convention Center in Ramat Efal.
To register for the conference and to renew your Association membership for 2010, click here.
For details about the convention center and directions, click here.
The conference program appears below. Please check back for updates to the program.
Time | Session
8:30 – 9:15 | Gathering and registration
9:15 – 10:20 | Opening session
10:20 – 10:30 | Break
10:30 – 12:10 | Parallel sessions (invited). These sessions will be held in English.
12:10 – 12:25 | Coffee break
12:25 – 13:30 | Survey lecture: Key statistical challenges in genome research – Saharon Rosset. No prior background in the subject is required.
13:30 – 14:30 | Lunch and the Association's general assembly
14:30 – 16:10 | Parallel sessions (contributed and student sessions)
16:10 – 16:30 | Coffee break
16:30 – 18:00 | Closing session
Gene-Environment Case-Control Studies
Raymond J. Carroll
Distinguished Professor of Statistics, Nutrition and Toxicology – Texas A&M University
We consider population-based case-control studies of gene-environment interactions using prospective logistic regression models. Data sets of this kind arise when studying pathways based on haplotypes as well as in multistage genome-wide association studies (GWAS). In a typical case-control study, logistic regression is used and there is little power for detecting interactions. However, in many cases it is reasonable to assume that, for example, genotype and environment are independent in the population, possibly conditional on factors that account for population stratification. In such a case, we have developed an extremely statistically powerful semiparametric approach for this problem, showing that it leads to much more efficient estimates of gene-environment interaction parameters and the gene main effect than the standard approach: standard errors for the former are often reduced by 50% or more. The issue that of course arises is the very assumption of conditional independence, because if that assumption is violated, biases result, so that one can announce gene-environment interactions or gene effects even though they do not exist. We will describe a simple, computationally fast approach for gaining robustness without losing statistical power, based on the idea of Empirical Bayes methodology. Examples from colorectal adenoma studies of the NAT2 gene and from studies of prostate cancer and the VDR pathway are described to illustrate the approaches.
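A small simulation can give a feel for how exploiting gene-environment independence sharpens interaction estimates. The hedged sketch below is an illustration only, not the semiparametric or Empirical Bayes estimator of the talk: it compares the standard prospective logistic-regression interaction estimate with the simple case-only estimator, which is valid only when G and E are independent in the population and the disease is rare.

```python
# Illustration only: standard logistic vs. case-only interaction estimates
# under gene-environment independence (not the semiparametric method of the talk).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
N = 200_000                      # source population size
G = rng.binomial(1, 0.3, N)      # genotype carrier indicator
E = rng.binomial(1, 0.4, N)      # binary environmental exposure, independent of G
eta = -5.0 + 0.3 * G + 0.3 * E + 0.4 * G * E   # true log-odds; disease is rare
D = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# Case-control sampling: all cases plus an equal number of controls.
cases = np.flatnonzero(D == 1)
controls = rng.choice(np.flatnonzero(D == 0), size=cases.size, replace=False)
idx = np.concatenate([cases, controls])
g, e, d = G[idx], E[idx], D[idx]

# Standard analysis: prospective logistic regression with an interaction term.
X = sm.add_constant(np.column_stack([g, e, g * e]))
fit = sm.Logit(d, X).fit(disp=False)
print("logistic   interaction: %.3f (SE %.3f)" % (fit.params[3], fit.bse[3]))

# Case-only analysis: among cases, the G-E log odds ratio estimates the
# interaction, provided G and E are independent in the population.
n = np.array([[np.sum((g == i) & (e == j) & (d == 1)) for j in (0, 1)] for i in (0, 1)])
lor = np.log(n[1, 1] * n[0, 0] / (n[1, 0] * n[0, 1]))
se = np.sqrt((1.0 / n).sum())
print("case-only  interaction: %.3f (SE %.3f)" % (lor, se))
```

In runs like this the case-only standard error is typically noticeably smaller, which is the flavor of efficiency gain (obtained in a far more general and robust way) that the talk's approach delivers.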
Key Statistical Challenges in Genome Research – Saharon Rosset
My first major topic will be the intriguing and varied role that principal component analysis (PCA) plays in genetics. I will demonstrate some of the applications of PCA, discuss the statistical and inferential challenges surrounding these applications, and detail one example of historical inference based on PCA results.
Next, I will discuss the identification of genetic causes of disease in the context of genome-wide association studies (GWAS), especially in post-GWAS modeling. I will focus on the modeling of gene-gene interactions that go beyond additive effects of genetic factors on phenotypes. It has recently been argued that such interactions may play a major role in the genetics of disease, and that current approaches offer little in the way of tools for modeling gene-gene interactions. I will survey some of the evidence and discuss possible approaches for addressing this challenge.
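As a toy version of the PCA use case, the hedged sketch below simulates a genotype matrix from two populations with slightly different allele frequencies and shows that the leading principal component separates the populations; it is meant only to illustrate the kind of structure PCA recovers in genetic data, not any particular analysis from the talk.

```python
# Toy illustration: PCA on a simulated genotype matrix recovers population structure.
import numpy as np

rng = np.random.default_rng(1)
n_per_pop, n_snps = 100, 500

# Two populations with slightly perturbed allele frequencies per SNP.
base = rng.uniform(0.1, 0.9, n_snps)
freqs = {"pop1": np.clip(base + rng.normal(0, 0.05, n_snps), 0.05, 0.95),
         "pop2": np.clip(base - rng.normal(0, 0.05, n_snps), 0.05, 0.95)}

# Genotypes coded 0/1/2 (number of copies of the reference allele).
geno = np.vstack([rng.binomial(2, f, (n_per_pop, n_snps)) for f in freqs.values()])
labels = np.repeat([0, 1], n_per_pop)

# Column-standardize and take the top principal component via SVD.
Z = (geno - geno.mean(0)) / (geno.std(0) + 1e-12)
U, S, Vt = np.linalg.svd(Z, full_matrices=False)
pc1 = U[:, 0] * S[0]

print("mean PC1, population 1: %+.2f" % pc1[labels == 0].mean())
print("mean PC1, population 2: %+.2f" % pc1[labels == 1].mean())
```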
Biostatistics Session – Chair: David Steinberg. Organizer: Larry Freedman
Uncovering Symptom Progression History from Large Disease Registries, with Application to Young Cystic Fibrosis Patients
Jason Fine, University of North Carolina, USA
The growing availability of population-based disease registry data has brought precious opportunities for epidemiologists to understand the natural history of chronic diseases. It also presents challenges to traditional data analysis techniques, due to the multistate nature of the data, including complicated censoring/truncation schemes and the temporal dynamics of covariate influences. In a case study of the Cystic Fibrosis Foundation Patient Registry, we propose analyses of progressive symptoms using temporal process regressions, as an alternative to the commonly employed proportional hazards models. Such regressions enable flexible nonparametric analyses of key prognostic factors. Two endpoints are considered: the prevalence of ever being positive and of currently being positive for Pseudomonas aeruginosa (PA) infection. The analysis of ever being PA positive via a time-varying coefficient model demonstrates the lack of fit, as well as the potential loss of information, in a standard proportional hazards analysis. The analysis of currently being PA positive is novel and yields clinical insights not directly available from proportional hazards models. Key findings include that the benefits of neonatal screening on patient outcomes attenuate over time and that cohorts may demonstrate different patterns of PA, which may be explained in part by changes in patient management. The simplicity of the proposed time-varying inferences reduces the computational burden considerably relative to alternative time-varying regression strategies, which may be prohibitive in large datasets.
Statistical issues in investigating the developmental origins of adult disease using large cohort studies: examples from the Jerusalem Perinatal Study
Orly Manor, Hebrew University, Israel
The growing epidemic of chronic diseases in recent decades has increased interest in potential mechanisms leading to the development of these diseases. A relatively new focus of research centers on the influence of early-life events on the risk of chronic diseases in adult life. The Jerusalem Perinatal Study, a population-based cohort of all 92,000 births in Jerusalem during 1964-1976, is used to examine associations between fetal development and health and disease later in life, and to investigate explanatory pathways for these associations. The statistical issues involved in this examination will be described. Examples include: predicting parental health outcomes using children's characteristics, as well as predicting children's health outcomes using parental characteristics, while incorporating information from multiple siblings within a family; examining the choice of the follow-up starting point when investigating parental or grandparental mortality in relation to offspring characteristics; and studying familial diseases, including the time interval between cancer diagnoses in mothers and in their offspring.
Risk Prediction in Complex Genetic Diseases Based on Family History
Malka Gorfine, Technion, Israel
Important advances, such as the validation of millions of genetic markers and improvements in genotyping technology, have led to exceptionally rapid adoption of genome-wide association studies over the past few years. As a result, regions of the genome associated with disease have been discovered and replicated in many common human diseases, including type 1 and 2 diabetes, obesity, coronary disease, Crohn's disease, celiac disease, asthma, and breast, colorectal, and prostate cancers, among many others. With the rapid progress in the field, it has become increasingly critical to provide accurate assessments of genetic risk and personalized management strategies for the entire population, with the aim of increasing survival in high-risk people while decreasing cost and complications in low-risk people. All the available methods for estimating genetic risk rely on the unrealistic assumption that the observed familial aggregation is explained solely by the genes under study. However, prediction of disease probability may be substantially biased if the residual correlation among family members is not accounted for. In this work we provide a novel frailty-based risk prediction procedure in which the model's parameters are estimated from an external case-control family study.
Combining dietary self-reports and biomarkers to increase the statistical power of nutritional cohort studies
Laurence Freedman, Gertner Institute, Israel
The study of human nutrition and its relation to health is plagued by the problem of dietary measurement error. Individuals report their dietary intakes with considerable inaccuracy, and this affects nearly all epidemiologic research concerned with diet and health. Estimated disease relative risks associated with a dietary intake are generally biased towards the null and require adjustment. The method of regression calibration can be used to make this adjustment. However, regression calibration does not help with a second effect of measurement error, that is, the loss of statistical power to detect diet-disease relationships. I will describe a method of combining a dietary self-report with a biomarker that also measures dietary intake with error. We show through realistic computer simulations that the biomarker can sometimes provide higher statistical power than the self-report, and that the combination can provide some modest gains in power over either measure alone. I provide an illustration of the method using data from a study of the relationship between intake of two carotenoids, lutein and zeaxanthin, and the occurrence of cataracts in the eye.
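The attenuation that regression calibration corrects is easy to see in a toy linear setting; the sketch below is an illustrative simplification with made-up numbers, not the combined self-report/biomarker method of the talk. It regresses a biomarker on the self-report to predict true intake and then uses the predicted values in the outcome model.

```python
# Toy regression-calibration illustration (linear outcome model for simplicity,
# not the talk's combined self-report/biomarker method).
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
x = rng.normal(0, 1, n)              # true (unobserved) dietary intake
q = x + rng.normal(0, 1, n)          # self-report: classical measurement error
m = x + rng.normal(0, 0.7, n)        # biomarker: unbiased, with independent error
beta = 0.5
y = beta * x + rng.normal(0, 1, n)   # health outcome depends on true intake

def slope(u, v):
    """OLS slope of v regressed on u."""
    return np.cov(u, v)[0, 1] / np.var(u)

naive = slope(q, y)                  # attenuated towards zero

# Regression calibration: since E[M | Q] = E[X | Q] when the biomarker error
# is independent of Q, regress M on Q and plug the fitted values into the model.
xhat = m.mean() + slope(q, m) * (q - q.mean())
calibrated = slope(xhat, y)

print("true slope            : %.3f" % beta)
print("naive (self-report)   : %.3f" % naive)
print("regression calibration: %.3f" % calibrated)
```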
Statistical Theory Session – Chair: Pavel Chigansky. Organizer: Ya'acov Ritov
Drawdowns of random walks, Aumann-Serrano riskiness and CUSUM detection of a change in distribution.
Isaac Meilijson – School of Mathematical Sciences, Tel Aviv University
For a favorable (positive mean) but risky (negative essential infimum) random variable X, let w be the unique positive root of the equation E[exp(-w*X)] = 1 (if it exists). The parameter w is called the "adjustment coefficient" in actuarial mathematics, and 1/w is the Aumann-Serrano "index of riskiness". Let S be the random walk with increments distributed like X. Then (Cramer, Lundberg), very consistently with a notion of riskiness, min(S) is approximately exponentially distributed with parameter w, and the expected time it takes S to first achieve a drawdown of size d is approximately [exp(w*d)-1]/w. Both results are exact for Brownian Motion. The CUSUM method for detecting a change in distribution declares a change when the log-likelihood random walk achieves a given drawup, so Wald's identity controls the expected time to true detection and Cramer-Lundberg approximations control the rate of false alarms. The connection between these subjects is obscured by the enlightening fact that log-likelihoods have w=1.
These issues will be reviewed and the Cramer-Lundberg approximation improved.
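For a concrete feel for the adjustment coefficient, the sketch below (an illustration under assumed Gaussian increments, not part of the talk) solves E[exp(-w*X)] = 1 numerically and checks by simulation that the all-time minimum of the random walk is roughly exponential with parameter w, as the Cramer-Lundberg approximation suggests.

```python
# Illustration: the adjustment coefficient w and the Cramer-Lundberg approximation
# P(min S < -x) ~ C * exp(-w x), for a random walk with Gaussian increments.
import numpy as np
from scipy.optimize import brentq

mu, sigma = 0.2, 1.0        # favorable but risky increments X ~ N(mu, sigma^2)
rng = np.random.default_rng(3)

def mgf_minus_one(w):
    # E[exp(-w X)] - 1 for normal increments; the positive root is 2*mu/sigma^2.
    return np.exp(-w * mu + 0.5 * (w * sigma) ** 2) - 1.0

w = brentq(mgf_minus_one, 1e-6, 10.0)
print("adjustment coefficient w = %.3f (closed form %.3f)" % (w, 2 * mu / sigma**2))
print("Aumann-Serrano riskiness 1/w = %.3f" % (1 / w))

# Simulate all-time minima of the random walk (long finite horizon as a proxy).
n_paths, horizon = 2000, 5000
steps = rng.normal(mu, sigma, (n_paths, horizon))
minima = np.minimum(np.cumsum(steps, axis=1).min(axis=1), 0.0)

# Compare the empirical tail of -min(S) with the exponential(w) tail.
for x in (2.0, 5.0, 10.0):
    print("x=%4.1f  empirical P(min < -x)=%.3f   exp(-w x)=%.3f"
          % (x, np.mean(minima < -x), np.exp(-w * x)))
```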
Copula Networks
Gal Elidan – The Hebrew University of Jerusalem
Multivariate continuous densities are of paramount importance in numerous fields, ranging from computational biology to geology. Bayesian networks offer a general framework geared toward estimation of such densities in high dimension by relying on a graph structure that encodes independencies, facilitating a decomposition of the likelihood and relatively efficient inference. However, practical considerations almost always lead to a rather simple parametric form, thereby limiting our ability to capture complex dependence structures. In contrast, copulas offer great flexibility by providing a generic representation of multivariate distributions that separates the choice of the marginal densities from that of the dependency structure. Yet, despite a dramatic growth in academic and practical interest, copulas are for the most part practical only for relatively small (fewer than 10) dimensions.
We present the Copula Network model, an elegant marriage between these two frameworks. Our approach builds on a novel copula-based re-parameterization of a conditional density that, joined with a graph that encodes independencies, offers great flexibility in modeling and estimation of high-dimensional domains, while maintaining control over the form of the univariate marginals. We demonstrate the advantage of our framework for generalization over standard Bayesian networks as well as tree structured copula models for varied real-life domains that are of substantially higher dimension than those typically considered in the copula literature.
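To illustrate the separation of marginals and dependence that copulas provide, here is a hedged sketch of a generic Sklar-type construction (a plain Gaussian copula with chosen marginals, not the Copula Network model itself).

```python
# Generic copula illustration: a Gaussian copula combined with arbitrary marginals.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, rho = 10_000, 0.7

# Step 1: sample correlated normals and map them to uniforms (the copula).
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal([0.0, 0.0], cov, size=n)
u = stats.norm.cdf(z)

# Step 2: push the uniforms through any marginal quantile functions.
x = stats.expon(scale=2.0).ppf(u[:, 0])         # exponential marginal
y = stats.gamma(a=3.0, scale=1.5).ppf(u[:, 1])  # gamma marginal

# The marginals are exactly as chosen, while the dependence comes from the copula.
print("sample means (theory 2.0 and 4.5):", round(float(x.mean()), 2), round(float(y.mean()), 2))
print("Spearman rank correlation:", round(float(stats.spearmanr(x, y)[0]), 2))
```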
Sparse Non Gaussian Component Analysis by Semidefinite Programming
Anatoly Iouditski – University of Grenoble (joint with E. Diederichs, A. Nemirovski and V. Spokoiny)
Sparse non-Gaussian component analysis is an unsupervised method for extracting a linear structure from high-dimensional data, based on estimating a low-dimensional non-Gaussian data component. In this paper we discuss a new approach to direct estimation of the projector onto the non-Gaussian subspace, based on semidefinite programming. The new procedure improves the method's sensitivity to a broad variety of deviations from normality and decreases the computational effort.
Some Pseudo-Bayesian considerations in very large models.
Ya'acov Ritov – The Hebrew University of Jerusalem
We will argue that some Bayesian considerations on very large parameter spaces are irrational, and can be justified only by non-Bayesian considerations. Examples will include hidden Markov models (HMMs), sparse regression and complex sampling schemes.
Parallel Session I – Chair: Isaac Meilijson
The Best Linear Unbiased Estimator for continuation of a function
Yair Goldberg, University of North Carolina, Chapel Hill
Joint work with Ya'acov Ritov and Avishai Mandelbaum
We show how to construct the best linear unbiased predictor (BLUP) for the continuation of a curve in a spline-function model. We assume that the entire curve is drawn from some smooth random process and that the curve is given up to some cut point. We demonstrate how to compute the BLUP efficiently. Confidence bands for the BLUP are discussed. Finally, we apply the proposed BLUP to real-world call center data. Specifically, we forecast the continuation of both the call arrival rate and the workload processes at the call center of a commercial bank.
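A minimal sketch of the underlying idea, assuming a Gaussian-process model with a squared-exponential covariance rather than the talk's spline-function model and call-center data: the best linear predictor of the curve beyond the cut point is the conditional mean given the observed segment.

```python
# Minimal sketch: predicting the continuation of a curve as the conditional mean
# of a Gaussian process given its observed initial segment (illustration only;
# the talk uses a spline-function model and real call-center data).
import numpy as np

def sq_exp_kernel(s, t, length=0.5, var=1.0):
    """Squared-exponential covariance between time grids s and t."""
    return var * np.exp(-0.5 * (s[:, None] - t[None, :]) ** 2 / length**2)

rng = np.random.default_rng(5)
t_obs = np.linspace(0.0, 1.0, 40)      # observed part of the curve
t_new = np.linspace(1.0, 1.5, 20)      # continuation to be predicted

# One realization of the smooth random process on the observed grid.
K_oo = sq_exp_kernel(t_obs, t_obs) + 1e-8 * np.eye(t_obs.size)
y_obs = rng.multivariate_normal(np.zeros(t_obs.size), K_oo)

# Conditional (BLUP-type) mean and covariance of the continuation.
K_no = sq_exp_kernel(t_new, t_obs)
K_nn = sq_exp_kernel(t_new, t_new)
mean_new = K_no @ np.linalg.solve(K_oo, y_obs)
cov_new = K_nn - K_no @ np.linalg.solve(K_oo, K_no.T)
band = 1.96 * np.sqrt(np.clip(np.diag(cov_new), 0, None))

print("predicted continuation (first 5 points):", mean_new[:5].round(2))
print("95% band half-widths   (first 5 points):", band[:5].round(2))
```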
Application of the quasi-estimation and its comparison with some other methods
Anatoly Gordinsky, Berman Engineering Ltd.
Empirical Bayes in the presence of explanatory variables with application to spatial-temporal and census data
Eitan Greenshtein, The Central Bureau of Statistics
Joint work with Ya'acov Ritov and Noam Cohen
We study the problem of incorporating Empirical Bayes techniques in the presence of explanatory variables. Explanatory variables violate the permutation-invariance structure that motivates the application of EB techniques. An application is given to the problem of estimating certain proportions in many small areas (statistical areas), using spatial and temporal information as explanatory variables.
Adaptive deconvolution of distribution functions
Itai Dattner, University of Haifa
It is well known that rates of convergence of estimators in deconvolution problems are affected by the smoothness of the error density and of the density to be estimated. However, the problem of distribution deconvolution is more delicate than previously considered. We derive different rates of convergence with respect to the tail behavior of the error characteristic function. We present deconvolution estimators that are optimal in order, both for known and unknown error distributions. An adaptive estimator that achieves the optimal rates within a logarithmic factor is developed. Simulation studies comparing the adaptive estimator to other methods are presented and support the superiority of our method. An example with real data is also discussed.
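As background for the flavor of such estimators, here is a hedged sketch of the classical deconvolution kernel density estimator with a sinc kernel and known Gaussian error; the talk concerns distribution-function deconvolution, unknown error and adaptive bandwidth choice, none of which this toy implements.

```python
# Toy deconvolution sketch: recover the density of X from Y = X + eps with a
# known Gaussian error, via Fourier inversion with a sinc (band-limiting) kernel.
import numpy as np

rng = np.random.default_rng(6)
n, sigma_eps = 2000, 0.3
x = rng.normal(1.0, 0.5, n)               # unobserved variable of interest
y = x + rng.normal(0.0, sigma_eps, n)     # contaminated observations

h = 0.15                                  # bandwidth; choosing it adaptively is the hard part
t = np.linspace(-1.0 / h, 1.0 / h, 1001)  # frequencies kept by the sinc kernel
dt = t[1] - t[0]
phi_y = np.exp(1j * np.outer(t, y)).mean(axis=1)   # empirical characteristic function of Y
phi_eps = np.exp(-0.5 * (sigma_eps * t) ** 2)      # characteristic function of the known error

grid = np.linspace(-0.5, 2.5, 121)
kernel = np.exp(-1j * np.outer(grid, t))           # inverse Fourier transform factor
f_hat = np.real(kernel @ (phi_y / phi_eps)) * dt / (2 * np.pi)

true_pdf = np.exp(-0.5 * ((grid - 1.0) / 0.5) ** 2) / (0.5 * np.sqrt(2 * np.pi))
print("max abs error of the deconvolution estimate on the grid: %.3f"
      % np.abs(f_hat - true_pdf).max())
```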
Minimum-norm estimation for a bi-exponential survival model
Yuval Nov, University of Haifa
We present a novel semi-parametric model for two-sample survival data, and an estimation method with a simple, closed-form solution. The method is based on minimization of the functional distance between two estimators of an unknown transformation postulated by the model. We study analytically the asymptotics of the estimators, and conduct a small simulation study.
Parallel Session II – Chair: Yossi Levy
Applications of context based modeling
Irad Ben-Gal, Tel Aviv University
We introduce context-based probabilistic models for purposes of anomaly detection and pattern classification. The proposed models have great flexibility in balancing under- and over-fitting effects, and they generalize known models such as Markov models, Bayesian networks (BN), and variable-order Markov (VOM) models. The context-based models have been tested and applied in diverse areas, including monitoring of service systems, predictive maintenance of products and business processes, analysis of communication networks, and monitoring of retail chains. Examples will be given.
Proprietary algorithm for design of efficient siRNA
Daniel Rothenstein, QBI Enterprises. Ltd./Quark Pharmaceuticals, Inc.
RNA interference (RNAi) is a recently discovered natural process of gene silencing that earned the Nobel Prize in Physiology or Medicine in 2006. Nobel laureates Fire and Mello found that double-stranded RNAs (dsRNAs) cause gene-specific silencing in C. elegans. Subsequently, the existence of the RNA interference (RNAi) mechanism was demonstrated in mammals. Further, it was demonstrated that RNAi in mammals can be induced by short synthetic dsRNA molecules. These short inhibiting agents were termed small interfering RNAs (siRNAs). In the past decade, RNAi has not only become a widely used research tool, but synthetic siRNAs are also becoming an emerging and highly promising new class of therapeutics.
To enable efficient RNAi drug discovery, prediction tools for the design of efficient siRNAs had to be developed. This talk will focus on Quark's proprietary algorithm, SiRS™, for predicting siRNA activity according to numerous rules related to the sequence features of the siRNA itself and of the target sequence within the target gene. SiRS™ is based on a compilation of accumulated, published and proprietary data concerning siRNA function. This presentation will provide a comprehensive review of the principles of constructing a reliable data set from the published data, extracting important features from the siRNA sequence and its target sequence, and finally selecting the significant explanatory variables for predicting siRNA activity. Data on the performance of the SiRS™ algorithm and a comparison with other publicly available applications will be provided.
Do we really need to identify the “real” distribution of our data?
Michal Shauly, Ben Gurion University of the Negev
A basic assumption in distribution fitting is that a single family of distributions can represent a wide range of diversely shaped distributions. To this day, only a few studies have aimed to compare the goodness-of-fit obtained by fitting such families to data. In this paper, two families of distributions, Pearson and RMM, are compared using an L2-norm indicator (the integrated squared distance between the estimated and the real PDFs). Each family is fitted by MLE to simulated samples from Gamma, Weibull and Log-normal distributions, with 3 skewness levels and 3 sample sizes (an L9 array). The real (data-generating) distribution is also fitted and compared. Results show that RMM is consistently better than Pearson. Furthermore, while the differences between Pearson and the real model are significant, RMM and the real model do not differ significantly. The notable implications of these results are discussed.
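A hedged sketch of the evaluation criterion: fit a candidate family by MLE to a sample from a known distribution and compute the integrated squared distance between the fitted and true densities. This is a generic illustration; the Pearson and RMM families compared in the talk are not implemented here.

```python
# Illustration of the L2 goodness-of-fit criterion: integrated squared distance
# between a fitted density and the data-generating density (generic example only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_dist = stats.lognorm(s=0.5, scale=1.0)      # data-generating distribution
sample = true_dist.rvs(size=500, random_state=rng)

# Fit a candidate family (here: gamma) by maximum likelihood.
shape, loc, scale = stats.gamma.fit(sample, floc=0.0)
fitted = stats.gamma(shape, loc=loc, scale=scale)

# Integrated squared distance between the fitted and true PDFs on a fine grid.
grid = np.linspace(1e-6, true_dist.ppf(0.999), 2000)
ise = np.sum((fitted.pdf(grid) - true_dist.pdf(grid)) ** 2) * (grid[1] - grid[0])
print("integrated squared distance (gamma fit to lognormal data): %.4f" % ise)
```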
Spatial forecasting of crime rates in Israel
Yury Gubman*x, Dmitri Romanov*, Nir Fogel*, Besora Regev**, Shay Amram**
* – Central Bureau of Statistics
** – Israel Police, Planning and Organization Division
x – corresponding author, e-mail yuryg@cbs.gov.il
Selecting methods for forecasting the rates of different types of crime, by geographic area, is an important research challenge, since a large number of socio-economic variables affecting the level of crime must be taken into account. Crime levels are also affected by police activity that creates deterrence, so the forecasting model must include a range of policing characteristics. The forecast is constructed for different geographic units: a locality/local authority, and the area defined as the jurisdiction of a police station, where the latter may include one locality or several and usually constitutes a geographic unit large enough in terms of the scale and stability of the phenomenon. Building a spatial forecast poses an additional challenge: estimating the spatial dependence between crime rates in different areas, and changes in the spatial dependence structure over time. A spatial forecast may be especially useful to the Israel Police for improving the planning of police activity over the medium and long term, and for helping the command echelon of the Israel Police determine an optimal allocation of police officers among police stations.
The main innovations of the present work are: a comparison of a variety of methods for medium- and long-term crime forecasting (one to three years), while estimating the dependence structure between observations in time and space; the use of a unique database built at the Central Bureau of Statistics, combining demographic and economic data with police data; extensive use of characteristics of the police forces in each area; and a comparison of the direct versus the indirect approach to crime forecasting.
First, a model explaining crime rates, by crime type, was estimated for the years 2003-2008 and for different spatial resolutions. Next, several methods for producing forecasts with a horizon of one to three years were compared, including a nonparametric predictor based on regression trees, a leading-indicators model based on regression analysis, and an approach in which forecasts are built from the analysis of a quarterly time series (see the sketch after the keywords below). In constructing forecasts at the police-station level, estimates obtained by the direct method were compared with those obtained by the indirect method, where the advantage of the indirect method lies in its ability to account for differences in population characteristics and crime levels among the localities that make up the station's area. The effect of the number of police officers at a station, by occupation, on the crime level was examined, while neutralizing the well-known endogeneity between policing and crime in the base period, which stems from the fact that the allocation of police forces across areas depends on the crime level at that point in time. Finally, the conditions under which each of the examined methods should be preferred over its competitors were characterized.
Keywords: spatial crime forecasting, leading indicators, regression trees, time series, endogenous effect
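As a hedged illustration of one of the compared approaches, the sketch below fits a regression tree to lagged crime rates and two invented area-level covariates and uses it to forecast the next period. The data and covariate names are synthetic; the actual work uses CBS and Israel Police data and several competing methods.

```python
# Hedged illustration: a regression-tree forecast of next-year crime rates from
# lagged rates and area covariates (synthetic data; covariate names are invented).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(8)
n_areas, n_years = 200, 6

# Synthetic panel: an area-level "socioeconomic index" and "officers per 1000"
# drive a latent crime propensity; rates are autocorrelated over years.
socio = rng.normal(0, 1, n_areas)
officers = rng.uniform(0.5, 3.0, n_areas)
rates = np.zeros((n_areas, n_years))
rates[:, 0] = 10 + 2 * socio - 1.5 * officers + rng.normal(0, 1, n_areas)
for t in range(1, n_years):
    rates[:, t] = 0.7 * rates[:, t - 1] + 3 + 0.5 * socio + rng.normal(0, 1, n_areas)

# Train on earlier years, then forecast the unseen last year from lagged rates.
X_train = np.column_stack([rates[:, n_years - 3], socio, officers])
y_train = rates[:, n_years - 2]
tree = DecisionTreeRegressor(max_depth=4, min_samples_leaf=10).fit(X_train, y_train)

X_next = np.column_stack([rates[:, n_years - 2], socio, officers])
forecast = tree.predict(X_next)
rmse = np.sqrt(np.mean((forecast - rates[:, n_years - 1]) ** 2))
print("out-of-sample RMSE of the tree forecast: %.2f" % rmse)
```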
Test for equality of baseline hazard functions for correlated survival data using frailty models: an application to call center data
Polina Khudyakov, Technion
Joint work with Malka Gorfine and Paul Feigin
Call centers are intended to provide customer service, technical support, marketing and other services via the telephone. Call centers collect a huge amount of data, and this provides a great opportunity for companies to use this information for the analysis of customer needs, desires, and intentions. This study is dedicated to the analysis of customer patience, defined as the ability to endure waiting for service. This human trait plays an important role in the call center mechanism. Every call can be considered an opportunity to keep or lose a customer, and the outcome depends on the customer's satisfaction and affects the customer's future choices. The assessment of customer patience is a complicated issue, because in most cases customers receive the required service before they lose their patience. To estimate the distribution of patience, we consider all calls with non-zero service time as censored observations.
Different methods for estimating customer patience already exist in the literature, for example using the Weibull distribution (Palm, 1953) or the standard Kaplan-Meier product-limit estimator (Brown et al., 2005, JASA, 36-50). Our work is the first attempt to apply frailty models in customer patience analysis while taking into account, and estimating, the possible dependency between calls of the same customer. In this work we first extended the estimation technique of Gorfine et al. (2006, Biometrika, 735-741) to address the case of a different unspecified baseline hazard function for each call, in case customer behavior changes as s/he becomes more experienced with the call center services. We then provided a new class of test statistics for testing the equality of the baseline hazard functions. To our knowledge, this is the first work dealing with testing the equality of baseline hazard functions for clustered data using dependent samples. The asymptotic distribution of the test statistics was investigated theoretically under the null and under certain local alternatives. We also provided consistent variance estimators. The finite-sample properties of the test statistics were studied in an extensive simulation study, which verified the control of the Type I error and our proposed sample size calculations. The utility of our proposed estimation technique and the new test statistic is illustrated by the analysis of call center data of an Israeli commercial company that processes up to 100,000 calls a day.
Key words: multivariate survival analysis, frailty model, customer patience, hypothesis testing, nonparametric baseline hazard function.
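The censoring convention described above is easy to prototype. The sketch below is a hypothetical illustration on synthetic data using a plain Kaplan-Meier estimate (via the lifelines package), not the frailty machinery of the talk: answered calls are treated as right-censored observations of the unobserved patience time.

```python
# Illustration of the censoring convention for customer patience: calls that were
# answered are right-censored observations of the (unobserved) patience time.
# Plain Kaplan-Meier estimate on synthetic data, not the frailty models of the talk.
import numpy as np
from lifelines import KaplanMeierFitter

rng = np.random.default_rng(9)
n = 10_000
patience = rng.exponential(scale=180.0, size=n)      # seconds a customer would wait
offered_wait = rng.exponential(scale=60.0, size=n)   # time until an agent answers

abandoned = patience <= offered_wait                 # event observed: customer hung up
observed_time = np.minimum(patience, offered_wait)

kmf = KaplanMeierFitter()
kmf.fit(observed_time, event_observed=abandoned, label="patience")
print("estimated P(patience > 60s): %.3f" % float(kmf.predict(60.0)))
print("abandonment rate in the data: %.2f" % abandoned.mean())
```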
Students' Session – Chair: Amir Herman
Bayesian approach to clustering: finding the number of clusters by the MAP rule.
Yinat Trompoler (Putter’s Prize Winner), Tel Aviv University
Clustering algorithms are used today in a variety of applications. Throughout the years, as the algorithms and their applications developed, an important question has been raised: "how well do we cluster?"
My M.Sc. thesis focuses on one aspect of this general question – finding the "correct" number of clusters. In this talk I will discuss this problem and its importance to some practical applications. A short review will present some of the existing methods for approaching this problem. Then, we introduce a Bayesian approach based on a probability model for the data and a prior distribution on the set of its partitions into clusters. The resulting maximum a posteriori (MAP) rule is used to determine the number of clusters. This rule was developed into a practical algorithm. I will present some results comparing the MAP approach with existing alternatives.
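For readers who want a quick baseline to compare against, the sketch below selects the number of clusters for a Gaussian mixture by BIC; this is a crude, non-Bayesian stand-in for the MAP rule developed in the thesis, included only to make the model-selection question concrete.

```python
# Baseline for comparison only: choosing the number of clusters for a Gaussian
# mixture by BIC (a rough stand-in for the MAP rule developed in the thesis).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(10)
centers = np.array([[0, 0], [4, 0], [2, 4]])
data = np.vstack([rng.normal(c, 0.7, size=(150, 2)) for c in centers])

scores = {}
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(data)
    scores[k] = gm.bic(data)

best_k = min(scores, key=scores.get)
print("BIC by number of clusters:", {k: round(v, 1) for k, v in scores.items()})
print("selected number of clusters:", best_k)
```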
Quantal basis for secretory granule growth: a statistical model
Eyal Nitzany (Peritz’s Prize Winner), Tel Aviv University
The cell produces unit granules that fuse to form bigger granules, and these eventually leave the cell: this is a fair abstraction of the way in which the cell communicates with its environment. In this talk I'll present my M.Sc. thesis (supervised by Prof. Meilijson and Prof. Hammel), in which we modeled granule growth and elimination processes. It studies two distributions: the stationary cell size distribution and the exit size distribution. I'll discuss these models, the dimension and biological interpretation of their parameters, and add a few insights gained through the work.
Additionally, we developed a practical model that incorporates the multi-modality and Gaussian noise nature of actual granule observations. This model will be presented, together with the statistical tools used for parameter estimation.
Finally, after a brief discussion of some properties of our theoretical model, its predictions will be confronted with the empirical distribution of real granule size observations.
Treatment versus experimentation dilemma in dose-finding studies – consistency considerations
David Azriel, The Hebrew University of Jerusalem
Cox model with changepoint and measurement error in the covariate
Sarit Agami, The Hebrew University of Jerusalem
Survival data are common in many fields, particularly in medical research. The most popular regression model for survival data is the Cox (1972) model. In this model the hazard function has the same functional form over the entire range of the explanatory variable X. However, there are cases in which the regression takes different forms over different ranges; of particular interest is the case of two forms over two different ranges, i.e., a regression that includes a breakpoint (a changepoint). In addition, in many applications some of the covariates are subject to measurement error, or are measured without error only for a subsample of the population. When the explanatory variable X cannot be observed directly, and instead a surrogate W measuring X with error is observed, one of the consequences of measurement error in the covariates is bias in the parameter estimates. The literature offers methods for correcting this bias in nonlinear regression models in general, in particular regression calibration and simulation-extrapolation (SIMEX), as well as for survival models in particular, including methods based on the relative risk in terms of W (Prentice's (1982) method for the rare-event case and Zucker's (2005) method for the general case). We compare these methods by simulation, in various settings, for the case of a Cox model with a changepoint.
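To give a flavor of the SIMEX idea mentioned above, here is a minimal sketch for a simple linear model with classical measurement error; it is an illustration only, since applying SIMEX to a Cox model with a changepoint, as in the talk, involves considerably more machinery. Noise is added to the error-prone covariate at increasing levels and the fitted coefficient is extrapolated back to the no-error level.

```python
# Minimal SIMEX sketch for a linear model with classical measurement error
# (illustration only; the talk concerns the Cox model with a changepoint).
import numpy as np

rng = np.random.default_rng(11)
n, beta, sigma_u = 5000, 1.0, 0.8
x = rng.normal(0, 1, n)                     # true covariate
w = x + rng.normal(0, sigma_u, n)           # observed with known error variance
y = beta * x + rng.normal(0, 1, n)

def slope(u, v):
    """OLS slope of v regressed on u."""
    return np.cov(u, v)[0, 1] / np.var(u)

# Simulation step: inflate the measurement error variance by factors (1 + lam).
lambdas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
est = []
for lam in lambdas:
    reps = [slope(w + rng.normal(0, np.sqrt(lam) * sigma_u, n), y) for _ in range(20)]
    est.append(np.mean(reps))

# Extrapolation step: fit a quadratic in lambda and evaluate it at lambda = -1,
# i.e., at zero measurement error.
coefs = np.polyfit(lambdas, est, deg=2)
simex = np.polyval(coefs, -1.0)
print("naive slope : %.3f" % slope(w, y))
print("SIMEX slope : %.3f (true value %.1f)" % (simex, beta))
```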
Sampling methods for internet surveys
Orit Marom – Central Bureau of Statistics
In this talk I will present a review of the sampling methods currently available for internet and e-mail surveys. In the review I will distinguish between probability and non-probability sampling methods and their application to internet surveys. In addition, I will discuss the difficulties involved in internet surveys, in particular the construction of sampling frames for probability samples, coverage problems, nonresponse problems, and selection bias.
Sampling methods for internet surveys can be classified into two broad categories: probability sampling methods and non-probability sampling methods. A probability sample is one in which the survey units are selected by some probabilistic mechanism, so that the selection probability of each unit in the sampling frame is known and positive. In contrast, in non-probability samples the probability of each unit or respondent being included in the sample is either unknown or not positive. This inherent difference between the approaches to internet surveys is reflected in the kinds of problems and difficulties that arise, and in how they are handled, including coverage problems, nonresponse problems, and selection bias.
Panel: The Central Bureau of Statistics = the government's survey company?
Panelists (in alphabetical order):
- Prof. Yoav Benjamini, Tel Aviv University
- Prof. (Emeritus) Shlomo Yitzhaki, The Hebrew University, the Government Statistician
- Mr. Gideon Eshet, Yedioth Ahronoth
- Prof. (Emeritus) Zvi Ziegler, Technion, Chair of the Inter-Senate Committee