Course: MATE 4995 Section 23 – Big Data Analysis in Biomedical Applications I (BBD1)

Course description

This is an intermediate course on statistical concepts and programming skills in the R language for the analysis of massive data sets (“Big Data”). The first part of the course reviews basic concepts of statistical inference, including probability distributions, the Central Limit Theorem, confidence intervals, and hypothesis tests, making intensive use of R. Next, the course covers linear models commonly used for statistical inference. To that end, it introduces the fundamental concepts of matrix algebra needed to represent and work with linear models, along with inference techniques for them (estimation, hypothesis testing, and diagnostics). The concepts of interaction and contrast are discussed, together with how to perform inference on them. Finally, the course covers several statistical topics relevant to the analysis of large volumes of data, including the multiple hypothesis testing problem, error rates, procedures for controlling error rates, false discovery rates, q-values, and exploratory methods for large data sets. Concepts of statistical modeling and their application to large data sets are introduced, with particular attention to parametric probabilistic models and parameter estimation techniques. All content is discussed through practical cases using R for programming and data analysis.

This course expands on the courses PH525.1x: Statistics and R; PH525.2x: Data Analysis for Life Sciences 2: Introduction to Linear Models and Matrix Algebra; and PH525.3x: Data Analysis for Life Sciences 3: Statistical Inference and Modeling for High-throughput Experiments, from the course sequence designed by Prof. Rafael Irizarry (Biostatistics, Harvard University) for HarvardX at http://www.edx.org.

Course objectives

Upon completing the course, the student will be able to:

  1. Identify random variables with normal and binomial distributions, and compute probabilities associated with them.
  2. Use the Central Limit Theorem to compute probabilities associated with the average of large amounts of data.
  3. Compute and interpret p-values for hypothesis tests on means of normal distributions or of large amounts of data.
  4. Compute and interpret confidence intervals for the situations described in item 3.
  5. Interpret the power of a hypothesis test.
  6. Use appropriate plots to summarize the information in a data set.
  7. Use matrix notation and perform matrix operations.
  8. Use matrix notation to represent linear models and use matrix operations to fit those models.
  9. Perform inference on linear models, and interpret interaction terms and contrasts.
  10. Apply error-control techniques to the problem of multiple simultaneous hypothesis tests.
  11. Apply inference techniques to a variety of probabilistic models.
  12. Apply techniques for exploring large volumes of data.
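
As an illustration of objective 10, the following is a minimal sketch of the Benjamini-Hochberg procedure for controlling the false discovery rate. It is not taken from the course materials: the function name `benjamini_hochberg`, the 0.05 threshold, and the p-values are made up for the example. It is written in Python only to keep the sketch dependency-free. The course itself works in R, where `p.adjust(p, method = "BH")` performs the same adjustment.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Hypothetical helper written for this illustration only.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, keeping a running
    # minimum so the adjusted values stay monotone in the raw p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values from eight hypothetical tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
adjusted = benjamini_hochberg(p_values)

# Tests declared discoveries at an assumed false discovery rate of 5%.
discoveries = [p for p, q in zip(p_values, adjusted) if q <= 0.05]
print(discoveries)
```

In this made-up example five raw p-values fall below 0.05, but only the two smallest survive the adjustment, which is the kind of gap between per-test and multiple-testing error control that the objective refers to.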

Instructional strategies

The course uses strategies in the style of a flipped massive online course. It draws on the first, second, and third courses of the “Data Analysis for Life Sciences” series on Harvard's edX site, created by Prof. Rafael Irizarry: “Statistics and R”, “Introduction to Linear Models and Matrix Algebra”, and “Statistical Inference and Modeling for High-throughput Experiments”.

Bibliography

  1. Textbook: Data Analysis for the Life Sciences. Rafael Irizarry and Michael Love (available at http://www.leanpub.com)
  2. Software for Data Analysis: Programming with R (Statistics and Computing) by John M. Chambers (Springer)
  3. S Programming (Statistics and Computing) Brian D. Ripley and William N. Venables (Springer)
  4. Programming with Data: A Guide to the S Language by John M. Chambers (Springer)

Electronic references

  1. R reference card (PDF) by Tom Short (more can be found under Short Documents and Reference Cards here)
  2. Quick-R: Quick online reference for data input, basic statistics, and plots 
  3. Thomas Girke’s R & Bioconductor manuals
  4. R programming class on Coursera,  taught by Roger Peng, Jeff Leek, and Brian Caffo
  5. The free “try R” class from Code School is also a good place to start: http://tryr.codeschool.com/
  6. swirl: learn R interactively from within the R console

Student Publication

Ivan Jimenez-Ruiz, an IDI-BD2K student, completed an internship last summer at our partner institution, the Center for Causal Discovery in Pittsburgh.

This summer he will present work from that Summer 2016 internship at the Practice & Experience in Advanced Research Computing conference (PEARC’17), July 9-13, 2017, in New Orleans, Louisiana, USA.

After that, he will begin PhD studies in North Carolina. We wish you the best, Ivan!

  1. I. Jimenez-Ruiz, R. Gonzalez-Mendez, A. Ropelewski. 2017. In Proceedings of ACM PEARC conference, New Orleans, USA, July 2017 (PEARC’17), 4 pages. http://dx.doi.org/10.1145/3093338.3093372

Local copy:

ILJR_PEARC_Final_Draft

Brains, Minds, and Machines Workshop

May 12-13, 2017

9:00 am – 4:30 pm

Engine-4
SPORTS COMPLEX ONOFRE CARBALLEIRA
Rd. PR-5 Jct. Rd. PR-2
Bayamón, PR 00959

Register at: http://tinyurl.com/BMMPR17

The Brains Minds and Machines Seminar in Puerto Rico will be an intensive two-day course offered to undergraduate students from Puerto Rico. It will be an introduction to the problem of intelligence from a multidisciplinary perspective, taught by postdocs from the MIT Center for Brains Minds and Machines. The course will consist of lectures and hands-on tutorials on the computational aspects of cognitive science, neuroscience and computer science.

This event is sponsored by the MIT Center for Brains, Minds and Machines; the NIH NeuroID Program in the Department of Biology at the University of Puerto Rico Río Piedras (UPRRP); the NIH Increasing Diversity in Interdisciplinary Big Data to Knowledge Program in the UPRRP Departments of Biology, Computer Science and Mathematics; Evertec; Wovenware; and Engine-4.

Speakers

Tobias Gerstenberg, PhD
Understanding Why: From Counterfactual Simulation to Responsibility Judgments

Gemma Roig, PhD
Introduction to Deep Neural Networks and Applications

Hector Penagos, PhD
Sequential Information in the Hippocampus for Navigation and Decision-Making

Matt Peterson, PhD
Eye Movements: The Fundamental Role of Information Selection in the Complexity of the Real World

Bios and Abstracts

Tobias Gerstenberg – I am a postdoctoral associate at MIT in Prof. Joshua Tenenbaum’s Computational Cognitive Science group. I did both my MSc and PhD at University College London and was advised by Prof. David Lagnado and Prof. Nick Chater. In my thesis, I explored the question of how people attribute responsibility to individuals in groups, and the way in which causal and counterfactual thinking influences people’s responsibility judgments. Currently, I look at how people’s intuitive theory of physics and psychology informs their causal and responsibility judgments. In my research, I formalize people’s mental models as computational models that yield quantitative predictions about a wide range of situations. To test these predictions, I use a combination of large-scale online experiments, interactive experiments in the lab, and eye-tracking experiments.

Understanding Why: From Counterfactual Simulation to Responsibility Judgments

We are evaluative creatures. When we see people act, we can’t help but think about why they did what they did, and whether it was a good idea. Blaming or praising others requires us to answer at least two questions: What causal role did their action play in bringing about the outcome, and what does the action reveal about the person? To answer the first question, we need a model of how the world works. To answer the second one, we need a model of how people work – an intuitive theory of decision-making that allows us to reason backward from observed actions to the underlying mental states that caused them.

In this talk, I will present a computational framework for modeling causal explanations in terms of counterfactual simulations, and several lines of experiments testing this framework in the domains of intuitive psychology and intuitive physics. In intuitive psychology, this framework explains how the causal structure of a situation influences the extent to which individuals are held responsible for group outcomes, and how expectations modulate these judgments based on what a person’s action revealed about their disposition. In the domain of intuitive physics, the model predicts people’s causal judgments about a variety of physical scenes, including dynamic collision events, complex situations that involve multiple causes, omissions as causes, and causal responsibility for a system’s stability. It also captures the cognitive processes underlying these judgments as revealed by spontaneous eye movements.

Gemma Roig – I am a postdoctoral fellow at MIT in the Center for Brains Minds and Machines, with Prof. Tomaso Poggio as my faculty host. I am also affiliated with the Laboratory for Computational and Statistical Learning, which is a collaborative agreement between the Istituto Italiano di Tecnologia and the Massachusetts Institute of Technology. I pursued my doctoral degree in Computer Vision at ETH Zurich. Previously, I was a research assistant at the Computer Vision Lab at EPFL in Lausanne, at the Department of Media Technologies at Ramon Llull University in Barcelona, and at the Robotics Institute – Carnegie Mellon University in Pittsburgh. I am interested in computational models of human vision to understand its underlying principles, and in using those models to build applications of artificial intelligence.

Introduction to Deep Neural Networks and Applications

Deep Neural Networks emerged from the idea that the brain could be modeled as a computational machine that processes information. We are going to explore their beginnings in artificial intelligence, and how these kinds of models were used to model the brain, with special emphasis on visual processing. We are also going to look at their current success in many applications, and we will discuss what made it possible.

We are also going to have a hands-on tutorial, in which we will explore how to set up a simple application using available toolboxes and off-the-shelf libraries for learning and implementing deep neural network models.

Hector Penagos – I am a postdoc in Matt Wilson’s lab at MIT and the Center for Brains, Minds & Machines. I received my PhD from the Harvard-MIT Health Sciences and Technology Program. As a graduate student I did some neuroimaging and psychophysics work to understand the neural correlates of pitch perception in humans. Ultimately, I did my dissertation in Matt Wilson’s lab studying the relationship between the anterior thalamus and hippocampus during navigation and memory processing. As a postdoc, I am extending my work to test the idea that the hippocampus can perform simulations that shape our decision-making process.

Sequential Information in the Hippocampus for Navigation and Decision-Making

Navigation requires drafting a route to a destination and making predictions about upcoming locations to successfully execute that plan. The hippocampus is a key element in an extended network of brain structures involved in these spatial processes. In this talk we will explore the physiological states and neuronal representations in the hippocampus that enable flexible route planning and the prediction of immediate future trajectories. We will also explore how the hippocampus may simulate scenarios that incorporate indirect evidence to shape decision-making behavior.

Matt Peterson – I received my PhD in Cognitive Science from the University of California, Santa Barbara under the mentorship of Miguel Eckstein. Our work combined psychophysics, eye tracking, and computational modeling to understand why each person has their own distinct, personal style for where they look on faces. I am currently a postdoctoral researcher in Nancy Kanwisher’s lab at MIT. By measuring our real world visual experience, we aim to better understand the computations the brain uses to form our beliefs about the world and to guide our actions during normal everyday behavior.

Eye Movements: The Fundamental Role of Information Selection in the Complexity of the Real World

Evolution has optimized the brain to produce successful behavior within the dizzying complexity of the natural world. An essential component of such a system is rapid updating of world knowledge through intelligent selection of useful sensory signals.  Perhaps the most fundamental selection mechanism is the guidance of gaze, or eye movements, a function enacted by a large network of dedicated neural systems. Here, we will explore how the brain decides where to look. In the lecture, we will examine the critical nature of eye movements through understanding the physiological constraints of the visual system and how the information they select is organized in the natural world. We will then discuss how measuring eye movements provides a window into the brain’s moment-by-moment information processing algorithms, access in many ways unique to eye tracking methods. In the tutorial, we will use a state-of-the-art mobile eye tracker in a basic face recognition task to test a fundamental assumption of laboratory experiments: that what we measure in artificial, tightly-controlled paradigms reflects what the brain actually does in the real world, which is presumably what the brain’s organization has been optimized for.

Seminar: The development and regeneration of the sea star larval nervous system

UNIVERSITY OF PUERTO RICO
RIO PIEDRAS CAMPUS
COLLEGE OF NATURAL SCIENCES

YOU ARE INVITED TO THE SEMINAR:

“The development and regeneration of the sea star larval nervous system”

Veronica Hinman, PhD
Associate Professor
Biological Sciences &
Computational Biology
Carnegie Mellon University

Date: Friday, May 5, 2017
Time: 1:00PM
Place: Centro para Puerto Rico
Fundación Sila M. Calderón
Urb. Santa Rita C/ González #1020
Río Piedras, PR 00925

Seminar: Interactive, Statistical, and Visual Analysis of Functional Genomics and Metagenomics Data

The IDI-BD2K program is sponsoring the seminar “Interactive, Statistical, and Visual Analysis of Functional Genomics and Metagenomics Data,” to be given by Dr. Hector Corrada-Bravo, Associate Professor at the University of Maryland.

The seminar will take place this coming Friday, April 21, 2017, at 10:30 AM at the Centro para Puerto Rico (Fundación Sila M. Calderón).

Please confirm your attendance by sending an email to marimarvelaz@gmail.com

The Healthcare Innovation Replicathon Starts Tomorrow

What Is a Replicathon?

A replicathon, like a hackathon, is 36 continuous hours of analytical and programming work aimed at creating real solutions using technology. Unlike in a traditional hackathon, all teams receive the same challenge: two scientific manuscripts that reached different results using the same data. The goal is for each team to interpret the data and present its conclusions. In a hackathon, the solution usually takes the form of an app (a mobile or web application). The replicathon requires interdisciplinary collaboration among experts in programming, data analysis, and the subject domain (genomics in this case).

Students can register to participate here:

http://bit.ly/replicathon2017

Replicathon registration open

The registration for the Varmed Management Group and IDI-BD2K Healthcare Innovation Replicathon is open. Join us March 24-25, 2017 for this event. Mentors from the University of Puerto Rico, Harvard, University of California Davis, Massachusetts Institute of Technology and more will guide groups of students to examine the issues of replicability in a set of experiments asking the same question and obtaining different answers.

Register: http://bit.ly/replicathon2017

Turn biological data and coffee into insight.

Bici Jangueo at El Caño Martín Peña

Our interdisciplinary IDI-BD2K program, which promotes Biomedical Big Data, invites you to its Bici Jangueo (bike hangout) along El Caño Martín Peña.

The goal is to build an interdisciplinary student community interested in solving social problems using technology.

When: Saturday, February 25, 2017
Time: 8:30 am – 12:00 pm (we will meet at the Sagrado Corazón Tren Urbano station at 8:30 am)
Cost: $10, includes bicycle and helmet

Want to see what the route is like?
Video: http://bit.ly/biciCano2017

Interested?
Register: http://bit.ly/biciJangueo2017