Announcement of a New “Big Data” Course (MATE 4995, Section 23)

University of Puerto Rico

Río Piedras Campus

Semester: First semester, 2017-2018

Course code: MATE 4995 Section 23

Course title: Análisis de datos masivos en aplicaciones biomédicas I (BBD1), “Massive Data Analysis in Biomedical Applications I”

Professor: Dr. Maria E. Perez

Prerequisites: MATE 3026 or equivalent, and CCOM 3030 or permission of the professor.

This course aims to prepare students for two main goals:

  1. To be able to work with large amounts of data, an essential skill for the future of every kind of research.
  2. To be able to take part in the IDI-BD2K project next summer, with research experiences at the BD2K Centers of Excellence at Harvard, Pittsburgh, and the University of California, Santa Cruz.

Students in any Natural Sciences bachelor's program may enroll.

More information about the IDI-BD2K project

Today, as never before, biomedical research is generating massive amounts of data, whose analysis and interpretation have the potential to produce dramatic advances in our knowledge of human health and our quality of life. Analyzing these massive data sets (“Big Data”) requires techniques that combine knowledge of biology, chemistry, statistics, computer science, and other fields.

The IDI-BD2K project will offer the course MATE 4995 Section 23, Massive Data Analysis in Biomedical Applications I (BBD1), in fall 2017. In this course you can learn how to find groups of genes that are overexpressed in a condition such as cancer, how to build and interpret linear models that describe the response to a treatment, and how to manage and analyze massive genomic data sets.

Students who complete this course and its continuation, BBD2, will qualify for an internship at one of the BD2K Centers of Excellence, such as Harvard, the University of California, Santa Cruz, and Pittsburgh.

For additional information, contact Dr. Perez: maria.perez34@upr.edu

If you are not yet ready to take BBD1 and BBD2, make sure you are taking the courses suggested by the BD2K Centers of Excellence for your major by consulting the table below:

Course: MATE 4995 Section 23, Massive Data Analysis in Biomedical Applications I (BBD1)

Course description

This is an intermediate course on statistical concepts and programming skills in the R programming language for the analysis of massive data (“Big Data”). The first part of the course reviews basic concepts of statistical inference, including probability distributions, the Central Limit Theorem, confidence intervals, and hypothesis tests, making intensive use of R. Next, the course covers linear models commonly used for statistical inference. To that end, it introduces the fundamental matrix algebra concepts needed to represent and work with linear models, as well as inference techniques for them (estimation, hypothesis tests, and diagnostics). The concepts of interaction and contrast, and how to perform inference on them, are also discussed. Finally, the course covers several statistical topics relevant to the analysis of large volumes of data, including multiple hypothesis testing problems, error rates, procedures for controlling error rates, false discovery rates, q-values, and exploratory methods for large volumes of data. Statistical modeling concepts and their application to large volumes of data are introduced, with particular attention to parametric probabilistic models and parameter estimation techniques. All content is discussed through practical cases using R for programming and data analysis.
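
As an illustration of the multiple-testing topics named in the description, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure for controlling the false discovery rate. It is not taken from the course materials: the course works in R, this sketch is in Python, and the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis whose rank is at most cutoff_rank.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Hypothetical p-values, for example one per gene in an expression study.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(example_p, alpha=0.05))
```

With these made-up values, only the two smallest p-values fall under their rank-based thresholds, so only those two hypotheses are rejected. A Bonferroni correction at the same level would compare every p-value to 0.05 / 10 and reject only the first.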

This course expands on the courses PH525.1x: Statistics and R; PH525.2x: Data Analysis for Life Sciences 2: Introduction to Linear Models and Matrix Algebra; and PH525.3x: Data Analysis for Life Sciences 3: Statistical Inference and Modeling for High-throughput Experiments, from the course sequence designed by Prof. Rafael Irizarry (Biostatistics, Harvard University) for HarvardX at http://www.edx.org.

Course objectives

By the end of the course, the student will be able to:

  1. Identify random variables with normal and binomial distributions, and calculate probabilities associated with them.
  2. Use the Central Limit Theorem to calculate probabilities associated with the average of large amounts of data.
  3. Calculate and interpret p-values for hypothesis tests about means of normal distributions or with large amounts of data.
  4. Calculate and interpret confidence intervals for the situations described in item 3.
  5. Interpret the power of a hypothesis test.
  6. Use appropriate plots to summarize the information in a data set.
  7. Use matrix notation and perform matrix operations.
  8. Use matrix notation to represent linear models, and use matrix operations to fit those models.
  9. Perform inference on linear models, and interpret interaction terms and contrasts.
  10. Apply techniques for error control in the problem of multiple simultaneous hypothesis tests.
  11. Apply inference techniques for different probabilistic models.
  12. Apply techniques for exploring large volumes of data.
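
To make objectives 2 and 4 concrete, here is a minimal sketch of a Central Limit Theorem probability calculation and a large-sample confidence interval for a mean. It is an illustration only, not course material: the course uses R, this sketch uses Python's standard library, and the function names, the population parameters, and the simulated sample are all hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def clt_prob_mean_exceeds(mu, sigma, n, threshold):
    """Approximate P(sample mean > threshold) via the Central Limit Theorem:
    the mean of n independent draws is roughly Normal(mu, sigma / sqrt(n))."""
    sampling_dist = NormalDist(mu, sigma / math.sqrt(n))
    return 1.0 - sampling_dist.cdf(threshold)


def normal_confidence_interval(sample, confidence=0.95):
    """Large-sample (normal approximation) confidence interval for the mean."""
    n = len(sample)
    center = mean(sample)
    std_error = stdev(sample) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return center - z * std_error, center + z * std_error


# Hypothetical population with mean 24 and standard deviation 3.5;
# with n = 49 the standard error is 0.5, so 25 is two standard errors above.
print(clt_prob_mean_exceeds(mu=24.0, sigma=3.5, n=49, threshold=25.0))

# Simulated, clearly non-normal data (exponential) to show that the
# interval is still computed from the sample mean and standard error.
random.seed(1)
sample = [random.expovariate(1.0) for _ in range(500)]
low, high = normal_confidence_interval(sample)
print(low, high)
```

The first call evaluates the upper tail of a normal distribution two standard errors above the mean, which is about 0.0228. The interval function relies on the normal approximation, so it is only appropriate for large samples; for small samples from a normal population a t-based interval would be the usual choice.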

Instructional strategies

The course will follow a flipped massive online course format. It will use the first, second, and third courses of the “Data Analysis for Life Sciences” series on the HarvardX edX site, created by Prof. Rafael Irizarry: “Statistics and R”, “Introduction to Linear Models and Matrix Algebra”, and “Statistical Inference and Modeling for High-throughput Experiments”.

Bibliography

  1. Textbook: Data Analysis for the Life Sciences, by Rafael Irizarry and Michael Love (available at http://www.leanpub.com)
  2. Software for Data Analysis: Programming with R (Statistics and Computing), by John M. Chambers (Springer)
  3. S Programming (Statistics and Computing), by Brian D. Ripley and William N. Venables (Springer)
  4. Programming with Data: A Guide to the S Language, by John M. Chambers (Springer)

Electronic references

  1. R reference card (PDF) by Tom Short (more can be found under Short Documents and Reference Cards here)
  2. Quick-R: Quick online reference for data input, basic statistics, and plots 
  3. Thomas Girke’s R & Bioconductor manuals
  4. R programming class on Coursera,  taught by Roger Peng, Jeff Leek, and Brian Caffo
  5. The free “try R” class from Code School is also a good place to start: http://tryr.codeschool.com/
  6. swirl: learn R interactively from within the R console

Brains, Minds, and Machines Workshop

May 12-13, 2017

9:00 am – 4:30 pm

Engine-4
SPORTS COMPLEX ONOFRE CARBALLEIRA
Rd. PR-5 Jct. Rd. PR-2
Bayamón, PR 00959

Register at: http://tinyurl.com/BMMPR17

The Brains, Minds, and Machines Seminar in Puerto Rico will be an intensive two-day course offered to undergraduate students from Puerto Rico. It will be an introduction to the problem of intelligence from a multidisciplinary perspective, taught by postdocs from the MIT Center for Brains, Minds and Machines. The course will consist of lectures and hands-on tutorials on the computational aspects of cognitive science, neuroscience, and computer science.

This event is sponsored by the MIT Center for Brains, Minds and Machines; the NIH NeuroID Program in the Department of Biology at the University of Puerto Rico Río Piedras (UPRRP); the NIH Increasing Diversity in Interdisciplinary Big Data to Knowledge Program at UPRRP in the Departments of Biology, Computer Science, and Mathematics; Evertec; Wovenware; and Engine-4.

Speakers

Tobias Gerstenberg, PhD
Understanding Why: From Counterfactual Simulation to Responsibility Judgments

Gemma Roig, PhD
Introduction to Deep Neural Networks and Applications

Hector Penagos, PhD
Sequential Information in the Hippocampus for Navigation and Decision-Making

Matt Peterson, PhD
Eye Movements: The Fundamental Role of Information Selection in the Complexity of the Real World

Bios and Abstracts

Tobias Gerstenberg – I am a postdoctoral associate at MIT in Prof. Joshua Tenenbaum’s Computational Cognitive Science group. I did both my MSc and PhD at University College London and was advised by Prof. David Lagnado and Prof. Nick Chater. In my thesis, I explored the question of how people attribute responsibility to individuals in groups, and the way in which causal and counterfactual thinking influences people’s responsibility judgments. Currently, I look at how people’s intuitive theory of physics and psychology informs their causal and responsibility judgments. In my research, I formalize people’s mental models as computational models that yield quantitative predictions about a wide range of situations. To test these predictions, I use a combination of large-scale online experiments, interactive experiments in the lab, and eye-tracking experiments.

Understanding Why: From Counterfactual Simulation to Responsibility Judgments

We are evaluative creatures. When we see people act, we can’t help but think about why they did what they did, and whether it was a good idea. Blaming or praising others requires us to answer at least two questions: What causal role did their action play in bringing about the outcome, and what does the action reveal about the person? To answer the first question, we need a model of how the world works. To answer the second one, we need a model of how people work – an intuitive theory of decision-making that allows us to reason backward from observed actions to the underlying mental states that caused them.

In this talk, I will present a computational framework for modeling causal explanations in terms of counterfactual simulations, and several lines of experiments testing this framework in the domains of intuitive psychology and intuitive physics. In intuitive psychology, this framework explains how the causal structure of a situation influences the extent to which individuals are held responsible for group outcomes, and how expectations modulate these judgments based on what a person’s action revealed about their disposition. In the domain of intuitive physics, the model predicts people’s causal judgments about a variety of physical scenes, including dynamic collision events, complex situations that involve multiple causes, omissions as causes, and causal responsibility for a system’s stability. It also captures the cognitive processes underlying these judgments as revealed by spontaneous eye movements.

Gemma Roig – I am a postdoctoral fellow at MIT in the Center for Brains, Minds and Machines, with Prof. Tomaso Poggio as my faculty host. I am also affiliated with the Laboratory for Computational and Statistical Learning, which is a collaborative agreement between the Istituto Italiano di Tecnologia and the Massachusetts Institute of Technology. I pursued my doctoral degree in Computer Vision at ETH Zurich. Previously, I was a research assistant at the Computer Vision Lab at EPFL in Lausanne, at the Department of Media Technologies at Ramon Llull University in Barcelona, and at the Robotics Institute – Carnegie Mellon University in Pittsburgh. I am interested in computational models of human vision to understand its underlying principles, and in using those models to build applications of artificial intelligence.

Introduction to Deep Neural Networks and Applications

Deep neural networks emerged from the idea that the brain could be modeled as a computational machine that processes information. We are going to explore their beginnings in artificial intelligence and how these kinds of models were used to model the brain, with special emphasis on visual processing. We will also look at their current success in many applications and discuss what made it possible.

We are also going to have a hands-on tutorial, in which we will explore how to set up a simple application using available toolboxes and off-the-shelf libraries for learning and implementing deep neural network models.

Hector Penagos – I am a postdoc in Matt Wilson’s lab at MIT and the Center for Brains, Minds & Machines. I received my PhD from the Harvard-MIT Health Sciences and Technology Program. As a graduate student I did some neuroimaging and psychophysics work to understand the neural correlates of pitch perception in humans. Ultimately, I did my dissertation in Matt Wilson’s lab studying the relationship between the anterior thalamus and hippocampus during navigation and memory processing. As a postdoc, I am extending my work to test the idea that the hippocampus can perform simulations that shape our decision-making process.

Sequential Information in the Hippocampus for Navigation and Decision-Making

Navigation requires drafting a route to a destination and making predictions about upcoming locations to successfully execute that plan. The hippocampus is a key element in an extended network of brain structures involved in these spatial processes. In this talk we will explore the physiological states and neuronal representations in the hippocampus that enable flexible route planning and the prediction of immediate future trajectories. We will also explore how the hippocampus may simulate scenarios that incorporate indirect evidence to shape decision-making behavior.

Matt Peterson – I received my PhD in Cognitive Science from the University of California, Santa Barbara under the mentorship of Miguel Eckstein. Our work combined psychophysics, eye tracking, and computational modeling to understand why each person has their own distinct, personal style for where they look on faces. I am currently a postdoctoral researcher in Nancy Kanwisher’s lab at MIT. By measuring our real world visual experience, we aim to better understand the computations the brain uses to form our beliefs about the world and to guide our actions during normal everyday behavior.

Eye Movements: The Fundamental Role of Information Selection in the Complexity of the Real World

Evolution has optimized the brain to produce successful behavior within the dizzying complexity of the natural world. An essential component of such a system is rapid updating of world knowledge through intelligent selection of useful sensory signals.  Perhaps the most fundamental selection mechanism is the guidance of gaze, or eye movements, a function enacted by a large network of dedicated neural systems. Here, we will explore how the brain decides where to look. In the lecture, we will examine the critical nature of eye movements through understanding the physiological constraints of the visual system and how the information they select is organized in the natural world. We will then discuss how measuring eye movements provides a window into the brain’s moment-by-moment information processing algorithms, access in many ways unique to eye tracking methods. In the tutorial, we will use a state-of-the-art mobile eye tracker in a basic face recognition task to test a fundamental assumption of laboratory experiments: that what we measure in artificial, tightly-controlled paradigms reflects what the brain actually does in the real world, which is presumably what the brain’s organization has been optimized for.

Workshop: mRNAseq on non-model organisms

The Increasing Diversity in Interdisciplinary Big Data to Knowledge (IDI-BD2K) project is pleased to announce a workshop on the analysis of mRNAseq data from non-model organisms. The workshop will be held Friday, August 19 and Saturday, August 20, 2016 in the Computer Science Department of the University of Puerto Rico, Rio Piedras.

The instructor will be Dr. C. Titus Brown, an Associate Professor from the University of California, Davis. Dr. Brown is a lead developer of the khmer software for processing next-generation sequencing data.

To register for the workshop, please fill in our online registration form. Spaces are limited.

Pre-registration and mentoring meeting May 11, 2016

Pre-registration for August starts this week, and we’d like to make sure all the BD2K students are enrolled in the courses they need to compete for the internship opportunities.

BD2K students please come to the meeting in room C-356, Wednesday May 11, 2016 from 11:30 AM to 12:50 PM.

Bring a list of courses you have completed. The table of suggested courses is available online:

Get ready for IDI-BD2K

Workshop: Big Data Causal Discovery

The Increasing Diversity in Interdisciplinary Big Data to Knowledge (IDI-BD2K) Program at the University of Puerto Rico is pleased to announce a workshop on Big Data causal discovery.

Wednesday, February 17, 2016

A. INTRODUCTION. Presentations from the Center for Causal Discovery of the University of Pittsburgh and from the UPR IDI-BD2K participating faculty. Dr. Gregory Cooper and Dr. Richard Scheines from the University of Pittsburgh and several faculty members from UPR (and other participating institutions) will briefly present their ongoing projects. These presentations are aimed at introducing students and faculty to Big Data projects in causal discovery as applied to biomedical problems, and at establishing collaborations between the BD2K participants and the University of Pittsburgh.

Wed Feb 17, 2016
8:30 – 11:00 am

NCN A-211
Natural Sciences
Rio Piedras Campus
University of Puerto Rico

B. STUDENT RECRUITMENT for Big Data Summer Research Experiences. Drs. Joseph Ayoob and David Boone will present information on opportunities for training students in Big Data, particularly in summer programs for undergraduate students at the University of Pittsburgh.

Wed Feb 17, 2016
11:30 am – 12:30 pm

NCN-A-211
Natural Sciences
Rio Piedras Campus
University of Puerto Rico

C. HANDS-ON WORKSHOP

Causal Discovery from Biomedical Data
Dr. Richard Scheines
University of Pittsburgh

Limited to 30 participants.  Those interested must register by writing to: jegarcia@hpcf.upr.edu

Wed Feb 17, 2016
1:00-3:00 pm

Julio Garcia Diaz building, room 123
Rio Piedras Campus
University of Puerto Rico

Get ready for IDI-BD2K

We were awarded the IDI-BD2K grant, and are getting ready to select students for next year. If you want to participate, try to select the courses you need to get up to speed during pre-registration.

Here’s a copy of the table showing the courses we want you to have. If you have questions, speak to one of the participating researchers.

[Image: table of suggested courses]