
Announcing a New “Big Data” Course (MATE 4995, Section 23)

University of Puerto Rico

Río Piedras Campus

Semester: First semester, 2017-2018

Course code: MATE 4995, Section 23

Course title: Analysis of Massive Data in Biomedical Applications I (BBD1)

Professor: Dr. Maria E. Perez

Prerequisites: MATE 3026 or equivalent and CCOM 3030, or permission of the professor.

This course aims to prepare students for two fundamental goals:

  1. Working with large amounts of data, an essential skill for future research of every kind.
  2. Participating in the IDI-BD2K project next summer, with research experiences at the BD2K Centers of Excellence at Harvard University, the University of Pittsburgh, and the University of California, Santa Cruz.

Students from any Natural Sciences bachelor's program may enroll.

More information about the IDI-BD2K project

Today, more than ever before, biomedical research is generating massive amounts of data whose analysis and interpretation could produce dramatic advances in our knowledge of human health and quality of life. Analyzing these massive datasets (“Big Data”) requires techniques that combine knowledge from biology, chemistry, statistics, computer science, and other fields.

The IDI-BD2K project will offer MATE 4995 Section 23, Analysis of Massive Data in Biomedical Applications I (BBD1), in fall 2017. In this course you will learn how to find groups of genes that are overexpressed in a condition such as cancer, how to build and interpret linear models that describe the response to a treatment, and how to manage and analyze massive genomic datasets.
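A minimal sketch of one common way to flag overexpressed genes, assuming a simple two-group comparison (tumor vs. normal) scored gene by gene with Welch's t statistic. The gene names, expression values, and the helper names `welch_t` and `rank_overexpressed` are hypothetical, and the course itself works in R (where `t.test` uses the Welch version by default), so this Python version only illustrates the idea:

```python
from math import sqrt
from statistics import mean, variance


def welch_t(case, control):
    """Welch two-sample t statistic: difference in means over its standard error."""
    se = sqrt(variance(case) / len(case) + variance(control) / len(control))
    return (mean(case) - mean(control)) / se


def rank_overexpressed(expression, n_case):
    """Score every gene and sort from most to least overexpressed in the cases.

    Assumes (hypothetically) that the first n_case values of each row are
    case samples and the remaining values are control samples.
    """
    scores = {
        gene: welch_t(values[:n_case], values[n_case:])
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical log-expression values: 3 tumor samples, then 3 normal samples.
expression = {
    "GENE_A": [8.1, 8.4, 8.0, 5.2, 5.0, 5.5],
    "GENE_B": [6.0, 6.2, 5.9, 6.1, 5.8, 6.0],
    "GENE_C": [3.0, 3.3, 2.9, 4.1, 4.4, 4.0],
}

for gene, t in rank_overexpressed(expression, n_case=3):
    print(f"{gene}: t = {t:.2f}")
```

A large positive t marks a gene whose average expression is higher in the tumor samples relative to its variability. In practice these statistics would be converted to p-values and corrected for multiple testing, a topic the course covers later.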

Students who complete this course and its continuation, BBD2, will qualify for an internship at one of the BD2K Centers of Excellence, such as Harvard, the University of California, Santa Cruz, or Pittsburgh.

For more information, contact Dr. Perez: maria.perez34@upr.edu

If you are not yet ready to take BBD1 and BBD2, make sure you are taking the courses recommended by the BD2K Centers of Excellence for your major by consulting the table below:

Course: MATE 4995 Section 23, Analysis of Massive Data in Biomedical Applications I (BBD1)

Course description

This is an intermediate course on statistical concepts and programming skills in the R programming language for the analysis of massive data (“Big Data”). The first part of the course reviews basic concepts of statistical inference, including probability distributions, the Central Limit Theorem, confidence intervals, and hypothesis tests, making intensive use of R. Next, the course covers linear models commonly used for statistical inference. To that end, it introduces the fundamental concepts of matrix algebra required to represent and manipulate linear models, along with inference techniques for them (estimation, hypothesis testing, and diagnostics). It discusses the concepts of interaction and contrast and how to perform inference on them. Finally, it covers several statistical topics relevant to the analysis of large volumes of data, including multiple hypothesis testing, error rates, procedures for controlling error rates, false discovery rates, q-values, and exploratory methods for large datasets. It also introduces concepts of statistical modeling and their application to large datasets, focusing on parametric probabilistic models and parameter estimation techniques. All content is discussed through practical case studies, using R for programming and data analysis.
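The matrix-algebra view of linear models described above can be sketched as follows, assuming ordinary least squares fitted through the normal equations (XᵀX)β = Xᵀy. The design matrix, the data, and the helper functions `transpose`, `matmul`, `solve`, and `least_squares` are hypothetical illustrations written in plain Python rather than R:

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply matrices A (n x k) and B (k x m)."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Assumes A is square and invertible (i.e. a full-rank design matrix).
    """
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for k in range(n):
        pivot = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[pivot] = M[pivot], M[k]
        for r in range(k + 1, n):
            factor = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= factor * M[k][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def least_squares(X, y):
    """Fit y = X beta by solving the normal equations (X^T X) beta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [row[0] for row in matmul(Xt, [[v] for v in y])]
    return solve(XtX, Xty)


# Hypothetical data: 3 control samples followed by 3 treated samples.
# Column 1 is the intercept; column 2 is a 0/1 treatment indicator.
X = [[1, 0], [1, 0], [1, 0], [1, 1], [1, 1], [1, 1]]
y = [4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
beta = least_squares(X, y)
print(beta)  # intercept = control mean, slope = treated mean - control mean
```

With an intercept column and a 0/1 treatment indicator, the first coefficient is the control-group mean (5.0 here) and the second is the treated-minus-control difference (3.0), which is how a treatment effect is read off a linear model. R's `lm()` fits the same model but uses a QR decomposition instead of forming XᵀX explicitly, which is numerically more stable.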

This course expands on PH525.1x: Statistics and R; PH525.2x: Data Analysis for Life Sciences 2: Introduction to Linear Models and Matrix Algebra; and PH525.3x: Data Analysis for Life Sciences 3: Statistical Inference and Modeling for High-throughput Experiments, from the course sequence designed by Prof. Rafael Irizarry (Biostatistics, Harvard University) for HarvardX at http://www.edx.org.

Course objectives

By the end of the course, students will be able to:

  1. Identify normally and binomially distributed random variables and calculate the probabilities associated with them.
  2. Use the Central Limit Theorem to calculate probabilities associated with the average of large amounts of data.
  3. Calculate and interpret p-values for hypothesis tests involving means of normal distributions or large amounts of data.
  4. Calculate and interpret confidence intervals for the situations listed in objective 3.
  5. Interpret the power of a hypothesis test.
  6. Use appropriate graphs to summarize the information in a dataset.
  7. Use matrix notation and perform matrix operations.
  8. Use matrix notation to represent linear models and use matrix operations to fit those models.
  9. Perform inference on linear models and interpret interaction terms and contrasts.
  10. Apply error-control techniques to the problem of multiple simultaneous hypothesis tests.
  11. Apply inference techniques to different probabilistic models.
  12. Apply techniques for exploring large volumes of data.
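Objective 10 can be illustrated with the Benjamini-Hochberg procedure, one standard way to control the false discovery rate. This is a minimal sketch with hypothetical p-values and function names; in R the same adjustment is available as `p.adjust(p, method = "BH")`. For comparison, the sketch also applies the Bonferroni correction, which controls the family-wise error rate:

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bh_discoveries(pvalues, alpha=0.05):
    """Indices of tests rejected while controlling the FDR at level alpha."""
    return [i for i, q in enumerate(benjamini_hochberg(pvalues)) if q <= alpha]


def bonferroni_discoveries(pvalues, alpha=0.05):
    """Indices of tests rejected while controlling the family-wise error rate."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p * m <= alpha]


# Hypothetical p-values from five simultaneous tests (e.g., five genes).
pvalues = [0.001, 0.012, 0.02, 0.3, 0.8]
print("BH adjusted:", [round(q, 4) for q in benjamini_hochberg(pvalues)])
print("BH discoveries:", bh_discoveries(pvalues))
print("Bonferroni discoveries:", bonferroni_discoveries(pvalues))
```

With these five p-values, Bonferroni keeps only the first test while Benjamini-Hochberg keeps the first three, which is why FDR control is often preferred when thousands of genes are tested at once. BH-adjusted p-values are closely related to, but not the same as, Storey's q-values, which additionally estimate the proportion of true null hypotheses.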

Instructional strategies

The course will follow a flipped massive online course model. It will use the first, second, and third courses of the “Data Analysis for Life Sciences” series on Harvard's edX site, created by Prof. Rafael Irizarry: “Statistics and R”, “Introduction to Linear Models and Matrix Algebra”, and “Statistical Inference and Modeling for High-throughput Experiments”.

Bibliography

  1. Textbook: Data Analysis for the Life Sciences, by Rafael Irizarry and Michael Love (available at http://www.leanpub.com)
  2. Software for Data Analysis: Programming with R (Statistics and Computing), by John M. Chambers (Springer)
  3. S Programming (Statistics and Computing), by William N. Venables and Brian D. Ripley (Springer)
  4. Programming with Data: A Guide to the S Language, by John M. Chambers (Springer)

Online references

  1. R reference card (PDF) by Tom Short (more are available under “Short Documents and Reference Cards”)
  2. Quick-R: a quick online reference for data input, basic statistics, and plots
  3. Thomas Girke’s R & Bioconductor manuals
  4. R Programming class on Coursera, taught by Roger Peng, Jeff Leek, and Brian Caffo
  5. The free “Try R” class from Code School is also a good place to start: http://tryr.codeschool.com/
  6. swirl: learn R interactively from within the R console

Student Publication

Ivan Jimenez-Ruiz, an IDI-BD2K student, did an internship last summer at our partner institution, the Center for Causal Discovery in Pittsburgh.

This summer he will present the work from his Summer 2016 internship at the Practice & Experience in Advanced Research Computing conference (PEARC’17), July 9-13, 2017, in New Orleans, Louisiana, USA.

After that, he will begin PhD studies in North Carolina. We wish you the best, Ivan!

  1. I. Jimenez-Ruiz, R. Gonzalez-Mendez, A. Ropelewski. 2017. In Proceedings of ACM PEARC conference, New Orleans, USA, July 2017 (PEARC’17), 4 pages. http://dx.doi.org/10.1145/3093338.3093372

Local copy:

ILJR_PEARC_Final_Draft

Our students at summer 2016 internships!

Hi! My name is Iván Jiménez and I am a senior Computer Science undergrad at the University of Puerto Rico, Rio Piedras Campus (UPRRP). Through UPRRP’s Increasing Diversity in Interdisciplinary Big Data to Knowledge (IDI-BD2K) program, I was able to participate in the Internship in Biomedical Research, Informatics and Computer Science (iBRIC) this summer. I worked with Alexander Ropelewski at the Pittsburgh Supercomputing Center (PSC) in Pittsburgh, Pennsylvania. We collaborated with the Department of Biomedical Informatics (DBMI) of the University of Pittsburgh on our project, “Optimizing High Performance Big Data Cancer Workflows”. The project consisted of running cancer workflows on the different file systems of PSC’s new supercomputer, Bridges. We optimized these workflows for execution time and memory usage and then produced recommendations for running programs on each file system in the future.

I also had the chance to develop a project poster and present it at the annual Duquesne Summer Undergraduate Research Symposium at Duquesne University. Our work has recently been accepted for publication as a student paper at the Practice and Experience in Advanced Research Computing (PEARC-2017) conference.

Apart from work, one of the highlights of my summer was touring the PSC, UPitt, and Carnegie Mellon campuses and buildings with my fellow interns. The list of awesome activities we did as a group is too long to mention in this summary, so I’ll just include the schedule we tried to follow throughout the internship.

Thanks to the IDI-BD2K program, I will never forget the fun I had meeting new people, eating lots of pizza, and learning about Bioinformatics and High Performance Computing.

Brains, Minds, and Machines Workshop

May 12-13, 2017

9:00 am – 4:30 pm

Engine-4
SPORTS COMPLEX ONOFRE CARBALLEIRA
Rd. PR-5 Jct. Rd. PR-2
Bayamón, PR 00959

Register at: http://tinyurl.com/BMMPR17

The Brains, Minds, and Machines Seminar in Puerto Rico will be an intensive two-day course for undergraduate students in Puerto Rico. It will introduce the problem of intelligence from a multidisciplinary perspective and will be taught by postdocs from the MIT Center for Brains, Minds and Machines. The course will consist of lectures and hands-on tutorials on the computational aspects of cognitive science, neuroscience, and computer science.

This event is sponsored by the MIT Center for Brains, Minds and Machines; the NIH NeuroID Program in the Department of Biology at the University of Puerto Rico, Río Piedras (UPRRP); the NIH Increasing Diversity in Interdisciplinary Big Data to Knowledge Program at UPRRP in the Departments of Biology, Computer Science, and Mathematics; Evertec; Wovenware; and Engine-4.

Speakers

Tobias Gerstenberg, PhD
Understanding Why: From Counterfactual Simulation to Responsibility Judgments

Gemma Roig, PhD
Introduction to Deep Neural Networks and Applications

Hector Penagos, PhD
Sequential Information in the Hippocampus for Navigation and Decision-Making

Matt Peterson, PhD
Eye Movements: The Fundamental Role of Information Selection in the Complexity of the Real World

Bios and Abstracts

Tobias Gerstenberg – I am a postdoctoral associate at MIT in Prof. Joshua Tenenbaum’s Computational Cognitive Science group. I did both my MSc and PhD at University College London and was advised by Prof. David Lagnado and Prof. Nick Chater. In my thesis, I explored the question of how people attribute responsibility to individuals in groups, and the way in which causal and counterfactual thinking influences people’s responsibility judgments. Currently, I look at how people’s intuitive theory of physics and psychology informs their causal and responsibility judgments. In my research, I formalize people’s mental models as computational models that yield quantitative predictions about a wide range of situations. To test these predictions, I use a combination of large-scale online experiments, interactive experiments in the lab, and eye-tracking experiments.

Understanding Why: From Counterfactual Simulation to Responsibility Judgments

We are evaluative creatures. When we see people act, we can’t help but think about why they did what they did, and whether it was a good idea. Blaming or praising others requires us to answer at least two questions: What causal role did their action play in bringing about the outcome, and what does the action reveal about the person? To answer the first question, we need a model of how the world works. To answer the second one, we need a model of how people work – an intuitive theory of decision-making that allows us to reason backward from observed actions to the underlying mental states that caused them.

In this talk, I will present a computational framework for modeling causal explanations in terms of counterfactual simulations, and several lines of experiments testing this framework in the domains of intuitive psychology and intuitive physics. In intuitive psychology, this framework explains how the causal structure of a situation influences the extent to which individuals are held responsible for group outcomes, and how expectations modulate these judgments based on what a person’s action revealed about their disposition. In the domain of intuitive physics, the model predicts people’s causal judgments about a variety of physical scenes, including dynamic collision events, complex situations that involve multiple causes, omissions as causes, and causal responsibility for a system’s stability. It also captures the cognitive processes underlying these judgments as revealed by spontaneous eye movements.

Gemma Roig – I am a postdoctoral fellow at MIT in the Center for Brains Minds and Machines, with Prof. Tomaso Poggio as my faculty host. I am also affiliated with the Laboratory for Computational and Statistical Learning, which is a collaborative agreement between the Istituto Italiano di Tecnologia and the Massachusetts Institute of Technology. I pursued my doctoral degree in Computer Vision at ETH Zurich. Previously, I was a research assistant at the Computer Vision Lab at EPFL in Lausanne, at the Department of Media Technologies at Ramon Llull University in Barcelona, and at the Robotics Institute – Carnegie Mellon University in Pittsburgh. I am interested in computational models of human vision to understand its underlying principles, and to use those models to build applications of artificial intelligence.

Introduction to Deep Neural Networks and Applications

Deep Neural Networks emerged from the idea that the brain could be modeled as a computational machine that processes information. We will explore their origins in artificial intelligence and how such models were used to model the brain, with special emphasis on visual processing. We will also survey their current success in many applications and discuss what made it possible.

We will also have a hands-on tutorial in which we explore how to set up a simple application using available toolboxes and off-the-shelf libraries for training and implementing deep neural network models.

Hector Penagos – I am a postdoc in Matt Wilson’s lab at MIT and the Center for Brains, Minds & Machines. I received my PhD from the Harvard-MIT Health Sciences and Technology Program. As a graduate student I did some neuroimaging and psychophysics work to understand the neural correlates of pitch perception in humans. Ultimately, I did my dissertation in Matt Wilson’s lab studying the relationship between the anterior thalamus and hippocampus during navigation and memory processing. As a postdoc, I am extending my work to test the idea that the hippocampus can perform simulations that shape our decision-making process.

Sequential Information in the Hippocampus for Navigation and Decision-Making

Navigation requires drafting a route to a destination and making predictions about upcoming locations to successfully execute that plan. The hippocampus is a key element in an extended network of brain structures involved in these spatial processes. In this talk we will explore the physiological states and neuronal representations in the hippocampus that enable flexible route planning and the prediction of immediate future trajectories. We will also explore how the hippocampus may simulate scenarios that incorporate indirect evidence to shape decision-making behavior.

Matt Peterson – I received my PhD in Cognitive Science from the University of California, Santa Barbara under the mentorship of Miguel Eckstein. Our work combined psychophysics, eye tracking, and computational modeling to understand why each person has their own distinct, personal style for where they look on faces. I am currently a postdoctoral researcher in Nancy Kanwisher’s lab at MIT. By measuring our real world visual experience, we aim to better understand the computations the brain uses to form our beliefs about the world and to guide our actions during normal everyday behavior.

Eye Movements: The Fundamental Role of Information Selection in the Complexity of the Real World

Evolution has optimized the brain to produce successful behavior within the dizzying complexity of the natural world. An essential component of such a system is rapid updating of world knowledge through intelligent selection of useful sensory signals. Perhaps the most fundamental selection mechanism is the guidance of gaze, or eye movements, a function enacted by a large network of dedicated neural systems. Here, we will explore how the brain decides where to look. In the lecture, we will examine the critical nature of eye movements through understanding the physiological constraints of the visual system and how the information they select is organized in the natural world. We will then discuss how measuring eye movements provides a window into the brain’s moment-by-moment information processing algorithms, access that is in many ways unique to eye tracking methods. In the tutorial, we will use a state-of-the-art mobile eye tracker in a basic face recognition task to test a fundamental assumption of laboratory experiments: that what we measure in artificial, tightly controlled paradigms reflects what the brain actually does in the real world, which is presumably what the brain’s organization has been optimized for.

Seminar: The development and regeneration of the sea star larval nervous system

UNIVERSITY OF PUERTO RICO
RIO PIEDRAS CAMPUS
COLLEGE OF NATURAL SCIENCES

YOU ARE INVITED TO THE SEMINAR:

“The development and regeneration of the sea star larval nervous system”

Veronica Hinman, PhD
Associate Professor
Biological Sciences &
Computational Biology
Carnegie Mellon University

Date: Friday, May 5, 2017
Time: 1:00PM
Place: Centro para Puerto Rico
Fundación Sila M. Calderón
Urb. Santa Rita C/ González #1020
Río Piedras, PR 00925

Seminar: Interactive, Statistical, and Visual Analysis of Functional Genomics and Metagenomics Data

The IDI-BD2K program is sponsoring the seminar “Interactive, statistical, and visual analysis of functional genomics and metagenomics data”, to be given by Dr. Hector Corrada-Bravo, Associate Professor at the University of Maryland.

The seminar will take place this Friday, April 21, 2017, at 10:30 AM at the Centro para Puerto Rico (Fundación Sila M. Calderón).

Please confirm your attendance by emailing marimarvelaz@gmail.com.

Healthcare Innovation Replicathon 2017 and Data Carpentry Instructor Training

Tracy K Teal giving the keynote address at the Healthcare Innovation Replicathon.

Our Healthcare Innovation Replicathon and Data Carpentry Instructor Training events were a success! Students and faculty from many campuses and departments met on March 24-25, 2017, at the Engine-4 co-working space in Bayamón, Puerto Rico. Students took part in the 36-hour Healthcare Innovation Replicathon, led by Alejandro Reyes and Keegan Korthauer from Rafael Irizarry’s rafalab (Dana-Farber Cancer Institute & Harvard), Patricia Ordoñez (UPR RP Computer Science), and Phillip Brooks from Titus Brown’s Lab for Data Intensive Biology (UC Davis). Students attended brief tutorials on R, reproducible research, and statistical analysis, then dove in to examine two studies on pharmacogenomics in cancer cell lines.

Keegan Korthauer describing the pharmacogenomics problem studied in the Healthcare Innovation Replicathon.

Interdisciplinary (and inter-campus) teams of students worked for 24 hours re-analyzing data from two large-scale studies of the effects of 15 drugs on 240 cancer cell lines. They presented their findings on Saturday, including errors in the published figures, recommendations for improved measures of drug effects, and lists of drugs for follow-up studies.

The Healthcare Innovation Replicathon wasn’t just about data, statistics, programs and cancer. Students, mentors, sponsors, and faculty got a chance to interact in a welcoming environment, with good food, top-notch Internet, and even a guitar or two.

Phillip Brooks (standing, tweeting about the event) and Alejandro Reyes (seated, on guitar duty) entertaining the participants in the Healthcare Innovation Replicathon.
Students from competing teams set aside their differences to make some music.
Final presentations by students at the Healthcare Innovation Replicathon (photo credit K. Korthauer)

Meanwhile, faculty from Interamerican University (Bayamón Campus), UPR Humacao, Mayagüez, and Río Piedras, and private industry went through Data Carpentry Instructor Training led by Rayna Harris (UT Austin), Sue McClatchy (The Jackson Laboratory), and Tracy Teal (Data Carpentry). Data Carpentry Instructor Training presents instructors with research-based best practices for teaching data science to novices. Stay tuned for upcoming announcements of new Data Carpentry workshops led by some of the new instructors.

A dozen instructors getting trained as part of Data Carpentry Instructor Training.

The IDI-BD2K project would like to thank all the sponsors that made this event a success: VarMed Management, PR-INBRE, the National Institutes of Health, CIQA.net, the University of Puerto Rico, the Lab for Data Intensive Biology at the University of California, Davis, rafalab at Harvard University, AbartysHealth, e3 consulting, Data Carpentry, Engine-4, and the UPR Rio Piedras Department of Computer Science.

Patricia Ordoñez, one of the IDI-BD2K Principal Investigators, thanking some of our sponsors at the Healthcare Innovation Replicathon.

The Healthcare Innovation Replicathon Starts Tomorrow

What is a Replicathon?

A replicathon, like a hackathon, is a continuous 36-hour session of analytical and programming work aimed at creating real solutions with technology. Unlike a traditional hackathon, all teams receive the same challenge: two scientific manuscripts that reached different results using the same data. The goal is for each team to interpret the data and present its conclusions. In a hackathon, the solution usually takes the form of an app (a mobile or web application). A replicathon requires interdisciplinary collaboration among programming experts, data analysis experts, and domain experts (genomics, in this case).

Students can register to participate here:

http://bit.ly/replicathon2017