Courses – IDI-BD2K

Scholarships for CCOM 3030 and MATE 3026

The Increasing Diversity in Interdisciplinary Big Data to Knowledge (IDI-BD2K) Program will award 30 SCHOLARSHIPS to undergraduate students for summer 2019 courses.

15 scholarships for CCOM 3030 Introduction to Computer Science in Python
15 scholarships for MATE 3026 Statistics in R

PRIORITY GOES TO STUDENTS INTERESTED IN:
❖ Big Data analysis or biomedical data science
❖ Taking a Data Science course in August 2019
❖ Participating in a summer program in 2020 at one of the following IDI-BD2K Program partner institutions:

Harvard University
University of Pittsburgh
University of California, Santa Cruz

* Complete the online application: https://tinyurl.com/verano2019uprrp
* Send an unofficial transcript to: marimarvelaz@gmail.com

Workshop: UCSC Genome Browser – April 12-13

IDI-BD2K is pleased to sponsor a two-day workshop on the UCSC Genome Browser. The workshop runs from 9:00 AM to 4:00 PM both days in Room 3373 of the Plaza Universitaria Building, Torre Norte, at the University of Puerto Rico, Río Piedras Campus (UPR-RP).

Registration: https://es.surveymonkey.com/r/9zhs8wn

Software Carpentry Workshops (Python and R)

The IDI-BD2K project and the AECC are organizing two workshops on January 11-12, 2019, at the UPR Río Piedras Department of Computer Science. We will be running Software Carpentry workshops in Python and R.

Software Carpentry aims to help researchers get their work done in less time and with less pain by teaching them basic research computing skills. These hands-on workshops will cover basic concepts and tools, including program design, version control, data management, and task automation. Participants will be encouraged to help one another and to apply what they have learned to their own research problems.

Python is a general purpose scripting language popular in scientific computing.

R is a statistical computing language with extensive applications in biostatistics and ecology.

Please register only for the version you will attend; the two workshops will be held at the same time.

Registration for the R version

Registration for the Python version

Report from the August 2018 Data Carpentry Workshop in Puerto Rico

The IDI-BD2K project organized a Data Carpentry Genomics workshop at the University of Puerto Rico Rio Piedras Campus, sponsored by the South Big Data Hub.

The workshop was a resounding success, with 35 learners registered and 13 more on the waiting list. Attendees ranged from undergraduate and graduate students to faculty and staff. We need to do more Carpentries workshops!

Photo of the workshop venue: participants in the Data Carpentry Genomics workshop.

The instructors were Nelly Selem from Mexico and Humberto Ortiz-Zuazaga from Río Piedras, supported by a group of volunteer helpers: Eveliz Peguero, Sebastian Cruz, Israel Dilán, Abraham Avelar, and Kevin Legarreta Gonzalez.

Carpentry workshops teach foundational coding and data science skills to researchers, so they are a great match for IDI-BD2K’s goal of creating diverse teams of scientists working to turn biological data into biomedical knowledge.

Participants learned how to manipulate next-generation sequencing (NGS) data to identify variants in a population of E. coli. To do this, they used cloud computing resources, logged in remotely, processed files on the command line, and wrote scripts to automate parts of the analysis.

Photo: Example learner responses to the prompt “What I learned at the workshop.”

The Carpentries also disseminate best practices for teaching in STEM, informed by research and by instructors’ experience. The green and red sticky notes on learners’ desks or laptops are one example: learners place a green note on their laptop when they complete an exercise, and a red one when they get stuck. This feedback helps maintain an appropriate pace for the workshop. I forgot to use the sticky notes on the first day of the workshop, and at times we went too fast.

Photo: Square sticky notes with learner responses to the prompt “One thing I didn’t like about the first day of the workshop.”

Workshop: Design Thinking. August 13-16, 2018.

INCREASING DIVERSITY IN INTERDISCIPLINARY BIG DATA TO KNOWLEDGE PROGRAM PRESENTS:

THINKING LIKE A DESIGNER

4-DAY WORKSHOP

AUGUST 13TH-16TH, 2018

8:30AM-4:00PM

UPR RÍO PIEDRAS
DEPARTMENT OF COMPUTER SCIENCE ROOM A 143

This 4-day workshop will introduce students to the design thinking process through a series of hands-on collaborative activities combined with theoretical and practical lectures. Students will focus on a design challenge of their choosing and ground their process in a human need. Using creative problem-solving techniques, students will seek to understand the problem from the user’s perspective, generate ideas, make their ideas tangible, and gather actionable feedback. Collaboration, communicating ideas, and crafting a compelling narrative will be consistent threads throughout the workshop.

FOR MORE INFORMATION:
(787) 764-0000, EXT. 88179, OR MARIMAR.VELAZQUEZ1@UPR.EDU
REGISTER AT https://tinyurl.com/idi-bd2k2018

Data Carpentry Genomics workshop August 17-18, 2018

The South Big Data Hub, DataUP, and Georgia Tech are sponsoring a Data Carpentry Genomics workshop with IDI-BD2K in Río Piedras on August 17-18, 2018.

Genomics Project Organization

Data tidiness
Planning NGS Projects
Examining Data on the NCBI SRA database

The Unix Shell

Files and directories
Pipes and redirection
Creating and running shell scripts
Organizing bioinformatics projects

Wrangling Genomics Data

Assessing Read Quality
Trimming and Filtering Reads
Variant Calling
Automation

Cloud computing

What is the cloud
Logging into the cloud
Setting up your environment
Moving data and results to and from the cloud

See the course page for details and registration:

https://idi-bd2k.github.io/2018-08-17-puertorico-genomics/

Announcement: “Big Data” Course (MATE 4995, Section 012)

University of Puerto Rico

Río Piedras Campus

Semester: First semester 2018-2019

Course code: MATE 4995 Section 012

Course title: Big Data Analysis in Biomedical Applications I (BBD1)

Professor: Dr. Maria E. Perez

Prerequisites: MATE 3026 or equivalent and CCOM 3030, or permission of the professor.

This course aims to prepare students for two fundamental goals:

Working with large amounts of data, a skill that will be essential in all kinds of future research.
Participating in the IDI-BD2K project next summer through research experiences at the BD2K Centers of Excellence at Harvard University, the University of Pittsburgh, and the University of California, Santa Cruz.

Students from any Natural Sciences bachelor’s program may enroll.

More information about the IDI-BD2K project

Biomedical research today is generating massive amounts of data as never before, and analyzing and interpreting these data has the potential to produce dramatic advances in our knowledge of human health and quality of life. Analyzing these massive datasets (“Big Data”) requires techniques that combine knowledge of Biology, Chemistry, Statistics, Computer Science, and other fields.

The IDI-BD2K project will offer MATE 4995 Section 23, Big Data Analysis in Biomedical Applications I (BBD1), in fall 2017. In this course you will learn how to find groups of genes that are overexpressed in a condition such as cancer. You will learn to build and interpret linear models that describe the response to a treatment, and you will see how to handle and analyze massive genomic datasets.

Students who complete this course and its continuation, BBD2, will qualify for an internship at one of the BD2K Centers of Excellence, such as Harvard, the University of California, Santa Cruz, and Pittsburgh.

For additional information, contact Dr. Perez: maria.perez34@upr.edu

If you are not yet ready to take BBD1 and BBD2, make sure you are taking the courses suggested by the BD2K Centers of Excellence for your major by consulting the table below:

Announcement: New “Big Data” Course (MATE 4995, Section 23)

University of Puerto Rico

Río Piedras Campus

Semester: First semester 2017-2018

Course code: MATE 4995 Section 23

Course title: Big Data Analysis in Biomedical Applications I (BBD1)

Professor: Dr. Maria E. Perez

Prerequisites: MATE 3026 or equivalent and CCOM 3030, or permission of the professor.

This course aims to prepare students for two fundamental goals:

Working with large amounts of data, a skill that will be essential in all kinds of future research.
Participating in the IDI-BD2K project next summer through research experiences at the BD2K Centers of Excellence at Harvard University, the University of Pittsburgh, and the University of California, Santa Cruz.

Students from any Natural Sciences bachelor’s program may enroll.

More information about the IDI-BD2K project

For additional information, contact Dr. Perez: maria.perez34@upr.edu

If you are not yet ready to take BBD1 and BBD2, make sure you are taking the courses suggested by the BD2K Centers of Excellence for your major by consulting the table below:

Course: MATE 4995 Section 23 – Big Data Analysis in Biomedical Applications I (BBD1)

Course description

This is an intermediate course on statistical concepts and programming skills in the R programming language for the analysis of massive datasets (“Big Data”). The first part of the course reviews basic concepts of statistical inference, including probability distributions, the Central Limit Theorem, confidence intervals, and hypothesis tests, making intensive use of R. Next, the course covers linear models commonly used for statistical inference. To that end, it introduces the fundamental concepts of matrix algebra needed to represent and manipulate linear models, as well as inference techniques for them (estimation, hypothesis tests, and diagnostics). It discusses the concepts of interaction and contrast, and how to perform inference on them. Finally, the course covers several statistical topics relevant to the analysis of large volumes of data, including the multiple hypothesis testing problem, error rates, procedures for controlling error rates, false discovery rates, q-values, and exploratory methods for large volumes of data. It introduces statistical modeling and its application to large volumes of data, with particular attention to parametric probabilistic models and parameter estimation techniques. All content is discussed through practical case studies, using R for programming and data analysis.
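The description lists error-rate control for multiple hypothesis tests and false discovery rates. As a minimal sketch, assuming the Benjamini-Hochberg step-up procedure (one common FDR-controlling method) and the Bonferroni correction (which controls the family-wise error rate), the adjusted p-values can be computed in plain Python as below. The course itself works in R; the function names `bh_adjust` and `bonferroni_adjust` and the example p-values are hypothetical, and q-values, which the course also covers, are not computed here.

```python
from typing import List


def bonferroni_adjust(pvalues: List[float]) -> List[float]:
    """Bonferroni adjustment: controls the family-wise error rate (FWER)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjustment: controls the false discovery rate (FDR).

    Returns adjusted p-values in the same order as the input.
    """
    m = len(pvalues)
    # Visit p-values from largest to smallest so the step-up minimum can be carried down.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of pvalues[i] in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four simultaneous tests.
pvals = [0.01, 0.02, 0.03, 0.5]
alpha = 0.05
print([i for i, q in enumerate(bonferroni_adjust(pvals)) if q <= alpha])  # [0]
print([i for i, q in enumerate(bh_adjust(pvals)) if q <= alpha])  # [0, 1, 2]
```

Here Benjamini-Hochberg declares more discoveries than Bonferroni because it bounds the expected proportion of false discoveries rather than the probability of making any false positive. In R, `p.adjust(p, method = "BH")` and `p.adjust(p, method = "bonferroni")` compute the same adjusted values.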

This course expands on PH525.1x: Statistics and R; PH525.2x: Data Analysis for Life Sciences 2: Introduction to Linear Models and Matrix Algebra; and PH525.3x: Data Analysis for Life Sciences 3: Statistical Inference and Modeling for High-throughput Experiments, part of the course sequence designed by Prof. Rafael Irizarry (Biostatistics, Harvard University) for HarvardX at http://www.edx.org.

Course objectives

By the end of the course, students will be able to:

Identify random variables with normal and binomial distributions, and compute the probabilities associated with them.
Use the Central Limit Theorem to compute probabilities associated with the average of large amounts of data.
Compute and interpret p-values for hypothesis tests about the means of normal distributions or about large amounts of data.
Compute and interpret confidence intervals for the situations described in the previous objective.
Interpret the power of a hypothesis test.
Use appropriate graphics to summarize the information in a dataset.
Use matrix notation and perform matrix operations.
Use matrix notation to represent linear models, and use matrix operations to fit those models.
Perform inference on linear models, and interpret interaction terms and contrasts.
Apply techniques for controlling errors in the problem of multiple simultaneous hypothesis tests.
Apply inference techniques for different probabilistic models.
Apply techniques for exploring large volumes of data.
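The objectives on the Central Limit Theorem, p-values, and confidence intervals fit together in a single calculation. The course does this in R; as a minimal Python sketch, assuming a sample large enough that the CLT justifies a normal approximation for the sample mean, the following computes a two-sided z-test and a confidence interval for a mean using only the standard library. The function name `z_test_and_ci` and the simulated data are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean, stdev


def z_test_and_ci(sample, mu0=0.0, level=0.95):
    """Large-sample z-test of H0: mean == mu0, plus a CLT-based confidence interval.

    Relies on the Central Limit Theorem: for large n the sample mean is
    approximately normal with standard error s / sqrt(n).
    """
    n = len(sample)
    xbar = fmean(sample)
    se = stdev(sample) / sqrt(n)  # estimated standard error of the mean
    z = (xbar - mu0) / se
    std_normal = NormalDist()  # mean 0, standard deviation 1
    p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
    crit = std_normal.inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    return z, p_value, (xbar - crit * se, xbar + crit * se)


# Hypothetical data: 500 draws from a skewed exponential distribution with mean 2.
random.seed(2019)
data = [random.expovariate(0.5) for _ in range(500)]
z, p, (lo, hi) = z_test_and_ci(data, mu0=2.0)
print(f"z = {z:.2f}, p = {p:.3f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

The exponential data are deliberately skewed: the interval relies on the approximate normality of the sample mean, not of the individual observations. In R, the same calculation can be written with `mean()`, `sd()`, `pnorm()`, and `qnorm()`.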

Instructional strategies

The course follows a flipped massive online course approach. It uses the first, second, and third courses of the “Data Analysis for Life Sciences” series created by Prof. Rafael Irizarry on Harvard’s edX site: “Statistics and R”, “Introduction to Linear Models and Matrix Algebra”, and “Statistical Inference and Modeling for High-throughput Experiments”.

Bibliography

Textbook: Data Analysis for the Life Sciences, by Rafael Irizarry and Michael Love (available at http://www.leanpub.com)
Software for Data Analysis: Programming with R (Statistics and Computing), by John M. Chambers (Springer)
S Programming (Statistics and Computing), by Brian D. Ripley and William N. Venables (Springer)
Programming with Data: A Guide to the S Language, by John M. Chambers (Springer)

Online References

R reference card (PDF) by Tom Short (more can be found under Short Documents and Reference Cards here)
Quick-R: Quick online reference for data input, basic statistics, and plots
Thomas Girke’s R & Bioconductor manuals
R programming class on Coursera, taught by Roger Peng, Jeff Leek, and Brian Caffo
The free “try R” class from Code School is also a good place to start: http://tryr.codeschool.com/
swirl: learn R interactively from within the R console

Brains, Minds, and Machines Workshop

May 12-13, 2017

9:00 am – 4:30 pm

Engine-4
SPORTS COMPLEX ONOFRE CARBALLEIRA
Rd. PR-5 Jct. Rd. PR-2
Bayamón, PR 00959

The Brains, Minds and Machines Seminar in Puerto Rico will be an intensive two-day course for undergraduate students from Puerto Rico. It will introduce the problem of intelligence from a multidisciplinary perspective and will be taught by postdocs from the MIT Center for Brains, Minds and Machines. The course will consist of lectures and hands-on tutorials on the computational aspects of cognitive science, neuroscience, and computer science.

This event is sponsored by the MIT Center for Brains, Minds and Machines; the NIH NeuroID Program in the Department of Biology at the University of Puerto Rico Río Piedras (UPRRP); the NIH Increasing Diversity in Interdisciplinary Big Data to Knowledge Program at UPRRP, in the Departments of Biology, Computer Science, and Mathematics; Evertec; Wovenware; and Engine-4.

Speakers

Tobias Gerstenberg, PhD
Understanding Why: From Counterfactual Simulation to Responsibility Judgments

Gemma Roig, PhD
Introduction to Deep Neural Networks and Applications

Hector Penagos, PhD
Sequential information in the Hippocampus for Navigation and Decision-Making

Matt Peterson, PhD
Eye movements: The Fundamental Role of Information Selection in the Complexity of the Real World

Bios and Abstracts

Tobias Gerstenberg – I am a postdoctoral associate at MIT in Prof. Joshua Tenenbaum’s Computational Cognitive Science group. I did both my MSc and PhD at University College London and was advised by Prof. David Lagnado and Prof. Nick Chater. In my thesis, I explored the question of how people attribute responsibility to individuals in groups, and the way in which causal and counterfactual thinking influences people’s responsibility judgments. Currently, I look at how people’s intuitive theory of physics and psychology informs their causal and responsibility judgments. In my research, I formalize people’s mental models as computational models that yield quantitative predictions about a wide range of situations. To test these predictions, I use a combination of large-scale online experiments, interactive experiments in the lab, and eye-tracking experiments.

Understanding Why: From Counterfactual Simulation to Responsibility Judgments

We are evaluative creatures. When we see people act, we can’t help but think about why they did what they did, and whether it was a good idea. Blaming or praising others requires us to answer at least two questions: What causal role did their action play in bringing about the outcome, and what does the action reveal about the person? To answer the first question, we need a model of how the world works. To answer the second one, we need a model of how people work – an intuitive theory of decision-making that allows us to reason backward from observed actions to the underlying mental states that caused them.

In this talk, I will present a computational framework for modeling causal explanations in terms of counterfactual simulations, and several lines of experiments testing this framework in the domains of intuitive psychology and intuitive physics. In intuitive psychology, this framework explains how the causal structure of a situation influences the extent to which individuals are held responsible for group outcomes, and how expectations modulate these judgments based on what a person’s action revealed about their disposition. In the domain of intuitive physics, the model predicts people’s causal judgments about a variety of physical scenes, including dynamic collision events, complex situations that involve multiple causes, omissions as causes, and causal responsibility for a system’s stability. It also captures the cognitive processes underlying these judgments as revealed by spontaneous eye movements.

Gemma Roig – I am a postdoctoral fellow at MIT in the Center for Brains Minds and Machines, with Prof. Tomaso Poggio as my faculty host. I am also affiliated with the Laboratory for Computational and Statistical Learning, a collaborative agreement between the Istituto Italiano di Tecnologia and the Massachusetts Institute of Technology. I pursued my doctoral degree in Computer Vision at ETH Zurich. Previously, I was a research assistant at the Computer Vision Lab at EPFL in Lausanne, at the Department of Media Technologies at Ramon Llull University in Barcelona, and at the Robotics Institute – Carnegie Mellon University in Pittsburgh. I am interested in computational models of human vision to understand its underlying principles, and to use those models to build applications of artificial intelligence.

Introduction to Deep Neural Networks and Applications

Deep Neural Networks emerged from the idea that the brain could be modeled as a computational machine that processes information. We will explore their beginnings in artificial intelligence and how these kinds of models were used to model the brain, with special emphasis on visual processing. We will also review their current success in many applications and discuss what made it possible.

We will also have a hands-on tutorial in which we explore how to set up a simple application using available toolboxes and off-the-shelf libraries for training and implementing deep neural network models.

Hector Penagos – I am a postdoc in Matt Wilson’s lab at MIT and the Center for Brains, Minds & Machines. I received my PhD from the Harvard-MIT Health Sciences and Technology Program. As a graduate student, I did neuroimaging and psychophysics work to understand the neural correlates of pitch perception in humans. Ultimately, I did my dissertation in Matt Wilson’s lab, studying the relationship between the anterior thalamus and the hippocampus during navigation and memory processing. As a postdoc, I am extending my work to test the idea that the hippocampus can perform simulations that shape our decision-making process.

Sequential Information in the Hippocampus for Navigation and Decision-making

Navigation requires drafting a route to a destination and making predictions about upcoming locations to successfully execute that plan. The hippocampus is a key element in an extended network of brain structures involved in these spatial processes. In this talk we will explore the physiological states and neuronal representations in the hippocampus that enable flexible route planning and the prediction of immediate future trajectories. We will also explore how the hippocampus may simulate scenarios that incorporate indirect evidence to shape decision-making behavior.

Matt Peterson – I received my PhD in Cognitive Science from the University of California, Santa Barbara under the mentorship of Miguel Eckstein. Our work combined psychophysics, eye tracking, and computational modeling to understand why each person has their own distinct, personal style for where they look on faces. I am currently a postdoctoral researcher in Nancy Kanwisher’s lab at MIT. By measuring our real world visual experience, we aim to better understand the computations the brain uses to form our beliefs about the world and to guide our actions during normal everyday behavior.

Eye movements: The fundamental role of information selection in the complexity of the real world

Evolution has optimized the brain to produce successful behavior within the dizzying complexity of the natural world. An essential component of such a system is rapid updating of world knowledge through intelligent selection of useful sensory signals. Perhaps the most fundamental selection mechanism is the guidance of gaze, or eye movements, a function enacted by a large network of dedicated neural systems. Here, we will explore how the brain decides where to look. In the lecture, we will examine the critical nature of eye movements through understanding the physiological constraints of the visual system and how the information they select is organized in the natural world. We will then discuss how measuring eye movements provides a window into the brain’s moment-by-moment information processing algorithms, access in many ways unique to eye tracking methods. In the tutorial, we will use a state-of-the-art mobile eye tracker in a basic face recognition task to test a fundamental assumption of laboratory experiments: that what we measure in artificial, tightly-controlled paradigms reflects what the brain actually does in the real world, which is presumably what the brain’s organization has been optimized for.