Description: This short course introduces basic computational environments and tools to graduate students with limited prior experience. It provides an introduction to UNIX systems, software compilation and installation, and cluster job management, as well as data formats, data management, and visualization. A brief introduction to scripting languages is also presented.
Course Goals: Students enrolled in the class will develop skills that accelerate their work in computational research environments. Topics will include an intensive introduction to (a) UNIX systems and software management, (b) data processing and simple programming, (c) data formats and visualization, and (d) software version control and cluster job management. This training will provide a computational foundation that allows students to focus on the theoretical and biological aspects of their research.
Competencies: After completing this class, students are expected to have attained the following competencies:
-Navigate and organize UNIX files and folders
-Compile and install software in UNIX environments
-Understand basic programming data structures and processes
-Create simple scripts to manage and analyze data
-Utilize and apply popular file formats to modern large-scale data sets
-Apply proper visualization tools and strategies to view data
-Utilize software versioning technologies for documenting and organizing software
-Utilize high-throughput computing clusters for parallel data processing
This course is cross-listed as Biostat 606, HG 606, and Bioinfo 606.
Prerequisites: R module in BIOSTAT 607 or equivalent.
Description: This course covers techniques for computing with big data. Topics include programming, data processing, debugging, profiling and optimization, version control, software development, interfacing with databases, interfacing between programming languages, visualization, and high-performance and cloud computing. Hands-on experience is emphasized in lectures, homework assignments, and projects.
Course Goals: This course equips students with techniques for manipulating and processing big data by writing customized computer programs. It builds the foundation for the computing aspects of data science. After taking this class, students are expected to have a practical understanding of important computing issues for health big data analysis.
Competencies: (a) Apply basic informatics techniques with vital statistics and public health records in the description of public health characteristics and in public health research and evaluation. (b) Master computing software to perform biomedical and health data analyses. (c) Interpret results of statistical analyses found in public health studies.
Learning Objectives: (a) To master techniques for manipulating and processing big data by writing customized computer programs. (b) To understand the foundation for the computing aspects of data science. (c) To have a practical understanding of important computing issues for health big data analysis.
Description: This course teaches the statistical methods and principles necessary for understanding and interpreting data used in public health and policy evaluation and formation. Topics include descriptive statistics, graphical data summary, sampling, statistical comparison of groups, correlation, and regression. Students will learn via lecture, group discussions, critical reading of published research, and analysis of data.