View the Project on GitHub ashleychari/abcd_sex_pfn_replication
This project aims to replicate the work done by Shanmugan et al (2022), Sex Differences in the functional topography of association networks in youth, in the ABCD dataset.
You’ll need to change the save filename/paths in the script
Run the get_data_for_ridge.R script created by Arielle Keller using the data in the FilesForAdam
folder originally from /cbica/projects/abcdfnets/scripts/FilesForAdam/
Next, run the create_discovery_replication_set_siblings_removed.py script to create the final discovery and replication sets with the removal of siblings. These samples are used in subsequent steps.
Will need to change paths to discovery and replication datasets and the paths that you want to save the resulting files
Run the calc_group_average_mat.py script to get the group average matrix for the networks for all of the subjects in the discovery and replication sets combined
Run the create_hard_parcel.py script to get the hard parcellations from networks 3, 4, and 12 to make main figure 1
Run the create_soft_parcel.py script to get the soft parcellations from networks 3, 4, and 12 to make main figure 1
Run the get_subject_parcels.py script to get the hard and soft parcellations for 4 random subjects with 2 subjects being male and 2 being female to make main figure 1
NOTE: This step can be skipped if you change the script to use readMat
from the R.Matlab
package instead of base R’s read.csv
function. However, you will have to setup a new singularity container with the necessary packages
Create a singularity container or use the one I created stored in ashpfnsexdiffabcd/software/containers/
called sex_differences_replication_0.0.3.sif
Change the filepaths for the results folder and subject data in the abcd_unvariate_analysis.R script for the discovery data, which will run a GAM at each vertex to determine the effect of sex while controlling for age and motion. The script assumes that the subject data has siblings removed. Next, run the wrapper script run_univariate_analysis.sh via SGE using the following command:
qsub -l h_vmem=25G,s_vmem=25G /cbica/projects/ash_pfn_sex_diff_abcd/dropbox/run_univariate_analysis.sh
qsub -l h_vmem=25G,s_vmem=25G /cbica/projects/ash_pfn_sex_diff_abcd/dropbox/run_univariate_replication.sh
Once the gams models have been run, create the unthresholded absolute sum map in Figure 2D from the results using the write_effect_map_abs_sum.py script. You’ll need to change the arguments in the two function calls in the main execution block of the script.
Convert the file obtained from the previous step into a CIFTI format to be loaded into workbench to create the brain maps seen in Figure 2 by using the write_effect_map_to_cifti.py.
NOTE: The script uses a dscalar.nii file of the fslr format as a temporary header to make the new CIFTI file. It is only used to structure the file.
The script can be run like the following example command:
python3 write_effect_map_to_cifti.py '/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/multivariate_analysis/svm_072324_run/abs_sum_weight_brain_mat_discovery_haufe_100_runs_072324.npy' '/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/multivariate_analysis/svm_072324_run/svm_discovery_abs_sum_haufe_weights_072324.dscalar.nii'
Load the resulting dscalar.nii file from the previous step into workbench to get the visualization.
To get the significant veritces barplot, first run the make_barplot_mat.py script to get a matrix of the significant vertices for each network.
Use the create_barplot.R to create the barplot seen in Figure 2A (discovery set) and Figure 2B (replication set) of significant vertices for each network. I ran this in Rstudio.
To create the significant vertices for networks 3, 9, and 12 seen in Figure 2C, use the get_individual_network_mat.py script to get each individual network from the matrix in the previous step. Also used to make the 17 networks seen in Figure S1.
Run create_nonzero_matrix.py using the discovery sample csv and replication sample csv created in Part 1, step 2 to setup the non-zero features matrices for subsequent steps.
Next, run run_2fold_svm_resmulti_times_parallel_w_ROC.py by running the wrapper script svm_resmulti_times_parallel_w_ROC_slurm.sh. The wrapper script submits an array job of size 100 which will run the run_2fold
script 100 times (each run as a separate job) via slurm. The svm script runs 2 fold cross validation using the nonzero features matrix to predict sex using SVM. Run the following command to run the script:
sbatch svm_resmulti_times_parallel_w_ROC_slurm.sh
Create the weight matrix by taking the mean of each of the coefficients from all the fold instances and save this weight matrix using the svm_workbench_visualization_multi_times_matrix.py script.
Use the coefficient matrix created in the previous step in the haufe_transform_weights.py script to do a haufe transformation on the coefficients. Use the haufe_transform_weights.sh to run the script by running the following command:
qsub -l h_vmem=500G,s_vmem=500G haufe_transform_weights.sh
or if you want to run via slurm, use the haufe_transform_weights_slurm.sh script with the following command after running module load slurm
in your terminal window:
sbatch haufe_transform_weights_slurm.sh
Use the haufe transformed weight matrix from the previous step in the svm_stacked_barplot.R script to get the stacked barplot of feature importance for both Males and Females as seen in Figure 3C. I ran this in Rstudio
Use the haufe transformed weight matrix in the sum_abs_weights_matrix.py to get the absolute value sum of the weights and save this array.
Use write_effect_map_to_cifti_svm.py to convert the matrix in the previous step into a CIFTI file format for visualization/to get Figure 3B.
Use the dscalar.nii file in workbench to get the visualization of the weights.
In order to run the permutation tests, run the svm_permutation_parallel.sh which calls the svm_1000_permutation_scrambled_outcome_parallel.py script which scrambles the sex outcome variable and creates 1000 null models to compare to the accuracy of the 100 models. Use the following command to run the script:
sbatch svm_permutation_parallel.sh
To generate the permutation inset histogram as seen in Figure 3A, use the get_accuracies.py script to compile the accuracies from the 1000 runs followed by the svm_accuracy_histogram.R to make the histogram plots.
To generate the ROC curve as seen in Figure 3A, run the create_ROC_curve.py with the results from the 100 runs and use the plot_roc.R to make Figure 3A.
To get the support vector machine stats such as average accuracy, average sensitivity, and average specificity use the get_svm_metrics.py script.
The files generated from the steps below will be used in the enrichement analyses.
Download files from https://github.com/PennLINC/S-A_ArchetypalAxis to be used in the flsr_to_fsaverage5.sh
script
Get the FDR corrected absolute sum matrices by using write_effect_map_abs_sum_fdr.py to create the fdr absolute sum matrices that are used for the genetics portion of this project.
Use the fslr_to_fsaverage5.sh script to resample the gams uncorrected abs sum discovery data, gams uncorrected abs sum replication data, gams fdr abs sum discovery data, and gams fdr abs sum replication data from fslr to fsaverage5. Parameters include path to the map (dscalar.nii file) you’re trying to resample to fsaverage5 space and the name that you want to label the resulting files. The file that you’ll want to use in the enrichment analyses will end with _LH.fsaverage5.func.gii
.
NOTE: If you are using windows, best to install
Git Bash
and put ansh
at the start of the command to run the script.
Example command for mac users below:
./fslr_to_fsaverage5.sh "/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/univariate_analysis/univariate_analysis_results/uncorrected_abs_sum_matrix_discovery.dscalar.nii" gams_uncorrected_discovery
Download Download AHBAProcessed.zip
and AHBAData.zip
from https://figshare.com/articles/dataset/AHBAdata/6852911
AHBAData/data/genes/parcellations/lh.Schaefer1000_7net.annot
from figshare into R and use read_gene_annot_files.R to save the following files
save column 5 from
ct.table
aslh_Schaefer1000_7net_cttable.csv
saveL
aslh_Schaefer1000_7net_L.csv
ROIxGene_Schaefer1000_INT.mat
and create_probe_annotation.py
Save the filename as
GeneSymbol.csv
/cbica/projects/ash_pfn_sex_diff_abcd/dropbox/genetics_files/
which originally came from the abcdpfnsexdiff
project folder on cubicNOTE: If you choose not to use the files in the above folder, you might have to change the filepaths and the column names corresponding to the file when you load the file in.
Make sure to change the filepaths to use the files created above
Also make sure to change the filepaths for saving the intermediate files such as pvalues table and top 20 X chromosome genes
Make sure to change the filepaths to use the files created above
Also make sure to change the filepaths for saving the intermediate files such as pvalues table and top 20 tables for astro, ex5b, ex1, and oli
Make sure to change the filepaths to use the files created above and for the pvalue table
Add medial wall back in to all of the data (gams uncorrected discovery abs sum, gams uncorrected replication abs sum, svm discovery abs sum haufe transformed, svm replication abs sum haufe transformed) using add_medial_wall.R. You can find the medialwall.mask.leftcortex.csv
and medialwall.mask.rightcortex.csv
here https://github.com/PennLINC/S-A_ArchetypalAxis/tree/main/FSLRVertex. Save results into csv that will later be converted into gii files.
Use the convert_to_gifti.py script to convert the data in the previous step into gii files. The following command can be run in terminal (the last argument indicated whether the data is from PNC or not):
python3 convert_to_gifti.py "medial_wall_maps/gams_discovery_medial_wall_map.csv" "results_gifti/gams_abs_sum_discovery_map.gii" N
python3 spin_test.py '/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/spin_tests/data/results_gifti/gams_abs_sum_discovery_map.gii' '/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/spin_tests/data/results_gifti/gams_abs_sum_replication_map.gii' "fsLR" 'Gams discovery vs replication' '/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/spin_tests/results'
python3 compile_spin_tests.py spin_test_results_072524
The following instructions apply to the PNC data only
Grab the PNC gams abs sum data from /cbica/projects/abcdpfnsexdiff/funcParcelSexDiff/inputData/spintest/
Convert the csv files of the gams abs sum LH and RH results into gii files using convert_to_gifti.py
python3 convert_to_gifti.py '/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/spin_tests/data/PNC_data/GamSexAbssum_lh.csv' '/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/spin_tests/data/PNC_data/gams_abs_sum_lh.gii' Y
Use the PNC_to_fslr.py to convert the PNC results to flsr space
Run spin_test.py using the following command:
python3 spin_test.py '/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/spin_tests/data/PNC_data/Gam_abs_sum_fslr_test.gii' '/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/spin_tests/data/ABCD_data/gams_abs_sum_discovery_uncorrected.gii' "fsLR" 'PNC gams discovery vs ABCD gams discovery fslr' '/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/spin_tests/results'
Rscript hex_plots.R "/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/multivariate_analysis/svm_072324_run/abs_sum_weight_brain_mat_replication_haufe_100_runs_072324.npy" "Replication SVM Weights" "/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/univariate_analysis/univariate_analysis_results/uncorrected_abs_sum_matrix_replication.csv" "Replication GAM Loadings" "/Users/ashfrana/Desktop/code/abcd_sex_pfn_replication/finalized_figs/hex_plots/svm_rep_vs_gams_rep_hex_plot.png"
NOTE: The figure_s8_barplot.R script does both svm and gams plotting, but requires users to comment out the code that is not being used (ie if plotting svm, comment out gams code). There are comments that indicate which code belongs to gams vs svm.
NOTE: Make sure that you have a folder called network_specific_matrices in your results folder to hold all of the results from the above script
Use submit_network_specific_models_jobs.py to submit 17 array jobs (one for each network) of 100 runs each by calling network_specific_models_slurm.sh which calls run_network_specific_svm.py. You’ll have to change the path to your results folder in both of the python scripts.
Once you have all of the results from step 2, run get_network_specific_accuracies.py to get the accuracies of the 100 runs for each of the 17 networks.
Lastly, use plot_accuracies.R to make a barplot of the accuracies for all of the 17 networks.
Run make_behavior_dfs.py to make the variation of dataframes that are subsampled from the original siblings removed sample and will be used in the puberty analyses. Use data from \cbica\projects\ash_pfn_sex_diff_abcd\dropbox\hormone_pds_data
and the discovery and replication samples created at the beginning of the project.
Run submit_scripts.py, which calls a corresponding shell script found in the puberty_submit_scripts
folder that calls either abcd_puberty_hormones.R, abcd_puberty_stage_timing_gams.R, and abcd_puberty_stage_timing_gams_sex_specific.R depending on which dataset is used.
NOTE: Make sure all of the results directories are created before running this script. (ie pds_male_female_no_age is a directory in the puberty_analyses_2/discovery directory); Also make sure to make your result directory and change the path in the gams R scripts
Run submit_scripts_oSex.py to run the analyses to look at the effect of sex on loadings controlled for by hormones and pds variables. This script calls abcd_puberty_hormones_oSex.R and abcd_puberty_stage_timing_gams_oSex.R depending on which dataset is being used. NOTE: Make sure all of the results directories are created before running this script. (ie pds_male_female_oSex_no_age is a directory in the puberty_analyses_2/discovery directory); Also make sure to make your result directory and change the path in the gams R scripts
To create the absolute sum brain maps, use write_effect_map_abs_sum.py to get the absolute sum matrices. Once you have these, use the cifti_wrapper.py script to convert these matrices to a dscalar.nii format which can then be plotted in workbench.
To create the barplots of the sex effect controlled for hormones/pds variables, use make_barplot_mat.py to make the matrix for the barplot script and then run create_barplot_wrapper.py, which calls create_barplots.R. Make sure you have a barplots
directory and matrices
directory created to store all the intermediate values and final results.
To get the individual network maps, use make_individual_mats.py to get the fdr z vector for each of the 17 networks for each of the analyses done. Afterwards, run to_cifti_wrapper.py that will convert the significant matrices to dscalar.nii format. To identify which networks are not all zero/are significant, use identify_sig_networks.py, which will create a sig_networks.txt
file which contains an array of the significant networks. Using this information, you can then plot the specified networks in workbench for each of the tests.
\cbica\projects\ash_pfn_sex_diff_abcd\dropbox\hormone_pds_data
. Use demographics.ipynb to create the demographic table for the study.
NOTE: make sure to run the cells in order and if you need to rerun a cell, rerun from the start.