Sparse Networks with Overlapping Communities (SNetOC) package: demo_usairport

This Matlab script performs posterior inference on a network of airports to find latent overlapping communities, using a Bayesian nonparametric approach.

For downloading the package and information on installation, visit the SNetOC webpage.

Reference:

Authors:

Tested on Matlab R2016a. Requires the Statistics toolbox.

Last Modified: 10/10/2017

Contents

General settings

clear
close all
tstart = clock; % Starting time

istest = true; % enable testing mode: quick run with smaller nb of iterations

In test mode, a smaller number of iterations is run. Although the sampler clearly has not converged yet, the method already recovers well interpretable latent communities. To reproduce the results of the paper, set this value to false.

root = '.';
if istest
    outpath = fullfile(root, 'results', 'demo_usairport', 'test');
else
    outpath = fullfile(root, 'results', 'demo_usairport', date);
end

if ~isdir(outpath)
    mkdir(outpath);
end


% Add path
addpath ./GGP/ ./CGGP/ ./utils/

% Default fontsize
set(0, 'DefaultAxesFontSize', 14)

% Set the seed
rng default

Load Network of airports with a connection to a US airport

load ./data/usairport/usairport

titlenetwork = 'US airport network in 2010';
name = 'usairport';
labels = {'Airports','Airports'};

G = G | G'; % make undirected graph
G = logical(G-diag(diag(G))); % remove self edges (#164)

meta.degree = num2cell(full(sum(G,2)));
fn = fieldnames(meta);

% Plot adjacency matrix
figure
spy(G);
xlabel(labels{2})
ylabel(labels{1})

Posterior Inference using Markov chain Monte Carlo and point estimation

Users needs to start the parallel pool by using the command parpool to run multiple chains in parallel.

% Define the parameters of the prior
p = 4; % Number of commmunities
objprior =  graphmodel('CGGP', p); % CGGP graph model with p communities

% Define parameters of the MCMC sampler
nchains = 3;
if istest
    niterinit = 1000;
    niter = 20000;
    nsamples = 100;
else
    niterinit = 10000;
    niter = 200000;
    nsamples = 500;
end
nburn = floor(niter/2);
thin = ceil((niter-nburn)/nsamples);
verbose = true;

% Create the graphMCMC object
objmcmc = graphmcmc(objprior, niter, nburn, thin, nchains);

% Run initialisation
init = graphinit(objmcmc, G, niterinit);
-----------------------------------
Start initialisation of the MCMC algorithm for CGGP
-----------------------------------
End initialisation
-----------------------------------
% Run MCMC sampler
objmcmc = graphmcmcsamples(objmcmc, G, verbose, init);
-----------------------------------
Start MCMC for CGGP graphs
Nb of nodes: 1574 - Nb of edges: 17215 (0 missing)
Nb of chains: 3 - Nb of iterations: 20000
Nb of parallel workers: 1
Estimated computation time: 0 hour(s) 6 minute(s)
Estimated end of computation: 08-Nov-2017 18:16:42 
-----------------------------------
 Markov chain 1/3 
-----------------------------------
i=2000 alp=110.74 sig=0.214 tau=0.99 a=0.31 0.36 0.17 0.32 b=0.47 0.30 0.03 0.42 w*=0.41 0.66 1.36 0.39 b2=0.47 0.29 0.03 0.41 alp2=110.38 rhmc=0.65 rhyp=0.24 eps=0.015 rwsd=0.027
i=4000 alp=112.25 sig=0.191 tau=1.28 a=0.26 0.29 0.14 0.28 b=0.24 0.16 0.01 0.25 w*=0.39 0.64 1.49 0.43 b2=0.31 0.20 0.02 0.32 alp2=117.72 rhmc=0.56 rhyp=0.23 eps=0.044 rwsd=0.027
i=6000 alp=109.62 sig=0.196 tau=1.44 a=0.26 0.28 0.12 0.25 b=0.22 0.15 0.01 0.22 w*=0.44 0.63 1.27 0.41 b2=0.32 0.21 0.01 0.32 alp2=117.69 rhmc=0.71 rhyp=0.25 eps=0.011 rwsd=0.027
i=8000 alp=111.37 sig=0.184 tau=1.70 a=0.24 0.28 0.11 0.27 b=0.18 0.11 0.01 0.18 w*=0.35 0.62 1.34 0.37 b2=0.31 0.19 0.01 0.30 alp2=122.79 rhmc=0.92 rhyp=0.25 eps=0.011 rwsd=0.027
i=10000 alp=88.97 sig=0.218 tau=1.95 a=0.24 0.25 0.11 0.24 b=0.16 0.10 0.01 0.15 w*=0.39 0.72 1.46 0.47 b2=0.31 0.19 0.01 0.29 alp2=102.86 rhmc=0.92 rhyp=0.25 eps=0.011 rwsd=0.027
i=12000 alp=94.45 sig=0.208 tau=1.99 a=0.24 0.24 0.10 0.23 b=0.16 0.08 0.01 0.14 w*=0.38 0.72 1.39 0.40 b2=0.32 0.15 0.01 0.28 alp2=108.99 rhmc=0.92 rhyp=0.27 eps=0.011 rwsd=0.027
i=14000 alp=93.23 sig=0.212 tau=1.94 a=0.23 0.22 0.10 0.23 b=0.15 0.07 0.01 0.12 w*=0.39 0.70 1.45 0.41 b2=0.29 0.13 0.01 0.23 alp2=107.33 rhmc=0.92 rhyp=0.26 eps=0.011 rwsd=0.027
i=16000 alp=77.03 sig=0.235 tau=1.65 a=0.22 0.24 0.10 0.22 b=0.13 0.07 0.00 0.11 w*=0.39 0.71 1.52 0.40 b2=0.22 0.11 0.01 0.18 alp2=86.67 rhmc=0.92 rhyp=0.26 eps=0.011 rwsd=0.027
i=18000 alp=86.39 sig=0.218 tau=2.16 a=0.22 0.23 0.10 0.22 b=0.11 0.06 0.00 0.10 w*=0.40 0.66 1.52 0.42 b2=0.23 0.14 0.01 0.22 alp2=102.16 rhmc=0.92 rhyp=0.26 eps=0.011 rwsd=0.027
i=20000 alp=91.88 sig=0.216 tau=2.72 a=0.20 0.21 0.10 0.21 b=0.08 0.05 0.00 0.10 w*=0.48 0.66 1.33 0.37 b2=0.21 0.13 0.01 0.26 alp2=113.97 rhmc=0.92 rhyp=0.28 eps=0.011 rwsd=0.027
-----------------------------------
 Markov chain 2/3 
-----------------------------------
i=2000 alp=106.28 sig=0.220 tau=0.89 a=0.46 0.45 0.18 0.32 b=0.49 0.78 0.03 0.40 w*=0.66 0.40 1.65 0.44 b2=0.43 0.69 0.02 0.35 alp2=103.52 rhmc=0.67 rhyp=0.25 eps=0.016 rwsd=0.027
i=4000 alp=99.13 sig=0.217 tau=1.12 a=0.35 0.34 0.13 0.27 b=0.24 0.38 0.01 0.26 w*=0.64 0.35 1.77 0.36 b2=0.26 0.43 0.01 0.29 alp2=101.60 rhmc=0.52 rhyp=0.23 eps=0.035 rwsd=0.028
i=6000 alp=97.90 sig=0.219 tau=1.49 a=0.30 0.29 0.12 0.24 b=0.14 0.24 0.01 0.22 w*=0.64 0.45 1.60 0.40 b2=0.21 0.36 0.01 0.33 alp2=106.92 rhmc=0.81 rhyp=0.24 eps=0.011 rwsd=0.027
i=8000 alp=99.78 sig=0.214 tau=1.36 a=0.30 0.26 0.12 0.23 b=0.13 0.19 0.01 0.17 w*=0.71 0.39 1.46 0.40 b2=0.17 0.26 0.01 0.24 alp2=106.52 rhmc=0.93 rhyp=0.23 eps=0.011 rwsd=0.027
i=10000 alp=88.70 sig=0.218 tau=1.44 a=0.26 0.26 0.11 0.23 b=0.08 0.16 0.00 0.17 w*=0.65 0.41 1.81 0.37 b2=0.12 0.23 0.01 0.24 alp2=95.99 rhmc=0.93 rhyp=0.25 eps=0.011 rwsd=0.027
i=12000 alp=87.79 sig=0.223 tau=1.83 a=0.28 0.26 0.10 0.23 b=0.10 0.16 0.00 0.14 w*=0.62 0.45 1.69 0.40 b2=0.19 0.28 0.01 0.25 alp2=100.45 rhmc=0.93 rhyp=0.26 eps=0.011 rwsd=0.027
i=14000 alp=92.23 sig=0.206 tau=1.87 a=0.27 0.26 0.10 0.22 b=0.09 0.15 0.00 0.12 w*=0.60 0.41 1.53 0.39 b2=0.17 0.28 0.01 0.23 alp2=104.97 rhmc=0.93 rhyp=0.24 eps=0.011 rwsd=0.027
i=16000 alp=92.20 sig=0.217 tau=2.13 a=0.24 0.26 0.10 0.21 b=0.06 0.15 0.00 0.09 w*=0.78 0.43 1.45 0.46 b2=0.12 0.32 0.01 0.20 alp2=108.64 rhmc=0.93 rhyp=0.25 eps=0.011 rwsd=0.027
i=18000 alp=83.51 sig=0.226 tau=1.84 a=0.25 0.24 0.10 0.21 b=0.07 0.11 0.00 0.11 w*=0.74 0.45 1.64 0.44 b2=0.13 0.21 0.01 0.21 alp2=95.91 rhmc=0.93 rhyp=0.24 eps=0.011 rwsd=0.027
i=20000 alp=78.20 sig=0.226 tau=1.77 a=0.22 0.24 0.10 0.19 b=0.04 0.11 0.00 0.07 w*=0.72 0.41 1.30 0.40 b2=0.08 0.20 0.01 0.13 alp2=88.92 rhmc=0.93 rhyp=0.24 eps=0.011 rwsd=0.027
-----------------------------------
 Markov chain 3/3 
-----------------------------------
i=2000 alp=150.61 sig=0.161 tau=1.51 a=0.26 0.34 0.24 0.27 b=0.13 0.60 0.11 0.24 w*=0.98 0.56 0.67 0.51 b2=0.20 0.91 0.16 0.36 alp2=160.94 rhmc=0.65 rhyp=0.25 eps=0.016 rwsd=0.028
i=4000 alp=132.48 sig=0.172 tau=2.05 a=0.20 0.26 0.21 0.22 b=0.06 0.35 0.06 0.10 w*=1.28 0.49 0.65 0.47 b2=0.12 0.71 0.12 0.21 alp2=149.90 rhmc=0.60 rhyp=0.24 eps=0.038 rwsd=0.029
i=6000 alp=160.23 sig=0.144 tau=3.05 a=0.18 0.24 0.19 0.18 b=0.05 0.34 0.05 0.08 w*=1.20 0.52 0.55 0.55 b2=0.14 1.02 0.16 0.24 alp2=188.05 rhmc=0.70 rhyp=0.24 eps=0.014 rwsd=0.029
i=8000 alp=143.54 sig=0.153 tau=4.12 a=0.18 0.23 0.17 0.18 b=0.05 0.26 0.04 0.05 w*=1.12 0.55 0.61 0.49 b2=0.20 1.07 0.16 0.22 alp2=178.31 rhmc=0.88 rhyp=0.23 eps=0.014 rwsd=0.029
i=10000 alp=152.81 sig=0.153 tau=4.38 a=0.17 0.23 0.17 0.16 b=0.04 0.25 0.04 0.05 w*=1.32 0.52 0.62 0.64 b2=0.17 1.10 0.17 0.20 alp2=191.44 rhmc=0.89 rhyp=0.24 eps=0.014 rwsd=0.029
i=12000 alp=150.76 sig=0.146 tau=4.83 a=0.16 0.22 0.16 0.16 b=0.04 0.25 0.03 0.04 w*=1.02 0.53 0.69 0.62 b2=0.18 1.20 0.15 0.20 alp2=189.79 rhmc=0.88 rhyp=0.24 eps=0.014 rwsd=0.029
i=14000 alp=179.23 sig=0.117 tau=7.53 a=0.16 0.20 0.15 0.16 b=0.03 0.17 0.03 0.04 w*=1.24 0.56 0.65 0.48 b2=0.22 1.31 0.21 0.28 alp2=226.94 rhmc=0.88 rhyp=0.24 eps=0.014 rwsd=0.029
i=16000 alp=224.15 sig=0.083 tau=8.27 a=0.16 0.20 0.14 0.15 b=0.04 0.17 0.03 0.03 w*=1.02 0.45 0.51 0.56 b2=0.29 1.38 0.21 0.24 alp2=267.00 rhmc=0.88 rhyp=0.25 eps=0.014 rwsd=0.029
i=18000 alp=204.00 sig=0.092 tau=8.33 a=0.16 0.18 0.13 0.15 b=0.03 0.15 0.02 0.03 w*=1.08 0.60 0.59 0.52 b2=0.26 1.21 0.14 0.21 alp2=247.76 rhmc=0.88 rhyp=0.25 eps=0.014 rwsd=0.029
i=20000 alp=181.18 sig=0.116 tau=9.41 a=0.15 0.18 0.13 0.14 b=0.03 0.12 0.02 0.02 w*=0.97 0.52 0.59 0.63 b2=0.26 1.15 0.16 0.23 alp2=234.88 rhmc=0.88 rhyp=0.23 eps=0.014 rwsd=0.029
-----------------------------------
End MCMC
Computation time: 0 hour(s) 6 minute(s)
-----------------------------------
% Print summary in text file
print_summary(['summary_' num2str(p) 'f.txt'], titlenetwork, G, niter, nburn, nchains, thin, p, outpath, tstart)
% Point estimation of the model parameters
[estimates, C_st] = graphest(objmcmc);

% Save workspace
save(fullfile(outpath, ['workspace_' num2str(p) 'f.mat']))
-----------------------------------
Start parameters estimation for CGGP graphs: 300 samples
Estimated end of computation: 08-Nov-2017 18:17:36 (0.0 hours)
|---------------------------------|
|*********************************|
End parameters estimation for CGGP graphs
Computation time: 0.0 hours
-----------------------------------

Plots

prefix = sprintf('%s_%df_', name, p);
suffix = '';

% Plot cost
if ~isempty(C_st)
    plot_cost(C_st, outpath, prefix, suffix);
end
% Assign max feature for community detection
[~, nodefeat] = max(estimates.w, [],2);

% Identify each feature/community as Hub/East/West/Alaska
% (This step would normally require a human interpretation of the features)
code_airports = {'JFK', 'LAN', 'DEN', 'BET'};
for k=1:length(code_airports)
    [~, ind_features(k)] = max(estimates.w(strcmp(meta.code, code_airports{k}), :));
end
if length(unique(ind_features))~=4
    warning('Problem with the interpretation of features/communities');
    ind_features = 1:4;
    featnames = {'Feature 1', 'Feature 2', 'Feature 3', 'Feature 4'};
else
    featnames = {'Hub', 'East', 'West', 'Alaska'};
end
% Plot traces and histograms
variables = {'logalpha2', 'sigma', 'Fparam.a', 'Fparam.b2', 'mean_w_rem'};
namesvar = {'$\log \tilde\alpha$', '$\sigma$', '$a$', '$\tilde b$', '$\overline{w}_{\ast}$'};
plot_trace(objmcmc.samples, objmcmc.settings, variables, namesvar, [], outpath, prefix, suffix);
plot_hist(objmcmc.samples, variables, namesvar, [], ind_features, [], outpath, prefix, suffix);
% Plot the graph by sorting the nodes by max feature
plot_sortedgraph(G, nodefeat, nodefeat, ind_features, labels, outpath, prefix, suffix, {'png'});

if isfield(meta, 'groups')
    % Plots by groups right vs left
    plot_groups(estimates.w, meta.groups, meta.(groupfield), ind_features, label_groups, featnames, color_groups, outpath, prefix, suffix);
end
% Show the proportion in each features for a few nodes
% https://www.mapcustomizer.com/
names = {
    'New York, NY'
%     'Washington, DC'
    'Miami, FL'
%     'Detroit, MI'
%     'Knoxville, TN'
%     'Atlanta, GA'
%     'Louisville, KY'
%     'Indianapolis, IN'
    'Raleigh/Durham, NC'
    'Nashville, TN'
%     'Chicago, IL'
%     'Fayetteville, NC'
    'Lansing, MI'
    'Louisville, KY'
%     'Memphis, TN'
%     'Cleveland, OH'
    'Minneapolis, MN'
%     'Charleston/Dunbar, WV'
%     'Baltimore, ML'
%     'Tallahassee, FL'
%     'Portland, ME'
%     'Flint, MI'
%     'Champaign/Urbana, IL'
%     'Oklahoma City, OK'
%     'Des Moines, IA'
%     'Houston, TX'
%     'Dallas, TX'
    'Denver, CO'
%     'Fort Wayne, IN'
%     'Tyler, TX'
%     'Salt Lake City, UT'
%     'Phoenix, AZ'
    'Los Angeles, CA'
    'Seattle, WA'
%     'San Francisco, CA'
%     'Fairbanks, AK'
    'Anchorage, AK'
    'Bethel, AK'
};
ind = zeros(size(names,1),1);
for i=1:size(names,1)
    I = find(strcmp(meta.city, names{i}));
    [~, imax] = max([meta.degree{I}]);
    ind(i) = I(imax);
end
[~, ind2] = sort(meta.lon(ind), 'descend');
ind = ind(ind2);
names = names(ind2);
color = hsv(p);
plot_nodesfeatures(estimates.w, ind, ind_features, names, featnames, color, outpath, prefix, suffix);
% Show some of the nodes in each feature
fnames = {'degree', 'city'}; % meta fields displayed for features exploration
formats = {'#%d,', '%s.'}; %
print_features( fullfile(outpath, ['features_' num2str(p) 'f.txt']), ...
    estimates.w, ind_features, featnames, meta, fnames, formats)
print_features( fullfile(outpath, ['featuresnorm_' num2str(p) 'f.txt']), ...
    bsxfun(@rdivide, estimates.w, sum(estimates.w,2)),...
    ind_features, featnames, meta, fnames, formats)

fnames = {'city'}; % meta fields displayed for features exploration
formats = {'%s.'}; %
print_features( fullfile(outpath, ['features_' num2str(p) 'f_tex.txt']), ...
    estimates.w, ind_features, featnames, meta, fnames, formats)
print_features( fullfile(outpath, ['featuresnorm_' num2str(p) 'f_tex.txt']), ...
    bsxfun(@rdivide, estimates.w, sum(estimates.w,2)),...
    ind_features, featnames, meta, fnames, formats)
& %--- Hub ---
#291, New York, NY. 
#299, Washington, DC. 
#261, Miami, FL. 
#245, Boston, MA. 
#292, Los Angeles, CA. 
#273, Newark, NJ. 
#314, Atlanta, GA. 
#211, Fort Lauderdale, FL. 
#231, Orlando, FL. 
#244, Philadelphia, PA. 
& %--- East ---
#296, Chicago, IL. 
#252, Detroit, MI. 
#314, Atlanta, GA. 
#204, Nashville, TN. 
#184, Indianapolis, IN. 
#203, Cleveland, OH. 
#195, Pittsburgh, PA. 
#173, Louisville, KY. 
#299, Washington, DC. 
#189, Memphis, TN. 
& %--- West ---
#274, Denver, CO. 
#240, Las Vegas, NV. 
#219, Burbank, CA. 
#292, Los Angeles, CA. 
#183, Salt Lake City, UT. 
#214, San Francisco, CA. 
#190, Phoenix, AZ. 
#193, Seattle, WA. 
#254, Dallas/Fort Worth, TX. 
#267, Houston, TX. 
& %--- Alaska ---
#172, Anchorage, AK. 
#131, Fairbanks, AK. 
#83, Bethel, AK. 
#37, Iliamna, AK. 
#60, King Salmon, AK. 
#48, St. Mary's, AK. 
#49, McGrath, AK. 
#47, Dillingham, AK. 
#49, Galena, AK. 
#44, Kotzebue, AK. 
& %--- Hub ---
#3, Johnstown, PA. 
#1, Sarajevo, Yugoslavia. 
#2, Oneonta, NY. 
#1, Ahmedabad, India. 
#3, Pittsfield, MA. 
#37, Charlotte Amalie, Virgin Islands. 
#1, Palermo, Italy. 
#3, La Romana, Dominican Republic. 
#1, Casablanca, Morocco. 
#3, Barrancabermeja, Colombia. 
& %--- East ---
#70, Daytona Beach, FL. 
#22, Binghamton, NY. 
#75, Pensacola, FL. 
#46, Newport News/Williamsburg, VA. 
#30, London, Canada. 
#51, Kalamazoo, MI. 
#51, Fayetteville, NC. 
#50, Scranton/Wilkes-Barre, PA. 
#4, Cape Girardeau, MO. 
#3, Williamsport, PA. 
& %--- West ---
#9, Leon/Guanajuato, Mexico. 
#51, Kalispell, MT. 
#20, Durango, CO. 
#3, Santa Cruz/Huatulco, Mexico. 
#1, Philadelphia, PA. 
#136, Reno, NV. 
#3, Battle Creek, MI. 
#30, Medford, OR. 
#76, Lubbock, TX. 
#16, San Luis Obispo, CA. 
& %--- Alaska ---
#4, Cape Newenham, AK. 
#21, Napakiak, AK. 
#4, Majuro, Marshall Islands District (TTPI). 
#24, Alakanuk, AK. 
#8, Hoolehua, HI. 
#1, Dunbar, Australia. 
#17, Sheldon Point, AK. 
#27, Kalskag, AK. 
#2, Seward, AK. 
#23, Kwigillingok, AK. 
& %--- Hub ---
New York, NY. 
Washington, DC. 
Miami, FL. 
Boston, MA. 
Los Angeles, CA. 
Newark, NJ. 
Atlanta, GA. 
Fort Lauderdale, FL. 
Orlando, FL. 
Philadelphia, PA. 
& %--- East ---
Chicago, IL. 
Detroit, MI. 
Atlanta, GA. 
Nashville, TN. 
Indianapolis, IN. 
Cleveland, OH. 
Pittsburgh, PA. 
Louisville, KY. 
Washington, DC. 
Memphis, TN. 
& %--- West ---
Denver, CO. 
Las Vegas, NV. 
Burbank, CA. 
Los Angeles, CA. 
Salt Lake City, UT. 
San Francisco, CA. 
Phoenix, AZ. 
Seattle, WA. 
Dallas/Fort Worth, TX. 
Houston, TX. 
& %--- Alaska ---
Anchorage, AK. 
Fairbanks, AK. 
Bethel, AK. 
Iliamna, AK. 
King Salmon, AK. 
St. Mary's, AK. 
McGrath, AK. 
Dillingham, AK. 
Galena, AK. 
Kotzebue, AK. 
& %--- Hub ---
Johnstown, PA. 
Sarajevo, Yugoslavia. 
Oneonta, NY. 
Ahmedabad, India. 
Pittsfield, MA. 
Charlotte Amalie, Virgin Islands. 
Palermo, Italy. 
La Romana, Dominican Republic. 
Casablanca, Morocco. 
Barrancabermeja, Colombia. 
& %--- East ---
Daytona Beach, FL. 
Binghamton, NY. 
Pensacola, FL. 
Newport News/Williamsburg, VA. 
London, Canada. 
Kalamazoo, MI. 
Fayetteville, NC. 
Scranton/Wilkes-Barre, PA. 
Cape Girardeau, MO. 
Williamsport, PA. 
& %--- West ---
Leon/Guanajuato, Mexico. 
Kalispell, MT. 
Durango, CO. 
Santa Cruz/Huatulco, Mexico. 
Philadelphia, PA. 
Reno, NV. 
Battle Creek, MI. 
Medford, OR. 
Lubbock, TX. 
San Luis Obispo, CA. 
& %--- Alaska ---
Cape Newenham, AK. 
Napakiak, AK. 
Majuro, Marshall Islands District (TTPI). 
Alakanuk, AK. 
Hoolehua, HI. 
Dunbar, Australia. 
Sheldon Point, AK. 
Kalskag, AK. 
Seward, AK. 
Kwigillingok, AK. 
 % Plot posterior predictive of degrees
 plot_degreepostpred(G, objmcmc, nsamples, 1e-6, outpath, prefix, suffix);
-----------------------------------
Start degree posterior predictive estimation: 100 draws
Estimated end of computation: 08-Nov-2017 18:18:07 (0.0 hours)
|---------------------------------|
|*********************************|
End degree posterior predictive (0.0 hours)
-----------------------------------