# Dynamics and Performance of an IT Call Center

Contact Wen Dong or Todd Reid for technical support

# Introduction

The data contain the performance, behavior, and interpersonal interactions of participating employees at a Chicago-area data server configuration firm for one month. It is the first data set that contains the performance and dynamics of a real-world organization with a temporal resolution of a few seconds. Performance data include the assigning time, closing time, difficulty level, assigned-to, closed-by, and number of follow-ups of each task completed during that one-month period. Behavior data include the locations of the employees estimated from Zigbee RSSI recorded by the badges worn by each employee, representing to whom and to which key locations (printer, warehouse, and so on) he went. Behavior data also include the recordings of a 3-axis accelerometer on the badge, from which we estimate the postures and activities of its wearer. Interaction data include IR scanning by each badge of the badges worn by other employees, indicating that the latter are within a 1-meter distance and 30-degree cone in front of the badge, most likely indicating face-to-face communication. The badges also record audio intensity from an on-badge microphone, from which we estimate verbal behavior and verbal interactions. All sensor data are time-stamped.

There were 28 employees at the firm, of whom 23 participated in the study. Nineteen hundred hours of data were collected, with a median of 80 hours per employee. The resulting data document the performance of computer system configuration tasks assigned to employees on a first-come, first-served basis. These configurations were rated at one of three levels of difficulty (basic, complex, or advanced) based on the configuration characteristics. At the conclusion of a task, the employee submitted the completed configuration and its price back to the salesman, and then moved to the back of the queue for task assignment.

The layout of the workspace is shown in the following figure. The base stations on yellow squares were placed at fixed positions throughout the workspace in order to locate the badges and time-stamp the data collected by them. Participating employees are indicated at their cubicles by their badge IDs; different colors behind the IDs represent different departmental branches at the firm. Non-participating employees are marked with the letter “N” at their cubicles. Employees fetched their badges from the room containing base station 1 (located at the lower left corner) at approximately 9am each weekday, and returned the badges to this room at around 6pm. The RSSI regions were manually assigned to identify different regions in the workspace, and do not correspond to any particular sensors deployed in this experiment.

The employees indicated that their configuration tasks were information-intensive, and therefore required them to talk to one another to fully understand the various specifications. As such, we would expect a positive correlation between the rate of problem-solving by an employee and the number of places visited by that employee. Further, from who visited whose work cubicle we can determine interpersonal information flow and expertise in problem-solving.

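suppressMessages(require(grImport))
# Trace the EPS floor plan to RGML (written to "chicagomap1page1.eps.xml" by default).
PostScriptTrace("chicagomap1page1.eps")
grid.picture(readPicture("chicagomap1page1.eps.xml"))  # assumed completion of the listing: draw the traced plan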


# Data Description

The data set contains the following tables, and each table contains the following fields:

• Transactions.csv tasks assigned to whom (assigned.to) and when (assign.date), closed by whom (closed.by) and when (close.date), complexity (basic, advanced and complex), how many follow-ups (n.follow.ups) and what errors employees made until task-closing, what roles these tasks required (pricing, configuration).

• BadgeAssignment.csv how badges (identified by unique BIDs) were assigned either to track the behavior of the employees or to serve as anchor nodes, locations (x, y) of employees' cubicles and anchor nodes, and roles of employees (pricing, configuration, and coordination). The anchor nodes served to stamp employees' behavioral data with times and indoor locations (cf. the floor plan).

• Zigbee.csv proximity ($$\le$$ 10 meters) of employees to one another and to anchor nodes. Zigbee messages were sent by a badge that was either an anchor node at a fixed location or worn by an employee to track him (sender.id) at a rate of 1 message per sender per 10 seconds. Messages were received by a badge worn by another employee (local.id), inside the range of the Zigbee signal from the sender at a specific time (date.time), with a received signal strength indicator (RSSI) indicating how far the sender badge is from the receiver badge.

• IR.csv observations when an employee is face-to-face with another employee or an anchor node and is less than 3 meters from the latter. IR messages were sent by a badge that was either an anchor node at a fixed location or worn by an employee to track him (sender.id) at a rate of 1 message per sender per 10 seconds. Messages were received by a badge worn by another employee (local.id), inside the range of the IR signal from the sender, oriented towards the sender badge, and at a specific time (date.time).

• LocationTrackingEvery1Minute.csv 10 locations with longest stays (x,y) per employee (identified by the id of the badge assigned to him) per minute (time), estimated from Zigbee RSSI to anchor nodes. The indoor localization algorithm is described in [8].

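# stringsAsFactors = TRUE keeps role as a factor, which the plots use to index colors (the default changed in R 4.0).
badge.assignment = read.csv("BadgeAssignment.csv", stringsAsFactors = TRUE)
# Task records; the file name is the one given in the table list above.
trans = read.csv("Transactions.csv")
trans$assign.date = as.POSIXct(trans$assign.date, tz = "America/Chicago")
trans$close.date = as.POSIXct(trans$close.date, tz = "America/Chicago")
# Per-minute location estimates.
zz = bzfile("LocationTrackingEvery1Minute.csv.bz2", open = "rt")
hdc.xy = read.csv(zz)
close(zz)
hdc.xy$time = as.POSIXct(hdc.xy$time, tz = "America/Chicago")
# Face-to-face (IR) detections.
zz = bzfile("IR.csv.bz2", open = "rt")
IR.aggr = read.csv(zz)
close(zz)
IR.aggr$date.time = as.POSIXct(IR.aggr$date.time, tz = "America/Chicago")
# Zigbee proximity records.
zz = bzfile("Zigbee.csv.bz2", open = "rt")
net.aggr = read.csv(zz)
close(zz)
net.aggr$date.time = as.POSIXct(net.aggr$date.time, tz = "America/Chicago")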
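
As a quick orientation to the performance table, the following sketch computes task turnaround times from the assign.date and close.date fields. It runs on a small synthetic data frame rather than on the real file; the column name `complexity` and every value in `toy.trans` are assumptions made for illustration only.

```r
# Synthetic stand-in for Transactions.csv (values and the 'complexity' column name are hypothetical).
toy.trans = data.frame(
    assigned.to = c(293, 293, 101, 101),
    complexity = c("basic", "complex", "basic", "advanced"),
    assign.date = as.POSIXct(c("2007-03-30 09:00:00", "2007-03-30 09:30:00",
        "2007-03-30 10:00:00", "2007-03-30 11:00:00"), tz = "America/Chicago"),
    close.date = as.POSIXct(c("2007-03-30 09:20:00", "2007-03-30 11:30:00",
        "2007-03-30 10:10:00", "2007-03-30 15:00:00"), tz = "America/Chicago"),
    stringsAsFactors = FALSE)
# Turnaround time of each task in minutes.
toy.trans$minutes = as.numeric(difftime(toy.trans$close.date, toy.trans$assign.date, units = "mins"))
# Mean turnaround per difficulty level, and number of tasks per employee.
mean.minutes = tapply(toy.trans$minutes, toy.trans$complexity, mean)
tasks.per.employee = table(toy.trans$assigned.to)
```

The same two summaries applied to `trans` would give a first view of how workload and turnaround differ across difficulty levels.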


The following figure shows how often two employees were located within the distance of one cubicle (that is, co-located); rows and columns are indexed by employees. The brightness of the table cell in row $$i$$ and column $$j$$ represents the amount of time employee $$i$$ and employee $$j$$ were co-located; the whiter the color, the more total time they were co-located. The dendrograms to the left and top of the heat map represent how employees were grouped according to their co-location relationship. A leaf of the dendrogram corresponds to the same employee that indexes a row and a column of the heat map, while the colors on the leaves of the dendrogram represent different branches in the firm: red is the configuration branch, green the coordination branch, and purple the pricing branch. The numbers at the right and bottom sides of the heat map show the IDs of the employee tracking badges. We constructed the dendrogram by expressing the amounts of time that an employee was co-located with other employees as an observation vector of real numbers for this employee, and defining the distance between two employees $$i$$ and $$j$$ to be $$\sqrt{1-\rho_{ij}}$$, where $$\rho_{ij}$$ is the correlation coefficient between $$i$$'s times of co-location with other employees and $$j$$'s times of co-location with other employees. We use Ward's minimum variance method [10] in hierarchical clustering to find compact, spherical clusters in constructing the dendrogram.

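# Count Zigbee messages between employee badges (senders that never appear as receivers, i.e. anchor nodes, are dropped).
badge.prox = with(net.aggr[net.aggr$sender.id %in% unique(net.aggr$local.id), ],
    table(sender.id, local.id))
# Divide each receiver's counts by the number of distinct hours in which that badge logged data.
badge.prox = sweep(badge.prox, 2, tapply(trunc(as.numeric(net.aggr$date.time)/3600), net.aggr$local.id, function(x) length(unique(x))), "/")
# Assumed reconstruction of the clustering step and of the heatmap() opening, modeled on the IR heat map code in this document.
badge.prox.hclust = hclust(as.dist(sqrt(1 - cor(asinh(badge.prox)))), method = "ward.D")
heatmap(asinh(badge.prox), Rowv = as.dendrogram(badge.prox.hclust), Colv = as.dendrogram(badge.prox.hclust),
    scale = "none", RowSideColors = c("yellow", "red", "green", "purple", "gray")[badge.assignment$role[match(rownames(badge.prox), badge.assignment$BID)]],
    ColSideColors = c("yellow", "red", "green", "purple", "gray")[badge.assignment$role[match(rownames(badge.prox), badge.assignment$BID)]])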

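To make the dendrogram construction concrete, here is a minimal sketch on a made-up co-location matrix for six hypothetical employees in two teams. The names `team` and `co.loc` and all numbers are assumptions. The distance is taken as the square root of one minus the correlation, as in the IR heat map code of this document, and `"ward.D"` is the current name of the method that older R versions called `"ward"`.

```r
# Six hypothetical employees a-f; the first three and the last three form two teams.
ids = letters[1:6]
team = c(1, 1, 1, 2, 2, 2)
# Made-up co-location times: 8 units with team mates, 1 unit with everyone else, 0 with oneself.
co.loc = ifelse(outer(team, team, "=="), 8, 1)
diag(co.loc) = 0
dimnames(co.loc) = list(ids, ids)
# Correlation between the co-location profiles (columns) of every pair of employees.
rho = cor(co.loc)
# Correlation distance: 0 for identical profiles, larger for dissimilar ones.
d = as.dist(sqrt(1 - rho))
# Hierarchical clustering with Ward's criterion, then a cut into two groups.
tree = hclust(d, method = "ward.D")
groups = cutree(tree, k = 2)
```

In this toy case the two groups returned by `cutree` coincide with the two teams, which is the kind of structure the side colors of the heat map are meant to reveal.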

According to the theory of structural holes [2], people talk more often to those with the same expertise or role, and the less frequent interactions among people with different expertise or roles can be more important when they happen. This is confirmed by how often people engaged in face-to-face communication in the call center, as indicated by the IR messages logged by the employees' badges (cf. the figure below), and by how visiting other employees' cubicles could contribute to higher productivity per unit time, to be discussed later. Employees are more likely to have face-to-face discussions when their cubicles are closer, which indicates a way of engineering the communication structure within the call center by rearranging the cubicles.

# Keep IR records whose sender and receiver are both tracked employee badges.
IR.aggr2 = IR.aggr[IR.aggr$sender.id %in% unique(hdc.xy$id) & IR.aggr$local.id %in% unique(hdc.xy$id), ]
# Count distinct IR records per (sender, receiver) pair and keep the badges present on both axes.
ir.prox = table(unique(IR.aggr2)[, c("sender.id", "local.id")])
ir.prox = ir.prox[rownames(ir.prox) %in% colnames(ir.prox), colnames(ir.prox) %in% rownames(ir.prox)]
# "ward.D" is the current name of the method that older R versions called "ward".
ir.prox.hclust = hclust(as.dist(sqrt(1 - cor(asinh(ir.prox)))), method = "ward.D")
heatmap(asinh(ir.prox * 10), Rowv = as.dendrogram(ir.prox.hclust), Colv = as.dendrogram(ir.prox.hclust),
    scale = "none", RowSideColors = c("yellow", "red", "green", "purple", "gray")[badge.assignment$role[match(rownames(ir.prox), badge.assignment$BID)]],
    ColSideColors = c("yellow", "red", "green", "purple", "gray")[badge.assignment$role[match(rownames(ir.prox), badge.assignment$BID)]])


The following figure shows the positive correlation between the number of tasks assigned to an employee and the entropy of the places he visited while working on tasks. The employee with the highest number of assignments (badge ID 293) received 132 tasks during one month. His entropy of going to different places to finish these assignments was 5.75, and he typically went to exp(5.75)=315 grid points in the workspace (out of 502 in total), or 19 of the 28 non-empty cubicles. The employee with the fewest assignments received only one task. His entropy was 4.19, and he typically went to exp(4.19)=66 grid points, or 6 cubicles.

The following figure also shows that employees in the pricing branch and in the configuration branch received and finished assignments very differently. In terms of overall tasks assigned, a pricing employee received an average of nine times as many assignments when they were basic, and three times as many when they were complex, as a configuration employee was assigned. Pricing employees also finished these assignments in parallel, and went to many people to solve these assignments. Configuration employees, on the other hand, solved advanced assignments exclusively, worked serially, and went to fewer people to solve their assignments.

Interpreting the log-linear relationship between rate of completion and entropy in terms of survival analysis, we write rate of completion $$\propto\exp(-\sum_{(\tilde{x}_{m},\tilde{y}_{n})}p(\tilde{x}_{m},\tilde{y}_{n})\log p(\tilde{x}_{m},\tilde{y}_{n}))$$, where $$(\tilde{x}_{m},\tilde{y}_{n})$$ ranges over the set of location grids onto which we map RSSI, $$p(\tilde{x}_{m},\tilde{y}_{n})$$ is the probability that the grid was visited, the exponent is the entropy of the employee's location-visiting behavior when he had a task, and the visits to each location $$(\tilde{x}_{m},\tilde{y}_{n})$$ make task completion $$\exp(-p(\tilde{x}_{m},\tilde{y}_{n})\log p(\tilde{x}_{m},\tilde{y}_{n}))$$ times faster. The “survival” time of a task is the reciprocal of the rate of task completion, and hence an exponential function of the negative entropy; the entropy in turn is the sum of the contributions from all locations that this employee visited, weighted by the frequencies with which he visited them. The contribution of a specific location per visit, $$-\log p(\tilde{x}_{m},\tilde{y}_{n})$$, is more critical when the location is less visited; however, over all visits, the more-frequently-visited locations contributed more to task completion than the less-visited locations, because $$-p\log p$$ decreases to 0 when $$p$$ decreases to 0.
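
The quantities in this paragraph can be checked on a small example. The sketch below uses made-up visit counts over five hypothetical grid cells (the names `visits` and `cell.A` to `cell.E` are assumptions) and shows the per-visit contribution, the overall contribution of each cell, and the exponential of the entropy read as an effective number of cells visited.

```r
# Hypothetical visit counts of one employee over five grid cells (made-up numbers).
visits = c(cell.A = 50, cell.B = 25, cell.C = 15, cell.D = 9, cell.E = 1)
p = visits / sum(visits)
# Per-visit contribution of each cell, -log p: larger for rarely visited cells.
per.visit = -log(p)
# Overall contribution of each cell, -p log p: visit frequency times per-visit contribution.
overall = -p * log(p)
entropy = sum(overall)
# exp(entropy) is the effective number of cells visited; it is the product of per-cell factors.
effective.cells = exp(entropy)
per.cell.factor = exp(overall)
# A uniform walk over 4 cells has entropy log(4), that is, 4 effective cells.
uniform = rep(1/4, 4)
effective.uniform = exp(-sum(uniform * log(uniform)))
```

Here the rarest cell has the largest per-visit contribution but the smallest overall contribution, which is the distinction the paragraph draws.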

# Negative entropy (sum of p * log p) of each employee's distribution over visited grid points.
hdc.entropy = sapply(split(hdc.xy, hdc.xy$id), function(x) {
    p = table(paste(x$x, x$y))
    p = p/sum(p)
    sum(p * log(p))
})
# Number of tasks assigned to each employee, restricted to employees with location data.
hdc.accomplishment = c(table(as.character(trans$assigned.to)))
hdc.accomplishment = hdc.accomplishment[intersect(names(hdc.accomplishment), names(hdc.entropy))]
hdc.entropy = hdc.entropy[intersect(names(hdc.accomplishment), names(hdc.entropy))]
plot(-hdc.entropy, hdc.accomplishment, xlab = "entropy", ylab = "# of tasks assigned to")
suppressMessages(require(maptools))  # if maptools is unavailable, the car package also exports pointLabel()
# Label each point with its badge ID, colored by the role of the employee.
pointLabel(-hdc.entropy, hdc.accomplishment, names(hdc.entropy), col = sapply(as.character(badge.assignment$role[match(names(hdc.entropy), badge.assignment$BID)]),
    function(x) switch(x, Pricing = "purple", `Base station` = "orange", Coordinator = "green", Configuration = "red", RSSI = "gray")))
legend("topleft", text.col = c("red", "purple"), legend = c("configuration", "pricing"))


As a sanity test of the time stamps estimated from the “jiffy” counts of the badges, and of the indoor locations estimated from Zigbee RSSI between employees' badges and anchor nodes, we show with a quantile-quantile plot (cf. the figure below) that two persons were closer to each other within 1 minute of a face-to-face discussion than within 1 hour of it. We randomly take 200 records of IR proximity from the data set and, for each record, take 10 locations of the sender badge and 10 locations of the receiver badge within 1 minute of the IR proximity. We then sort the 20 thousand pairwise distances (200 records $$\times10\times10$$ pairwise distances per record) and plot them against another 20 thousand sorted distances taken within 1 hour of IR proximity. We find that with 90% probability two persons were within the distance of 1 cubicle in the 1-minute window of their face-to-face discussion, as compared to 70% probability in the 1-hour window. We would not find this structure if either the estimated time stamps had an error bigger than 1 minute or the estimated indoor locations had an error bigger than the distance of 1 cubicle. We can similarly check that two persons were closer to each other at the time of IR proximity than at the time of Zigbee proximity, and that two persons had more IR-proximity and Zigbee-proximity records when their cubicles were closer.

IR.aggr2 = IR.aggr[IR.aggr$sender.id %in% unique(hdc.xy$id) & IR.aggr$local.id %in% unique(hdc.xy$id), ]
# Index of the first location record logged in the same minute as each IR record, for receiver and sender.
xy.key = paste(hdc.xy$id, strftime(hdc.xy$time, "%Y-%m-%d %H:%M:00"))
ir.minute = strftime(IR.aggr2$date.time, "%Y-%m-%d %H:%M:00")
IR.aggr2$ndx.local = match(paste(IR.aggr2$local.id, ir.minute), xy.key)
IR.aggr2$ndx.sender = match(paste(IR.aggr2$sender.id, ir.minute), xy.key)
matched = which(!is.na(IR.aggr2$ndx.local) & !is.na(IR.aggr2$ndx.sender))
# Rounded pairwise Euclidean distances between two sets of (x, y) locations.
pair.dist = function(a, b) round((outer(a$x, b$x, "-")^2 + outer(a$y, b$y, "-")^2)^0.5)
# Distances within 1 minute of an IR detection: the 10 locations logged for that minute.
IR.dist = unlist(lapply(sample(matched, 200), function(n) {
    a = hdc.xy[IR.aggr2$ndx.local[n] + 0:9, c("x", "y")]
    b = hdc.xy[IR.aggr2$ndx.sender[n] + 0:9, c("x", "y")]
    pair.dist(a, b)
}))
# Distances within 1 hour of an IR detection: 10 locations sampled per badge from that window.
w = as.numeric(hdc.xy$time)
IR.dist2 = unlist(lapply(sample(matched, 200), function(n) {
    ndx = which(IR.aggr2$local.id[n] == hdc.xy$id)
    ndx.local = ndx[abs(w[ndx] - as.numeric(IR.aggr2$date.time[n])) < 60 * 60]
    ndx = which(IR.aggr2$sender.id[n] == hdc.xy$id)
    ndx.sender = ndx[abs(w[ndx] - as.numeric(IR.aggr2$date.time[n])) < 60 * 60]
    a = hdc.xy[sample(ndx.local, 10, replace = TRUE), c("x", "y")]
    b = hdc.xy[sample(ndx.sender, 10, replace = TRUE), c("x", "y")]
    pair.dist(a, b)
}))
qqplot(IR.dist2, IR.dist, pch = ".", xlab = "distance distribution less than 1 hour from IR proximity",
    ylab = "distance distribution less than 1 minute from IR proximity ", main = "Q-Q plot")
abline(coef = c(0, 1), col = "red")
# Mark the deciles .1 to .9 of both distributions on the plot.
pointLabel(quantile(IR.dist2, 1:9/10), quantile(IR.dist, 1:9/10), paste("", 1:9, sep = "."), col = "red")
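
The probabilities quoted for the two windows are shares of pairwise distances that fall within one cubicle. The sketch below shows how such shares and the matching deciles can be read off two distance samples. It uses synthetic distances drawn from exponential distributions, and the threshold `cubicle = 3` is a hypothetical value in arbitrary grid units, so the numbers it produces say nothing about the real data.

```r
set.seed(42)
# Synthetic stand-ins for the two distance samples (hypothetical, in grid units).
dist.1min = rexp(20000, rate = 1)
dist.1hour = rexp(20000, rate = 1/3)
cubicle = 3  # hypothetical width of one cubicle in the same units
# Share of pairwise distances no larger than one cubicle in each window.
share.1min = mean(dist.1min <= cubicle)
share.1hour = mean(dist.1hour <= cubicle)
# Matching deciles of the two samples, the points labeled .1 to .9 on a Q-Q plot.
deciles = cbind(hour = quantile(dist.1hour, 1:9/10), minute = quantile(dist.1min, 1:9/10))
```

When the 1-minute distances are systematically smaller, every decile pair lies below the diagonal of the Q-Q plot.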


# More Data

We repackaged the raw sensor data for investigators to inspect the call-center dynamics from more perspectives. The time stamps of the raw sensor data (directly from the badge hardware) were badge CPU clock counts, and started from 0 each time the badges were powered on to collect data.

• ChunkOffset.csv the times (offset YYYY-mm-dd HH:MM:SS in the local time of the call center) when the badges were powered on to collect a chunk of data. Hence the time stamp of each sensor record in a chunk is chunk offset + badge CPU clock / 374400 Hz.

• Accelerometer.bz2 readings from 3-axis accelerometers (x,y,z) on the badges (local.id), timestamped with badge CPU clocks (local.time), tagged with chunk identifier to recover the global time stamps, and sampled at 100Hz. The accelerometer data enable us to estimate the activities of the employees (walking, standing, sitting) and their interactions.

• AudioFeatures.bz2 audio features as described in [16].

• IR-raw.csv same as IR.csv, but without the date.time field (YYYY-mm-dd HH:MM:SS in the local time of the call center) and with the badge CPU clock (local.time) and a chunk identifier.

• Zigbee-raw.csv same as Zigbee.csv, but without the date.time field (YYYY-mm-dd HH:MM:SS in the local time of the call center) and with the badge CPU clocks of the sender badge (sender.time) and the receiver badge (receiver.time), as well as the chunk identifier of the receiver badge. The time of the call center is chunk offset + local.time/374400.
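
The conversion from badge CPU clock to call-center time stated in the list above can be sketched as follows. The column names `chunk` and `offset` and the toy values are assumptions (the real ChunkOffset.csv may be laid out differently); only the clock rate of 374400 Hz and the formula come from the description.

```r
cpu.hz = 374400  # badge CPU clock rate given in the data description
# Hypothetical chunk table: one power-on time per chunk.
toy.offset = data.frame(chunk = c(1, 2),
    offset = as.POSIXct(c("2007-03-30 09:00:00", "2007-03-30 13:00:00"), tz = "America/Chicago"))
# Hypothetical raw records: chunk identifier plus CPU clock count since power-on.
toy.raw = data.frame(chunk = c(1, 1, 2), local.time = c(0, 374400 * 90, 374400 * 3600))
# Global time stamp = chunk offset + CPU clock / clock rate (seconds since power-on).
toy.raw$date.time = toy.offset$offset[match(toy.raw$chunk, toy.offset$chunk)] + toy.raw$local.time / cpu.hz
```

In this toy table a clock count of 374400 * 90 maps to 90 seconds after the chunk offset.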

We estimate the time of the call center (YYYY-mm-dd HH:MM:SS) corresponding to badge power-on based on the following two facts: (1) The anchor nodes were never rebooted. Hence the CPU clock of each anchor node was non-decreasing over time, and sender.time in Zigbee-raw.csv is non-decreasing in each chunk and consistent across different chunks if sender.id is the ID of an anchor node. (2) The time when the data from the badge hardware were downloaded to a computer should be later than the times of the sensor records.

For example, suppose the CPU clock of one badge ranges from 0 to 3600*374400 (one hour at the CPU clock rate), that the Zigbee records from this period carry CPU clocks from 3600*374400 to 3600*2*374400 for anchor node A and from 1800*374400 + 3600*374400 to 1800*374400 + 3600*2*374400 for anchor node B, and that the data on the badge were dumped at noon on 2007/03/30. The last record, at which anchor node B had been running for 2.5 hours, must precede the dump, so we can infer that anchor node B started slightly earlier than 9:30am on 2007/03/30, and half an hour earlier than anchor node A. After we average over all chunks of sensor data and all anchor nodes, we estimate that our mapping from CPU clock to the time of the call center should have less than 1 second error.

Indoor localization from Zigbee RSSI relies on the fact that employees were at their cubicles more than anywhere else: the RSSI to anchor nodes per minute per employee is compared with the signature RSSIs to anchor nodes recorded when employees were in their cubicles [8].
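
The idea of comparing a minute of RSSI readings against cubicle signatures can be illustrated with a nearest-signature lookup. This is a deliberately simplified stand-in, not the algorithm of [8]; the signature values, the anchor and cubicle names, and the helper `locate` are all hypothetical.

```r
# Hypothetical signature RSSIs (dBm): rows are cubicles, columns are anchor nodes.
signature = rbind(cubicle.1 = c(-40, -70, -80),
                  cubicle.2 = c(-72, -42, -75),
                  cubicle.3 = c(-81, -74, -45))
colnames(signature) = c("anchor.A", "anchor.B", "anchor.C")
# Return the cubicle whose signature is closest (Euclidean distance in RSSI space)
# to the RSSI vector observed during one minute.
locate = function(observed, signature) {
    d = sqrt(rowSums(sweep(signature, 2, observed, "-")^2))
    names(which.min(d))
}
# Hypothetical readings of one badge during one minute.
one.minute = c(anchor.A = -70, anchor.B = -45, anchor.C = -77)
estimate = locate(one.minute, signature)
```

A practical method would also need to handle missing anchors and signal noise, which this sketch ignores.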

# Literature Review

Researchers have been using multi-agent models to simulate organizational dynamics and organizational performance based on simple generative rules [1, 3, 17] since long before the availability of sensors to accurately track the whole population in an organization. In particular, Carley proposed that organizational dynamics center around three components (tasks, resources, and individuals) and five relationships (temporal ordering of tasks, resource prerequisite of tasks, assignment of personnel to tasks, interpersonal relationships, and accessibility of resources to individuals). Previous successes suggest the strong potential to verify these generative rules with sensor data – fitting multi-agent models to real-world sensor data that track organization dynamics, and even providing real-time interventions to organizations by combining multi-agent models and sensor data.

A key psychological hypothesis behind organizational theory is transactive memory: an organization coping with complex tasks often needs a knowledge repertoire far beyond the memory capacity and reliability of any individual in this organization. Individuals collaborate to store this total repertoire by identifying the expertise of one another and distributing the repertoire among themselves. In the end, each individual has a subset of the repertoire, and an index of who knows what and how credible that source is. The longer group members work with one another, the better they understand this distribution of expertise and weakness, so the more precise their communications become and the more productively they retrieve information and complete tasks.

The face-to-face interaction network is important in understanding how individuals completed tasks in the data set’s server configuration firm, because information flow and task solutions result from this face-to-face network. Location tracking is critical for pinpointing the direction of information flow. If A visits B, this means that information flows from B to A; if many people visited A, this is very different from A visiting many people.
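
The distinction between visiting many people and being visited by many people corresponds to row sums and column sums of a directed visit matrix. The sketch below uses a made-up visit log; the names `visit.log`, `visitor`, and `host` are assumptions, and deriving such a log from the location data is not shown.

```r
# Hypothetical visit log: each row says that 'visitor' went to the cubicle of 'host'.
visit.log = data.frame(visitor = c("A", "A", "A", "B", "C", "D"),
                       host    = c("B", "C", "D", "D", "D", "B"),
                       stringsAsFactors = FALSE)
ids = sort(unique(c(visit.log$visitor, visit.log$host)))
# visits[i, j] counts visits by i to j; information is read as flowing from j to i.
visits = table(factor(visit.log$visitor, levels = ids), factor(visit.log$host, levels = ids))
# How many distinct colleagues each person sought out.
people.visited = rowSums(visits > 0)
# How many distinct colleagues sought each person out.
visitors.received = colSums(visits > 0)
```

In this toy log, A seeks out three colleagues and is sought out by none, while D is sought out by three.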

We can use the following generative multi-agent process, which is compatible with the organizational dynamics theory, to model the dynamics and performance of the IT firm. An individual iterates among four states during his work: working on his assignment by himself, asking for help from another individual, giving help to another individual, or idling. This individual enters and exits different states with different probabilities, proportional to the rates of different events: how often tasks come, how he and his counterparts make choices, and how effective these choices are towards assignment closing. Hence the number of tasks closed by an individual is inversely proportional to the average “survival time” of a task (the time for this individual to finish a task), and the average survival time of a task is an exponential function of the negative rate with which this individual finishes tasks in his different states [12]. Going to the cubicle of an individual with the right piece of knowledge will increase the productivity by a certain factor, dependent on how often this right piece of knowledge is needed and how effective the communication is.
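
A minimal simulation of such a four-state process might look like the following. Every number in it is hypothetical (the transition matrix `P`, the per-minute closing probabilities `close.prob`, and the 480-minute day), and it simulates one individual in isolation rather than interacting agents, so it illustrates only the mechanics of the state model.

```r
set.seed(1)
states = c("working alone", "asking for help", "giving help", "idle")
# Hypothetical per-minute transition probabilities between states (each row sums to 1).
P = matrix(c(0.90, 0.04, 0.03, 0.03,
             0.30, 0.65, 0.00, 0.05,
             0.40, 0.00, 0.55, 0.05,
             0.50, 0.05, 0.05, 0.40), 4, 4, byrow = TRUE, dimnames = list(states, states))
# Hypothetical per-minute probability of closing the current task in each state.
close.prob = c(0.010, 0.030, 0.000, 0.000)
names(close.prob) = states
# Simulate one working day of one individual, minute by minute.
simulate.day = function(minutes = 480) {
    s = "working alone"
    closed = 0
    visited = character(minutes)
    for (t in seq_len(minutes)) {
        visited[t] = s
        closed = closed + (runif(1) < close.prob[[s]])
        s = sample(states, 1, prob = P[s, ])
    }
    list(closed = closed, time.share = table(factor(visited, levels = states)) / minutes)
}
day = simulate.day()
```

With a constant closing probability per minute in a state, the expected time to close a task in that state is the reciprocal of that probability, which is the inverse relation between completion rate and survival time described above.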

Findings and literature review on this data set include [8, 19]. The background on tracking organizational dynamics with badge hardware can be found in [6, 18, 15, 11]. Models of organizational dynamics are described in, for example, [4, 5, 9, 14].

# References

1. Robert Axelrod. The Complexity of Cooperation: Agent-Based Models of Competition and Collaboration. Princeton University Press, 1997.
2. Ronald S. Burt. Structural Holes: The Social Structure of Competition. Harvard University Press, Cambridge, Mass., 1st Harvard University Press paperback edition, 1995.
3. Kathleen M Carley. Computational organizational science and organizational engineering. Simulation Modeling Practice and Theory, 10(5-7):253-269, 2002.
4. Claudio Castellano, Santo Fortunato, and Vittorio Loreto. Statistical physics of social dynamics. Reviews of Modern Physics, 81(2):591-646, 2009.
5. Christophe P. Chamley. Rational Herds: Economic Models of Social Learning. Cambridge University Press, 2003.
6. Wen Dong. Modeling the Structure of Collective Intelligence. PhD thesis, MIT, 2010.
7. Wen Dong, Bruno Lepri, and Alex Pentland. Modeling the co-evolution of behaviors and social relationships using mobile phone data. In MUM, pages 134-143, 2011.
8. Wen Dong, Daniel Olgun Olgun, Benjamin N. Waber, Taemie Kim, and Alex Pentland. Mapping organizational dynamics with body sensor networks. In Guang-Zhong Yang, Eric M. Yeatman, and Chris McLeod, editors, BSN, pages 130-135. IEEE, 2012.
9. Joshua M. M. Epstein. Generative Social Science: Studies in Agent-Based Computational Modeling (Princeton Studies in Complexity). Princeton University Press, 2007.
10. Joe H. Ward, Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236-244, 1963.
11. Taemie Kim. Enhancing distributed collaboration using sociometric feedback. PhD thesis, MIT, 2011.
12. Jerald F. Lawless. Statistical models and methods for lifetime data (2nd ed.). John Wiley and Sons, 2003.
13. Leah Lievrouw and Sonia Livingstone, editors. Smart agents and organizations of the future, chapter 12, pages 206-220. Handbook of New Media. Sage Publications, Inc., 2002.
14. Nigel Gilbert. A generic model of collectivities. In ABModSim 2006, International Symposium on Agent Based Modeling and Simulation. University of Vienna: European Meeting on Cybernetics Science and Systems Research, 2006.
15. Daniel Olguín Olguín. Sensor-based organizational design and engineering. PhD thesis, MIT, 2011.
16. Alex Pentland. Honest Signals. MIT press, 2008.
17. Ron Sun. Cognition and multi-agent interaction: from cognitive modeling to social simulation. Cambridge University Press, 2006.
18. Ben Waber. Understanding the link between changes in social support and changes in outcomes with the sociometric badge. PhD thesis, MIT, 2011.
19. Lynn Wu, Benjamin N. Waber, Sinan Aral, Erik Brynjolfsson, and Alex Pentland. Mining face-to-face interaction networks using sociometric badges: Predicting productivity in an IT configuration task. In ICIS, page 127, 2008.