Dynamics and Performance of an IT Call Center

Contact Wen Dong or Todd Reid for technical support

Introduction

The data contain the performance, behavior, and interpersonal interactions of participating employees at a Chicago-area data server configuration firm over one month. It is the first data set to capture the performance and dynamics of a real-world organization at a temporal resolution of a few seconds. Performance data include the assigning time, closing time, difficulty level, assigned-to, closed-by, and number of follow-ups of each task completed during that month. Behavior data include the locations of the employees, estimated from the Zigbee RSSI recorded by the badge worn by each employee, indicating whom he visited and which key locations (printer, warehouse, and so on) he went to. Behavior data also include the recordings of a 3-axis accelerometer on the badge, from which we estimate the postures and activities of its wearer. Interaction data include each badge's IR scans of the badges worn by other employees, indicating that the latter were within 1 meter and a 30-degree cone in front of the badge, most likely indicating face-to-face communication. The badges also record audio intensity from an on-badge microphone, from which we estimate verbal behavior and verbal interactions. All sensor data are time-stamped.

There were 28 employees at the firm, 23 of whom participated in the study. Nineteen hundred hours of data were collected, with a median of 80 hours per employee. The resulting data document the performance of computer system configuration tasks assigned to employees on a first-come, first-served basis. Each configuration was rated at one of three levels of difficulty (basic, complex, or advanced) based on its characteristics. At the conclusion of a task, the employee submitted the completed configuration, together with its price, back to the salesman, and then moved to the back of the queue for task assignment.
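The first-come, first-served rotation can be sketched in a few lines of R (a hypothetical illustration of the queue discipline, not code from the data set; the function name and inputs are invented):

```r
# Hypothetical sketch: the employee at the head of the queue takes the
# next incoming task, then moves to the back of the queue.
assign_tasks <- function(employees, tasks) {
  queue <- employees
  assigned.to <- character(length(tasks))
  for (i in seq_along(tasks)) {
    assigned.to[i] <- queue[1]
    queue <- c(queue[-1], queue[1])  # assignee goes to the back
  }
  data.frame(task = tasks, assigned.to = assigned.to)
}

assign_tasks(c("E1", "E2", "E3"), paste0("T", 1:5))
# tasks T1..T5 go to E1, E2, E3, E1, E2 in turn
```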

The layout of the workspace is shown in the following figure. The base stations (yellow squares) were placed at fixed positions throughout the workspace in order to locate the badges and time-stamp the data collected by them. Participating employees are indicated at their cubicles by their badge IDs; different colors behind the IDs represent different departmental branches at the firm. Non-participating employees are marked with the letter “N” at their cubicles. Employees fetched their badges from the room containing base station 1 (located at the lower left corner) at approximately 9am each weekday morning, and returned them to this room at around 6pm in the evening. The RSSI regions were manually assigned to identify different regions in the workspace, and do not correspond to any particular sensors deployed in this experiment.

The employees indicated that their configuration tasks were information-intensive, and therefore required them to talk to one another to fully understand the various specifications. As such, we would expect a positive correlation between the rate of problem-solving by an employee and the number of places visited by that employee. Further, from who visited whose work cubicle, we can infer the direction of interpersonal information flow and expertise in problem-solving.

suppressMessages(require(grImport))
# Trace the EPS floor plan and render it with grid
PostScriptTrace("chicagomap1page1.eps")
grid.picture(readPicture("chicagomap1page1.eps.xml"))

[Figure: layout of the workspace]

Data Description

The data set contains the following tables, which we load below, parsing all time stamps in the call center's time zone:

badge.assignment = read.csv("BadgeAssignment.csv")  # badge ID and wearer's role/branch
trans = read.csv("Transactions.csv")                # task assignment and closing records
trans$assign.date = as.POSIXct(trans$assign.date, tz = "America/Chicago")
trans$close.date = as.POSIXct(trans$close.date, tz = "America/Chicago")
zz = bzfile("LocationTrackingEvery1Minute.csv.bz2", open = "rt")
hdc.xy = read.csv(zz)                               # per-minute badge locations (x, y)
close(zz)
hdc.xy$time = as.POSIXct(hdc.xy$time, tz = "America/Chicago")
zz = bzfile("IR.csv.bz2", open = "rt")
IR.aggr = read.csv(zz)                              # badge-to-badge IR (face-to-face) detections
close(zz)
IR.aggr$date.time = as.POSIXct(IR.aggr$date.time, tz = "America/Chicago")
zz = bzfile("Zigbee.csv.bz2", open = "rt")
net.aggr = read.csv(zz)                             # badge-to-badge Zigbee (proximity) receptions
close(zz)
net.aggr$date.time = as.POSIXct(net.aggr$date.time, tz = "America/Chicago")

The following figure shows how often two employees were located within the distance of one cubicle (that is, co-located); rows and columns are indexed by employees. The brightness of the table cell at row \( i \) and column \( j \) represents the amount of time employees \( i \) and \( j \) were co-located; the whiter the color, the more total time they were co-located. The dendrograms to the left and top of the heat map represent how employees were grouped according to their co-location relationships. A leaf of the dendrogram corresponds to the same employee that indexes a row and a column of the heat map, while the colors on the leaves of the dendrogram represent different branches in the firm: red is the configuration branch, green the coordination branch, and purple the pricing branch. The numbers at the right and bottom sides of the heat map show the IDs of the employee tracking badges. We constructed the dendrogram by expressing the amounts of time that an employee was co-located with the other employees as an observation vector of real numbers for this employee, defining the distance between two employees \( i \) and \( j \) to be \( \sqrt{1-\rho_{ij}} \), where \( \rho_{ij} \) is the correlation coefficient between \( i \)'s times of co-location with the other employees and \( j \)'s times of co-location with the other employees. We used Ward's minimum variance method [10] in hierarchical clustering to find compact, spherical clusters in constructing the dendrogram.

Employees are consistently co-located with others whose cubicles are close by, confirming the previous finding that shared time and space is a significant factor in relationship-building [7]. However, employees from different branches have different co-location patterns, while employees from the same branch pattern alike; this is not surprising, since different branches had different types of tasks. Such patterns differentiate the employees into several clusters. About 70% of the employees in the cluster from badge ID 278 to badge ID 292 in the heat map were senior configuration staff, who did most of the tasks assigned to the configuration branch and had intensive co-location with one another, but spent very little time with other employees. This is because, in order to finish the advanced tasks assigned to them, they needed to visit only 100 ~ 200 grid points in the workspace (out of 502 in total), or 7 ~ 14 cubicles (out of 28), and discuss their tasks with only a limited number of people. About 70% of the employees in the heat-map cluster from badge ID 265 to badge ID 56 were novice configuration staff, who similarly discussed their tasks with few others but, in contrast, pursued only a small fraction of the tasks assigned to the configuration branch. The cluster of pricing staff spent less time with one another but more time with the configuration staff, and performed many more basic and complex assignments per person than the senior configuration staff. Note that we used no performance measure in the hierarchical clustering; the splitting of the configuration staff into a cluster of mostly senior members and another of mostly junior members arises simply because the senior and junior members behave differently.

# Zigbee co-detection counts between badge pairs, normalized by each badge's hours on record
badge.prox = with(net.aggr[net.aggr$sender.id %in% unique(net.aggr$local.id), 
    ], table(sender.id, local.id))
badge.prox = sweep(badge.prox, 2, tapply(trunc(as.numeric(net.aggr$date.time)/3600), 
    net.aggr$local.id, function(x) length(unique(x))), "/")
# Ward's minimum variance clustering on the distance sqrt(1 - correlation)
badge.prox.hclust = hclust(as.dist((1 - cor(badge.prox))^0.5), method = "ward")
heatmap(badge.prox, Rowv = as.dendrogram(badge.prox.hclust), Colv = as.dendrogram(badge.prox.hclust), 
    scale = "none", RowSideColors = c("yellow", "red", "green", "purple", "gray")[badge.assignment$role[match(rownames(badge.prox), 
        badge.assignment$BID)]], ColSideColors = c("yellow", "red", "green", 
        "purple", "gray")[badge.assignment$role[match(rownames(badge.prox), 
        badge.assignment$BID)]])

[Figure: co-location heat map with dendrograms]

According to the theory of structural holes [2], people more often talk to those with the same expertise and roles, and the less frequent interactions among people with different expertise and roles can be more important when they happen. This is confirmed by how often people engaged in face-to-face communication in the call center, as indicated by the IR messages logged by the employees' badges (c.f. figure below), and by how visiting other employees' cubicles could contribute to higher productivity per unit time, to be discussed later. Employees are more likely to have face-to-face discussions when their cubicles are closer, and this indicates a way of engineering the communication structure within the call center by rearranging the cubicles.

IR.aggr2 = IR.aggr[IR.aggr$sender.id %in% unique(hdc.xy$id) & IR.aggr$local.id %in% 
    unique(hdc.xy$id), ]
# Counts of distinct IR (face-to-face) detections between badge pairs
ir.prox = table(unique(IR.aggr2)[, c("sender.id", "local.id")])
ir.prox = ir.prox[rownames(ir.prox) %in% colnames(ir.prox), colnames(ir.prox) %in% 
    rownames(ir.prox)]
# Ward's clustering on sqrt(1 - correlation) of asinh-compressed counts
ir.prox.hclust = hclust(as.dist(sqrt(1 - cor(asinh(ir.prox)))), method = "ward")
heatmap(asinh(ir.prox * 10), Rowv = as.dendrogram(ir.prox.hclust), Colv = as.dendrogram(ir.prox.hclust), 
    scale = "none", RowSideColors = c("yellow", "red", "green", "purple", "gray")[badge.assignment$role[match(rownames(ir.prox), 
        badge.assignment$BID)]], ColSideColors = c("yellow", "red", "green", 
        "purple", "gray")[badge.assignment$role[match(rownames(ir.prox), badge.assignment$BID)]])

[Figure: face-to-face (IR) interaction heat map]

The following figure shows the positive correlation between the number of tasks assigned to an employee and where that employee went while working on a task. The employee with the most assignments (badge ID 293) received 132 tasks during the month. The entropy of his visits to different places while finishing these assignments was 5.75, so he typically went to exp(5.75) ≈ 315 grid points in the workspace (out of 502 in total), or 19 of the 28 non-empty cubicles. The employee with the fewest assignments received only one task; his entropy was 4.19, and he typically went to exp(4.19) ≈ 66 grid points, or 6 cubicles.

The following figure also shows that employees in the pricing branch and in the configuration branch received and finished assignments very differently. A pricing employee received on average nine times as many basic assignments and three times as many complex assignments as a configuration employee. Pricing employees also finished these assignments in parallel and went to many people to solve them. Configuration employees, on the other hand, solved advanced assignments exclusively, worked serially, and went to fewer people to solve their assignments.

The entropy of the location distribution in solving a complex task is about 10% higher than in solving a basic task, meaning that solving a complex task requires discussion with about 10% more people. However, the entropy of the location distribution in solving an advanced task is more concentrated around its median than the entropies of basic and complex tasks; advanced tasks require only a certain number of discussions, suggesting that advanced tasks are more self-contained.
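As a toy illustration of how the entropy of a location distribution translates into an effective number of distinct places (the visit counts below are invented, not drawn from the study data):

```r
# Invented visit counts over five grid points for one hypothetical employee
visits <- c(40, 30, 20, 9, 1)
p <- visits / sum(visits)   # empirical visiting distribution
H <- -sum(p * log(p))       # entropy in nats
exp(H)                      # effective number of distinct places visited
# A uniform distribution over the 5 points would give exp(H) = 5 exactly;
# the skew toward a few points lowers the effective count below 5.
```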

Interpreting the log-linear relationship between the rate of completion and entropy in terms of survival analysis, we write rate of completion \( \propto \exp(-\sum_{(\tilde{x}_{m},\tilde{y}_{n})}p(\tilde{x}_{m},\tilde{y}_{n})\log p(\tilde{x}_{m},\tilde{y}_{n})) \), where \( (\tilde{x}_{m},\tilde{y}_{n}) \) ranges over the set of location grids onto which we map RSSI, \( p(\tilde{x}_{m},\tilde{y}_{n}) \) is the probability that the grid was visited, and the exponent is the entropy of the employee's location-visiting behavior while he had a task. Equivalently, time of completion \( \propto \exp(\sum_{(\tilde{x}_{m},\tilde{y}_{n})}p(\tilde{x}_{m},\tilde{y}_{n})\log p(\tilde{x}_{m},\tilde{y}_{n})) \): the "survival" time of a task decreases exponentially in this entropy exponent, which in turn is the sum of the contributions from all locations that this employee visited, weighted by the frequencies with which he visited them. The contribution of a specific location per visit, \( -\log p(\tilde{x}_{m},\tilde{y}_{n}) \), is larger when the location is less visited; over all visits, however, the more-frequently-visited locations contributed more to task completion than the less-visited locations, because \( -p\log p \) decreases to 0 as \( p \) decreases to 0.
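The algebra above can be double-checked with invented numbers (a sketch of the relationship, not a model fitted to the data):

```r
p <- c(0.5, 0.3, 0.2)          # hypothetical visiting probabilities
H <- -sum(p * log(p))          # entropy of the location distribution
rate <- exp(H)                 # completion rate grows exponentially with entropy
survival.time <- exp(-H)       # expected completion ("survival") time shrinks accordingly
# Per-visit contribution of each location, -log p: largest for the rarest location
contrib <- -log(p)
# Weighted by visit frequency, -p log p: the frequent locations dominate overall
weighted <- -p * log(p)
```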

hdc.entropy = sapply(split(hdc.xy, hdc.xy$id), function(x) {
    p = table(paste(x$x, x$y))   # empirical distribution over visited grid points
    p = p/sum(p)
    sum(p * log(p))              # negative entropy; negated again when plotted
})
hdc.accomplishment = c(table(as.character(trans$assigned.to)))
hdc.accomplishment = hdc.accomplishment[intersect(names(hdc.accomplishment), 
    names(hdc.entropy))]
hdc.entropy = hdc.entropy[intersect(names(hdc.accomplishment), names(hdc.entropy))]
plot(-hdc.entropy, hdc.accomplishment, xlab = "entropy", ylab = "# of tasks assigned to")
suppressMessages(require(maptools))
pointLabel(-hdc.entropy, hdc.accomplishment, names(hdc.entropy), col = sapply(as.character(badge.assignment$role[match(names(hdc.entropy), 
    badge.assignment$BID)]), function(x) switch(x, Pricing = "purple", `Base station` = "orange", 
    Coordinator = "green", Configuration = "red", RSSI = "gray")))
legend("topleft", text.col = c("red", "purple"), legend = c("configuration", 
    "pricing"))

[Figure: entropy vs. number of tasks assigned]

As a sanity test of the time stamps estimated from the "jiffy" counts of the badges, and of the indoor locations estimated from the Zigbee RSSI between employees' badges and anchor nodes, we show with a quantile-quantile plot (c.f. figure below) that two persons were closer within 1 minute of a face-to-face discussion than within 1 hour of it. We randomly take 200 records of IR proximity from the data set; for each record we randomly take 10 locations of the sender badge and 10 locations of the receiver badge within 1 minute of the IR proximity, sort the 20 thousand pairwise distances (200 records \( \times10\times10 \) pairwise distances per record), and plot them against another 20 thousand sorted distances within 1 hour of IR proximity. We find that with 90% probability two persons were within the distance of 1 cubicle in the 1-minute window of their face-to-face discussion, compared with 70% probability in the 1-hour window. We would not find this structure if either the estimated time stamps had an error larger than 1 minute or the estimated indoor locations had an error larger than the distance of 1 cubicle. We can similarly check that two persons were closer to each other at the time of IR proximity than of Zigbee proximity, and that two persons had more IR-proximity and Zigbee-proximity records when their cubicles were closer.

IR.aggr2 = IR.aggr[IR.aggr$sender.id %in% unique(hdc.xy$id) & IR.aggr$local.id %in% 
    unique(hdc.xy$id), ]
# Match each IR record to the minute-level location records of both badges
IR.aggr2$ndx.local = match(paste(IR.aggr2$local.id, strftime(IR.aggr2$date.time, 
    "%Y-%m-%d %H:%M:00")), paste(hdc.xy$id, strftime(hdc.xy$time, "%Y-%m-%d %H:%M:00")))
IR.aggr2$ndx.sender = match(paste(IR.aggr2$sender.id, strftime(IR.aggr2$date.time, 
    "%Y-%m-%d %H:%M:00")), paste(hdc.xy$id, strftime(hdc.xy$time, "%Y-%m-%d %H:%M:00")))
# Pairwise sender-receiver distances from location samples near the IR detection
IR.dist = unlist(lapply(sample(which(!is.na(IR.aggr2$ndx.local) & !is.na(IR.aggr2$ndx.sender)), 
    200), function(n) {
    a = hdc.xy[IR.aggr2$ndx.local[n] + 0:9, c("x", "y")]
    b = hdc.xy[IR.aggr2$ndx.sender[n] + 0:9, c("x", "y")]
    round((outer(a$x[1:10], b$x[1:10], function(u, v) (u - v))^2 + outer(a$y[1:10], 
        b$y[1:10], function(u, v) (u - v))^2)^0.5)
}))
w = as.numeric(hdc.xy$time)
# Pairwise distances from another 200 IR records, with locations sampled
# anywhere within 1 hour of the IR detection
IR.dist2 = unlist(lapply(sample(which(!is.na(IR.aggr2$ndx.local) & !is.na(IR.aggr2$ndx.sender)), 
    200), function(n) {
    ndx = which(IR.aggr2$local.id[n] == hdc.xy$id)
    ndx.local = ndx[abs(w[ndx] - as.numeric(IR.aggr2$date.time[n])) < 60 * 60]
    ndx = which(IR.aggr2$sender.id[n] == hdc.xy$id)
    ndx.sender = ndx[abs(w[ndx] - as.numeric(IR.aggr2$date.time[n])) < 60 * 
        60]
    a = hdc.xy[sample(ndx.local, 10, replace = TRUE), c("x", "y")]
    b = hdc.xy[sample(ndx.sender, 10, replace = TRUE), c("x", "y")]
    round((outer(a$x[1:10], b$x[1:10], function(u, v) (u - v))^2 + outer(a$y[1:10], 
        b$y[1:10], function(u, v) (u - v))^2)^0.5)
}))
qqplot(IR.dist2, IR.dist, pch = ".", xlab = "distance distribution less than 1 hour from IR proximity", 
    ylab = "distance distribution less than 1 minute from IR proximity ", main = "Q-Q plot")
abline(coef = c(0, 1), col = "red")
pointLabel(quantile(IR.dist2, 1:9/10), quantile(IR.dist, 1:9/10), paste("", 
    1:9, sep = "."), col = "red")

[Figure: Q-Q plot of distances within 1 minute vs. within 1 hour of IR proximity]

More Data

We repackaged the raw sensor data so that investigators can inspect the call-center dynamics from more perspectives. The time stamps of the raw sensor data (directly from the badge hardware) are badge CPU clock counts, which restarted from 0 each time a badge was powered on to collect data.

We estimate the wall-clock time at the call center (YYYY-mm-dd HH:MM:SS) corresponding to each badge power-on based on the following two facts: (1) the anchor nodes were never rebooted, hence the CPU clock of each anchor node was non-decreasing over time, and sender.time in Zigbee-raw.csv is non-decreasing within each chunk and consistent across chunks when sender.id is the ID of an anchor node; (2) the time when the data were downloaded from the badge hardware to a computer must be later than the times of the sensor records.

For example, suppose the CPU clock range of one badge runs from 0 to 3600 * 374400 (the CPU clock rate, in ticks per second), corresponding to the CPU clock range 3600 * 374400 to 2 * 3600 * 374400 of anchor node A in the Zigbee records, and to the range 1800 * 374400 + 3600 * 374400 to 1800 * 374400 + 2 * 3600 * 374400 of anchor node B, and that the data on the badge were dumped at noon on 2007/03/30. We can infer that anchor node B started no later than 9:30am on 2007/03/30 (noon minus the 2.5 hours of node-B clock that had elapsed by the badge's last record), and half an hour earlier than anchor node A. After averaging over all chunks of sensor data and all anchor nodes, we estimate that our mapping from CPU clock to call-center time has less than 1 second of error.
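The worked example can be reproduced with a few lines (the clock rate and offsets are copied from the example above; the noon dump is the only wall-clock anchor, so the result is an upper bound on the start time):

```r
ticks <- 374400                                    # badge CPU ticks per second
dump.time <- as.POSIXct("2007-03-30 12:00:00", tz = "America/Chicago")
# Anchor-node clock readings at the badge's last record (from the example)
nodeA.at.end <- 2 * 3600 * ticks
nodeB.at.end <- 1800 * ticks + 2 * 3600 * ticks
nodeA.uptime <- nodeA.at.end / ticks               # 2 hours, in seconds
nodeB.uptime <- nodeB.at.end / ticks               # 2.5 hours, in seconds
nodeB.uptime - nodeA.uptime                        # node B started 1800 s before node A
dump.time - nodeB.uptime                           # node B started no later than 09:30
```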

Indoor localization from Zigbee RSSI is based on the fact that employees were at their cubicles more than anywhere else: we compare each employee's per-minute RSSI to the anchor nodes against the signature RSSIs to the anchor nodes observed when employees were in their cubicles [8].
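The signature-matching idea can be sketched as nearest-neighbor matching of a minute's RSSI vector against per-cubicle signatures (the RSSI values below are invented for illustration; the actual estimation in [8] is more elaborate):

```r
# Invented signature RSSIs (dBm): rows = cubicles, columns = anchor nodes
signatures <- rbind(cubicle.A = c(-40, -70, -80),
                    cubicle.B = c(-75, -45, -78),
                    cubicle.C = c(-82, -76, -42))
# One minute of observed RSSI from a badge to the three anchor nodes
observed <- c(-73, -48, -80)
# Locate the badge at the cubicle whose signature is closest (Euclidean distance)
d <- apply(signatures, 1, function(s) sqrt(sum((s - observed)^2)))
names(which.min(d))   # "cubicle.B"
```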

Literature Review

Researchers have been using multi-agent models to simulate organizational dynamics and organizational performance based on simple generative rules [1, 3, 17] since long before the availability of sensors to accurately track the whole population in an organization. In particular, Carley proposed that organizational dynamics center around three components (tasks, resources, and individuals) and five relationships (temporal ordering of tasks, resource prerequisite of tasks, assignment of personnel to tasks, interpersonal relationships, and accessibility of resources to individuals). Previous successes suggest the strong potential to verify these generative rules with sensor data – fitting multi-agent models to real-world sensor data that track organization dynamics, and even providing real-time interventions to organizations by combining multi-agent models and sensor data.

A key psychological hypothesis behind organizational theory is transactive memory: an organization coping with complex tasks often needs a knowledge repertoire far beyond the memory capacity and reliability of any individual in the organization. Individuals collaborate to store this total repertoire by identifying one another's expertise and distributing the repertoire among themselves. In the end, each individual holds a subset of the repertoire, plus an index of who knows what and how credible each source is. The longer group members work with one another, the better they understand this distribution of expertise and weakness, so the more precise their communications become and the more productively they retrieve information and complete tasks.

The face-to-face interaction network is important in understanding how individuals completed tasks in the data set’s server configuration firm, because information flow and task solutions result from this face-to-face network. Location tracking is critical for pinpointing the direction of information flow. If A visits B, this means that information flows from B to A; if many people visited A, this is very different from A visiting many people.

We can use the following generative multi-agent process, compatible with organizational dynamics theory, to model the dynamics and performance of the IT firm. An individual iterates among four states during his work: working on his assignment by himself, asking for help from another individual, giving help to another individual, or idling. This individual enters and exits different states with different probabilities, proportional to the rates of different events: how often tasks come, how he and his counterparts make choices, and how effective these choices are toward closing assignments. Hence the number of tasks closed by an individual is inversely proportional to the average "survival time" of a task (the time for this individual to finish a task), and the average survival time of a task is an exponential function of the negative rate with which this individual finishes tasks in his different states [12]. Going to the cubicle of an individual with the right piece of knowledge increases productivity by a certain factor, depending on how often that piece of knowledge is needed and how effective the communication is.
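A minimal simulation of this four-state process (all transition probabilities and per-state closing rates below are invented; the point is only that the "survival" time of a task emerges from the state dynamics):

```r
set.seed(1)
states <- c("working", "asking", "helping", "idle")
# Invented transition probabilities between states (rows sum to 1)
P <- matrix(c(0.7, 0.1, 0.1, 0.1,
              0.5, 0.3, 0.1, 0.1,
              0.4, 0.1, 0.4, 0.1,
              0.6, 0.1, 0.1, 0.2),
            4, 4, byrow = TRUE, dimnames = list(states, states))
# Invented per-step progress rates toward closing a task
rate <- c(working = 0.05, asking = 0.08, helping = 0, idle = 0)
simulate.survival <- function(n.steps = 1000) {
  s <- "working"; progress <- 0
  for (t in seq_len(n.steps)) {
    progress <- progress + rate[s]
    if (progress >= 1) return(t)   # task closed at step t
    s <- sample(states, 1, prob = P[s, ])
  }
  NA                               # task not closed within n.steps
}
mean(replicate(200, simulate.survival()))  # average "survival" time of a task
```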

Findings and literature reviews on this data set include [8, 19]. Background on tracking organizational dynamics with badge hardware can be found in [6, 18, 15, 11]. Models of organizational dynamics are described in, for example, [4, 5, 9, 14].

References

  1. Robert Axelrod. The Complexity of Cooperation: Agent-Based Models of Competition and Collaboration. Princeton University Press, 1997.
  2. Ronald S. Burt. Structural Holes: The Social Structure of Competition. Harvard University Press, Cambridge, Mass., 1995.
  3. Kathleen M. Carley. Computational organizational science and organizational engineering. Simulation Modelling Practice and Theory, 10(5-7):253-269, 2002.
  4. Claudio Castellano, Santo Fortunato, and Vittorio Loreto. Statistical physics of social dynamics. Reviews of Modern Physics, 81(2):591-646, 2009.
  5. Christophe P. Chamley. Rational Herds: Economic Models of Social Learning. Cambridge University Press, 2003.
  6. Wen Dong. Modeling the Structure of Collective Intelligence. PhD thesis, MIT, 2010.
  7. Wen Dong, Bruno Lepri, and Alex Pentland. Modeling the co-evolution of behaviors and social relationships using mobile phone data. In MUM, pages 134-143, 2011.
  8. Wen Dong, Daniel Olguín Olguín, Benjamin N. Waber, Taemie Kim, and Alex Pentland. Mapping organizational dynamics with body sensor networks. In Guang-Zhong Yang, Eric M. Yeatman, and Chris McLeod, editors, BSN, pages 130-135. IEEE, 2012.
  9. Joshua M. M. Epstein. Generative Social Science: Studies in Agent-Based Computational Modeling (Princeton Studies in Complexity). Princeton University Press, 2007.
  10. Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236-244, 1963.
  11. Taemie Kim. Enhancing distributed collaboration using sociometric feedback. PhD thesis, MIT, 2011.
  12. Jerald F. Lawless. Statistical models and methods for lifetime data (2nd ed.). John Wiley and Sons, 2003.
  13. Leah Lievrouw and Sonia Livingstone, editors. Smart agents and organizations of the future, chapter 12, pages 206-220. Handbook of New Media. Sage Publications, Inc., 2002.
  14. Nigel Gilbert. A generic model of collectivities. In ABModSim 2006, International Symposium on Agent Based Modeling and Simulation. University of Vienna: European Meeting on Cybernetics Science and Systems Research, 2006.
  15. Daniel Olguín Olguín. Sensor-based organizational design and engineering. PhD thesis, MIT, 2011.
  16. Alex Pentland. Honest Signals. MIT press, 2008.
  17. Ron Sun. Cognition and multi-agent interaction: from cognitive modeling to social simulation. Cambridge University Press, 2006.
  18. Ben Waber. Understanding the link between changes in social support and changes in outcomes with the sociometric badge. PhD thesis, MIT, 2011.
  19. Lynn Wu, Benjamin N. Waber, Sinan Aral, Erik Brynjolfsson, and Alex Pentland. Mining face-to-face interaction networks using sociometric badges: Predicting productivity in an IT configuration task. In ICIS, page 127, 2008.